# GPU monitoring

## JupyterLab NVDashboard

NVDashboard is a JupyterLab extension for displaying NVIDIA GPU usage dashboards.  It enables users to visualize system hardware metrics within the same interactive environment they use for development and data analysis.

Supported metrics include:

* GPU compute utilization
* GPU memory consumption
* PCIe throughput
* NVLink throughput

### Installation

Open a Terminal from JupyterLab:

<div align="center"><figure><img src="/files/O7gA5ixcbdWDyuZjQry4" alt="" width="375"><figcaption></figcaption></figure></div>

Install the package from PyPI:

```
pip install jupyterlab_nvdashboard
```

Restart the JupyterLab application from the Denvr AI Cloud Dashboard to enable the package.

Read more:

* <https://pypi.org/project/jupyterlab-nvdashboard/>

## Linux command-line

The NVIDIA drivers are preinstalled and include the `nvidia-smi` command, which reports GPU performance metrics.  This example shows a single A100 MIG 10 GB instance.

<div align="center"><figure><img src="/files/ACc3C5R02YgXRRKnaAvL" alt="" width="375"><figcaption></figcaption></figure></div>

You can get a continuous output of this view by running:

```
nvidia-smi -l
```

Another technique is to use the `watch` command to run `nvidia-smi` at a fixed interval (every 5 seconds in this example), clearing the screen each time before displaying new output:

```
watch -n 5 nvidia-smi
```
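When polling on an interval, a script often needs only a few fields rather than the full table. The sketch below assumes the `--query-gpu` and `--format=csv` options of `nvidia-smi`; the helper name `gpu_mem_percent` is hypothetical and not part of any NVIDIA tool. It reads `index, memory.used, memory.total` rows and prints memory usage as a percentage.

```shell
# Hypothetical helper: reads CSV rows of "index, memory.used, memory.total"
# (as produced with --format=csv,noheader,nounits) and prints percent used.
gpu_mem_percent() {
    # "$3 + 0 > 0" forces a numeric test, so rows where the total is
    # missing or non-numeric (for example "[N/A]") are skipped.
    awk -F', *' '$3 + 0 > 0 {
        printf "gpu %s: %.1f%% memory used\n", $1, 100 * $2 / $3
    }'
}

# Take one sample if nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

The list of fields accepted by `--query-gpu` can be printed with `nvidia-smi --help-query-gpu`. On MIG-enabled GPUs some fields, such as utilization, may be reported as `[N/A]`.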

The `nvidia-smi dmon` command can also stream output in a tabular format, which is easier to use for logging and automated monitoring.

<figure><img src="/files/vFTXjb1TKAHI5W4b1QUD" alt="" width="375"><figcaption></figcaption></figure>

The Linux man page (`man nvidia-smi`) describes how to configure `nvidia-smi dmon` to select specific output columns and to produce CSV-style formatting.

<figure><img src="/files/bnptjoqLilsDY9AccuEL" alt="" width="563"><figcaption></figcaption></figure>
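As an illustration of column selection, the sketch below logs a fixed number of `dmon` samples to a file. It assumes the `-s` (metric groups), `-o T` (prepend the time), `-d` (interval in seconds), and `-c` (sample count) options described in the man page; option availability can differ between driver versions. The function name `log_gpu_dmon` and the default file name are hypothetical.

```shell
# Hypothetical helper: write a fixed number of dmon samples to a log file.
# Usage: log_gpu_dmon [outfile] [interval_seconds] [sample_count]
log_gpu_dmon() {
    dmon_outfile="${1:-gpu_dmon.log}"   # assumed default log path
    dmon_interval="${2:-5}"
    dmon_count="${3:-12}"

    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi

    # -s um : utilization (u) and memory (m) metric groups only
    # -o T  : prefix each row with the time
    # -d/-c : sampling interval in seconds and number of samples
    nvidia-smi dmon -s um -o T -d "$dmon_interval" -c "$dmon_count" \
        > "$dmon_outfile"
}
```

For example, `log_gpu_dmon gpu.log 5 12` would collect roughly one minute of samples into `gpu.log`.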

## Prometheus exporter

NVIDIA provides a tool called DCGM-Exporter, which exposes GPU metrics for Prometheus to scrape.  System administrators should install it as a service using whichever deployment method suits the environment (Kubernetes, Docker, or a local install).  For example, with Docker:

```
docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04
```

You can verify metrics as follows:

```
curl localhost:9400/metrics
```
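The endpoint returns every exported metric in the Prometheus text format, which can be long. The sketch below filters that output down to per-GPU utilization. It assumes the exporter publishes a `DCGM_FI_DEV_GPU_UTIL` metric with a `gpu` label, which depends on the exporter's configured metric set; the function name `gpu_util_from_metrics` is hypothetical.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints one line per DCGM_FI_DEV_GPU_UTIL sample.
gpu_util_from_metrics() {
    # The pattern skips "# HELP" / "# TYPE" comment lines and metrics that
    # merely share the prefix; the sample value is the last field on the line.
    awk '/^DCGM_FI_DEV_GPU_UTIL[{ ]/ {
        gpu = "?"
        if (match($0, /gpu="[^"]*"/))
            gpu = substr($0, RSTART + 5, RLENGTH - 6)
        print "gpu " gpu ": " $NF "%"
    }'
}

# Usage (assumes the exporter from the example above is listening on 9400):
#   curl -s localhost:9400/metrics | gpu_util_from_metrics
```

If the command prints nothing, the metric may be disabled or named differently in your exporter configuration; inspecting the raw `curl` output shows which metric names are available.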

Read more:

* <https://github.com/NVIDIA/dcgm-exporter>

