# GPU monitoring

## JupyterLab NVDashboard

NVDashboard is a JupyterLab extension for displaying NVIDIA GPU usage dashboards.  It enables users to visualize system hardware metrics within the same interactive environment they use for development and data analysis.

Supported metrics include:

* GPU compute utilization
* GPU memory consumption
* PCIe throughput
* NVLink throughput

### Installation

Select Terminal from JupyterLab

<div align="center"><figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FSyLxAydqcFDS5TzqJf4W%2FJupyterlab-focus-terminal.png?alt=media&#x26;token=85583cb0-3b5c-4dbf-ab87-eecb2a8c3aab" alt="" width="375"><figcaption></figcaption></figure></div>

Install the package from PyPI

```
pip install jupyterlab_nvdashboard
```
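To confirm the extension installed correctly, you can check the package and list the registered JupyterLab extensions (the `grep` pattern below is illustrative):

```shell
# Confirm the package is present in the environment
pip show jupyterlab_nvdashboard

# List JupyterLab extensions and look for nvdashboard
# (labextension list writes to stderr, hence the redirect)
jupyter labextension list 2>&1 | grep -i nvdashboard
```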

Restart the JupyterLab application from the Denvr AI Cloud Dashboard to enable the package.

Read more:

* <https://pypi.org/project/jupyterlab-nvdashboard/>

## Linux command-line

NVIDIA drivers come preinstalled with the `nvidia-smi` command, which reports performance metrics for the GPUs.  This example shows a single A100 MIG 10 GB instance.

<div align="center"><figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FoXyo7fSRg94kHJdedA5m%2FScreen%20Shot%202022-09-24%20at%2012.21.30%20PM.png?alt=media&#x26;token=cbf08e81-605a-4950-91dc-253c808dab83" alt="" width="375"><figcaption></figcaption></figure></div>

You can get continuously refreshed output of this view by running:

```
nvidia-smi -l
```

Another technique is to use the `watch` command to run `nvidia-smi` every X seconds, clearing the screen each time before displaying new output:

```
watch -n 5 nvidia-smi
```
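For scripted logging, `nvidia-smi` also supports `--query-gpu` to emit selected fields as CSV at a fixed interval; the particular field list and log filename below are just examples:

```shell
# Append a CSV line of selected GPU metrics every 5 seconds (Ctrl+C to stop)
nvidia-smi \
  --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
  --format=csv \
  -l 5 >> gpu-metrics.csv
```

Run `nvidia-smi --help-query-gpu` for the full list of queryable fields.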

The `nvidia-smi dmon` command can also stream output in a tabular format, which is more easily consumed by logging and automated monitoring tools.

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2FgrNTxa5m99IxnplJmMwK%2FScreen%20Shot%202022-09-24%20at%2012.23.51%20PM.png?alt=media&#x26;token=713fcd08-3a14-41d6-a3f4-8e0c085c39be" alt="" width="375"><figcaption></figcaption></figure>

The Linux man page has additional information on how to configure `nvidia-smi dmon` to select specific output columns and produce CSV-style formatting.
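For example, `dmon` accepts `-s` to select metric groups, `-d` for the sampling interval in seconds, `-c` for a fixed sample count, and `-o T` to prepend timestamps (see the man page for the complete option list):

```shell
# Sample utilization (u) and memory (m) metrics every 2 seconds,
# stopping after 10 samples, with a timestamp column
nvidia-smi dmon -s um -d 2 -c 10 -o T
```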

<figure><img src="https://1008771031-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo84Tuz16JeqaVoFstgKj%2Fuploads%2F2KMNzcoBCzVoBtcxHpYA%2FScreen%20Shot%202022-09-24%20at%2012.41.42%20PM.png?alt=media&#x26;token=f8502ffd-36dc-480d-87fc-36dd4b445f26" alt="" width="563"><figcaption></figcaption></figure>

## Prometheus exporter

NVIDIA provides a tool called DCGM-Exporter which streams GPU metrics to Prometheus.  It should be installed as a service by system administrators, depending on your deployment preferences (Kubernetes, Docker, or a local install).

```
$ docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04
```

You can verify metrics as follows:

```
$ curl localhost:9400/metrics
```
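The exporter publishes metrics in Prometheus text format with the `DCGM_FI_` prefix; for example, GPU utilization appears as `DCGM_FI_DEV_GPU_UTIL`, which you can filter for:

```shell
# Filter the exporter output for the GPU utilization gauge
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```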

Read more:

* <https://github.com/NVIDIA/dcgm-exporter>
