GPU monitoring
Monitoring GPU performance is critical for maximizing hardware utilization and for identifying bottlenecks and issues.
NVDashboard is a JupyterLab extension for displaying NVIDIA GPU usage dashboards. It enables users to visualize system hardware metrics within the same interactive environment they use for development and data analysis.
Supported metrics include:
GPU compute utilization
GPU memory consumption
PCIe throughput
NVLink throughput
Select Terminal from JupyterLab
Install the package from PyPI (an example command follows these steps)
Restart the JupyterLab application from the Denvr Cloud Dashboard to enable the package.
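A minimal install command, assuming the extension is published on PyPI as jupyterlab-nvdashboard:

```bash
# Install the NVDashboard extension into the JupyterLab environment
pip install jupyterlab-nvdashboard
```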
Read more:
NVIDIA drivers come preinstalled with the nvidia-smi command, which can be used to query GPU performance metrics. This example shows a single A100 MIG 10 GB instance.
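The snapshot comes from running the command with no arguments; the exact table will reflect your own GPUs and driver version:

```bash
# Print a one-time report of GPU utilization, memory, temperature, and processes
nvidia-smi
```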
You can get a continuous output of this view by running:
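For example, using the built-in loop flag (the 2-second interval is arbitrary):

```bash
# Re-print the nvidia-smi report every 2 seconds until interrupted with Ctrl+C
nvidia-smi -l 2
```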
Another technique is to use the watch command to run nvidia-smi every X seconds, clearing the screen each time before displaying new output:
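For example, with a 5-second interval:

```bash
# Clear the screen and re-run nvidia-smi every 5 seconds
watch -n 5 nvidia-smi
```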
The nvidia-smi dmon command can also stream output in tabular format, which is more easily used for logging and automated monitoring.
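In its default form it prints one row of metrics per GPU every second:

```bash
# Stream device metrics (power, temperature, utilization, clocks) in columns
nvidia-smi dmon
```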
The Linux man page has additional information on how to configure nvidia-smi dmon to select specific output columns and to produce CSV-style formatting.
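For example, one plausible configuration (the metric groups u and m cover utilization and memory; see the man page for the full list):

```bash
# Report the utilization (u) and memory (m) metric groups every 5 seconds,
# prefixing each row with the date (D) and time (T)
nvidia-smi dmon -s um -d 5 -o DT
```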
NVIDIA provides a tool called DCGM-Exporter, which streams GPU metrics into Prometheus. It should be installed as a service by system administrators, depending on your deployment preferences (Kubernetes, Docker, or a local install).
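As a sketch, a standalone Docker deployment might look like the following; the image tag here is illustrative, so check NVIDIA's NGC registry for a current release:

```bash
# Run DCGM-Exporter with access to all GPUs, exposing metrics on port 9400
docker run -d --gpus all --rm -p 9400:9400 \
  nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04
```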
You can verify metrics as follows:
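Assuming the exporter is listening on its default port 9400:

```bash
# Fetch the Prometheus metrics endpoint and filter for GPU utilization samples
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```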
Read more: