GPU monitoring

Monitoring GPU performance is critical for making full use of the hardware and for identifying bottlenecks and issues.

JupyterLab NVDashboard

NVDashboard is a JupyterLab extension for displaying NVIDIA GPU usage dashboards. It enables users to visualize system hardware metrics within the same interactive environment they use for development and data analysis.

Supported metrics include:

  • GPU compute utilization

  • GPU memory consumption

  • PCIe throughput

  • NVLink throughput

Installation

Select Terminal from JupyterLab

Install the package from PyPI

pip install jupyterlab_nvdashboard

Restart the JupyterLab application from the Denvr Cloud Dashboard to enable the package.
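
You can optionally confirm the extension is registered before restarting. A minimal check, assuming a JupyterLab 3+ environment where the dashboard installs as a prebuilt extension:

jupyter labextension list

The output should include jupyterlab_nvdashboard among the enabled extensions.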

Read more: https://pypi.org/project/jupyterlab-nvdashboard/

Linux command-line

NVIDIA drivers are preinstalled with the nvidia-smi command, which can be used to view performance metrics for the GPUs. The examples here are from a single A100 MIG 10 GB instance.

You can get a continuous output of this view by running:

nvidia-smi -l

Another technique is to use the watch command to run nvidia-smi every X seconds, clearing the screen each time before displaying new output:

watch -n 5 nvidia-smi

The nvidia-smi dmon command can also stream output in a tabular format, which is more easily used for logging and automated monitoring.

The Linux man page has additional information on how to configure nvidia-smi dmon to select specific output columns and provide CSV-style formatting.
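
For example, the following invocations are a sketch of typical usage; the flag selections are illustrative rather than exhaustive, so check nvidia-smi dmon -h and the man page for the full option list:

# stream the utilization (u) and memory (m) metric groups every 5 seconds, with a timestamp column
nvidia-smi dmon -s um -d 5 -o T

# CSV-style output suitable for logging can also be produced with the query interface
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5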

Prometheus exporter

NVIDIA provides a tool called DCGM-Exporter, which exposes GPU metrics for Prometheus to scrape. It should be installed as a service by system administrators, depending on your deployment preference (Kubernetes, Docker, or a local install).

$ docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04

You can verify metrics as follows:

$ curl localhost:9400/metrics
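
To spot-check an individual series, you can filter the scraped output. DCGM_FI_DEV_GPU_UTIL is one of the counters typically included in the exporter's default configuration:

$ curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL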

Read more: https://github.com/NVIDIA/dcgm-exporter