Differences between bare metal and virtual machines
Bare metal
Bare metal clusters provide direct access to physical hardware for greater control and customization of the environment. This includes but is not limited to:
physical CPUs, not virtualized CPUs
no hypervisor overhead consuming host CPU and memory resources
ability to change simultaneous multithreading (SMT) and NUMA BIOS settings
full access to local NVMe block devices to configure filesystems and RAID
enhanced security by removing the hypervisor layer
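After changing BIOS settings on a bare metal host, it can help to confirm what the operating system actually sees. The following is a minimal sketch, assuming a Linux host that exposes the standard sysfs paths for SMT control, NUMA nodes, and block devices. The function names and the `sysfs_root` parameter are illustrative and are not part of any Denvr Cloud tooling.

```python
from pathlib import Path


def read_smt_control(sysfs_root: Path = Path("/sys")) -> str:
    """Return the kernel's SMT control state (e.g. "on", "off"), or "unknown"."""
    control = sysfs_root / "devices/system/cpu/smt/control"
    try:
        return control.read_text().strip()
    except OSError:
        # Older kernels or non-Linux systems do not expose this file.
        return "unknown"


def list_numa_nodes(sysfs_root: Path = Path("/sys")) -> list[str]:
    """Return NUMA node names such as ["node0", "node1"], in numeric order."""
    node_dir = sysfs_root / "devices/system/node"
    if not node_dir.is_dir():
        return []
    nodes = [
        p.name
        for p in node_dir.iterdir()
        if p.name.startswith("node") and p.name[4:].isdigit()
    ]
    return sorted(nodes, key=lambda name: int(name[4:]))


def list_nvme_block_devices(sysfs_root: Path = Path("/sys")) -> list[str]:
    """Return block device names that start with "nvme", e.g. ["nvme0n1"]."""
    block_dir = sysfs_root / "block"
    if not block_dir.is_dir():
        return []
    return sorted(p.name for p in block_dir.iterdir() if p.name.startswith("nvme"))


if __name__ == "__main__":
    print("SMT control:", read_smt_control())
    print("NUMA nodes:", list_numa_nodes())
    print("NVMe block devices:", list_nvme_block_devices())
```

Comparing the output before and after a BIOS change is a quick way to check that the new SMT or NUMA configuration took effect.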
Denvr Cloud has limited observability into bare metal hosts, so the tenant takes on additional responsibility for monitoring the system and reporting issues for analysis.
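Because monitoring falls to the tenant on bare metal, a small periodic health check can be a useful starting point. The sketch below is an illustration only, assuming a Linux host. The report fields and function names are hypothetical, and a production setup would more likely use an established monitoring agent.

```python
import json
import os
import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def build_health_report(meminfo_text: str, load_1m: float, disk_path: str = "/") -> dict:
    """Summarize CPU load, available memory, and free disk space as percentages."""
    mem = parse_meminfo(meminfo_text)
    disk = shutil.disk_usage(disk_path)
    cpus = os.cpu_count() or 1
    mem_total = max(mem.get("MemTotal", 1), 1)
    return {
        "load_per_cpu": round(load_1m / cpus, 2),
        "mem_available_pct": round(100 * mem.get("MemAvailable", 0) / mem_total, 1),
        "disk_free_pct": round(100 * disk.free / max(disk.total, 1), 1),
    }


if __name__ == "__main__":
    # Linux-specific inputs: /proc/meminfo and the 1-minute load average.
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    report = build_health_report(meminfo, os.getloadavg()[0])
    print(json.dumps(report, indent=2))
```

Running a script like this on a schedule and keeping the output gives the tenant a record to attach when reporting a system issue.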
Virtual machines
Virtual machines provide access to GPUs and compute resources with a minor overhead required for the hypervisor. The primary benefits of virtualization, even for full nodes, are:
choice of operating system images to use during provisioning
1-5 minutes to boot the instance (depends primarily on number of vCPUs assigned)
ability to resize allocated resources (CPU, memory, disks) as requirements change
lower likelihood of over-provisioning resources versus large bare metal hosts
use of snapshots for backups and starting new instances
self-service management via Denvr Cloud console and APIs
GPU processing and InfiniBand/RoCE fabrics are not impacted by virtualization because the devices are passed through to the guest OS. This provides the same level of performance, control, and isolation as a bare metal host.
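One way to see passthrough from inside an instance is to check that the guest runs under a hypervisor while the accelerators still appear as ordinary PCI devices. The sketch below assumes a Linux x86 guest, where the `hypervisor` CPU flag appears in `/proc/cpuinfo`, and assumes NVIDIA GPUs (PCI vendor ID `0x10de`). Both are assumptions about the environment, and the function names are illustrative.

```python
from pathlib import Path

# Assumption: NVIDIA hardware. 0x10de is NVIDIA's PCI vendor ID and matches
# any NVIDIA PCI function, not only GPUs.
NVIDIA_VENDOR_ID = "0x10de"


def is_virtualized(cpuinfo_text: str) -> bool:
    """Return True when the x86 'hypervisor' CPU flag is present."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def list_pci_devices_by_vendor(vendor_id: str, sysfs_root: Path = Path("/sys")) -> list[str]:
    """Return the names of PCI devices whose sysfs vendor file matches vendor_id."""
    pci_dir = sysfs_root / "bus/pci/devices"
    if not pci_dir.is_dir():
        return []
    matches = []
    for dev in pci_dir.iterdir():
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
        except OSError:
            continue  # Skip entries without a readable vendor file.
        if vendor == vendor_id.lower():
            matches.append(dev.name)
    return sorted(matches)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print("Running under a hypervisor:", is_virtualized(text))
    print("NVIDIA PCI devices:", list_pci_devices_by_vendor(NVIDIA_VENDOR_ID))
```

On a virtual machine with passed-through GPUs, the first line would be expected to report a hypervisor while the second still lists the devices.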
Feature comparison
| Category | Feature | Bare metal | Virtual machine |
| --- | --- | --- | --- |
| Performance | Overhead | ✅ None | Hypervisor requires CPU and memory resources |
| Performance | Consistency | ✅ High consistency in performance | Potential variability in performance |
| Security | Tenant isolation | ✔️ Single tenant only | ✔️ Single or multi-tenant |
| Security | Multi-user access | May require additional software to manage user isolation and quality of service | ✅ Quality of service is enforced |
| Security | Security risk | ✅ Lowest risk of cross-tenant impact | Higher risk due to shared infrastructure. Virtualization is a mature technology in use since the early 2000s. |
| Security | Data privacy | ✔️ Denvr has no mechanism to access the system, including for operational support | ✔️ Denvr can monitor and maintain the system hardware instead of the user |
| Management | Node types | ✔️ Full node only | ✔️ Allows customization of machine resources including single-GPU instances |
| Management | Scalability | Full node only | ✅ Instance sizes can be changed dynamically without re-provisioning |
| Management | Software and drivers | Denvr only installs the GPU and fabric drivers. | ✅ Pre-configured images are available with required software dependencies. |
| Management | Time to launch | Slower | ✅ Fastest |
| Management | Backup and restore | Difficult to back up the operating system for recovery | ✅ Simple to snapshot, clone, copy, and duplicate machine images. |
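The comparison above can be turned into a simple checklist when choosing a platform. The sketch below is one possible encoding, not an official selection tool. The requirement labels are hypothetical names invented for this example, and the mapping only paraphrases points already made in the comparison.

```python
# Hypothetical requirement labels mapped to the platform the comparison favors.
PLATFORM_FOR_REQUIREMENT = {
    "no_hypervisor_overhead": "bare metal",
    "bios_smt_numa_control": "bare metal",
    "single_gpu_instance": "virtual machine",
    "resize_without_reprovisioning": "virtual machine",
    "snapshots": "virtual machine",
    "preconfigured_images": "virtual machine",
}


def recommend(requirements: list[str]) -> dict[str, list[str]]:
    """Group each requirement under the platform that the comparison favors."""
    grouped = {"bare metal": [], "virtual machine": [], "unknown": []}
    for req in requirements:
        platform = PLATFORM_FOR_REQUIREMENT.get(req, "unknown")
        grouped[platform].append(req)
    return grouped


if __name__ == "__main__":
    needs = ["snapshots", "bios_smt_numa_control", "single_gpu_instance"]
    for platform, reasons in recommend(needs).items():
        print(f"{platform}: {reasons}")
```

When requirements land on both sides, the grouped output makes the trade-off explicit instead of forcing a single answer.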