Background Knowledge of GPU Sharing Modes in Volcano
Volcano supports two GPU sharing modes for virtual GPU (vGPU) scheduling:
1. HAMi-core (Software-based vGPU)
Description: Leverages VCUDA, a CUDA API hijacking technique, to enforce GPU core and memory usage limits, enabling software-level virtual GPU slicing.
Use case: Ideal for environments requiring fine-grained GPU sharing. Compatible with all GPU types.
2. Dynamic MIG (Hardware-level GPU Slicing)
Description: Utilizes NVIDIA’s MIG (Multi-Instance GPU) technology to partition a physical GPU into isolated instances with hardware-level performance guarantees.
Use case: Best for performance-sensitive workloads. Requires MIG-capable GPUs (e.g., A100, H100).
GPU sharing mode is a node-level configuration. Volcano supports heterogeneous clusters (i.e., some nodes use HAMi-core while others use Dynamic MIG). See volcano-vgpu-device-plugin for configuration details.
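As a rough illustration only, a per-node mode assignment in the device plugin's configuration might look like the sketch below. The `nodeconfig` and `operatingmode` keys are assumptions, not the plugin's confirmed schema; check the volcano-vgpu-device-plugin configuration reference before using them.

```yaml
# Hypothetical sketch of a heterogeneous cluster layout; key names (nodeconfig,
# operatingmode) are assumptions and must be verified against the
# volcano-vgpu-device-plugin configuration reference.
nodeconfig:
  - name: gpu-node-a          # node using software slicing
    operatingmode: hami-core
  - name: gpu-node-b          # node with MIG-capable GPUs using hardware slicing
    operatingmode: mig
```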
Installation
To enable vGPU scheduling, the following components must be set up based on the selected mode:
Common Requirements
Prerequisites:
- NVIDIA driver > 440
- nvidia-docker > 2.0
- Docker configured with `nvidia` as the default runtime (see the daemon.json sketch after this list)
- Kubernetes >= 1.16
- Volcano >= 1.9
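For the Docker default-runtime prerequisite, a typical `/etc/docker/daemon.json` looks like the sketch below; the `nvidia-container-runtime` path may differ depending on how the NVIDIA container toolkit was installed. Restart Docker afterwards (e.g. `sudo systemctl restart docker`).

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```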
Install Volcano:
- Follow instructions in Volcano Installer Guide
Install Device Plugin:
- Deploy volcano-vgpu-device-plugin
Note: the vGPU device plugin YAML also includes the node GPU mode and the MIG geometry configuration. Please refer to the vGPU device plugin config for details.
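A minimal deployment sketch; the manifest file name below is an assumption, so use the one shipped with your version of volcano-vgpu-device-plugin.

```bash
# Deploy the vGPU device plugin DaemonSet (manifest name is an assumption;
# take it from the volcano-vgpu-device-plugin release you are using).
kubectl apply -f volcano-vgpu-device-plugin.yml

# Confirm the plugin pods are running on the GPU nodes (namespace may vary).
kubectl get pods -A | grep vgpu
```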
Validate Setup: Ensure node allocatable resources include:
```
volcano.sh/vgpu-memory: "89424"
volcano.sh/vgpu-number: "8"
```

Check with:

```bash
kubectl get node {node-name} -o yaml
```

Scheduler Config Update:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: predicates
      - name: deviceshare
        arguments:
          deviceshare.VGPUEnable: true # enable vgpu plugin
          deviceshare.SchedulePolicy: binpack # scheduling policy: binpack / spread
```
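After editing the ConfigMap, the scheduler needs to reload it. A sketch assuming a default Volcano installation (deployment `volcano-scheduler` in the `volcano-system` namespace); adjust names if your install differs.

```bash
# Apply the updated scheduler configuration.
kubectl apply -f volcano-scheduler-configmap.yaml

# Restart the scheduler so it picks up the deviceshare settings
# (deployment/namespace names assume a default Volcano installation).
kubectl -n volcano-system rollout restart deployment volcano-scheduler
```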
HAMi-core Usage
Pod Spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hami-pod
  annotations:
    volcano.sh/vgpu-mode: "hami-core"
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          volcano.sh/vgpu-number: 1    # requesting 1 GPU card
          volcano.sh/vgpu-cores: 50    # (optional) each vGPU uses 50% of the GPU cores
          volcano.sh/vgpu-memory: 3000 # (optional) each vGPU uses 3G of GPU memory
```
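To sanity-check the software limits (a sketch, assuming the pod above is named `hami-pod` and is running), nvidia-smi inside the container should report roughly the requested memory rather than the full device memory.

```bash
# With HAMi-core, the memory limit is enforced inside the container, so the
# reported device memory should be close to the 3000 MiB requested above.
kubectl exec -it hami-pod -- nvidia-smi
```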
Dynamic MIG Usage
- Enable MIG Mode:
If you need to use MIG (Multi-Instance GPU), you must run the following command on the GPU node.
```bash
sudo nvidia-smi -mig 1
```
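To confirm the change took effect (a GPU reset or node reboot may be required before MIG mode becomes active), the current MIG mode can be queried with nvidia-smi:

```bash
# Check whether MIG mode is enabled on each GPU.
nvidia-smi --query-gpu=index,mig.mode.current --format=csv
```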
Geometry Config (Optional): The volcano-vgpu-device-plugin automatically generates an initial MIG configuration, which is stored in the `volcano-vgpu-device-config` ConfigMap under the `kube-system` namespace. You can customize this configuration as needed. For more details, refer to the vGPU device plugin YAML.
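To view or adjust the generated geometry, the ConfigMap named above can be inspected directly:

```bash
# Inspect the auto-generated MIG geometry configuration.
kubectl -n kube-system get configmap volcano-vgpu-device-config -o yaml

# Edit it in place if a different geometry is needed.
kubectl -n kube-system edit configmap volcano-vgpu-device-config
```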
Pod Spec with MIG Annotation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
  annotations:
    volcano.sh/vgpu-mode: "mig"
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          volcano.sh/vgpu-number: 1
          volcano.sh/vgpu-memory: 3000
```
Note: the actual memory allocated depends on the best-fit MIG slice (e.g., a 3 GB request may be served by a 5 GB slice).
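On the node itself, the MIG instances backing these allocations can be listed with `nvidia-smi -L`. The output below is purely illustrative; the exact profiles depend on the GPU model and the configured geometry.

```bash
# List GPUs and their MIG devices; the output shape is illustrative only.
nvidia-smi -L
# GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)
#   MIG 1g.5gb Device 0: (UUID: MIG-...)   # a 3 GB request lands on the 5 GB slice
```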
Scheduler Mode Selection
Explicit Mode:
- Use the annotation `volcano.sh/vgpu-mode` to force HAMi-core or MIG mode.
- If the annotation is absent, the scheduler selects the mode based on resource fit and scheduling policy.
Scheduling Policy:
- Policies such as `binpack` and `spread` influence node selection; they are configured via `deviceshare.SchedulePolicy` in the scheduler ConfigMap above.
Summary Table
| Mode | Isolation | MIG GPU Required | Annotation Required | Core/Memory Control | Recommended For |
|---|---|---|---|---|---|
| HAMi-core | Software (VCUDA) | No | No | Yes | General workloads |
| Dynamic MIG | Hardware | Yes | Yes | MIG-controlled | Performance-sensitive jobs |
Monitoring
Scheduler Metrics:
```bash
curl http://<volcano-scheduler-ip>:8080/metrics
```

Device Plugin Metrics:

```bash
curl http://<plugin-pod-ip>:9394/metrics
```
Metrics include GPU utilization, pod memory usage, and limits.
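As a quick smoke test (metric names vary between versions, so the grep pattern below is deliberately loose):

```bash
# Confirm that vGPU-related metrics are being exported by both endpoints.
curl -s http://<volcano-scheduler-ip>:8080/metrics | grep -i vgpu
curl -s http://<plugin-pod-ip>:9394/metrics | grep -i vgpu
```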
Issues and Contributions
- File bugs: Volcano Issues
- Contribute: Pull Requests Guide