Overview
This page describes how to enable and use the SchedulingGatesQueueAdmission feature to prevent cluster autoscalers (such as Cluster Autoscaler or Karpenter) from triggering unnecessary scale-ups when pods are waiting for Volcano queue capacity.
Problem
Volcano marks pods as Unschedulable for any allocation failure, whether it’s due to insufficient cluster resources (where autoscaling is appropriate) or queue capacity limits (where autoscaling is not needed). Cluster autoscalers cannot distinguish between these scenarios, causing unnecessary node scale-ups.
The problem is described in detail in the design document.
Solution
This feature uses Kubernetes schedulingGates to hold pods until the queue has capacity. While gated, pods are invisible to autoscalers. The gate is removed only after the queue capacity check passes. If the pod then cannot be scheduled because no suitable node exists, it is marked as Unschedulable, allowing autoscalers to respond correctly.
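To illustrate why gating helps, here is a minimal Python sketch of how an autoscaler might decide whether a pending pod calls for more nodes. It assumes the autoscaler reacts only to pods whose PodScheduled condition is False with reason Unschedulable, and that a gated pod reports reason SchedulingGated instead. The function name and the dict-shaped pods are illustrative and are not part of Volcano or of any autoscaler.

```python
def wants_scale_up(pod: dict) -> bool:
    """Simplified autoscaler filter (illustrative model, not real autoscaler code)."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled":
            # Only an explicit Unschedulable verdict counts as a scale-up signal.
            return cond.get("status") == "False" and cond.get("reason") == "Unschedulable"
    return False


# A pod held by a scheduling gate (assumed condition shape).
gated_pod = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "SchedulingGated"},
]}}

# A pod that passed the queue check but found no suitable node.
unschedulable_pod = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"},
]}}
```

Under this model, only the second pod would be considered when deciding whether to add nodes.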
Prerequisites
- Volcano v1.15+ with the SchedulingGatesQueueAdmission feature gate enabled.
- The capacity plugin configured in the scheduler (the feature is implemented in the capacity plugin and is planned to be integrated into proportion as well).
1. Enable the Feature Gate
The feature is Alpha and disabled by default. Enable it on both the scheduler and webhook-manager.
Using Helm
helm install volcano volcano/volcano --namespace volcano-system --create-namespace \
  --set custom.scheduler_feature_gates="SchedulingGatesQueueAdmission=true" \
  --set custom.admission_feature_gates="SchedulingGatesQueueAdmission=true"
Using kubectl apply
Add the following flag to both the volcano-scheduler and volcano-admission deployments:
--feature-gates=SchedulingGatesQueueAdmission=true
Optionally, configure the number of async gate removal workers (default: 5):
--gate-removal-worker-num=10
These workers process gate removals asynchronously: each worker picks up a pod whose queue capacity check has passed and removes its scheduling gate, allowing the pod to proceed to scheduling. Increasing this number can improve throughput when many pods are being ungated concurrently.
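The worker-pool idea can be sketched in Python as follows. This is only a conceptual model under stated assumptions: the function name, the dict-shaped pods, and the in-memory gate removal are hypothetical, and the sketch does not reflect how the Volcano scheduler is actually implemented.

```python
import queue
import threading

VOLCANO_GATE = "scheduling.volcano.sh/queue-allocation-gate"


def run_gate_removal_workers(pods, worker_num=5):
    """Remove the Volcano gate from each pod using worker_num threads (illustrative)."""
    work = queue.Queue()
    for pod in pods:
        work.put(pod)

    def worker():
        while True:
            try:
                pod = work.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            # Stand-in for the API call that would patch the pod's gates.
            pod["schedulingGates"] = [
                g for g in pod["schedulingGates"] if g["name"] != VOLCANO_GATE
            ]

    threads = [threading.Thread(target=worker) for _ in range(worker_num)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pods
```

With more workers, more pods can be ungated at the same time, which is the effect the worker-count flag is meant to control.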
2. Configure the Capacity Plugin
Ensure the capacity plugin is enabled in your scheduler configuration. The reserved resource tracking that prevents race conditions between gate removal and pod allocation is implemented in this plugin.
Example scheduler configuration:
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: predicates
  - name: capacity
  - name: nodeorder
3. Opt-in Pods
The feature is opt-in per pod. To enable it, add the following annotation to each pod that should use gate-controlled queue admission:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    # Opt-in annotation
    scheduling.volcano.sh/queue-allocation-gate: "true"
spec:
  schedulerName: volcano
  containers:
  - name: worker
    image: nginx
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
When this pod is created:
- The Volcano webhook injects a scheduling.volcano.sh/queue-allocation-gate scheduling gate.
- The pod stays gated (invisible to autoscalers) until the queue has capacity.
- Once capacity is available, the scheduler removes the gate.
- If the pod can be placed on a node, it gets scheduled normally.
- If no node matches (e.g., needs a specific node type), it gets marked Unschedulable, correctly triggering the autoscaler.
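The steps above can be summarized as a small decision function. This is a hedged Python sketch of the described outcomes only; the enum, the function name, and the two boolean inputs are hypothetical simplifications rather than Volcano code.

```python
from enum import Enum


class PodState(Enum):
    GATED = "gated"                  # held by the scheduling gate
    SCHEDULED = "scheduled"          # placed on a node
    UNSCHEDULABLE = "unschedulable"  # ungated, but no node fits


def resolve_state(queue_has_capacity: bool, node_fits: bool) -> PodState:
    """Map the two checks described above to the pod's resulting state."""
    if not queue_has_capacity:
        # The gate stays in place, so autoscalers do not see the pod.
        return PodState.GATED
    if node_fits:
        return PodState.SCHEDULED
    # Queue capacity exists but no node matches: the autoscaler should react.
    return PodState.UNSCHEDULABLE
```

Note that node availability is irrelevant while the queue is full: the pod stays gated either way.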
4. Verify the Feature is Working
After creating an opted-in pod, verify the gate was injected by the mutation webhook:
kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}'
Expected output (while waiting for queue capacity):
[{"name":"scheduling.volcano.sh/queue-allocation-gate"}]
Once the queue has capacity and the scheduler removes the gate, the field will be empty:
kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}'
# empty output
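For scripted checks, the output of the command above can be parsed programmatically. The helper below is a minimal Python sketch that assumes the output is either empty or a JSON array, as in the examples shown; the function name is hypothetical.

```python
import json

VOLCANO_GATE = "scheduling.volcano.sh/queue-allocation-gate"


def has_volcano_gate(jsonpath_output: str) -> bool:
    """Return True if the kubectl jsonpath output still lists the Volcano gate."""
    text = jsonpath_output.strip()
    if not text:
        # Empty output: the pod has no scheduling gates left.
        return False
    gates = json.loads(text)
    return any(g.get("name") == VOLCANO_GATE for g in gates)
```

The input string could come, for example, from running the kubectl command through Python's subprocess module and reading its standard output.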
Interaction with Other Scheduling Gates
If a pod has additional scheduling gates from other controllers (e.g., example.com/my-gate), Volcano does not remove its own gate until it is the only gate remaining on the pod. This ensures Volcano does not interfere with other gate controllers and avoids reserving queue capacity for pods that are still blocked by external dependencies.
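The rule can be expressed as a simple predicate. The following Python sketch is illustrative only: the function name is hypothetical and the gate list mirrors the shape of the pod's spec.schedulingGates field.

```python
VOLCANO_GATE = "scheduling.volcano.sh/queue-allocation-gate"


def may_remove_volcano_gate(gates: list) -> bool:
    """True only when the Volcano gate is the sole gate left on the pod."""
    names = [g["name"] for g in gates]
    return names == [VOLCANO_GATE]
```

For a pod gated by both Volcano and example.com/my-gate, the predicate is false until the other controller removes its gate.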
Limitations
- Once a pod’s gate is removed, it reserves queue capacity until it is scheduled or deleted. If the pod remains unschedulable (e.g., waiting for the autoscaler to add nodes), it continues to hold queue capacity, potentially blocking other pods. The feature does not currently implement a timeout for reserved capacity, so pods that have been ungated but remain unschedulable can hold queue capacity indefinitely.
- The feature is only implemented in the capacity plugin. Users relying on the proportion plugin for queue resource management will still face false autoscaler scale-ups, as the scheduling gates mechanism is not yet integrated with proportion. Tracking issue: #5271.
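The first limitation can be illustrated with a toy model of reserved capacity. This Python sketch assumes a single CPU dimension and invents the class and method names for illustration; it is not the capacity plugin's actual accounting.

```python
class QueueModel:
    """Toy model of a queue that reserves capacity for ungated pods (illustrative)."""

    def __init__(self, capacity_cpu: int):
        self.capacity_cpu = capacity_cpu
        self.allocated_cpu = 0
        self.reserved = {}  # pod name -> CPU reserved since the gate was removed

    def try_ungate(self, pod_name: str, cpu: int) -> bool:
        # Reserved capacity counts against the queue just like allocated capacity.
        in_use = self.allocated_cpu + sum(self.reserved.values())
        if in_use + cpu > self.capacity_cpu:
            return False  # the pod stays gated
        self.reserved[pod_name] = cpu
        return True

    def mark_scheduled(self, pod_name: str) -> None:
        # The reservation becomes a real allocation once the pod is placed.
        self.allocated_cpu += self.reserved.pop(pod_name)

    def mark_deleted(self, pod_name: str) -> None:
        # Deleting an ungated, unscheduled pod releases its reservation.
        self.reserved.pop(pod_name, None)
```

In this model, an ungated pod that never gets scheduled keeps its entry in the reservation map, so later pods are refused until it is scheduled or deleted, which mirrors the absence of a reservation timeout.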