Volcano completes security audit

Volcano is excited to announce the completion of our CNCF-funded security audit, carried out by Ada Logics and facilitated by OSTIF in collaboration with the Volcano maintainers. The audit was scoped to cover the Volcano source code, supply-chain risks and fuzzing. The auditing team identified 10 security issues, all of which the Volcano security team had fixed by the completion of the audit.

Volcano has addressed several infrastructure-level security issues by making targeted configuration changes that reduce risk and improve the security posture of its default deployment. Below is a breakdown of each issue, the associated risks, and how Volcano resolved them, along with the resulting security improvements.

One issue involved several Volcano components running with root privileges by default. Containers running as root pose an increased security risk: if one is compromised, the attacker gains access to capabilities they can use to escalate their privileges. Volcano fixed this by configuring all components, including the scheduler, admission controller, controllers, and dashboard, to run as non-root by default. This change limits the scope of what an attacker can do inside a container and helps contain breaches more effectively.

Another issue was the absence of seccomp profiles across Volcano’s workloads. Without seccomp, containers can invoke any Linux system call, which increases the attack surface for kernel-level attacks and container escapes. Volcano addressed this by adding seccomp profiles, specifically RuntimeDefault, which applies the container runtime’s default profile and restricts containers to a safer subset of system calls. This reduces the kernel’s exposure and strengthens runtime isolation.

Volcano also did not configure SELinux for its containers. SELinux manages access control at the kernel level and limits how processes can interact with files, system resources, and other processes. Volcano added SELinux configuration to all its pods and containers.

In addition, Volcano had previously granted containers unnecessary Linux capabilities, which are fine-grained permissions that determine what a containerized process can do. For example, capabilities like CAP_NET_ADMIN or CAP_SYS_ADMIN grant significant power and are often unnecessary for typical application logic. Volcano mitigated this risk by removing non-essential capabilities using a “drop all” approach and only adding back specific permissions if needed. This reduces the attack surface and enforces the principle of least privilege.

Prior to the audit, Volcano allowed containers to escalate privileges during execution, which could permit non-privileged processes to gain additional privileges. Such privilege escalation increases the risk of bypassing container security controls. Volcano resolved this by setting allowPrivilegeEscalation: false for the containers in its pods, ensuring that processes run only with the privileges they were initially assigned.

These changes help contain potential attacks, reduce the avenues for privilege escalation or container breakout, and enhance the overall resilience of the system in multi-tenant and production environments.

On the application side, the auditors identified 5 issues. The most interesting of these allowed an attacker who had compromised an elastic service or an extender plugin in the cluster to cause denial of service of the Volcano scheduler. This issue was assigned CVE-2025-32777 with HIGH severity.

Fuzzing

During the audit, Ada Logics integrated Volcano into Google’s OSS-Fuzz project with two initial fuzz tests. OSS-Fuzz is an open source project that other critical open source projects can integrate into. Google runs integrated projects’ fuzzers on vast amounts of compute and reports any findings to the project’s team via email. OSS-Fuzz’s reports contain information such as stack traces, steps to reproduce, which fuzz harness found the issue and more. Periodically, OSS-Fuzz tries to reproduce each reported issue to verify that it still exists. If it can’t reproduce it, OSS-Fuzz automatically marks the issue fixed.
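To give a feel for what a fuzz harness looks like, here is a minimal sketch in Go. It is not one of the two Volcano harnesses, which this post does not show. parseQueueSpec is a hypothetical target invented for illustration, and FuzzQueueSpec uses the classic func(data []byte) int harness shape known from go-fuzz-style tooling; the real harnesses may be written differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseQueueSpec is a HYPOTHETICAL fuzz target, not Volcano code.
// It parses input of the form "name=weight" with a positive integer weight.
func parseQueueSpec(s string) (string, int, error) {
	name, raw, found := strings.Cut(s, "=")
	if !found || name == "" {
		return "", 0, errors.New("expected name=weight")
	}
	weight, err := strconv.Atoi(raw)
	if err != nil || weight < 1 {
		return "", 0, errors.New("weight must be a positive integer")
	}
	return name, weight, nil
}

// FuzzQueueSpec feeds arbitrary bytes to the target. It returns 1 when the
// input was accepted and 0 when it was rejected. It panics if an accepted
// input violates the parser's own invariants, which is the kind of crash a
// fuzzing engine would report.
func FuzzQueueSpec(data []byte) int {
	name, weight, err := parseQueueSpec(string(data))
	if err != nil {
		return 0
	}
	if name == "" || weight < 1 {
		panic("parser accepted an invalid queue spec")
	}
	return 1
}

func main() {
	// A fuzzing engine would generate inputs; here a few seeds are run by hand.
	for _, seed := range []string{"default=1", "=3", "batch=abc", ""} {
		fmt.Printf("%q -> %d\n", seed, FuzzQueueSpec([]byte(seed)))
	}
}
```

The harness itself contains no test cases. Its job is to route untrusted bytes into the code under test and to turn broken invariants into crashes that the fuzzing engine can detect and report.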

Getting involved in Volcano

Volcano is the industry’s first cloud-native batch computing engine and the sole batch computing project within the CNCF. It operates as a Kubernetes-native batch scheduling system, enhancing the standard kube-scheduler. Volcano provides comprehensive features to manage and optimize diverse batch and elastic workloads, including AI/ML/DL, Bioinformatics/Genomics, and other “Big Data” applications. It offers robust integration with frameworks such as Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, and KubeGene. Drawing from over fifteen years of experience in high-performance workload operations, Volcano combines proven practices and innovative concepts to deliver a powerful and flexible scheduling solution.

We encourage you to join our community and contribute to Volcano’s development. Your participation is valuable, whether you’re asking questions, sharing experiences, or contributing code.

You can find the audit report here. We would like to thank all parties involved in the audit for their great work.