2 posts tagged with "inference"

Introducing Kthena: Redefining LLM Inference for the Cloud-Native Era

January 6, 2026 · 9 min read

Today, the Volcano community is proud to announce the launch of Kthena, a new sub-project designed for global developers and MLOps engineers.

Kthena is a cloud-native, high-performance system for LLM inference routing, orchestration, and scheduling, tailored specifically for Kubernetes. Engineered to address the complexity of serving LLMs at production scale, Kthena delivers granular control and enhanced flexibility. Through features like topology-aware scheduling, KV Cache-aware routing, and Prefill-Decode (PD) disaggregation, it significantly improves GPU/NPU utilization and throughput while minimizing latency.

As a sub-project of Volcano, Kthena extends Volcano’s capabilities beyond AI training, creating a unified, end-to-end solution for the entire AI lifecycle.

How Volcano boosts distributed training and inference performance

April 1, 2025 · 3 min read

Volcano

The Growing Demand for LLM Workloads and Associated Challenges

The growing adoption of large language models (LLMs) has increased demand for efficient AI training and inference. As models grow in size and complexity, distributed training and inference have become essential. This growth, however, brings challenges in network communication, resource allocation, and fault recovery across large-scale distributed environments, and these issues often become performance bottlenecks that limit scalability.
