Running AI models on Kubernetes has historically been inconsistent, with workloads behaving differently across cloud providers due to variations in GPUs, networking, and autoscaling. As organizations move AI from experimentation to production, standardization has become critical. In this episode of The New Stack Makers, Jonathan Bryce, executive director of the Cloud Native Computing Foundation, shared that the foundation's Kubernetes AI conformance program aims to solve this problem by ensuring portability, predictability, and production readiness for AI workloads across environments.
The initiative reflects a broader industry shift: AI is moving from training-heavy workloads to inference at scale, with inference expected to dominate compute usage by the end of the decade. Unlike batch-based training, inference requires real-time, always-on performance, making Kubernetes an attractive platform due to its elasticity, GPU-aware autoscaling, and observability.
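Kubernetes handles accelerators as extended resources that a workload declares in its pod spec, which is part of what makes GPU-aware scheduling and autoscaling possible. As a rough illustration of the portability the conformance program targets, here is a minimal sketch using the Python Kubernetes client of a pod that requests a single GPU. The pod name, image, and the `nvidia.com/gpu` resource name (exposed by the NVIDIA device plugin) are assumptions for illustration, not details from the episode.

```python
# Minimal sketch (not from the episode): an inference pod that requests one GPU.
# Assumes the NVIDIA device plugin is installed on the cluster and exposes the
# extended resource "nvidia.com/gpu"; the image and names are placeholders.
from kubernetes import client, config


def create_inference_pod() -> None:
    config.load_kube_config()  # read cluster credentials from ~/.kube/config

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            name="llm-inference",
            labels={"app": "llm-inference"},
        ),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="server",
                    image="example.com/llm-server:latest",  # placeholder image
                    resources=client.V1ResourceRequirements(
                        # The scheduler will only place this pod on a node with
                        # a free GPU. Because the request is an abstract resource
                        # name rather than vendor-specific node configuration,
                        # the same spec runs unchanged on any cluster exposing
                        # that resource -- the portability conformance is after.
                        limits={"nvidia.com/gpu": "1"},
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)


if __name__ == "__main__":
    create_inference_pod()
```

The same declarative request also feeds GPU-aware autoscaling: because the accelerator demand is visible to the scheduler as a resource, autoscalers can add or remove GPU nodes to match real-time inference load.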
The conformance program establishes baseline standards for handling accelerators like GPUs and TPUs, reducing vendor lock-in and simplifying deployment. Early adopters include major cloud providers and ecosystem players, while new projects like llm-d aim to bridge orchestration and inference. As requirements evolve, ongoing collaboration and recertification will ensure the standards stay aligned with real-world needs.
Learn more from The New Stack about the latest developments around the Cloud Native Computing Foundation's Kubernetes AI conformance program:
CNCF: Kubernetes is ‘foundational’ infrastructure for AI
Kubernetes Gets an AI Conformance Program — and VMware Is Already On Board