
Technically Speaking with Chris Wright

Red Hat

Available Episodes

4 of 4
  • Inside distributed inference with llm-d ft. Carlos Costa
    Scaling LLM inference for production isn't just about adding more machines; it demands new intelligence in the infrastructure itself. In this episode, we're joined by Carlos Costa, Distinguished Engineer at IBM Research, a leader in large-scale compute and a key figure in the llm-d project. We discuss how to move beyond single-server deployments and build the intelligent, AI-aware infrastructure needed to manage complex workloads efficiently. Carlos Costa shares insights from his deep background in HPC and distributed systems, including:
    • The evolution from traditional HPC and large-scale training to the unique challenges of distributed inference for massive models.
    • The origin story of the llm-d project, a collaborative, open-source effort to create a much-needed "common AI stack" and control plane for the entire community.
    • How llm-d extends Kubernetes with the specialization required for AI, enabling state-aware scheduling that standard Kubernetes wasn't designed for.
    • Key architectural innovations like the disaggregation of prefill and decode stages, and support for wide parallelism to efficiently run complex Mixture of Experts (MoE) models.
    Tune in to discover how this collaborative, open-source approach is building the standardized, AI-aware infrastructure necessary to make massive AI models practical, efficient, and accessible for everyone.
    26:23
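The disaggregation of prefill and decode mentioned above can be sketched conceptually: prefill (processing the full prompt) is compute-bound, while decode (generating one token at a time against an existing KV cache) is memory-bound, so a state-aware scheduler can route each request to a pool specialized for its current phase. This is a minimal illustrative sketch, not the llm-d API; all names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int
    generated_tokens: int = 0  # 0 means no KV cache built yet

@dataclass
class Pool:
    name: str
    queue: list = field(default_factory=list)

def route(req: Request, prefill: Pool, decode: Pool) -> str:
    """State-aware routing: requests with no generated tokens still need
    their prompt processed (prefill); requests that already hold a KV
    cache continue token-by-token generation (decode)."""
    target = prefill if req.generated_tokens == 0 else decode
    target.queue.append(req)
    return target.name
```

In a real system the routing decision would also weigh cache locality and load, but the core idea is the same: the infrastructure knows the request's state, which vanilla Kubernetes scheduling does not.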
  • Building more efficient AI with vLLM ft. Nick Hill
    Explore what it takes to run massive language models efficiently with Red Hat's Senior Principal Software Engineer in AI Engineering, Nick Hill. In this episode, we go behind the headlines to uncover the systems-level engineering making AI practical, focusing on the pivotal challenge of inference optimization and the transformative power of the vLLM open-source project. Nick Hill shares his experiences working in AI, including:
    • The evolution of AI optimization, from early handcrafted systems like IBM Watson to the complex demands of today's generative AI.
    • The critical role of open-source projects like vLLM in creating a common, efficient inference stack for diverse hardware platforms.
    • Key innovations like PagedAttention that solve GPU memory fragmentation and manage the KV cache for scalable, high-throughput performance.
    • How the open-source community is rapidly translating academic research into real-world, production-ready solutions for AI.
    Join us to explore the infrastructure and optimization strategies making large-scale AI a reality. This conversation is essential for any technologist, engineer, or leader who wants to understand the how and why of AI performance. You'll come away with a new appreciation for the clever, systems-level work required to build a truly scalable and open AI future.
    20:52
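The PagedAttention idea referenced above can be illustrated with a toy allocator: instead of reserving one large contiguous KV-cache region per sequence (which fragments GPU memory), the cache is carved into fixed-size blocks that a per-sequence block table maps to on demand. This is a simplified conceptual sketch, not vLLM's implementation; the class and method names are hypothetical.

```python
class PagedKVCache:
    """Toy paged KV-cache allocator. Sequences acquire fixed-size physical
    blocks one at a time, so memory is committed only as tokens are
    actually generated, and freed blocks are reusable by any sequence."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of free physical block ids
        self.tables = {}   # seq_id -> ordered list of physical block ids
        self.lengths = {}  # seq_id -> number of tokens stored

    def append_token(self, seq_id: str) -> None:
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # current block is full (or first token)
            if not self.free:
                raise MemoryError("KV cache exhausted")
            table.append(self.free.pop())  # map next logical block to a physical one
        self.lengths[seq_id] = n + 1

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

The design choice mirrors virtual memory paging: logical token positions are decoupled from physical placement, so internal waste is bounded by one partially filled block per sequence.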
  • Scaling AI inference with open source ft. Brian Stevens
    Explore the future of enterprise AI with Red Hat's SVP and AI CTO, Brian Stevens. In this episode, we delve into how AI is being practically reimagined for real-world business environments, focusing on the pivotal shift to production-quality inference at scale and the transformative power of open source. Brian Stevens shares his expertise and unique perspective on:
    • The evolution of AI from experimental stages to essential, production-ready enterprise solutions.
    • Key lessons from the early days of enterprise Linux and their application to today's AI inference challenges.
    • The critical role of projects like vLLM in optimizing AI models and creating a common, efficient inference stack for diverse hardware.
    • Innovations in GPU-based inference and distributed systems (like the KV cache) that enable AI scalability.
    Tune in for a deep dive into the infrastructure and strategies making enterprise AI a reality. Whether you're a seasoned technologist, an AI practitioner, or a leader charting your company's AI journey, this discussion will provide valuable insights into building an accessible, efficient, and powerful AI future with open source.
    29:39
  • Technically Speaking w/ Chris Wright: Deeper tech, more insights
    Ready to go deeper into the ever-evolving landscape of technology? Technically Speaking, hosted by Red Hat CTO and SVP of Global Engineering Chris Wright, is back and reimagined to guide you with more depth and candor. This series cuts through the noise, offering insightful, casual, and now even more in-depth conversations with leading experts from across the globe. Each discussion delves further into new and emerging technologies, helping you understand not just the 'what,' but the 'why' and 'how' these advancements will impact long-term strategic developments for your company and your career. From AI and open source innovation to cloud computing and beyond, Chris Wright and his guests humanize technology, providing an unparalleled insider's look at what's next with enhanced detail and open discussion. The revamped "Technically Speaking with Chris Wright" champions innovation and thought leadership, blending even deeper-dive discussions with updates on the latest tech news. Tune in for richer insights on complex topics, explore varied perspectives with greater nuance, and equip yourself to shape the future of technology. Discover how to turn today's emerging tech into tomorrow's strategic advantage.
    0:43


About Technically Speaking with Chris Wright

Struggling to keep pace with the ever-changing world of technology? For experienced tech professionals, making sense of this complexity to find real strategic advantages is key. This series offers a clear path, featuring insightful, casual conversations with leading global experts, innovators, and key voices from Red Hat, all cutting through the hype. Drawing from Red Hat's deep expertise in open source and enterprise innovation, each discussion delves into new and emerging technologies, from artificial intelligence and the future of cloud computing to cybersecurity, data management, and beyond. The focus is on understanding not just the 'what,' but the important 'why' and 'how': exploring how these advancements can shape long-term strategic developments for your organization and your career. Gain an insider's perspective that humanizes complex topics, helping you anticipate what's next and make informed decisions. Equip yourself with the knowledge to turn today's emerging tech into valuable, practical strategies and apply innovative thinking in your work. Tune in for forward-looking discussions that connect the dots between cutting-edge technology and real-world application, leveraging a rich understanding of the enterprise landscape. Learn to navigate the future of tech with confidence.