Vanishing Gradients | Ouvir podcast online grátis

Episódios Disponíveis

5 de 63

Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)
Gemini 3 is a few days old and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely. Ravin Kumar (Google DeepMind) joins Hugo to breaks down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops. They talk through: - The implications of models that can "self-heal" and fix their own code - The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems. - Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews - Why Needle in a Haystack benchmarks often fail to predict real-world performance - How to build agent harnesses that turn model capabilities into product velocity - The shift from measuring latency to managing time-to-compute for reasoning tasks LINKS From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline) Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot) Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM) Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav
--------
1:00:12
--------
1:00:12
Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs
Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution. In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy. They talk through: How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription. The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents. The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice. Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products. The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip. LINKS Randy on LinkedIn (https://www.zenml.io/llmops-database) Wyrd Studios (https://thewyrdstudios.com/) Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc) 🎓 Learn more: In Hugo's course: Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us!
--------
59:04
--------
59:04
Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI. Drawing from his work at Zen ML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products. We talk through: - Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos - The essential MLOps hygiene (tracing and continuous evals) that most teams skip - The optimal (and very low) limit for the number of tools an agent can reliably use - How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains - The principle of using simple Python/RegEx before resorting to costly LLM judges LINKS The LLMOps Database: 925 entries as of today....submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc) 🎓 Learn more: -This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us!
--------
28:04
--------
28:04
Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems. Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust. We talk through: The 10(+1) critical mistakes that cause teams to waste time on evals Why "hallucination scores" are a waste of time (and what to measure instead) The manual review process that finds major issues in hours, not weeks A step-by-step method for building LLM judges you can actually trust How to use domain experts without getting stuck in endless review committees Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap. LINKS Hamel's website and blog (https://hamel.dev/) Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51) Hamel Husain on Lenny's pocast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill) The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share) Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338
--------
1:13:15
--------
1:13:15
Episode 59: Patterns and Anti-Patterns For Building with AI
John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle. We talk through: - Why chasing perfect accuracy is a dead end - How to use agents without losing control - Context engineering: fitting the right information in the window - Starting simple instead of over-orchestrating - Separating retrieval from generation in RAG - Splitting complex extractions into smaller checks - Knowing when frameworks help — and when they slow you down A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production. LINKS: Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents) The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems) Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/) Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/) Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf) Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk) Arcturus Labs (https://arcturus-labs.com/) Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338
--------
47:37
--------
47:37

Mais podcasts de Tecnologia

Podcasts em tendência em Tecnologia

Sobre Vanishing Gradients

A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson. It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.

Site de podcast

Tecnologia

Ouça Vanishing Gradients, Giro do Loop e muitos outros podcasts de todo o mundo com o aplicativo o radio.net

Obtenha o aplicativo gratuito radio.net

Guardar rádios e podcasts favoritos
Transmissão via Wi-Fi ou Bluetooth
Carplay & Android Audo compatìvel
E ainda mais funções

Abrir a app

Obtenha o aplicativo gratuito radio.net

Guardar rádios e podcasts favoritos
Transmissão via Wi-Fi ou Bluetooth
Carplay & Android Audo compatìvel
E ainda mais funções

Vanishing Gradients

Leia o código,
baixe o aplicativo,
ouça.