Deep Papers

Arize AI

Available Episodes

5 of 40
  • LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods
    We discuss a major survey of LLM-as-Judge research from the last few years. "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods" systematically examines the LLMs-as-Judges framework across five dimensions: functionality, methodology, applications, meta-evaluation, and limitations. The survey gives a bird's-eye view of the framework's advantages and limitations, and of the methods for evaluating its effectiveness. Read a breakdown on our blog: https://arize.com/blog/llm-as-judge-survey-paper/ Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
    --------  
    28:57
  • Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies
    LLMs have revolutionized natural language processing, showcasing remarkable versatility and capabilities. But individual LLMs often exhibit distinct strengths and weaknesses, influenced by differences in their training corpora. This diversity poses a challenge: how can we maximize the efficiency and utility of LLMs? A new paper, "Merge, Ensemble, and Cooperate: A Survey on Collaborative Strategies in the Era of Large Language Models," highlights collaborative strategies to address this challenge. In this week's episode, we summarize key insights from the paper and discuss the practical implications of three main approaches to LLM collaboration: merging, ensemble, and cooperation. We also review some new open source models we're excited about.
    --------  
    28:47
  • Agent-as-a-Judge: Evaluate Agents with Agents
    This week, we break down the “Agent-as-a-Judge” framework, a new agent evaluation paradigm that’s kind of like getting robots to grade each other’s homework. Where typical evaluation methods focus solely on outcomes or demand extensive manual work, this approach uses agent systems to evaluate agent systems, offering intermediate feedback throughout the task-solving process. With the power to unlock scalable self-improvement, Agent-as-a-Judge could redefine how we measure and enhance agent performance. Let's get into it!
    --------  
    24:54
  • Introduction to OpenAI's Realtime API
    We break down OpenAI’s Realtime API. Learn how to integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots, creating dynamic content tools, or enhancing real-time collaboration, we walk through the API’s capabilities, potential use cases, and best practices for implementation.
    --------  
    29:56
  • Swarm: OpenAI's Experimental Approach to Multi-Agent Systems
    As multi-agent systems grow in importance for fields ranging from customer support to autonomous decision-making, OpenAI has introduced Swarm, an experimental framework that simplifies the process of building and managing these systems. Swarm, a lightweight Python library, is designed for educational purposes, stripping away complex abstractions to reveal the foundational concepts of multi-agent architectures. In this podcast, we explore Swarm’s design, its practical applications, and how it stacks up against other frameworks. Whether you’re new to multi-agent systems or looking to deepen your understanding, Swarm offers a straightforward, hands-on way to get started.
    --------  
    46:46

About Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 