Large Language Model (LLM) Talk

By AI-Talk

Available Episodes (5 of 61)

  • Why We Think
    The "Why We Think" from Lilian Weng, examines improving language models by allocating more computation at test time, drawing an analogy to human "slow thinking" or System 2. By treating computation as a resource, the aim is to design systems that can utilize this test-time effort effectively for better performance. Key approaches involve generating intermediate steps like Chain-of-Thought, employing decoding methods such as parallel sampling and sequential revision, using reinforcement learning to enhance reasoning, enabling external tool use, and implementing adaptive computation time. This allows models to spend more resources on analysis, similar to human deliberation, to achieve improved results.
    Duration: 14:20
  • Deep Research
    Deep Research is an autonomous research agent built into ChatGPT. Over several minutes it performs multi-step online research, behaving like a human researcher: searching, reading, analyzing, and synthesizing information from multiple sources into a detailed, cited report. Unlike standard ChatGPT's single-step responses, Deep Research uses an agent architecture that orchestrates specialized reasoning models (like o3-mini) and generalist models (like GPT-4). (A schematic agent loop appears after this episode list.)
    Duration: 11:35
  • vLLM
    vLLM is a high-throughput serving system for large language models. It addresses inefficient KV-cache memory management in existing systems, where fragmentation and lack of sharing limit batch size. vLLM uses PagedAttention, inspired by OS paging, to manage the KV cache in non-contiguous fixed-size blocks. This minimizes memory waste and enables flexible sharing, allowing vLLM to batch significantly more requests; as a result, it achieves 2-4x higher throughput than state-of-the-art systems such as FasterTransformer and Orca. (A toy block-table sketch appears after this episode list.)
    Duration: 13:12
  • Qwen3: Thinking Deeper, Acting Faster
    Qwen3 models come in both Mixture-of-Experts (MoE) and dense architectures. They offer hybrid thinking modes, letting users trade response speed against reasoning depth per task, controllable via parameters or tags. Developed through a multi-stage post-training pipeline, Qwen3 is trained on a significantly expanded dataset of roughly 36 trillion tokens spanning 119 languages, strengthening its multilingual support for global applications. The models also feature improved agentic capabilities, notably strong tool calling, which increases their utility for complex, interactive tasks. (A usage sketch of the thinking-mode switch appears after this episode list.)
    Duration: 13:15
  • RAGEN: train and evaluate LLM agents using multi-turn RL
    RAGEN is a modular system for training and evaluating LLM agents with multi-turn reinforcement learning. Built on the StarPO framework, it implements the full training loop: rollout generation, reward assignment, and trajectory optimization. RAGEN serves as research infrastructure for analyzing LLM agent training dynamics, focusing on challenges such as stability, generalization, and the emergence of reasoning in interactive environments. (A schematic of that loop appears after this episode list.)
    Duration: 11:56
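
Below is a minimal sketch of parallel sampling with self-consistency voting, one of the test-time compute strategies described in the "Why We Think" episode. The `sample_completion` helper is a hypothetical stand-in for any LLM API call whose output is a chain-of-thought ending in a final answer.

```python
from collections import Counter
import re

def sample_completion(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical LLM call; swap in a real API client here."""
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    # Assumes the model ends its reasoning with "Answer: <value>".
    match = re.search(r"Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else completion.strip()

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    """Spend extra test-time compute: sample several reasoning paths
    in parallel, then return the most common final answer."""
    answers = [extract_answer(sample_completion(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```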
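The Deep Research episode describes an agent that plans searches, reads pages, and synthesizes a cited report over many steps. The loop below is an illustrative sketch of that pattern; the tool stubs (`plan_next_query`, `web_search`, `read_page`, `is_sufficient`, `write_report`) are assumptions, not OpenAI's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    url: str
    snippet: str

@dataclass
class ResearchState:
    question: str
    findings: list[Finding] = field(default_factory=list)

# Stubs standing in for model- and tool-backed steps.
def plan_next_query(state: ResearchState) -> str: ...  # reasoning model proposes a search
def web_search(query: str) -> list[str]: ...           # returns candidate URLs
def read_page(url: str) -> str: ...                    # fetches and extracts relevant text
def is_sufficient(state: ResearchState) -> bool: ...   # reasoning model decides when to stop
def write_report(state: ResearchState) -> str: ...     # generalist model drafts the cited report

def deep_research(question: str, max_steps: int = 20) -> str:
    state = ResearchState(question)
    for _ in range(max_steps):
        query = plan_next_query(state)
        for url in web_search(query)[:3]:
            state.findings.append(Finding(url, read_page(url)))
        if is_sufficient(state):
            break
    return write_report(state)  # the report cites state.findings URLs
```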
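The vLLM episode turns on PagedAttention's block-based KV-cache management. The toy allocator below mirrors the concept only (fixed-size blocks, a per-sequence block table, reference-counted sharing); vLLM's real implementation lives in custom GPU kernels.

```python
BLOCK_SIZE = 16  # tokens per KV block

class BlockAllocator:
    """Pool of fixed-size physical KV blocks with refcounted sharing."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))
        self.refcount = [0] * num_blocks

    def allocate(self) -> int:
        block = self.free.pop()  # raises IndexError when the pool is exhausted
        self.refcount[block] = 1
        return block

    def fork(self, block: int) -> int:
        self.refcount[block] += 1  # share the block instead of copying it
        return block

    def release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)

class Sequence:
    """Maps logical token positions to non-contiguous physical blocks."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block index -> physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        if self.num_tokens % BLOCK_SIZE == 0:  # current block is full
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1
```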
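The Qwen3 episode mentions thinking modes controllable via parameters or tags. The snippet below follows the usage pattern published in Qwen's Hugging Face model cards, where `enable_thinking` on the chat template toggles the mode; treat the exact flag and model name as assumptions to verify against the current documentation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "Explain KV caching briefly."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # True trades response speed for deeper reasoning
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```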
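The RAGEN summary names the three stages of the StarPO training loop. The schematic below sketches those stages with illustrative names (`policy.generate`, `trajectory_loss`); it is not RAGEN's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    observation: str
    action: str
    reward: float

@dataclass
class Trajectory:
    turns: list[Turn] = field(default_factory=list)

def rollout(policy, env, max_turns: int = 8) -> Trajectory:
    """Rollout generation: one multi-turn episode where the LLM acts
    and the environment responds with feedback and reward."""
    traj, obs = Trajectory(), env.reset()
    for _ in range(max_turns):
        action = policy.generate(obs)         # LLM produces an action
        obs, reward, done = env.step(action)  # reward assignment per turn
        traj.turns.append(Turn(obs, action, reward))
        if done:
            break
    return traj

def train(policy, env, optimizer, steps: int, batch_size: int = 16):
    """Trajectory optimization: update the policy on batches of rollouts."""
    for _ in range(steps):
        trajs = [rollout(policy, env) for _ in range(batch_size)]
        loss = policy.trajectory_loss(trajs)  # e.g. a PPO/GRPO-style objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```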

About Large Language Model (LLM) Talk

AI Explained breaks down the world of AI in just 10 minutes. Get quick, clear insights into AI concepts and innovations, without any complicated math or jargon. Perfect for your commute or spare time, this podcast makes understanding AI easy, engaging, and fun—whether you're a beginner or tech enthusiast.
