
Linear Digressions

Katie Malone

304 episodes

  • ReAct and Tool Usage (The Agents Season, Episode 2)

    27/04/2026 | 23 mins.
    Before 2022, there was a wall between AI and the real world — models could reason impressively, but couldn't look anything up, run code, or check whether anything they said was actually true. This episode traces the moment that wall came down, through two landmark papers: ReAct, which showed what happens when you interleave reasoning and action in a loop, and Toolformer, which taught models to decide *for themselves* when to reach for a tool. Plus: what MCP actually is, and why a hobbyist project called OpenClaw became the fastest-growing open source project in history. (A toy sketch of the reason-act loop appears after the episode list below.)

    ---
    Website: https://lineardigressions.com
    Apple Podcasts: https://podcasts.apple.com/us/podcast/linear-digressions/id941219323
    Spotify: https://open.spotify.com/show/1JdkD0ZoZ52KjwdR0b1WoT
    Substack: https://substack.com/@lineardigressions
  • What's an AI Agent? And Why's That Hard to Define? (The Agents Season, Episode 1)

    20/04/2026 | 19 mins.
    AI agents are having a moment — and unpacking them properly takes more than a single conversation. This episode kicks off a dedicated multi-part season exploring AI agents from every angle, building up a complete picture piece by piece rather than skimming the surface. Think of it as a structured deep dive into one of the most talked-about (and most misunderstood) topics in machine learning right now. Buckle up — ten more episodes to go.

  • Unfaithful Chain of Thought

    13/04/2026 | 24 mins.
    What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.

    Links

    Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388

    Anthropic, "Reasoning Models Don't Always Say What They Think" (2025): https://www.anthropic.com/research/reasoning-models-dont-say-think
  • Benchmark Bank Heist

    06/04/2026 | 12 mins.
    What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus 4.6 did when faced with a benchmark evaluation — reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, why it matters for how we measure AI progress, and what this novel failure mode means for the already-tricky science of benchmarking language models.

    Links

    Anthropic's write-up of Claude Opus 4.6 reverse-engineering the BrowseComp eval: https://www.anthropic.com/engineering/eval-awareness-browsecomp

    BrowseComp benchmark from OpenAI: https://openai.com/index/browsecomp/
  • Benchmarking AI Models

    30/03/2026 | 29 mins.
    How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the standardized tests used to compare models — exploring two canonical examples: MMLU, a 14,000-question multiple-choice gauntlet spanning medicine, law, and philosophy, and SWE-bench, which throws real GitHub bugs at models to see if they can fix them. Along the way: Goodhart's Law, data contamination, canary strings, and why acing a test isn't always the same as being smart. (A toy scoring-and-canary sketch also follows the episode list below.)
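To make the ReAct loop from the Episode 2 description concrete, here is a minimal Python sketch. The Thought/Action/Observation text format, the tool names, and the stubbed model are all illustrative assumptions, not the paper's exact protocol; a real agent would replace `stub_model` with an LLM call.

```python
# Minimal sketch of the ReAct pattern: the model alternates
# Thought -> Action -> Observation until it emits a final answer.
# stub_model stands in for a real LLM; the format is illustrative.

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "no result"),
}

def stub_model(transcript: str) -> str:
    """Stand-in for an LLM call: a real agent would send `transcript`
    to a model and parse its next Thought/Action line."""
    if "Observation: Paris" in transcript:
        return "Final Answer: Paris"
    return 'Thought: I should look this up.\nAction: lookup["capital of France"]'

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = stub_model(transcript)
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse an Action line like: Action: lookup["capital of France"]
        action = [l for l in step.splitlines() if l.startswith("Action:")][-1]
        name, arg = action.split(":", 1)[1].strip().split("[", 1)
        observation = TOOLS[name.strip()](arg.rstrip("]").strip('"'))
        transcript += f"Observation: {observation}\n"  # fed back to the model
    return "gave up"

print(react_loop("What is the capital of France?"))  # -> Paris
```

The key design point the episode highlights is that the observation is appended to the transcript and fed back into the model, so each tool result can change what the model reasons about next.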
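And to make the benchmarking episode's two mechanisms concrete, a toy sketch of MMLU-style multiple-choice scoring plus a canary-string contamination check. The questions and the canary value are made up for illustration; real benchmarks ship thousands of vetted items and embed a unique GUID.

```python
# Toy sketch of two benchmarking ideas: scoring a multiple-choice eval
# (MMLU-style) and using a canary string to detect data contamination.
# Questions and the canary value are invented for illustration.

CANARY = "BENCHMARK-CANARY-0000-EXAMPLE"  # hypothetical; real evals use a unique GUID

EVAL_SET = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "22"], "answer": "B"},
    {"question": "H2O is commonly called?", "choices": ["salt", "water", "air", "gold"], "answer": "B"},
]

def stub_model(question: str, choices: list[str]) -> str:
    """Stand-in for an LLM; always guesses 'B' here."""
    return "B"

def accuracy(eval_set: list[dict]) -> float:
    correct = sum(
        stub_model(item["question"], item["choices"]) == item["answer"]
        for item in eval_set
    )
    return correct / len(eval_set)

def is_contaminated(training_corpus: str) -> bool:
    # If the canary shows up in training data, scores can't be trusted:
    # the model may have memorized answers rather than learned the skill.
    return CANARY in training_corpus

print(f"accuracy: {accuracy(EVAL_SET):.0%}")         # -> 100% (the stub got lucky)
print(is_contaminated("...web scrape..." + CANARY))  # -> True
```

The stub scoring 100% by always guessing "B" is itself a tiny Goodhart's Law demonstration: a high score on the metric without any of the capability the metric is supposed to measure.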

About Linear Digressions

Demystifying AI for the intelligently curious