Podcasts ScienceVanishing Gradients

Listen to this podcast in the app for free:

radio.net

Sleep timer

Save favourites

Download for free in the App Store

Vanishing Gradients

Hugo Bowne-Anderson

Science Technology

Latest episode

78 episodes

The Future of Agentic Data Science
25/05/2026 | 1h 4 mins.
So I think we’re really at a historical moment, and the opportunity is massive. Almost 15 years ago, we were promised that data science was going to be this incredible thing and create all this value for people. And I think nowadays it’s mostly viewed as a cost center in most companies. I think we can really now fulfill that original promise with agentic data science.
Thomas Wiecki, Co-creator of PyMC and Founder at PyMC Labs, joins Hugo to talk about how agentic data science is finally fulfilling the promise of Decision Intelligence.
We Discuss:
* Decision Engines: Transform data science from a cost center providing cryptic answers into a real-time decision intelligence hub delivering actionable outcomes;
* Tame the “Garden of Forking Paths”: Overcome human shortcuts by running parallel analyses to provide an honesty check, revealing the true robustness of business conclusions;
* Multiplayer Data Science: Foster organizational learning by moving agents into team chats, democratizing “what-if” questions and reducing context-switching friction;
* The Full Agentic Data Science Stack: Beyond harness and skills, the full stack includes orchestration for parallel analyses and a causal eval layer to measure actual outcome improvement;
* Agentic Dashboards: Move beyond static BI; use chat interfaces to inquire into models and generate real-time, custom visualizations for specific follow-up questions;
* Encode Professional Judgment as Skills: Elevate agent performance by encoding expert domain standards and high-fidelity workflows into specific Agent Skills, rather than relying on LLM pre-training;
* Ground Decisions in Generative Processes: Prevent hallucinations by forcing agents to model underlying physical or behavioral processes, providing a programmatic guardrail aligned with market realities;
* Scripted Causal-Bayesian Workflows: Their methodologically structured nature—from prior elicitation to posterior predictive checks—makes Causal-Bayesian workflows inherently automatable for agents;
* Iterative Autonomy via Skills: Achieve autonomy iteratively: verify workflows with human oversight, then encode verifiable parts as skills to hand off trusted tasks;
You can also find the full episode on Spotify, Apple Podcasts, and YouTube.
You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

👉Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. Hit reply to enquire about group discounts.👈
LINKS
* Thomas Wiecki on LinkedIn
* PyMC Labs
* Open-Sourcing Decision Lab: Scaling AI Judgment in Data Science (PyMC Labs blog)
* Decision AI Discord
* Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results (Sage Journals)
* The Agent Harness Reading List
* Show Us Your Agent Skills (GitHub)
* Agentic Data Science course with Hugo, Thomas, and Luca (10% off with code ADSVG10)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube

👉Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. Hit reply to enquire about group discounts👈

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Agent-Harness.ipynb*
20/05/2026 | 1h 19 mins.
One thing that I don’t like about Claude is that you get into this weird mental state: oh, I think I trust the model. Let’s do the slot machine. Hit click, which puts you in an inactive mode of thinking. Maybe it’s better to use a worse model….
Vincent Warmerdam, senior data professional and prolific open-source maintainer (some packages with over a million downloads), now Engineer at marimo, joins Hugo to talk about how the Python notebook is evolving from a static scratchpad into a working agent harness, and what it takes to stay in the loop as a developer when agents are writing most of the code. This episode was originally a livestream Q&A with the Vanishing Gradients audience.
We Discuss:
* Shared Notebook Canvas: Notebooks act as a shared memory space where agents and humans co-exist, enabling real-time visual feedback by direct manipulation of global state and UI elements;
* Speed-of-Thought Models: Faster, open-weight models like Kimi K2 enhance exploratory flow by keeping humans more alert to the code, unlike frontier models that can induce passive thinking;
* Pi as a Harness: Vincent favors an agent harness where agents extend themselves rather than reach for MCP, and where hooks can rigidly constrain which files an agent is allowed to read or touch;
* Why PRDs Don’t Fit Notebooks: Notebook work is fundamentally exploratory, so the discipline that works for shipping web apps does not transfer cleanly; the one exception is reproducing a paper;
* Interactive Code Review: Interactive UIs (e.g., dragging integers) transform code into a physical object, incentivizing developers to actively review and understand agent logic;
* Modular “Lego” Components: Provide agents with high-level, well-tested components (”Lego” code) instead of raw boilerplate, creating systems that are easier to debug and modulate;
* Algorithm-Driven Visualization: Let the algorithm dictate the visualization needed, rather than choosing visualizations first, revealing the most interesting structures within the data;
* Don’t Outsource the Thinking: Pen and paper architectural planning, walks away from the keyboard, and protecting calm remain the most effective ways to keep producing good ideas in the age of AI-generated software.
* Agent Auto-Healing: A marimo-specific linter solved 60% of agent errors overnight by letting agents diagnose and fix their own “slop” without complex prompt engineering;
* Incremental Generation: Avoid monolithic LLM outputs; generate code one to two cells at a time to prevent laziness and ensure human oversight and learning;
Vincent closes on the idea that calm, not the latest frontier model, is the most underrated tool for building well, and that we should study LLM output the way chess players studied the engines that beat them.
Vincent gives several live demos toward the end of the episode. He describes them well enough to follow on audio, but the visuals are worth seeing, so check out the YouTube version here.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube.
You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!
👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off.👈
Also join us for Ep. 3 of Show Us Your Agent Skills: with Vincent, Paul Iusztin (Decoding AI), Eleanor Berger (Elite AI-Assisted Coding), Alan Nichol (Rasa), Nico Gerold (amp), and Matthew Honnibal (spaCy, Explosion).
Register on lu.ma to join live, or catch the recording afterwards.
LINKS
* Vincent Warmerdam on LinkedIn
* Vincent’s website (koaning.io)
* Wiggly Stuff — Vincent’s widget library
* Marimo Gallery
* skills.sh
* Armin Ronacher on Pi (the minimal agent inside open claw)
* Building Agents That Build Themselves — Hugo’s workshop write-up with Ivan Leo
* Data Science Fiction: Winning at Metrics, Losing at AI Evals — Hugo’s blog post based on Vincent’s talk
* Isaac Flath’s project (on X)
* Braid (video game)
* Hugo’s earlier podcast with Akshay (marimo)
* Elite AI Assisted Coding — Eleanor Berger’s course (Vanishing Gradients community gets 25% off with code “HUGO”)
* GameMakers Toolkit (YouTube)
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course (10% off) .
How You Can Support Vanishing Gradients
Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.
If you want to help keep it going:
* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Agentic Engineering and the Lost Art of Verification
12/05/2026 | 1h 32 mins.
> I almost don’t read code now. My approach with Roborev is it’s like my code reader. The mantra is: Roborev reads every line of code that is generated. It gets read multiple times. And so, whenever I push up a pull request, the branch gets re-reviewed. And so by the time I’m merging a pull request into a repository, the code has all been read by agents four or five times minimum. I look at the code in terms of structural detail: does it look right?
— Wes McKinney (creator of pandas, POSIT)
Wes, Jeremiah Lowin (Prefect), and Randy Olson (Good Eye Labs) join Hugo and his cohost Thomas Wiecki (PyMC Labs) for the premiere of Show Us Your Agent Skills, a live session where guests walk us through the exact skills, workflows, and setups they use to work with agents every day.
We Discuss:
* Wes McKinney on why he barely writes, or even reads, code anymore, his “software factory” of parallel agents, and RoboRev, the background reviewer that reads every line four or five times before he merges;
* The shift from “vibe coding” to agentic engineering, and why verification, not reading, is the part that actually matters;
* Jeremiah Lowin on years of context engineering: trickling voice memos, recorded meetings, and morning briefs into his agent’s memory substrate as a true “second brain”;
* Why Jeremiah picked OpenCode specifically for how deeply he can customize its memory, and what he’s building with FastMCP, Prefab, and Cardboard;
* Randy Olson on encoding human judgment, like Tufte’s rules for data visualization, directly into agent skills, so the agents themselves perform the verification;
* The “digital twin” Randy loads into his agents as a thought partner that pushes back instead of agreeing;
* Skills as thin drivers, progressive disclosure, and managing context rot across extended sessions;
* The rise of ephemeral, “just for me” software that agents finally make viable.
Skills and workflows discussed and shown in the episode:
* Wes’s RoboRev background code reviewer, his “software factory” dashboard, and his agentic engineering setup built on the Superpowers skills framework;
* Jeremiah’s “explain” skill (which anchors every other skill he has), his voice memo memory pipeline, his FastMCP and Prefab projects, and Cardboard, his ephemeral presentation tool;
* Randy’s data visualization verifier skills, his digital twin thought partner prompt, his cron job reports for colleagues, and his reflect and improve skill design pattern.
Check out the GitHub repo where we’re starting to drop some of these skills and workflows for you to grab and try yourself.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube.
You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!
Up next on Show Us Your Agent Skills: Hilary Mason (CEO, HiddenDoor), Bryan Bischof (Theory Ventures), Eric Ma (Research DS lead, Moderna Therapeutics), and Tomasz Tunguz (Theory Ventures). Register on lu.ma to join live, or catch the recording afterwards.
👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here.👈

LINKS
* spicytakes.org, Wes McKinney’s website
* RoboRev, Wes’s background code reviewer
* Agents View, Wes’s agent session database
* Middleman, Wes’s local GitHub dashboard
* Superpowers, Jesse Vincent’s skills framework that Wes builds on
* An Open Source Maintainer’s Guide to Saying No, by Jeremiah Lowin
* FastMCP
* Prefab, Jeremiah’s Python DSL for generative UIs
* Beautiful Charts with AI, by Randy Olson
* The Coding Agent is Dead, by Amp
* Building Effective Agents, by the Anthropic team
* Show Us Your Agent Skills, the GitHub repo where we are dropping skills and workflows from the show
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
* Come build the future of Agentic Data Science with us in our upcoming course.
How You Can Support Vanishing Gradients
Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.
If you want to help keep it going:
* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Next Level AI Evals for 2026
23/04/2026 | 53 mins.
There are a lot of reasons why we should do AI evals. For many companies doing AI evals is the way to build the feedback loop into the product development lifecycle. So it is like your compass. We’re using AI evals as a compass to guide product development and also product iteration. And also, many times we need evals to function as the pass or fail gate in release decisions. Whether this product is good enough for release or whether it is good enough for experiment, evals are also used in that.
Stella Wenxing Liu, Head of Applied Science at ASU, and Eddie Landesberg, Staff Data Scientist at Google, join Hugo to talk about why AI evaluation is evolving from “vibe checks” into a rigorous, multi-disciplinary science and how causal inference will take AI evals to the next level in 2026.

Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

They Discuss:
* Team-Centric AI Evals, integrating product managers, data scientists, and SMEs under a “benevolent dictator” (or not!) to ensure comprehensive and effective evaluation;
* Custom Evaluation Metrics, moving beyond generic vendor metrics to analyze raw data and identify specific failure modes, avoiding generic product outcomes;
* AI as Policy Evaluation, framing AI evaluation as a causal inference problem to estimate counterfactual performance of new “policies” (prompts, models) and predict online AB test outcomes;
* Clear Product Constraints, defining what an AI product should not do with strict guardrails to prevent misuse, control costs, and avoid brand dilution;
* Calibrated LLM Judges, statistically aligning LLM-as-a-judge with human experts using causal inference to ensure valid proxies for human welfare and business objectives;
* Essential Data Curiosity, fostering a culture of manual data inspection to build intuition before relying on automated error analysis or agents, ensuring effective system design;
* Statistical AI Evaluation, shifting from unit-test thinking to non-deterministic distributions, using confidence intervals and power analysis to discern genuine improvements from statistical noise;
* Proactive Regulatory Compliance, developing rigorous, defensible internal evaluation standards now to gain a competitive advantage as vague AI regulations move towards enforced compliance;
* Human-Centric Benchmarking, grounding AI systems in human judgment and user values, moving beyond automated scores to build resilient and differentiated AI.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube.
You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

👉 Stella has just started teaching a cohort of her AI Evals and Analytics Playbook course starting this week. She’s kindly giving listeners of Vanishing Gradients 30% off with this link.👈
Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.

LINKS
* Stella Wenxing Liu on LinkedIn
* Eddie Landesberg on LinkedIn
* Stella’s AI Evals & Analytics Playbook course on Maven (30% community discount)
* CJE (Causal Judge Evaluation) package by Eddie
* Trillion Dollar Coach
* Goodhart’s Law
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
How You Can Support Vanishing Gradients
Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.
If you want to help keep it going:
* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.

Thanks for reading Vanishing Gradients! This post is public so feel free to share it.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe
Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI
15/04/2026 | 1h 6 mins.
Katharine Jarmul, Privacy in ML/AI Expert & Author of Practical Data Privacy, joins Hugo to unpack why most AI privacy advice is theater: and what technical privacy actually looks like when you’re shipping LLMs, agents, and multimodal systems into the real world.
In this episode, we dig into how to build defensible systems in an era of AI agents and multimodal models: why system prompts (and your entire agent harness!) should be considered public by default, and why “privacy observability” is as critical as data observability for anyone building with LLMs today. Multimodal is what changes the threat model: identifiers hide in images, audio, and metadata, not just text, and the old anonymization playbook doesn’t cover it.
Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

We Discuss:
* No Convenience Tax, you don’t have to trade privacy for utility: high-utility AI products can be privacy-preserving through technical controls like privacy routing and input sanitization;
* Public Prompts and Harnesses: assume any instruction or secret in a system prompt or agent harness will be exfiltrated; don’t put sensitive info there in the first place;
* Privacy Observability, tag and track data flows so information is used only for its original intended purpose: catch design flaws before they become legal problems;
* Technical Privacy, implement mathematical and statistical constraints directly into ML systems and data flows so privacy is measurable and enforceable, not aspirational;
* Tiered Guardrails, a three-layer approach: deterministic filters for hard rules, algorithmic models for nuanced classification, and internal alignment training for behavioral baselines;
* Federated Learning Is Not Privacy, model updates in FL leak sensitive data on their own: you must layer differential privacy or encrypted computation on top, or you’re reverse-engineerable;
* Anonymization Spectrum, navigate the “grayscale” of privacy in multimodal AI, balancing data utility and individual risk as identifiers hide in non-obvious places;
* Privacy Champions, embed privacy accountability directly into development by training and incentivizing engineers inside product teams;
* Red Teaming as Ritual, your goal is to attack yourself: practice thinking like an attacker, and turn privacy testing into an organization-wide creative ritual rather than a siloed security task.
You can also find the full episode on Spotify, Apple Podcasts, and YouTube.
You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!
👉 Katharine is teaching her next cohort of Practical AI Privacy starting April 20. She’s kindly giving readers of Vanishing Gradients 10% off. Use this link. I’ll be taking it so hope to see you there!👈
Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.
LINKS
* Practical AI Privacy course on Maven (10% off with code build-with-privacy)
* Katharine Jarmul on LinkedIn
* Probably Private — Katharine’s website & newsletter
* Practical Data Privacy (Katharine’s book)
* Let’s Build an AI Privacy Router — Lightning Lesson
* Practical AI Privacy: Agents & Local LLMs (newsletter issue)
* A Deep Dive into Memorization in Deep Learning (kjamistan blog)
* Microsoft Presidio
* Llama Guard 3 8B on Hugging Face
* Nicholas Carlini
* From Magic to Malware: How OpenClaws Agent Skills Become an Attack Surface (1Password)
* Owning Ethics (Metcalf, Moss, boyd — Data & Society)
* Hugo on guardrails in LLM applications
* Upcoming Events on Luma
* Vanishing Gradients on YouTube
* Watch the podcast video on YouTube
How You Can Support Vanishing Gradients
Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.
If you want to help keep it going:
* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.
Thanks for reading Vanishing Gradients! This post is public so feel free to share it.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

More Science podcasts

About Vanishing Gradients

A podcast for people who build with AI. Long-format conversations with people shaping the field about agents, evals, multimodal systems, data infrastructure, and the tools behind them. Guests include Jeremy Howard (fast.ai), Hamel Husain (Parlance Labs), Shreya Shankar (UC Berkeley), Wes McKinney (creator of pandas), Samuel Colvin (Pydantic) and more. hugobowne.substack.com

Podcast website

Science Technology