The MAD Podcast with Matt Turck

126 episodes

The Biggest AI Deployment Nobody Talks About | Samsara CEO Sanjit Biswas
30/07/2026 | 1h
Sanjit Biswas runs what may be the largest AI deployment in the physical world — and almost nobody in AI talks about it. Samsara (NYSE: IOT), the ~$20B company he co-founded after selling Meraki to Cisco for $1.2B, puts AI on millions of trucks, cranes, and industrial assets: 25 trillion data points a year, 99% of US roads driven every single day, ~$2B in ARR growing 30% profitably. In this episode we go through the entire physical AI stack — asset tags you can run over with a truck, a paper-thin disposable tracking label, engine fault codes, and dash cams running inference at the edge — then into agents, including the Agent Studio warranty agent that compresses an hour of human work into under a minute. We also get into the uncomfortable part (when AI watches you drive all day, is that coaching or surveillance — and why drivers actually want the cameras), mixed fleets of humans and robots, why autonomous trucking will take far longer than robotaxis, and a startling stat from the field: one utility building 3x more grid capacity in the next five years than it did in the previous 125, with 90% of that demand coming from data centers.

(00:00) Intro: The biggest AI deployment nobody talks about
(01:16) What is physical AI?
(03:04) From IoT dashboards to agentic action
(04:36) Why physical AI is harder than software AI
(06:07) Safety, cybersecurity, and real-world consequences
(07:11) What Samsara does
(08:22) $2B ARR, 25 trillion data points, and 380,000 crashes
(09:44) How AI can prevent road accidents
(11:28) From an MIT research project to Meraki
(13:42) Learning physical operations from scratch
(15:42) Samsara’s stack: sensors, intelligence, and action
(16:39) Inside Samsara’s industrial asset trackers
(18:36) Bluetooth, battery life, and connected infrastructure
(19:49) A disposable tracking device built like a sticker
(21:08) Vehicle gateways and engine diagnostics
(22:16) How AI dash cams coach drivers in real time
(23:28) Turning the dash cam into an AI interface
(24:49) Organizing physical-world data in the cloud
(26:32) Selling AI to traditional industries
(27:52) Is Samsara’s real-world data its AI moat?
(29:22) The network effects of covering 99% of U.S. roads
(31:23) Edge AI versus cloud AI
(32:35) The models running inside Samsara’s devices
(33:52) Generative AI and video reasoning
(35:57) Which AI models does Samsara use?
(36:50) Inside Samsara Agent Studio
(37:56) How an AI warranty agent works
(38:50) Starting with practical, lower-risk automation
(40:10) Combining agents, workflows, rules, and guardrails
(42:07) What today’s AI agents still cannot do
(43:07) AI ride-alongs and the future of driver coaching
(45:17) Is workplace AI becoming Big Brother?
(46:27) How cameras can protect and exonerate drivers
(48:48) When AI becomes the judge of your work
(50:37) Robots, humanoids, and mixed human-machine fleets
(53:19) Samsara’s role in autonomous operations
(54:50) How quickly will autonomous trucking arrive?
(56:32) AI data centers and America’s infrastructure boom
(58:16) Should lawyers become plumbers? Demand for tradespeople
(59:54) Closing thoughts
The Biggest Chip Ever Built — Why OpenAI Runs On It | Cerebras CEO Andrew Feldman
23/07/2026 | 1h 12 mins.
AI is no longer just a race to train smarter models. As AI moves into production, the bottleneck is increasingly inference: how fast models can generate tokens, use tools, reason, verify, and act. In this episode of the MAD Podcast, Matt Turck sits down with Andrew Feldman, co-founder and CEO of Cerebras, to explain why fast inference may define the next era of AI.

Cerebras is known for building a chip the size of a silicon wafer. But this conversation is not just about one company or one chip. It is a deep dive into the AI infrastructure stack: GPUs, ASICs, memory, HBM, SRAM, data centers, power, TSMC, AWS, OpenAI, agents, reasoning models, and why speed changes what AI products can become. Andrew explains why “tokens per second per user” matters, why generating a single word can require moving the equivalent of 100 HD movies through memory, why agents amplify latency, why GPUs struggle with certain inference workloads, and why fast AI may eventually reshape SaaS itself.

This is a reference conversation on fast inference, AI chips, and the next compute bottleneck.

(00:00) Cold open & Intro
(01:31) Why speed became the AI bottleneck
(02:32) Tokens per second per user, explained
(03:16) AI’s broadband moment and the Netflix analogy
(04:35) The AI chip landscape: GPUs, TPUs, Trainium, ASICs
(06:36) What is an ASIC?
(08:08) Nvidia, Groq, and the fast inference war
(09:16) OpenAI, Broadcom, and specialized silicon
(12:10) China, power, and sovereign AI infrastructure
(15:05) Is the AI infrastructure boom a bubble?
(18:56) The hidden bottlenecks: HBM, CoWoS, and 3nm
(22:57) Why agents are creating CPU demand
(25:36) Andrew Feldman’s path from SeaMicro to Cerebras
(26:13) Why Cerebras bet on AI in 2016
(31:14) SRAM vs. HBM: why inference is a memory problem
(33:19) What wafer-scale computing actually means
(34:28) The deep-tech “Everest” problem
(36:07) The moment the first Cerebras system worked
(36:49) Ringing the bell and surviving deep tech
(39:08) How a giant chip handles failure
(41:22) Why GPUs struggle with decode
(42:17) Prefill vs. decode explained
(44:01) The “100 HD movies” problem in AI inference
(45:04) How fast inference changes RL and training
(48:08) Reasoning models and why they cost more compute
(50:08) Verification, guardrails, and small models checking big models
(52:37) Multimodal AI and the path to video
(53:51) Cerebras’ business model: hardware, cloud, and API
(55:14) OpenAI’s 750MW inference deal
(55:36) Why data centers are measured in megawatts
(58:01) AWS Trainium + Cerebras decode
(59:29) Fast tokens as a cloud product
(01:00:52) Is CUDA still a moat?
(01:03:53) How TSMC helped Cerebras build the giant chip
(01:07:41) Why nobody cared in 2020
(01:08:15) Why chip supply chains are hard to diversify
(01:09:54) Why today’s AI models will be the worst you ever use
(01:10:38) What fast AI could do to SaaS
OpenAI’s Compute Chief: We Can’t Build Fast Enough | Sachin Katti
16/07/2026 | 43 mins.
Is the AI industry actually overbuilding, or is the physical world moving too slowly to keep up? In this episode of the MAD Podcast, OpenAI's Head of Industrial Compute, Sachin Katti, takes us inside the "belly of the beast" of what may be the largest infrastructure project in human history. We explore the staggering physical reality of the AI boom, from $50 billion supercomputers and liquid-cooled data centers that "turn electrons into tokens" to overhauling the U.S. power grid and exploring nuclear energy. Sachin also pulls back the curtain on OpenAI's Stargate strategy, its move into custom silicon with Project Jalapeño, and the mind-bending reality that AI is now beginning to design the very chips that will power its own future.

(00:00) — Cold open: “One of the largest things humanity has ever built”
(00:30) — Welcome: Sachin Katti, Head of Industrial Compute at OpenAI
(01:44) — Is this the biggest infrastructure buildout in history?
(03:41) — Why OpenAI is building a new industrial muscle
(04:54) — What an AI data center actually is
(05:27) — “Factories turning electrons into tokens”
(06:35) — Why AI data centers need liquid cooling everywhere
(08:10) — The power problem: grids, generation, transmission, substations
(10:43) — Behind-the-meter power and gas turbines
(11:02) — Why nuclear “can’t come soon enough”
(11:49) — Jalapeño: why OpenAI is designing its own AI chips
(13:19) — Tokens per watt: the new metric that matters
(13:38) — Why inference may now dominate AI compute
(14:58) — Is OpenAI overbuilding compute?
(16:47) — Why OpenAI thinks the bigger risk is not building fast enough
(17:55) — Communities, jobs, water, and the local data-center debate
(21:16) — How OpenAI chooses data-center sites
(22:25) — What “industrial compute” means inside OpenAI
(25:59) — Sachin’s path: Stanford, startups, Intel, OpenAI
(28:05) — OpenAI’s compute portfolio: Microsoft, hyperscalers, neoclouds
(29:37) — Stargate explained
(31:21) — Abilene, Oracle, and the next wave of AI data centers
(32:48) — How massive AI compute gets financed
(34:05) — How OpenAI designed Jalapeño so quickly
(35:59) — AI is starting to help design AI chips
(36:20) — MRC: the networking problem behind 100,000 GPUs
(38:47) — Bottlenecks: transformers, turbines, electricians, supply chains
(40:29) — Guaranteed capacity: intelligence as a supply unit
(42:08) — Will AI data centers move to space?
Stripe's AI Chief: How AI Agents Will Buy, Sell, and Pay
09/07/2026 | 1h 14 mins.
Is the internet ready for AI agents to take over our wallets and run their own businesses? In this episode of The MAD Podcast, Stripe's Emily Sands reveals how agentic commerce is rapidly shifting from a hypothetical concept to deployed financial infrastructure. From combating the rising existential threat of token theft to solving the bottleneck of "vibe deployment", Emily unpacks the shared payment tokens and real-time billing systems required to securely scale autonomous digital buyers. She also highlights a near future where agents operate as independent, end-to-end micro-firms.

(00:00) — Cold open & Intro
(01:24) — The rise of agentic e-commerce
(02:11) — The spectrum of agent-led purchases
(03:16) — How merchants adapt to AI-driven commerce
(05:50) — Defining the levels of autonomy in AI shopping
(07:08) — What is the Agent E-Commerce Protocol (AEP)?
(08:49) — Shared payment tokens and secure AI transactions
(09:58) — Who is adopting the Agent E-Commerce Protocol?
(11:38) — Can agents negotiate and sell products?
(13:32) — The macroeconomic impact of AI agents
(14:46) — The boom of solopreneurs and AI-driven business creation
(16:56) — Why building trust is the biggest roadblock for AI commerce
(20:19) — How Link wallets improve payment security
(21:21) — Improving the user experience in AI shopping apps
(23:16) — How the Link Wallet sets guardrails for AI agents
(25:40) — One-time use virtual cards vs flexible AI wallets
(28:03) — Unpacking the shared payment token primitive
(29:59) — How stablecoins enable profitable AI microtransactions
(35:03) — Managing liability: Who is at fault if an agent goes haywire?
(36:38) — Why agent payments might be safer than human transactions
(37:41) — What is Vibe Deployment?
(40:13) — Why Stripe built Stripe Projects for agent deployment
(41:22) — Why Stripe cares about orchestrating app deployments
(42:50) — How tokens break the traditional SaaS billing model
(44:34) — Why AI companies are moving to hybrid and usage-based billing
(47:15) — Streaming payments and real-time token tracking
(48:42) — The massive data challenge for AI company accountants
(50:41) — Token theft: The fastest-growing fraud in the AI economy
(52:04) — The cottage industry of free trial and multi-account abuse
(54:16) — How fraudsters monetize stolen AI tokens on the dark web
(01:00:06) — How Stripe Radar uses network density to fight AI fraud
(01:01:15) — Tempo's role in the Agent E-Commerce Protocol
(01:04:12) — The AI startup ecosystem is accelerating business creation
(01:09:01) — The token cost shock: Are buyers getting carried away?
(01:11:19) — 2026 Predictions: Agents running businesses end-to-end
Inside Nemotron & NVIDIA’s AI Lab | Bryan Catanzaro
02/07/2026 | 1h 22 mins.
NVIDIA is a chip company. So why does it put hundreds of researchers on building AI models — and then give them away for free? Bryan Catanzaro is VP of Applied Deep Learning Research at NVIDIA and one of the people whose work quietly underpins modern AI: he helped create cuDNN (NVIDIA's first deep learning product), co-invented DLSS, and named and built Megatron, the framework behind how much of the industry trains large models. Today he leads Nemotron, NVIDIA's family of open models — and Nemotron 3 Ultra, released just weeks ago, is one of the strongest open-weights models to come out of the US.

Matt Turck sits down with Bryan for a genuinely deep conversation: the real business logic behind a chip company building its own models, the state of open vs. closed AI, and whether the US is falling behind China in open models. Then they go inside Nemotron itself — four-bit (NVFP4) pretraining, hybrid Mamba-Transformer architecture, mixture-of-experts, multi-token prediction, and multi-teacher distillation — all explained in plain language. Plus a rare look at how a modern AI research org actually runs, what it was like working alongside Andrew Ng and Dario Amodei at Baidu, why Bryan doesn't believe in the singularity, and his contrarian case that open AI is safer than closed.

A reference conversation for anyone trying to understand where AI is really headed.

(00:00) — Cold open & Intro
(01:33) — Is open source AI catching the frontier?
(05:29) — Do closed labs blocking distillation slow open source down?
(07:42) — Is the US falling behind China?
(10:30) — Why companies actually choose open models
(12:39) — A "crazy" 2008 bet: machine learning on GPUs
(15:33) — Working with Andrew Ng and Dario Amodei at Baidu
(17:41) — Coming back to NVIDIA: DLSS and the birth of Megatron
(21:55) — The real reason NVIDIA builds its own models
(24:28) — Is Moore's Law really dead?
(33:37) — The Nemotron family: Nano, Super, Ultra
(35:09) — Built for agents: why NVIDIA bets on speed
(36:02) — How you train a 550B model in 4 bits
(39:25) — Hybrid Mamba-Transformer, explained simply
(42:31) — Mixture of experts — and why NVIDIA built NVL72 around it
(47:26) — Why a 1-million-token context window matters
(49:26) — Multi-token prediction: how the model predicts 5 tokens at once
(52:47) — Multi-teacher distillation: teaching one model from many
(58:01) — Where reinforcement learning goes next
(01:00:16) — Inside NVIDIA's research org: "the mission is the boss"
(01:04:03) — How NVIDIA decides who gets the GPUs
(01:10:53) — Why NVIDIA still feels entrepreneurial after 33 years
(01:12:58) — Why Bryan doesn't believe in the singularity
(01:17:50) — The AI backlash
(01:19:18) — The controversial case: open AI is safer than closed

About The MAD Podcast with Matt Turck

The MAD Podcast with Matt Turck is a series of conversations with leaders from across the Machine Learning, AI, and Data landscape, hosted by Matt Turck, a leading AI and data investor and Partner at FirstMark Capital.