
Explaining Eval Engineering | Galileo's Vikram Chatterji
19/12/2025 | 37 mins.
You've heard of evaluations, but eval engineering is the difference between AI that ships and AI that's stuck in prototype.

Most teams still treat evals like unit tests: write them once, check a box, move on. But when you're deploying agents that make real decisions, touch real customers, and cost real money, those one-time tests don't cut it. The companies actually shipping production AI at scale have figured out something different: they've turned evaluations into infrastructure, into IP, into the layer where domain expertise becomes executable governance.

Vikram Chatterji, CEO and Co-founder of Galileo, returns to Chain of Thought to break down eval engineering: what it is, why it's becoming a dedicated discipline, and what it takes to actually make it work. Vikram shares why generic evals are plateauing, how continuous learning loops drive accuracy, and why he predicts "eval engineer" will become as common a role as "prompt engineer" once was.

In this conversation, Conor and Vikram explore:
• Why treating evals as infrastructure, not checkboxes, separates production AI from prototypes
• The plateau problem: why generic LLM-as-a-judge metrics can't break 90% accuracy
• How continuous human feedback loops improve eval precision over time
• The emerging "eval engineer" role and what the job actually looks like
• Why 60-70% of AI engineers' time is already spent on evals
• What multi-agent systems mean for the future of evaluation
• Vikram's framework for baking trust AND control into agentic applications

Plus: Conor shares news about his move to Modular and what it means for Chain of Thought going forward.

Chapters:
00:00 – Introduction: Why Evals Are Becoming IP
01:37 – What Is Eval Engineering?
04:24 – The Eval Engineering Course for Developers
05:24 – Generic Evals Are Plateauing
08:21 – Continuous Learning and Human Feedback
11:01 – Human Feedback Loops and Eval Calibration
13:37 – The Emerging Eval Engineer Role
16:15 – What Production AI Teams Actually Spend Time On
18:52 – Customer Impact and Lessons Learned
24:28 – Multi-Agent Systems and the Future of Evals
30:27 – MCP, A2A Protocols, and Agent Authentication
33:23 – The Eval Engineer Role: Product-Minded + Technical
34:53 – Final Thoughts: Trust, Control, and What's Next

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

Learn more about Eval Engineering: https://galileo.ai/evalengineering

Connect with Vikram Chatterji:
LinkedIn – https://www.linkedin.com/in/vikram-chatterji/

Debunking AI's Environmental Panic | Andy Masley
26/11/2025 | 59 mins.
AI is destroying the planet, or so we've been told. This week on Chain of Thought, we tackle one of the most persistent and misleading narratives in the AI conversation.

Andy Masley, Director of Effective Altruism DC, joins host Conor Bronsdon to fact-check the absurd AI environmental claims you've heard at parties, in articles, and even in bestselling books. Andy recently went viral for discovering what he calls "the single most egregious math mistake" he's ever seen in a book: a data center water usage calculation in Karen Hao's NYT bestseller, Empire of AI, that was off by a factor of 4,500.

In this conversation, Andy and Conor break down the myths around AI's water and energy usage and explore:
• The viral Empire of AI error and what it reveals about the broader debate
• Why most AI water usage statistics are misleading or flat-out wrong
• How one ChatGPT prompt represents just 1/150,000th of your daily emissions
• Trade-offs around data center cooling and decision making
• Why "tribal thinking" about AI is distorting environmental activism
• Where AI might actually help the climate through deep learning optimization

If you've ever felt guilty about using AI tools, been cornered at a party about AI's environmental impact, or simply want to understand what the data actually says, this episode, and Andy's deep dive articles, arm you with the facts.

Chapters:
00:00 – Introduction: The Party Guilt Problem
01:54 – Andy's Background and What Sparked This Work
03:50 – The 4,500x Error in Empire of AI
06:39 – Breaking Down the Math: Liters vs. Cubic Meters
10:39 – The Unintended Consequence: Air Cooling vs. Water Cooling
12:51 – Karen Hao's Response and What's Still Missing
19:08 – Why Environmentalists Should Focus Elsewhere
21:41 – The Danger of Tribal Thinking About AI
25:49 – What Is Effective Altruism (And Why People Attack It)
29:15 – EA, AI Risk, and P(doom)
34:31 – Why Misinformation Hurts Your Own Side
37:39 – Using ChatGPT Is Not Bad for the Environment
42:14 – The Party Rebuttal: Practical Comparisons
45:23 – Water Use Reality: 1/800,000th of Your Daily Footprint
48:27 – The Personal Carbon Footprint Distraction
53:38 – Data Centers: Efficiency vs. Whether to Build
55:13 – AI's Net Climate Impact: The Positive Case
59:34 – Deep Learning, Smart Grids, and Climate Optimization
1:03:45 – Final Thoughts

Key references:
IEA Study: AI and climate change – https://www.iea.org/reports/energy-and-ai/ai-and-climate-change#abstract
Nature – https://www.nature.com/articles/s44168-025-00252-3
The Empire of AI Error – https://andymasley.substack.com/p/empire-of-ai-is-wildly-misleading
Using ChatGPT isn't bad for the environment – https://andymasley.substack.com/p/a-short-summary-of-my-argument-that
https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about

Connect with Andy Masley:
Substack – https://andymasley.substack.com/
X (Twitter) – https://x.com/AndyMasley

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

The Critical Infrastructure Behind the AI Boom | Cisco CPO Jeetu Patel
19/11/2025 | 1h 18 mins.
AI is accelerating at a breakneck pace, but model quality isn't the only constraint we face. There are major infrastructure requirements, energy needs, security, and data pipelines to run AI at scale. This week on Chain of Thought, Cisco's President and Chief Product Officer Jeetu Patel joins host Conor Bronsdon to reveal what it actually takes to build the critical foundation for the AI era.

Jeetu breaks down the three bottlenecks he sees holding AI back today:
• Infrastructure limits: not enough power, compute, or data center capacity
• A trust deficit: non-deterministic models powering systems that must be predictable
• A widening data gap: human-generated data plateauing while machine data explodes

Jeetu then shares how Cisco is tackling these challenges through secure AI factories, edge inference, open multi-model architectures, and global partnerships with Nvidia, G42, and sovereign cloud providers. Jeetu also explains why he thinks enterprises will soon rely on thousands of specialized models, not just one, and how routing, latency, cost, and security shape this new landscape.

Conor and Jeetu also explore high-performance leadership and team culture, discussing building high-trust teams, embracing constructive tension, staying vigilant in moments of success, and the personal experiences that shaped Jeetu's approach to innovation and resilience.

If you want a clearer picture of the global AI infrastructure race, how high-level leaders are thinking about the future, and what it all means for enterprises, developers, and the future of work, this conversation is essential.

Chapters:
00:00 – Welcome to Chain of Thought
00:48 – AI and Jobs: Beyond the Hype
06:15 – The Real AI Opportunity: Original Insights
10:00 – Three Critical AI Constraints: Infrastructure, Trust, and Data
16:27 – Cisco's AI Strategy and Platform Approach
19:18 – Edge Computing and Model Innovation
22:06 – Strategic Partnerships: Nvidia, G42, and the Middle East
29:18 – Acquisition Strategy: Platform Over Products
32:03 – Power and Infrastructure Challenges
36:06 – Building Trust Across Global Partnerships
38:03 – US vs. China: The AI Infrastructure Race
40:33 – America's Venture Capital Advantage
42:06 – Acquisition Philosophy: Strategy First
45:45 – Defining Cisco's True North
48:06 – Mission-Driven Innovation Culture
50:15 – Hiring for Hunger, Curiosity, and Clarity
56:27 – The Power of Constructive Conflict
1:00:00 – Career Lessons: Continuous Learning
1:02:24 – The Email Question
1:04:12 – Joe Tucci's Four-Column Exercise
1:08:15 – Building High-Trust Teams
1:10:12 – The Five Dysfunctions Framework
1:12:09 – Leading with Vulnerability
1:16:18 – Closing Thoughts and Where to Connect

Connect with Jeetu Patel:
LinkedIn – https://www.linkedin.com/in/jeetupatel/
X (Twitter) – https://x.com/jpatel41
Cisco – https://www.cisco.com/

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

Beyond Transformers: Maxime Labonne on Post-Training, Edge AI, and the Liquid Foundation Model Breakthrough
12/11/2025 | 52 mins.
The transformer architecture has dominated AI since 2017, but it's not the only approach to building LLMs, and new architectures are bringing LLMs to edge devices.

Maxime Labonne, Head of Post-Training at Liquid AI and creator of the 67,000+ star LLM Course, joins Conor Bronsdon to challenge the AI architecture status quo. Liquid AI's hybrid architecture, combining transformers with convolutional layers, delivers faster inference, lower latency, and dramatically smaller footprints without sacrificing capability. This alternative architectural philosophy creates models that run effectively on phones and laptops without compromise.

But reimagined architecture is only half the story. Maxime unpacks the post-training reality most teams struggle with: the challenges and opportunities of synthetic data, how to balance helpfulness against safety, Liquid AI's approach to evals, RAG architectural approaches, how he sees AI on edge devices evolving, hard-won lessons from shipping LFM1 through 2, and much more.

If you're tired of surface-level AI takes and want to understand the architectural and engineering decisions behind production LLMs from someone building them in the trenches, this is your episode.

Connect with Maxime Labonne:
LinkedIn – https://www.linkedin.com/in/maxime-labonne/
X (Twitter) – @maximelabonne
About Maxime – https://mlabonne.github.io/blog/about.html
HuggingFace – https://huggingface.co/mlabonne
The LLM Course – https://github.com/mlabonne/llm-course
Liquid AI – https://liquid.ai

Connect with Conor Bronsdon:
X (Twitter) – @conorbronsdon
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/

Chapters:
00:00 – Intro: Welcome to Chain of Thought
00:27 – Guest Intro: Maxime Labonne of Liquid AI
02:21 – The Hybrid LLM Architecture Explained
06:30 – Why Bigger Models Aren't Always Better
11:10 – Convolution + Transformers: A New Approach to Efficiency
18:00 – Running LLMs on Laptops and Wearables
22:20 – Post-Training as the Real Moat
25:45 – Synthetic Data and Reliability in Model Refinement
32:30 – Evaluating AI in the Real World
38:11 – Benchmarks vs. Functional Evals
43:05 – The Future of Edge-Native Intelligence
48:10 – Closing Thoughts & Where to Find Maxime Online

Architecting AI Agents: The Shift from Models to Systems | Aishwarya Srinivasan, Fireworks AI Head of AI Developer Relations
08/10/2025 | 53 mins.
Most AI agents are built backwards, starting with models instead of system architecture.

Aishwarya Srinivasan, Head of AI Developer Relations at Fireworks AI, joins host Conor Bronsdon to explain the shift required to build reliable agents: stop treating them as model problems and start architecting them as complete software systems. Benchmarks alone won't save you. Aish breaks down the evolution from prompt engineering to context engineering, revealing how production agents demand careful orchestration of multiple models, memory systems, and tool calls. She shares battle-tested insights on evaluation-driven development, the rise of open source models like DeepSeek v3, and practical strategies for managing autonomy with human-in-the-loop systems. The conversation addresses critical production challenges, ranging from LLM-as-judge techniques to navigating compliance in regulated environments.

Connect with Aishwarya Srinivasan:
LinkedIn – https://www.linkedin.com/in/aishwarya-srinivasan/
Instagram – https://www.instagram.com/the.datascience.gal/

Connect with Conor:
LinkedIn – https://www.linkedin.com/in/conorbronsdon/

Chapters:
00:00 – Intro: Welcome to Chain of Thought
00:22 – Guest Intro: Aish Srinivasan of Fireworks AI
02:37 – The Challenge of Responsible AI
05:44 – The Hidden Risks of Reward Hacking
07:22 – From Prompt to Context Engineering
10:14 – Data Quality and Human Feedback
14:43 – Quantifying Trust and Observability
20:27 – Evaluation-Driven Development
30:10 – Open Source Models vs. Proprietary Systems
34:56 – Gaps in the Open-Source AI Stack
38:45 – When to Use Different Models
45:36 – Governance and Compliance in AI Systems
50:11 – The Future of AI Builders
56:00 – Closing Thoughts & Follow Aish Online

Follow the hosts:
Follow Atin
Follow Conor
Follow Vikram
Follow Yash



Chain of Thought