⚡️CloudChef: Your Robot Chef - Michelin-star food at $12/hr (w/ Kitchen tour!)
One of the new tracks at next week’s AI Engineer conference in SF is a new focus on LLMs + Robotics, ft. household names like Waymo and Physical Intelligence. However, there are many other companies applying LLMs and VLMs in the real world!
CloudChef, the first industrial-scale kitchen robotics company with one-shot demonstration learning and an incredibly simple business model, will be serving tasty treats all day with Zippy (https://www.cloudchef.co/zippy), their AI Chef platform.
This is a lightning pod with CEO Nikhil Abraham to preview what Zippy is capable of!
https://www.cloudchef.co/platform
See a real chef comparison: https://www.youtube.com/watch?v=INDhZ7LwSeo&t=64s
See it in the AI Engineer Expo at SF next week: https://ai.engineer
Chapters
00:00 Welcome and Introductions
00:58 What is CloudChef?
01:36 How the Robots Work: Culinary Intelligence
05:57 Commercial Applications and Early Success
07:02 The Software-First Approach
10:09 Business Model and Pricing
13:10 Demonstration Learning: Training the Robots
16:03 Call to Action and Engineering Opportunities
18:45 Final Thoughts and Technical Details
--------
The AI Coding Factory
We are joined by Eno Reyes and Matan Grinberg, the co-founders of Factory.ai. They are building droids for autonomous software engineering, handling everything from code generation to incident response for production outages. After raising a $15M Series A from Sequoia, they just released their product in GA!
https://factory.ai/
https://x.com/latentspacepod
Chapters
00:00:00 Introductions
00:00:35 Meeting at LangChain Hackathon
00:04:02 Building Factory despite early model limitations
00:06:56 What is Factory AI?
00:08:55 Delegation vs Collaboration in AI Development Tools
00:10:06 Naming Origins of 'Factory' and 'Droids'
00:12:17 Defining Droids: Agent vs Workflow
00:14:34 Live Demo
00:17:37 Enterprise Context and Tool Integration in Droids
00:20:26 Prompting, Clarification, and Agent Communication
00:22:28 Project Understanding and Proactive Context Gathering
00:24:10 Why SWE-Bench Is Dead
00:28:47 Model Fine-tuning and Generalization Challenges
00:31:07 Why Factory is Browser-Based, Not IDE-Based
00:33:51 Test-Driven Development and Agent Verification
00:36:17 Retrieval vs Large Context Windows for Cost Efficiency
00:38:02 Enterprise Metrics: Code Churn and ROI
00:40:48 Executing Large Refactors and Migrations with Droids
00:45:25 Model Speed, Parallelism, and Delegation Bottlenecks
00:50:11 Observability Challenges and Semantic Telemetry
00:53:44 Hiring
00:55:19 Factory's design and branding approach
00:58:34 Closing Thoughts and Future of AI-Native Development
--------
⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer).
Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vagueposting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previewed his AIEWF talk on Agentic RL for those with the fortitude to power through bad meetup audio.
Chapters
00:00 Introduction and Episode Overview
02:01 Discussion on Claude 4 and its Features
04:31 Reasoning and Tool Use in AI Models
07:01 Extended Thinking in Claude and Model Differences
09:31 Speculation on Claude's Extended Thinking
11:01 Challenges and Controversies in AI Model Training
13:31 Technical Highlights and Code Trustworthiness
16:01 Token Costs and Incentives in AI Models
18:31 Thinking Budgets and AI Effort
21:01 Safety and Ethics in AI Model Development
23:31 Anthropic's Approach to AI Safety
26:01 LLM Arena and Evaluation Challenges
28:31 Developing Taste and Direction in AI Research
31:01 Recent Research and Multi-Turn RL
33:31 Tools and Incentives in AI Model Development
36:01 Challenges in Evaluating AI Model Outputs
38:31 Model-Based Rewards and Future Directions
41:01 Wrap-up and Future Plans
--------
ChatGPT Codex: The Missing Manual
ChatGPT Codex is here - the first cloud-hosted Autonomous Software Engineer (A-SWE) from OpenAI. We sat down for a quick pod with two core devs on the ChatGPT Codex team, Josh Ma and Alexander Embiricos, to get the inside scoop on the origin story of Codex, from WHAM to its future roadmap.
Follow them: https://github.com/joshma and https://x.com/embirico
Chapters
- 00:00 Introduction to the Latent Space Podcast
- 00:59 The Launch of ChatGPT Codex
- 03:08 Personal Journeys into AI Development
- 05:50 The Evolution of Codex and AI Agents
- 08:55 Understanding the Form Factor of Codex
- 11:48 Building a Software Engineering Agent
- 14:53 Best Practices for Using AI Agents
- 17:55 The Importance of Code Structure for AI
- 21:10 Navigating Human and AI Collaboration
- 23:58 Future of AI in Software Development
- 28:18 Planning and Decision-Making in AI Development
- 31:37 User, Developer, and Model Dynamics
- 35:28 Building for the Future: Long-Term Vision
- 39:31 Best Practices for Using AI Tools
- 42:32 Understanding the Compute Platform
- 48:01 Iterative Deployment and Future Improvements
--------
Claude Code: Anthropic's CLI Agent
More info: https://docs.anthropic.com/en/docs/claude-code/overview
The AI coding wars have now split across four battlegrounds:
1. AI IDEs: with two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation) and a sea of competition behind them (like Cline, GitHub Copilot, etc.).
2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all growing fast and reaching tens of millions in revenue within months.
3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
4. The CLI-based agents: after Aider’s initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go, billed on tokens used.
Since we covered all three of the first categories, today’s guests are Boris and Cat, the lead engineer and PM for Claude Code. If you only take one thing away from this episode, it’s this piece from Boris: Claude Code is not a product as much as it’s a Unix utility.
This fits very well with Anthropic’s product principle: “do the simple thing first.” Whether it’s the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning (“/think”) and memory (#tags in markdown) fit the same idea of having text I/O as the core interface, much like the original Unix design philosophy.
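To make the “simple thing first” memory design concrete, here is a minimal sketch in Python of the pattern described above: a project-level markdown file (CLAUDE.md, per the episode) that is auto-loaded and prepended to the prompt as plain text. The function names and prompt layout here are hypothetical illustrations, not Anthropic’s actual implementation.

```python
from pathlib import Path

MEMORY_FILE = "CLAUDE.md"  # project-level memory: just a markdown file

def load_memory(project_dir: str) -> str:
    """Return the memory file's contents if it exists, else an empty string."""
    path = Path(project_dir) / MEMORY_FILE
    return path.read_text() if path.exists() else ""

def build_prompt(project_dir: str, user_request: str) -> str:
    """Prepend project memory to the user's request; text in, text out."""
    memory = load_memory(project_dir)
    if not memory:
        return user_request
    return f"# Project memory\n{memory}\n\n# Request\n{user_request}"
```

Because the memory is just a text file, users can edit it by hand and the agent can append to it, with no special storage layer needed.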
Claude Code is also the most direct way to consume Sonnet for coding, without all the hidden prompting and optimization that the other products do. You will feel that right away: the average spend per user is $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, some engineers inside Anthropic have spent >$1,000 in one day!
If you’re building AI developer tools, there’s also a lot of alpha here on how to design a CLI tool, interactive vs. non-interactive modes, and how to balance feature creation with maintenance. Enjoy!
Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic’s Product Philosophy
[00:07:38] What should go into Claude Code?
[00:09:26] Claude.md and Memory Simplification
[00:10:07] Claude Code vs Aider
[00:11:23] Parallel Workflows and Unix Utility Philosophy
[00:12:51] Cost considerations and pricing model
[00:14:51] Key Features Shipped Since Launch
[00:16:28] Claude Code writes 80% of Claude Code
[00:18:01] Custom Slash Commands and MCP Integration
[00:21:08] Terminal UX and Technical Stack
[00:27:11] Code Review and Semantic Linting
[00:28:33] Non-Interactive Mode and Automation
[00:36:09] Engineering Productivity Metrics
[00:37:47] Balancing Feature Creation and Maintenance
[00:41:59] Memory and the Future of Context
[00:50:10] Sandboxing, Branching, and Agent Planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic Excels at Developer Tools
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny corp (George Hotz), Databricks/MosaicML (Jonathan Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space