This week on GAEA Talks, Graeme Scott sits down with Robert Nishihara - co-founder of Anyscale, creator of the open source Ray project, UC Berkeley PhD in machine learning and distributed systems, Harvard mathematics graduate, and one of the architects of the software infrastructure powering AI at OpenAI, Amazon, Cohere, Hugging Face, NVIDIA, Uber, Spotify and Visa.

Robert's journey is the story of how modern AI is actually built. As a PhD student at UC Berkeley working with Michael Jordan and Ion Stoica, he and his co-founders kept hitting the same wall: they wanted to do research on algorithms but ended up spending all their time on distributed systems just to run their experiments. That frustration became Ray, the open source compute framework they built to make distributed AI accessible. In 2019 they founded Anyscale to commercialise Ray, and today it powers mission-critical AI workloads at many of the largest AI companies on earth.

In this episode, recorded live at HumanX 2026 in San Francisco, Robert takes us inside the real engineering reality behind the AI boom - from the mindset shift that "the code is not the artifact" to the quiet revolution in data curation that has replaced architecture innovation as the frontier of model quality. He explains why the thirty-year lag from demo to production still haunts robotics and AI, why every serious AI company now runs across hyperscalers and neoclouds to scrounge for capacity, how teams manage rack-level GPU failures with "bad GPU" and suspected-bad lists, and why learning outside the model - through context engineering - may matter as much as training itself.
This is essential listening for anyone building, funding, or betting on the infrastructure that will decide the next phase of AI.

What you'll take away from this conversation:

- The "code is not the artifact" mindset shift - why AI research code can be throwaway because the model, not the software, is the real deliverable
- Why the thirty-year gap from demo to production is the defining challenge of AI reliability - and why autonomous driving is the canonical example
- How data curation and synthetic data generation have quietly replaced architectures and optimisers as the true frontier of model quality
- Why reinforcement learning is the next scaling frontier - data efficient, compute hungry, and a way to keep scaling when labelled data plateaus
- Why the next leap in intelligence will come from learning outside the model - context engineering, mental models, and closing the reasoning-to-learning loop
- The hardware reality no one talks about - 72-GPU racks, long-tail failure rates, and the scheduling gymnastics required to run unreliable hardware reliably
- The "bad GPU" and "suspected-bad GPU" lists production teams actually maintain to keep training jobs alive
- Why every serious AI team now runs across a hyperscaler and one or more neoclouds - and why advertised cloud capacity is effectively fiction
- Why training and inference must share compute - statically partitioning your cluster is a cost trap that hits you at peak inference demand
- Why text is a minuscule fraction of the world's data - and why the shift from SQL on tabular data to inference on arbitrary data types will happen fast
- Why the infrastructure team has to optimise for performance, cost AND researcher productivity - and why velocity is often what separates winners from losers
- Robert's two biggest bets for the next wave of AI - compute-driven data generation, and systems that learn outside the model weights