Zihan Wang is an AI researcher at Northwestern University, where he works on vision-language models, robotics, and reinforcement learning. Previously, he interned at DeepSeek, contributing to projects like DeepSeek-V2.

Zihan's homepage: https://zihanwang314.github.io/

(00:00) - Introduction
(01:13) - Zihan's Background, CS and AI Research in China
(11:09) - DeepSeek; Human capital flow from PRC to US
(16:07) - DeepSeek, Open Source and AI Research
(31:52) - Model Size and Performance Constraints
(33:01) - Data Bottleneck in Pre-trained Models
(34:12) - Transformer Architecture and Scaling Laws
(36:30) - Efficiency in Model Training
(47:44) - Chain of Experts Architecture
(01:01:06) - Future of AI and Robotics

Music used with permission from Blade Runner Blues Livestream improvisation by State Azure.

Steve Hsu is Professor of Theoretical Physics and of Computational Mathematics, Science, and Engineering at Michigan State University. Previously, he was Senior Vice President for Research and Innovation at MSU and Director of the Institute of Theoretical Science at the University of Oregon. Hsu is a startup founder (SuperFocus.ai, SafeWeb, Genomic Prediction, Othram) and advisor to venture capital and other investment firms. He was educated at Caltech and Berkeley, was a Harvard Junior Fellow, and has held faculty positions at Yale, the University of Oregon, and MSU.

Please send any questions or suggestions to [email protected] or Steve on X @hsu_steve.