Join the Tool Use Discord: https://discord.gg/PnEGyXpjaX
Unlock the power of AI fine-tuning for image and video models with Greg Schoeninger, CEO of Oxen.ai. In this episode, we explore how to move beyond simple prompt engineering to training custom open-source models for as little as $1. Discover the technical strategies for building high-quality datasets, the trade-offs between LoRAs and full fine-tunes, and how to achieve consistent characters and styles in generative video.
We dive into real-world examples, including how to generate massive product catalogs for a fraction of the cost of enterprise APIs and the story behind the viral "Isometric NYC" project built with Claude Code. Greg breaks down the entire AI lifecycle—from data curation and labeling with Vision Language Models (VLMs) like Qwen and Gemini to deploying efficient, specialized models that outperform general-purpose giants. Whether you are looking to optimize GPU costs, automate video workflows, or build your own AI tools, this conversation provides the blueprint for scaling your AI capabilities.
Links from the episode:
Oxen AI: https://www.oxen.ai/
Fine Tuning Fridays: https://luma.com/oxen
Isometric NYC Project: https://cannoneyed.com/projects/isometric-nyc
Connect with us
https://x.com/ToolUsePodcast
https://x.com/MikeBirdTech
https://x.com/gregschoeninger
00:00:00 - Intro
00:01:47 - When to Switch from Prompt Engineering to Fine-Tuning
00:11:05 - How to Build a Dataset for AI Fine-Tuning
00:21:52 - LoRA vs. Full Fine-Tuning Explained
00:27:53 - Case Study: Fine-Tuned Qwen 3 VL vs. Google Gemini
00:36:33 - Case Study: Isometric NYC with Claude Code
Subscribe for more insights on AI tools, productivity, and fine-tuning.
Tool Use is a weekly conversation with top AI experts.