
80,000 Hours Podcast

Rob, Luisa, and the 80,000 Hours team
Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them. Subscribe by searching for '80000 Hours' wherev...

Available Episodes

5 of 281
  • #214 – Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway
    Most AI safety conversations centre on alignment: ensuring AI systems share our values and goals. But despite progress, we’re unlikely to know we’ve solved the problem before the arrival of human-level and superhuman systems in as little as three years.

    So some are developing a backup plan to safely deploy models we fear are actively scheming to harm us — so-called “AI control.” While this may sound mad, given the reluctance of AI companies to delay deploying anything they train, not developing such techniques is probably even crazier.

    Today’s guest — Buck Shlegeris, CEO of Redwood Research — has spent the last few years developing control mechanisms, and for human-level systems they’re more plausible than you might think. He argues that given companies’ unwillingness to incur large costs for security, accepting the possibility of misalignment and designing robust safeguards might be one of our best remaining options.

    Links to learn more, highlights, video, and full transcript.

    As Buck puts it: "Five years ago I thought of misalignment risk from AIs as a really hard problem that you’d need some really galaxy-brained fundamental insights to resolve. Whereas now, to me the situation feels a lot more like we just really know a list of 40 things where, if you did them — none of which seem that hard — you’d probably be able to not have very much of your problem."

    Of course, even if Buck is right, we still need to do those 40 things — which he points out we’re not on track for. And AI control agendas have their limitations: they aren’t likely to work once AI systems are much more capable than humans, since greatly superhuman AIs can probably work around whatever limitations we impose.

    Still, AI control agendas seem to be gaining traction within AI safety.
    Buck and host Rob Wiblin discuss all of the above, plus:
      • Why he’s more worried about AI hacking its own data centre than escaping
      • What to do about “chronic harm,” where AI systems subtly underperform or sabotage important work like alignment research
      • Why he might want to use a model he thought could be conspiring against him
      • Why he would feel safer if he caught an AI attempting to escape
      • Why many control techniques would be relatively inexpensive
      • How to use an untrusted model to monitor another untrusted model
      • What the minimum viable intervention in a “lazy” AI company might look like
      • How even small teams of safety-focused staff within AI labs could matter
      • The moral considerations around controlling potentially conscious AI systems, and whether it’s justified

    Chapters:
      • Cold open (00:00:00)
      • Who’s Buck Shlegeris? (00:01:27)
      • What's AI control? (00:01:51)
      • Why is AI control hot now? (00:05:39)
      • Detecting human vs AI spies (00:10:32)
      • Acute vs chronic AI betrayal (00:15:21)
      • How to catch AIs trying to escape (00:17:48)
      • The cheapest AI control techniques (00:32:48)
      • Can we get untrusted models to do trusted work? (00:38:58)
      • If we catch a model escaping... will we do anything? (00:50:15)
      • Getting AI models to think they've already escaped (00:52:51)
      • Will they be able to tell it's a setup? (00:58:11)
      • Will AI companies do any of this stuff? (01:00:11)
      • Can we just give AIs fewer permissions? (01:06:14)
      • Can we stop human spies the same way? (01:09:58)
      • The pitch to AI companies to do this (01:15:04)
      • Will AIs get superhuman so fast that this is all useless? (01:17:18)
      • Risks from AI deliberately doing a bad job (01:18:37)
      • Is alignment still useful? (01:24:49)
      • Current alignment methods don't detect scheming (01:29:12)
      • How to tell if AI control will work (01:31:40)
      • How can listeners contribute? (01:35:53)
      • Is 'controlling' AIs kind of a dick move? (01:37:13)
      • Could 10 safety-focused people in an AGI company do anything useful? (01:42:27)
      • Benefits of working outside frontier AI companies (01:47:48)
      • Why Redwood Research does what it does (01:51:34)
      • What other safety-related research looks best to Buck? (01:58:56)
      • If an AI escapes, is it likely to be able to beat humanity from there? (01:59:48)
      • Will misaligned models have to go rogue ASAP, before they're ready? (02:07:04)
      • Is research on human scheming relevant to AI? (02:08:03)

    This episode was originally recorded on February 21, 2025.
    Video: Simon Monsour and Luke Monsour
    Audio engineering: Ben Cordell, Milo McGuire, and Dominic Armstrong
    Transcriptions and web: Katy Moore
    --------  
    2:16:03
  • 15 expert takes on infosec in the age of AI
    "There’s almost no story of the future going well that doesn’t have a part that’s like '…and no evil person steals the AI weights and goes and does evil stuff.' So it has highlighted the importance of information security: 'You’re training a powerful AI system; you should make it hard for someone to steal' has popped out to me as a thing that just keeps coming up in these stories, keeps being present. It’s hard to tell a story where it’s not a factor. It’s easy to tell a story where it is a factor." — Holden Karnofsky

    What happens when a USB cable can secretly control your system? Are we hurtling toward a security nightmare as critical infrastructure connects to the internet? Is it possible to secure AI model weights from sophisticated attackers? And could AI actually make computer security better rather than worse?

    With AI security concerns becoming increasingly urgent, we bring you insights from 15 top experts across information security, AI safety, and governance, examining the challenges of protecting our most powerful AI models and digital infrastructure — including a sneak peek from an episode that hasn’t yet been released with Tom Davidson, where he explains how we should be more worried about “secret loyalties” in AI agents.
    You’ll hear:
      • Holden Karnofsky on why every good future relies on strong infosec, and how hard it’s been to hire security experts (from episode #158)
      • Tantum Collins on why infosec might be the rare issue everyone agrees on (episode #166)
      • Nick Joseph on whether AI companies can develop frontier models safely with the current state of information security (episode #197)
      • Sella Nevo on why AI model weights are so valuable to steal, the weaknesses of air-gapped networks, and the risks of USBs (episode #195)
      • Kevin Esvelt on what cryptographers can teach biosecurity experts (episode #164)
      • Lennart Heim on Rob’s computer security nightmares (episode #155)
      • Zvi Mowshowitz on the insane lack of security mindset at some AI companies (episode #184)
      • Nova DasSarma on the best current defences against well-funded adversaries, politically motivated cyberattacks, and exciting progress in infosecurity (episode #132)
      • Bruce Schneier on whether AI could eliminate software bugs for good, and why it’s bad to hook everything up to the internet (episode #64)
      • Nita Farahany on the dystopian risks of hacked neurotech (episode #174)
      • Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (episode #194)
      • Nathan Labenz on how even internal teams at AI companies may not know what they’re building (episode #176)
      • Allan Dafoe on backdooring your own AI to prevent theft (episode #212)
      • Tom Davidson on how dangerous “secret loyalties” in AI models could be (episode to be released!)
      • Carl Shulman on the challenge of trusting foreign AI models (episode #191, part 2)
      • Plus lots of concrete advice on how to get into this field and find your fit

    Check out the full transcript on the 80,000 Hours website.

    Chapters:
      • Cold open (00:00:00)
      • Rob's intro (00:00:49)
      • Holden Karnofsky on why infosec could be the issue on which the future of humanity pivots (00:03:21)
      • Tantum Collins on why infosec is a rare AI issue that unifies everyone (00:12:39)
      • Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI (00:16:23)
      • Nova DasSarma on the best available defences against well-funded adversaries (00:22:10)
      • Sella Nevo on why AI model weights are so valuable to steal (00:28:56)
      • Kevin Esvelt on what cryptographers can teach biosecurity experts (00:32:24)
      • Lennart Heim on the possibility of an autonomously replicating AI computer worm (00:34:56)
      • Zvi Mowshowitz on the absurd lack of security mindset at some AI companies (00:48:22)
      • Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices (00:49:54)
      • Bruce Schneier on why it’s bad to hook everything up to the internet (00:55:54)
      • Nita Farahany on the possibility of hacking neural implants (01:04:47)
      • Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (01:10:48)
      • Nova DasSarma on exciting progress in information security (01:19:28)
      • Nathan Labenz on how even internal teams at AI companies may not know what they’re building (01:30:47)
      • Allan Dafoe on backdooring your own AI to prevent someone else from stealing it (01:33:51)
      • Tom Davidson on how dangerous “secret loyalties” in AI models could get (01:35:57)
      • Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology (01:52:45)
      • Nova DasSarma on politically motivated cyberattacks (02:03:44)
      • Bruce Schneier on the day-to-day benefits of improved security and recognising that there’s never zero risk (02:07:27)
      • Holden Karnofsky on why it’s so hard to hire security people despite the massive need (02:13:59)
      • Nova DasSarma on practical steps to getting into this field (02:16:37)
      • Bruce Schneier on finding your personal fit in a range of security careers (02:24:42)
      • Rob's outro (02:34:46)

    Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
    Content editing: Katy Moore and Milo McGuire
    Transcriptions and web: Katy Moore
    --------  
    2:35:54
  • #213 – Will MacAskill on AI causing a “century in a decade” – and how we're completely unprepared
    The 20th century saw unprecedented change: nuclear weapons, satellites, the rise and fall of communism, third-wave feminism, the internet, postmodernism, game theory, genetic engineering, the Big Bang theory, quantum mechanics, birth control, and more. Now imagine all of it compressed into just 10 years.

    That’s the future Will MacAskill — philosopher, founding figure of effective altruism, and now researcher at the Forethought Centre for AI Strategy — argues we need to prepare for in his new paper “Preparing for the intelligence explosion.” Not in the distant future, but probably in three to seven years.

    Links to learn more, highlights, video, and full transcript.

    The reason: AI systems are rapidly approaching human-level capability in scientific research and intellectual tasks. Once AI exceeds human abilities in AI research itself, we’ll enter a recursive self-improvement cycle — creating wildly more capable systems. Soon after, by improving algorithms and manufacturing chips, we’ll deploy millions, then billions, then trillions of superhuman AI scientists working 24/7 without human limitations. These systems will collaborate across disciplines, build on each discovery instantly, and conduct experiments at unprecedented scale and speed — compressing a century of scientific progress into mere years.

    Will compares the resulting situation to a mediaeval king suddenly needing to upgrade from bows and arrows to nuclear weapons to deal with an ideological threat from a country he’s never heard of, while simultaneously grappling with learning that he descended from monkeys and his god doesn’t exist.

    What makes this acceleration perilous is that while technology can speed up almost arbitrarily, human institutions and decision-making are much more fixed.

    In this conversation with host Rob Wiblin, recorded on February 7, 2025, Will maps out the challenges we’d face in this potential “intelligence explosion” future, and what we might do to prepare.
    They discuss:
      • Why leading AI safety researchers now think there’s dramatically less time before AI is transformative than they’d previously thought
      • The three different types of intelligence explosions that occur in order
      • Will’s list of resulting grand challenges — including destructive technologies, space governance, concentration of power, and digital rights
      • How to prevent ourselves from accidentally “locking in” mediocre futures for all eternity
      • Ways AI could radically improve human coordination and decision making
      • Why we should aim for truly flourishing futures, not just avoiding extinction

    Chapters:
      • Cold open (00:00:00)
      • Who’s Will MacAskill? (00:00:46)
      • Why Will now just works on AGI (00:01:02)
      • Will was wrong(ish) on AI timelines and hinge of history (00:04:10)
      • A century of history crammed into a decade (00:09:00)
      • Science goes super fast; our institutions don't keep up (00:15:42)
      • Is it good or bad for intellectual progress to 10x? (00:21:03)
      • An intelligence explosion is not just plausible but likely (00:22:54)
      • Intellectual advances outside technology are similarly important (00:28:57)
      • Counterarguments to intelligence explosion (00:31:31)
      • The three types of intelligence explosion (software, technological, industrial) (00:37:29)
      • The industrial intelligence explosion is the most certain and enduring (00:40:23)
      • Is a 100x or 1,000x speedup more likely than 10x? (00:51:51)
      • The grand superintelligence challenges (00:55:37)
      • Grand challenge #1: Many new destructive technologies (00:59:17)
      • Grand challenge #2: Seizure of power by a small group (01:06:45)
      • Is global lock-in really plausible? (01:08:37)
      • Grand challenge #3: Space governance (01:18:53)
      • Is space truly defence-dominant? (01:28:43)
      • Grand challenge #4: Morally integrating with digital beings (01:32:20)
      • Will we ever know if digital minds are happy? (01:41:01)
      • “My worry isn't that we won't know; it's that we won't care” (01:46:31)
      • Can we get AGI to solve all these issues as early as possible? (01:49:40)
      • Politicians have to learn to use AI advisors (02:02:03)
      • Ensuring AI makes us smarter decision-makers (02:06:10)
      • How listeners can speed up AI epistemic tools (02:09:38)
      • AI could become great at forecasting (02:13:09)
      • How not to lock in a bad future (02:14:37)
      • AI takeover might happen anyway — should we rush to load in our values? (02:25:29)
      • ML researchers are feverishly working to destroy their own power (02:34:37)
      • We should aim for more than mere survival (02:37:54)
      • By default the future is rubbish (02:49:04)
      • No easy utopia (02:56:55)
      • What levers matter most to utopia (03:06:32)
      • Bottom lines from the modelling (03:20:09)
      • People distrust utopianism; should they distrust this? (03:24:09)
      • What conditions make eventual eutopia likely? (03:28:49)
      • The new Forethought Centre for AI Strategy (03:37:21)
      • How does Will resist hopelessness? (03:50:13)

    Video editing: Simon Monsour
    Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
    Camera operator: Jeremy Chevillotte
    Transcriptions and web: Katy Moore
    --------  
    3:57:36
  • Emergency pod: Judge plants a legal time bomb under OpenAI (with Rose Chan Loui)
    When OpenAI announced plans to convert from nonprofit to for-profit control last October, it likely didn’t anticipate the legal labyrinth it now faces. A recent court order in Elon Musk’s lawsuit against the company suggests OpenAI’s restructuring faces serious legal threats, which will complicate its efforts to raise tens of billions in investment.

    As nonprofit legal expert Rose Chan Loui explains, the court order set up multiple pathways for OpenAI’s conversion to be challenged. Though Judge Yvonne Gonzalez Rogers denied Musk’s request to block the conversion before a trial, she expedited proceedings to the fall so the case could be heard before it’s likely to go ahead. (See Rob’s brief summary of developments in the case.)

    And if Musk’s donations to OpenAI are enough to give him the right to bring a case, Rogers sounded very sympathetic to his objections to the OpenAI foundation selling the company, benefiting the founders who forswore “any intent to use OpenAI as a vehicle to enrich themselves.”

    But that’s just one of multiple threats. The attorneys general (AGs) in California and Delaware both have standing to object to the conversion on the grounds that it is contrary to the foundation’s charitable purpose and therefore wrongs the public — which was promised all the charitable assets would be used to develop AI that benefits all of humanity, not to win a commercial race. Some, including Rose, suspect the court order was written as a signal to those AGs to take action.

    And, as she explains, if the AGs remain silent, the court itself, seeing that the public interest isn’t being represented, could appoint a “special interest party” to take on the case in their place.

    This places the OpenAI foundation board in a bind: proceeding with the restructuring despite this legal cloud could expose them to the risk of being sued for a gross breach of their fiduciary duty to the public. The board is made up of respectable people who didn’t sign up for that.

    And of course it would cause chaos for the company if all of OpenAI’s fundraising and governance plans were brought to a screeching halt by a federal court judgment landing at the eleventh hour.

    Host Rob Wiblin and Rose Chan Loui discuss all of the above as well as what justification the OpenAI foundation could offer for giving up control of the company despite its charitable purpose, and how the board might adjust their plans to make the for-profit switch more legally palatable.

    This episode was originally recorded on March 6, 2025.

    Chapters:
      • Intro (00:00:11)
      • More juicy OpenAI news (00:00:46)
      • The court order (00:02:11)
      • Elon has two hurdles to jump (00:05:17)
      • The judge's sympathy (00:08:00)
      • OpenAI's defence (00:11:45)
      • Alternative plans for OpenAI (00:13:41)
      • Should the foundation give up control? (00:16:38)
      • Alternative plaintiffs to Musk (00:21:13)
      • The 'special interest party' option (00:25:32)
      • How might this play out in the fall? (00:27:52)
      • The nonprofit board is in a bit of a bind (00:29:20)
      • Is it in the public interest to race? (00:32:23)
      • Could the board be personally negligent? (00:34:06)

    Video editing: Simon Monsour
    Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
    Transcriptions: Katy Moore
    --------  
    36:50
  • #139 Classic episode – Alan Hájek on puzzles and paradoxes in probability and expected value
    A casino offers you a game. A coin will be tossed. If it comes up heads on the first flip you win $2. If it comes up on the second flip you win $4. If it comes up on the third you win $8, the fourth you win $16, and so on. How much should you be willing to pay to play?

    The standard way of analysing gambling problems, ‘expected value’ — in which you multiply probabilities by the value of each outcome and then sum them up — says your expected earnings are infinite. You have a 50% chance of winning $2, for '0.5 * $2 = $1' in expected earnings. A 25% chance of winning $4, for '0.25 * $4 = $1' in expected earnings, and on and on. A never-ending series of $1s added together comes to infinity. And that's despite the fact that you know with certainty you can only ever win a finite amount!

    Today's guest — philosopher Alan Hájek of the Australian National University — thinks of much of philosophy as “the demolition of common sense followed by damage control” and is an expert on paradoxes related to probability and decision-making rules like “maximise expected value.”

    Rebroadcast: this episode was originally released in October 2022.

    Links to learn more, highlights, and full transcript.

    The problem described above, known as the St. Petersburg paradox, has been a staple of the field since the 18th century, with many proposed solutions. In the interview, Alan explains how very natural attempts to resolve the paradox — such as factoring in the low likelihood that the casino can pay out very large sums, or the fact that money becomes less and less valuable the more of it you already have — fail to work as hoped.

    We might reject the setup as a hypothetical that could never exist in the real world, and therefore of mere intellectual curiosity. But Alan doesn't find that objection persuasive. If expected value fails in extreme cases, that should make us worry that something could be rotten at the heart of the standard procedure we use to make decisions in government, business, and nonprofits.

    These issues regularly show up in 80,000 Hours' efforts to try to find the best ways to improve the world, as the best approach will arguably involve long-shot attempts to do very large amounts of good.

    Consider which is better: saving one life for sure, or three lives with 50% probability? Expected value says the second, which will probably strike you as reasonable enough. But what if we repeat this process and evaluate the chance to save nine lives with 25% probability, or 27 lives with 12.5% probability, or after 17 more iterations, 3,486,784,401 lives with a 0.00000009% chance? Expected value says this final offer is better than the others — 1,000 times better, in fact.

    Ultimately Alan leans towards the view that our best choice is to “bite the bullet” and stick with expected value, even with its sometimes counterintuitive implications. Where we want to do damage control, we're better off looking for ways our probability estimates might be wrong.

    In this conversation, originally released in October 2022, Alan and Rob explore these issues and many others:
      • Simple rules of thumb for having philosophical insights
      • A key flaw that hid in Pascal's wager from the very beginning
      • Whether we have to simply ignore infinities because they mess everything up
      • What fundamentally is 'probability'?
      • Some of the many reasons 'frequentism' doesn't work as an account of probability
      • Why the standard account of counterfactuals in philosophy is deeply flawed
      • And why counterfactuals present a fatal problem for one sort of consequentialism

    Chapters:
      • Cold open (00:00:00)
      • Rob's intro (00:01:05)
      • The interview begins (00:05:28)
      • Philosophical methodology (00:06:35)
      • Theories of probability (00:40:58)
      • Everyday Bayesianism (00:49:42)
      • Frequentism (01:08:37)
      • Ranges of probabilities (01:20:05)
      • Implications for how to live (01:25:05)
      • Expected value (01:30:39)
      • The St. Petersburg paradox (01:35:21)
      • Pascal’s wager (01:53:25)
      • Using expected value in everyday life (02:07:34)
      • Counterfactuals (02:20:19)
      • Most counterfactuals are false (02:56:06)
      • Relevance to objective consequentialism (03:13:28)
      • Alan’s best conference story (03:37:18)
      • Rob's outro (03:40:22)

    Producer: Keiran Harris
    Audio mastering: Ben Cordell and Ryan Kessler
    Transcriptions: Katy Moore
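    The expected-value arithmetic in this description is easy to check for yourself. A minimal Python sketch (the function names and the truncation depth are illustrative choices, not from the episode):

    ```python
    def st_petersburg_ev(max_flips: int) -> float:
        """Expected value of the St. Petersburg game truncated at max_flips.

        The first head on flip n has probability 0.5**n and pays $2**n,
        so each allowed flip contributes exactly $1 of expected value —
        the untruncated sum grows without bound.
        """
        return sum(0.5 ** n * 2 ** n for n in range(1, max_flips + 1))


    def lives_gamble(n: int) -> tuple[int, float, float]:
        """The iterated life-saving gamble: 3**n lives with probability 0.5**n."""
        lives = 3 ** n
        prob = 0.5 ** n
        return lives, prob, lives * prob  # (lives, probability, expected lives)


    # Truncated St. Petersburg: $1 of expected value per allowed flip.
    print(st_petersburg_ev(10))  # 10.0

    # n=3 is the '27 lives with 12.5%' offer; 17 more doublings gives n=20,
    # i.e. 3,486,784,401 lives — whose expected value (~3,325 lives) is about
    # 1.5**17 ≈ 1,000 times the 3.375 expected lives of the n=3 offer.
    print(lives_gamble(3))   # (27, 0.125, 3.375)
    print(lives_gamble(20))
    ```

    Each step multiplies the expected value by 1.5 (triple the lives, halve the probability), which is why expected value always prefers the next, ever-more-improbable offer.
    
    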
    --------  
    3:41:31


About 80,000 Hours Podcast

Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them. Subscribe by searching for '80000 Hours' wherever you get podcasts. Hosted by Rob Wiblin and Luisa Rodriguez.
