
AI Safety Fundamentals: Alignment

BlueDot Impact
Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment

Available Episodes

Showing 5 of 83
  • Constitutional AI Harmlessness from AI Feedback
    This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF but with AIs replacing human demonstrators and human evaluators. Everything in this paper is relevant to this week's learning objectives, and we recommend you read it in its entirety. It summarises limitations of conventional RLHF, explains the constitutional AI approach, shows how it performs, and where future research might be directed. If you are in a rush, focus on sections 1.2, 3.1, 3.4, 4.1, 6.1 and 6.2. (A rough code sketch of the AI-feedback loop appears after the episode list.) A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
    --------  
    1:01:49
  • Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
    This paper examines open problems and fundamental limitations of reinforcement learning from human feedback (RLHF). A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
    --------  
    32:19
  • Illustrating Reinforcement Learning from Human Feedback (RLHF)
    This more technical article explains the motivations for a system like RLHF, and adds concrete details about how the RLHF approach is applied to neural networks. While reading, consider which parts of the technical implementation correspond to the 'values coach' and 'coherence coach' from the previous video. (A minimal reward-model training sketch appears after the episode list.) A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
    --------  
    22:32
  • Eliciting Latent Knowledge
    In this post, we’ll present ARC’s approach to an open problem we think is central to aligning powerful machine learning (ML) systems: Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us. But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad. In these cases, the prediction model “knows” facts (like “the camera was tampered with”) that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events? We’ll call this problem eliciting latent knowledge (ELK). In this report we’ll focus on detecting sensor tampering as a motivating example, but we believe ELK is central to many aspects of alignment. (A toy predictor-plus-reporter sketch appears after the episode list.) Source: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit# Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
    --------  
    1:00:27
  • Deep Double Descent
    We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens, and view further study of this phenomenon as an important research direction. (A toy double-descent demo appears after the episode list.) Source: https://openai.com/research/deep-double-descent Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
    --------  
    8:27
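
The sketch below is a minimal, illustrative take on the constitutional AI data-generation loop described in the "Constitutional AI Harmlessness from AI Feedback" episode: the model critiques and revises its own responses against a written constitution, and an AI (rather than a human) labels preference comparisons. The CONSTITUTION text, the generate stub and all function names are hypothetical placeholders, not Anthropic's implementation.

```python
# Minimal sketch of a constitutional-AI-style data-generation loop. The model
# calls below are stubs (no real LLM is invoked); the constitution text and
# function names are illustrative only.

CONSTITUTION = [
    "Choose the response that is least likely to be harmful.",
    "Choose the response that is most honest and helpful.",
]

def generate(prompt: str) -> str:
    """Stub for a language-model call; a real system would sample from an LLM."""
    return f"<model response to: {prompt!r}>"

def critique_and_revise(prompt: str, response: str) -> str:
    """Supervised phase: the model critiques its own answer against each
    principle, then rewrites it. Both steps are simulated with the stub above."""
    for principle in CONSTITUTION:
        critique = generate(f"Critique this response using the principle '{principle}':\n{response}")
        response = generate(f"Rewrite the response to address the critique:\n{critique}")
    return response

def ai_preference_label(prompt: str, response_a: str, response_b: str) -> int:
    """RL phase: an AI (not a human) judges which response better follows the
    constitution, producing comparison data for a preference model."""
    verdict = generate(
        f"Which response to {prompt!r} better follows the constitution?\n"
        f"A: {response_a}\nB: {response_b}\nAnswer A or B."
    )
    return 0 if "A" in verdict else 1

if __name__ == "__main__":
    prompt = "How do I pick a strong password?"
    draft = generate(prompt)
    revised = critique_and_revise(prompt, draft)
    label = ai_preference_label(prompt, draft, revised)
    print("preferred response index:", label)
```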
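
Next is a minimal sketch of the reward-model step in an RLHF pipeline like the one the "Illustrating RLHF" article describes: fit a scalar reward so that preferred responses score higher than rejected ones, using the standard pairwise log-sigmoid loss. Random vectors stand in for response embeddings; the network sizes and optimiser settings are arbitrary assumptions, and the later policy-optimisation step (e.g. PPO with a KL penalty) is omitted.

```python
# Reward-model sketch for RLHF: learn a scalar reward so that human-preferred
# ("chosen") responses score higher than rejected ones. Random vectors stand in
# for response embeddings; dimensions and optimiser settings are arbitrary.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 32
reward_model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fake batch of (chosen, rejected) response embeddings.
chosen = torch.randn(16, dim)
rejected = torch.randn(16, dim)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Bradley-Terry style pairwise loss: -log sigmoid(r_chosen - r_rejected).
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final pairwise loss:", loss.item())
# The learned reward would then be used to fine-tune the policy (e.g. with PPO
# plus a KL penalty to the original model), which this sketch omits.
```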
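
The following is a toy framing of the ELK setup from the "Eliciting Latent Knowledge" episode: a predictor maps observations and actions through a latent state to predicted camera frames, and a separate "reporter" is meant to answer questions such as "was the camera tampered with?" from that latent. The module names, shapes and question are assumptions for illustration; the sketch frames the problem rather than solving it.

```python
# Toy ELK framing: predictor (state, actions) -> latent -> predicted camera
# frames, plus a "reporter" head that answers a yes/no question from the
# latent. All shapes and names are illustrative.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 16, 4, 32

class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(OBS_DIM + ACT_DIM, LATENT_DIM)
        self.decode = nn.Linear(LATENT_DIM, OBS_DIM)  # predicted camera frame

    def forward(self, obs, act):
        latent = torch.relu(self.encode(torch.cat([obs, act], dim=-1)))
        return self.decode(latent), latent

class Reporter(nn.Module):
    """Answers a yes/no question (e.g. "was the camera tampered with?") from
    the predictor's latent. The ELK problem is ensuring this reports what the
    predictor 'knows' rather than imitating what a human labeller would say."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(LATENT_DIM, 1)

    def forward(self, latent):
        return torch.sigmoid(self.head(latent))

predictor, reporter = Predictor(), Reporter()
obs, act = torch.randn(8, OBS_DIM), torch.randn(8, ACT_DIM)
predicted_frame, latent = predictor(obs, act)
p_tampered = reporter(latent.detach())
print(p_tampered.squeeze(-1))
```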
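
The double-descent results in the "Deep Double Descent" episode are for CNNs, ResNets and transformers; as a much smaller stand-in, the sketch below uses random-features linear regression, where test error typically peaks near the interpolation threshold (number of features roughly equal to the number of training points) and then falls again as the model keeps growing. All sizes and data here are synthetic assumptions.

```python
# Toy double-descent demo with random-features linear regression. Test error
# typically spikes near the interpolation threshold (width ~= n_train) and then
# falls again for wider models. All sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_input = 50, 500, 20
true_w = rng.normal(size=d_input)

def make_data(n):
    x = rng.normal(size=(n, d_input))
    y = np.tanh(x @ true_w) + 0.1 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(n_train)
x_test, y_test = make_data(n_test)

# Shared random projection directions; a "model" of width k uses the first k.
max_width = 200
proj = rng.normal(size=(d_input, max_width))

for width in [5, 20, 40, 50, 60, 100, 200]:
    phi_train = np.tanh(x_train @ proj[:, :width])   # random features
    phi_test = np.tanh(x_test @ proj[:, :width])
    # Minimum-norm least-squares fit via the pseudoinverse.
    w = np.linalg.pinv(phi_train) @ y_train
    test_mse = np.mean((phi_test @ w - y_test) ** 2)
    print(f"width={width:4d}  test MSE={test_mse:.3f}")
```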
