Media Summary: We discuss our new paper, "Natural emergent misalignment from ... Thanks to our friends at Future of Life Institute for supporting today's episode. To learn more about FOL and this year's winners, ... On May 5, 2016, Eliezer Yudkowsky gave a talk at ...

Stanford CS221 I The AI Alignment Problem: Reward Hacking & Negative Side Effects I 2023 - Detailed Analysis & Overview

Stanford CS221 I The AI Alignment Problem: Reward Hacking & Negative Side Effects I 2023
What is AI "reward hacking"—and why do we worry about it?
Stanford CS221 I Encoding Human Values I 2023
Stanford CS221 | Autumn 2025 | Lecture 18: AI & Society
“An AI Tried to Commit Murder”? What the Tests REALLY Show (reward hacking, safety, governance)
The Dark Art of AI: Reward Hacking and Alignment Faking Explained
Scientists Discuss the AI Alignment Problem
AI History | Stanford CS221: AI (Autumn 2021)
ALIGNMENT PROBLEM DOES not EXIST But The Training Game is Broken .
Eliezer Yudkowsky – AI Alignment: Why It's Hard, and Where to Start
Overview Artificial Intelligence Course | Stanford CS221: Learn AI (Autumn 2019)
Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)

Stanford CS221 I The AI Alignment Problem: Reward Hacking & Negative Side Effects I 2023

For more information about

What is AI "reward hacking"—and why do we worry about it?

We discuss our new paper, "Natural emergent misalignment from

Stanford CS221 I Encoding Human Values I 2023

For more information about

Stanford CS221 | Autumn 2025 | Lecture 18: AI & Society

For more information about

“An AI Tried to Commit Murder”? What the Tests REALLY Show (reward hacking, safety, governance)

An

The Dark Art of AI: Reward Hacking and Alignment Faking Explained

#ArtificialIntelligence #MachineLearning #AIsafety #AlignmentFaking #RewardHacking #LLM #Claude3 #Anthropic ...

Scientists Discuss the AI Alignment Problem

Thanks to our friends at Future of Life Institute for supporting today's episode. To learn more about FOL and this year's winners, ...

AI History | Stanford CS221: AI (Autumn 2021)

For more information about

ALIGNMENT PROBLEM DOES not EXIST But The Training Game is Broken .

DESCRIPTION: What if the real

Eliezer Yudkowsky – AI Alignment: Why It's Hard, and Where to Start

On May 5, 2016, Eliezer Yudkowsky gave a talk at

Overview Artificial Intelligence Course | Stanford CS221: Learn AI (Autumn 2019)

For more information about

Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)

For more information about