Takes on "Alignment Faking in Large Language Models"

Dec 18, 2024
1h 27m

Episode Description

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/
