When Will AI Models Blackmail You, and Why?

June 24
26 mins

Episode Description

In the last few days, Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite warnings in the prompt and other preventive measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario?
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Video on YouTube: https://www.youtube.com/watch?v=eczw9k3r6Ic