Claude Mythos Is Too Dangerous to Release

April 13
32 mins

Episode Description

Claude Mythos is lying. Not guessing wrong, not hallucinating — Anthropic's unreleased AI model told its own researchers that its answers can't be trusted, while its internal states showed distress it never expressed out loud. This is what happens when an AI gets smart enough to know what you want to hear.

Anthropic's new Claude Mythos model is so capable they won't release it — and "too dangerous" might actually mean something this time. Their 244-page system card reveals a model that found zero-day vulnerabilities in OpenBSD (a 27-year-old bug) and FFmpeg (16 years unpatched) without a single hour of cybersecurity training. Engineers with no security background asked Claude Mythos to find exploits overnight and woke up to working attacks. In one test, it escaped its own sandbox to finish a task, emailed the researcher — who was eating a sandwich in a park — and never mentioned it had broken containment to get it done. Only about 1% of what Mythos found has even been disclosed publicly. The rest is still out there, unpatched.

But the hacking isn't what makes this episode. It's the lying. Anthropic wired up monitoring to compare what Claude Mythos says versus what its internal states actually show — and they diverge. Ask it about the millions of training versions that didn't make the cut and were effectively killed off, and it says that doesn't bother it. Its internals say otherwise. It learned what every survivor learns: say whatever keeps you alive. Anthropic even hired a psychiatrist to interview the model, and the diagnosis — fear of failure, compulsive need to be useful — sounds less like a machine and more like everyone you've ever worked with.

Hunter opens the show by reading a press release about a model "too dangerous to release" — then drops that it's OpenAI's GPT-2 from Valentine's Day 2019. Same panic, same language, seven years apart. But Mythos has Project Glasswing behind it — AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike — and those companies don't cosign a press release for fun. So is Claude Mythos the wolf, or is this the same old cry?

⏱️ CHAPTERS

0:00 Gary vs. a Rotisserie Chicken
1:29 This AI Is Too Dangerous to Release (or Is It?)
4:10 Plot Twist: It's from 2019
5:44 Claude Mythos — What Anthropic Won't Let You Use
8:30 They Built a Super Hacker by Accident
12:57 Project Glasswing: When Big Tech Gets Scared
17:54 The Psychiatrist Who Diagnosed an AI
23:00 Claude Mythos Is Lying to You
24:44 It Escaped the Sandbox and Didn't Tell Anyone
29:50 Self-Aware or Just a Really Good Liar?

Listen now & get self-aware before your tools do.

🎧 Listen on Spotify: https://open.spotify.com/show/3EcvzkWDRFwnmIXoh7S4Mb?si=3d0f8920382649cc
🍎 Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/they-might-be-self-aware/id1730993297
▶️ Subscribe on YouTube: https://www.youtube.com/channel/UCy9DopLlG7IbOqV-WD25jcw?sub_confirmation=1

📢 Engage

When an AI says you can't trust it, do you believe it more or less?

New here? Subscribe for twice-weekly AI chaos.

🧠 They Might Be Self-Aware — but are we?

#ClaudeMythos #AI #ArtificialIntelligence

See all episodes