Can we really trust reasoning

January 7
51 mins

Episode Description

Pierce and Richard cover the news that dropped over the holiday break: getting breaking news into chatbots, OpenAI's "code red" over Google's Gemini 3, benchmarking how reliably chain of thought can be monitored to understand model behavior, and a review of Claude Skills.

Further reading:
- https://www.wired.com/story/us-invaded-venezuela-and-captured-nicolas-maduro-chatgpt-disagrees
- https://fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini-ceo-sundar-pichai/
- https://openai.com/index/evaluating-chain-of-thought-monitorability/
- https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
