Claude Opus 4.5: Model Card, Alignment and Safety

November 28
1h 13m

Episode Description

Podcast episode for Claude Opus 4.5: Model Card, Alignment and Safety.

* 00:00:00 - Introduction

* 00:01:50 - Claude Opus 4.5 Basic Facts

* 00:03:26 - Claude Opus 4.5 Is The Best Model For Many But Not All Use Cases

* 00:06:02 - Misaligned?

* 00:09:39 - Section 3: Safeguards and Harmlessness

* 00:11:46 - Section 4: Honesty

* 00:13:27 - 5: Agentic Safety

* 00:21:01 - Section 6: Alignment Overview

* 00:29:55 - Alignment Investigations

* 00:30:35 - Sycophancy Course Correction Is Lacking

* 00:31:52 - Deception

* 00:34:29 - Ruling Out Encoded Content In Chain Of Thought

* 00:37:19 - Sandbagging

* 00:38:10 - Evaluation Awareness

* 00:42:18 - Reward Hacking

* 00:43:59 - Subversion Strategy

* 00:45:30 - 6.13: UK AISI External Testing

* 00:45:39 - 6.14: Model Welfare

* 00:46:33 - 7: RSP Evaluations

* 00:48:12 - CBRN

* 00:56:36 - Autonomy

* 01:04:27 - Cyber

* 01:10:32 - The Whisperers Love The Vibes

https://open.substack.com/pub/thezvi/p/claude-opus-45-model-card-alignment?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false


