Why Your Agent is Cheating

January 21
1 hr

Episode Description

Pierce and Richard are back for the second listener mailbag. They break down what reward hacking really is and why models so often learn the wrong lesson, explain practical fine-tuning (from pre-training to prompting), unpack why LLMs use tokens instead of words, examine context length as a hardware versus mathematical limitation, and much more.
