Why Tejal Patwardhan stopped underestimating the models - Episode 21

June 16
44 mins

Episode Description

The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.


Chapters


00:00:24 Growing up at OpenAI

00:03:10 Why reasoning changed everything

00:06:28 What made o1 surprising

00:11:20 Why old benchmarks stopped working

00:14:45 What makes a good benchmark

00:17:35 Why evals are getting harder

00:22:09 Measuring voice and vision models

00:24:48 Testing models on real science

00:33:23 How OpenAI tracks frontier progress

00:40:47 What AI means for work



