Episode Description
AI benchmarks saturate quickly, struggle to capture what we care about, and cost more than ever to build. But are they doomed? Greg Burnham, who leads Epoch's benchmarking team, and Tom Adamczewski, who developed MirrorCode, push back on the pessimism and dig into what the next generation of AI benchmarks could look like.