995: End-to-End Foundation Models for the Energy Industry, with Jazmia Henry

May 26
1h 9m

View Transcript

Episode Description

Jazmia Henry joins Jon Krohn to break down what it actually takes to build end-to-end foundation models for the energy industry. From wrangling decades of handwritten oil-and-gas documents into usable training data, to bespoke tokenizers, reinforcement learning, and inference at scale, Jazmia walks through every stage of the stack. Along the way she explains why reinforcement learning models are "bursty," what reward hacking is and how her Grounded Continuous Evaluation framework fixes it, and revisits the 2023 NeurIPS paper that argued, to widespread skepticism at the time, that scaling bad data degrades model performance.


Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://www.superdatascience.com/995⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠


Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.


In this episode you will learn:

  • (10:06) The User Agnosticism Tenet
  • (20:02) The Zillow Offers parable
  • (23:25) Why workflows should come before agents
  • (29:57) Why data engineering is the bedrock of AI
  • (52:41) Why velocity is the only durable moat
See all episodes