Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

May 1
42 mins


Episode Description

Baseten CEO and co-founder Tuhin Srivastava sits down with Sarah Guo and Elad Gil to discuss the surging demand for AI inference, Baseten's 30x growth, and why inference is becoming the strategic "last market." Srivastava argues that the application layer will persist because companies with unique user signals can encode that value into workflows and post-train specialized models, citing examples like Abridge and customer-support workflows. The conversation also covers GPU capacity constraints, Baseten's multi-cloud fabric spanning 18 clouds and 90 clusters, long-term contracting dynamics, why the software layer drives stickiness, evolving workloads, a multi-chip future, and operational lessons learned at scale.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Tuhinone


Chapters:

00:31 Baseten Growth

01:55 Why the App Layer Wins

05:57 Serving Frontier Customers

07:55 Open Source Model Mix

09:21 Chinese Models and Geopolitics

13:07 Custom Inference Dominates

14:22 Post-Training Acquisition

17:10 When to Invest in Custom Models

18:35 Supply Crunch and Data Centers

22:25 Longer GPU Contracts

24:09 What Makes a Winner

26:07 Multi-Chip Future

28:19 Runtime Roadmap

31:08 Scaling Edge Cases

33:48 Hiring and Leadership

36:44 Operations Pager Culture

38:19 Efficiency Drives Demand

40:41 Concierge Everything Future

42:34 Conclusion
