Episode Description
Training gets the headlines.
Inference is where the money is.
In Episode 37 of Built This Week, we sit down with Mitesh, CEO of Positron AI, to break down one of the biggest bottlenecks in AI today: inference infrastructure.
While the world focuses on trillion-parameter models and frontier labs, the real constraint isn’t intelligence — it’s memory, bandwidth, energy, and cost.
We cover:
• Why inference is where 90% of AI spend happens
• The memory wall problem in large models
• Why GPUs weren’t designed for text generation
• How Positron is building terabyte-plus memory chips
• The economics of 10-trillion-parameter models
• Why memory bandwidth utilization matters
• Why CPUs are suddenly back in demand
• The difference between speed-optimized and cost-optimized AI systems
• The “slider bar” future of AI infrastructure
We also dive into:
• OpenAI’s $122B valuation
• Anthropic vs OpenAI secondary market dynamics
• Why Nvidia isn’t going anywhere
• Why commodity memory can beat premium stacks in certain use cases
• The rise of agentic workflows and what that means for compute
If you care about the future of AI, silicon, infrastructure, or trillion-dollar companies — this episode is for you.
New episodes every Friday.
⏱ TIMESTAMPS
(0:00) Why inference is the real AI bottleneck
(2:00) What Positron AI is building
(4:30) The memory problem in trillion-parameter models
(6:30) Why GPUs struggle with inference economics
(9:00) Energy, bandwidth, and supply chain constraints
(12:00) Memory capacity vs memory speed tradeoffs
(16:00) The “slider bar” model of AI infrastructure
(18:30) OpenAI’s $122B valuation discussion
(21:00) Anthropic vs OpenAI secondary markets
(23:30) CPUs making a comeback
(26:00) Agentic workflows and compute demand explosion
(28:00) Closing thoughts on AI infrastructure