Maximizing GPU Utilization: Heterogeneous Pipelines with Ray and Kubernetes

May 6
58 mins

Episode Description

Summary
In this episode Robert Nishihara, co-founder of Anyscale and co-creator of Ray, talks about maximizing hardware utilization for AI and data-intensive workloads. He explores Ray’s evolution alongside Kubernetes and PyTorch, and why consolidation at these layers has enabled a new generation of complex, heterogeneous workloads. Robert explains how data preparation has shifted to GPU- and inference-heavy, multimodal pipelines; where Ray fits compared to Spark and workflow orchestrators; and why Ray excels at composing heterogeneous pools of compute, handling failures, and scaling complex systems like multi-node LLM inference and reinforcement learning. He digs into practical strategies for boosting GPU utilization across training and inference, elasticity and prioritization of workloads, topology-aware scheduling, and the importance of fast failure recovery as hardware scales from nodes to racks. If you’re wrestling with expensive GPUs, multimodal data curation, or cross-node LLM inference, this conversation offers concrete mental models and architectural guidance.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Your host is Tobias Macey and today I'm interviewing Robert Nishihara about the challenges of maximizing the utility of your available hardware for AI applications
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the major contributors to wasted or idle compute?
  • Why does it matter if the available compute isn't being maximized?
  • What are some of the typical ad-hoc methods that teams might use to try to get the most out of their available hardware (especially GPUs)? 
  • What are the most interesting, innovative, or unexpected ways that you have seen Ray used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Ray and distributed compute for data and AI?
  • When is Ray the wrong choice?
  • What do you have planned for the future of Ray?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA