KubeFM · S8 E5

Faster EKS Node and Pod Startup, with Jan Ludvik

February 17
21 mins

Episode Description

Kubernetes nodes on EKS can take over a minute to become ready, and pods often wait even longer — but most teams never look into why.

Jan Ludvik, Senior Staff Reliability Engineer at Outreach, shares how he cut node startup from 65 to 45 seconds and reduced P90 pod startup by 30 seconds across ~1,000 nodes — by tackling overlooked defaults and EBS bottlenecks.

In this episode:

  • Why the kubelet's default of serial image pulls quietly blocks pod startup, and how parallel pulls fix it

  • How EBS lazy loading can silently negate image caching in AMIs — and the critical path workaround

  • A Lambda-based automation that temporarily boosts EBS throughput during startup, then reverts to save cost

  • The kubelet metrics and logs that expose pod and node startup latency, which most teams never monitor

Every second saved translates to faster scaling, lower AWS bills, and better end-user experience.

Sponsor

This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
