KubeFM

S8 E1

We Broke Our EKS Cluster Autoscaler with the AL2023 Migration, with Dilshan Wijesooriya

January 13
30 mins

Episode Description

Dilshan Wijesooriya, Senior Cloud Engineer, discusses a real incident in which migrating EKS nodes to Amazon Linux 2023 (AL2023) caused the Cluster Autoscaler to silently lose its AWS permissions.

You will learn:

  • Why AL2023 blocks pod access to instance metadata by default, breaking components that rely on node IAM roles (such as Cluster Autoscaler, ExternalDNS, and AWS Load Balancer Controller)

  • How to implement IAM Roles for Service Accounts (IRSA) correctly by configuring IAM roles, Kubernetes service accounts, and OIDC trust relationships, and why both AWS IAM and Kubernetes RBAC must be configured independently

  • The recommended migration strategy: move critical system components to IRSA before changing AMIs, test aggressively in non-production, and decouple identity changes from OS upgrades

  • How to audit which pods currently rely on node roles and clean up legacy IAM permissions to reduce attack surface after migration
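As a rough illustration of the IRSA and audit points above, here is a minimal Python sketch. It is not taken from the episode. The function names, the account ID, and the OIDC issuer value are hypothetical placeholders. The first function builds the standard IAM trust policy that lets a single Kubernetes service account assume a role through the cluster's OIDC provider. This covers only the AWS side: the service account still needs the `eks.amazonaws.com/role-arn` annotation, and Kubernetes RBAC is configured separately. The second function takes the parsed output of `kubectl get pods -A -o json` and `kubectl get serviceaccounts -A -o json` and lists pods whose service account has no IRSA annotation. These are only candidates for review, since a pod without IRSA may not call AWS at all, or may use another mechanism such as EKS Pod Identity.

```python
import json

# Annotation that the EKS pod identity webhook reads on a service account.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy scoped to one Kubernetes service account.

    oidc_provider is the cluster's OIDC issuer without the "https://" prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to exactly one namespace/service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def pods_without_irsa(pods, service_accounts):
    """Return sorted (namespace, pod) pairs whose service account lacks the IRSA annotation."""
    annotated = set()
    for sa in service_accounts["items"]:
        meta = sa["metadata"]
        if IRSA_ANNOTATION in (meta.get("annotations") or {}):
            annotated.add((meta["namespace"], meta["name"]))

    candidates = []
    for pod in pods["items"]:
        meta = pod["metadata"]
        # Pods that do not name a service account run as "default".
        sa_name = pod["spec"].get("serviceAccountName", "default")
        if (meta["namespace"], sa_name) not in annotated:
            candidates.append((meta["namespace"], meta["name"]))
    return sorted(candidates)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = irsa_trust_policy(
        "111122223333",
        "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
        "kube-system",
        "cluster-autoscaler",
    )
    print(json.dumps(policy, indent=2))
```

Running the audit function before changing AMIs gives a starting list of workloads to check, which fits the advice to move critical components to IRSA first and keep identity changes separate from the OS upgrade.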

Sponsor

This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
