Building Systems That Work Even When Everything Breaks with Ben Hartshorne

January 15
36 mins

Episode Description

When AWS has a major outage, what actually happens behind the scenes? Ben Hartshorne, a principal engineer at Honeycomb, joins Corey Quinn to discuss a recent AWS outage and how Honeycomb kept customer data safe even when its systems couldn't fully function. Ben explains why building services that expect things to break is the only way to survive these outages. He also shares how Honeycomb used its own tools to cut its AWS Lambda costs in half by tracking five different things in a spreadsheet and making small changes to all of them.


About Ben Hartshorne:

Ben has spent much of his career setting up monitoring systems for startups and now is thrilled to help the industry see a better way. He is always eager to find the right graph to understand a service and will look for every excuse to include a whiteboard in the discussion.

Show highlights: 

(02:41) Two Stories About Cost Optimization

(04:20) Cutting Lambda Costs by 50%

(08:01) Surviving the AWS Outage

(09:20) Preserving Customer Data During the Outage

(13:08) Should You Leave AWS After an Outage?

(15:09) Multi-Region Costs 10x More

(18:10) Vendor Dependencies

(22:06) How LaunchDarkly's SDK Handles Outages

(24:40) Rate Limiting Yourself

(29:00) How Much Instrumentation Is Too Much?

(34:28) Where to Find Ben


Links: 

LinkedIn: https://www.linkedin.com/in/benhartshorne/

GitHub: https://github.com/maplebed


Sponsored by:
duckbillhq.com
