From Notebooks to Production: Xorq’s Lockfile Approach for Reproducible, Portable ML Pipelines

January 29
57 mins

Episode Description

In this episode, Hussain shares the story behind xorq: a “lockfile for ML pipelines” that makes notebook work easier to reproduce, debug, and ship. We talk about why the research-to-production path is still so manual, how schemas (and Arrow) become the contract between systems, and what it takes to run the same pipeline across engines like Snowflake and Databricks. We also dig into escape hatches for imperative code, why feature stores didn’t become the default, and how xorq fits alongside other technologies like Iceberg.

Chapters

00:00 Hussain's Journey in Data Science

06:00 The Need for xorq: Bridging Research and Production

10:38 Challenges in Machine Learning Deployment

17:40 The Role of Lockfiles in Data Pipelines

29:51 Understanding Schema Management in Data Systems

34:40 Navigating Declarative and Imperative Transformations

36:39 The Developer's Journey with xorq

38:34 Feature Stores vs. xorq: A Comparative Analysis

43:43 The Future of Feature Stores and Machine Learning

51:41 Reproducibility in Data Pipelines: xorq vs. Git-like Operations

55:47 The Future of xorq and the Data Ecosystem
