10.2 - Version Control

Episode Transcript

Imagine you're working on a really important project.

Could be anything: a huge presentation, maybe a complex design, or, you know, the code for the next big app.

You're totally in the zone making critical changes, and then bam, you realize you actually need to go back to how things were last week, maybe even last month.

Oh yeah, that sinking feeling.

Exactly.

Or picture this, you're collaborating with a whole team, all working on the very same files, and suddenly everyone's changes are just trampling over each other.

It's digital chaos.

A complete mess.

It sounds like a recipe for absolute disaster, right?

Lost work, massive frustration.

Definitely.

OK, let's unpack this.

Today we're diving deep into version control, and this isn't just some jargon.

It's really the core discipline that makes modern software development, and honestly any tricky collaborative digital work, even possible and, crucially, safe and efficient.

So our mission is to explore how teams manage all these digital files, how they track every single change and work together effectively without ending up in that chaotic mess we just talked about.

It's way more than just hitting save.

It really is.

That digital chaos you painted, that's the perfect reason why version control is just so essential now.

It's not only about code.

It really is the backbone for any collaborative digital effort.

Version control, it's not just a tool, it's almost like the silent guardian of your project's history.

It's the engine allowing hundreds, maybe thousands of people to work on the same thing without it all falling apart.

It makes things reliable and lets you innovate faster because, well, failure is reversible.

OK, so let's paint the picture.

Software development.

It's almost always a team effort, right?

Almost always, yeah.

So the second you have more than one person touching the same files, you hit that fundamental problem.

How does everyone keep track?

How do you make sure your changes don't break someone else's work?

And that's where this idea of a repository comes in.

You should think of it less like a simple folder and more like a secure vault or a version database for the entire project.

A central hub.

Yeah, the central nervous system maybe, where every change, every file version, all the history is meticulously tracked.

And it serves two really vital purposes: obviously facilitating that teamwork, but just as critically, making sure the operations team knows exactly which version needs to go live, you know, into production, with certainty.

And they can know that with certainty because the version control system, the VCS, is essentially this robust system designed purely to track and manage every single change made to files over time.

It performs what feels honestly almost like a magic trick.

A magic trick?

How so?

Well, it doesn't just keep the latest version, it keeps a detailed history, like an audit trail of every version of every file that's ever been in that project.

So developers literally get an undo button that can go back weeks, months, even years.

Wow, like a time machine for code.

Pretty much. If you need the code as it was six months ago, to debug some weird old issue or maybe restore a stable state after something went wrong, you can get it instantly.
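To make the "time machine" idea a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical `ToyHistory` class that keeps every snapshot of a single file in memory; real systems store history very differently, so treat this only as an illustration of keeping every version and going back to an older one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHistory:
    """Hypothetical in-memory history for one file; not how a real VCS stores data."""

    # Each entry is a (message, content) snapshot, oldest first.
    versions: list = field(default_factory=list)

    def commit(self, message: str, content: str) -> int:
        """Record a snapshot and return its version number (0-based)."""
        self.versions.append((message, content))
        return len(self.versions) - 1

    def checkout(self, version: int) -> str:
        """Return the content exactly as it was at the given version."""
        return self.versions[version][1]

    def log(self) -> list:
        """Return the audit trail: every commit message, oldest first."""
        return [message for message, _ in self.versions]


history = ToyHistory()
v0 = history.commit("Initial draft", "print('hello')")
v1 = history.commit("Add greeting name", "print('hello, team')")
v2 = history.commit("Risky rewrite", "print(1 / 0)")

# The latest version is broken, so go back to the last stable snapshot.
stable = history.checkout(v1)
```

Nothing is ever overwritten here: the risky rewrite is still in the log, and the older stable version can be retrieved at any time.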

It's incredibly powerful, and what's fascinating here is just how fundamental this has become.

Our source material basically declares it's inconceivable in modern software development to create any system, no matter how simple, without a VCS.

Inconceivable.

That's a strong word.

It really tells you how deeply baked in these systems are now.

Absolutely.

But version control itself, it's not brand new tech, is it?

It actually has a pretty long history.

That's right, it goes way back.

Yeah, its origins are actually in the early 1970s.

That's when you had systems like SCCS, the Source Code Control System, being developed for Unix.

That was a pretty big first step for managing code on shared machines.

Pioneering stuff back then.

And then things evolved.

We saw CVS, the Concurrent Versions System, pop up in the mid-80s.

That was a decent step for basic team collaboration, and then later Subversion, or SVN as most people know it, arrived in the early 2000s and got really popular.

It was seen as simpler and more robust than CVS.

Right, SVN had a good run.

And all these early systems, they shared a similar structure, didn't they?

They were what we call centralized systems.

Exactly, centralized.

They worked on that classic client-server model.

So you have one single main server holding the entire repository, the single source of truth.

All of the developers, the clients would connect directly to that server.

And when a developer finished some work, they would perform a commit.

That operation basically takes a snapshot of their changes at that moment and logs it permanently on that central server, making it visible to everyone else on the team.

OK, that makes sense.

But then things changed quite a bit.

Oh yeah, here's where it gets really interesting.

The early 2000s saw this, well, this revolution really, with the rise of distributed version control systems, DVCS: systems like BitKeeper, which has its own interesting history, Mercurial, and of course the big one, Git, which came out in 2005.

They just fundamentally changed the approach.

How so?

What makes them distributed?

Instead of that client server setup, DVCS uses more of a peer-to-peer architecture.

What this means, fundamentally, is that each developer has a full, complete, functional version control system, a whole copy of the repository, history and all, right there on their own machine.

Wait, everyone has the entire repository locally?

The entire thing.

Now in practice, teams usually still have a designated central repository that everyone synchronizes with just for coordination, but the key difference is developers primarily work and commit changes to their local repository first.

Then they sync up with that central one using a couple of main operations: pull, which grabs the latest changes from the central repo and updates their local copy, and push, which sends their own local commits up to the central repo.
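The commit-locally, then pull and push flow can be sketched roughly as follows. This is a toy model in Python, not Git's actual data model: the `ToyRepo` class and its methods are hypothetical, history is just a flat list of commit messages, and it assumes messages are unique so they can stand in for commit IDs. Real systems track a graph of commits and have to merge diverging histories.

```python
class ToyRepo:
    """Hypothetical repository: just an ordered list of commit messages."""

    def __init__(self, name):
        self.name = name
        self.commits = []

    def commit(self, message):
        """Record a change locally; no network or other repository is involved."""
        self.commits.append(message)

    def push(self, remote):
        """Send the remote every local commit it does not have yet."""
        for message in self.commits:
            if message not in remote.commits:
                remote.commits.append(message)

    def pull(self, remote):
        """Bring in every commit from the remote that is missing locally."""
        for message in remote.commits:
            if message not in self.commits:
                self.commits.append(message)


central = ToyRepo("central")
alice = ToyRepo("alice")
bob = ToyRepo("bob")

# Both developers commit to their own local repositories first.
alice.commit("alice: add login form")
alice.commit("alice: validate password")
bob.commit("bob: fix typo in README")

# Only now do they synchronize with the shared central repository.
alice.push(central)
bob.pull(central)
bob.push(central)
```

At the end, the central repository and Bob's copy contain all three commits, while Alice will only see Bob's commit the next time she pulls.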

That feels like a pretty major shift in architecture if everyone's got their own full copy locally.

I mean, doesn't that add complexity?

Why make that switch?

What's better about this distributed way of doing things?

That's a fair question.

It can seem more complex initially, especially compared to the straightforward central model.

And look, for maybe very small co-located teams, a centralized system might still be perfectly fine.

But DVCS really shines, and the reason it took over is how well it supports large distributed teams and especially open source projects.

The advantages, once you sort of dig in, are significant.

Like what?

Well, first, because you commit locally first, you can work and manage versions completely offline, no network connection needed until you're ready to push or pull.

Huge flexibility boost.

Right, you're not tied to the server.

Exactly.

Second, it encourages frequent commits.

You can save your work really often, even if it's just a partial implementation or something you're experimenting with directly to your local repo, without messing up the main code base for everyone else.

So you can save progress without worrying about breaking the build for the team.

Precisely.

Third, those local commits are executed in less time.

They're just faster operations because they're happening right there on your machine, not over a network to a central server.

Faster feedback?

Yep.

And finally, synchronization is more flexible.

You don't have to just sync with one main central repo.

You could set up hierarchies, maybe team level repos that then sync up, or even developers pulling changes directly from each other.

Lots more workflow possibilities.

That flexibility does sound powerful, especially for big, maybe globally distributed projects.

And I mean when you talk DVCS today, one name just dominates the conversation.

Git.

Absolutely, Git is king.

The origin story is pretty cool actually.

It was developed, as you mentioned, under the leadership of Linus Torvalds.

You know, the guy who created Linux?

That's right, the Linux connection is key.

Yeah, because apparently in the early days the Linux kernel project used that commercial system, BitKeeper.

But then in 2005, the company behind BitKeeper revoked the free licenses they'd been giving the Linux folks.

That caused a bit of a crisis.

Right, so Torvalds and his team basically said, OK, fine, we need our own open source system that actually works for our massive distributed project, and they literally built Git in, like, a few weeks.

Incredible.

An amazing feat born out of necessity.

Now, Git itself is primarily a command-line tool, which can sound a bit scary for some people.

It could be intimidating, yeah.

Lots of commands to learn.

But thankfully, because it's so popular, loads of really good third-party graphical interfaces have popped up, so you can use Git visually, clicking buttons instead of typing commands, which makes it way more accessible.

Definitely lowers the barrier to entry, and that popularity leads us perfectly to GitHub.

It's really important to draw this distinction clearly.

OK, GitHub is not Git.

GitHub is a code hosting service that uses the Git system.

Think of it like a web platform built on top of Git's capabilities.

So Git is the engine, GitHub is the car, or maybe the garage.

Kind of, yeah.

GitHub provides hosting.

It famously offers free public repositories, which have become the heart of the open source world, and also offers paid private repositories for companies and corporate use.

The analogy in our source material is pretty good.

Think about e-mail.

Instead of installing and running your own e-mail server locally, which you could do, most companies just use a service from a third party like Google through Gmail.

Right, you outsource the management.

Exactly.

Similarly, companies can use GitHub, or competitors like GitLab or Bitbucket, instead of managing their own internal Git servers.

It simplifies things.

Oh, OK, that makes total sense.

So with Git and platforms like GitHub being the standard, how do organizations actually structure their code within these systems, especially when they have lots of projects?

This brings us to a really interesting strategic decision point, the whole multirepos versus monorepos debate.

Yes, the repo strategy question.

It's a big one.

So what does this all mean?

Let's break down those terms.

OK, so multirepo is probably the approach that feels most intuitive initially.

It's simply where an organization sets up one repository for each project or system.

Makes sense.

Separate projects, separate repos.

Right, if your company, let's call it My Org, has three different software systems, you might have repos like My Org System 1, My Org System 2, My Org System 3.

Pretty straightforward.

OK.

And the alternative?

Monorepos?

The monorepo approach is quite different.

Here you have a single repository that contains all the organization's projects.

These different projects are then usually structured just as subdirectories inside that one giant repository.

All of them in one place, yeah.

So using our example, you'd just have one repo, maybe My Org Code or something.

And inside that you'd find folders for system 1, system 2, system 3, and maybe shared libraries, tools, everything else.

And what often surprises people is that this monorepo approach, despite sounding maybe a bit nuts, is actually often adopted by large companies such as Google, Meta and Microsoft.

Wow, really?

Google puts all its code in one single repository?

That sounds massive, potentially incredibly unwieldy.

Why would they do that?

What are the advantages of having such a gigantic single repo?

It definitely sounds counterintuitive at first glance.

And yes, they are unimaginably massive repositories, but for companies operating at that kind of scale, the monorepo offers some really compelling advantages.

OK.

I'm curious.

Well, first off, with everything in one place, there's zero ambiguity about where the latest version of any piece of code lives.

It creates a true single source of truth for all code versions.

You eliminate a whole category of problems related to figuring out which version of which library works with which version of which service.

Consistency.

Exactly.

Huge for consistency.

Second, monorepos really encourage code reuse and sharing.

Because all the code is right there, it's much easier for developers to discover and use libraries or components built by other teams.

It sort of breaks down silos.

Easier collaboration across teams then.

Third, and this is a big one, changes across multiple projects can be atomic.

Imagine you need to fix a bug or add a feature that affects both system 1 and system 2.

In a monorepo, you can often make the necessary changes to both systems within a single commit.

So it all goes in together or none of it does.

Precisely.

You avoid that awkward state where one system is updated but the dependency isn't, leading to things breaking.

It ensures consistency across dependent systems.
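Here is a rough Python sketch of that all-or-nothing idea. Everything in it is hypothetical: the repository is modeled as a dictionary from file paths to contents, `atomic_commit` is an invented helper, and the `consistent` check merely stands in for whatever tests or reviews a real team would run before accepting a commit.

```python
def atomic_commit(repo: dict, changes: dict, check) -> dict:
    """Apply all changes together if the result passes the check, else apply none."""
    candidate = {**repo, **changes}
    return candidate if check(candidate) else repo


def consistent(state: dict) -> bool:
    """Hypothetical check: system2 may only import a name that system1 defines."""
    imported = state["system2/report.py"].split("import ")[1].strip()
    return f"def {imported}(" in state["system1/api.py"]


# One monorepo holding two systems as subdirectories.
repo = {
    "system1/api.py": "def fetch_user(user_id): ...",
    "system2/report.py": "from system1.api import fetch_user",
}

# Renaming the function in system1 alone would break system2, so it is rejected.
partial = atomic_commit(
    repo,
    {"system1/api.py": "def get_user(user_id): ..."},
    consistent,
)

# Changing the definition and its caller in one commit keeps both in step.
full = atomic_commit(
    repo,
    {
        "system1/api.py": "def get_user(user_id): ...",
        "system2/report.py": "from system1.api import get_user",
    },
    consistent,
)
```

In this sketch `partial` is just the unchanged repository, while `full` contains both edits; there is never a state where only one of the two systems has been updated.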

And finally, monorepos facilitate the execution of large-scale refactorings.

Like renaming something used everywhere.

Exactly.

Imagine you need to rename a core function or change an API that's used by hundreds of different internal projects.

In a monorepo, you can potentially do that entire refactoring across the entire code base in one single commit.

That would be incredibly difficult and coordination-heavy across dozens or hundreds of separate repositories.

OK, I can see the appeal now, especially for coordinating changes at massive scale.

That atomicity and refactoring power sound significant, but it still sounds like it must come with its own set of challenges.

I mean, managing a repo that big?

Oh, absolutely, you're right to push back on that.

Monorepos are definitely not a free lunch.

They come with substantial engineering challenges.

Well, for starters, monorepos absolutely require specific tools to navigate large code bases.

Our source mentions Google had to build custom plugins for their IDEs just so developers could effectively work with their code base without getting totally lost or bogged down.

Standard tools often just don't scale.

Right, the tooling has to keep up.

Exactly.

You also need incredibly sophisticated build systems.

You can't afford to rebuild the entire code base every time someone makes a small change.

The build system needs to be smart enough to only build the specific things that were affected.
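One way to picture "only build what was affected" is as a walk over a dependency graph. The sketch below is in Python and rests on assumptions: the `DEPS` graph and its target names are made up, and `affected_targets` is an illustrative helper, not the algorithm of any particular build tool.

```python
from collections import deque

# Hypothetical dependency graph: each target maps to the targets it depends on.
DEPS = {
    "shared/logging": [],
    "shared/auth": ["shared/logging"],
    "system1": ["shared/auth"],
    "system2": ["shared/logging"],
    "system3": [],
}


def affected_targets(deps, changed):
    """Return every target that must be rebuilt when the `changed` targets change."""
    # Invert the graph: for each target, which targets depend on it directly?
    dependents = {target: [] for target in deps}
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].append(target)

    # Breadth-first walk from the changed targets through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


rebuild = affected_targets(DEPS, ["shared/auth"])
```

With this graph, a change to `shared/auth` means rebuilding only `shared/auth` and `system1`; `system2` and `system3` are left alone, which is the kind of saving that matters when the repository holds thousands of targets.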

Performance becomes critical.

Hugely critical.

You also need very fine-grained access controls because you probably don't want every single developer to have access to all the code, especially sensitive parts.

And yeah, there can be performance hurdles just checking out or updating such a massive repository, even with optimized tooling.

It's a significant investment in infrastructure, tooling and processes.

So it's a solution tailored for a specific kind of scale and set of problems, not necessarily universal best practice.

Precisely.

It's a trade-off.

It solves some complex coordination problems, but introduces new challenges around tooling and scale.

It works for Google, Meta, Microsoft, but it might be overkill or even detrimental for smaller organizations.

It's amazing really, just thinking about the sheer complexity hidden beneath the surface of the apps and services we use every day, all built on these layers of version control.

So we've gone from why version control is absolutely essential, that bedrock for collaboration and history, through its evolution from centralized systems to the distributed world of Git.

And then we looked at these big strategic choices companies face with multirepos versus those giant monorepos, understanding the pros and cons, especially at scale.

It really hammers home that version control isn't just a minor technical detail, it's truly a foundational pillar supporting modern software engineering.

It's what enables the scale, the speed, the collaboration we see today.

It lets teams build these incredibly complex things reliably, efficiently, without getting lost in the history or fearing mistakes.

So as we wrap up this deep dive, maybe the thought to leave you with is that version control is fundamentally about managing change.

And in our constantly evolving digital world, that ability is everything.

It lets us build, innovate, fix things, and work together effectively.

So next time you save a file or collaborate on any kind of digital project, maybe spare a thought for the hidden systems making it all work smoothly.

Thank you for joining us on this deep dive.
