
Squashing Compilers

Episode Transcript

Matt Godbolt

Hey, Ben.

Ben Rady

Hey, Matt.

Matt Godbolt

I left that a bit long, didn't I, this time?

That was a bit later in the intro.

I was like, oh, because I was chatting to you pre-show and then, yeah, got distracted.

Ben Rady

How long could you go not saying that before people would just be like, is this thing broken?

Matt Godbolt

Are we on the right podcast here?

Ben Rady

Is this the right podcast?

I don't even know.

Matt Godbolt

I have a funny story about that.

Maybe we've already said it, but one of my pals, one of my C++ pals, his name is Ben, Ben Deane, right?

And he's also British.

And so he was listening to the first episode of the podcast and he just nearly fell off his chair because the first thing I do is go, hi, Ben.

And he's like, how?

What?

I mean, I'm sure that other people called Ben exist.

I appreciate that.

But it was just him telling the story that made me laugh.

Ben Rady

So it's like, did I forget to leave a phone call or a Google Meet somewhere?

And it's just Matt being like, yeah, hi.

Matt Godbolt

Now someone is just talking to me.

Hey, you bum dialed me.

Ben Rady

You left your phone on.

Matt Godbolt

Yeah.

Ben Rady

Yeah, exactly.

Matt Godbolt

We've all been there shouting as loud as we can, trying to get someone's attention.

But yeah, not that.

But anyway, I wonder how long we could get away with it.

But it's what we do.

It's what we say.

Oh, another thing.

Ben Rady

Mm-hmm.

Matt Godbolt

I met somebody who is a listener or our listener, singular.

Ben Rady

The listener.

You met the listener.

Matt Godbolt

And it occurred to me that's about the fourth person now who's told me that they are our listener.

Ben Rady

Oh.

Matt Godbolt

So we might have to accept that we have more than one now.

Ben Rady

So we have four listeners.

Matt Godbolt

It's so funny.

Given how the internet is all about tracking and every YouTube video tells you how many people have watched it and all that kind of good stuff, it's really hard to know how many listeners there really are out there for us.

Ben Rady

Oh yeah.

Because...

Matt Godbolt

It seems broken in the podcast world.

Ben Rady

Because of one of the classic problems of computer science, cache invalidation.

Matt Godbolt

Yeah.

Ben Rady

Yeah.

Matt Godbolt

Everyone wants to cache your...

I mean, I suppose that's the thing, like videos, nobody would want to cache gigabytes of videos, but you're like, hey, a couple of megs of MP3 file, sure.

Ben Rady

Right.

Matt Godbolt

Spotify, we'll take a copy of that and we'll hand it out to everyone when they press play.

Ben Rady

Yeah.

Matt Godbolt

And then we get you to try and sign up for our Spotify podcast publisher account that then lets you see how many plays you've had on Spotify.

That's great.

But you can also listen to it on Apple and Google and YouTube and off of our website.

And there are these places that will charge you decent amounts of money to tell you how many people listen to your podcast.

But I'm just like, somebody should write a web scraper that goes to all of these places and gets them all in one place, right?

And by somebody, I mean, probably an LLM that I could just tell it to do that thing.

Ben Rady

Yeah.

Matt Godbolt

Anyway, that's not what we're here to talk about today.

Ben Rady

No, we're not.

What are we talking about today?

Matt Godbolt

We're talking about the last three days of my life, which have been supposedly preparing for conference presentations, which I have in September.

Ben Rady

Oh, okay.

Matt Godbolt

I've got three conference presentations coming up, which is great.

And while you're sort of writing slides and throwing out ideas, you're kind of like, well, I want a background window that I can kind of just tap into occasionally.

And there's tons of stuff in Compiler Explorer that's just like, hey, run this thing, wait for it to finish, have a look at the output, see if it makes sense, push it to production if it does, updating libraries, that kind of good stuff.

And so we've hit August now, which I believe will actually be when this podcast comes out, in a rare departure from the norm.

Ben Rady

Yeah, that's right.

Matt Godbolt

But we've hit August, and I've long since had a calendar reminder to say, hey, we should really upgrade to Ubuntu 24 for all of the production nodes for Compiler Explorer.

Ben Rady

Mm-hmm.

Matt Godbolt

And, you know, we tend to drag our feet a little bit because we've been bitten before.

Ben Rady

Mm-hmm.

Matt Godbolt

This is where we should see a big foreshadowing thing pop up.

Ben Rady

Oh, boy.

Matt Godbolt

And in fact, 10 years ago, we were bitten by an issue.

I was doing some research trying to work out what happened and I found my own blog post from 10 years ago.

Ben Rady

Oh, wow.

Yeah.

Matt Godbolt

You know, that kind of thing where you're like, oh, who had this problem?

Ben Rady

Yes.

Matt Godbolt

Oh, I had this problem.

Ben Rady

I did.

Former me.

Uh-huh.

Matt Godbolt

So anyway, how hard can it be to upgrade the operating system?

We have everything scripted through the wazoo.

We use Packer to build all of our images: they apt install all the things they need and then shut themselves down.

Great.

And then we make an image, a machine image out of that.

And that's what starts up and is a Compiler Explorer node.

Pretty straightforward.

And we do this fairly often because if you don't, then it rots and you're relying on a ton of stuff that's, you know, install this from this random website.

Ben Rady

Right.

Yeah.

Matt Godbolt

So anyway, that wasn't a problem.

So in theory, it was just change a 22 to a 24 and then rerun Packer.

Ben Rady

Mm-hmm.

Matt Godbolt

And

Ben Rady

Mm-hmm.

Matt Godbolt

it failed the first time for an easy to diagnose reason.

Ben Rady

Yeah.

Matt Godbolt

Luckily I'd already hit this before because I started a 24 upgrade for a different part of the system and hit it.

I was like, oh, I must remember this.

And of course, completely failed to remember to do it this time around.

It was to do with the way AppArmor has been updated and some of the jailing things that we do.

Obviously, we run things in secure environments, and AppArmor has opinions about which things can run.

Ben Rady

Yeah.

Matt Godbolt

And certainly, our own jailing code needs to be configured so that it can do the things that are usually dodgy.

Like, hey, I want to make a whole new namespace.

I want to make all of these sort of isolated environments.

That shouldn't just be allowed to happen randomly.

So AppArmor comes in and tells you, no, you can't do that.

And then Compiler Explorer doesn't work.

So we fixed that.

Cool.

Deployed.

You know, anecdotally, it took a while to start up.

And I was like, ah, watched pot, you know, never boils kind of thing.

Ben Rady

Mm-hmm.

Mm-hmm.

Matt Godbolt

But to sort of make this slightly less of a shaggy dog story than it already is going to be.

Ben Rady

Mm-hmm.

Matt Godbolt

It turns out that our boot-up time went from a not-great couple of minutes to four or five minutes.

It was timing things out.

Ben Rady

Hmm.

Yeah.

Matt Godbolt

And even more interestingly, while the node was booting up, it was unresponsive.

Like SSH would time out.

And I'm like, what is going on here?

What on earth could it be doing at startup?

Ben Rady

Oh.

Matt Godbolt

that could bring it to its knees like that.

Ben Rady

Yeah.

Mm-hmm.

Matt Godbolt

So what it ended up being is that at startup, we mount around 2000 SquashFS images.

And I don't know if we've talked about this before on the podcast, but Compiler Explorer has a very unusual...

Ben Rady

Don't think so.

Matt Godbolt

Yeah.

Okay.

So let's do a bit of an interlude.

This is, this is definitely a therapy session, by the way.

Ben Rady

Yeah.

Matt Godbolt

Thank you, Ben, for being my therapist.

And thank you, listener, for being my therapist.

Ben Rady

Well, yeah, you spent three days of your life on this.

You deserve to be able to vent about it, I feel like.

Matt Godbolt

I'm just, yeah, I'm going to feel better.

Ben Rady

That's just like your moral right.

Matt Godbolt

It's because catharsis.

Ben Rady

Yeah.

Matt Godbolt

Yes.

Ben Rady

Yeah.

Uh-huh.

Matt Godbolt

So Compiler Explorer has many compilers.

They are immutable because once I've installed GCC 12.1, I never need to do anything with it again.

Again, with a massive footnote there, sometimes we break and we have to redo them, whatever.

But they also need to be shared amongst up to about 40 different machines.

And that constitutes about two and a half, three terabytes of binary files.

Ben Rady

Yeah.

Yeah, yeah.

Matt Godbolt

And there's just not really a good solution for sharing that amount of data with the kind of access patterns that are, well, it's a compiler, it's an executable, I'm going to run it from

Ben Rady

Right.

Matt Godbolt

NFS or wherever you're storing it.

Ben Rady

Yeah, yeah.

Matt Godbolt

So when we first started out with Compiler Explorer, all the compilers were built into the AMI image for every node.

So that the image that I said I was building from 22 to 24 would actually contain the compilers.

They were actually apt installed at one stage. That was okay, but it does not scale, because effectively every build gets slightly slower than the previous one.

Ben Rady

Yeah.

Matt Godbolt

More and more compiler images get unpacked onto it.

And once it took more than 24 hours to complete the AMI image, I knew that we were in trouble, and I thought, we have to come up with a different solution.

Ben Rady

Yeah.

Matt Godbolt

So we used NFS and NFS was great for a while.

We hit some funny performance issues with NFS just out of the gate, but ultimately we got that settled down.

And then there are things like Boost, which is a C++ header-only library.

It makes a lot of use of the fact that a file can include itself.

You can self-reference the file, and so it has a bunch of preprocessor trickery that includes the file multiple times and #defines something to be N plus one.

So it can include the same file 50 times to expand out certain things that you can't do in the template system.

Ben Rady

Yeah.

Matt Godbolt

Those kinds of tricks.

Either way, what that is, is tiny text files, including tiny other text files, which is like the worst case scenario for NFS.

Ben Rady

Yeah.

Matt Godbolt

You've got this massive...

Even if you cache the file contents, NFS will always, or pretty quickly, go and fetch the metadata to see if the thing has changed on the remote side before it will serve up the cached content it has locally.

Ben Rady

Right.

Right.

Matt Godbolt

And at that point, you might as well have just read the darn file because it's only 80 bytes long.
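A back-of-the-envelope model of the tiny-file problem. It assumes NFS close-to-open consistency, where each open() revalidates the file's attributes with a GETATTR round trip even when its data is already cached, and it uses made-up numbers (2,000 headers, a 0.5 ms round trip). It also ignores the per-directory LOOKUPs, which only make things worse.

```python
def nfs_open_cost_ms(num_files: int, rtt_ms: float, data_cached: bool) -> float:
    """Rough latency to open and read num_files tiny files over NFS.

    With close-to-open consistency, every open() costs one GETATTR round trip
    to revalidate, even if the data is cached. A cache miss on a tiny file
    adds (at least) one READ round trip on top of that.
    """
    round_trips_per_file = 1 if data_cached else 2
    return num_files * round_trips_per_file * rtt_ms


# Hypothetical numbers: 2,000 tiny headers, 0.5 ms network round trip.
cold = nfs_open_cost_ms(2000, 0.5, data_cached=False)
warm = nfs_open_cost_ms(2000, 0.5, data_cached=True)
print(f"cold: {cold:.0f} ms, warm: {warm:.0f} ms")
```

Caching the data only saves the READ; the revalidation round trip is still there for every file, which is why an 80-byte file costs about the same either way.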

Ben Rady

Yeah.

Matt Godbolt

so

Ben Rady

Right.

Yeah.

Matt Godbolt

Latency, massive problem.

And so Boost was timing out.

And so our first solution to this was to rsync a few libraries to the local disk at startup, so that they were local, and then substitute the path.

Great, lovely, but not scalable because we have thousands of libraries and then the boot up time was getting longer and longer as well.

So take two was for every compiler or library that is

Ben Rady

yeah

Matt Godbolt

like a final build, like a 12.1 released compiler, we install it on NFS, but then we also build a SquashFS image of that compiler.

Ben Rady

Okay.

Matt Godbolt

That SquashFS image is also on NFS, so at this point you're probably thinking this is just manifestly worse, which maybe it is.

Ben Rady

Okay.

Matt Godbolt

And then at startup, we mount the SquashFS image over top of the NFS location where it is also stored.

So it looks like we have one unified /opt/compiler-explorer, all the compilers in the world, but some of those directories in there have actually been mounted over top and are actually being served from a SquashFS image that's on NFS as well.
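A minimal sketch of the mount-over-the-top step as a helper that builds mount command lines. The image location `/nfs/images` and the `<name>.sqfs` naming scheme are hypothetical; only the `mount -t squashfs -o ro,loop IMAGE DIR` invocation itself is standard.

```python
from pathlib import PurePosixPath

CE_ROOT = PurePosixPath("/opt/compiler-explorer")  # the unified tree described above
IMAGE_DIR = PurePosixPath("/nfs/images")           # hypothetical home of the .sqfs files


def mount_argv(image: PurePosixPath) -> list[str]:
    """Command line to loop-mount one read-only SquashFS image over its NFS directory.

    e.g. gcc-12.1.0.sqfs goes on /opt/compiler-explorer/gcc-12.1.0, hiding the
    plain NFS copy of the same files underneath it.
    """
    target = CE_ROOT / image.stem  # .stem drops only the final ".sqfs" suffix
    return ["mount", "-t", "squashfs", "-o", "ro,loop", str(image), str(target)]


print(" ".join(mount_argv(IMAGE_DIR / "gcc-12.1.0.sqfs")))
```

Running the resulting command needs root, and each one is a separate mount, so a few thousand images means a few thousand of these at boot.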

Ben Rady

Right.

And the intention here is to sort of solve the tiny file problem.

Matt Godbolt

Right.

Ben Rady

And instead of shipping across an 80 byte file, you're shipping across a SquashFS for a particular compiler and all of the bits come along with it in one swell foop.

Matt Godbolt

In one swell foop, yes. It's actually better than that, because SquashFS does pack smaller files together into one area.

Ben Rady

Wow.

Matt Godbolt

It also compresses them.

Ben Rady

Mm-hmm.

Matt Godbolt

And as far as the kernel is concerned, SquashFS is like a block device file system, like a hard disk.

And so the kernel is caching and loading things as like 4K pages, 8K, whatever size it's reading and writing.

Ben Rady

Right.

Matt Godbolt

SquashFS doesn't know that the underlying image is actually on a mutable network file system.

And so it caches them forever.

It's like, yeah, fine.

I'll keep this in my page cache, right?

I read page seven of this file system to hand to SquashFS that then unpacks it.

Ben Rady

Uh-huh.

Uh-huh.

Matt Godbolt

And then the files are also cached, right?

But the raw block-level accesses are cached essentially forever. We've laundered the fact that behind the scenes, it's an NFS drive.

And so we get much better caching, much better performance for things like reading the metadata, because you read the directory contents and it gives you a 4K block that probably has the directory entries for everything that's inside that SquashFS image.

Ben Rady

Mm-hmm.

Matt Godbolt

And then the tiny files are all packed together as well.

So like we're winning on every level and it was a huge improvement in the compile time.

But it's not free to mount the image in the first place.

And obviously it takes up a little bit of memory on the machine to have like...

4,000 or 2,000 mounts, right?

Ben Rady

Yeah.

Yeah.

Matt Godbolt

So each mount takes up a certain amount of kernel memory, and part of the acceleration is effectively that pre-caching of the top-level directory of every compiler and library.

So over the years, we went from tens to hundreds to thousands of compilers, and our boot-up time became dominated by, well, it takes 50 milliseconds to mount each SquashFS image

Ben Rady

Yeah.

Matt Godbolt

Times 2000.

Suddenly that's an appreciable amount of time.
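The arithmetic behind "times 2000", assuming the mounts run strictly one after another at the roughly 50 ms each quoted above:

```latex
T_{\text{serial}} = N \cdot t_{\text{mount}} = 2000 \times 50\ \text{ms} = 100\ \text{s} \approx 1.7\ \text{min}
```

On its own, that is most of the "couple of minutes" of boot time mentioned earlier.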

Ben Rady

Yeah, yeah.

Matt Godbolt

And so then you're like, what if we do this in the background while we're starting up?

And that didn't work out.

Now I know why.

Ben Rady

Yeah, OK.

Ah.

Matt Godbolt

So anyway, that was the situation that we're in.

That's why we mount thousands of files at startup.

And we have any number of ways of thinking about consolidating the number of SquashFS images we have so that we can combine all the GCCs together in one image or whatever.

But there are a number of other issues with that, which is a whole other podcast episode.

Ben Rady

Hmm.

Matt Godbolt

And every time I describe this problem to people, by the way, everyone's like, yeah, I don't know, I can't think of a solution to this problem.

This is the general problem of, I need to ship immutable binaries with low latency to many places.

And I need to take advantage of the immutability as much as possible.

And, yeah.

And then the management of that, right.

Ben Rady

Yeah, yeah, yeah, yeah.

Matt Godbolt

I've got thousands of these things, right.

Anyway.

So the problem turned out to be that when you mount an image on a modern Ubuntu, systemd comes along and says, oh, you've added a mount.

I need to make an ad hoc unit, which is a kind of dependency-tracking node in its graph.

I don't know much about systemd.

I've learned a lot more about it in the last few days, but...

Ben Rady

Heh

Matt Godbolt

What it effectively does is it creates a node in some dependency graph so that when you shut the system down, it knows to unmount it and it knows who depends on it and things like that, right?

Ben Rady

Okay, yeah, right.

Matt Godbolt

Sort of that stuff.

Because you can also express the entirety of /etc/fstab, or certain services, to say, hey, this service needs this thing.

And this thing is a network mount, and that network mount needs networking, and networking needs to be...

So systemd can like follow this graph and say, well, if you're turning on this service, I'll make sure the mounts it needs are in place.

Ben Rady

Yeah.

Matt Godbolt

And when you turn it off, I can unmount them as well.

Ben Rady

Yeah, yeah.

Matt Godbolt

So it makes a ton of sense.
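A toy version of the dependency graph being described, using Python's standard-library topological sorter. The unit names are made up (`/opt/ce` is a hypothetical mount point), and this only illustrates the ordering idea. It says nothing about how systemd is actually implemented or why it is slow; the point is that every ad hoc SquashFS mount becomes one more node in a graph like this.

```python
from graphlib import TopologicalSorter

# Hypothetical units. Each unit maps to the set of units it needs (its predecessors).
NEEDS = {
    "compiler-explorer.service": {"opt-ce.mount"},
    "opt-ce.mount": {"network-online.target"},
    "network-online.target": set(),
}


def start_order(needs: dict[str, set[str]]) -> list[str]:
    """Dependencies first: everything a unit needs is started before it."""
    return list(TopologicalSorter(needs).static_order())


def stop_order(needs: dict[str, set[str]]) -> list[str]:
    """Reverse of start order: the service stops before its mount is unmounted."""
    return start_order(needs)[::-1]


print(start_order(NEEDS))
print(stop_order(NEEDS))
```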

Ben Rady

I discovered actually, side note, the other day that systemd also does that for shared memory directories,

Matt Godbolt

Interesting.

Ben Rady

which bit me in a very painful way because of a technology that you're familiar with that uses shared memory for storing data.

Matt Godbolt

I'm aware of that.

Ben Rady

And it suddenly disappeared when we turned the service off.

And we're like, what just happened?

Matt Godbolt

Oh my gosh.

Ben Rady

Yeah.

Matt Godbolt

Yeah.

systemd has its fingers in a lot of pies.

Ben Rady

Mm-hmm.

Matt Godbolt

So when you mount an ad hoc thing, be it apparently shared memory or a SquashFS image, which itself is mounted through a loopback device, which is another sort of mount, all that kind of stuff.

Ben Rady

Right.

Yeah.

Matt Godbolt

systemd tracks it, and I don't know what systemd is doing to take so long, but this is the rub: systemd essentially takes a hundred percent CPU, twice over.

So on our two-core machines, I can run top, when I can actually get on. I said to you the machine was unresponsive, right?

Because this is all in kernel land, locks are being taken out left, right, and center.

Ben Rady

Yeah, yeah.

Yeah.

Mm-hmm.

Matt Godbolt

We're trying to mount these things in parallel at sensible levels, because we want to deal with the latency.

If it takes 50 milliseconds and most of that is network latency, I should be able to fire off two or three mounts at once and get SquashFS to read the root directory and then have the mounts, whatever.

Even if the kernel is sequencing them, you'd think I'd like to have some mounts on the go at once.

But no, every time that comes in, systemd does a ton of work.

And so PID 1, which is, you know, "init", but it's called systemd on these systems, ha ha, takes 100% CPU.

Ben Rady

Right.

Matt Godbolt

And a second process called systemd, which is the one that is per user, I think, takes a hundred percent CPU while this is going on.

And that is not the case on Ubuntu 22.

It does take some CPU there; I measured 30 and 40% on those, which again, it's like, what on earth are you doing?

But I'm sure it's got this massive dependency graph that it's running through.

So yeah, the short answer is that on Ubuntu 24, either systemd has changed or some aspect of the configuration has.

And every time you mount something, it takes a little bit of time, which is probably not a big deal unless you're doing 3000 of them back to back, or trying to do four of them at a time.
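A sketch of the bounded-concurrency mounting idea: keep a few mounts in flight at once so their network latency overlaps. `FakeMount` is a stand-in that just sleeps; a real version would run the actual mount command. As described above, on Ubuntu 24 this overlap was swamped by systemd's per-mount work, so the sketch shows the intent rather than a fix.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def mount_all(images, mount_one, max_in_flight=3):
    """Call mount_one(image) for every image, with at most max_in_flight running at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Consuming the results waits for the mounts and re-raises any mount's exception.
        for _ in pool.map(mount_one, images):
            pass


class FakeMount:
    """Stand-in for a real mount: sleeps to mimic latency and records peak concurrency."""

    def __init__(self, latency_s):
        self.latency_s = latency_s
        self.lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.done = []

    def __call__(self, image):
        with self.lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(self.latency_s)
        with self.lock:
            self.in_flight -= 1
            self.done.append(image)


images = [f"image-{i}.sqfs" for i in range(12)]
fake = FakeMount(latency_s=0.05)
start = time.monotonic()
mount_all(images, fake, max_in_flight=3)
elapsed = time.monotonic() - start
print(f"{len(fake.done)} mounts in {elapsed:.2f}s (serial ~{12 * 0.05:.2f}s), peak {fake.peak}")
```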

Ben Rady

Right.

Right.

Matt Godbolt

And then it sort of jams the system up completely.

And the knock-on effects of this when I rolled out 24 were, A, our machines took ages to boot, but they did eventually come in just under the threshold of getting whacked by the "machine did not become responsive" check, exactly.

Ben Rady

Yeah.

The timeout essentially.

Yeah.

Yeah.

Matt Godbolt

But of course, they've been chewing 200% CPU for like those three minutes while they were booting up.

And the way that we do auto-scaling for our cluster is we take the average CPU across the cluster, which is a terrible metric, but it's also the simplest one to do in AWS.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

Now, one of our committers is working hard on a much better way of scaling up and scaling down using metrics that make sense.

But the only sensible one that you can go to that just has a dropdown entry in AWS is add or remove nodes from the cluster to keep the average CPU at blah.

Ben Rady

Yeah, yeah.

Matt Godbolt

And so we've got that set to 25 or 30% right now, which is, again, not ideal.

But it does mean that now suddenly you get into this runaway situation where a little bit of load comes in, you fire up a new node, and for the three minutes, it's taking 100% CPU, which pulls the average up even further.

Ben Rady

Yep, yep.

Right.

ah

Matt Godbolt

And before you know it, you have 40 nodes that are booted up.

Ben Rady

right

Matt Godbolt

And then once it hits that maximum of 40 nodes, obviously the CPU plunges, and so it quickly drops them all again.
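A toy simulation of the runaway, assuming a simplified target-tracking rule (ask for enough nodes that total CPU divided by the target is covered) and made-up numbers: a 25% target, a little real load, three steps to boot, a cap of 40. It is not AWS's actual algorithm and it never scales in; the only difference between the two runs is how much CPU a booting node burns.

```python
import math


def peak_nodes(load, boot_cpu, target=25, start=2, max_nodes=40, boot_steps=3, steps=20):
    """Return the largest cluster size a naive average-CPU autoscaler reaches.

    load:     real work, in percent-of-one-node CPU, shared among ready nodes
    boot_cpu: CPU percent burned by each node while it is still booting
    """
    boot_left = [0] * start  # steps until each node is ready (0 means ready)
    peak = start
    for _ in range(steps):
        booting = sum(1 for b in boot_left if b > 0)
        ready = len(boot_left) - booting
        total_cpu = min(load, 100 * ready) + boot_cpu * booting
        # Nodes needed to bring the average (total_cpu / n) down to the target.
        desired = min(max_nodes, math.ceil(total_cpu / target))
        if desired > len(boot_left):
            boot_left += [boot_steps] * (desired - len(boot_left))
        boot_left = [max(0, b - 1) for b in boot_left]
        peak = max(peak, len(boot_left))
    return peak


print(peak_nodes(load=60, boot_cpu=5))    # quiet boot: settles at 3 nodes
print(peak_nodes(load=60, boot_cpu=100))  # boot pegs the CPU: runs away to the cap
```

Each booting node at 100% CPU asks for four more nodes' worth of capacity at a 25% target, which is what turns one extra node into forty.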

Ben Rady

Mm-hmm.

Matt Godbolt

But then, yeah.

So we rolled back to Ubuntu 22.

And I spent the last two days, or a day and a half, trying to turn off systemd, trying to disable this part of systemd, trying to add every mount option known to mankind to the end of the SquashFS mount to say, for the love of God and all that's holy, don't track this.

I don't care.

I'm going to mount it and then I'm going to throw it away.

Ben Rady

Yeah, yeah.

Matt Godbolt

And neither I, nor my internet searches, nor any of the LLMs that I asked could come up with a way.

Ben Rady

Yeah.

Matt Godbolt

It just doesn't seem like there's a way to do it.

In fact, by the end of one session, Claude was saying, you really need to file this as a bug.

Ben Rady

Yeah.

Matt Godbolt

And I'm like, I don't, I don't, I'm probably doing it wrong.

I still think I'm doing it wrong.

Ben Rady

Yeah.

Matt Godbolt

Right.

Right.

And, you know, for the longest time, having 3000 things mounted is not really an ideal situation.

And so that was fun.

Yeah.

You're pulling a face like, oh...

Ben Rady

Yeah, I'm just thinking, is there really no way to tell systemd just not to track these things?

Matt Godbolt

I couldn't find it.

You can do all these sort of repressions and things.

And it still didn't work. It was still being thrown through it, and it's monitoring some mount thing.

Ben Rady

Ugh, it didn't work.

Yeah.

Matt Godbolt

The best I could do was "kill -STOP" on the second systemd process, the user-level systemd process, which is essentially like breakpointing it.

Ben Rady

Yeah.

Matt Godbolt

then mount them all, and then "kill -CONT" the process.

And that got rid of one of the 100 percenters.

But that process is created lazily, on demand.

So until you start doing work that needs it, it's not there.

And so my scripts were dying because they were trying to send it a stop.

And it was like, well, that PID doesn't exist yet.

And then I'd log onto the machine and finally get through.

Yeah, computers, man.
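The `kill -STOP` / mount / `kill -CONT` workaround, sketched in Python with a retry loop for the race where the user-level systemd doesn't exist yet. How to find that PID is left to a caller-supplied `find_pid` (for example, something wrapping `pgrep`); `find_user_systemd` and `mount_everything` in the usage comment are hypothetical helpers. This only pauses the process; it doesn't change what systemd does once it resumes, and signalling another user's processes needs the appropriate privileges.

```python
import os
import signal
import time
from contextlib import contextmanager


def wait_for_pid(find_pid, timeout_s=30.0, poll_s=0.1):
    """Poll find_pid() until it returns a PID; the process may not have started yet."""
    deadline = time.monotonic() + timeout_s
    while True:
        pid = find_pid()
        if pid is not None:
            return pid
        if time.monotonic() >= deadline:
            raise TimeoutError("process did not appear in time")
        time.sleep(poll_s)


@contextmanager
def stopped(pid):
    """SIGSTOP pid for the duration of the with-block; always SIGCONT it afterwards."""
    os.kill(pid, signal.SIGSTOP)  # same as `kill -STOP <pid>`
    try:
        yield pid
    finally:
        os.kill(pid, signal.SIGCONT)  # same as `kill -CONT <pid>`


# Usage, with hypothetical helpers:
#   with stopped(wait_for_pid(find_user_systemd)):
#       mount_everything()
```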

Ben Rady

I mean, how do they even work?

Matt Godbolt

I mean, how even...

And so what I would ideally like is to not...

Well, first of all, I would like to have a much cleverer way of managing a large nest of SquashFS images and...

Ben Rady

Wow.

Mm-hmm.

Matt Godbolt

In fact, this whole approach to mounting SquashFS images through NFS or whatever was something that I was doing around the same time that the company-wide solution at Aquatic was being developed.

So it's no surprise that the thing I've just been describing to you is familiar to you, at least in part, because some of the...

Ben Rady

ah Yeah.

Yeah, right.

Matt Godbolt

For our audience, and I don't believe it's revealing any IP.

We will have to do some very creative cutting if it is.

But Aquatic has a solution for storing environments by effectively putting them in SquashFS images.

And, you know, that's not new either.

That's what snap images are.

That's what Flatpaks are or various types of things.

Ben Rady

Yeah.

Matt Godbolt

They just mount them.

So this is one of the ways that one solves a "hey, I just want an immutable bundle of things."

Ben Rady

Yeah.

Matt Godbolt

And then one of the people who worked on that from Aquatic, who is also a Compiler Explorer committer, has come up with some solutions that are a bit more clever about having a list of symlinks that point from the file system to a well-known path and that well-known path has AutoFS that auto mounts the thing on demand.

And so you still present this apparent file system that looks like it's got every compiler known to mankind, but only when you actually cd into the directory or try to run it does the symlink get resolved.

And now that SquashFS image gets mounted and it appears in that position and then you can carry on with your life.

And that's great.

And obviously if we had designed it for that from the beginning, it would have been fine, but retrofitting it into our current

Ben Rady

Yeah.

Matt Godbolt

setup is really, really, really difficult.

And then you still have these problems.

So that gives you on-demand mounting, which is one thing you can do.

And you could try and configure AutoFS to do this, but there are a number of reasons why it's difficult, which are much too complicated to go into now.

So it doesn't just work out of the gate, although it sounds like it ought to.

But we don't really want 3,000 SquashFS images.

That's a pain.

I do want to have, like, here are all the GCCs.

And what you could do is mount subparts of those images into this unified tree, which means that, hey, I've got all the GCCs, but now GCC 15 has just come out.

I don't want to have to rebuild that image because that's 500 gigabytes of GCC and SquashFS is immutable.

Ben Rady

Yeah.

Matt Godbolt

You can't add things after the fact; you have to unpack it and then repack it again.

Ben Rady

Right.

Matt Godbolt

So if your solution for "I'm adding one more GCC" is to unpack all the GCCs and then repack them with the new one, then you're kind of back to that original AMI problem: it's going to get incrementally worse every time you add a new thing.

So what you really want is something like: all of the GCCs from the last, you know, 10 years are in one "older GCCs" image, but each one is mounted individually.

Ben Rady

yeah, yeah.

Matt Godbolt

It's one image that has like the old ones.

Ben Rady

Yeah, right.

Matt Godbolt

And then periodically you add kind of layers.

This is sort of like a LayerFS thing.

Ben Rady

Yeah, right.

Matt Godbolt

And then you...

you consolidate the layers.

You can have a process that goes away and says, hey, layers three through nine, I can now net them out and make a new layer three, and then I rewrite the file system to be these things.
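A minimal model of the layering idea, assuming each layer is just a mapping from compiler directory to the SquashFS image that provides it, with later layers winning. All names are hypothetical, and actually building the merged image (unpacking the old ones and re-running mksquashfs) is not shown. The point is that consolidating layers keeps every directory visible while reducing the number of images to mount.

```python
Layer = dict[str, str]  # compiler directory -> image file that provides it


def resolve(layers: list[Layer]) -> Layer:
    """Which image serves each directory; later (newer) layers override earlier ones."""
    view: Layer = {}
    for layer in layers:
        view.update(layer)
    return view


def consolidate(layers: list[Layer], start: int, stop: int, new_image: str) -> list[Layer]:
    """Replace layers[start:stop] with a single layer backed by new_image.

    The new layer provides exactly the directories the old ones did, so the
    resolved view keeps the same set of directories; only the image behind
    them changes (unless a layer above stop still overrides them).
    """
    merged = resolve(layers[start:stop])
    return layers[:start] + [{d: new_image for d in merged}] + layers[stop:]


layers = [
    {"gcc-4.8": "older-gccs.sqfs", "gcc-5": "older-gccs.sqfs"},
    {"gcc-14": "layer1.sqfs"},
    {"gcc-15": "layer2.sqfs"},
    {"gcc-15.1": "layer3.sqfs"},
]
new_layers = consolidate(layers, 1, 3, "layer1-2.sqfs")
print(len(layers), "->", len(new_layers), "layers;", resolve(new_layers)["gcc-15"])
```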

Ben Rady

Yeah.

Matt Godbolt

So that's where we want to go ultimately with this.

But there isn't a quick MVP for it that gets me out of my current hot-potato situation right now.

Ben Rady

Right.

So you're back to Ubuntu 22.

Matt Godbolt

And we've tried it.

We're back to Ubuntu 22.

Ben Rady

Yeah.

Matt Godbolt

Although, although...

although

Ben Rady

yeah

Matt Godbolt

I had a sort of an idea.

So while banging my head on my keyboard, trying to go like, how does systemd even notice when I mount things or when I do stuff to the system?

Ben Rady

Mm-hmm.

Matt Godbolt

I was like, wait a second.

What if I wrote something that looked at file system accesses?

So every file that's stored in a SquashFS image somewhere is also available naked on NFS, so that if SquashFS isn't around, or for those things we genuinely update more quickly than the SquashFS images, I still have access to the files in /opt/compiler-explorer, right?

Ben Rady

right

Matt Godbolt

So if I could post hoc, that is, run some trace through the whole system and say, hey, anytime I notice somebody accesses a file that is on NFS, that is inside one of the directories that I have a SquashFS image for, that's when I'm going to choose to mount it.

Ben Rady

Hmm.

Matt Godbolt

And for the first few times, they're still going, they're still reading from NFS, but once that mount has finished, we sweep in, sweep, that's not a word, swap in or flip in or something, I don't know, one of those, sweep in.

Ben Rady

Yeah.

Yeah.

Right.

Yeah.

But eventually when it mounts, yeah.

Uh huh.

Sweep in.

Fly in.

Matt Godbolt

fly in the mounted SquashFS image over the top.

So that is what I've been doing for the last two hours, which is why I was like, let's just talk about it; it's top of mind for me right now.

And it's showing some early promise.

So in this world, what we would do is not mount anything at startup.

Ben Rady

Yeah.

Matt Godbolt

And then we'd run this daemon, and all it does is sit there and watch file accesses and then sort of lazily bring in the SquashFS images.
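The decision logic of that daemon, sketched under a few assumptions: something else reports which paths are being accessed (on Linux that might be fanotify, which Python's standard library doesn't wrap); `image_dirs` is the set of directories that have a SquashFS image; and `mount_fn` is a hypothetical hook that should hand the mount off to a background worker, so the compile keeps reading from NFS in the meantime.

```python
from pathlib import PurePosixPath


class LazyMounter:
    """Mount a directory's SquashFS image the first time anything under it is accessed."""

    def __init__(self, image_dirs, mount_fn):
        self.image_dirs = {PurePosixPath(d) for d in image_dirs}
        self.mount_fn = mount_fn  # hypothetical hook that performs (or queues) the mount
        self.requested = set()

    def image_dir_for(self, path):
        """Nearest enclosing directory that has an image, or None."""
        p = PurePosixPath(path)
        for candidate in (p, *p.parents):  # the path itself, then parents, innermost first
            if candidate in self.image_dirs:
                return candidate
        return None

    def on_access(self, path):
        """Called for each observed file access; returns True if it triggered a mount."""
        d = self.image_dir_for(path)
        if d is None or d in self.requested:
            return False
        self.requested.add(d)  # mark first, so repeated accesses don't re-trigger
        self.mount_fn(d)
        return True


calls = []
m = LazyMounter(["/opt/compiler-explorer/gcc-12.1.0"], calls.append)
m.on_access("/opt/compiler-explorer/gcc-12.1.0/bin/g++")               # triggers the mount
m.on_access("/opt/compiler-explorer/gcc-12.1.0/lib64/libstdc++.so.6")  # already requested
m.on_access("/opt/compiler-explorer/clang-18.1.0/bin/clang++")         # no image: stays on NFS
print(calls)
```

Marking a directory before mounting keeps a burst of accesses from queueing duplicate mounts; a real version would also need to decide what to do when a mount fails.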

Ben Rady

Yeah.

Yeah.

Matt Godbolt

And obviously, in the worst case, a Compiler Explorer node will eventually mount all of the images.

But likely as not, it'll never get close.

Ben Rady

I was actually going to ask you about that. Does that mean that, as the Compiler Explorer nodes are running, they're just slowly accumulating these SquashFS mount points?

And do you need to like, clean them up on any sort of regular basis or do you just restart the machines every once in a while?

Like, how does that work?

Matt Godbolt

Well, so in the current situation we just mount them all at startup and they're up forever.

So that's the end, right?

Ben Rady

Okay.

Matt Godbolt

So this would be a mark.

Ben Rady

And the only memory they're really consuming in that state is just that small amount of kernel memory that you were talking about before.

Matt Godbolt

Yeah, which isn't that small.

I can't remember what it was, but it was non-trivial, enough that you can see the machine's free memory has gone down once it's finished mounting everything.

Ben Rady

Yeah.

Matt Godbolt

Oh, well, which is less than ideal.

Ben Rady

Mm-hmm.

Matt Godbolt

So yeah, at the moment we pay the cost for all of them, even though, you know, we have things like GCC 1.23.

Ben Rady

Yeah.

Matt Godbolt

How often do people use that?

Probably not very often.

And again, we have 40 nodes.

They're recycled really quickly.

It's very unlikely that any one node will need all 3000 in its lifetime.

Ben Rady

Mm-hmm.

Mm-hmm.

Matt Godbolt

So in this new world order, the way that I was imagining it is at least for V1, we just mount them and leave them up because it's no worse than what we have before.

Ben Rady

Mm-hmm.

Matt Godbolt

You know, it might take an extra half a second, even a second, to mount on the first access.

But we're not holding up the compile in that case, it's just going through the slow NFS path.

Ben Rady

Yeah, yeah.

Matt Godbolt

And obviously mounting the SquashFS images means extra NFS accesses that we wouldn't otherwise need, but the hope is it'll net out pretty quickly.

And by the time we run it again, or even by the time it's finished the first read of the ELF and is starting to look at the DLLs it needs to load in, it's going to pull the DLLs from the SquashFS image.

So that is the hope.

We'll see how it goes.

Ben Rady

Mm-hmm.

Matt Godbolt

I have already found one situation where the SquashFS image is not actually up to date with respect to the changes on the disk, which is like, oh well, this is going to throw a wrench, a spanner, in the works.

Ben Rady

OK.

Yeah.

Yeah.

Matt Godbolt

So that is an issue.

Ben Rady

Interesting.

I guess you could have the daemon that you're writing actually do that, right?

Matt Godbolt

It would...

Ben Rady

Maybe.

Matt Godbolt

I guess so.

I think I'm just going to go kind of caveat emptor.

We'll find them as we hit them, or maybe it'll be something we do as a post-process.

Ben Rady

Yeah.

Matt Godbolt

That's just running and looking for things that are out of date, you know, and yeah, go ahead.
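A post-process like that could start out as a plain modification-time comparison. The sketch below is only illustrative: it assumes a hypothetical layout in which each compiler directory on disk has a matching SquashFS image file, and the function names are made up for this example. It flags an image as stale if the image is missing or older than the newest file or directory in its source tree.

```python
import os
from pathlib import Path
from typing import Iterable, List, Tuple


def newest_mtime(root: Path) -> float:
    """Return the latest modification time of root or anything beneath it."""
    newest = root.stat().st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # lstat so that symlinks are checked themselves, not followed.
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # the entry vanished while we were scanning
            newest = max(newest, st.st_mtime)
    return newest


def is_stale(source_dir: Path, image: Path) -> bool:
    """True if the image is missing or older than something in source_dir."""
    if not image.exists():
        return True
    return newest_mtime(source_dir) > image.stat().st_mtime


def find_stale(pairs: Iterable[Tuple[Path, Path]]) -> List[Path]:
    """Given hypothetical (source directory, image file) pairs, list stale sources."""
    return [src for src, img in pairs if is_stale(src, img)]
```

Modification times are only a heuristic: tools that preserve timestamps when copying (for example `rsync -a` or `cp -p`) can make a changed tree look older than its image, so a content-based comparison would be more robust if that matters.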

Ben Rady

Yeah.

Well, I guess the other thing you could do with that, and you probably have this already, is you could farm that thing for usage statistics, right?

Matt Godbolt

Yes.

In fact, about a year and a half ago, we changed the privacy policy and our backend to track statistics, which we'd avoided until then because, you know, we don't like to track things.

That's not what we're into.

Ben Rady

Yeah.

Matt Godbolt

I don't care what you're doing with it, really.

But it is incredibly useful to say, how often do we use this compiler versus that compiler, which I think is a fair use of non-identifiable information.

Ben Rady

Right.

For exactly like problems like this, where you're just like, I'm optimizing these things.

I want to optimize them based on the usage, not on like, you know, random things.

Matt Godbolt

Exactly.

And so certainly, down the line, we can do things like, hey, let's have a cluster that only does legacy compilers, right?

And then that cluster just lives there forever.

Ben Rady

Yeah.

Matt Godbolt

There's two machines that run all the time.

They sit there in their old-timey world, fielding requests for GCC 1.23.

They're on the front porch with their shotguns across their laps.

Ben Rady

Right, with the rocking chairs.

Matt Godbolt

Yeah, that's right.

Waiting for, you know, the... yeah.

And then, conversely, we could even have some faster nodes serving the GCC 15.1s that have just come out, and the trunk builds and things like that.
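Deciding which compilers belong on a legacy cluster is where the usage statistics Ben mentioned come in. As a hedged sketch, assuming you have per-compiler request counts (the function name, the threshold, and the data shape are all hypothetical), you could move the least-used compilers into the legacy pool until together they account for only a small share of total traffic:

```python
from typing import Dict, List, Tuple


def split_by_usage(counts: Dict[str, int],
                   legacy_share: float = 0.01) -> Tuple[List[str], List[str]]:
    """Split compilers into (legacy, hot) pools by request count.

    The least-used compilers go into the legacy pool, smallest first, for as
    long as the pool's combined requests stay within legacy_share of the total.
    Everything else stays in the hot pool.
    """
    total = sum(counts.values())
    legacy: List[str] = []
    hot: List[str] = []
    legacy_requests = 0
    # Sort ascending by request count so the rarest compilers are considered first.
    for name, n in sorted(counts.items(), key=lambda item: item[1]):
        if total > 0 and (legacy_requests + n) / total <= legacy_share:
            legacy.append(name)
            legacy_requests += n
        else:
            hot.append(name)
    return legacy, hot
```

Splitting by share of traffic rather than by a fixed number of compilers keeps the legacy cluster's load bounded, even as the list of old compilers keeps growing.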

Ben Rady

Mm-hmm.

Matt Godbolt

But our management is not good enough.

At the moment, having multiple clusters is painful for us.

And so we have kind of two or maybe three clusters, you know: one for the GPU things, one for the ARM compilers, and then one for everything else.

Ben Rady

Yeah.

Matt Godbolt

And retrofitting everything in without breaking the site is so hard.

This is why, you know, when we did our talk, a conversation, and you talked a little bit about your sort of branch-based deployment, spinning up those environments...

Ben Rady

Yeah.

Yeah.

Matt Godbolt

That was like, oh man, I wish.

I wish we'd thought of that ahead of time.

Ben Rady

Right.

Just twist the knife.

I mean, you know, I think it sounds like at the scale you guys are at right now, just spinning up one of those environments would be prohibitive in terms of cost, but maybe you could structure it in a way where it wasn't quite so bad, you know?

Matt Godbolt

So cost is not such a big deal anymore.

And I say that with a massive footnote.

Ben Rady

Yeah.

Matt Godbolt

People are surprised at how relatively cheap Compiler Explorer is to run.

We're currently at around three grand a month burn rate on AWS stuff, although it's just gone up.

Ben Rady

Yeah.

Matt Godbolt

But it's gone up for good reasons.

The good reason is that I sent a message out. A blog post, in fact.

That's what we call these things, a message.

I've made a blog post kind of explaining

Ben Rady

You made one internet and you sent it out into the webs.

Matt Godbolt

I did.

I sent it out into the webs about the cost breakdown of Compiler Explorer.

I did a big sort of dive into it so I could justify, you know, we're very lucky.

We have a lot of commercial sponsors.

We have a lot of people who ah donate on Patreon and GitHub sponsors and...

And my dog's barking, which I can't be bothered to edit out.

So we have a lot of money coming in, which is fantastic.

And I like to be very upfront and open about what we do with the money as much as I can within the reasonableness of the fact that it's still

Ben Rady

Huh.

Matt Godbolt

my private finances at some level still, right?

Ben Rady

Right.

Matt Godbolt

I'm sort of hand-waving and gesturing about this stuff.

So I like people to know where the money's going.

And so I told people, like, this is what we spend it on and this is how it breaks down.

It was interesting.

And so it ended up on Hacker News, which was great.

And one person said, have you ever considered talking to, you know, the Grafana people or the...

SolarWinds or whatever, you know, the people that we pay money to for subscriptions for like monitoring stuff.

And I was like, yeah, kind of, but you know, there's something to be said for it's not that much.

It's not a huge, you know, that's costing me, you know, 40 bucks a month.

That's kind of noise.

And I don't want to give up too much of my, you know, I don't want to sell out.

I would rather pay 40 bucks a month than them say, Hey, you have to put an ad on the top.

Ben Rady

Right.

Matt Godbolt

Or say "thank you to..." And I'm like, I don't know if they would do that, but we went back and forth on that.

Ben Rady

Yeah, yeah.

Right.

Matt Godbolt

And then he was like, Oh, you do know that AWS have an open source budget.

And I'm like, the what now?

Ben Rady

Oh, yeah.

Matt Godbolt

And so I looked it up, and it was a blog post from like 2012 that made some mention of, you know, like, hey, if you're an open source project, contact us.

We might be able to help you.

Here's a form to fill in.

Ben Rady

Yeah.

Matt Godbolt

And I'm like, okay.

Now, the form looked complicated.

So given how old the blog post was, I just emailed the email address that it said and said, hey, is this thing still on, right?

Ben Rady

Yeah. Right.

Matt Godbolt

I'm, you know, I'm the creator of Compiler Explorer.

I'm interested in talking if this thing's still on.

I immediately got an email bounce and I thought, well, there you go.

That tells me. I'm glad I did this rather than spending all that time on the form.

Ben Rady

Yeah.

Matt Godbolt

So I thought nothing more of it.

An hour and a half later, somebody replied going, "Oh, wow.

Yes.

We use Compiler Explorer.

We'd love to help. Definitely fill in the form and send it to us." And I looked back at the bounce and it was like, oh, it must be that one person's inbox is full

Ben Rady

OK.

Oh, yeah, OK.

Matt Godbolt

on the distribution list that it ultimately ended up on.

Ben Rady

Yeah.

Matt Godbolt

So I filled it in, and it asked how much money your site or your project might need for a year.

And so I was like, I guess three grand times 12 or four grand.

I can't remember exactly.

I think it was three, three and a half grand.

So 36 grand, which, you know, is a monstrous amount of money.

Ben Rady

Yeah.

Matt Godbolt

And I was expecting him to just say, "Hahaha, no, seriously, how much do you need?"

And I thought, again, nothing more of it.

Someone much more business-oriented got back to me and said, we'll be back.

We'll get back to you with it.

Thank you for your, you know, your email.

We'll be back to you within 30 days.

Ben Rady

Yeah.

Matt Godbolt

And I thought, all right, now it's gone into bureaucracy.

We'll never see it again.

Ben Rady

Right.

Matt Godbolt

And then I just happened to log into my AWS account and I saw 36 grand of credit had been applied to

Ben Rady

What?

Yeah.

Matt Godbolt

it. And I had to email them back and say, like, okay, before I even talk to anyone, what are the conditions?

Do I have to tell people about this?

Do I have to go out of my way to like, thank you?

Do I?

What's the deal here?

Ben Rady

Yeah.

Matt Godbolt

And the woman eventually got back to me: "Oh, sorry.

I meant to tell you that you'd been approved."

I'm like, were you just going to let that hang?

Wow.

And so the short version, and this is such an off-topic thing, is that AWS are now funding the cost that I put forward, the yearly cost as it was last year. Which means immediately you're like, oh, maybe we can start running more instances and whatever, because I have the cash for it now. So it feels like I'm treating it like a subsidy, but what I'm mindful of is that they may not renew at the end of this year, in which case I have to be able to scale everything back again. So there's a bit of thought

Ben Rady

Yeah, yeah, yeah.

Matt Godbolt

I don't know.

It's an interesting situation to be in, but it's an amazing situation in the short term.

It means I'm looking at things like Redis caching, whereas before I was like, I don't really think I can justify, you know, another $100 a month just to have something that I might not use, or these kinds of things.

Ben Rady

Yeah.

Matt Godbolt

So it's very exploitative.

So I'm excited about that.

But how did we get onto this?

Ben Rady

Oh, that's cool.

That's very cool, actually.

Matt Godbolt

I'm forgetting.

Oh, yeah, you were saying about how expensive it might be.

Ben Rady

Yeah, you know, the branch based stuff might be too expensive and, you know.

Matt Godbolt

But now it might be okay.

Honestly, I look at my load balancer now.

Ben Rady

Yeah.

Matt Godbolt

So load balancers cost, what, $10 a month plus the transfer?

Maybe a little more...

Ben Rady

Yeah, it's the data that you really pay for there, right?

Yeah.

Matt Godbolt

It is, yeah.

And I've often wanted to have multiple load balancers, you know, one for each.

I used to have subdomains for like our staging environment and things like that.

Ben Rady

Yeah.

Mm.

Mm-hmm.

Matt Godbolt

And then I made it so that every subdomain of godbolt.org, you know, comes to Compiler Explorer.

So you'd have to be saying, like, www.staging.godbolt.org.

Ben Rady

Mm-hmm.

Matt Godbolt

And then you're into multi-level DNS and that's a pain in the bum because you can't do wildcards and all that kind of stuff.

You know, you know these things.

Ben Rady

Yeah.

Yeah.

Yeah.

Matt Godbolt

But I stopped doing that, because I was originally using it to route to a different load balancer.

But, you know, having one load balancer per environment was expensive.

So everything now goes to the same load balancer, and it's URL-matched to go off on its merry way.

And that's not scaling all that well now.
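Sending every environment through one load balancer and splitting by URL boils down to matching each request path against an ordered list of rules, where the first match wins and a default catches everything else. A minimal Python sketch of that idea follows; the prefixes and environment names are hypothetical and are not Compiler Explorer's actual configuration.

```python
from typing import Sequence, Tuple

# Hypothetical, priority-ordered rules: (path prefix, target environment).
RULES: Sequence[Tuple[str, str]] = (
    ("/staging/", "staging"),
    ("/beta/", "beta"),
    ("/gpu/", "gpu"),
)


def route(path: str,
          rules: Sequence[Tuple[str, str]] = RULES,
          default: str = "prod") -> str:
    """Return the environment for a request path; the first matching prefix wins."""
    for prefix, target in rules:
        if path.startswith(prefix):
            return target
    return default  # anything unmatched goes to the default environment


if __name__ == "__main__":
    for p in ["/", "/staging/api/compilers", "/beta/", "/gpu/compile"]:
        print(p, "->", route(p))
```

The trade-off shows up even in this toy: every new environment adds another rule to one shared list, whereas a load balancer per environment keeps each list trivial but costs money for each one.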

Ben Rady

Right.

Yeah, yeah.

Matt Godbolt

I've got multiple of them and there's other things on there.

And, you know, there was a time when 10 bucks a month for another load balancer was meaningful.

And now...

Ben Rady

Right.

Matt Godbolt

not to, you know, put too fine a point on it.

Ben Rady

Yeah.

Matt Godbolt

That's noise.

I don't worry about it.

So maybe I should explore branch-based deployment.

Ben Rady

If you can spend 10 bucks a month to make your life a little easier, I would suggest that you do it.

Matt Godbolt

Yeah.

Yeah.

That is the cost right now, I think: you know, the trade-off against the cost of my time.

Ben Rady

Yeah.

Matt Godbolt

And I'm supposed to be preparing three presentations, not doing Compiler Explorer stuff.

And then I've got kind of two and a half clear months before I have to go and work at a real job.

Ben Rady

What - a job?!

Matt Godbolt

I know.

Ben Rady

That sounds terrible.

Matt Godbolt

I know it does.

Ben Rady

Yeah.

Matt Godbolt

Doesn't it?

It sounds awful.

So yeah, it's very much top of mind, thinking about how I'm going to go back to work, Ben.

I don't know.

Ben Rady

Yeah.

I think in the first week you're going to be like, this is awesome.

That's what I predict.

You're just going to be like, oh, right.

Matt Godbolt

I reckon so too.

Ben Rady

I remember why I love this.

Matt Godbolt

Yeah, I think you're absolutely right.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

I'm pretty bullish about it.

I check in with my new gig from time to time.

And, you know, I always come away feeling excited.

Buoyed [in a British accent, "boid"], as I would say, or "booid", would you say, as a Yank?

Ben Rady

I would probably say buoyed.

But I wouldn't say either of those words.

I would just say excited because it's just too nautical.

Matt Godbolt

You'd say excited.

Yeah, that was...

That

Ben Rady

I'm not.

Matt Godbolt

says a man who works for a company called Aquatic.

Ben Rady

Yeah.

Matt Godbolt

Yeah.

Bowie.

Ben Rady

We actually have a service at Aquatic called buoy, and I, like, trip over it every time I say it or spell it.

Matt Godbolt

Which...

It's...

Ben Rady

Buied.

Matt Godbolt

So in British English, that is boy.

It's always been boy.

Like, you know, what is the property of being able to float?

Ben Rady

Yeah.

Matt Godbolt

It is...

Say it.

Ben Rady

Booeyant.

Matt Godbolt

Yeah!

Listener, you should have seen the contortions on Ben's face as he tried to justify pronouncing it that way.

Yeah, yeah.

Ben Rady

Okay.

Point taken.

Matt Godbolt

But, you know, very few of these language-based justifications hold water if you start looking too deeply, because English is not very logical.

Ben Rady

Yeah.

Well, none of them hold water with

Matt Godbolt

Anyway.

Ben Rady

buoyant because they float.

That's what you're doing.

Matt Godbolt

But do...

Ben Rady

Never mind.

I'll go home.

Matt Godbolt

Maybe we should actually... Somehow we've been talking for, well, I've been talking, and you and the listener have been very kindly listening to me vent my spleen.

Ben Rady

Oh, no.

I mean, I love these sorts of war stories.

I think we should do more of these.

It's like, let me tell you about this bug that consumed two days of my life.

Matt Godbolt

Yeah.

Ben Rady

Those are great.

Matt Godbolt

I mean, I think it's valuable sometimes to hear them.

It's fun to tell them, but sometimes it's nice to hear them, because then, secretly, you go back to your desk.

Ben Rady

Mm-hmm.

Matt Godbolt

You're like, I don't feel so bad about spending four hours tracking down this thing now.

Ben Rady

Right.

Yeah.

You know, it's like the old thing about hacking and programming in movies: people with, you know, one hand on each keyboard, and all the things scrolling by on the screen, and the charts and graphs.

And in reality, it's just staring at a stack trace for 30 minutes going like, "I am bad at my job" [whispered].

Matt Godbolt

Well, in case there was any doubt, you're not bad at your job.

I don't think I'm bad at my job, but yeah, feeling that way occasionally is...

Ben Rady

Yeah, that's just how you feel.

You're just like, oh, well how did this ever work?

I don't understand how it even ever worked, let alone what's happening now.

Matt Godbolt

Yeah...

Ben Rady

So, yes.

Matt Godbolt

Yeah, like, who wrote this...

You know, the git blame?

And you're like, oh, oh, yeah.

Ben Rady

Right.

Oh, it's me.

Yeah.

Matt Godbolt

All right, friend.

Well, short of starting a whole new conversation, I think we should finish up here, unless there's anything you want to add. Parting words of wisdom?

Ben Rady

No, that sounds good.

This was a good episode.

Matt Godbolt

Those were your...

Ben Rady

I like it.

I dig it.

Ship it.

Matt Godbolt

Fantastic, mate.

I will.

I will.

Ben Rady

Yeah.

Matt Godbolt

I will get the minions to edit and put it out soon.

And by the minions, I mean me.

Ben Rady

Perfect.

Matt Godbolt

There are no minions.

Ben Rady

Right.

Matt Godbolt

Cool.

Ben Rady

Cool.

Matt Godbolt

All right, friend.

Until next time.

Ben Rady

Until next time.
