Episode Transcript
Matt GodboltHey, Ben.
Ben RadyHey, Matt.
Matt GodboltI left that a bit long, didn't I, this time?
That was a bit later in the intro.
I was like, oh, because I was chatting to you pre-show and then, yeah, got distracted.
Ben RadyHow long could you go not saying that before people would just be like, is this thing broken?
Matt GodboltAre we on the right podcast here?
Ben RadyIs this the right podcast?
I don't even know.
Matt GodboltI have a funny story about that.
Maybe we've already said it, but one of my pals, one of my C++ pals, his name is Ben, Ben Deane, right?
And he's also British.
And so he was listening to the first episode of the podcast and he just nearly fell off his chair because the first thing I do is go, hi, Ben.
And he's like, how?
What?
I mean, I'm sure that other people called Ben exist.
I appreciate that.
But it was just him telling the story that made me laugh.
Ben RadySo it's like, did I forget to leave a phone call or a Google Meet somewhere?
And it's just Matt being like, yeah, hi.
Matt GodboltNow someone is just talking to me.
Hey, you bum dialed me.
Ben RadyYou left your phone on.
Matt GodboltYeah.
Ben RadyYeah, exactly.
Matt GodboltWe've all been there shouting as loud as we can, trying to get someone's attention.
But yeah, not that.
But anyway, I wonder how long we could get away with it.
But it's what we do.
It's what we say.
Oh, another thing.
Ben RadyMm-hmm.
Matt GodboltI met somebody who is a listener, or our listener, singular.
Ben RadyThe listener.
You met the listener.
Matt GodboltAnd it occurred to me that's about the fourth person now who's told me that they are our listener.
Ben RadyOh.
Matt GodboltSo we might have to accept that we have more than one now.
Ben RadySo we have four listeners.
Matt GodboltIt's so funny.
Given how the internet is all about tracking and every YouTube video tells you how many people have watched it and all that kind of good stuff, it's really hard to know how many listeners there really are out there for us.
Ben RadyOh yeah.
Because...
Matt GodboltIt seems broken in the podcast world.
Ben RadyBecause of one of the classic problems of computer science, cache invalidation.
Matt GodboltYeah.
Ben RadyYeah.
Matt GodboltEveryone wants to cache your...
I mean, I suppose that's the thing, like videos, nobody would want to cache gigabytes of videos, but you're like, hey, a couple of megs of MP3 file, sure.
Ben RadyRight.
Matt GodboltSpotify, we'll take a copy of that and we'll hand it out to everyone when they press play.
Ben RadyYeah.
Matt GodboltAnd then we get you to try and sign up for our Spotify podcast publisher account that then lets you see how many plays you've had on Spotify.
That's great.
But you can also listen to it on Apple and Google and YouTube and off of our website.
And you can pay: there are these places that will charge you decent amounts of money to tell you how many people listen to your podcast.
But I'm just like, somebody should write a web scraper that goes to all of these places and gets them all in one place, right?
And by somebody, I mean probably an LLM that I could just tell to do that thing.
Ben RadyYeah.
Matt GodboltAnyway, that's not what we're here to talk about today.
Ben RadyNo, we're not.
What are we talking about today?
Matt GodboltWe're talking about the last three days of my life, which have been supposedly preparing for conference presentations, which I have in September.
Ben RadyOh, okay.
Matt GodboltI've got three conference presentations coming up, which is great.
And while you're sort of writing slides and throwing out ideas, you're kind of like, well, I want a background window that I can kind of just tap into occasionally.
And there's tons of stuff in Compiler Explorer that's just like, hey, run this thing, wait for it to finish, have a look at the output, see if it makes sense, push it to production if it does: updating libraries, that kind of good stuff.
And so we've hit August now, which I believe is actually when this podcast comes out, in a rare departure from the norm.
Ben RadyYeah, that's right.
Matt GodboltBut we've hit August, and I've long since had a calendar reminder to say, hey, we should really upgrade to Ubuntu 24 for all of the production nodes for Compiler Explorer.
Ben RadyMm-hmm.
Matt GodboltAnd, you know, we tend to drag our feet a little bit because we've been bitten before.
Ben RadyMm-hmm.
Matt GodboltThis is where a big foreshadowing sign should pop up.
Ben RadyOh, boy.
Matt GodboltAnd in fact, 10 years ago, we were bitten by an issue.
I was doing some research trying to work out what happened and I found my own blog post from 10 years ago.
Ben RadyOh, wow.
Yeah.
Matt GodboltYou know, that kind of thing where you're like, oh, who had this problem?
Ben RadyYes.
Matt GodboltOh, I had this problem.
Ben RadyI did.
Former me.
Uh-huh.
Matt GodboltSo anyway, how hard can it be to upgrade the operating system?
We have everything scripted through the wazoo.
We use Packer to build all of our images: they apt install all the things they need and then shut themselves down.
Great.
And then we make an image, a machine image out of that.
And that's what starts up and is a Compiler Explorer node.
Pretty straightforward.
And we do this fairly often because if you don't, then it rots and you're relying on a ton of stuff that's, you know, install this from this random website.
Ben RadyRight.
Yeah.
Matt GodboltSo anyway, that wasn't a problem.
So in theory, it was just a matter of changing a 22 to a 24 and rerunning Packer.
Ben RadyMm-hmm.
Matt GodboltAnd
Ben RadyMm-hmm.
Matt Godboltit failed the first time for an easy to diagnose reason.
Ben RadyYeah.
Matt GodboltLuckily I'd already hit this before because I started a 24 upgrade for a different part of the system and hit it.
I was like, oh, I must remember this.
And of course, completely failed to remember to do it this time around.
Which was to do with the way AppArmor has been updated and some of the jailing things that we do.
Obviously, we run things in secure environments, and AppArmor has opinions about which things can run.
Ben RadyYeah.
Matt GodboltAnd certainly, our own jailing code needs to be configured so that it can do the things that are usually dodgy.
Like, hey, I want to make a whole new namespace.
I want to make all of these sort of isolated environments.
That shouldn't just be allowed to happen randomly.
So AppArmor comes in and tells you, no, you can't do that.
And then Compiler Explorer doesn't work.
So we fixed that.
Cool.
Deployed.
You know, anecdotally, it took a while to start up.
And I was like, ah, watched pot, you know, never boils kind of thing.
Ben RadyMm-hmm.
Mm-hmm.
Matt GodboltBut to sort of make this slightly less of a shaggy dog story than it already is going to be.
Ben RadyMm-hmm.
Matt GodboltIt turns out that our boot-up time went from a not-great couple of minutes to four or five minutes.
It was timing things out.
Ben RadyHmm.
Yeah.
Matt GodboltAnd even more interestingly, while the node was booting up, it was unresponsive.
Like SSH would time out.
And I'm like, what is going on here?
What on earth could it be doing at startup?
Ben RadyOh.
Matt Godbolt...that could bring it to its knees like that?
Ben RadyYeah.
Mm-hmm.
Matt GodboltSo what it ended up being is that at startup, we mount around 2000 SquashFS images.
And I don't know if we've talked about this before on the podcast, but Compiler Explorer has a very unusual...
Ben RadyDon't think so.
Matt GodboltYeah.
Okay.
So let's do a bit of an interlude.
This is definitely a therapy session, by the way.
Ben RadyYeah.
Matt GodboltThank you, Ben, for being my therapist.
And thank you, listener, for being my therapist.
Ben RadyWell, yeah, you spent three days of your life on this.
You deserve to be able to vent about it, I feel like.
Matt GodboltI'm just, yeah, I'm going to feel better.
Ben RadyThat's just like your moral right.
Matt GodboltIt's because catharsis.
Ben RadyYeah.
Matt GodboltYes.
Ben RadyYeah.
Uh-huh.
Matt GodboltSo Compiler Explorer has many compilers.
They are immutable because once I've installed GCC 12.1, I never need to do anything with it again.
Again, with a massive footnote there: sometimes we break them and have to redo them, whatever.
But they also need to be shared among up to about 40 different machines.
And that constitutes about two and a half, three terabytes of binary files.
Ben RadyYeah.
Yeah, yeah.
Matt GodboltAnd there's just not really a good solution for sharing that amount of data with the kind of access patterns that are, well, it's a compiler, it's an executable, I'm going to run it from
Ben RadyRight.
Matt GodboltNFS or wherever you're storing it.
Ben RadyYeah, yeah.
Matt GodboltSo when we first started out with Compiler Explorer, all the compilers were built into every node's AMI image.
So the image that I said I was moving from 22 to 24 would actually contain the compilers.
They were actually apt installed at one stage, and that was okay, but it does not scale, because effectively every build gets slightly slower than the previous build.
Ben RadyYeah.
Matt GodboltMore and more compiler images get unpacked onto it.
And once it took more than 24 hours to complete the AMI image, I knew we were in trouble and that we had to come up with a different solution.
Ben RadyYeah.
Matt GodboltSo we used NFS and NFS was great for a while.
We hit some funny performance issues with NFS just out of the gate, but ultimately we got that settled down.
And then there are things like Boost, a largely header-only C++ library.
It makes a lot of use of the fact that you can include a file from itself.
You can self-reference the file, and so it has a bunch of preprocessor trickery that includes the file multiple times and #defines something to be N plus one.
So it can include the same file 50 times to expand out certain things that you can't do in the template system.
Ben RadyYeah.
Matt GodboltThose kinds of tricks.
Either way, what that is, is tiny text files including other tiny text files, which is the worst-case scenario for NFS.
Ben RadyYeah.
Matt GodboltYou've got this massive...
Even if you cache the file contents, NFS will always, or pretty quickly, go and fetch the metadata to see if the file has changed on the remote side before it serves up the cached content it has locally.
Ben RadyRight.
Right.
Matt GodboltAnd at that point, you might as well have just read the darn file because it's only 80 bytes long.
Ben RadyYeah.
Matt Godboltso
Ben RadyRight.
Yeah.
Matt GodboltLatency, massive problem.
And so Boost was timing out.
And so our first solution was to rsync a few libraries to the local disk at startup, so that they were local, and then substitute the path.
Great, lovely, but not scalable, because we have thousands of libraries, and the boot-up time was getting longer and longer as well.
So take two was for every compiler or library that is
Ben Radyyeah
Matt Godboltlike a final build, like a 12.1 released compiler, we install it on NFS, but then we also build a SquashFS image of that compiler.
Ben RadyOkay.
Matt GodboltThat SquashFS image is also on NFS, which, so far, you're thinking is just manifestly worse. Maybe it is.
Ben RadyOkay.
Matt GodboltAnd then at startup, we mount the SquashFS image over top of the NFS location where it is also stored.
So it looks like we have one unified /opt/compiler-explorer, all the compilers in the world, but some of those directories in there have actually been mounted over top and are actually being served from a SquashFS image that's on NFS as well.
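A minimal Python sketch of that mount-over-the-top step might look like the following; it only builds the command line rather than running it. `/opt/compiler-explorer` comes from the conversation, while the image directory `/nfs/squash-images`, the `<name>.img` naming, and the `overlay_mount_command` helper are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

# The unified compiler tree mentioned in the conversation.
CE_ROOT = PurePosixPath("/opt/compiler-explorer")
# Hypothetical location of the SquashFS images on the NFS share.
IMAGE_ROOT = PurePosixPath("/nfs/squash-images")


def overlay_mount_command(relative_dir: str) -> List[str]:
    """Build a mount command that lays the SquashFS image for `relative_dir`
    over the NFS directory that already holds the same files.

    Assumes (hypothetically) that the image for "<dir>" is "<dir>.img".
    """
    image = IMAGE_ROOT / f"{relative_dir}.img"
    target = CE_ROOT / relative_dir
    # A read-only loop mount: the image file itself lives on NFS, but the
    # kernel reads it like a block device and caches its pages.
    return ["mount", "-t", "squashfs", "-o", "ro,loop", str(image), str(target)]


print(" ".join(overlay_mount_command("gcc-12.1.0")))
```

Running the printed command needs root; once it succeeds, reads under that directory are served from the image rather than directly from NFS.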
Ben RadyRight.
And the intention here is to sort of solve the tiny file problem.
Matt GodboltRight.
Ben RadyAnd instead of shipping across an 80 byte file, you're shipping across a SquashFS for a particular compiler and all of the bits come along with it in one swell foop.
Matt GodboltIn one swell foop, yes. It's actually better than that, because SquashFS does pack smaller files together into one area.
Ben RadyWow.
Matt GodboltIt also compresses them.
Ben RadyMm-hmm.
Matt GodboltAnd as far as the kernel is concerned, SquashFS is like a block device file system, like a hard disk.
And so the kernel is caching and loading things as 4K pages, 8K, whatever size it's reading.
Ben RadyRight.
Matt GodboltSquashFS doesn't know that the underlying image is actually on a mutable network file system.
And so it caches them forever.
It's like, yeah, fine.
I'll keep this in my page cache, right?
I read page seven of this file system to hand to SquashFS that then unpacks it.
Ben RadyUh-huh.
Uh-huh.
Matt GodboltAnd then the files are also cached, right?
But the raw block-level accesses are cached essentially forever. We've laundered the fact that behind the scenes, it's an NFS drive.
And so we get much better caching, and much better performance reading the metadata for things, because when you read the directory contents it gives you a 4K block that probably has all of the directory entries for everything inside that SquashFS image.
Ben RadyMm-hmm.
Matt GodboltAnd then the tiny files are all packed together as well.
So like we're winning on every level and it was a huge improvement in the compile time.
But it's not free to mount the image in the first place.
And obviously it takes up a little bit of memory on the machine to have like...
4,000 or 2,000 mounts, right?
Ben RadyYeah.
Yeah.
Matt GodboltSo each mount takes up a certain amount of kernel memory, and part of the acceleration is effectively that pre-caching of the top-level directory of every compiler and library.
So over the years we went from tens to hundreds to thousands of compilers, and our boot-up time became dominated by, well, it takes 50 milliseconds to mount each SquashFS image
Ben RadyYeah.
Matt GodboltTimes 2000.
Suddenly that's an appreciable amount of time.
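The arithmetic behind that is easy to write down. A quick Python check using the roughly 50 ms per mount and roughly 2,000 images quoted here, assuming strictly one mount after another:

```python
MOUNT_SECONDS = 0.050  # ~50 ms per SquashFS mount, the figure quoted above
IMAGE_COUNT = 2000     # roughly 2,000 images mounted at every boot


def serial_mount_seconds(count: int, per_mount: float) -> float:
    """Wall-clock time to mount `count` images strictly one after another."""
    return count * per_mount


total = serial_mount_seconds(IMAGE_COUNT, MOUNT_SECONDS)
print(f"{total:.0f} s, or about {total / 60:.1f} minutes")
```

That comes to about 100 seconds of mounting on every boot, before counting anything else the node has to do.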
Ben RadyYeah, yeah.
Matt GodboltAnd so then you're like, what if we do this in the background while we're starting up?
And that didn't work out.
Now I know why.
Ben RadyYeah, OK.
Ah.
Matt GodboltSo anyway, that was the situation that we're in.
That's why we mount thousands of files at startup.
And we have any number of ways of thinking about consolidating the number of SquashFS images we have so that we can combine all the GCCs together in one image or whatever.
But there are a number of other issues with that, which is a whole other podcast episode.
Ben RadyHmm.
Matt GodboltAnd every time I describe this problem to people, by the way, everyone's like, yeah, I can't think of a solution to this problem.
This is the general problem of: I need to ship immutable binaries with low latency to many places.
And I need to take advantage of the immutability as much as possible.
And, yeah.
And then the management of that, right.
Ben RadyYeah, yeah, yeah, yeah.
Matt GodboltI've got thousands of these things, right.
Anyway.
So the problem turned out to be that when you mount an image on a modern Ubuntu, systemd comes along and says, oh, you've added a mount.
I need to make an ad hoc unit, which is its dependency-tracking node in its graph.
I don't know much about systemd.
I've learned a lot more about it in the last few days, but...
Ben RadyHeh
Matt GodboltWhat it effectively does is it creates a node in some dependency graph so that when you shut the system down, it knows to unmount it and it knows who depends on it and things like that, right?
Ben RadyOkay, yeah, right.
Matt GodboltSort of that stuff.
Because you can also phrase the entirety of like, /etc/fstab or certain services to say, hey, this service needs this thing.
And this thing is a network mount, and that network mount needs networking, and networking needs to be up.
So systemd can like follow this graph and say, well, if you're turning on this service, I'll make sure the mounts it needs are in place.
Ben RadyYeah.
Matt GodboltAnd when you turn it off, I can unmount them as well.
Ben RadyYeah, yeah.
Matt GodboltSo it makes a ton of sense.
Ben RadyI discovered actually, side note, the other day that systemd also does that for shared memory directories,
Matt GodboltInteresting.
Ben Radywhich bit me in a very painful way, because of a technology that you're familiar with that uses shared memory for storing data.
Matt GodboltI'm aware of that.
Ben RadyAnd it suddenly disappeared when we turned the service off.
And we're like, what just happened?
Matt GodboltOh my gosh.
Ben RadyYeah.
Matt GodboltYeah.
systemd has its fingers in a lot of pies.
Ben RadyMm-hmm.
Matt GodboltSo when you mount an ad hoc thing, be it, apparently, shared memory or a SquashFS image, which is itself mounted through a loop device, which is another layer of its own, all that kind of stuff.
Ben RadyRight.
Yeah.
Matt Godboltsystemd tracks it, and I don't know what systemd is doing to take so long, but this is the rub: systemd takes 100% CPU, twice over.
So on the two-core machines we run these things on, I could run top once I actually got in; I said to you the machine was unresponsive, right?
Because this is all in kernel land, locks are being taken out left, right, and center.
Ben RadyYeah, yeah.
Yeah.
Mm-hmm.
Matt GodboltWe're trying to mount these things in parallel, at sensible levels, because we want to hide the latency.
If it takes 50 milliseconds and most of that is network latency,
I should be able to fire off two or three mounts at once
and get SquashFS to read the root directory and then have the mounts complete, whatever.
Even if the kernel is sequencing them, you'd think I could have a few mounts on the go at once.
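A minimal Python sketch of mounting with a bounded number of mounts in flight, assuming the standard `mount -t squashfs -o ro,loop` invocation. `run_mount`, `mount_all`, and the default of three in flight are illustrative, not Compiler Explorer's actual startup code; and as described here, on Ubuntu 24 systemd's reaction to each new mount is what undermines the approach.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (image file, mount point)


def run_mount(image: str, target: str) -> None:
    """Loop-mount one SquashFS image read-only. Needs root."""
    subprocess.run(
        ["mount", "-t", "squashfs", "-o", "ro,loop", image, target],
        check=True,
    )


def mount_all(
    pairs: Iterable[Pair],
    in_flight: int = 3,
    mount: Callable[[str, str], None] = run_mount,
) -> List[Pair]:
    """Mount every (image, target) pair with at most `in_flight` mounts
    running at once, so one mount's network latency overlaps with others.

    Returns the pairs whose mount raised, in submission order.
    """
    failed: List[Pair] = []
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        futures = [(pool.submit(mount, img, tgt), (img, tgt)) for img, tgt in pairs]
        for future, pair in futures:
            if future.exception() is not None:  # waits for that mount to finish
                failed.append(pair)
    return failed
```

The `mount` parameter is injectable so the scheduling can be exercised without root, for example with a fake that just sleeps.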
But no, every time a mount comes in, systemd does a ton of work.
And so PID 1, which is, you know, "init", but it's called systemd on these systems, takes 100%
Ben RadyRight.
Matt GodboltAnd a second process called systemd, which is the per-user one, I think, takes 100% CPU while this is going on.
And that is not the case on Ubuntu 22.
It does take some CPU there; I measured 30 and 40% on those, which, again, it's like, what on earth are you doing?
But I'm sure it's got this massive dependency graph that it's running through.
So yeah, the short answer is that on Ubuntu 24, either systemd has changed or some aspect of its configuration has.
Every time you mount something, it takes a little bit of time, which is probably not a big deal unless you're doing 3000 of them back to back, or trying to do four of them at a time.
Ben RadyRight.
Right.
Matt GodboltAnd then it sort of jams the system up completely.
And the knock-on effects when I rolled out 24 were, A, our machines took ages to boot, but they did eventually come in just under the threshold for getting whacked by the...
machine-did-not-become-responsive check, exactly.
Ben RadyYeah.
The timeout essentially.
Yeah.
Yeah.
Matt GodboltBut of course, they've been chewing 200% CPU for like those three minutes while they were booting up.
And the way that we do our auto-scaling
for our cluster is we take the average CPU of the cluster, which is a terrible metric, but it's also the simplest one to do in AWS.
Ben RadyYeah.
Yeah.
Matt GodboltNow, one of our committers is working hard on getting a much better way of scaling up and scaling down, using metrics that make sense.
But the only sensible one that you can go to that just has a dropdown entry in AWS is add or remove nodes from the cluster to keep the average CPU at blah.
Ben RadyYeah, yeah.
Matt GodboltAnd so we've got that set to like 25 or 30% right now, which, again, is not ideal.
But it does mean that now suddenly you get into this runaway situation where a little bit of load comes in, you fire up a new node, and for the three minutes, it's taking 100% CPU, which pulls the average up even further.
Ben RadyYep, yep.
Right.
Matt GodboltAnd before you know it, you have 40 nodes that are booted up.
Ben Radyright
Matt GodboltAnd then once it hits that maximum of 40 nodes, obviously the average CPU plunges, and so it quickly starts dropping them all.
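The runaway loop can be reproduced with a toy model. This Python sketch is a deliberate simplification, not AWS's actual target-tracking algorithm: the 25% target and 40-node cap come from the conversation, while the 4-node floor, the three-tick boot, the load figures, and the rule that booting nodes burn a fixed CPU share while serving nothing are assumptions for illustration.

```python
import math
from typing import List, Sequence


def simulate(
    boot_cpu: float,
    load_by_tick: Sequence[float],
    target: float = 25.0,
    min_nodes: int = 4,
    max_nodes: int = 40,
    boot_ticks: int = 3,
) -> List[int]:
    """Toy model of scaling on average CPU; returns cluster size per tick.

    Each node is represented by the ticks of boot it has left (0 = serving).
    Booting nodes burn `boot_cpu` percent and serve nothing; serving nodes
    share the load, expressed in "percent of one node".
    """
    nodes = [0] * min_nodes
    sizes = []
    for load in load_by_tick:
        serving = nodes.count(0)
        booting = len(nodes) - serving
        total_cpu = booting * boot_cpu + min(load, 100.0 * serving)
        # Keep average CPU (total_cpu / size) near target; clamp to bounds.
        desired = max(min_nodes, min(max_nodes, math.ceil(total_cpu / target)))
        if desired > len(nodes):
            nodes += [boot_ticks] * (desired - len(nodes))
        elif desired < len(nodes):
            # Keep the nodes with the least boot left; drop the rest.
            nodes = sorted(nodes)[:desired]
        nodes = [max(0, left - 1) for left in nodes]
        sizes.append(len(nodes))
    return sizes


load = [100.0] * 2 + [130.0] * 20  # a modest bump in traffic
print("boot at 100% CPU:", simulate(100.0, load))
print("boot at 20% CPU: ", simulate(20.0, load))
```

In this model, with booting nodes pinned at 100% CPU each new node adds enough to the average to justify several more, so the same small bump drives the cluster to the 40-node cap before it collapses back to 6; at 20% boot CPU it peaks at 8.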
Ben RadyMm-hmm.
Matt GodboltBut then, yeah.
So we rolled back to Ubuntu 22.
And I spent the last day and a half or two days trying to turn off systemd, trying to disable this part of systemd, trying to add every mount option known to mankind to the SquashFS mounts to say, for the love of God and all that's holy, don't track this.
I don't care.
I'm going to mount it and then I'm going to throw it away.
Ben RadyYeah, yeah.
Matt GodboltAnd neither I, nor my internet searches, nor any of the LLMs I asked could come up with a way.
Ben RadyYeah.
Matt GodboltIt just doesn't seem like there's a way to do it.
In fact, by the end of one session, Claude was saying, you really need to file this as a bug.
Ben RadyYeah.
Matt GodboltAnd I'm like, I don't... I'm probably doing it wrong.
I still think I'm doing it wrong.
Ben RadyYeah.
Matt GodboltRight.
Right.
And, you know, for the longest time, having 3000 things mounted has not really been an ideal situation.
And so that was fun.
Yeah.
You're pulling a face, like, oh...
Ben RadyYeah, I'm just thinking, is there really no way to tell systemd just not to track these things?
Matt GodboltI couldn't find it.
You can do all these sort of repressions and things.
and it still didn't help. Everything was still going through it; it's monitoring some mount thing.
Ben RadyUgh, it didn't work.
Yeah.
Matt GodboltThe best I could do was "kill -STOP" on the second systemd process, the user-level systemd process, which is essentially like breakpointing it.
Ben RadyYeah.
Matt Godboltthen mount them all, and then "kill -CONT" the process.
And that got rid of one of the 100 percenters.
But that process is sort of lazily on demand created.
So until you start doing work that needs it, it's not there.
And so my scripts were dying because they were trying to send it a STOP signal.
And it was like, well, that PID doesn't exist yet.
And then I'd log onto the machine and finally get through.
Yeah, computers, man.
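The stop, mount, continue workaround, including waiting for the lazily started process to appear, might be sketched like this in Python. `wait_for_pid` and `paused` are hypothetical helpers, and locating the per-user systemd PID is left to a caller-supplied `find_pid` function, since that detail isn't given here. Stopping a system daemon is a blunt instrument; this only mirrors the hack described above.

```python
import os
import signal
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


def wait_for_pid(
    find_pid: Callable[[], Optional[int]], timeout: float = 10.0, poll: float = 0.1
) -> Optional[int]:
    """Poll `find_pid` until it returns a PID or `timeout` seconds pass.

    Needed because the per-user systemd starts lazily, so it may not
    exist yet when the mount script begins.
    """
    deadline = time.monotonic() + timeout
    while True:
        pid = find_pid()
        if pid is not None:
            return pid
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


@contextmanager
def paused(pid: Optional[int]) -> Iterator[None]:
    """SIGSTOP `pid` for the duration of the block, then SIGCONT it, even if
    the block raises. A None pid means there is nothing to pause."""
    if pid is None:
        yield
        return
    os.kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        os.kill(pid, signal.SIGCONT)
```

Usage would be along the lines of `with paused(wait_for_pid(find_user_systemd_pid)): mount_everything()`, where both of those names are hypothetical.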
Ben RadyI mean, how do they even work?
Matt GodboltI mean, how even...
And so what I would ideally like is to not...
Well, first of all, I would like to have a much cleverer way of managing a large nest of SquashFS images and...
Ben RadyWow.
Mm-hmm.
Matt GodboltIn fact, this whole approach to mounting SquashFS images through NFS or whatever was something that I was doing around the same time that the company-wide solution at Aquatic was being developed.
So it's no surprise that the thing I've just been describing to you is familiar to you, at least in part, because some of the...
Ben RadyYeah.
Yeah, right.
Matt GodboltFor our audience, and I don't believe it's revealing any IP.
We will have to do some very creative cutting if it is.
But Aquatic has a solution for storing environments by effectively putting them in SquashFS images.
And, you know, that's not new either.
That's what snap images are.
That's roughly what AppImages are, and various other types of things.
Ben RadyYeah.
Matt GodboltThey just mount them.
So this is one of the ways that one solves a "hey, I just want an immutable bundle of things" problem.
Ben RadyYeah.
Matt GodboltAnd then one of the people who worked on that at Aquatic, who is also a Compiler Explorer committer, has come up with something a bit more clever: a set of symlinks that point from the file system to a well-known path, and that well-known path is managed by AutoFS, which mounts the thing on demand.
And so you still present this apparent file system that looks like it's got every compiler known to mankind, but only when you actually cd into the directory or try and run it
does the symlink get resolved,
and then that SquashFS image gets mounted and appears in that position, and you can carry on with your life.
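The symlink half of that design is simple to sketch in Python. The directory roots and the `link_tree` helper are hypothetical, and the AutoFS side (a map that mounts the matching SquashFS image the first time a path under the on-demand root is touched) is assumed to be configured separately.

```python
from pathlib import Path
from typing import Iterable


def link_tree(tree_root: Path, ondemand_root: Path, names: Iterable[str]) -> None:
    """Create tree_root/<name> -> ondemand_root/<name> for each name.

    Nothing is mounted here: resolving a link is what (by assumption) makes
    AutoFS mount the right SquashFS image under ondemand_root.
    """
    tree_root.mkdir(parents=True, exist_ok=True)
    for name in names:
        link = tree_root / name
        if link.is_symlink():
            link.unlink()  # refresh an existing link
        link.symlink_to(ondemand_root / name, target_is_directory=True)
```

The links dangle until something mounts the target, which is exactly the point: listing the tree costs nothing, and only following a link triggers a mount.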
And that's great.
And obviously if we'd designed it that way from the beginning,
it would have been fine, but retrofitting it into our current
Ben RadyYeah.
Matt Godboltsetup is really, really, really difficult.
And then you still have these problems.
So that gives you on-demand mounting, which is one thing you can do.
And you could try and configure AutoFS to do this, but there are a number of reasons why it's difficult, which are much too complicated to go into now.
So it doesn't just work out of the gate, although it sounds like it ought to.
But we don't really want 3,000 SquashFS images.
That's a pain.
I do want to have, like, here are all the GCCs.
And what you could do is mount subparts of those images into this unified tree, which means that, hey, I've got all the GCCs, but now GCC 15 has just come out.
I don't want to have to rebuild that image because that's 500 gigabytes of GCC and SquashFS is immutable.
Ben RadyYeah.
Matt GodboltYou can't add things after the fact; you have to unpack it and then repack it.
Ben RadyRight.
Matt GodboltSo if your solution for "I'm adding one more GCC" is to unpack all the GCCs and then repack them with the new one, then you're kind of back to that original AMI problem we had: it's going to get incrementally worse every time you add a new thing.
So what you really want, again, is something like: all of the GCCs from the last 10 years are in an "older GCCs" image, but each one is mounted individually.
Ben Radyyeah, yeah.
Matt GodboltIt's one image that has like the old ones.
Ben RadyYeah, right.
Matt GodboltAnd then periodically you add kind of layers.
This is sort of like a layered file system thing.
Ben RadyYeah, right.
Matt GodboltAnd then you consolidate the layers.
You can have a process that goes away and says, hey, layers three through nine, I can now net them out and make a new layer three, and then I rewrite the file system to be these things.
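The layering and consolidation bookkeeping can be modelled in a few lines of Python. This is a sketch only, assuming each layer is one image holding whole top-level directories; `resolve`, `consolidate`, and the image names are hypothetical, and building the merged image itself is out of scope.

```python
from typing import Dict, List, Optional

# Hypothetical model: a layer is one SquashFS image, described as a map from
# top-level directory name (e.g. "gcc-12.1.0") to the image file holding it.
Layer = Dict[str, str]


def resolve(layers: List[Layer], name: str) -> Optional[str]:
    """Return the image that serves `name`; newer (later) layers win."""
    for layer in reversed(layers):
        if name in layer:
            return layer[name]
    return None


def consolidate(layers: List[Layer], start: int, stop: int, new_image: str) -> List[Layer]:
    """Replace layers[start:stop] with one layer backed by `new_image`.

    Building new_image itself (unpacking those layers and repacking them)
    is assumed to happen elsewhere; this only rewrites the mapping.
    """
    names = set()
    for layer in layers[start:stop]:
        names.update(layer)
    merged = {name: new_image for name in sorted(names)}
    return layers[:start] + [merged] + layers[stop:]
```

In these 0-based terms, "layers three through nine become a new layer three" is `consolidate(layers, 3, 10, "layer3-v2.img")`; the base layer of older GCCs is never rewritten.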
Ben RadyYeah.
Matt GodboltSo that's where we want to go ultimately with this.
But there isn't a quick MVP for it that gets me out of my current hot potato situation right now.
Ben RadyRight.
So you're back to Ubuntu 22.
Matt GodboltAnd we've tried it.
We're back to Ubuntu 22.
Ben RadyYeah.
Matt GodboltAlthough, although...
Ben Radyyeah
Matt GodboltI had a sort of an idea.
So while banging my head on my keyboard, trying to work out, like, how does systemd even notice when I mount things or when I do stuff to the system?
Ben RadyMm-hmm.
Matt GodboltI was like, wait a second.
What if I wrote something that looked at file system accesses?
So every file that's stored in a SquashFS image somewhere is also available naked on NFS, because if SquashFS isn't around, or for those things we genuinely update quicker than the SquashFS images, it gives me access to the files that are just in that /opt/compiler-explorer, right?
Ben Radyright
Matt GodboltSo if I could post hoc, that is, run some trace through the whole system and say, hey, anytime I notice somebody accesses a file that is on NFS, that is inside one of the directories that I have a SquashFS image for, that's when I'm going to choose to mount it.
Ben RadyHmm.
Matt GodboltAnd for the first few accesses, they're still reading from NFS, but once that mount has finished, we sweep in, sweep, that's not a word, swap in or flip in or something, I don't know, one of those. Sweep in.
Ben RadyYeah.
Yeah.
Right.
Yeah.
But eventually when it mounts, yeah.
Uh huh.
Sweep in.
Fly in.
Matt GodboltFly in the mounted SquashFS image over the top. And so that is what I've been doing for the last two hours, which is why I was like, let's just talk about it; it's top of mind for me right now, and it's showing some early promise. So in this world, what we would do is not mount anything up front.
Ben RadyYeah.
Matt GodboltAnd then we'd run this daemon, and all it does is sit there and watch file accesses and then sort of lazily bring in the SquashFS images.
Ben RadyYeah.
Yeah.
Matt GodboltAnd obviously, in the worst case, a Compiler Explorer node will eventually mount all of the images.
But likely as not, it'll never get close.
Ben RadyI was actually going to ask you about that. Does that mean that as the Compiler Explorer nodes are running, they're just slowly accumulating these SquashFS mount points?
And do you need to like, clean them up on any sort of regular basis or do you just restart the machines every once in a while?
Like, how does that work?
Matt GodboltWell, so in the current situation we just mount them all at startup and they're up forever.
So that's the end, right?
Ben RadyOkay.
Matt GodboltSo this would be a mark.
Ben RadyAnd the only memory they're really consuming in that state is just that small amount of kernel memory that you were talking about before.
Matt GodboltYeah, which isn't that small.
I can't remember what it was, but it was non-trivial, enough that you can see the machine's free memory go down once it has finished mounting everything.
Ben RadyYeah.
Matt GodboltOh, well, which is less than ideal.
Ben RadyMm-hmm.
Matt GodboltSo yeah, at the moment we pay the cost for all of them, even though we have, like, GCC 1.23.
Ben RadyYeah.
Matt GodboltHow often do people use that?
Probably not very often.
And again, we have 40 nodes.
They're recycled really quickly.
It's very unlikely that any one node will need all 3000 in its lifetime.
Ben RadyMm-hmm.
Mm-hmm.
Matt GodboltSo in this new world order, the way that I was imagining it is at least for V1, we just mount them and leave them up because it's no worse than what we have before.
Ben RadyMm-hmm.
Matt GodboltIt might take an extra
half a second, even a second, to mount on the first access.
But we're not holding up the compile in that case, it's just going through the slow NFS path.
Ben RadyYeah, yeah.
Matt GodboltAnd also, obviously, the SquashFS images mean extra NFS accesses that we didn't need to do otherwise, but the hope is it'll net out pretty quickly.
And then by the time we either run it again, or even by the time it's finished the first reading of the ELF and it's starting to look at the DLLs that it needs to load in, it's going to pull the DLLs from the SquashFS image.
So that is the hope.
We'll see how it goes.
Ben RadyMm-hmm.
Matt GodboltI have already found one situation where the SquashFS image is not actually up to date with respect to the changes on the disk, which is like, oh, well, this is going to throw a wrench, a spanner in the works.
Ben RadyOK.
yeah.
Yeah.
Matt GodboltSo that is an issue.
Ben RadyInteresting.
I guess you could have the daemon that you're writing actually check for that, right?
Matt GodboltIt would...
Ben RadyMaybe.
Matt GodboltI guess so.
I think I'm just going to go kind of caveat emptor.
We'll find them as we hit them, or maybe it'll be something we do as a post-process
Ben RadyYeah.
Matt GodboltThat's just running and looking for things that are out of date, you know, and yeah, go ahead.
Ben RadyYeah.
I guess the other thing you could do with that, and you probably have this already, is farm that thing for usage statistics, right?
Matt GodboltYes.
In fact, about a year and a half ago, we changed the privacy policy and our backend to track statistics. You know, we don't like to track things.
That's not what we're into.
Ben RadyYeah.
Matt GodboltI don't care what you're doing with it, really.
But it is incredibly useful to say, how often do we use this compiler versus that compiler, which I think is a fair use of non-identifiable information.
Ben RadyRight.
For exactly problems like this, where you're just like, I'm optimizing these things.
I want to optimize them based on the usage, not on like, you know, random things.
Matt GodboltExactly.
And so certainly, down the line, we can do things like, hey, let's have a cluster that only does legacy compilers, right?
And then that cluster just sits there and lives there forever.
Ben RadyYeah.
Matt GodboltThere are two machines that run all the time.
They sit there in their old-timey world and take requests for GCC 1, 2, 3.
They're on the front porch with their shotguns across their laps.
Ben RadyRight, with the rocking chairs.
Matt GodboltYeah, that's right.
Waiting for, you know, the... yeah.
And then we could even have, conversely, some faster nodes that are serving the GCC 15.1s that have just come out, and the trunk builds and things like that.
Ben RadyMm-hmm.
Matt GodboltBut our management is not good enough.
At the moment, having multiple clusters is painful for us.
And so we have kind of two or maybe three clusters: one for the GPU things, one for the ARM compilers, and then one for everything else.
Ben RadyYeah.
Matt GodboltAnd retrofitting everything without breaking the site is so hard.
This is why, you know, when we had our conversation and you talked a little bit about your sort of branch-based deployment, spinning up that...
Ben RadyYeah.
Matt GodboltThat was like, oh man, I wish.
I wish we'd thought of that ahead of time.
Ben RadyRight.
Just twist the knife.
I mean, you know, I think it sounds like at the scale that you guys are at right now, just spinning up one of those environments would be prohibitive in terms of cost, but maybe you could structure it in a way where it wasn't quite so bad, you know?
Matt GodboltSo cost is not such a big deal anymore.
And I say that with a massive footnote.
Ben RadyYeah.
Matt GodboltPeople are surprised at how relatively cheap Compiler Explorer is to run.
We're currently at around three grand a month burn rate of AWS stuff, although it's just gone up.
Ben RadyYeah.
Matt GodboltBut it's gone up for good reasons.
The good reason is that I sent a message out, a blog post, in fact.
That's what we call these things, a message.
I made a blog post kind of explaining...
Ben RadyYou made one internet and you sent it out into the webs.
Matt GodboltI did.
I sent it out into the webs about the cost breakdown of Compiler Explorer.
I did a big sort of dive into it so I could justify, you know, we're very lucky.
We have a lot of commercial sponsors.
We have a lot of people who donate on Patreon and GitHub Sponsors and...
And my dog's barking, which I can't be bothered to edit out.
So we have a lot of money coming in, which is fantastic.
And I like to be very upfront and open about what we do with the money as much as I can within the reasonableness of the fact that it's still
Ben Radyhu
Matt Godboltmy private finances at some level still, right?
Ben RadyRight.
Matt GodboltI'm sort of hand-waving and gesturing about this stuff.
So I like people to know where the money's going.
And so I was telling people, like, this is what we spend it on and this is how it breaks down.
It was interesting.
And so it ended up on Hacker News, which was great.
And one person said, have you ever considered talking to, you know, the Grafana people or the...
SolarWinds or whatever, you know, the people that we pay money to for subscriptions for like monitoring stuff.
And I was like, yeah, kind of, but, you know, there's something to be said for the fact that it's not that much.
That's costing me, you know, 40 bucks a month.
That's kind of noise.
And I don't want to give up too much of my... you know, I don't want to sell out.
I would rather pay 40 bucks a month than have them say, hey, you have to put an ad on the top.
Ben RadyRight.
Matt GodboltOr say "thank you to..." And I'm like, I don't know if they would do that, but we went back and forth on that.
Ben RadyYeah, yeah.
Right.
Matt GodboltAnd then he was like, Oh, you do know that AWS have an open source budget.
And I'm like, the what now?
Ben RadyOh, yeah.
Matt GodboltAnd so I looked it up, and it was a blog post from like 2012 that made some mention of, you know, like, hey, if you're an open source project, contact us.
We might be able to help you.
Here's a form to fill in.
Ben RadyYeah.
Matt GodboltAnd I'm like, okay.
Now, the form looked complicated.
So given how old the blog post was, I just emailed the email address that it said and said, hey, is this thing still on, right?
Ben RadyYeah, right.
Matt GodboltI'm, you know, I'm the creator of Compiler Explorer.
I'm interested in talking if this thing's still on.
I immediately got an email bounce and I thought, well, there you go.
That tells me. I'm glad I did this rather than spending all the time looking at it.
Ben RadyYeah.
Matt GodboltSo I thought nothing more of it.
An hour and a half later, somebody replied going, "oh, wow.
Yes.
We use Compiler Explorer.
We'd love to help; definitely fill in the form and send it to us." And I looked back at the bounce and it was like, oh, it must be that one person's inbox is full
Ben RadyOK.
Oh, yeah, OK.
Matt Godboltof the distribution list that it ultimately ended up on.
Ben RadyYeah.
Matt GodboltSo I filled it in, and it asked how much money your site or your project might need for a year.
And so I was like, I guess three grand times 12, or four grand.
I can't remember exactly.
I think it was three, three and a half grand.
So 36 grand, which, you know, is like a monstrous amount of money.
Ben RadyYeah.
Matt GodboltAnd I was expecting him to just say, "Hahaha, no, seriously, how much do you need?"
And I thought, again, nothing more of it.
Someone much more business-oriented got back to me and said, we'll be back.
We'll get back to you with it.
Thank you for your email.
We'll be back to you within 30 days.
Ben RadyYeah.
Matt GodboltAnd I thought, all right, now it's gone into bureaucracy.
We'll never see it again.
Ben RadyRight.
Matt GodboltAnd I just happened to log into my AWS account and I saw 36 grand of credit had been applied to
Ben RadyWhat?
Yeah.
Matt Godboltit. And I had to email them back and say, like, okay, before I even talk to anyone...
Do I have to tell people about this?
Do I have to go out of my way to like, thank you?
Do I?
What's the deal here?
Ben RadyYeah.
Matt GodboltAnd the woman eventually got back to me: "Oh, sorry.
I meant to tell you that you'd been approved."
I'm like, were you just going to let that hang?
Wow.
And so the short version, and this is such an off-topic thing, is that AWS are now funding the cost that I put forward, the year's cost as it was last year. Which means immediately you're like, oh, maybe we can start running more instances and whatever, because I have the cash for it now. So it feels like I'm treating it like a subsidy, but what I'm mindful of is that they may not renew at the end of this year, in which case I have to be able to scale everything back again. So there's a bit of thought...
Ben RadyYeah, yeah, yeah.
Matt GodboltI don't know.
It's an interesting situation to be in, but it's an amazing situation in the short term.
It means I'm looking at Redis caching, whereas before I was like, I don't really think I can justify another $100 a month just to have something that I might not use, or these kinds of things.
Ben RadyYeah.
Matt GodboltSo it's very exploitative.
So I'm excited about that.
But how did we get onto this?
Ben RadyOh, that's cool.
That's very cool, actually.
Matt GodboltI'm forgetting.
Oh, yeah, you were saying about how expensive it might be.
Ben RadyYeah, you know, the branch based stuff might be too expensive and, you know.
Matt GodboltBut now it might be okay.
Honestly, I look at my load balancer now.
Ben RadyYeah.
Matt GodboltSo load balancers cost, what, $10 a month plus the transfer?
Maybe a little more...
Ben RadyYeah, it's the data that you really pay for there, right?
Yeah.
Matt GodboltIt is, yeah.
And I've often wanted to have multiple load balancers, you know, one for each.
I used to have subdomains for like our staging environment and things like that.
Ben RadyYeah.
Mm.
Mm-hmm.
Matt GodboltAnd then I made it so that every subdomain of godbolt.org, you know, comes to Compiler Explorer.
So you have to be like saying, www.staging.godbolt.org.
Ben RadyMm-hmm.
Matt GodboltAnd then you're into multi-level DNS and that's a pain in the bum because you can't do wildcards and all that kind of stuff.
You know, you know these things.
Ben RadyYeah.
Yeah.
Yeah.
Matt GodboltBut I stopped doing that, because I was originally using it to route to a different load balancer.
But, you know, having one load balancer per environment was expensive.
So everything now goes to the same load balancer, and it's URL-matched to go off on its merry way.
And that's not scaling all that well now.
Ben RadyRight.
Yeah, yeah.
Matt GodboltI've got multiple of them and there's other things on there.
And you know there was a time when 10 bucks a month for another load balancer was like meaningful.
And now...
Ben RadyRight.
Matt Godboltnot to, you know, put too fine a point on it.
Ben RadyYeah.
Matt GodboltThat's noise.
I don't worry about it.
So maybe I should explore branch-based development.
Ben RadyIf you can spend 10 bucks a month to make your life a little easier, I would suggest that you do it.
Matt GodboltYeah.
Yeah.
That is the cost.
I think right now it's, you know, the trade-off against the cost of time.
Ben RadyYeah.
Matt GodboltAnd I supposedly have three presentations to prepare for, and I shouldn't be doing Compiler Explorer stuff.
And then I've got kind of two and a half clear months before I have to go and work at a real job.
Ben RadyWhat - a job?!
Matt GodboltI know.
Ben RadyThat sounds terrible.
Matt GodboltI know it does.
Ben RadyYeah.
Matt GodboltDoesn't it?
It sounds awful.
So yeah, it's very much top of mind, thinking about how I'm going to go back to work, Ben.
I don't know.
Ben RadyYeah.
I think in the first week you're going to be like, this is awesome.
That's what I predict.
You're just going to be like, oh, right.
Matt GodboltI reckon so too.
Ben RadyI remember why I love this.
Matt GodboltYeah, I think you're absolutely right.
Ben RadyYeah.
Yeah.
Matt GodboltI'm pretty bullish about it.
I check in with my new gig from time to time.
And, you know, I always come away feeling excited.
Buoyed [in a British accent: "boid"], as I would say, or "booid", would you say, as a Yank?
Ben RadyI would probably say buoyed.
But I wouldn't say either of those words.
I would just say excited because it's just too nautical.
Matt GodboltYou'd say excited.
Yeah, that was...
That
Ben RadyI'm not.
Matt Godboltsays a man who works for a company called Aquatic.
Ben RadyYeah.
Matt GodboltYeah.
Bowie.
Ben RadyWe actually have a service at Aquatic called buoy, and I trip over it every time I say it or spell it.
Matt GodboltWhich...
It's...
Ben RadyBuied.
Matt GodboltSo in British English, that is boy.
It's always been boy.
Like, you know, what is the property of being able to float?
Ben RadyYeah.
Matt GodboltIt is...
Say it.
Ben RadyBooeyant.
Matt GodboltHey, yeah!
Listener, you should have seen the contortions on Ben's face as he tried to justify pronouncing it that way.
Yeah.
Yeah.
ah yeah
Ben RadyOkay.
Point taken.
Matt GodboltBut, you know, very few of these language-based justifications hold water if you start looking too deeply, because English is not very logical.
Ben RadyYeah.
I think, well, none of them hold water with
Matt GodboltAnyway.
Ben Radybuoyant because they float.
That's what you're doing.
Matt GodboltBut do...
Ben RadyIt's... never mind.
I'll go home.
Matt GodboltMaybe we should actually... somehow we've been talking for, well, I've been talking, and you and the listener have very kindly been listening to me vent my spleen.
Ben RadyOh, no.
I mean, I love these war stories.
I think we should do more of these.
It's like, let me tell you about this bug that consumed two days of my life.
Matt GodboltYeah.
Ben RadyThose are great.
Matt GodboltI mean, I think it's valuable sometimes to hear them.
It's fun to tell them, but sometimes it's nice to hear them, because then you secretly go back to your desk.
Ben RadyMm-hmm.
Matt GodboltYou're like, I don't feel so bad about spending four hours tracking down this thing now.
Ben RadyRight.
Yeah.
You know, it's like the old thing about hacking and programming in movies: people with, you know, one hand on each keyboard, and all the things scrolling by on the screen, and the charts and graphs.
And in reality, it's just staring at a stack trace for 30 minutes going like, "I am bad at my job" [whispered].
Matt GodboltYeah. Well, in case there was any doubt, you're not bad at your job.
I don't think I'm bad at my job, but yeah, feeling that way occasionally is...
Ben RadyYeah, that's just how you feel.
You're just like, oh, well how did this ever work?
I don't understand how it even ever worked, let alone what's happening now.
Matt GodboltYeah...
Ben RadySo, yes.
Matt GodboltYeah, like, who wrote this...?
You know, the git blame?
And you're like, oh, oh, yeah.
Ben RadyRight.
Oh, it's me.
Yeah.
Matt GodboltAll right, friend.
Well, short of starting a whole new conversation, I think we should finish it up here, unless there's anything you want to add. Parting words of wisdom?
Ben RadyNo, that sounds good.
This was a good episode.
Matt GodboltThose were your...
Ben RadyI like it.
I dig it.
Ship it.
Matt GodboltFantastic, mate.
I will.
I will.
Ben RadyYeah.
Matt GodboltI will get the minions to edit and put it out soon.
And by the minions, I mean me.
Ben RadyPerfect.
Matt GodboltThere are no minions.
Ben RadyRight.
Matt GodboltCool.
Ben RadyCool.
Matt GodboltAll right, friend.
Until next time.
Ben RadyUntil next time.
