Episode Transcript
Hey, Ben.
Ben RadyHey, Matt.
Matt GodboltSo we yeah planned comprehensively, as always, and today's topic is going to be signals and processes.
Ben RadyYep.
Yeah, that was, that's
Matt GodboltAnd that is the full extent of our planning.
Ben RadyWe said those words out loud annnnnd...
Record.
Matt GodboltAnd I said, yes, and hit record.
And then we continued talking about it during the intro.
And we're here.
So why is that top of mind for you?
Is there a reason why you are worrying about this right now?
Ben RadyThere's a reason that I'm worried about this right now, which is that I'm always worried about this because I see part of my job as a software engineer is making sure that the software that I write actually runs...
and does what it's supposed to do.
Matt GodboltMm-hmm.
Ben RadyUh, I know that there are lots of places in the world where as a software engineer, you're expected to write code.
And then there's another group or team or organization or outsourced company that is responsible for actually taking that software and running it on computers and making sure that it continues to run on those computers and that it delivers the value that it is intended to deliver.
And in some cases, those things are like very separated, right?
Matt GodboltRight.
Ben RadyLike,
Matt GodboltYou might just make a PR to a function.
You change the function, your tests pass, you check it in, and then you have literally no idea how it ends up serving people's requests or whatever it is your company does.
Ben RadyRight.
Matt GodboltYeah.
Ben RadyRight, right.
And then on the other end of that spectrum, I think you can have situations and I have definitely been in these myself where it is like, no, we're building this for the very first time.
There's no infrastructure team.
There's you, and you are going to compile your code, you are going to SCP your code onto a server somewhere, and then you are going to run a screen and then exec that program in the screen.
Matt GodboltAh, old school.
Ben RadyAnd now you can post in a Slack channel or some other log place, hey, we've deployed to production.
Matt GodboltAnd by production you mean, yes, the only reason it didn't quit is because I'm still running it in a screen [session].
Ben RadyYes, exactly.
Matt GodboltThis is shades of
Ben RadyI did Control A, Control D in the screen, and now our production environment is safe.
Matt GodboltEverything's fine.
Ben RadyYes.
Matt GodboltEverything's fine.
What's your logging strategy?
Oh, we log back in and we reattach to the screen session to see what happens.
Ben RadyYou check the screen.
What's in the screen?
Matt GodboltYeah!
Okay.
That does work, but I can understand why, yeah, you might want something a little more sophisticated.
Ben RadyYes.
Well, and those are the two ends of the spectrum, I think, if we're going to simplify it down to a spectrum of like,
Matt GodboltYeah.
Ben RadyAnd I think that in your career, and I have done a lot of this as a software engineer, you can kind of hop to the left, I don't know, side of that spectrum and say, all right, well, okay.
I obviously don't want to run it in a screen.
What else could I do?
And then you start learning about like systemd and things like runit and supervisord and things like that.
Matt GodboltOr old school nohup was my...
Ben RadyYeah.
Right, nohup.
Matt GodboltJust nohup the thing and then log out and you're done, right.
Ben RadyExactly.
And then of course, you start moving into distributed environments, the cloud, you learn about Kubernetes and Elastic, what does the ECS stand for?
I forget.
and Elastic Compute Service?
Matt GodboltContainer store, compute something, container something.
Ben RadyNo, like Container Service.
Matt GodboltYeah.
Ben RadyYeah.
Matt GodboltOr you've got what's, what's the HashiCorp thing?
Ben RadyNomad.
Yeah.
Matt GodboltNomad, Nomad, similar things, you know, yeah.
Ben RadyNomad.
Yeah, yeah.
Matt GodboltAll of these things, which are like orchestration setups that say, hey, you just tell me, through some mechanism, what you would like to have running and I'll find a place to run them and run them in a particular controlled way.
Ben RadyUh-huh.
Matt GodboltAnd then you take that part of the deployment and running part is taken out of your hands.
It's done by a framework, but.
Ben RadyBut
Matt GodboltPresumably.
Yeah, go on.
Ben RadyBut all these things are accomplishing what is fundamentally the same goal, which is I have produced software and I want it to run on a computer or maybe multiple computers, maybe not multiple computers.
Matt GodboltYeah.
Ben RadyIt's like, oh, this needs to run, like exactly one, right?
Matt GodboltExactly one or [none at all].
Ben RadyLike there can only, there's like something is consuming a queue and there better only be one of them at a time or bad things are going to happen, right?
Matt GodboltYeah.
Ben RadySo I think all of that is kind of encompassed in this in this topic of like, I'm trying to run a program and how do I actually make sure that is happening the way that I want?
Matt GodboltYep.
Ben RadyAnd I think that we could even structure this from sort of the bottom up, right?
So we started with screen and I'm just running screen and now I've got a process and it's executing.
Matt GodboltWell, even screen is one level too far from, "I literally run the process and it's there and I'm watching it, and I'll Control-C it", which you know is also valid, but it gives us a sort of starting point of like, what happens when you fire up a process and why is that not okay?
Ben RadyYeah, right.
Right.
Yeah, that's great.
I love this.
Okay, so it's like, so when you do that, you're like, all right, my plan for deployment now is I'm going to SSH onto the production server or EC2 instance or whatever you got, and I'm going to copy and SCP my bits up there, and then I'm going to run
Matt GodboltYeah, and let's not get into packaging and deployment.
That's even more complicated.
Ben RadyYeah, right.
Matt GodboltLet's leave it at that.
Ben RadyYeah.
Matt GodboltSome magical process happens and you have the bits that you need on that machine.
Ben RadyYes.
Yes.
I have my executable bits on the machine and then I'm just going to run it.
Matt GodboltAnd then...
Ben RadyWell, now what you have is a process that's a child of an sshd process, right?
Matt GodboltProbably a child of the shell that you ran it on, depending on how you do it.
Ben RadyOh, yeah.
No, yeah.
That's right.
Yeah.
Matt GodboltI mean, if you're going to...
Ben RadyIf you're if you're looking at, no, no, you're absolutely right.
So if you got the tree, it's like, okay, it's a child of bash.
Matt Godboltpstree will show...
Ben RadyAnd then Bash is going to be a child of sshd.
And then that's going to be a child of the parent SSH server.
And then that's probably going to be a child of init, right?
Like roughly, am I
Matt GodboltOr which nowadays is probably systemd.
Ben RadyYeah.
Matt GodboltThorin, son of Thrain, son of Thror.
Ben RadyRight.
Matt GodboltIt's going to be your program, son of Bash, son of sshd child process, son of sshd parent process.
Ben RadyRight, right.
So.
Matt GodboltYeah, got it.
Yes, that makes sense.
All right.
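[Editor's aside, not from the episode: the "son of Bash, son of sshd" chain can be printed from inside a process. This is a minimal sketch that assumes a Linux machine with /proc mounted; the helper names `parent_of` and `ancestry` are made up for illustration.]

```python
import os


def parent_of(pid):
    """Return (name, parent_pid) for a process by reading Linux's /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Format is "pid (name) state ppid ...". The name may itself contain spaces
    # or parentheses, so cut around the first "(" and the last ")".
    name = stat[stat.index("(") + 1 : stat.rindex(")")]
    ppid = int(stat[stat.rindex(")") + 2 :].split()[1])
    return name, ppid


def ancestry(pid):
    """Walk from pid up through its parents, returning [(pid, name), ...]."""
    chain = []
    while pid > 0:
        try:
            name, ppid = parent_of(pid)
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            break  # no /proc here, or the process is hidden from us
        chain.append((pid, name))
        pid = ppid
    return chain


if __name__ == "__main__":
    print(", child of ".join(f"{name}({pid})" for pid, name in ancestry(os.getpid())))
```

Run over SSH, the chain would typically pass through the shell and the sshd processes before reaching PID 1, much like the output of pstree.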
Ben RadySo if you naively, or maybe like not naively, but you just sort of like just have enough knowledge to be dangerous, you're like, oh, I've got the ampersand operator in Bash that I could put at the end of that.
Matt GodboltYeah.
Ben RadyBecause it's like, okay, cool.
The production server is running on my laptop.
And if I put my laptop to sleep or, you know, the SSH session, the client is on my laptop.
The server is a server, but it's like, all right, I started on the server.
Matt GodboltYeah.
Ben RadyNow I want to go home.
I need to close my laptop lid and I need to leave.
Well,
Matt GodboltYeah.
Ben RadyWhat exactly is going to happen if I close this lid?
Like, I don't want it to stop, right?
So you're like, okay, well, here's what it...
Matt GodboltWell, let's talk about what happens in that situation, just to be absolutely clear.
Ben RadyYeah, okay, okay, go.
Matt GodboltRight.
So let me read this back to you.
So you're saying, yeah, you're running, as described, the production binary having SSH'd into a machine, and you've closed your laptop lid.
Ben RadyYes.
Matt GodboltAll right.
So assuming, yeah, assuming you just close your laptop lid and nothing shut down nicely, it just literally suspended, I don't actually know exactly what your laptop will do in this situation.
Ben RadyMm-hmm.
Matt GodboltBut let's just assume it disappears off the network instantaneously, which is also completely reasonable if you go into like a tunnel on the train on the way home, that kind of thing.
Ben RadyRight.
Yeah.
yeah
Matt GodboltRight.
Then eventually the TCP connection between your computer and the SSH daemon on the remote end will time out.
There'll be a keepalive that's missing, probably, or some other heartbeat mechanism will go down.
And the SSH daemon will say, hey, that person's gone now.
It's time to clean up their session.
It will, I think, kill the bash process.
And the bash process then will kill all of the children that it knows about, something like that.
Or there's some...
sig...Yeah, so this is this is kind of it, right?
So what what is...
Ben RadyYeah, signals and processes, right?
Matt GodboltYeah.
I mean, I know that the result will be: my program will die.
Exactly how that dies, I'm not 100% sure, but that's what would happen eventually, maybe five or six minutes later when the SSH daemon times out your connection and says this person's not there anymore.
Ben RadyYes.
Matt GodboltIt kills the process tree through some mechanism and then, yeah, you get a phone call as you've got onto the train telling you that the production system is down.
Ben RadyRight.
Matt GodboltPlease fix it.
Ben RadyExactly.
Exactly right.
And so, and this is maybe where we troll our listener into, into posting the right answer on the internet to this, because I would suspect what probably happens is that the SSH daemon kills like the process group.
Matt GodboltOf course.
Ben RadyRight?
Matt GodboltYeah, because Bash becomes a a process group controller or whatever the name is...a leader.
Ben RadyYeah.
Matt GodboltProcess group leader.
That's right.
Ben RadyYeah.
Matt GodboltWhere's my Stevens book?
I haven't got it here.
No.
But yeah, there's...
Ben RadyYeah, but it's probably going to send a SIGTERM to that process group.
Matt GodboltOkay.
So...
Ben RadyAnd so every process in the process group is going to receive that term signal and then hopefully gracefully shut down.
I don't know if it follows it up with like a SIGKILL at some point or not.
Maybe it does.
Maybe it doesn't.
I'm not exactly sure what sshd would do there.
Matt GodboltNo.
No, but that would seem reasonable so that you don't end up with loads of processes that just decided not to kill themselves.
Ben RadyYeah, yeah.
Matt GodboltAnd frankly, I think Bash will probably do the right thing for that circumstance.
Ben RadyYeah, yeah, yeah.
So day one, we try to deploy like this.
Matt GodboltOkay.
Ben RadyWe close our laptop lid, we go home, we get the unfortunate call, and then we rush home, and then we open the laptop lid back up, and then we rerun the process.
All right, well, I can't do that.
So an enterprising person might say, okay what I'm going to do is I'm going to use the bash ampersand command because I know that will put a process into the background, right?
Matt GodboltRight.
Ben RadyAnd so I'm going to do that next time.
I'm going to run, I'm going to do my deploy.
I'm going to put an ampersand on the end, right?
And then I'm going to like, now it's running in the background and now I shouldn't have to worry about this.
Matt GodboltAnd yeah.
Although if I were to do that with like a process like we were just talking about, the very first thing I would notice is that my shell prompt comes back and then immediately loads of junk from my log file is now appearing over top of what I'm running.
Ben RadyYes.
Matt GodboltSo even before we get into processes and threads, there's like a pragmatic thing.
Ben RadyYes.
Matt GodboltSo what I would probably do is redirect output to, you know, ~/log.txt, and then we'll put the ampersand on the end.
Ben RadyRight, exactly.
Matt GodboltSo that idea already, right.
So good.
And now it's in the background and I think we're great.
And I, you know, I'm tailing that log file for a bit and that's safe because that's a separate process.
Ben RadyYep.
Matt GodboltAnd now I close the laptop lid and get on the plane, a plane, train, whatever, any mode of... what happens now?
Ben RadyRight.
Right.
Well, I think what happens, and I think this because I've had this burn me from time to time, is that, yes, you redirected standard out, but you did not redirect standard error.
And so the daemon actually still has a file handle that it thinks it needs to be writing to, back to your thing.
And so you put this in the background and you do this again and it breaks again.
It does exactly the same thing all over again.
Matt GodboltWell, I think there's more than one reason.
So yes, first of all, standard error isn't going anywhere useful.
Ben RadyRight.
Matt GodboltThe second thing here is that although it is in the background, it's still a child of Bash.
Ben RadyYeah.
Right.
Matt GodboltSo it's got you coming both ways.
And maybe thirdly, thirdful, is its standard input is still potentially connected to...
Ben RadyMm-hmm.
Matt Godboltthe console, the terminal, something.
I'm waving my hands a lot here because that's a very, I'm less sure about it.
But I certainly know that if you try and read from the console, you'll get one of the even more esoteric signals about like, hey, yeah, you can't, you're not connected to it right now.
[editing matt here, SIGTTIN most likely] And then you'll get stopped.
Ben RadyYeah.
Matt GodboltAnd so you'll see in Bash, stopped inputting required or something weird like that.
Ben RadyMm-hmm.
Matt Godboltum
Ben RadyMm-hmm.
Matt GodboltSo all of those would defeat you and you end up with a dead process.
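[Editor's aside, not from the episode: the three problems just listed (stderr still attached, still tied to the shell, stdin still on the terminal) can all be addressed at launch time. This is a hedged sketch using Python's subprocess module; `launch_detached` and the log path are illustrative names, and a shell user would get a similar effect with redirects plus nohup or setsid.]

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(argv, log_path):
    """Start argv with no ties to our terminal: no stdin, all output to a log, own session."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,  # reads see EOF instead of touching the terminal
            stdout=log,                # standard out goes to the log file...
            stderr=subprocess.STDOUT,  # ...and standard error goes to the same place
            start_new_session=True,    # setsid(): leave the launching shell's session
        )
    finally:
        log.close()  # the child holds its own copy of the descriptor


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
    program = (
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr); "
        "print('stdin gave', repr(sys.stdin.read()))"
    )
    child = launch_detached([sys.executable, "-c", program], log_path)
    child.wait()
    with open(log_path) as f:
        print(f.read())
```

The order of the lines in the log can differ from the order of the print calls, because standard out is buffered when it points at a file and standard error is not.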
Ben RadyRight, right.
So this is where you start investigating all of the various options that you can pass to SSH when you run this, because you're like, I'm going to make a script.
I'm going to make a script that works, and I'm just going to run the script, and it's going to do my deploy, and then I'm going to trust that it works.
And you start learning about, OK, well, I need to do the option that like doesn't read from standard in, because I don't want the standard in problem.
And then I got to make sure that I redirect standard out and standard error so I can put this thing in the background.
Matt GodboltRight.
You're saying this right.
Just to be clear, these are options you pass to SSH or to Bash?
Ben RadySSH, right?
Matt GodboltOh, I see.
So now, now we're not going to run bash at all.
We're just going to run the executable directly.
And, or what were you thinking?
Ben RadyWell, so you're going to run.
So I'm thinking of the world where it's like you do a thing, you like copy the bits up to the machine.
Matt GodboltUh huh.
Ben RadyAnd then you have like a separate SSH call where you're passing the command that you want to run as an argument into SSH.
Matt GodboltRight.
So you're no longer running an interactive session.
You're just going to...
Yeah, that makes sense.
Ben RadyRight?
Matt GodboltOkay.
Then that takes Bash out of the equation, which helps us a bit in this context.
Ben RadyYeah.
Matt GodboltAlthough there is still another Bashian solution that I think I see people go for, which is you type disown in Bash, which says, push this thing and make it not a child of this process anymore.
Ben RadyAh, yeah.
Uh-huh.
Matt GodboltAnd that probably, probably...
might solve the problem most of the time, except you've left a big like rake in the grass for that because there are other processes in the system that might wish to get rid of that apparently now orphaned process.
Ben RadyYes.
Matt GodboltSo...
That's what nohup's for.
It's like it gets rid of the hang-up and there's some other things that it does.
And then there's daemonization and other bits and pieces, which which I'm sure we'll get to in a second.
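[Editor's aside, not from the episode: the "hang-up" in nohup is the SIGHUP signal, and the core of the trick is that an ignored signal stays ignored in the program that is started afterwards. A small sketch of the difference, with the helper name `survives_hangup` invented for illustration:]

```python
import signal
import subprocess
import sys


def survives_hangup(ignore_sighup):
    """Start a child, send it SIGHUP, and report whether it is still alive afterwards."""
    disposition = "signal.SIG_IGN" if ignore_sighup else "signal.SIG_DFL"
    program = (
        "import signal, time; "
        f"signal.signal(signal.SIGHUP, {disposition}); "
        "print('ready', flush=True); time.sleep(30)"
    )
    proc = subprocess.Popen([sys.executable, "-c", program], stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the child has set its SIGHUP disposition
    proc.send_signal(signal.SIGHUP)
    try:
        proc.wait(timeout=1.0)
        return False  # the hang-up killed it
    except subprocess.TimeoutExpired:
        proc.kill()  # clean up the survivor
        proc.wait()
        return True


if __name__ == "__main__":
    print("default SIGHUP handling survives:", survives_hangup(False))
    print("ignoring SIGHUP survives:", survives_hangup(True))
```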
But let's put that to one side and let's go down the rabbit hole that you've described, which is that like I'm now going to run SSH on my computer
Ben RadyOkay.
Yeah.
Matt GodboltUm, and I'm going to pass it, rather than just SSH,
I'm going to do /path/to/my/executable with all the redirects and things set and try and run it from a server and have it live on the remote machine with all of the pipes and things (stdin, stderr and stdout) all connected to sensible places.
So go ahead.
Ben RadyYes.
Yes.
Matt GodboltSorry.
That's where I cut you off.
Ben RadySo, so you do that and then you should, I believe, be able to SSH in separately and do like a pstree and see that the parent process of this, the parent of this process is now one because it is disconnected from what it was doing before from the process group that it was in before.
Matt GodboltRight.
Ben RadyUm, and at that point, you maybe have something where you can close your laptop and have it hang out.
Now, hopefully you sent your log somewhere sensible and you don't fill up the disk with logs.
You can pipe it into syslog, which is something that I do when I'm trying to punt on this problem entirely is I'm just like, you know what?
There's already a log rotation system on this machine and it's called syslog.
So I'm just going to pipe all my logs into that.
Matt GodboltRight.
And quite possibly you already have log aggregation set up for that so that you can go and read it on like a website and all that kind of nonsense as well.
Ben RadyMaybe you do if you're fancy.
Matt GodboltMaybe.
I mean, but yeah, if you're considering that option, you probably don't because you probably don't have any other infrastructure to lean on.
Ben RadyRight, right.
Matt GodboltYeah.
Okay.
So that seems reasonable.
Ben RadySo what do you do?
What do you do after this?
So you do this, you finally can go home now.
You can shut your laptop and go home.
And you're like, right, surely we can make this better than this.
Matt GodboltRight.
Ben RadyWhat do we do next?
Matt GodboltYeah, right.
So I still have...
Ben RadyDo you make the systemd job is what is...
I'm kind of questioning here.
Matt GodboltWell, see, I was thinking another thing.
So there is a process...
Process is a terribly overloaded term.
There is a sequence of things you can do on a POSIX system to become a daemon.
Ben RadyIt's a special incantation.
You got to sacrifice something and that's how that works.
Matt GodboltThat's correct.
Yes, there's a pentagram involved, and not a "Damon" also, because Matt Damon is the only "Damon".
Ben RadyYeah.
Uh-huh.
Right.
Matt GodboltSo, an aside here: as you recall, one of the first folks at the company you still work at was also called Matt and was not me.
Ben RadyMm-hmm.
Mm-hmm.
Yep.
Matt GodboltAnd we were discussing various long-lived processes that we were designing a system to use.
And the obvious name was the Matt Daemon system.
To be pronounced Matt Damon, obviously.
Ben RadyRight, right.
Matt GodboltBut we never did it.
Anyway, daemonization is...
Let's not get into politics.
Becoming a daemon, as I understand it, is a multi-step process.
Ben RadyRight.
Matt GodboltThe first thing you need to do is fork, which gives you a new process, a shiny new process.
Then you call something called setsid, which says, I would like to become the session leader for this new process that I've created, because only a process group, and I'm doing this from memory, so listener, please.
And although Ben's nodding, this is not necessarily correct.
So just take this massive pile of [salt]
Ben RadyYeah, right.
Nope.
We may be hallucinating all of us.
Matt GodboltYes.
So you fork.
The child process then does setsid to become the leader of its new session and process group.
And then if I remember rightly, you have to fork again to then dissociate yourself from any last tendrils that previous process had.
And now you're running and you are completely in the clear.
It's something like that.
It's some weird sequence of events, which means that you have lost all connection with the previous process.
And so when you run some like system process and you pass it with --d or -d, sorry, then, and it immediately returns and disappears.
Apparently like, "Hey, did it do anything?" But you know, you run ps and it's still running.
That's the kind of process that it's been through.
And you're, you know, you can type jobs and it won't be there.
It's like completely lost from you.
And probably...
Ben RadyYeah.
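[Editor's aside, not from the episode: Matt is recalling the classic fork, setsid, fork sequence from memory, so treat this as a sketch of that recollection, not a reference implementation. The function name `daemonize_and_run`, the chdir to / and the redirection to /dev/null are the editor's assumptions about a typical version.]

```python
import os
import tempfile
import time


def daemonize_and_run(work):
    """Double-fork so `work` runs detached. Returns in the original process only."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # reap the short-lived first child, then carry on
        return

    # First child: start a new session, which has no controlling terminal.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)  # first child exits, so the grandchild is orphaned and re-parented

    # Grandchild: it is not a session leader, so it cannot pick up a new
    # controlling terminal by accident.
    os.chdir("/")
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)  # stdin, stdout and stderr all point at /dev/null
    if devnull > 2:
        os.close(devnull)
    try:
        work()
    finally:
        os._exit(0)  # never fall back into the caller's code


if __name__ == "__main__":
    marker = os.path.join(tempfile.mkdtemp(), "daemon.pid")

    def write_pid():
        with open(marker, "w") as f:
            f.write(str(os.getpid()))

    daemonize_and_run(write_pid)
    time.sleep(0.5)  # crude wait for the daemon, good enough for a demo
    with open(marker) as f:
        print("daemon ran as pid", f.read(), "and we are pid", os.getpid())
```

A long-running daemon would loop inside `work`; after the call returns, the shell's jobs list knows nothing about it, which matches the "completely lost from you" behaviour described above.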
Matt GodboltI now realize, and the penny is dropping now, that some of the flags that you were talking about finding for SSH to set it up correctly might be the ones that effectively have the same side effect. But having just written something that is a daemon for the... if you go back to the systemd conversation we were having last time, something became a daemon and I went through that process, so it's a bit
Ben RadyYeah
Matt GodboltSomewhat top of mind.
And even though I had a daemonization thing there, you can still choose, I think, with systemd, which we're going to get to, to say either systemd runs the process and does that for it in its own container, or it's expecting it to run in that particular way.
And so it can babysit different types of processes, if I remember rightly.
Okay, let's go back to what you said about systemd, because that sounds like a useful thing to know about.
What is systemd?
Ben RadyRight.
So, just to put the problem in context, systemd is a solution to a problem.
What's the problem?
Well, so here's the problem.
So you've written your script.
Matt GodboltAh, yes.
See, the last conversation we had about it was about what sort of solution it might be.
Ben RadyRight.
Matt GodboltWhat problem it is.
Ben RadyWhat problem are we creating by solving another problem?
Matt GodboltYes
Ben RadyRight?
I think actually...
Matt GodboltYeah
Ben RadyIs that a thing?
I feel like I've said this before on the podcast.
I don't remember the difference between computer science and software engineering.
We know this one: computer science is solving problems with computers.
Software engineering is solving the problems that you create when solving problems with computers.
And, ah, this is exactly that.
Matt GodboltYes, that follows.
That checks out.
The maths checks out for that for certain.
Ben RadyYeah.
Um, and so what problems are we both solving and creating by using systemd?
Well, so you write your bash script, it deploys your thing.
You shut your laptop and then you wait five minutes, you open it back up and then you have [a check?] and it's still running.
And you're like right, I think I maybe believe that this is going to work.
And you go home and the next day you come in and it's still running.
Cool.
And then three days later it crashes.
And you're like, what would have been super cool is instead of me getting a phone call in the middle of the night because it crashed, if it had just restarted.
Matt GodboltWell, I mean, "wouldn't it be cool if it hadn't crashed" would be the first thought you'd have.
Ben RadyTrue.
Matt GodboltBut at three in the morning, you probably just want to go, ah, for God's sake, just restart the thing.
Ben RadyJust restart it, please.
I'll fix it tomorrow.
But can we please not call me because I have to SSH back in and rerun the script again or whatever, right?
Matt GodboltRight, right, right.
Ben RadySo you're like, I just want this to restart.
And then you Google and you're like, well, maybe I should run this in systemd, right?
And so you wind up making a whole systemd job definition.
And you, I forget where you put it.
You put it in /etc/something, right?
Matt GodboltOr is it?
Yeah.
So there's...
Ben RadyI don't even remember now.
Matt GodboltSo, I mean, my understanding is in the beginning, there was init.
And init is effectively the first thing that the kernel executes...
Ben RadyMm-hmm.
Matt GodboltAs a user process and it then decides what to do.
And back in the mysteries of time, there were like run levels and it was all like clever directory structures and things like that.
Ben RadyOh, yeah.
Matt GodboltAnd it just fired up the right sequence of daemon processes.
One of which would be, you know, sshd so you could log into the machine or a getty that would actually let you type on the console to get into the machine.
Ben RadyMm-hmm.
Matt GodboltAnd that was it.
And then after that, you're off to the races.
And systemd is the new init.
And instead of it being,
Ben RadyMm-hmm.
Matt Godbolt...a set of essentially shell scripts that get run to fire things up in the right order.
Again, I'm probably missing loads of bits of context here, but it's a sort of more principled approach where you have units that are like, I would like this thing to run, please.
I would like this to be true under these circumstances.
And it depends on these other things that also need to be either running or at least have started before me.
And so instead of having essentially numbered directories with, you know, 40.do-this , 41.do...,
Ben RadyYeah, RC dot D or RC dot one RC dot two, something like that.
Matt GodboltYeah, those were the run levels, I think, which was slightly different because it's single user mode versus multi-user mode.
Ben RadyYeah, something like that.
Yeah, right.
Matt GodboltBut this is more like, hey, what sequence do I need to run things in and shut them down in, in order for my system to come up?
Ben RadyMhm.
Matt GodboltAnd systemd does that kind of the right way by actually tracking dependencies, which again was expensive and caused me problems in our last conversation, but is is the right approach and the correct thing to do.
And so that's what systemd is.
It's like the overarching orchestrator of a computer and all of the processes that are running on it.
Ben RadyMhm.
Matt GodboltAnd so, yes, to make something run in systemd, you put a file in the right magical place.
You issue the correct incantation to systemd to go and notice that file is there.
And then what?
Ben RadyAnd then you need to reload the system daemon.
Matt GodboltI'm looking at you because I thought you might've just done this and you could answer the question.
Ben RadyYes.
reload the systemd
Matt GodboltYeah, there's like daemonctl reload or something.
That's the magical incantation that says, hey, systemd, look through your configuration files.
Ben RadyYes.
Matt GodboltSomething has changed.
Ben RadyYes.
Matt GodboltPlease do the needful now.
Ben RadyAnd then it should start up and then you're using something like journalctl to look at the logs of the thing to make sure that it started.
Matt GodboltWhich is, I think, for most people, when Linux systems particularly moved from init to systemd, the biggest frying pan to the side of the head: where are all my chuffing logs?
They used to be in /var/log/whatever, and that's burnt into my mind.
They are text files and are in /var/log/blah, and systemd stopped that.
And now there are a few logs in /var/log, but nowadays you have to interact with it through... and it has a binary log file format, as I understand it, under the hood.
And you have to learn journalctl, which I still haven't learned, and I still Google the same thing over and over and over again and type in the thing that it tells me to do, which...
Ben RadyYeah.
Right.
Matt Godbolt...is, note to self,
taking a note here:
Don't do that.
Make a cheat sheet for it and stick it to my monitor.
Like all the other cheat sheets I have.
Yeah.
So that was, but that was like the, but that broke most people, I think, because I didn't have to interact with adding and removing daemons from my system.
That's what, you know, my package management system did.
But whenever something went wrong, I'm like, where the hell's the log file?
Anyway, so journalctl.
Ben RadyRight, it's in this magical program called journalctl.
Um, OK.
I feel like this is like I want to go to the next level now.
Matt GodboltSo...
Ben RadyIt's like, OK, cool.
We're going to run this on like two computers because we discovered that the reason it crashed is it got OOM killed.
Matt GodboltWell, let's finish the thought.
So just to be...
Right, right, right.
Let's just, um, finish the thought there.
So very concretely, you would install the binary to a known good location, which you probably were anyway.
Ben RadyYeah.
Yeah.
Matt GodboltIt wasn't just your home directory, hopefully.
Ben RadyPick a user that you're going to run it as.
Matt GodboltMaybe it was.
Yes, that's true.
Ben RadyMight be root, might not.
Matt GodboltYeah, let's hope it avoids being root if it can.
Ben RadyYeah.
Matt GodboltBut then, yeah, you make a little text file, it looks sort of TOML-ish to me, that systemd config-ish file that says, hey, I need these things.
Ben RadyYeah.
Yeah.
Matt GodboltI provide these things, which you often don't have to do.
Ben RadyMm-hmm.
Matt GodboltThis is how I'm going to be started up.
This script needs to run before I run.
Ben RadyYeah.
Matt GodboltThis script needs to run after I run.
There's a few, like, customization points you've got like that.
And you can say what you're wanted by as well.
So in this instance, you probably say I'm wanted by multi-user.target, which is like a magical sort of target that says, hey, when it becomes a multi-user system, the fifth, whatever, um run level five, then this is, I am saying that I am wanted by it, which is a way of you kind of going the other way around from the usual dependency saying it depends on me.
Ben RadyYeah.
Matt GodboltAnd that means...
Ben RadyRight.
You're joining the dependency tree there.
Yeah.
Matt GodboltYeah, so now when you reboot the machine, your service will come back up.
And then you can have some policies about retrying, restarting it, maximum number of times to restart, how long to wait between them, those kinds of things.
Ben RadyMm-hmm.
Matt GodboltAnd then effectively, it runs itself after that.
So that's what we do.
Yeah.
Ben RadyYeah.
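[Editor's aside, not from the episode: a sketch of the kind of unit file being described, rendered from Python so that the pieces are easy to see. The service name myapp, the user, the binary path and the restart numbers are all hypothetical. The directive names (ExecStart, Restart, RestartSec, StartLimitBurst, StartLimitIntervalSec, WantedBy) are real systemd settings, and on most distributions an administrator's unit file lives under /etc/systemd/system/.]

```python
import configparser
import io


def render_unit(description, user, exec_start):
    """Render a minimal systemd service unit as text."""
    unit = configparser.ConfigParser(interpolation=None)
    unit.optionxform = str  # keep key case as written: systemd keys are case-sensitive
    unit["Unit"] = {
        "Description": description,
        "After": "network.target",      # "these things need to have started before me"
        "StartLimitIntervalSec": "60",  # give up after 5 restarts within 60 seconds
        "StartLimitBurst": "5",
    }
    unit["Service"] = {
        "User": user,                   # run as a dedicated user, not root
        "ExecStart": exec_start,
        "Restart": "on-failure",        # the "just restart it at 3am" policy
        "RestartSec": "5",              # wait five seconds between attempts
    }
    unit["Install"] = {"WantedBy": "multi-user.target"}
    out = io.StringIO()
    unit.write(out, space_around_delimiters=False)
    return out.getvalue()


# Commands typically run after installing the file as /etc/systemd/system/myapp.service.
# They are only printed here, never executed.
FOLLOW_UP_COMMANDS = [
    "systemctl daemon-reload",               # make systemd re-read its unit files
    "systemctl enable --now myapp.service",  # start now and on every boot
    "systemctl restart myapp.service",       # restart after deploying a new binary
    "journalctl -u myapp.service -f",        # follow the service's logs
]

if __name__ == "__main__":
    print(render_unit("My app (hypothetical)", "myapp", "/opt/myapp/bin/server"))
    print("\n".join(FOLLOW_UP_COMMANDS))
```

With a unit like this there is no need for the program to daemonize itself: systemd starts it in the foreground, captures its standard out and standard error into the journal, and applies the restart policy.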
Matt GodboltAnd so your installation process is copy the binary bits up and make sure that this systemd configuration is there.
Ben RadyYeah.
Matt GodboltAnd then obviously if you want to restart it, there are processes for restarting service, restart and all that kind of good stuff.
Ben RadyYeah, yeah.
servicectl?
Matt GodboltYeah, is that what you use?
I still use service space, service name restart.
Ben RadyI think that's one.
I don't know.
Matt GodboltThere's almost certainly a hundred ways to do it.
Honestly, I still want to go var run blah or whatever the whole old thing was.
Ben RadyYeah.
Matt GodboltI actually don't know what this command is, but it just comes out of my fingers when I need to say, make that thing run again.
Um, but yeah, service space, name of thing, space restart is now what I've learned to do.
But okay, so that's where we are.
Ben RadyOkay.
Matt GodboltRight, okay, so now we're good, right?
Ben RadyYes.
Matt GodboltWe know that the process is being appropriately managed by a piece of software that's designed to start it up at the right time and keep it running.
It also has some handling for like, if it does output to standard out, it'll go to a well-defined log place inside this journalctl thing.
Ben RadyMm-hmm.
Matt GodboltIf it crashes, it will restart it.
If you reboot the machine, it'll come back up with it if you set that to be so.
So everything is wonderful.
So what's next?
Ben RadyRight.
So what's next is that you discover that the thing just crashes every four or five days because it's running out of memory, because it needs to run on more than one computer.
It is too big.
So you have to now run it on multiple computers and you have to distribute whatever work it's doing.
Matt GodboltWe're assuming you've ruled out the, there's a memory leak type issue here.
Ben RadyYes, it's not a memory leak.
Matt GodboltYeah.
We're just, yeah, yeah, yeah.
Ben RadyIt's just too much data.
Matt GodboltIt's just like, Hey, it's too much.
Ben RadyYeah.
Matt GodboltSo what do we do now then?
Ben RadySo now we need to run it on multiple computers.
And so like one thing you might reach for here is Ansible maybe?
Matt GodboltI was going to say, it's probably duplicating the line in the "scp, ssh machine service blah restart" and just doing "for host in".
Ben RadyRight.
Yes.
For host in host list.
Yes.
Matt GodboltYeah.
Ben RadyUh-huh.
And just do the exact same thing.
Matt GodboltSo that's the first thing I would do, right?
Ben RadyYes.
Matt GodboltAt least to start with, right?
That's the V0 of anything is like, well, okay, let's deploy it to the two computers I know about right now and just do the same thing on both of them.
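The "for host in host list" version of the deploy can be sketched as follows, in Python rather than the bash being discussed. This is a minimal sketch under stated assumptions: the host names, the binary path, the destination path and the use of `sudo service ... restart` are all hypothetical, and nothing is executed unless `deploy` is called.

```python
import subprocess
from typing import Callable, List


def deploy_commands(host: str, binary: str, dest: str, service: str) -> List[List[str]]:
    """Build the copy-then-restart commands for a single host."""
    return [
        ["scp", binary, f"{host}:{dest}"],                      # copy the binary up
        ["ssh", host, "sudo", "service", service, "restart"],   # restart the service
    ]


def deploy(hosts: List[str], binary: str, dest: str, service: str,
           run: Callable = subprocess.run) -> None:
    """Do exactly the same thing on every host, one after another."""
    for host in hosts:
        for cmd in deploy_commands(host, binary, dest, service):
            run(cmd, check=True)   # stop at the first failing command


# Only show what would be run; the host and paths below are made up.
for cmd in deploy_commands("host1.example.com", "./myservice",
                           "/opt/myservice/myservice", "myservice"):
    print(" ".join(cmd))
```

Because the loop is sequential, the total time grows with the number of hosts, which is the pain that comes up a little later in the conversation.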
Ben RadyRight.
Yes.
Yeah.
Matt GodboltAnd then, okay.
Ben RadyThat is probably what I would do.
And then I would have the thing where I would try to deploy it and there'd be some package or some configure.
Oh, we got to increase the maximum size of the receive buffers on the network.
And so now I've got to like go and change that configuration.
I gotta change it.
And I've already scaled this out to like 10 computers now, like every month for the last, you know, 10 months, I've been just adding another computer to the list of hosts.
Matt GodboltYou've been adding another host to the list of hosts.
Yeah.
Ben RadyAnd now it takes like, you know three minutes just to iterate through all of them.
And I'm like, oh, and I have to remember to log in and set all these settings every time I add a new host and it's getting worse and worse and worse.
Matt GodboltOkay, so we've now gone firmly outside of signals and processes.
And now this is like the setting up of the machine here is what you're talking about, which is valid.
Ben RadyWell.
Matt GodboltAnd if you think of, you know, the system, sorry, the systemd configuration unit file, whatever we just said, as being part of this machine configuration, then it does make sense to talk about some of the other things that you might need that machine to have set up, like packages.
And as you say, system settings.
So yeah, let's segue into that.
Let's do it.
Ben RadyYeah.
Yeah, OK.
So you've decided that now, okay, I need to retire this bash script.
It's served me well, but it's time to move on to something a little bit where I don't have to like build all this stuff myself and make sure that it works and troubleshoot it all.
So I'm going to try to use Ansible.
Let's just say.
Matt GodboltAnd what is Ansible and what makes something able to be ansed, which is presumably what it means?
Ben RadyAnd well, first you have to have pants and you can have ants in your pants and then Pantsible.
Matt GodboltThat would be Pantsible.
Ben RadyThat's going to be the fork of Ansible: Pantsible.
Matt GodboltOkay.
Okay.
Ben RadySo Ansible is, uh, honestly a tool that I have only used sometimes.
It is not, I sort of like wind up making the jump from, like, the shell script to, like, Terraform.
That's usually what I do is I'm like, all right, I'm going to go and I'm going to have something like Nomad manage these, or I'm going to manage them in the cloud, just making Docker containers.
Matt GodboltI see.
So at that point, you jump straight out into sort of an orchestration environment as opposed to I'm controlling individual machines, because that's the other thing in here, that host list and the provisioning of those machines.
Ben RadyYeah.
Yeah.
Matt GodboltWe're assuming that these machines exist and you haven't got to like make them appear in EC2.
Ben RadyYeah.
Matt GodboltBut let's go through what Ansible is, because I think that is interesting.
Ben RadyYes.
But real high level, Ansible is: you write a playbook.
And I think that playbook is pretty much in YAML and it's got like the steps that you want to perform.
And there's like a lot of sort of baked in things of like, "Oh, I need to copy this artifact from this place to this place".
Cool.
I need to create a configuration file here.
Cool.
I need to restart the systemd service.
Cool.
It can do all those things for you.
And there's lots of baked-in tools in Ansible to sort of do the typical system management things: You can install packages.
You can create users.
You can...
you know, because it's like hopefully, like you said, we weren't running this thing as root.
So we had a dedicated user for it.
When I'm setting up a new machine, I need to make that user.
I need to make sure they don't have a password, that they have the right SSH keys, you know, all those kinds of wonderful things.
So you have some, you know, script or some playbook that you run, you know, as root because it needs to be able to do all these things.
But then it sort of sets up the environment and then like subsequent deploys and things can, you know, kind of make it so that the program can run as a user and it doesn't need root.
Matt GodboltGot it.
Right.
That makes sense.
So it is essentially a canonifi... canonific..., that word, of the steps that you need to do: the playbook.
Ben RadyYeah.
Matt GodboltI mean, that's a good name for it, right?
Ben RadyYeah.
Matt GodboltLike it, it, it replaces the playbook, which is the, you know, the Google doc that you have that says, when, remember when you create a new machine, here's the 25 steps that you have to do.
Ben RadyMm-hmm.
Matt GodboltAnd you kind of roll your eyes and do them.
And it's like, well, let's automate this.
And it does it in a principled way, with a bunch of support functionality that lets you do, like, "add user" rather than having to go through whatever steps you actually have to take to add the user, which I forget these days.
Ben RadyYeah.
Matt GodboltOkay.
That makes sense to me.
I think one of the things that I have had difficulty in getting my head around when looking at these sets of tools and only because you've mentioned Terraform.
One thing I like about something like Terraform is that you kind of describe the end state.
Ben RadyYeah.
Matt GodboltAnd Terraform's responsible for getting whatever the current state is to the end state.
Ben RadyYeah, yeah.
Matt GodboltWhereas with things like Ansible, as I understand it, you have to be very careful to either be idempotent, so you can run the same thing twice and it doesn't re-add another user if there is one already called that thing.
Ben RadyRight.
Matt GodboltOr you just have to not run that step again.
You know, like, hey, once we add that user, don't try and do it again.
And then you kind of go like, well, now I want to change the user to have a different you know full name or a different shell or whatever.
Ben RadyYeah.
Matt GodboltYou're like, now I have to run the change command and I can't just change the add.
Ben RadyRight.
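The difference between a run-once "add" step and an idempotent step can be sketched in a few lines of Python. This is only an illustration against an in-memory dictionary, not how Ansible or any real user-management tool works; `add_user` and `ensure_user` are hypothetical names.

```python
from typing import Dict

Users = Dict[str, Dict[str, str]]


def add_user(users: Users, name: str, shell: str) -> None:
    """Run-once step: fails if the user is already there."""
    if name in users:
        raise ValueError(f"user {name!r} already exists")
    users[name] = {"shell": shell}


def ensure_user(users: Users, name: str, shell: str) -> bool:
    """Idempotent step: safe to repeat. Returns True only if something changed."""
    desired = {"shell": shell}
    if users.get(name) == desired:
        return False                 # already in the desired state
    users[name] = desired            # create the user or update the properties
    return True


users: Users = {}
print(ensure_user(users, "svc", "/bin/sh"))    # True: user created
print(ensure_user(users, "svc", "/bin/sh"))    # False: second run is a no-op
print(ensure_user(users, "svc", "/bin/bash"))  # True: same step also handles the change
```

With `add_user`, changing the shell later needs a separate "change" step; with `ensure_user`, editing the desired value and re-running is enough.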
Matt GodboltAnd Unix systems are so, so complicated.
I can't actually imagine how you could write a more general purpose, like, "make my system look this way" thing. Except, for at least one listener, somebody is currently shouting "Nix" into the void as they're walking along. And I know that Nix solves this in a very cool way, and I'm very excited by it, but I don't have any personal experience with it other than someone demoing it to me and me going, "Wow, that is super cool."
Ben RadyYeah.
Matt GodboltBut so just for that, yeah, Nix seems to be, it seems to be like a kind of,
Ben RadyI've heard those same things about Nix, but I have, again, no personal experience.
Matt GodboltA mind virus that people get, not in a bad way necessarily.
That does sound pejorative, but like, cause once you get it, I think you're like, Oh my gosh, this is how everything should always be done.
Ben RadyYeah, yeah.
Matt GodboltAnd that's great.
And you, like, proselytize it to everybody.
And then most people's eyes glaze over.
Ben RadyRight.
Matt GodboltAnd then you're like, that seems great.
And then you just log back onto the machine and just go "sudo apt install bob".
And you're like, there we are.
We're done.
Anyway, back to, oops, I've just banged my, yeah but sorry, editing Matt.
You just, I've just whacked the microphone stand.
[that's ok, I didn't edit it out -editing Matt] Where were we?
So I was sort of saying that there's this sort of difference between the prescriptive "run these things in order", and maybe they're idempotent or maybe they can adapt and say like, well, if there's a user already there, don't re-add it, that kind of feeling.
Versus the Terraform thing where you just say, I would like this to be the end state.
Here is a list of users the machine has to have with the properties that users have.
Ben RadyRight.
Matt GodboltAnd then Terraform goes behind the scenes and goes, well, why don't I look at what users I've got?
Oh, now I'll make a plan.
A plan is add three users, delete one user, and it presents it to you and says, this is what I'm going to do.
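The "describe the end state, then compute a plan" idea can be sketched as a diff between two dictionaries. This is a minimal Python illustration of the concept only, not Terraform's actual algorithm or data model; `make_plan` and the user names are hypothetical.

```python
from typing import Dict, List

Props = Dict[str, str]


def make_plan(current: Dict[str, Props], desired: Dict[str, Props]) -> Dict[str, List[str]]:
    """Diff current users against desired users without changing anything."""
    return {
        "add": sorted(name for name in desired if name not in current),
        "change": sorted(name for name in desired
                         if name in current and current[name] != desired[name]),
        "delete": sorted(name for name in current if name not in desired),
    }


current = {"dave": {"shell": "/bin/sh"}}
desired = {
    "alice": {"shell": "/bin/bash"},
    "bob": {"shell": "/bin/bash"},
    "carol": {"shell": "/bin/zsh"},
}
plan = make_plan(current, desired)
print(plan)  # add alice, bob and carol; delete dave
```

The plan is just data, so it can be shown to a person for approval before any step is applied.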
Ben RadyYeah.
Have you ever actually used Terraform to do that type of system administration before?
Matt GodboltNot on a system, no.
I've only ever done it with infrastructural components.
Ben RadyRight.
Yeah.
Matt GodboltSo yes, that is true.
I've never used it for a you
Ben RadyThat'd be amazing.
I don't know if it can do that, actually.
Matt GodboltI don't know that it does.
Ben RadyThat'd be amazing if you could do that.
Matt GodboltYou're right.
Yeah, now I say it.
But certainly, that's where I was going with that.
It was less about Terraform specifically, but like the framing is either outcome or steps.
Ben RadyYeah.
Matt GodboltAnd you know it's nice to supply the outcome.
But yeah, I don't know if something does exist.
And my only interaction with things like that are with Packer, where I always start from an empty image and then run the sequence of steps to make an image that looks the way I want it to.
Ben RadyMmm.
Matt GodboltSo I never go back to it and kind of go, hey, I want that image, but slightly different.
Ben RadyYeah.
Matt GodboltSo yeah, anyway.
Ben RadyYeah.
Yeah.
Matt GodboltWe're all over the place.
Ben RadyBut yeah, maybe that's the, I feel like this, this podcast is like the rough draft of a conference talk.
Cause it's like, imagine that you want to run a program.
Matt Godbolt[laughing]
Ben RadyWhat do you do?
And we just sort of work up from the bottom.
And then I feel like, it'd be a good talk, right?
Matt GodboltI think that's a...
When was the last time you gave a conference talk?
Come on, it's your turn.
Ben RadyOh, it's been a long time.
I, I, I'm probably overdue, honestly.
Matt GodboltBecause...very much part of the, the last week's conversation.
The reason I was looking into that was because I was avoiding writing several conference talks that I have to give in about a month's time.
And a week has passed since we last spoke; now I'm giving away all of our secrets.
Although much longer will have passed in real time.
And I've probably given the conference talk by the time I've released this.
So listener, you can be the judge of whether it was any good or not.
But yeah, I have done no work on it at all.
So...
..oops.
But yeah, this is a rough draft of a conference talk on...
Ben RadyIt is.
Matt Godbolt"So you want to deploy a service" or "So you want to run a service?"
Ben RadyYeah, exactly.
So you want to run some software, right?
Matt GodboltYeah, yeah.
Ben RadyHow are you going to do it?
And I feel like the punchline of this is like, okay, and now we're migrating this all to the cloud and we're going to use Terraform.
We're going to use GCP, or maybe you have, like, you know, a lot of companies, I feel like, these days have essentially an internal cloud.
Like they're still using Terraform, but they're using tools like Nomad and they have their own, you know, physical servers and they have an infrastructure team that's managing it all.
And this maybe leads us back.
This is how you get this.
Okay.
This is the whole ...
This is how you get into the state where you're just like, yeah, I just like changed one function with some unit tests and pushed a PR and I have no idea where it goes.
Matt GodboltYeah, that's exactly right.
Yeah.
Ben RadyYeah.
Uh-huh.
Yeah.
And now, and now the circle is complete.
Matt GodboltWell...
And now the circle is complete.
Yeah, I think we've probably reached a good spot then.
Ben RadyYeah.
Matt GodboltYeah.
It's good to know these.
I think like all of these, like everything we talk about, really, certainly everything that I hold dear that we talk about on this podcast is all about finding the right level of abstraction, knowing that there's a level beneath you.
Ben RadyYeah.
Matt GodboltWhich in this case, you know maybe your level of abstraction is those cloud tools that we've just been talking about and the services that run.
But knowing enough about the level beneath you to say like, okay, I do know that there are processes that run and that something is taking care of the input and output for those processes and making sure the right signals get to them at the right time and not the wrong things like me logging out.
Ben RadyYeah.
Matt GodboltBut I know that it exists and maybe I could sketch something, but I don't necessarily know off the top of my head.
And then you should know beneath that what...
that something exists, right?
Beneath that layer, we know that there is a systemd and I don't know how that works, but it's always good to have a decent understanding of the level beneath where you're working and then be aware of the layer below that.
Ben RadyRight.
Know vaguely what to Google or ask ChatGPT, right?
Matt GodboltRight.
Or ask your favorite Large...
Ben RadyYeah.
Yeah.
Matt GodboltYeah.
Ben RadyAsk your favorite LLM.
Matt GodboltYeah.
Yeah.
And so I think this plugs into that kind of mindset completely as like, you know, yeah, it's kind of like know how the cloud works and then...
Ben RadyYeah.
Matt Godbolt...know where to look when it doesn't work.
Ben RadyMm-hmm.
Mm-hmm.
Yeah.
Honestly, the only downside to this is in those environments where you have, you know, a million layers of abstraction between you and the physical server.
Matt GodboltCool.
Ben RadyIf you're like an old fuddy-duddy like us and you're like, can I just SSH in?
It's like, no, you can't have root.
It's like, who... what... why?
I know exactly what to do.
I know exactly how to fix this problem.
And now I'm going to have...OK, fine.
Sure.
Matt GodboltYeah.
Ben RadyWhatever.
Matt GodboltWell, and of course, the irony is, they can probably give you root, but it's not even on the real computer because you're several layers of virtualization away from the machine that's actually running.
Ben RadyYeah.
Mm hmm.
Right, yeah, exactly.
Matt GodboltYou talk about the metal.
Ben RadyIt's like it's running in the container service.
There's no root to give you.
Like you can't get there from here, right?
Matt GodboltYeah.
Ben RadyYeah, yeah.
Matt GodboltYeah.
Cool.
All right, friend.
Well, this has been great.
Ben RadyYeah, yeah.
Matt GodboltWe jammed it.
We did it.
Ben RadyNot bad for winging it.
Matt GodboltYeah, listener, you can let us know.
Post a comment somewhere.
I mean, some people watch this on YouTube and that's where I see most of the comments, and then otherwise tweet at us, or use the hachyderm.io Mastodon-y thing, or just email us.
Ben RadyYeah.
Yeah.
Mastodon.
Matt GodboltYou can get us.
But we'd love to hear what you think and what we're doing right and wrong because we've never really asked that.
Ben RadyThat's not hard either.
Yeah.
Matt GodboltWe just do this for us.
This is just our excuse to catch up, isn't it?
Ben RadyYeah, that's true.
Matt GodboltCool.
Ben RadyThat's true.
Matt GodboltAll right, friend.
Well, have yourself a great weekend and I'll speak to you soon.
Ben RadyAll right.
Until next time.
Matt GodboltUntil next time.
