Running Programs

Episode Transcript

Matt Godbolt

Hey, Ben.

Ben Rady

Hey, Matt.

Matt Godbolt

So we yeah planned comprehensively, as always, and today's topic is going to be signals and processes.

Ben Rady

Yep.

Yeah, that was, that's

Matt Godbolt

And that is the sum extent of our planning.

Ben Rady

We said those words out loud annnnnd...

Record.

Matt Godbolt

And I said, yes, and hit record.

And then we continued talking about it during the intro.

And we're here.

So why is that top of mind for you?

Is there a reason why you are worrying about this right now?

Ben Rady

There's a reason that I'm worried about this right now, which is that I'm always worried about this because I see part of my job as a software engineer is making sure that the software that I write actually runs...

and does what it's supposed to do.

Matt Godbolt

Mm-hmm.

Ben Rady

Uh, I know that there are lots of places in the world where as a software engineer, you're expected to write code.

And then there's another group or team or organization.

or outsourced company that is responsible for actually taking that software and running it on computers and making sure that it continues to run on those computers and that it delivers the value that it is intended to do.

And in some cases, those things are like very separated, right?

Matt Godbolt

Right.

Ben Rady

Like,

Matt Godbolt

You might just make a PR to a function.

You change the function, your tests pass, you check it in, and then you have literally no idea how it ends up serving people's requests or whatever it is your company does.

Ben Rady

Right.

Matt Godbolt

Yeah.

Ben Rady

Right, right.

And then on the other end of that spectrum, I think you can have situations and I have definitely been in these myself where it is like, no, we're building this for the very first time.

There's no infrastructure team.

There's you and you are going to compile your code.

you are going to SCP your code onto a server somewhere, and then you are going to run a screen and then exec that program in the screen.

Matt Godbolt

Ah, old school.

Ben Rady

And now you can post in a Slack channel or some other log place, hey, we've deployed to production.

Matt Godbolt

And by production you mean, yes, the only reason it didn't quit is because I'm still running it in a screen [session].

Ben Rady

Yes, exactly.

Matt Godbolt

This is shades of

Ben Rady

I did Control A, Control D in the screen, and now our production environment is safe.

Matt Godbolt

Everything's fine.

Ben Rady

Yes.

Matt Godbolt

Everything's fine.

What's your logging strategy?

Oh, we log back in and we reattach to the screen session to see what happens.

Ben Rady

You check the screen.

What's in the screen?

Matt Godbolt

Yeah!

Okay.

That does work, but I can understand why, yeah, you might want something a little more sophisticated.

Ben Rady

Yes.

Well, and those are the two ends of the spectrum, I think, if we're going to simplify it down to a spectrum of like,

Matt Godbolt

Yeah.

Ben Rady

And, I think that you can in your career and I have done a lot of this as a software engineer, you can kind of like hop to the left.

I don't know....

the side of that spectrum and say, all right, well, okay.

I obviously don't want to run it in a screen.

What else could I do?

And then you start learning about like systemd and things like runit and supervisord and things like that.

Matt Godbolt

Or old school nohup was my...

Ben Rady

Yeah.

Right, nohup.

Matt Godbolt

Just nohup the thing and then log out and you're done, right.

Ben Rady

Exactly.

And then of course, you start moving into distributed environments, the cloud, you learn about Kubernetes and Elastic, what does the ECS stand for?

I forget.

and Elastic Compute Service?

Matt Godbolt

Container store, compute something, container something.

Ben Rady

No, like Container Service.

Matt Godbolt

Yeah.

Ben Rady

Yeah.

Matt Godbolt

Or you've got what's, what's the HashiCorp thing?

Ben Rady

Nomad.

Yeah.

Matt Godbolt

Nomad, Nomad, similar things, you know, yeah.

Ben Rady

Nomad.

Yeah, yeah.

Matt Godbolt

All of these things, which are like orchestration setups that say, Hey, you just tell me some through some mechanism, what you would like to have running and I'll find a place to run them and run them in a particular controlled way.

Ben Rady

Uh-huh.

Matt Godbolt

And then you take that part of the deployment and running part is taken out of your hands.

It's done by a framework, but.

Ben Rady

But

Matt Godbolt

Presumably.

Yeah, go on.

Ben Rady

But all these things are accomplishing what is fundamentally the same goal, which is I have produced software and I want it to run.

on a computer or maybe multiple computers, maybe not multiple computers.

Matt Godbolt

Yeah.

Ben Rady

It's like, oh, this needs to run, like exactly one, right?

Matt Godbolt

Exactly one or [none at all].

Ben Rady

Like there can only, there's like something is consuming a queue and there better only be one of them at a time or bad things are going to happen, right?

Matt Godbolt

Yeah.

Ben Rady

So I think all of that is kind of encompassed in this in this topic of like, I'm trying to run a program and how do I actually make sure that is happening the way that I want?

Matt Godbolt

Yep.

Ben Rady

And I think that we could even structure this from sort of the bottom up, right?

So we started with screen and I'm just running screen and now I've got a process and it's executing.

Matt Godbolt

Well, even screen is one level too far from, "I literally run the process and it's there and I'm watching it and I'm watching it I'll Control-C it", which you know is also valid, but it gives us a sort of starting point of like, what happens when you fire up a process and why is that not okay?

Ben Rady

Yeah, right.

Right.

Yeah, that's great.

I love this.

Okay, so it's like, so when you do that, you're like, all right, my plan for deployment now is I'm going to SSH onto the production server or EC2 instance or whatever you got, and I'm going to copy and SCP my bits up there, and then I'm going to run

Matt Godbolt

Yeah, and let's not get into packaging and deployment.

That's even more complicated.

Ben Rady

Yeah, right.

Matt Godbolt

Let's leave it at that.

Ben Rady

Yeah.

Matt Godbolt

Some magical process happens and you have the bits that you need on that machine.

Ben Rady

Yes.

Yes.

I have my executable bits on the machine and then I'm just going to run it.

Matt Godbolt

And then...

Ben Rady

Well, now what you have is a process that's a child of an sshd process, right?

Matt Godbolt

Probably a child of the shell that you ran it on, depending on how you do it.

Ben Rady

Oh, yeah.

No, yeah.

That's right.

Yeah.

Matt Godbolt

I mean, if you're going to...

Ben Rady

If you're looking at, no, no, you're absolutely right.

So if you got the tree, it's like, okay, it's a child of bash.

Matt Godbolt

pstree will show...

Ben Rady

And then Bash is going to be a child of sshd.

And then that's going to be a child of the parent SSH server.

And then that's probably going to be a child of init, right?

Like roughly, am I

Matt Godbolt

Or which nowadays is probably systemd.

Ben Rady

Yeah.

Matt Godbolt

Thorin, son of Thrain, son of Thror.

Ben Rady

Right.

Matt Godbolt

It's going to be your program, son of Bash, son of sshd child process, son of sshd parent process.
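The family tree described here can be walked programmatically. The following is a minimal sketch, assuming a Linux machine where `/proc/<pid>/stat` is available; the function names `read_proc_entry` and `ancestry` are made up for illustration.

```python
import os


def read_proc_entry(pid):
    """Return (parent_pid, command_name) for pid by parsing Linux's /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as stat_file:
        text = stat_file.read()
    # The layout is "pid (comm) state ppid ...". The command name may contain
    # spaces or parentheses, so split around the last closing parenthesis.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    name = text[open_paren + 1:close_paren]
    fields = text[close_paren + 2:].split()
    return int(fields[1]), name


def ancestry(pid, lookup=read_proc_entry):
    """Walk from pid up to the top of the tree, returning [(pid, name), ...]."""
    chain = []
    while pid > 0:
        parent, name = lookup(pid)
        chain.append((pid, name))
        pid = parent
    return chain
```

Calling `ancestry(os.getpid())` from a program started over SSH would typically list the program, then the shell, then the two sshd processes, then PID 1, although the exact chain depends on how the machine is set up.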

Ben Rady

Right, right.

So.

Matt Godbolt

Yeah, got it.

Yes, that makes sense.

All right.

Ben Rady

So if you naively, or maybe like not naively, but you just sort of like just have enough knowledge to be dangerous, you're like, oh, I've got the ampersand operator in Bash that I could put at the end of that.

Matt Godbolt

Yeah.

Ben Rady

Because it's like, okay, cool.

The production server is running on my laptop.

And if I put my laptop to sleep or, you know, the SSH session, the client is on my laptop.

The server is a server, but it's like, all right, I started on the server.

Matt Godbolt

Yeah.

Ben Rady

Now I want to go home.

I need to close my laptop lid and I need to leave.

Well,

Matt Godbolt

Yeah.

Ben Rady

What exactly is going to happen if I close this lid?

Like, I don't want it to stop, right?

So you're like, okay, well, here's what it...

Matt Godbolt

Well, let's talk about what happens in that situation, just to be absolutely clear.

Ben Rady

Yeah, okay, okay, go.

Matt Godbolt

Right.

So let me read this back to you.

So you're saying, yeah, you're running, as described, the production binary having SSH'd into a machine, and you've closed your laptop lid.

Ben Rady

Yes.

Matt Godbolt

All right.

So assuming or even yeah assuming you just close your laptop lid and nothing shut down nicely, it just literally suspended, I don't actually know exactly what your laptop will do in this situation.

Ben Rady

Mm-hmm.

Matt Godbolt

But let's just assume it disappears off the network instantaneously, which is also completely reasonable if you go into like a tunnel on the train on the way home, that kind of thing.

Ben Rady

Right.

Yeah.

yeah

Matt Godbolt

Right.

Then eventually the TCP connection between your computer and the SSH daemon on the remote end will time out.

There'll be a keep alive that's missing probably, or some other heart beating mechanism will go down.

And the SSH daemon will say, hey, that person's gone now.

It's time to clean up their session.

It will, I think, kill the bash process.

And the bash process then will kill all of the children that it knows about. Something like that.

Or there's some...

sig...Yeah, so this is this is kind of it, right?

So what what is...

Ben Rady

Yeah, signals and processes, right?

Matt Godbolt

Yeah.

I mean, I know that the result will be: my program will die.

Exactly how that dies, I'm not 100% sure, but that's what would happen eventually, maybe five or six minutes later when the SSH daemon times out your connection and says this person's not there anymore.

Ben Rady

Yes.

Matt Godbolt

It kills the process tree through some mechanism and then, yeah, you get a phone call as you've got onto the train telling you that the production system is down.

Ben Rady

Right.

Matt Godbolt

Please fix it.

Ben Rady

Exactly.

Exactly right.

And so, and this is maybe where we troll our listener into, into posting the right answer on the internet to this, because I would suspect what probably happens is that the SSH daemon kills like the process group.

Matt Godbolt

Of course.

Ben Rady

Right?

Matt Godbolt

Yeah, because Bash becomes a process group controller or whatever the name is...a leader.

Ben Rady

Yeah.

Matt Godbolt

Process group leader.

That's right.

Ben Rady

Yeah.

Matt Godbolt

Where's my Stevens book?

I haven't got it here.

No.

But yeah, there's...

Ben Rady

Yeah, but it's probably going to send a SIGTERM to that process group.

Matt Godbolt

Okay.

So...

Ben Rady

And so every process in the process group is going to receive that term signal and then hopefully gracefully shut down.

I don't know if it follows it up with like a SIGKILL at some point or not.

Maybe it does.

Maybe it doesn't.

I'm not exactly sure what sshd would do there.
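Whatever sshd and Bash actually send, the receiving program can prepare for it. Below is a minimal sketch of a graceful shutdown, assuming the process may see either SIGTERM or SIGHUP (a dropped terminal is often delivered as a hang-up rather than a terminate). The names `ShutdownFlag` and `run_until_asked_to_stop` are hypothetical. SIGKILL is deliberately absent because it cannot be caught.

```python
import os
import signal
import time


class ShutdownFlag:
    """Records which signal, if any, has asked this process to stop."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGHUP)):
        self.requested_by = None
        for signum in signals:
            signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: just remember the signal's name.
        self.requested_by = signal.Signals(signum).name


def run_until_asked_to_stop(flag, do_work, poll_seconds=0.01):
    """Call do_work() repeatedly until a shutdown signal has arrived."""
    while flag.requested_by is None:
        do_work()
        time.sleep(poll_seconds)
    return flag.requested_by
```

The main loop finishes the unit of work it is on and then exits cleanly, which is the "hopefully gracefully shut down" behaviour being described.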

Matt Godbolt

No.

No, but that would seem reasonable so that you never you don't end up with loads of processes that just decided not to kill themselves.

Ben Rady

Yeah, yeah.

Matt Godbolt

And frankly, I think Bash will probably do the right thing for that circumstance.

Ben Rady

Yeah, yeah, yeah.

So day one, we try to deploy like this.

Matt Godbolt

Okay.

Ben Rady

We close our laptop lid, we go home, we get the unfortunate call, and then we rush home, and then we open the laptop lid back up, and then we rerun the process.

All right, well, I can't do that.

So an enterprising person might say, okay what I'm going to do is I'm going to use the bash ampersand command because I know that will put a process into the background, right?

Matt Godbolt

Right.

Ben Rady

And so I'm going to do that next time.

I'm going to run, I'm going to do my deploy.

I'm going to put an ampersand on the end, right?

And then I'm going to like, now it's running in the background and now I shouldn't have to worry about this.

Matt Godbolt

And yeah.

Although if I were to do that with like a process like we were just talking about, the very first thing I would notice is that my shell prompt comes back and then immediately loads of junk from my log file is now appearing over top of what I'm running.

Ben Rady

Yes.

Matt Godbolt

So even before we get into processes, and threads there's like a pragmatic thing.

Ben Rady

Yes.

Matt Godbolt

So what I would probably do is redirect output to, you know, ~/log.txt, and then we'll put the ampersand on the end.

Ben Rady

Right, exactly.

Matt Godbolt

So that idea already, right.

So good.

And now it's in the background and I think we're great.

And I, you know, I'm tailing that log file for a bit and that's safe because that's a separate process.

Ben Rady

Yep.

Matt Godbolt

And now I close the laptop lid and get on the plane, train, whatever, any other mode of... what happens now?

Ben Rady

Right.

Right.

Well, I think what happens, and I think this because I've had this burn me from time to time, is that, yes, you redirected standard out, but you did not redirect standard error.

And so there is actually still the daemon has a file handle that it thinks it needs to be writing to back to your thing.

And so you put this in the background and you do this again and it breaks again.

It does exactly the same thing all over again.

Matt Godbolt

Well, I think there's more than one reason.

So yes, first of all, standard error isn't going anywhere useful.

Ben Rady

Right.

Matt Godbolt

The second thing here is that although it is in the background, it's still a child of Bash.

Ben Rady

Yeah.

Right.

Matt Godbolt

So it's got you coming both ways.

And maybe thirdly, thirdful, is its standard input is still potentially connected to...

Ben Rady

Mm-hmm.

Matt Godbolt

the console, the terminal, something.

I'm waving my hands a lot here because that's a very, I'm less sure about it.

But I certainly know that if you try and read from the console, you'll get one of the even more esoteric signals about like, hey, yeah, you can't, you're not connected to it right now.

[editing Matt here: SIGTTIN, maybe?] And then you'll get stopped.

Ben Rady

Yeah.

Matt Godbolt

And so you'll see in Bash, stopped inputting required or something weird like that.

Ben Rady

Mm-hmm.

Matt Godbolt

um

Ben Rady

Mm-hmm.

Matt Godbolt

So all of those would defeat you and you end up with a dead process.
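The three problems listed here (standard error, being a child of the shell's session, and standard input) can each be addressed when the process is launched. This is a rough sketch using Python's `subprocess` module rather than Bash; `launch_detached` and `log_path` are names invented for the example.

```python
import subprocess
import sys


def launch_detached(argv, log_path):
    """Start argv with no ties to the caller's terminal and return the Popen object."""
    with open(log_path, "ab") as log_file:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # nothing to read from the terminal
            stdout=log_file,            # standard out goes to the log...
            stderr=subprocess.STDOUT,   # ...and standard error follows it
            start_new_session=True,     # the child calls setsid() before exec
        )
```

Because the child starts its own session, it has no controlling terminal to be hung up on, and with all three standard streams pointed elsewhere it holds no file handles back to the SSH connection.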

Ben Rady

Right, right.

So this is where you start investigating all of the various options that you can pass to SSH when you run this, because you're like, I'm going to make a script.

I'm going to make a script that works, and I'm just going to run the script, and it's going to do my deploy, and then I'm going to trust that it works.

And you start learning about, OK, well, I need to do the option that like doesn't read from standard in, because I don't want the standard in problem.

And then I got to make sure that I redirect standard out and standard error so I can put this thing in the background.

Matt Godbolt

Right.

You're saying this right.

Just to be clear, these are options you say to SSH or to the bash.

Ben Rady

SSH, right?

Matt Godbolt

Oh, I see.

So now, now we're not going to run bash at all.

We're just going to run the executable directly.

And, or what were you thinking?

Ben Rady

Well, so you're going to run.

So I'm thinking of the world where it's like you do a thing, you like copy the bits up to the machine.

Matt Godbolt

Uh huh.

Ben Rady

And then you have like a separate SSH call where you're passing the command that you want to run as an argument into SSH.

Matt Godbolt

Right.

So you're no longer running an interactive session.

You're just going to...

Yeah, that makes sense.

Ben Rady

Right?

Matt Godbolt

Okay.

Then that takes Bash out of the equation, which helps us a bit in this context.

Ben Rady

Yeah.

Matt Godbolt

Although there is a there is still another Bashian solution that I think I see people go for, which is you type disown in Bash, which says, push this thing and make it not a child of this process anymore.

Ben Rady

Ah, yeah.

Uh-huh.

Matt Godbolt

And that probably, probably...

might solve the problem most of the time, except you've left a big like rake in the grass for that because there are other processes in the system that might wish to get rid of that apparently now orphaned process.

Ben Rady

Yes.

Matt Godbolt

So...

That's what nohup's for.

It's like it gets rid of the hang-up and there's some other things that it does.

And then there's daemonization and other bits and pieces, which I'm sure we'll get to in a second.

But let's put that to one side and let's go down the rabbit hole that you've described, which is that like I'm now going to run SSH on my computer

Ben Rady

Okay.

Yeah.

Matt Godbolt

um And I'm going to pass it rather than just SSH.

I'm going to do /path/to/my/executable with all the redirects and things set and try and run it from a server and have it live on the remote machine with all of the pipes and things stdin, stderr and stdout all connected to sensible places.

So go ahead.
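One common shape for that non-interactive SSH call can be sketched as follows. The host name and log path are placeholders, and `remote_start_command` is a hypothetical helper. The `-n` option tells ssh not to read from standard input, and the remote command line redirects all three streams before backgrounding the program.

```python
import shlex


def remote_start_command(host, executable, log_path):
    """Build the argv for an ssh call that starts executable detached on host."""
    remote = (
        f"nohup {shlex.quote(executable)} "
        f"> {shlex.quote(log_path)} 2>&1 < /dev/null &"
    )
    # -n: ssh itself does not read from the local standard input.
    return ["ssh", "-n", host, remote]
```

The returned list can be handed to `subprocess.run`. Whether `nohup` is strictly needed on top of the redirects depends on the remote shell's behaviour, so treat it as belt and braces.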

Ben Rady

Yes.

Yes.

Matt Godbolt

Sorry.

That's where I cut you off.

Ben Rady

So you do that, and then you should, I believe, be able to SSH in separately and do like a pstree and see that the parent of this process is now PID 1, because it is disconnected from the process group that it was in before.

Matt Godbolt

Right.

Ben Rady

Um, And at that point, you maybe have something where you can close your laptop and have it hang out.

Now, hopefully you sent your log somewhere sensible and you don't fill up the disk with logs.

You can pipe it into syslog, which is something that I do when I'm trying to punt on this problem entirely is I'm just like, you know what?

There's already a log rotation system on this machine and it's called syslog.

So I'm just going to pipe all my logs into that.

Matt Godbolt

Right.

And quite possibly you already have log aggregation set up for that so that you can go and read it on like a website and all that kind of nonsense as well.

Ben Rady

Maybe you do if you're fancy.

Matt Godbolt

Maybe.

I mean, but yeah, if you're considering that option, you probably don't because you probably don't have any other infrastructure to lean on.

Ben Rady

Right, right.

Matt Godbolt

Yeah.

Okay.

So that seems reasonable.

Ben Rady

So what do you do?

What do you do after this?

So you do this, you finally can go home now.

You can shut your laptop and go home.

And you're like, right, surely we can make this better than this.

Matt Godbolt

Right.

Ben Rady

What do we do next?

Matt Godbolt

Yeah, right.

So I still have...

Ben Rady

Do you make the systemd job is what is...

I'm kind of questioning here.

Matt Godbolt

Well, see, I was thinking another thing.

So there is a process...

Process is a terribly overloaded term.

There is a sequence of things you can do on a POSIX system to become a daemon.

Ben Rady

It's special incantation.

You got to sacrifice something and that's how that works.

Matt Godbolt

That's correct.

Yes, there's a pentagram involved, and not a "Damon" also, because Matt Damon is the only "Damon".

Ben Rady

Yeah.

Uh-huh.

Right.

Matt Godbolt

So aside here, so as you recall, one of the first folks at the company you still work at was also called Matt and was not me.

Ben Rady

Mm-hmm.

Mm-hmm.

Yep.

Matt Godbolt

And we were discussing various long-lived processes that we were designing a system to use.

And the obvious name was the Matt Daemon system.

To be pronounced Matt Damon, obviously.

Ben Rady

Right, right.

Matt Godbolt

But we never did it.

Anyway, daemonization is...

Let's not get into politics.

Becoming a daemon, as I understand it, is a multi-step process.

Ben Rady

Right.

Matt Godbolt

The first thing you need to do is fork, which gives you a new process, a shiny new process.

Then you call something called setsid, which says, I would like to become the session leader for this new process that's been created, because only a process group... and I'm doing this from memory, so listener, please.

And although Ben's nodding, this is not necessarily correct.

So just take this massive pile of [salt]

Ben Rady

Yeah, right.

Nope.

We may be hallucinating all of us.

Matt Godbolt

Yes.

So you fork.

The child process then does setsid to become a process leader in its new group.

And then if I remember rightly, you have to fork again to then dissociate yourself from any last tendrils that previous process had.

And now you're running and you are completely in the clear.

It's something like that.

It's some weird sequence of events, which means that you have lost all connection with the previous process.

And so when you run some like system process and you pass it with --d or -d, sorry, then, and it immediately returns and disappears.

Apparently like, "Hey, did it do anything?" But, you know, you run ps and it's still running.

That's the kind of process that it's been through.

And you're, you know, you can type jobs and it won't be there.

It's like completely lost from you.

And probably...
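The fork, setsid, fork sequence recalled here can be written down in a few lines. This is a sketch of the classic double fork as it is usually described, not a verified production implementation; `daemonize` and `log_path` are names chosen for the example. The usual reason given for the second fork is that the surviving process is then not a session leader, so it cannot accidentally acquire a controlling terminal again.

```python
import os
import sys


def daemonize(log_path):
    """Detach via the classic double fork.

    Returns False in the original process and True in the daemon.
    log_path should be absolute because the daemon changes directory to "/".
    """
    sys.stdout.flush()
    sys.stderr.flush()

    first_child = os.fork()
    if first_child > 0:
        os.waitpid(first_child, 0)   # reap the short-lived middle process
        return False

    os.setsid()                      # new session with no controlling terminal
    if os.fork() > 0:
        os._exit(0)                  # middle process (the session leader) exits

    # From here on we are the grandchild: the daemon.
    os.chdir("/")
    null_fd = os.open(os.devnull, os.O_RDONLY)
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(null_fd, 0)              # stdin reads from /dev/null
    os.dup2(log_fd, 1)               # stdout appends to the log
    os.dup2(log_fd, 2)               # stderr appends to the log
    for fd in (null_fd, log_fd):
        if fd > 2:
            os.close(fd)
    return True
```

A program would call `daemonize(...)` early, exit in the branch where it returns False, and carry on with its real work where it returns True. This is why a command started with a daemon flag returns to the prompt immediately and never shows up in `jobs`.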

Ben Rady

Yeah.

Matt Godbolt

I didn't realize, and the penny is dropping now, that some of the flags you were talking about finding for SSH to set it up correctly might be the ones that effectively have the same side effect. But having just written something that is a daemon (if you go back to the systemd conversation we were having last time, something became a daemon and I went through that process), so it's a bit

Ben Rady

Yeah

Matt Godbolt

Somewhat in top of mind.

And even though I had a daemonization thing there, you can still choose, I think, with systemd, which we're going to get to, to say either systemd runs the process and does that for it in its own container, or it's expecting it to run in that particular way.

And so it can babysit different types of processes, if I remember rightly.

Okay, let's go back to what you said about systemd, because that sounds like a useful thing to know about.

What is systemd?

Ben Rady

Right.

So, just to put the problem in context, systemd is a solution to a problem.

What's the problem?

Well, so here's the problem.

So you've written your script.

Matt Godbolt

Ah, yes.

See, the last conversation we had about it was as to what sort of solution it might be.

Ben Rady

Right.

Matt Godbolt

What problem it is.

Ben Rady

What problem are we creating by solving another problem?

Matt Godbolt

Yes

Ben Rady

Right?

I think actually...

Matt Godbolt

Yeah

Ben Rady

Is that a thing?

I feel like I've said this before on the podcast.

I don't remember. The difference between computer science and software engineering.

We know this one: computer science is solving problems with computers.

Software engineering is solving the problems that you create when solving problems with computers.

And ah this is a, this is exactly.

Matt Godbolt

Yes, that follows.

That checks out.

The maths checks out for that for certain.

Ben Rady

Yeah.

And so what problems are we both solving and creating by using systemd?

Well, so you write your bash script, it deploys your thing.

You shut your laptop and then you wait five minutes, you open it back up and then you have [a check?] and it's still running.

And you're like right, I think I maybe believe that this is going to work.

And you go home and the next day you come in and it's still running.

Cool.

And then three days later it crashes.

And you're like, what would have been super cool is instead of me getting a phone call in the middle of the night because it crashed, if it had just restarted.

Matt Godbolt

Well, I mean, "wouldn't it be cool if it hadn't crashed" would be the first thought you'd have.

Ben Rady

True.

Matt Godbolt

But at three in the morning, you probably just want to go, ah for God's sake, just restart the thing.

Ben Rady

Just restart it, please.

I'll fix it tomorrow.

But can we please not call me because I have to SSH back in and rerun the script again or whatever, right?
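The "just restart it, please" wish amounts to a small supervising loop. Here is a deliberately naive sketch; `keep_running`, `max_restarts` and `delay_seconds` are made-up names, and a real supervisor such as systemd does a great deal more (logging, dependencies, rate limiting).

```python
import subprocess
import sys
import time


def keep_running(argv, max_restarts=3, delay_seconds=1.0):
    """Run argv, restarting it after a non-zero exit; return the list of exit codes."""
    exit_codes = []
    while True:
        exit_codes.append(subprocess.run(argv).returncode)
        crashed = exit_codes[-1] != 0
        # Stop on a clean exit, or once the restart budget is used up.
        if not crashed or len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(delay_seconds)
```

Capping the number of restarts matters: a program that crashes instantly on startup would otherwise be restarted in a tight loop forever.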

Matt Godbolt

Right, right, right.

Ben Rady

So you're like, I just want this to restart.

And then you Google and you're like, well, maybe I should run this in systemd, right?

And so you wind up making a whole systemd job definition.

And you, I forget where do you put it.

You put it in /etc/something, right?

Matt Godbolt

Or is it?

Yeah.

So there's...

Ben Rady

I don't even remember now.

Matt Godbolt

So, I mean, my understanding is in the beginning, there was init.

And init is effectively the first thing that the kernel executes...

Ben Rady

Mm-hmm.

Matt Godbolt

As a user process and it then decides what to do.

And back in the mysteries of time, there were like run levels and it was all like clever directory structures and things like that.

Ben Rady

Oh, yeah.

Matt Godbolt

And it just fired up the right sequence of daemon processes.

One of which would be, you know, sshd so you could log into the machine or a getty that would let actually let you type on the console to get into the machine.

Ben Rady

Mm-hmm.

Matt Godbolt

And that was it.

And then after that, you're off to the races.

And systemd is the new init.

And instead of it being,

Ben Rady

Mm-hmm.

Matt Godbolt

...a set of essentially shell scripts that get run to fire things up in the right order.

Again, I'm probably a bit...

missing loads of bits of context here, but it's a sort of a more principled approach where you have units that are like, I would like this thing to run, please.

I would like this to be true under these circumstances.

And it depends on these other things that also need to be either running or at least have started before me.

And so instead of having essentially numbered directories with, you know, 40.do-this , 41.do...,

Ben Rady

Yeah, RC dot D or RC dot one RC dot two, something like that.

Matt Godbolt

Yeah, those were the run levels, I think, which was slightly different because it's single user mode versus multi-user mode.

Ben Rady

Yeah, something like that.

Yeah, right.

Matt Godbolt

But this is more like, hey, what sequence do I need to run things in and shut them down in, in order for my system to come up?

Ben Rady

Mhm.

Matt Godbolt

And systemd does that kind of the right way by actually tracking dependencies, which again was expensive and caused me problems in our last conversation, but is is the right approach and the correct thing to do.

And so that's what systemd is.

It's like the overarching orchestrator of a computer and all of the processes that are running on it.
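The difference between numbered start-up scripts and tracked dependencies can be illustrated with a topological sort. This is only a toy model of the idea, not how systemd is implemented; the unit names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def start_order(requires):
    """Return a start order in which every unit comes after the units it requires."""
    return list(TopologicalSorter(requires).static_order())


# Hypothetical units, each mapped to the units that must be started first.
EXAMPLE_UNITS = {
    "network": set(),
    "sshd": {"network"},
    "database": {"network"},
    "my-service": {"network", "database"},
}
```

`start_order(EXAMPLE_UNITS)` yields some valid start sequence, and reversing it gives a valid shutdown sequence. With numbered directories, a human had to work that ordering out by hand and encode it in the file names.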

Ben Rady

Mhm.

Matt Godbolt

And so, yes, to make something run in systemd, you put a file in the right magical place.

You issue the correct incantation to systemd to go and notice that file is there.

And then what?

Ben Rady

And then you need to reload the system daemon.

Matt Godbolt

I'm looking at you because I thought you might've just done this and you could answer the question.

Ben Rady

Yes.

Reload the systemd.

Matt Godbolt

Yeah, there's like daemonctl reload or something.

That's the magical incantation that says, hey, systemd, look through your configuration files.

Ben Rady

Yes.

Matt Godbolt

Something has changed.

Ben Rady

Yes.

Matt Godbolt

Please do the needful now.

Ben Rady

And then it should start up and then you're using something like journalctl to look at the logs of the thing to make sure that it started.

Matt Godbolt

Which...

is I think for most people, when Linux systems particularly moved from init to systemd, the biggest frying pan to the side of the head was, where are all my chuffing logs?

They used to be /var/log/whatever, and that's burnt into my mind.

They are text files and are in /var/log/blah, and systemd stopped that.

And now there are a few logs in /var/log, but nowadays you have to interact with it through, and it has a binary log file format, as I understand it, behind the hood.

And you have to learn journalctl, which I still haven't learned, and I still Google the same thing over and over and over again and type in the thing that it tells me to do, which...

Ben Rady

Yeah.

Right.

Matt Godbolt

is ...note to self.

Don't, don't do taking a note here.

Don't do that.

Make a cheat sheet for it and stick it to my monitor.

Like all the other cheat sheets I have.

Yeah.

So that was, but that was like the, but that broke most people, I think, because I didn't have to interact with adding and removing daemons from my system.

That's what, you know, my package management system did.

But whenever something went wrong, I'm like, where the hell's the log file?

Anyway, so journalctl.

Ben Rady

Right, it's in this magical program called journalctl.

OK.

I feel like this is like I want to go to the next level now.

Matt Godbolt

So...

Ben Rady

It's like, OK, cool.

We're going to run this on like two computers because we discovered that the reason it crashed is it got OOM killed.

Matt Godbolt

Well, let's finish the thought.

So just to be...

Right, right, right.

Let's just finish the thought there.

So very concretely, you would install the binary to a known good location, which you probably were anyway.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

It wasn't just your home directory, hopefully.

Ben Rady

Pick a user that you're going to run it as.

Matt Godbolt

Maybe it was.

Yes, that's true.

Ben Rady

Might be root, might not.

Matt Godbolt

Yeah, let's hope it avoids being root if it can.

Ben Rady

Yeah.

Matt Godbolt

But then, yeah, you make a little text file, that systemd config-ish file, it looks sort of TOML-ish to me, that says, hey, I need these things.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

I provide these things, which you often don't have to do.

Ben Rady

Mm-hmm.

Matt Godbolt

This is how I'm going to be started up.

This script needs to run before I run.

Ben Rady

Yeah.

Matt Godbolt

This script needs to run after I run.

There's a few, like, customization points you've got like that.

And you can say what you're wanted by as well.

So in this instance, you probably say I'm wanted by multi-user.target, which is like a magical sort of target that says, hey, when it becomes a multi-user system (the fifth, whatever, run level five), then I am saying that I am wanted by it, which is a way of you kind of going the other way around from the usual dependency, saying it depends on me.

Ben Rady

Yeah.

Matt Godbolt

And that means...

Ben Rady

Right.

You're joining the dependency tree there.

Yeah.

Matt Godbolt

Yeah, so now when you reboot the machine, your service will come back up.

And then you can have some policies about retrying, restarting it, maximum number of times to restart, how long to wait between them, those kinds of things.

Ben Rady

Mm-hmm.

Matt Godbolt

And then effectively, it runs itself after that.

So that's what we do.

Yeah.
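Pulling the pieces of this description together, a unit file for such a service might look roughly like what the sketch below renders. The description, path and user are placeholders, and `render_unit` and `install_commands` are hypothetical helpers. The directives themselves (`ExecStart=`, `User=`, `Restart=`, `RestartSec=`, `After=`, `WantedBy=`) are standard systemd ones, but the particular values are assumptions to adapt.

```python
def render_unit(description, exec_start, user, restart_delay_seconds=5):
    """Render a minimal systemd service unit as text."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "After=network.target",          # start only after networking is up
        "",
        "[Service]",
        f"ExecStart={exec_start}",       # how I'm going to be started up
        f"User={user}",                  # ideally not root
        "Restart=on-failure",            # restart after a crash
        f"RestartSec={restart_delay_seconds}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",    # come back up after a reboot
        "",
    ])


def install_commands(unit_name):
    """Commands typically run after copying the unit file into place."""
    return [
        ["systemctl", "daemon-reload"],               # notice the new file
        ["systemctl", "enable", "--now", unit_name],  # start now and at boot
        ["journalctl", "-u", unit_name, "-f"],        # follow its logs
    ]
```

Locally written units conventionally live in `/etc/systemd/system/`, for example `/etc/systemd/system/my-service.service`, and the `systemctl` commands normally need root privileges.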

Ben Rady

Yeah.

Matt Godbolt

And so your installation process is copy the binary bits up and make sure that this systemd configuration is there.

Ben Rady

Yeah.

Matt Godbolt

And then obviously if you want to restart it, there are processes for restarting service, restart and all that kind of good stuff.

Ben Rady

Yeah.

servicectl?

Matt Godbolt

Yeah, is that what you use?

I still use service space, service name restart.

Ben Rady

I think that's one.

I don't know.

Matt Godbolt

There's almost certainly a hundred ways to do it.

Honestly, I still want to go var run blah or whatever the whole old thing was.

Ben Rady

Yeah.

Matt Godbolt

I actually don't know what this command is, but it just comes out of my fingers when I need to say, make that thing run again.

But yeah, service space, name of thing, space restart is now what I've learned to do.

But okay, so that's where we are.

Ben Rady

Okay.

Matt Godbolt

Right, okay, so now we're good, right?

Ben Rady

Yes.

Matt Godbolt

We know that the process is being appropriately managed by a piece of software that's designed to start it up at the right time and keep it running.

It also has some handling for like, if it does output to standard out, it'll go to a well-defined log place inside this journalctl thing.

Ben Rady

Mm-hmm.

Matt Godbolt

If it crashes, it will restart it.

If you reboot the machine, it'll come back up with it if you set that to be so.

So everything is wonderful.

So what's next?

Ben Rady

Right.

So what's next is that you discover that the thing just crashes every four or five days because it's running out of memory, because it needs to run on more than one computer.

It is too big.

So you have to now run it on multiple computers and you have to distribute whatever work it's doing.

Matt Godbolt

We're assuming you've ruled out the, there's a memory leak type issue here.

Ben Rady

Yes, it's not a memory leak.

Matt Godbolt

Yeah.

We're just, yeah, yeah, yeah.

Ben Rady

It's just too much data.

Matt Godbolt

It's just like, Hey, it's too much.

Ben Rady

Yeah.

Matt Godbolt

So what do we do now then?

Ben Rady

So now we need to run it on multiple computers.

And so like one thing you might reach for here is Ansible maybe?

Matt Godbolt

I was going to say, it's probably duplicating the line in the "scp, ssh machine service blah restart" and just doing "for host in".

Ben Rady

Right.

Yes.

For host in host list.

Yes.

Matt Godbolt

yeah

Ben Rady

Uh-huh.

And just do the exact same thing.

Matt Godbolt

So that's the first thing I would do, right?

Ben Rady

Yes.

Matt Godbolt

At least to start with, right?

That's the V0 of anything is like, well, okay, let's deploy it to the two computers I know about right now and just do the same thing on both of them.
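As a rough sketch of that V0, the loop below builds the same two commands for every host: copy the binary up, then restart the service. The host names, paths, service name, and the use of `sudo` are all placeholders invented for illustration. By default it only collects the commands (a dry run), so nothing is executed unless you ask for it.

```python
import subprocess


def deploy_commands(host, binary, remote_dir, service):
    """Return the argv lists for deploying to a single host."""
    return [
        # Copy the binary bits up to the machine.
        ["scp", binary, f"{host}:{remote_dir}/"],
        # Ask the service manager on that machine to restart the service.
        ["ssh", host, "sudo", "service", service, "restart"],
    ]


def deploy(hosts, binary, remote_dir, service, dry_run=True):
    """Do the same thing on every host in the list, one after another."""
    issued = []
    for host in hosts:
        for cmd in deploy_commands(host, binary, remote_dir, service):
            if not dry_run:
                # check=True stops the rollout at the first failing command.
                subprocess.run(cmd, check=True)
            issued.append(cmd)
    return issued
```

For example, `deploy(["host1", "host2"], "./server", "/opt/myservice", "myservice")` returns four commands without touching the network. Because the hosts are handled strictly in sequence, the total time grows with the length of the list.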

Ben Rady

Right.

Yes.

Yeah.

Matt Godbolt

And then, okay.

Ben Rady

That is probably what I would do.

And then I would have the thing where I would try to deploy it and there'd be some package or some configure.

Oh, we got to increase the size of the maximum size of the receive buffers on the network.

And so now I've got to like go and change that configuration.

I gotta change it.

And I've already scaled this out to like 10 computers now, like every month for the last, you know, 10 months, I've been just adding another computer to the list of hosts.

Matt Godbolt

You've been adding another host to the list of hosts.

Yeah.

Ben Rady

And now it takes like, you know three minutes just to iterate through all of them.

And I'm like, oh, and I have to remember to log in and set all these settings every time I add a new host, and it's getting worse and worse and worse.

Matt Godbolt

Okay, so we've now gone firmly outside of signals and processes.

And now this is like the setting up of the machine here is what you're talking about, which is valid.

Ben Rady

Well.

Matt Godbolt

And if you think of, you know, the system, sorry, the systemd configuration unit file, whatever we just said, as being part of this machine configuration, then it does make sense to talk about some of the other things that you might need that machine to have set up, like packages.

And as you say, system settings.

So yeah let's segue into that.

Let's do it.

Ben Rady

Yeah.

Yeah, OK.

So you've decided that now, okay I need to retire this bash script.

It's served me well, but it's time to move on to something a little bit where I don't have to like build all this stuff myself and make sure that it works and troubleshoot it all.

So I'm going to try to use Ansible.

Let's just say.

Matt Godbolt

And what is Ansible and what makes something able to be ansed, which is presumably what it means?

Ben Rady

And well, first you have to have pants and you can have ants in your pants and then Pantsible.

Matt Godbolt

That would be pansible.

Ben Rady

That's going to be the fork of Ansible: Pantsible.

Matt Godbolt

Okay.

Okay.

Ben Rady

So Ansible is, uh, honestly a tool that I have only used sometimes.

It is not... I sort of wind up making the jump from the shell script to, like, Terraform.

That's usually what I do: I'm like, all right, I'm going to have something like Nomad manage these, or I'm going to manage them in the cloud, just making Docker containers.

Matt Godbolt

I see.

So at that point, you jump straight out into sort of an orchestration environment as opposed to I'm controlling individual machines, because that's the other thing in here, that host list and the provisioning of those machines.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

We're assuming that these machines exist and you haven't got to like make them appear in EC2.

Ben Rady

Yeah.

Matt Godbolt

But let's go through what Ansible is, because I think that is interesting.

Ben Rady

Yes.

But, but real, but real high level Ansible is you write a playbook.

And I think that playbook is pretty much in YAML and it's got like the steps that you want to perform.

And there's like a lot of sort of baked in things of like, "Oh, I need to copy this artifact from this place to this place".

Cool.

I need to create a configuration file here.

Cool.

I need to restart systemd.

Cool.

It can do all those things for you.

And there's lots of baked-in tools in Ansible to sort of do the typical system management things: You can install packages.

You can create users.

You can...

You know, because hopefully, like you said, we weren't running this thing as root.

So we had a dedicated user for it.

When I'm setting up a new machine, I need to make that user.

I need to make sure they don't have a password, that they have the right SSH keys, you know, all those kinds of wonderful things.

So you have some, you know, script or some playbook that you run, you know, as root because it needs to be able to do all these things.

But then it sort of sets up the environment, and then subsequent deploys and things can, you know, kind of make it so that the program can run as a user and it doesn't need root.

Matt Godbolt

Got it.

Right.

That makes sense.

So it is essentially a canonifi...canonific..., that word, of the steps that you need to do: the playbook.

Ben Rady

Yeah.

Matt Godbolt

I mean, that's a good name for it, right?

Ben Rady

Yeah.

Matt Godbolt

Like, it replaces the playbook, which is, you know, the Google Doc that you have that says, remember, when you create a new machine, here's the 25 steps that you have to do.

Ben Rady

Mm-hmm.

Matt Godbolt

And you kind of roll your eyes and do them.

And it's like, well, let's automate this.

And it does it in a principled way, with a bunch of support functionality that lets you do, like, "add user" rather than having to go through whatever steps you actually have to take to add the user, which I forget these days.

Ben Rady

Yeah

Matt Godbolt

Okay.

That makes sense to me.

I think one of the things that I have had difficulty in getting my head around when looking at these sets of tools and only because you've mentioned Terraform.

One thing I like about something like Terraform is that you kind of describe the end state

Ben Rady

Yeah.

Matt Godbolt

And Terraform's responsible for getting whatever the current state is to the end state.

Ben Rady

Yeah, yeah.

Matt Godbolt

Whereas with things like Ansible, as I understand it, you have to be very careful to either be idempotent, so you can run the same thing twice and it doesn't re-add another user if there is one already called that thing.

Ben Rady

Right.

Matt Godbolt

Or you just have to not run that step again.

You know, like, hey, once we add that user, don't try and do it again.

And then you kind of go like, well, now I want to change the user to have a different you know full name or a different shell or whatever.

Ben Rady

yeah

Matt Godbolt

You're like, now I have to run the change command and I can't just change the add.
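The idempotent version of that step can be sketched with a toy model, where the machine's users are just a dictionary. This is not how Ansible is implemented; `ensure_user` and its attributes are hypothetical names made up here. It only shows the "ensure" idea: running it twice is harmless, and changing an attribute is the same call rather than a separate "change" command.

```python
def ensure_user(users, name, **attrs):
    """Make sure `name` exists with `attrs`; return True if anything changed."""
    current = users.get(name)
    if current is None:
        # No such user yet: this is the "add" case.
        users[name] = dict(attrs)
        return True
    changed = False
    for key, value in attrs.items():
        if current.get(key) != value:
            # User exists but differs: this is the "change" case.
            current[key] = value
            changed = True
    # If nothing differed, the call was a no-op.
    return changed
```

With an empty `users = {}`, the first `ensure_user(users, "svc", shell="/bin/bash")` reports a change, an identical second call reports none, and a later call with a different shell updates the existing entry instead of trying to add a second user.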

Ben Rady

Right

Matt Godbolt

And Unix systems are so, so complicated.

I can't actually imagine how you could write a more general purpose like make my system look this way thing except for at least one listener somebody is currently shouting "Nix" into the void as they're walking along and I know that Nix solves this in a very cool way and I'm very excited by it but I don't have any personal experience with it other than someone demoing to me and me going wow that is super cool.

Ben Rady

Yeah.

Matt Godbolt

But so just for that, yeah, Nix seems to be, it seems to be like a kind of,

Ben Rady

I've heard those same things about Nix, but I have, again, no personal experience.

Matt Godbolt

A mind virus that people get, not in a bad way necessarily.

That does sound pejorative, but like, cause once you get it, I think you're like, Oh my gosh, this is how everything should always be done.

Ben Rady

Yeah, yeah.

Matt Godbolt

And that's great.

And you, like, proselytize it to everybody.

And then most people's eyes glaze over.

Ben Rady

Right

Matt Godbolt

And then you're like, that seems great.

And then you just log back onto the machine and just go "sudo apt install bob".

And you're like, there we are.

We're done.

Anyway, back to, oops, I've just banged my, yeah but sorry, editing Matt.

You just, I've just whacked the microphone stand.

[that's ok, I didn't edit it out -editing Matt] Where were we?

So I was sort of saying that there's this sort of difference between prescriptive "run these things in order", and maybe they're idempotent or maybe they can adapt and say like, well, if there's a user already there, don't re-add it, that kind of feeling.

Versus the Terraform thing where you just say I should like this to be the end state.

Here is a list of users the machine has to have with the properties that users have.

Ben Rady

Right.

Matt Godbolt

And then Terraform goes behind the scenes and goes, well, why don't I look at what users I've got?

Oh, now I'll make a plan.

A plan is add three users, delete one user, and it presents it to you and says, this is what I'm going to do.
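The "describe the end state and compute a plan" idea can be sketched in a few lines. This is only a toy diff over dictionaries of users, with made-up names and attributes. It is not Terraform's actual algorithm, and as the hosts note next, it is not clear Terraform manages users on a machine at all.

```python
def make_plan(current, desired):
    """Compare current state with the desired end state and list the actions."""
    return {
        # In the desired state but not on the machine yet.
        "add": sorted(desired.keys() - current.keys()),
        # On the machine but no longer wanted.
        "delete": sorted(current.keys() - desired.keys()),
        # Present in both, but with different properties.
        "change": sorted(
            name for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }
```

Given a machine with users `olduser` and `alice`, and a desired state listing `alice`, `bob`, `carol`, and `dave`, the plan is to add three users and delete one. Nothing is applied at this point; the plan is just data that can be shown to a person before anything happens.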

Ben Rady

Yeah.

Have you ever actually used Terraform to do that type of system administration before?

Matt Godbolt

Not on a system, no.

I've only ever done it with infrastructural components.

Ben Rady

Right.

Yeah.

Matt Godbolt

So yes, that is true.

I've never used it for a you

Ben Rady

That'd be amazing.

I don't know if I can do that, actually.

Matt Godbolt

I don't know that it does.

Ben Rady

That'd be amazing if you could do that.

Matt Godbolt

You're right.

Yeah, now I say.

But that's where I was going with that.

It was less Terraform specifically, but, like, the phrasing is either outcome or steps.

Ben Rady

Yeah.

Matt Godbolt

And you know it's nice to supply the outcome.

But yeah, I don't know if something does exist.

And my only interaction with things like that are with Packer, where I always start from an empty image and then run the sequence of steps to make an image that looks the way I want it to.

Ben Rady

Mmm.

Matt Godbolt

So I never go back to it and kind of go, hey, I want that image, but slightly different.

Ben Rady

Yeah.

Matt Godbolt

So yeah, anyway.

Ben Rady

Yeah.

Yeah.

Matt Godbolt

We're all over the place.

Ben Rady

But yeah, maybe that's the, I feel like this, this podcast is like the rough draft of a conference talk.

Cause it's like, imagine that you want to run a program.

Matt Godbolt

[laughing]

Ben Rady

What do you do?

And we just sort of work up from the bottom.

And then I feel like, it'd be a good talk, right?

Matt Godbolt

I think that's a...

When was the last time you gave a conference talk?

Come on, it's your turn.

Ben Rady

Oh, it's been a long time.

I'm probably overdue, honestly.

Matt Godbolt

Because...very much part of the, the last week's conversation.

The reason I was looking into that was because I was avoiding writing several conference talks that I have to give in about a month's time.

And a week has passed since we last spoke; now I'm giving away all of our secrets.

Although much longer will have passed in real time.

And I've probably given the conference talk by the time I've released this.

So, listener, you can be the judge of whether it was any good or not.

But yeah, I have done no work on it at all.

So...

..oops.

But yeah, this is a rough draft of a conference talk on...

Ben Rady

It is.

Matt Godbolt

"So you want to deploy a service" or "So you want to run a service?"

Ben Rady

Yeah, exactly.

So you want to run some software, right?

Matt Godbolt

Yeah, yeah.

Ben Rady

How are you going to do it?

And I feel like the punchline of this is like, okay, and now we're migrating this all to the cloud and we're going to use Terraform.

We're going to use GCP, or maybe, you know, a lot of companies I feel like these days have essentially an internal cloud.

Like they're still using Terraform, but they're using tools like Nomad and they have their own, you know, physical servers and they have an infrastructure team that's managing it all.

And this maybe leads us back.

This is how you get this.

Okay.

This is the whole ...

This is how you get into the state where you're just like, yeah, I just changed one function with some unit tests and pushed a PR and I have no idea where it goes.

Matt Godbolt

Yeah, that's exactly right.

Yeah.

Ben Rady

Yeah.

Uh-huh.

Yeah.

And now, and now the circle is complete.

Matt Godbolt

Well...

And now the circle is complete.

Yeah, I think we've probably reached a good spot then.

Ben Rady

Yeah.

Matt Godbolt

Yeah.

It's good to know these.

I think like all of these, like everything we talk about, really, certainly everything that I hold dear that we talk about on this podcast, is all about finding the right level of abstraction, knowing that there's a level beneath you.

Ben Rady

Yeah.

Matt Godbolt

Which in this case, you know maybe your level of abstraction is those cloud tools that we've just been talking about and the services that run.

But knowing enough about the level beneath you to say like, okay, I do know that there are processes that run and that something is taking care of the input and output for those processes and making sure the right signals get to them at the right time and not the wrong things like me logging out.

Ben Rady

Yeah

Matt Godbolt

But I don't know that it exists and maybe I could sketch something, but I don't necessarily know off the top of my head.

And then you should know beneath that what...

that something exists, right?

Beneath that layer, we know that there is a systemd and I don't know how that works, but it's always good to have a decent understanding of the level beneath where you're working and then be aware of the layer below that.

Ben Rady

Right.

Know vaguely what to Google or ask ChatGPT, right?

Matt Godbolt

Right.

Or ask your favorite Large...

Ben Rady

Yeah.

Yeah.

Matt Godbolt

Yeah.

Ben Rady

Ask your favorite LLM.

Matt Godbolt

Yeah.

Yeah.

And so I think this plugs into that kind of mindset completely as like, you know, yeah, it's kind of like know how the cloud works and then...

Ben Rady

Yeah.

Matt Godbolt

...know where to look when it doesn't work.

Ben Rady

Mm-hmm.

Mm-hmm.

Yeah.

Like, honestly, the only downside to this is in those environments where you have, you know, a million layers of abstraction between you and the physical server.

Matt Godbolt

Cool.

Ben Rady

If you're like an old fuddy-duddy like us and you're like, can I just SSH in?

It's like, no, you can't have root.

It's like, what? Why?

I know exactly what to do.

I know exactly how to fix this problem.

And now I'm going to have...OK, fine.

Sure.

Matt Godbolt

Yeah.

Ben Rady

Whatever.

Matt Godbolt

Well, and of course, the irony is, they can probably give you root, but it's not even on the real computer because you're several layers of virtualization away from the machine that's actually running.

Ben Rady

Yeah.

Mm hmm.

Right, yeah, exactly.

Matt Godbolt

You talk about the metal.

Ben Rady

It's like it's running in the container service.

There's no root to give you.

Like you can't get there from here, right?

Matt Godbolt

Yeah.

Ben Rady

Yeah, yeah.

Matt Godbolt

Yeah.

Cool.

All right, friend.

Well, this has been great.

Ben Rady

Yeah, yeah.

Matt Godbolt

We jammed it.

We did it.

Ben Rady

Not bad for winging it.

Matt Godbolt

Yeah, listener, you can let us know.

Post a comment somewhere.

I mean, some people watch this on YouTube and that's where I see most of the comments, and then otherwise tweet at us, or the hachyderm.io Mastodon-y thing, or just email us.

Ben Rady

Yeah.

Yeah.

Mastodon.

Matt Godbolt

You can get us.

But we'd love to hear what you think and what we're doing right and wrong because we've never really asked that.

Ben Rady

That's not hard either.

Yeah.

Matt Godbolt

We just do this for us.

This is just our excuse to catch up, isn't it?

Ben Rady

Yeah, that's true.

Matt Godbolt

Cool.

Ben Rady

That's true.

Matt Godbolt

All right, friend.

Well, have yourself a great weekend and I'll speak to you soon.

Ben Rady

All right.

Until next time.

Matt Godbolt

Until next time.
