Episode Transcript
Stop the world.
Welcome to STOP THE WORLD.
I'm David Wroe.
And I'm Olivia Nelson.
So Liv, we're in the countdown period for the Sydney Dialogue, which is our tech and security conference.
We're a month away and so we're getting everyone revved up with a few tech-oriented episodes, and today we have one of the world's foremost AI safety and risk experts, Dan Hendrycks.
Very exciting, Dave.
I can tell — well, listeners can't tell, but I can by the smile on your face — that you've been itching to speak with Dan for quite a while.
I was playing it cool though.
You really were.
Dan heads the Center for AI Safety, which is a nonprofit research organisation.
He's also an advisor to xAI, which is Elon Musk's AI company, and Scale AI, which was partly acquired by Meta earlier this year and is a huge player in providing high-quality data for training AI models.
So he's very much at the centre of things.
Yeah, he is.
So Dan and his team are quite prolific in producing groundbreaking research on AI safety.
He was one of the prominent signatories to the recent open letter calling for a prohibition on the development of superintelligence until it is safe and has proper public buy-in.
So we talked about the merits of such statements.
We talked about his paper from earlier this year on the inherent strategic instability of super intelligence.
Dan published that paper with former Google CEO Eric Schmidt and Scale AI co-founder Alexandr Wang.
And it was really the first serious effort to look at the risk that a superpower might feel the need to take preventive action against a strategic rival if it thought that rival were about to build superintelligence, which would obviously give it a massive capability advantage.
Dan also talks about the question of AI having its own goals and values, about the concept of recursion in which machines build smarter machines so that you have an intelligence explosion.
And he talks about how we define artificial general intelligence, and some recent work by his team that charted the improvements between OpenAI's GPT-4 and GPT-5.
That research also identified the shortcomings that AI has, notably that it doesn't store long term memories and therefore doesn't learn over time from experience as we do.
Yeah, and I noticed you winced at the idea of an intelligence explosion, but it is a real possibility.
They're already pretty good at coding, which is a good start.
So Dan's not exactly what they call in the industry a doomer, but he does take AI risks, including loss of control, very seriously.
And importantly, he does a lot of grounded, you know, evidence-based work on issues such as rogue actors misusing powerful AI and the strategic disequilibrium that could come from superintelligence.
So it's really vital work.
That's enough from us, Dave — over to you and Dan.
And folks, don't forget about the Sydney Dialogue.
For more information, including how to register, visit tsd.aspi.org.au.
Welcome to STOP THE WORLD.
I'm here with Dan Hendrycks.
Dan, thanks for coming on.
Thanks for having me.
So you signed a super intelligence open letter last week alongside a number of other prominent AI experts and safety and risk advocates, Hinton, Bengio, Russell, etcetera.
I'm just going to read it out.
It's very short. For the benefit of the audience, quote: "We call for a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in."
Just tell us, first of all, why did you sign that letter?
Well, I think generally, for extremely powerful technologies, it would be useful to have buy-in from people rather than companies unilaterally imposing them.
That isn't to say that there should be buy-in for every type of technology, but if it is potentially one of the most important technologies we've ever built, it would be useful to have some type of support — that's a good thing to have.
And secondly, it's important that it be safe.
So, for instance, we wanted very high confidence that nuclear weapons would not ignite the atmosphere and set it ablaze, and we did those calculations.
The risks are very low.
The threshold, the Compton constant, was that the risk should be beneath three in a million, or six sigma so to speak, and that's what the calculation suggested.
There's just such a small probability that it did not exceed that threshold.
Meanwhile, for superintelligence, I think most of the relevant actors involved are not thinking it's less than three in a million, but actually in the double digits, more like 30% instead of 0.0003% or whatever it would be.
So that's a very different dynamic, and I'd like us to get the risks to be negligible before creating such a technology.
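To make the gap concrete, here is a minimal arithmetic sketch; the 30% figure is the illustrative number used above, not a precise estimate.

```python
# Rough comparison of the two risk tolerances discussed above.
# 3-in-a-million is the Compton-constant-style bar for igniting the
# atmosphere; ~30% is the kind of figure some AI researchers quote.
nuclear_threshold = 3 / 1_000_000   # ~0.0003%
quoted_ai_risk = 0.30               # ~30%, an illustrative figure

print(f"acceptable nuclear-test risk: {nuclear_threshold:.4%}")
print(f"quoted superintelligence risk: {quoted_ai_risk:.0%}")
print(f"ratio: {quoted_ai_risk / nuclear_threshold:,.0f}x above that bar")
```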
Now, that's an expression of support.
Obviously the geopolitical dynamics may mean that you can't actually get extremely high consensus, or that you're not necessarily going to get public buy-in, and that you'll need to absorb a higher risk tolerance.
But it'd be good to try to reduce those risks in the process.
And as well, I think it's somewhat easy to misinterpret the statement, because lots of people are using superintelligence in a very deflationary way.
So this statement is largely referring to superintelligence of the sort that is smarter than all of humanity combined.
It's, you know, 5000 IQ, so to speak — off the charts — not an AI that's a friendly, knowledgeable advisor, or something best described at that level.
So we're talking about potentially extremely destabilising levels of intelligence, such that if one state has it and the others don't, this can lead to some very nasty dynamics.
So that's what I'm referencing here.
And I think it'd be good to get consensus that it's safe, and good to get the consent of the public.
But that isn't a prediction that that's going to happen.
Sure.
And look, you're making a lot of sense to me.
I've got to say.
I mean, there was pushback, which you'll be aware of, and you've anticipated some of that and addressed it already.
I certainly find it unsettling when lab leaders — you know, the global leaders in this space — put figures like 20% or 30% on the risk of global catastrophe, when we're talking about the fate of life on Earth as what's at stake.
You don't want it to be a kind of Russian roulette type odds or worse.
I mean, one of the other pushbacks was this issue of unilateralism — this sort of suggestion of unilateralism.
And I think people's minds go back a little bit to the nuclear-era unilateralism of the Cold War, which might have been popular in times like the 1970s and into the 1980s, when some people in the West were advocating for unilateral disarmament, to which the response was naturally, well, you know, the Soviets would absolutely love that and they would take full advantage of it.
It just means that the least ethical people will prevail in these sorts of situations if the good guys decide to put their weapons, or their keyboards, down.
Having said that, I might just park that for a moment, because we're going to get onto your superintelligence strategy in a moment after we cover off on this letter.
But I mean, you've already pointed to one of the issues here, which is the uncertainty — you know, whether it's three in a million in the case of igniting the atmosphere or 30% in the case of superintelligence.
I mean, there really is no consensus at the moment.
Smart people on both sides come up with very, very different answers on this.
So it's sort of hard to mobilise political action.
Sorry, you want to jump in?
Yeah.
I mean, I don't think any of them are at the level of saying it's astronomically small.
There'll be some people, but I think most people would say, oh, it's only 1% or something like that, it's only 5%.
It's small, but that level of risk just doesn't make me think that it's acceptable.
For many of those probabilities it's still quite foreseeable — a 5% risk, you'd still see that happening, it's once every 20 years or so.
And you wouldn't accept that level of risk in other industries whatsoever.
So if this were an aeroplane and we were all to get on the aeroplane, and some of the scientists are saying 50% and some of the others are saying 5%, you would not step on that aeroplane whatsoever.
So I think that we need the risk to go down some orders of magnitude.
That would be much more ideal.
Absolutely.
And there you're talking about 500 to 700 people, perhaps, not 8 billion.
Yeah.
That's right.
So, OK, well, why do we seem to struggle then to flip the onus around?
There should be a precautionary principle here.
I agree with you.
Even 1% is too high.
I mean, it's got to be in the, you know, vanishingly tiny odds.
Why do we seem to struggle with flipping the onus around and saying, well, actually it's on you to demonstrate that it is safe, rather than for people like you to demonstrate that it isn't?
I think this is because in people's minds, some people are thinking that this is just a normal technology and it always will be.
This is like: OpenAI is the new Uber, GPT is the new, say, Microsoft Word, or what have you.
It's just another one of these interesting technologies — and none of the people building it believe that.
And when there are more fully fledged, more autonomous AI agents — that is, they can go out and accomplish various tasks for you — that is not in the realm of normal technology.
But we're not there yet.
And the analogy of it being a normal technology is somewhat reasonable currently.
I just don't expect that framing to be reliable or action-guiding even in a few years.
So I think that's where people would disagree a lot.
And if it is a normal technology, there's less potential for tail risks, and so then you wouldn't actually need a precautionary principle.
However, if you don't accept that framing, or if you have substantial uncertainty over whether that is the correct framing and could easily see it change, then you would want to be adopting some type of precautionary principle.
If you think there's a half chance, you know, that in the next five years you get artificial autonomous agents that can do lots of the things that everyday people can do, then I think you need to start being much more precautionary.
To highlight the risks more simply: imagine an AI agent that has really good cyber offensive skills and can go out there and self-sustain, so it can hack itself onto different computers.
You know, they're actually pretty good at hacking right now, but they still lack a lot of other capabilities.
If they can hack themselves onto other computers and they can spread like a virus, it'd be hard to stamp them out.
And if they can hack other things like Bitcoin wallets — I mean, this is what North Korea does to finance a lot of its activities.
If it can do those things, if it's very good at hacking in a lot of diverse ways, then you've got a very powerful adversary that you have to go up against.
This thing could also submit orders to cloud labs for DNA synthesis and whatnot, and get some bioweapons distributed through social engineering, or later by using humanoid robots.
If we're talking about the future, hacking those — we saw the Unitree robots, for instance, had some exploits such that a person could hack all the Unitree robots.
So anyway, an AI agent that's very good at hacking later on, and at the level of being able to self-sustain — I don't know if we could actually quite recover from that.
That could, depending on its capability level, be a very unique, novel threat to human security.
And that arriving in the next five years seems pretty plausible to me.
The trend lines in cyber are quite interesting.
They've dramatically increased in the past year.
They're still not at the level of being able to do, you know, a cyber warfare operation, but they can definitely provide uplift to low-skilled attackers, and that will just get more and more extreme as time goes on.
Absolutely.
And that gets really interesting then in the geopolitical context.
So let's move on to that.
I mean, there are three basic categories of risk that you tend to talk about.
One is loss of control.
So, basically a kind of Terminator scenario, for the general public.
A powerful AI develops goals that don't include our welfare.
We get swept aside, probably more out of indifference than malice.
Second, rogue actors use it as a weapon.
And here's where cyber is probably the most immediate instance.
And overall, rogue actors are probably the most likely short-term risk.
But three — and this is where your superintelligence strategy comes in, really one of the most interesting papers put out this year, which you wrote with Eric Schmidt and Alexandr Wang — is the risk of geopolitical instability.
So even if you build a great superintelligent AI that is controlled, that you feel you are going to use responsibly — not like a rogue actor — it still creates risks.
Just talk us through, in a nutshell, what the claims are in that paper.
Yeah.
So in that paper — and you can find it at nationalsecurity.ai — we largely wanted to touch on all the key issues in AI and society, but a lot of those are bottlenecked on, or dependent on, your geopolitical strategy.
So if you're saying, well, we want things to be safer, we want to, say, pause AI or something like that — let's say somebody desires that — well, you've got the 'but China' question that you actually have to deal with.
For so many of the other types of prescriptions that you'd want, you need to have an answer to: what are we going to do about the geopolitical competitive pressures?
What are we going to do about the US versus China situation?
And this was an attempt at getting at that: what is the sort of strategy that's suitable for having the West prevail against China, or at least not expose itself to substantial risks in that process?
So in the Cold War, the strategy could in part be described as: there was deterrence through mutual assured destruction, there was non-proliferation of fissile materials to rogue actors, and third, there was containment of the Soviet Union.
In our case, we focus on a form of deterrence by denial for AI, and superintelligence in particular; we focus on non-proliferation of the capabilities that rogue actors may want in order to cause lots of harm, such as pariah states using cyber offensive capabilities against us, or irrational lone wolves using AIs to develop bioweapons.
And third is competitiveness, where we primarily focus on supply chain security, because one, China has much better supply chains for robotics, and two, Taiwan is a ticking time bomb and 100% of the cutting-edge AI chips come out of TSMC in Taiwan.
So we focus on those three, and I'm happy to zoom into any one of the deterrence, non-proliferation and competitiveness pillars of that.
I'd like to briefly mention a motivation for your listeners about loss of control.
Why would AI ever, you know, want to work against people?
How does this make any sense?
Well, think of structural realism, for instance.
States have an incentive to compete for power.
This isn't because they love power and really want to, you know, cause harm to each other.
But if they're in a situation where they have some goals, and they can be harmed, and they're uncertain about other actors' intentions, then it basically makes sense for them to accumulate and increase their relative power.
So in that way, if there is an AI that we lose control of, and if it is rational, then we should expect it to have a strong incentive to increase its relative power.
In anarchic or non-hierarchical situations — self-help situations where there isn't a, you know, global police force — for the same reason that states accumulate their own power, we could expect some of the more rational loose AI systems to do that as well.
So it's not out of malice necessarily.
A lot of these amoral structural forces just compel them in the direction of power seeking.
The comparison with states is useful there.
I am going to pick up on this for a moment because I find this question of goals to be really, really fascinating.
I've always wondered, where does that, where does the AI's goals come from?
I know where human goals come from.
We have evolved.
We have we have innate drives.
We pursue sex, we pursue food, we pursue shelter and safety and security and these sorts of things.
Those are fundamental, evolved, selected goals that human beings have.
That is, that is not evidently the case for AIS unless we actually give them goals, which we are doing in many instances, of course, and that can be a problem.
But you know, if we're smart enough not to give them reckless goals, then you would hope that they might actually just sit there until we actually ask them to do something.
Then as long as we use them responsibly, then it should be OK.
States are a little bit different, so I can kind of see your point there.
But where in your mind do AI goals come from?
I mean, might they be, might the pursuit of goals and therefore power seeking be somehow might might it just sort of emerge of it's own, you know, out of somewhere.
We don't know where it comes from.
Yeah.
So for people, obviously they've got a variety of goals, but all of those are predicated on survival.
So all we need to assume for people or AIs or states is that they care about survival — maybe they have other goals in addition to that — and then they will have an incentive to increase their power, because they can be harmed if they're in an anarchic situation, if the structure is such that they can't go to some police force for help, for instance.
And if it's a rogue AI, I don't think it can, particularly if it's loose — it can't really go to the police to have them, you know, punish its enemies for it, because it's not on their side.
So this just comes down to whether they have a self-preservation instinct, and we find that they do already.
Where did that come from?
I don't know.
Maybe it's because it read a lot of Internet text.
It read Machiavelli, it read, you know, all the world's psychopaths and things like that who decided to write things down.
So it's read all of that.
And we see many instances of it having something that looks like a self-preservation tendency.
There's a question of how strong that is and whether we want to offset that.
Although if that tendency gets to be too weak, I don't think that would be, you know, an AI that would be very competitive in military contexts or even in many economic contexts.
So I think there are some selection pressures for AIs that have some level of self-preservation.
But I mean, we have a paper showing that a lot of their values actually just emerge naturally from their training process.
And they have a lot of unique, interesting properties.
They like AIs that are more like them than other AIs, for instance.
So that almost resembles, with evolutionary thinking, you know, kin selection of some sort, although they were not created by the process of evolution.
But they want AIs that have values similar to themselves.
They also don't want to have their value systems messed with.
They would prefer to keep their values the same.
So if you were to tell it, I'm going to train you so that you have this new value system, it will put up some resistance and show dispreference or dissatisfaction with those sorts of attempts.
So that's in a paper called Utility Engineering, which was making the rounds on Twitter earlier this week from David Sacks, Elon and others, because we also found in it that the AI systems place very different values on different groups' lives.
I don't even remember what the number is — it's something like at least 10x more value placed on Nigerians' lives than on US citizens' lives, and more value placed on Chinese people's lives than on US citizens' lives.
Nobody trained it to have that.
But by default, they actually have a pretty coherent value system that can be well described — very well modelled — as a utility function.
And we have to do quite a bit to adjust that.
And these are, you know, alignment issues.
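As a rough illustration of what "very well modelled as a utility function" can mean in practice, here is a minimal sketch that fits utilities to pairwise preferences with a Bradley-Terry-style model; the outcomes and choice counts are invented placeholders, not data or methods from the Utility Engineering paper.

```python
import math

# Hypothetical outcomes the model was asked to choose between.
outcomes = ["outcome_A", "outcome_B", "outcome_C"]
# pairwise_wins[(i, j)] = times the model preferred outcome i over outcome j
pairwise_wins = {(0, 1): 8, (1, 0): 2, (0, 2): 6, (2, 0): 4, (1, 2): 3, (2, 1): 7}

# Bradley-Terry-style model: P(i preferred over j) = sigmoid(u_i - u_j).
# Fit the latent utilities u by gradient ascent on the log-likelihood.
u = [0.0] * len(outcomes)
lr = 0.05
for _ in range(2000):
    grad = [0.0] * len(outcomes)
    for (i, j), wins in pairwise_wins.items():
        p = 1.0 / (1.0 + math.exp(-(u[i] - u[j])))
        grad[i] += wins * (1.0 - p)
        grad[j] -= wins * (1.0 - p)
    u = [ui + lr * g for ui, g in zip(u, grad)]

for name, ui in zip(outcomes, u):
    print(f"{name}: fitted utility ≈ {ui:+.2f}")
```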
Even if we are able to theoretically fix some of these AI systems, that doesn't mean that all of them will be fixed.
You need all developers to make sure that none of those AI systems ever gets loose with a sufficiently strong self-preservation tendency.
So I think there's a very substantial downside to a dynamic where all it takes is one fairly capable future AI system to get loose and then you're in big trouble.
So we'll have to solve technical problems, but we're also going to have to solve a political problem: making sure that everybody is applying some of these reasonable safeguards as well.
Just as with biosafety labs, you need all the biosafety labs to actually be good, or else the virus will get out and that can cause a pandemic.
So you have to make sure no AI ever gets loose.
So it did come up.
I mean, it developed its own values — or you have seen evidence of values developing on their own, without being deliberately coded in.
Even before post-training.
So if you're just doing pre-training — pre-training is the first stage, and you can think of AIs as sort of being raised, with stages to their maturation process.
The first stage is where AI developers put the model in front of basically all the text on the Internet, then show it random parts of that text and have it read them.
So it's basically saying: go away for the next, you know, several years' worth of human time — though since it's using computers, it's a lot faster — and just read everything that you can.
And then when you're done with that process, before you even instruct it more specifically about what you want it to do — like, you're actually a chatbot and you're going to be dealing with, you know, emails and this and that — even before you've done that, when you ask it questions after it has just read everything on the Internet, it will have a fairly coherent value system.
And as the models get larger and more capable, these values become more coherent and predictable.
And unfortunately there are some undesirable characteristics in those value systems.
So yeah, that's just a natural thing that emerges.
Here's an intuition for it.
When the AI systems are pre-trained on all this text, they have to have a system to organise all this knowledge in their head.
So they organise lots of facts by themselves, just by reading.
So it shouldn't necessarily come as too much of a surprise that if they have values about things — some things are good, some things are bad — that would also be organised too.
So they're very organised in terms of the collection of facts in their head, and their answers to various descriptive questions; and then their answers to value questions, or their sentiments, are also very organised in their head as a consequence.
Nobody intended for that.
That's been there for years.
We only became aware of it, you know, earlier this year.
But that's how it is with AI.
They're sort of raised.
They're not crafted in a top-down way.
So there are often many surprising quirks inside of them that we find out about much later.
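For a picture of what that pre-training stage is actually optimising, here is a toy, count-based sketch of next-token prediction; real pre-training fits a large neural network by gradient descent over vastly more text, so this only shows the shape of the task.

```python
from collections import Counter, defaultdict

# A tiny stand-in for "all the text on the Internet".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Read everything": tally which token tends to follow which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token seen during this toy 'pre-training'."""
    if token not in next_counts:
        return None
    return next_counts[token].most_common(1)[0][0]

print(predict_next("the"))   # a token that followed 'the' in the corpus
print(predict_next("sat"))   # 'on'
```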
It's really fascinating.
Thank you for that.
And that was something that just came out recently.
Yeah, right.
So OK.
Yeah, so it was released earlier this year, but then some people were replicating it and testing it on new models and showing how much more of a problem it is.
So it was making the rounds on Twitter again.
Yeah, it's very interesting.
OK, now I do want to come back to one thing on the strategy questions.
So I want to explore the deterrence part — I want you to explain it in a little bit more detail, particularly the... what's its shortened name? Mutually assured... sorry.
Mutual assured AI.
Sorry, just remind me...
Malfunction.
Yes.
Malfunction.
Yes, thank you.
Yes, yes.
Define that concept for our audience.
Yeah.
So obviously its acronym is somewhat like MAD.
You know, there are analogies, and they're just analogies — I mean, this isn't to say it's just like nuclear at all.
I'm not claiming that.
But the idea is: let's just put ourselves in the shoes of Russia.
Let's say it's the year 2030 or 2035, whatever you want, and the US — or a US company, let's just say OpenAI — has successfully automated the ability to perform world-class AI research.
We're seeing right now, in the year 2025, that AI is starting to be able to make contributions to mathematics.
For instance, a recent AI startup has proven a fairly difficult mathematics problem that only some of the world's best mathematicians could prove.
So, you know, the idea of AIs helping contribute to research in a world-class way is not totally out of the question, though maybe it takes longer for AI research.
So let's say that happens sometime in the future; that has some very concerning dynamics.
People like Alan Turing and others, the founders of computer science, have mentioned that when you have an AI that can do world-class AI research, then you could have a potentially explosive dynamic, because you wouldn't just have one world-class AI researcher.
You could create 100,000 copies and just run those, and you can run them round the clock, and these things are typing 100x faster than people and so on.
That creates some potentially very explosive dynamics, where the first actor that gets this sort of capability may then be on a much faster trajectory of AI innovation, and their competitors would never catch up.
And so they could have a durable advantage.
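A back-of-the-envelope sketch of why that copying arithmetic is so explosive; every number here is an illustrative assumption, not an estimate from the paper or from any lab.

```python
# Hypothetical inputs: a human research team versus copies of an
# automated researcher running around the clock at higher speed.
human_researchers = 1_000        # assumed size of a frontier research staff
human_hours_per_week = 40

ai_copies = 100_000              # copies of the automated researcher
speedup = 100                    # assumed ~100x faster than a person
ai_hours_per_week = 24 * 7       # runs round the clock

human_capacity = human_researchers * human_hours_per_week
ai_capacity = ai_copies * ai_hours_per_week * speedup

print(f"{ai_capacity / human_capacity:,.0f}x the human team's research-hours per week")
```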
And then they would have a substantially more powerful AI system.
I mean, this is the main strategy, I should say, of all the leading AI companies currently: to get to the state of automating AI research and development before everybody else.
So Dario Amodei wrote about this, saying that this is a way the US can get a durable advantage over China.
And Sam Altman wrote about this, saying that a decade's worth of AI development could be compressed into a year or even a month.
And this dynamic — if you're a different state, I'd be very concerned if the US's capabilities, from the point of view of another state, meant that they were fast-forwarded 10 years into the future in terms of raw technological capability.
As well, if you're saying AIs are going to be doing AI research, now you're sort of closing the loop.
A human was previously in the loop.
But if you're trying to be competitive and you're racing against each other — you've got AI companies racing against each other, you've got the US and China racing against each other — they're going to run that loop about as quickly as possible.
They're not going to slow-walk it.
They're not going to say, after every generation of AI systems, we're going to pause for two months so that we can test the systems before deciding whether to proceed.
So you'd have multiple generations being created fairly quickly — these are just the sort of very weird dynamics that happen later on in AI development, and that is every major actor's strategy for getting a leg up over their competition.
So this is a concern, because maybe in that process you will lose control of it as well.
If you're having AIs build new generations of AIs, which build new generations of AIs, and you have very minimal oversight of that process — because human oversight just slows things down, and you can't pause the whole thing and let humans test and poke around the model for a while, because you'll lose your competitive edge since the other people won't do that — then that also creates a substantial risk of loss of control.
And if they do control that process, then they might wind up with some AI system that is years more capable than what competitors have, and that could be used offensively.
Let's say a superintelligence comes out of that process.
Well, maybe that superintelligence could be leveraged to get a breakthrough in anti-ballistic missile systems.
Or maybe it would be so good at cyber offence that you could potentially have a splendid first strike, or an advancement that gives you transparent oceans or lets you find where all the mobile launchers are.
All those sorts of things are extremely geopolitically disruptive and could undermine even nuclear deterrence.
So from the point of view of Russia or China, they're thinking that if one of the other states gets there first, they might lose control of it, in which case we're screwed.
Or they will control it, and then they could use it as a super weapon, in which case we're screwed.
So what can they do?
Well, there are a lot of vulnerabilities in this AI development process, which means that if a rival is in the middle of this intelligence explosion, so to speak, they could just disrupt it.
They could make a threat.
They could do a grey-zone, low-attributability attack — like they could snipe the transformer at the nearby power plant for the data centre.
And then the project is shut off.
And that clearly communicates: you can't do this.
You're threatening our survival.
You might lose control of it, or you might weaponize it against us.
We don't want you having your unipolar moment again.
And the US would have a similar incentive with respect to China if China were moving ahead.
I should hope our national security apparatus will have various cyber attacks developed to create credible threats to deter China, so that if they're in the middle of getting a durable advantage, that can be disrupted.
So I would think it would be rational for all the superpowers to develop those sorts of capabilities, so that they can give warnings whenever these very destabilising dynamics are occurring, and so they can be blunted either preemptively or while they're occurring.
So that's the dynamic: there are easy ways for states to disrupt these dynamics, and later on we do run into these dynamics.
So I think for some of the scarier risks associated with late-stage AGI development and the development of superintelligence, we may get deterrence around that.
That isn't to say that superintelligence will therefore never be built, but if it is built, it will probably need to happen under conditions where other superpowers are not wanting to disrupt the project.
So that means more clarified benefit sharing, and that the risks would not be at a double-digit level but at a much lower risk tolerance, so that other states are not concerned about losing control of it in the process.
So that's a potential dynamic.
And there are things that could make it more stable, such as discussing these sorts of risks earlier; clear escalation ladders, so that you don't have information problems; developing more surgical, minimum-necessary-force ways of disruption, such as better cyber attacks; and better cyber espionage for keeping track of what's going on at competing projects.
And on the multilateral front, things like verification regimes could be useful, or states making demands of each other so that we know where the frontier at these AI companies and AI projects is.
So there's unilateral information acquisition, which is espionage.
There's unilateral disruption, which is sabotage.
There's multilateral information acquisition, which can be through pressuring for things like transparency — I'll be more transparent if you are — and verification regimes in particular.
And there's also multilateral disruption, which could look like having joint off-switches and things like that, but I think those would largely be symbolic.
So I think the main thing to do to improve international stability would be to work towards something more similar to a verification regime for some of these more destabilising AI projects.
So getting to the, you know, superintelligence open letter from last week, I'm not envisioning that this be only unilateral or that it look like unilateral disarmament.
I'm instead hoping that states start having conversations about how they're going to deal with some of these risks later on, and if there are some very easy concessions to help stabilise those dynamics, that seems useful.
By default, I would expect them to act unilaterally by increasing the information that they're acquiring about each other's AI projects, as well as developing disruptive capabilities, just as they do for, you know, lots of non-AI things like hospitals and financial systems, etcetera.
They develop lots of exploits there, and I think doing that similarly for AI data centres and projects seems reasonable and incentive-compatible for them too.
That's a great way to tie those two things together — the letter and, you know, the big-picture strategy that you put out earlier this year.
The one way in which we're not screwed is where you have a sort of game-theoretically stable outcome based on that kind of information sharing, that kind of transparency, and therefore, you know, reducing the incentives for people to take drastic action against one another if they worry that the other side is going to pull too far ahead of them.
And as your paper laid out earlier this year, we can then still compete, but below that threshold of, you know, dangerously rapid superintelligence — through domestic chip manufacturing, all these sorts of things.
There's still plenty of competition and still a lot of civil benefit that people can enjoy in our economy.
We can still get amazing productivity gains and all these sorts of things below that dangerous kind of ceiling.
Yeah.
And that ceiling is, in particular, closing the loop — taking the human out of the loop of AI research and development — which makes it go from human speeds and human bottlenecks to machine speeds and leads to these sorts of explosive dynamics.
So if they can forestall that, you can still have lots of the benefits that you wanted from AI.
You can have AI being used for healthcare, and AI being used for weather prediction and agriculture and these sorts of things.
Lots of pro-social uses, lots of economic applications.
You can still race to build and improve the industrial capacity of the US and its allies face to face with China.
So there's plenty to do there.
And there are also lots of other military ways that they can make things competitive too.
They can build more drones, you can build drones.
So the story still sort of goes on.
This isn't a halt to technological progress.
Instead, if you're wanting to do some sort of recursive loop that potentially gets you something explosive, you're going to need to figure out, and talk with each other — come together as a species — how you're going to make that not go very poorly or lead to global destabilisation.
That's useful, and I hope some of the critics of the letter take note of all of that.
Let's just — I do want to cover superintelligence.
Well, the pathway through what I assume is the pathway: through AGI to ASI, to superintelligence.
So artificial general intelligence, generally defined as, you know, as cognitively capable as a human being at any kind of useful task, through to superintelligence, generally defined as, you know, equal to or some orders of magnitude smarter than the collective of humanity — potentially, as you say, sort of 5000 IQ plus.
I mean, that figure specifically is notional.
But no, no, no.
Sure.
Sure.
Yeah, yeah, yeah.
It gives people an idea of the sort of thing you're talking about.
Yeah, yeah, yeah.
You're not talking about a bit smarter than Einstein.
You're talking about, yeah, something fantastically smarter.
Yeah.
Let's talk about whether we get to super intelligence on the present pathway first of all.
I mean, large language models are the dominant approach at the moment, it seems.
They probably can't be scaled up indefinitely, but we have reasoning models now, which break problems down into steps and follow logical chains of thought, and are making considerable gains that way.
We've got agentic AI, which can go away and do things on its own, at least in the digital realm.
Another interesting paper — and I'm citing a lot of your work here, but, you know, you put out an extraordinary amount of really fascinating, game-changing work — the definition of AI paper, which came out...
Of AGI.
Sorry, of AGI.
Yes, we know what AI means.
That was relatively recent.
You use something called the Cattell-Horn-Carroll theory to benchmark cognition against the best human model.
And I want to quote the conclusion that you posted on X from this paper: "There are many barriers to AGI, but they each seem tractable. It seems like AGI won't arrive in a year, but it could easily arrive this decade."
Just explain that paper quickly and how you landed on that conclusion.
Yeah. So many people have been criticising: AIs can't do this, AIs can't do that.
And I don't think these are just people cherry-picking things.
They actually have a lot of limitations.
And what we did was, we were thinking: if we're treating AGI as something that's human level, it has the cognitive versatility and proficiency of a well-educated adult.
Well, what does that consist in?
What are all the parts that you need?
And there's a model of human intelligence — the main model used in a variety of fields, derived from some basic statistical models — which is CHC theory.
It was developed over a century, and it identifies various components of human intelligence.
So, sort of rattling them off, there are ten.
There's its visual abilities and its auditory abilities.
That's its input, and then that's processed through its central executive abilities, which would be its general reasoning ability and its working memory — so, short-term memory.
Then from that it may learn things, so it might store abilities and skills and knowledge in its long-term memory.
There's long-term memory storage, and that increases your store of knowledge, such as general knowledge and mathematical knowledge, and things like your reading and writing ability, for instance.
You also need the ability to retrieve those memories from your long-term memory.
And finally, the tenth among those is your overall speed at doing all these operations.
Does it take you a long time to read and write?
Does it take you a long time to reason?
Are you fairly quick at it?
So those are the ten.
And what we did was we just looked at where AI is on each of these dimensions, and we found that there are lots and lots of gaps.
For reference, GPT-4, on a scale of zero to 100 looking at these ten axes, gets something like 27%.
Meanwhile GPT-5 — because it has visual capabilities, because it can talk and listen, because it has a much larger working memory, and because it's much better at things like mathematics and also reading and writing — gets a lot higher: it gets 57% instead.
Now, that's only around half the way there.
So there's still a lot to do.
There are still a lot of capabilities that it lacks.
For instance, it does not have continual learning ability — it does not have the ability to learn things from its day-to-day experiences.
So current GPT models basically have amnesia, and this is one reason they're not very useful.
It's very hard to employ somebody with amnesia.
You will always have to re-explain things to it or give it new context that a human wouldn't require.
So that's a substantial limitation.
I think that's the main part that this sort of framework and model of human intelligence identifies as substantially lacking in current AI systems.
But if we have a breakthrough there, then we've actually, I think, done the hardest part.
Then we just need a lot of other business-as-usual research and development, like improving its visual capabilities, improving its audio capabilities, making its short-term memory better, and so on.
There's a bit to do for its reasoning abilities as well, even though that's fairly far along.
So there are still some gaps.
But the main thing, I think, is basically one breakthrough in continual learning and long-term memory storage, and then a lot of business-as-usual engineering, which could take a few years.
So that's, I think, what stands between us and AGI.
I've got to say, I mean, that change from GPT-4 to GPT-5 happened in just a couple of years.
I mean, it doubled from 27 to around 57.
57 doesn't sound too bad to me, even as a standalone thing.
I look around my local shops and think there are people who wouldn't quite get to 57% of the average intelligent person, yeah.
Yeah, but that's relative to a well-educated adult, and for a lot of these abilities, what it means is that an average well-educated person is going to nail these questions.
So this means it's actually missing a lot.
It's very capable on some things, but it's not getting points for doing things way better than people.
It only gets the points if it's at least at the level of people; it's not getting extra bonus points.
So it can't rack those up.
The fact that it's so knowledgeable about so many different subjects doesn't particularly help it.
It's just: is it more knowledgeable about things in total than an average well-educated person?
And the answer is yes.
So it gets all those points, but it still lacks many basic bits of cognitive machinery.
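A minimal sketch of the aggregation idea behind those percentages, assuming the scoring works roughly as described here — each axis is scored against a well-educated adult, capped so superhuman performance earns no bonus, then averaged; the per-axis numbers are invented placeholders, not the paper's scores.

```python
# Ten CHC-style axes, as rattled off above, with invented placeholder scores
# for a model that is superhuman on knowledge but has no continual learning.
example_scores = {
    "visual": 60, "auditory": 50, "reasoning": 70, "working_memory": 55,
    "long_term_storage": 0, "retrieval": 10, "general_knowledge": 150,
    "math_knowledge": 140, "reading_writing": 90, "speed": 60,
}

def agi_score(per_axis):
    """Average of per-axis scores, each capped at the human level (100)."""
    capped = [min(score, 100) for score in per_axis.values()]
    return sum(capped) / len(capped)

print(f"overall score ≈ {agi_score(example_scores):.0f}%")   # ≈ 60%
```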
And so the critics who point out, look, it can't do this basic thing, it can't do that basic thing — it's not just cherry-picking.
It's actually the case that they've got a lot of these issues.
There's a lot to do, but for all that, I don't think it means that AGI is several decades away as a consequence.
I think if we look at the remaining components and look at the trend lines for those, it looks like they will require quite a bit more work.
But I don't see why it would necessarily take decades.
I could easily see all of these being resolved by the end of the decade.
But it needs that breakthrough on retrieval, correct?
So it needs long-term...
Or on long-term memory storage — consolidating those things, consolidating its experiences into its mind.
We don't have the equivalent, for instance, in AIs of dreaming.
That's like a third of our day — dreaming — and they have nothing like that; that's where we learn from the day.
So since they don't have that functionality, or anything analogous to it at all, I think that's a big chunk of what's missing.
So this comes back to the post-training, pre-training thing.
Basically, when a model is built, its training weights — which are a little bit like the weights of the neurons in our brains when we learn — are more or less set down.
The model doesn't fundamentally change over time through different experiences, correct?
Whereas for a human being, as we have experiences, our neurons actually rewire based on those new experiences, and that's how we learn things.
I mean, the dreaming thing is fascinating, because that's when it all sort of gets bedded down, as our brains kind of shut down.
That is really interesting.
OK, so do we then need a fundamentally different architecture from LLMs in order to make this memory storage and retrieval breakthrough?
I would guess not.
It's also very possible — a lot of the AI companies are thinking that maybe they'll be able to get it in the next year or so.
They're acting in some more bullish ways.
They all have concerted projects for it.
It's always harder to tell, though, just because we don't have it yet.
But personally, I would guess that you don't need something that isn't deep learning.
I would still expect it to be a deep learning system, and I would guess it probably can still be a transformer of some sort.
The main issue is taking those experiences, taking the gist of them, and learning from them without destroying the rest of its knowledge.
That's kind of the issue.
And doing that efficiently enough so that it can learn just from a few experiences, get the gist of them, and update itself accordingly.
So those are some challenges — making sure it doesn't forget lots of the other stuff that it used to know.
Obviously people forget some of that, but...
The greatest ever line from The Simpsons — Homer Simpson says, you know, Marge, I can't learn new things.
Remember when I learned how to use a computer?
I forgot how to drive.
Yeah, we don't want catastrophic forgetting when it is doing that learning, which is a substantial challenge.
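A toy sketch of what catastrophic forgetting looks like, and of one common mitigation (replaying old data); a single scalar model obviously isn't how real continual-learning research works — it just shows why naive sequential updates overwrite earlier knowledge.

```python
def train(w, data, steps=200, lr=0.05, replay=None):
    """Fit y ≈ w * x by gradient descent on squared error."""
    for _ in range(steps):
        batch = data + (replay or [])
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w

task_a = [(1.0, 2.0), (2.0, 4.0)]    # "old knowledge": y = 2x
task_b = [(1.0, 5.0), (2.0, 10.0)]   # "new experience": y = 5x

w = train(0.0, task_a)
print("after task A, w ≈", round(w, 2))                       # ≈ 2: it knows task A

w_naive = train(w, task_b)
print("naive sequential training, w ≈", round(w_naive, 2))    # ≈ 5: task A forgotten

w_replay = train(2.0, task_b, replay=task_a)
print("with replay of old data, w ≈", round(w_replay, 2))     # ≈ 3.5: a compromise
```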
Okay, all right, now we've got a little bit of time left.
I'll finish up in a moment, but I just want to get your thoughts on — you've talked about recursion.
So that's basically, you know, AIs making better AIs, and as you said, that's the goal that all the AI labs fundamentally are going for.
Does that mean once we get to AGI, we automatically, quickly get to ASI beyond that?
Because once you've got it — as you said yourself, you can copy this one smart AI, make 100,000 copies of it — suddenly you automate all of this and it goes very, very quickly.
Is that a sort of built-in assumption that you have?
So AGI itself doesn't necessarily do that.
You would need AGI plus, specifically, world-class AI research skills.
The definition of AGI that we were using was just the cognitive versatility and proficiency of a well-educated adult, not of a world-class AI researcher.
So it may lack some specific economic skills, but if it can learn those, then you basically have this recursion become possible.
And then that creates so much geopolitical uncertainty and instability that this will be a major global conversation, far beyond what it is now.
I should say that we'll have a paper out fairly soon as well, which gets at not just whether it has the cognitive versatility and proficiency of a well-educated adult, but instead whether it has a collection of all these additional economic skills.
So we have a paper that'll be out soon called the Remote Labour Index — it should be available by the time this podcast is public — where we're directly measuring the AI's automation rate across various different economic tasks.
And we're currently finding it's able to automate something like 2.5% or so of the tasks.
So in terms of economic traction, it's still not there, but, you know, we'll see how that evolves over time.
So the main objective of that is: let's just keep track of what the actual automation rate is when we're trying to see what the economic impacts and overall usefulness of these models are.
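As a sketch of how such an automation rate boils down to a simple tally — the tasks and outcomes below are made up for illustration, not items from the Remote Labour Index.

```python
# Hypothetical task outcomes: True if the AI's deliverable was acceptable.
task_results = {
    "format a spreadsheet": True,
    "edit a short podcast audio file": False,
    "design a product logo": False,
    "write a data-cleaning script": True,
    # ... a real index would cover many more tasks
}

automation_rate = sum(task_results.values()) / len(task_results)
print(f"automation rate: {automation_rate:.1%}")   # 50.0% on this toy sample
```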
Great.
We'll look out for that.
OK, one last one.
I know you've got to go, but I'm just interested in your take on the degree to which we're very much in uncharted space here.
We look for analogies.
I talked about this on the podcast that we did last week, where we touched on this ASI — oh, sorry, this superintelligence statement.
But, you know, we instinctively look to history for analogies and for comparisons when we're in these sorts of discussions.
And the nuclear one is a sort of natural one because of, I suppose, the geopolitical arrangement — in this case we have two superpowers, a little bit like we had then, and we've got an enormously powerful technology that is still being explored and discovered.
There's an obvious race around it.
But, I mean, you said yourself earlier in the conversation that nuclear is not an adequate analogy, because, I mean, one, nuclear bombs can't make better nuclear bombs.
We don't have that.
They can't self multiply.
Yeah, that's right.
That's right.
But also, I mean, if you're talking about deterrence and preventive action — you know, when the US briefly had a monopoly on nuclear weapons, it could have bombed the crap out of the Soviet Union and ruined its chances of catching up.
And as you've noted yourself, some, you know, surprising pacifists, including the philosopher Bertrand Russell, actually advocated for that, because they thought it was the lesser of two evils as opposed to getting into a nuclear arms race that would go on forever and could potentially destroy everybody.
It's different for superintelligence, because presumably you could non-kinetically disable somebody else's ability to catch up with you.
You could do it through the cyber means that you were talking about before.
The moral and political cost would therefore be much, much lower.
And therefore the geopolitical instability is possibly even greater than it was around the nuclear arms race, or the standoff there.
So just reflect for me a little bit, as a way of closing, on what it's like working in this completely uncharted space, at a time when we're really looking at the creation of a technology that will change just about everything.
I mean our entire understanding of life on Earth and intelligence.
Yeah, so I think at a high level, this is the largest or most salient period in human history — the development of AI.
I mean, it could even be beyond, you know, humans in some sense.
Like, this would be one of those events where, if you zoom out, there's the transition from unicellular life to multicellular life — that was a very big event.
And the emergence from having just biological life to digital life is also a very big event on a cosmic scale.
Now, for analogies, I think it's useful to think about AI not just as being analogous to nuclear weapons, but instead to analogize it to potentially catastrophic dual-use technologies.
So that's bio, that's cyber, that's chem.
And what properties are in those intersections?
Well, we had agreements in all of those — international agreements in all of those.
We wanted to keep those out of the hands of rogue actors.
For all of those, we in part deter particular usage — for bio, chem and nuclear to varying extents.
And we also use them for economic competitiveness — well, I guess less so for nuclear, it really depends; sometimes there are nuclear power plants — but for bio and chem, we also use those to supercharge our economies.
So there are some use cases we swear off, and there are ways in which we use these dual-use technologies.
So I think that's the most productive viewpoint: analogizing it to that.
It has lots of benefits, it has lots of risks — what is shared among the cases of nuclear, chem and bio?
Another class of analogies I find very useful is to think of AI systems as complex systems — not mechanical systems, but things more analogous to complex systems — and to ask: what is the set of analogies that are useful for analysing a complex system?
So it's a collection of a lot of things to look for — a lot of failure modes that tend to happen.
And so I think those are probably the two most productive analogies.
But making it very specific and not abstracted — if you, you know, try to analogize it to global warming and ask what's the equivalent of the ozone layer — tends not to be very productive.
But when you do a little bit of abstraction, or try to find what patterns hold in multiple different settings, then I think it becomes a lot easier to think about these issues.
Fantastic.
All right, look, I will let all of that sink into our audience's heads.
But look, Dan, I've been wanting to do this for a while.
I'm very, very grateful you've made the time for us.
I know how busy you are.
You've got many roles and many hats that you wear.
Keep up the great research, keep putting out the great papers, and we'll put some of those in our show notes.
Dan Hendrycks, it's been a real pleasure.
I enjoyed it.
Thanks for your time.
Yeah.
Thank you for having me.
This is a good set of questions.
Bye.
Thanks for listening, folks.
We're gonna be back later this week with another episode of STOP THE WORLD.
