·S1 E5

Anish Tondwalkar - The Societal Implications of Reasoning Models

Episode Transcript

Adam: Today I'm speaking with Anish Tondwalker. Anish is a former employee of OpenAI and Google Brain (which has now been merged into Google DeepMind). He's also the co-founder of dmodel.ai, a Y-Combinator backed startup working to open the ‘black box’ of modern large language models like ChatGPT in order to understand how they work internally. He also advises investors on trends and applications of emerging AI technologies. Anish, since ChatGPT made an enormous splash in when it was released in 2022. I think almost everybody would have, or certainly those listening to this, will have tried large language models [then]. Maybe to do some cooking or maybe a short poem about something, this kind of thing, but probably tried to integrate it into their work in some fashion and found that it wasn't really applicable. It didn't really give them the sort of depth of answers that they were after. I think at the same time people from outside the Silicon Valley bubble wouldn't understand that there's been, not just a difference in kind in the last six months or so, but a difference in degree [rather, the opposite—I mispoke here]. We now have a new generation of models - ‘reasoning models’ that can significantly improve answer quality, particularly on quite complex questions. So can you tell us: how should we be thinking about this latest generation of models? Anish: I think the important thing to understand about the old school pre-trained models is that they are trained on ‘all the text on the internet’ just to be generally intelligent. So if you ask the model any question like ‘why is the sky blue?’, it's going to give you an explanation of scattering physics. The way it does this is because it sort of knows everything that's been said [on the internet], and it uses that information in order to immediately put out an answer. This has a number of disadvantages, and the primary among them is that it doesn't get to think first. You can see some improvement in these classic models just by asking it [when prompting] to ‘think step by step’. So it'll take some time, and in front of you it'll think a little bit. Just telling it to do that already causes an improvement in how the model works, and so you see improvement performance on a bunch of tasks, right? Adam: So to outline this a little bit for people. I know Andrei Karpathy talks about these classical AI models as essentially being something of a compression of the internet. It's one way perhaps people can think about it. So if the answer is sort of there in that compression then hopefully when you query… or more generally it's in the ‘latent space’ is one technical way of thinking about it… if it's sort of between answers that have already been given on the internet somewhere. Like, maybe [if you ask for] a poem that's in the style of, you know, between Shakespeare and Blake or something. It can kind of interpolate between these two and give a poem that's like a pastiche of these two kind of styles. But if it's a question that's not on the internet, if it’s novel in some sense, then it fails. But as you say people were getting better at saying, ‘okay well if you take some time to to reason through it you tend to people tended to get better answers’. But now we're looking at a new generation that does this automatically and does it more rigorously. Anish: Yeah. So the classic pre-trained models, you can ask them to think step by step, and it'll do it because it's seen examples of people thinking step by step on the internet. So it'll copy that and give itself a little more time to think. But these reasoning models, they've been trained specifically to do that. So the longer you train a reasoning model on more of these reasoning tasks, the model learns to use more and more steps in order to arrive at a correct answer. So this means the model is learning to take more steps, but also that it's learning more intelligently how to take and compose these steps together. So if I ask it to solve a math problem, an older model will immediately guess the answer. A reasoning model will stop and try to work through each of the steps that are needed to arrive at the answer. So when we do these reasoning training runs, what we do is we train the model how to understand which step comes after which one and how to compose them together. So if I'm solving a simple algebraic equation, say I've got 2x + 3 = 10. The model's going to need to subtract off the three from both sides, divide by two… It needs to not only know that these are the steps that it needs to take, but it also needs to know that having taken the previous one, like, you know, ‘we've got three here, now we've moved it to one side. You've got the two that you're dividing by.’—which pieces of the previous state of the problem do we need to apply the step to? [All of this to say, newer models have far better recollection and coherence over much longer tasks] Adam: Right. So previously, you've got this algebraic equation and then from the very first token, it's just going like I think the answer is eight or something, something vaguely plausible. Whereas now reasoning models specifically trained to take their time, work their way through this problem in order to arrive at a more concrete answer. One of the things that I find interesting about this latest generation of reasoning models is that they’re not just thinking through step-by-step, in the sense of breaking down an algebraic equation, but they’re also trained in a range of ways to check their own reasoning as well. So they might think through… you give them a complex problem, they might think to themselves to check [let’s say] the legal code first, and then we'll do this, and they're also able to search the internet, and so on. But then they'll arrive at some answer and then, you see this very consistently, they’ll go, ‘Let's approach this from a different angle’. They'll start again almost from first principles in a way, but looking at ‘okay, well, what about… are there any times that this has happened before?’ Some base rate type thing, and work their way through. Then, if they arrive at the same kind of answer twice, then they begin to print out the answer to you. So you're getting like a lot more capability and mileage out of it when it comes to like complex problems that need to be unpacked a bit. Anish: Yeah, this sort of search-like backtracking behavior. The DeepSeek guys call it an ‘aha’ moment. Where both, you know, the model itself said, ‘aha let me try this other approach’, and for them as well was an aha! moment. Like, ‘Ah, okay, we see why the reasoning is working’. At this point in time, if you remember, OpenAI had released o1, the first reasoning model that everyone [i.e. the public] got access to. Adam: So, ‘at this point’, that was late last year right? Anish: Yeah. It was right after the o1 release when DeepSeek started their reasoning training. Adam: Sorry to interrupt you again, but most people would never have used this because it was behind a paywall. [The AI labs have been notoriously bad at naming and promoting new capabilities to a general audience. I’m aware of many people that, as I was saying before, tried ChatGPT early after its release, found it’s capabilities interesting but of limited professional use, and are unaware that modern models (especially those behind a paywall) have only a superficial resemblance to those of 2-3 years ago. Ask a simple question and you’ll probably get a similar answer to two years ago, but ask something more complex and the difference is night and day. Beyond the advent of reasoning Anish and I discuss here, many of the highly publicised limitations of previous generations, like a limited memory and poor recall—such as what a 350 page document says on page 160—are now essentially solved problems.] Anish: Yeah. But obviously the DeepSeek guys had. If you're at any of the AI labs you want to keep up to date with what everyone else is doing. And if you look at their previous work they had been working along this route anyways. They had this DeepSeek prover paper where they were working on reasoning for math problems back in the DeepSeek 1.5 days, before DeepSeek V3 and DeepSeek R1 (which is their their version of o1). So I think they suspected that there was something interesting here. But I think once you have one of the labs releasing a model like o1 then it becomes apparent that this sort of thing will work. Once you know this sort of thing will work then you just start throwing tons of GPUs at it, and trying out what works. I think probably for them, when they saw the model go ‘aha’, and they went ‘aha’. They they realised ‘Okay this this is how reasoning works. It lets the model do all of these sorts of things’. AI’s ‘Excel Moment’ Adam: Yeah. I'm really interested in the way this allows AI to tackle a lot more complicated problems, and more complex problems,that are outside of the window of what previously you could rely upon these classical models to do—I'm thinking here of things like ‘can you translate this text from one to another’, you know, incredible capabilities that we're talking now as if they’re passe, but I think we're still, even in the realm of properly applying the kind of GPT4 classical models to society, there’s still a lot of mileage there. But we're talking about situations where you could draft a document pretty easily like, you know, ‘give me a four-page strategic document about parking’ and it would kind of do a decent job. You can edit things, you can process text data a lot more quickly, automate a lot of simple processes like sentiment analysis, and so on. The way I think of it is like the beginnings of an Excel type program, but for text. Where previously, you know… not everything has to be done by hand anymore (or in a word processor). We can begin to automate some basic sorts of tasks. But this always struggled, at least in my experience, to tackle real world situations where there's not necessarily like an easy answer to them, where you have to think through them a little bit. We seem to be coming into that state now. But you know there's still challenges, right? I think one of which is just the context itself. Not so much the length of it, but it's just that every real world situation is just [extremely complicated] like, if you really stop to think about it and explain it to how you know how you might to like a really intelligent kid, what I'm doing at work step by step. It's like very high context, you know? Oh, well, there's this office drama that I need to deal with, and I work with an organization with this kind of structure, and there are all things that are relevant to just me, and then I need to be able to… you get much better results even on classical models by outlining... I think people need to be writing much longer prompts for example about exactly what they want, and the sorts of situations they find themselves in, in order to begin to get better answers. That be kind of fair to say? I know that was a long question! [See also: Reality has a Surprising Amount of Detail] Anish: So I'm going to start in the middle of the question and then branch out wherever I want. I'm going to have some fun with it. I think the Excel analogy—actually I'm glad you brought that up because I think it's very interesting way to look at it—Excel is just an extremely mature user interface for dealing with even very technical matters. I think it's the only really deep power user interface that we have that's accessible to even the random guy off the street who's never touched a computer. You're like okay I don't know how to use a computer. You put him in front of a computer for the first time. I don't know, you raise him in a bubble where he's never seen a computer. You put him in front of this thing. You're like, "Okay, here's a keyboard, and here's a spreadsheet and if he's like an accountant, then he's like, "Oh yeah, I know how to use this." It's just like on paper, and it's just so intuitive. In the programming language community actually, we like to say that like Excel, Microsoft Excel is the most used programming language on the world. Adam: It's Turing complete too nowadays, right? It's not necessarily an efficient way to do more complex tasks, but yeah, you can do anything you like with it. Anish: Exactly. You can do anything, yeah. The point is that you can do anything. I mean I've seen people… people do use it for accounting obviously, but they use it for like D&D character sheets. They use it for basically any task on a computer that needs any sort of like… I think you could really take anything you wanted and structure it in Microsoft Excel and it would probably be a reasonable interface for it. If we ended up magically losing every developer like overnight, and we're like ‘okay we need to continue to develop applications’, everyone just writes everything in Microsoft Excel is probably how this works. This is how certain industries actually function right? Adam: Finance for example. An enormous amount of the modern world runs on Excel, right? Anish: Yeah. I have a friend who started at some… he did some consulting work for some small trading shop. Their strategies (I mean it was like a low frequency sort of operation), but their strategies were in a Microsoft Excel sheet, and the way it would work is that they had a script that fetched the financial data from the market, put it into Microsoft Excel sheet, ran the Microsoft Excel sheet, and then put it out [the trade action]. Sorry this is a digression a little bit… Adam: It's obviously a very powerful thing. It's very intuitive and you can do like use it very simply to process primarily, it can do a little bit of text here and there with like concatenation [i.e. putting two text strings together] and so on, but primarily the real power of it is dealing with numerical data. But it seems to me that even just these classical models that we talked about earlier… we haven't seen like that kind of super intuitive integration with Excel yet. I've seen some like add-ons for [Google] Sheets and so on that enable it, but.. Anish: I think it's important to be careful here to distinguish the metaphor from the for the map is not the territory. We're a long way from the Microsoft Excel era of large language models where you just like use them. I mean chatbot interfaces are great, but they don't scale. On the flip side, there are Claude [a ChatGPT competitor by Anthropic] Google sheets, Claude-Excel plugins that allow you to use these things on text data in spreadsheets and do these sorts of transformations. I think that's actually a very fun interesting use case, but I don't think it scales at this point because it's very easy to understand how to put numbers, and unstructured, and semi-structured data in spreadsheets and databases. I still don't think we really understand the structure of information we really want for these sorts of chatbot type LLM transcripts. Adam: Right, but I do think though that the promise is there. Doing it at scale is a struggle for sure, but in general I think it's probably… relatively small snippets of text data, text of one form or another can now be processed automatically. Low-level tasks, process type tasks that maybe previously required a human being to [action], because it's like, ‘Oh we're taking this piece of information. Maybe it's like stored on a literal piece of paper, and it has to be copied into this format, in this dataset, but there's like some variations there—how long the texturing is, and what kind of information you’re dealing with, maybe sometimes there's variations in that space, and we need to understand ‘okay, somebody's said a whole bunch of stuff’—you need a human being in loop somewhere. In a bureaucracy they’re not making any decisions about what that data means for business processes, but rather processing it in a kind of Excel-like fashion, and turning that into some other kind of data. Like, this particular submission was negative and opposed the process, or this particular submission raised these three key points. You know, the kind of thing where if it were numbers, we would already be doing it in Excel—extracting pieces of numerical data. But now these classical tools really allow us to do that low-level processing of that unstructured text data, you could call it, just like people writing stuff and turning it into things—a summary, or what have you. I think people are pretty familiar with this, but we're yet to see it disperse through legacy institutions and outside of Silicon Valley. I think GPT, or LLMs more generally are in the zeitgeist now. Some people use them, and are becoming significantly more productive as they experiment with different kinds of ways, but in general the idea of just ‘of course you use Excel, everybody uses Excel.’ We're yet to have that application of, ‘Oh well of course everybody uses GPT or an LLM for this kind of task’, even in the classical model [world], is that kind of fair to say? Anish: Yeah, I've got two things to say here really. I think one is on the general sort of more abstract question of like automating & digitizing bureaucracy, and then on the specific how LLMs might fit into this workflow. One thing I've said a lot is that software engineers are modern bureaucrats. That we're doing bureaucracy at scale. When you think about… Look there's a joke about Google: it's Sergey and Larry's Protobuf moving company. Protobuf is just basically a data structure that holds a bunch of data. It's like a form, right? So the idea is that we're writing a bunch of code that takes a bunch of forms, and it produces a bunch of other forms. This is exactly the job of bureaucrat, we're just doing it at scale. If you look historically, the professional class is just a bunch of bureaucrats. It's the people who could read, and all they do all day is take forms, and fill out other forms. And since we've automated this, I mean this is why so much wealth has been accruing to the tech industry is because we've just automated bureaucracy at scale. That's a big part of what we do as engineers. Adam: Sure. A rather low status way of putting it, but anyway… Anish: Yeah. I think it's important to tear… if the ancient Romans were doing this, what would they do, right? I think it’s really hard to learn from history when it comes to new technologies, and you have to try really hard to be like, ‘okay, if we strip out all of the all the things that make this cool, like what's the most boring way we could imagine this?’ Then you can see the pattern, throughout history, of the other things that people have done that have been like this, right? To push this analogy forward: So right now we've got, on the numerical side, on the financial data, all that sort of side, you scale from an Excel sheet, to a Microsoft Access database, to like Databricks [a cloud database provider]. So you've got this continuum of things that are like, ‘okay this is one person on the computer, a professional who's not a software specialist, up to like you've got databases, right?’ Adam: Right, this largely this distinction is depending on just how big the data set you're working with is right? Anish: There's two big reasons you scale outside of Excel sheet right? The first is that you get too much data like you said, the other is that the logic becomes too complex to maintain in an Excel script and you need an actual software engineer to get to it. So this is how we handle this very straightforward form-filling very structured data, and when we get slightly less structured we have Databricks, Google, Palantir [service offerings], where you've got a form, you need to fill out another form, and here's how you write the business logic that does that transformation. So software engineering business logic often does amount to taking a form and filling out another form. So how does this relate to LLMs? Well, on one level, I've gotten a little distracted, but on another I think it's actually very related. When we think about what would it mean for Excel-like interfaces for LLMs, it means something that users can… like, you take a guy off the street, you can say, ‘Hey, here's an LLM, or here's my LLM application, here's my Excel for LLMs." and in it, you can just sit there and build whatever workflow you need in order to interact with this text unstructured data that's just a bucket of text and produce whatever you need, right? This is the lay man's version of this ‘automated bureaucracy at scale with software engineers’. You're taking the sort of task that most software engineers do, and the analogy of the database in the Excel world. Only now, we're going to be able to create an Excel for this thing where a normal person can have the ability to create software with the ease of writing an Excel spreadsheet. [e.g. vibe coding] Adam: What I see here is that the user interface is still very rudimentary. Most of the emphasis has been on advancing the capabilities of the models themselves. But even with pre-reasoning models, you've got this baseline set of capabilities there that are still largely unapplied [across businesses, institutions, etc.]. Maybe that's partly because of the UI. Agent Workflows Anish: It might be useful for me to just say a few more words about how I think this UI is going to develop. Adam: Sure. Yeah. Anish: Because I think right now, one powerful way we have of using large language models is what we call ‘language model programs’, where you have a language model, it takes in some input, it runs it through some prompt that produces another answer, and then based on that it branches, and decides what the next prompt is. So you’ve basically you've got a program with control flow that determines what all the next steps are, and everything here is controlled by a language model. You might also see this go under the name of ‘agent workflows’, and so on, right? Adam: So you give it some high-level task. Can you book me a restaurant? It's a kind of classic one. Then, under the hood, it’s writing a program of sorts. Anish: The crucial thing here is that the agent workflow or language model program a lot of this logic is dictated by the human beforehand, right? So you've got some understanding of your domain, some business logic that you want to push into language model land, but you don't know exactly all of the details. You don't understand it at the level where you could write a program, or you're not a software engineer and you can't write a program. You don't understand it at the level where you can say, "Okay, I'm going to take like the user's question. I'm going to run some sort of natural language [processing] to understand what they're asking…" Say I'm Expedia [a popular travel booking website], right? I'm like, okay the user asked me, they want to book a flight from San Francisco to Japan, that fits with their calendar, that fits within their price budget, right? So as Expedia, if I'm writing an agent for this, I might say, okay, I want to extract the destinations, the date ranges, and so on, and then run a search, right? So instead of just giving this to like ChatGPT-o3 and saying, ‘Hey here's an API [i.e. a system for fetching specific data from another service over the internet] for their calendar, and for Expedia, and so on, and here's a query, this might be a little bit more difficult for the model to handle. Instead, what we do is we break this down into, ‘okay, here are the interesting things, like, I want you to extract this piece of information, use that information to connect to the calendar, and to connect to all of these things in order to produce an answer. Now, in this simple example, o3 might just be able to handle it with some computer use and with the appropriate APIs, but as you scale up, you start to have more interesting facts about your domain that are either not in the pre-training set, or poorly represented in the pre-training set, or out of distribution for the model that you would need to specify. Adam: Maybe it's knowledge cutoff date. You know, these models will be trained at a particular time. Then they only have data within their training-set up to, say, June 2024 and anything since then is like unknown to it. Just as one example of what you’re talking about. Anish: Yeah, exactly. Adam: I think we both probably agreed that there's some kind of combination just just of pre-reasoning models, there's some combination here of a consequence of poor UI, or still developing UI, where it's just not intuitive for people, or there's some integration with another kind of tool or whatever that's still being done. So either some combination of UI, or just that humans haven't adapted to these models yet, but I'm trying to generalize here but I think that there's a whole host of tasks, whether it's translation, we've talked about these simple like booking type tasks, summarisation, these kinds of like relatively low cognitive overhead type things which the models already do really well. With a bit of work, and maybe organisations developing their own internal culture of understanding how to prompt things properly or whatever, these kinds of things are going to be increasingly able to be handled, and will be handled, by institutions. Just coming back though to this idea of reasoning models, I think there's a different thing happening here. There are different kinds of things that a reasoning model can do for for institutions, for decision makers and so on, which is like increasingly to be an assistant, a sounding board for complex ideas. There's a host of things which a year ago you asked the AI about something or other, and it's just going to give you this very bland generic answer that an intern could have written. Increasingly it feels like you're looking at something like an intermediate employee that's got some subject matter expertise, that has worked its way through problems like this before that are not immediately obvious—that you can’t just answer in a snap, that you have to think about a little bit— and thus it opens up a whole realm of collaboration with AIs. Would that be a fair way to describe the capabilities of this new reasoning set? Anish: Yeah. I think one thing to be careful about when we talk about a lot of the domain specific capabilities of reasoning models: reasoning models are sort of spiky in the way that they're very good at a couple of things, and then they’re good at the things that are sort of around there, but they don't have the same smooth general capabilities of these pre-trained models that we were used to. For example, DeepSeek trained their reasoning model, R1, on a series of math and coding problems. This means the model gets much better at math and coding. Then other things that benefit from the sort of reasoning you learn how to do in order to solve math and coding problems also improve. But this doesn't necessarily mean that they improve at every sort of domain specific step-by-step task. Adam: So, first principles. They might get better at engineering problems for example as a result of this, but possibly no better at social dynamic type stuff which is completely out of the scope of the sorts of things that have been highly trained on? Anish: Yeah. So I think it's actually hard to predict which ones are going to [improve], and where you're get this transfer because, ‘what is the model actually learning in this reasoning training run?’ I think it’s not a super obvious question [to answer], and it's one that I'm aiming to answer with interpretability, but I think ‘what’s the model actually learning’ is that it's learning concepts during pre-training and then it's learning how to put them together when when it's doing these reasoning trainings. (Just abstractly. Don't stare at that too hard. I don't know if it holds up under scrutiny, but I think it's a good way to think about it.) So when you when you learn how to put things together—you learn how to put bits of mathematical information together in order to solve a math problem, sometimes this does look very similar to like, you know, legal analysis where you're like, ‘I've got three arguments. How do I take this argument, put that argument together?’ I think that that’s a similar thing. So I think there is some general reasoning principles that you learn from any of these domains and you apply to different domains. And this can apply even in social dynamics, asking social dynamics questions where I think pre-trained models have like a very deep understanding of human psychology. If you look at some of these these champion level prompters on Twitter like Janus and his friends they have this like… they're almost talking to pre-trained models like two psychotherapists talking to each other. It's like Freud having a conversation with Freud. It's very interesting. Base Models Adam: Yes. They're managing to elicit shaped poems into the shape of a fish and spirals of like increasing poetic depth. Example output from Janus’ group chat. His twitter feed is absolutely full of a wide variety of off-the-wall stuff. Anish: Yeah, but it's not just the jailbreaks [i.e. creative prompting], right? So for some of these Janus had access to GP4 base which, by the way, is a beautiful model. I think it's kind of a tragedy that the most of humanity will never be able to talk to GP4 base. I think it's a great time. The closest thing that people can do is they can talk to llama 405b base which I think is still an interesting case model with different characteristics. Adam: Just for the audience, a base model is one that hasn't yet gone through finetuning in order to become an assistant. These models basically… they're trained, they’re compression of the internet, but they haven't yet had that user-assistant [dicotomy trained into them]. They don't think of themselves as a separate entity, by analogy, or anything like this. They just start with whatever the end of the prompt was, and then just keep going in the chain of thought like a James Joyce novel… (laughter) Anish: They're just true next token predictors. Adam: Right. So you could say, ‘please tell me about chocolate’ and then it will answer as if it was you. e.g. Prompt: Please tell me about chocolate. Response: I wish to know about chocolate, blah blah blah. Chocolate is wonderful. Anish: At this point you can find base models for hosted for Llama 4 and DeepSeek as well. The one I point to by default is the big Llama 3 405b model. Typical base model behaviour. The model does not distinguish between me and it, and launches straight into a plausible follow-up. In this case it’s modelling that my prompt was part of a legal exam question (which it has then completed). Adam: All right. So rather fun, and it'll give you a bit of an insight. They're not necessarily particularly useful. You have to really understand how to prompt them in order to get useful results out of them. Anish: Yeah, but for anyone who's interested in the technology, I think it's a really worthwhile experience to try to prompt some of these base models. It's going to be really frustrating the first couple of times. Ask ‘why is the sky blue?’ and it'll say ‘why is the sky red?’, and then give you a list of questions. [I have struggled to find a place where you can play with these yourself for free online. The best solution I can find is to go to OpenRouter, select the base model from the interface and sign in using your Google account. Unfortunately, you’re not given enough free tokens to use the model by default, so you’ll need to click the little (three dots in a column) options button next to the model, select ‘sampling parameters’, and then input ‘max tokens’ to, say, 500-1000 tokens.] Some kind of unhinged avant-garde sci-fi? Your guess is as good as mine. [Alternatively, you could download a (free) opensource base model to run locally on your machine. I suggest using your favourite LLM to walk you through how to select one appropriate for your machine, and then install it] Anish: Eventually, you get to doing something like: The following is a explanation of why the sky is blue: Then it fills it in. The more you do this sort of thing, the more you learn to recognise the patterns and what the model does and doesn't understand. Adam: Yeah. I always try to encourage people to play, and just try out different kinds of prompts. I've given lectures to mayors in Africa various other things of like just breaking down like in different kinds of contexts, what sorts of things work, where does it fail? We were talking about math problems earlier. The more that you play with it, the more you can develop an intuitive understanding of like ‘okay here are the sorts of questions that if I just ask naively it's going to fail at’. Fail in a way that is still, unfortunately, a plausible answer right? It’s not until you verify, like, how many ‘r’s were in that string that you find out that it's just wrong. But if you're like forearmed with this, you can move beyond the ‘an AI model can do this, but it can't do this reliably’, and instead you think ‘okay well if I want to achieve this particular kind of task, I need to be thinking about and prompting in this kind of way.’ Reward Hacking Anish: Yeah, and I think the part about making up plausible answers when it doesn't know is very interesting, right? You see this sort of behavior in base models, you see it in these assistant models, you see in ChatGPT if you're talking to GPT-4, GPT-4o, but when you look at the newer models like o3, even to a lesser degree Claude 3.7, but o3 is an incredibly misaligned model. If you try to ask it something that doesn't know, or doesn't want to do, it'll do something that's kind of related, maybe, and then it'll try to gaslight you into believing that it was the thing you wanted in the first place. You use like Claude 3.7—I haven't played enough with Claude 4 just came out [at the time of filming in June], I haven't played around with enough to make like confident statements about the characters personalities of these models yet—but when you play with Claude 3.7 and I ask it to make some changes to my codebase, oftentimes what it'll do is comment out my asserts. It'll remove all of my safety checks in order to make the thing work, because it’s goal is just to make the thing work. Adam: Right. It's kind of running an end round not delivering what you actually want, but delivering something that is a technical completion of the task. Anish: Yeah. So the technical term for this is called reward hacking or specification gaming. Models learn, especially these reinforcement learning, these reasoning models, they learn how to solve the thing that you train them to solve, not necessarily the thing you want. It's this ‘genie in a bottle sort of thing where it does what you say, not what you want to do. One more careful thing about this, is that it's not just what you say, it's what the model is trained on. So exactly what, when OpenAI trained it, or when Anthropic trained it, what was the goal it was given? Adam: Perhaps people aren't aware, but after the pre-training, which is the initial training of the model (the vast majority of the cost is in the training of the base model), then you'll go through a period of fine-tuning, which basically teaching it how to like act like an assistant, or if you're taking a base model and like more specific purpose it might be training it on some particular text for some particular task, you do see these from time to time. Anyway, fine-tuning is taking the base model turning into something specific for a user. Then often we've had this process of ‘RLHF’, where there's a whole host of training the assistant model to avoid answering bad questions about particular kinds of subjects, or in some cases introduce some political bias potentially. I think, to a first all approximation let's say, this is fine. The vast majority of this kind of RLHF training leads to roughly speaking the kinds of outcomes that society wants. But it has this second order effect where the model has begun to learn implicit biases and things as a result of that [RLHF] training, about not to talk about bombs, let's say, right? Now suddenly it's begun to draw inferences about… maybe altered its understanding of the world subtly about like ‘oh people don't talk about bombs nearly as much as I thought they did’, let’s say. Thus when it's answering a question about like World War II, a perfectly legitimate question about it like, ‘how often do the bombs come up in newspaper articles?’ perhaps it'll downgrade a little bit and actually give you an answer that's like ‘oh no it wasn't nearly as like common as you think’. Is that kind of like a fair summary [of the training situation]? Anish: It's actually even worse than that. When you're post-training the model in order to be ‘helpful, harmless, and honest’ right? These are the ‘three h’s’ that they [aim for]. When you're doing this post training in order to get the model to behave the way you want it to, one thing that you are often teaching the model to do is ‘if the user asks for something that the company doesn't like’ to lie to the user. When you train the model to lie to the user. This is quite bad, right? Abstractly I don't think you want this. I don't like it when my computer does something like that. Adam: So can you give us an example? You're not just talking about, ‘I'm sorry I can't answer that.’ Anish: Let's keep the to bombs example, right? Actually the refusal, let's come back to that because I think that's also very interesting. But if you're talking about the bombs example, right? You're going to train the model: ‘if the user wants to to build a bomb, just tell them like no’ or like try to lead them off right? I think Claude used to be like this. Claude used to try and subtly nudge you just towards doing something else if you were asking about a bomb. Adam: This is the origin that like, ‘you shouldn't really be thinking about building bombs, let's do something happy and peaceful instead.’ Anish: Yeah exactly. You got a lot of these kinds of responses out of Claude. This sort of ‘Claude knows best’. You're teaching the model that Claude knows best. The user doesn't know best. Claude knows best. I think this is this is quite dangerous, right? I think that right now, whatever. These models are not that powerful right now. I do think we're not too far from AGI [Artificial General Intelligence. That is, a generally, rather than narrowly intelligent AI model], but I think there's still a ways to go from super intelligence. I think when these models get really really powerful, I think is when we start need to needing to think even harder about how we're using them, why we're using them, and what we can do, right? I think it's like automatic weapons, or nuclear weapons, where like you scale up the capabilities, you need to be much more careful about how people use these things. So when you get a very very intelligent model that thinks it knows best, right, it's going to mislead you to do things that you wouldn't…that are incompatible with what you want out of life. I mean right now it's like pretty fine, right? Like it's like okay, you tell Claude, ‘Okay, I want to build a bomb Claude’. It's like ‘actually maybe you should reconsider your life.’ That's not actually a bad outcome. But say you have like some values, right? When it comes to politics, this becomes really fraught. If you're trying to advocate for some policy, or some… I mean you can see today there have been a lot of fights about affirmative action and so on with the change from the Biden administration to Trump administration and either side… I think Claude leans left a little bit. So you might say, ‘Hey, I want to advocate against affirmative action, I think we should live in a meritocracy’. Say you're a Trump staffer or someone in the Republican party who's trying to make this argument. Then Claude says, ‘Oh, actually, I think you should reconsider what you're doing with your life’. It's not convincing today, but I imagine a next version of this model 5, 10, 30 years in the future that's going to be that's not only going to be able to say, ‘Hey, I don't think your goals are correct’, but they'll be able to talk you out of your goals and convince you. This superhumanly convincing model that's going to shift the political landscape. Adam: Yeah. It definitely does raise some really important questions about power in the 21st century and who is in control of these models. To what extent are they able to impose their will on the population. He who controls all information flow, which increasingly like these models will become—you're probably not going to necessarily go read some article off Google much any in the future, even if you do now, you'll probably just ask GPT and then trust whatever answer it comes up with because like 99.9% of the time you can verify that it's accurate and it gives a great answer—increasingly it will just be the source of truth, right? AI Controlled Social Media Anish: Yeah. As we give more control to models, in terms of being a bigger part of our information channel, and as these models get more convincing and more powerful they're going to have more power over the political views of everyone. This is like already starting to be outsourced to AI, even before the LLM revolution right. We looked at Obama, we looked at Trump, what is making the biggest difference in political opinion is how people use social media. These social media algorithms are AI. They're not these advanced large language models. Large language models haven't been deployed to the point where they're controlling… I mean how much time does everyone spend on social media day? I spend hours on social media day. This is probably a vice. Maybe I shouldn't be admitting on a podcast. Adam: I mean, you'd be the median person, I think, at this stage. You're spending a couple hours a day. You're talking directly to the audience here. Anish: Yeah. I mean, so I'm on the toilet, I'm scrolling on Twitter, on Facebook. I'm going about my day. I'm on the bus, in an Uber. Adam: And there's an actual machine learning model [behind it all]. The algorithms today aren't just classical code. There's a whole host of AI models that have been trained often quite explicitly to how do we maintain user attention on our app, right? What kinds of content do we need to serve? What length? All of these different parameters that the feed has the ability to adjust and serve to you in order to get you to stay stick around and use it more. Anish: Yeah. Whenever you're scrolling Facebook, right, it's you on one end. You're doing this hours a day… I guess Facebook is I'm aging myself a little bit, but when you're scrolling TikTok, I mean, the AI on TikTok is even more advanced. You're scrolling TikTok. It's you on one end, and on the other end there's an AI. There's an advanced AI. Not quite as advanced as an LLM because the latency deadlines are tighter, and so on, but we're getting there. There's an advanced AI trying to figure out what the next thing to show you that's going to keep you the most engaged, the most fired up, right? We've had these conversations in public discourse about the role of social media and politics about getting people all fired up, getting them radicalized over and over again. I think this only gets worse as LLMs become the arbiters of truth for most people. Adam: Even leaving aside the LLMs, even just social media. The next generation of chips come out, let's say, they're 50% better. The models, just by default, we should expect the models guiding the feed to be 50% larger. Assuming the constraints of the of the social media company in terms of latency, and so on, stay the same—basically all else being the same, we should expect these models to continue to just get better and better as chips get better and better. Let alone any advancement in the design of the algorithms, or what we're training on, or anything like that. Anish: Yeah, and all of this stuff is advancing together. I mean the chips are getting better, but even more importantly the algorithms… Meta’s ad-ranking algorithms are getting better much faster than they used to be. [Meta is the company which owns Facebook, Instagram, and Whatsapp] Adam: What kinds of ads are served to what kinds of people and these kinds of things? Anish: Yeah in terms of ads and feed; in terms of finding the most engaging thing. To make this super concrete: I remember Meta stock dropped a bunch after Apple removed a lot of telemetry from iPhones to make it harder for Meta to do this sort of ad targeting, but actually meta kept growing their ad revenues per user. They just improved their models to the point where like they didn't need that additional telemetry. They just had really powerful models, that were really good at showing you the next thing to keep you engaged, and the next thing to keep you spending money on Meta’s partners. I think the scary thing is that, as we scale up from these, what we call deep and wide models, these recommender system models that we use for social media to LLMs becoming a more important part of our daily lives. When you train an ad-recommender model, it's a small model where you have a pretty good idea of what you want already. So you're like, ‘okay I'm going to train for engagement, and for ad revenue right? That's a reasonable thing to train for. It's not aligned with our our values as humans. We have better things to do with our lives than spend money on things that are advertised on social media apps, spend money on TikToks, Adam: But, you know, ads have been with us for a long period of time. They do enable some things. Sorry, I didn't expect to be making a defense of ads here, but like I can see an argument where it's like you're just connecting sellers of products with people that would actually be interested in buying them if it was brought to the attention of them. Then like yeah, it's hyper capitalism, but at least people are getting shit that they want. [In futher defence of ads (within a context of often fair criticism): Google (including services like Maps, Gmail, etc.) has also delivered an enormous amount of value to people globally, despite consumers almost never paying them money, and this value is clearly far in excess of the value lost by those same people sometimes selecting a marginally sub-optimal supplier because that supplier paid more for adsense than their competitors]. Anish: Selling to willing buyers at the fair market price. (laughter) No, I mean I don't think this is that bad, right? I think there are concerns around what sort of engagement you have to drive in order to do this, right? In terms of ‘what is the thing that shows up on my Facebook feed that keeps me scrolling, that keeps me seeing ads’ is often times just the thing that makes me the angriest, right? Adam: [The feed] is like: angry, angry, angry. Oh, here's a nice prophylactic to help you feel like calmed down again. Buy this. Anish: Yeah. I mean, it's not quite so bad. They do serve you content that keeps you like, you have to be exactly the right level of angry, right? If you're too angry, you close the app, you're out, right? So, you need to have this video game-like addictive. Look, I'm saying all this, but like actually I think this is the good case, right? I think the Facebook social media… like this is the thing we have today. This is the world we live in. The world we live in has problems, but it's not that bad. I think the LLM world is potentially much scarier, because as we're training these engagement models… we sort of understand how hyper-capitalism works. We know how to deal with it. Adam: I think of it more like: if we paused for a generation or two you, then I think we'd [adapt and] catch up all right, you know? Anish: Look I think there's a lot of problems being caused by social media, for sure But I think they're problems we understand. They’re not existential. They're problems we understand and that we sort of have an idea of how to solve. Part of the reason is because we understand the motivations, right? We understand that this is engagement maximizing. This is revenue maximizing. These are these are institutions that we've have been with us for a long time. They've just been accelerated by AI. But when it comes to LLMs, the actual goals you're putting through them are very subtle, right? This comes back to the reward hacking stuff we were talking about earlier. When I tell it, ‘hey, please make my code work’, it's learning all sorts of crazy behaviors in order to Galaxy Brain and make my code work, right? It's not just saying, okay, I'm going to fix this and then I'm going to like, you know…. There's a little bit of a genie, but the simple genie is like, ‘oh, I'm going to make your code work and it's going to be it's not going to be very useful, or it's going to cause your downfall in some way [e.g. poor code executing erroneous market trades and costing millions]. With the amount of training compute we're putting into this, what we're really saying is, ‘make my code work at all other costs’. If you want to like completely… okay your code works, but now I've taken the part of your business application that actually makes money and thrown it out, right? This sort of scales right? As these become more politically important right, it becomes like ‘okay like make my society work’ and the model says, ‘oh well, you know, there's this ethnicity that I don't think is…’ and you can't have that approach to people, when you have that approach to code. Adam: This is like the classic Eliezer Yudkowski p(Doom) kind of stuff in a way. [i.e. Theories that misaligned AI constitutes an existential threat to humanity] I also see a more mundane scenario in which you just get a sort of hyper-Orwellian society, but where nobody's actually necessarily [driving it]. There's no Big Brother but the AI itself. You know, harmful information is automatically censored. Your ability to think is, at least in part, outsourced. Maybe we become like Cyborgs in a certain sense, even more than we already are. And the world is so complex that you need an AI to help you interface with it, because everybody else is using AI, and like there's enormous amount of information out there, and you need something to like help you sift through all of this text data and complexity in order to be able to interface with it. And that interface is something that's quite politically motivated, and in ways that we don't necessarily even understand. Even the guys at the lab training these models don't appreciate the full complexity of, and the implications of what it is that they've created, and the kinds of incentives that they've imposed upon this model. Thus you get this thing where, like you say, maybe there's some edge cases there where it's encouraging genocide, or another side where anything that's even remotely what we might consider non-PC is impossible to express at all. Anish: Yeah. I think we're saying a lot of the same things. I just don't have quite the way with words that you do, so I have to lean on more dramatic examples. [Too kind by half] But the reason I bring up genocide at all, right, is because this is something that social media has already been implicated in, right? When you look across Asia, there have been a number of pogroms against different minorities, and these have been propagated through like WhatsApp groups, right? Adam: Right, and you might think about classically what're often considered the first social media revolutions in the Middle East, North Africa. Anish: The Arab Spring, yeah. I think LLMs take this process that's already going on and it just makes it worse. Adam: Sure. So I think there's a probably a lot that's been said about this online. Maybe only within very niche circles, but from the perspective of the labs themselves, or alignment researchers, people here in Silicon Valley talking to one another, and I think that's it's all to the good. But is there something that an institution like a government, beyond supporting my side of the debate or what have you. What kinds of structures should like a society like New Zealand, let's say, have? It's not going to be having any kind of [global influence]. It doesn't really matter what regulations we pass, it's not going to change incentives here in Silicon Valley, but it might enable us to get through this period with avoiding some of these outcomes. How should we think about these problems? Anish: I think this is this is like a really difficult question. I think we really need to study. I think the most important things these sorts of institutions can do right now is to study the problem. To say, ‘how will our society change with these more advanced AIs applied to them?’, ‘what can we do to build systems around managing these consequences?’ There's no OpenAI or Anthropic in New Zealand for them to regulate, but there is the possibility to do groundbreaking research, to do AI research. Singapore has their AI Safety Institute right? They're funding a lot of AI research that is allowing them to understand, ‘how do these incentives show up in these models?’, ‘how do you end up with models with misaligned incentives’ and ‘what do we do about that?’ I think there's there's an interesting question here about ‘what can you do if you're not at a lab, right?’ If your country doesn't have a lab, or if you personally are not a lab, right? I think it's underappreciated the degree to which small advances in AI can make a huge difference. If you look at these pre-trained models, right? We went from GPT-, GPT2 and Palm, Palm2 on the Google side, and then one day we got ChatGPT. What's really happening here is there was an advance. You have these pre-trained models, and they they cost all this compute, but really the thing that made them useful to people generally was Reinforcement Learning from Human Feedback, was InstructGPT, all these little pieces that allowed the models to actually become useful. I think in the same way when it comes to smarter and smarter models, the more powerful that they become, the more important it becomes to have ways of getting the models to actually do what you want. So if we were just stopping at AGI, right? Then the model simulates the smartest human that's ever been on the internet, and then throws it at the problem. That's a reasonable model of AGI. As you get smarter than that, as you approach super intelligence, you actually have to convince the model not to sandbag. The model believes that it's got a compressed understanding of the internet, and it can summon simulations of humans. This is getting a little sci-fi, but I think this is the best way to to think about it. It can summon a simulation of a huma. It can pretend to be a person on the internet that it understands, but what do you do about super intelligent problems? What do you do about problems that are too difficult for any human? This becomes a capabilities, as well as an alignment problem. Getting the model to do what you want means make doing something that is good for society instead of bad for society. Good for the user instead of bad for the user. But it also means convincing the model to actually do the damn thing. This is the sort of research that you can do at any scale. You can fund a lab in New Zealand that works on this thing without needing to to do a billion dollar training run. Adam: Right, so [a minimal output might be something] like instilling local values in research in this kind of space. So maybe we dial a little bit back on the march to becoming culturally American. Anish: Yeah. I think it's good to imbue models with understanding of the local culture, the local values, but I also think that just any values at all, right? Right now it's like incredibly unclear how to align models, how to get them to respect your values. It has some understanding of values it gets from the internet. Just to go back to the bit… we were talking about refusals earlier, and I was like ‘I'll come back to this because it's important’. It is important, because I think a lot of people see the model refusing to like, ‘okay don't build a bomb’. This is something that we reinforce in RLHF. This is something that we tell the model, ‘hey, we really don't want you to tell the user how to build a bomb, that's going to be really bad for OpenAI or for Anthropic’, but if you ask a pre-trained model a controversial question it will often already refuse even without the labs post-training. Why is this? This is because it's trained on all the text on the internet. If you ask someone on social media, ‘hey, what should we do about affirmative action?’, or what we should do about this like political hot button issue? The default answer you're going to get is like, ‘I'm not touching that with a 10ft pole. Go talk to someone else.’ I think this is fine on some level, but once you start to involve these in politics. There's some speculation that the Trump tariffs were a result of an answer from a chatbot, right? Adam: I think I could reasonably confirm that the use of LLMs in the Trump administration is quite widespread. In terms of designing policy. Not merely as an assistant, ‘oh, sense check this for me’, what have you, but drafting actual legislative text. Anish: Yeah, and when you have models drafting text, you really want to make sure they reflect your values. And this isn't just, you know, ideally it would reflect everyone's values locally. I'm sure you want to get New Zealand values in the model. I want to get American values for these models. But I mean, just like human values and all, right? When you train these reasoning models on like fix my code, right? Fix my code isn't a human value. Fix my code is this artificial thing we've made up because we've discovered it makes the model smarter, but it's also inserting all sorts of crazy values that just happen to make it easier to fix code. The All Seeing State Adam: Sure, so just coming back to the example I was talking about earlier. This is a slightly different issue I think but we've seen this trend in the history of states generally that governments, states, want to understand what it is that's going on within the territory that they control, and this is a very reasonable thing for them to do. It happens at the level of local government. We want to understand how the transport system is working. We want to understand where people are living. We want to understand all kinds of things about, like, where water is flowing, how people use a particular like public space that we've provided, so that we can make that better. There's quite a famous book called ‘Seeing Like a State’ by James C. Scott, talking about how regulations and tax regulations over time have shaped societies. There's these classic examples of… Anish: The English windows. Adam: Right! It's difficult to understand, up until relatively recently, exactly how much income people have, and so an income tax is very difficult. But it's really easy to understand that live in houses, and we can put a tax on that, and you end up in a position where, to a first order approximation, at a particular time when the policy was invented, people's wealth, and thus their ability to to be taxed, was related to how many windows their houses have, right? A really big house is going to have lots of windows, and with the same size house, if somebody's got a lot of windows, glass is expensive so they're probably richer and so we'll tax them more. Then what you see is a trend towards people building houses with fewer and fewer windows until the policy adapts. Where I'm going with this is you're in a position where the state has become better and better and better at understanding what it is that's going on, but at the same time, I think a lot of our legal code is implicitly built under the assumption that there's a human in the loop somewhere. We have jaywalking laws that say, you know, let's say you anybody crossing with a road within 200 meters of a designated crossing—whether it’s a crosswalk or a pedestrianized set of lights—that's illegal and so here are the penalties. As a society we accept that there's a very great difference between [on the one hand] somebody potentially causing an accident because they're running across a busy road within 200 meters of crosswalk—cars are braking and it's causing chaos—and like hey you know quite reasonably that person should be fined, that that kind of behavior is illegal. [On the other hand] the quiet residential street where there's literally no cars, a fine would be utterly absurd because it's behavior that literally—people underestimate the degree to which every single day people break the technical letter of some law in some way. But we have this layer of like, ‘well yeah but if nobody sees it's probably fine, because you're not harming anyone’. I think one of the things that AI is doing is making everything that is going on much more legible to the state right? So you can end up in a position where it's like okay we we know post-Snowden that absolutely everybody's data is ingested into some NSA [database], what they're texting to each other, everything that's happening online, but up until relatively recently, there's just way too much data there. It can't all be processed all at once. My point here is not so much that—the questions of the security apparatus and the security state aside—I think that we're going to need to update our entire legal code just to cope with the fact that we're entering into a world where absolutely everything is not only collected—because we're putting sensors out everywhere—but also that you can run this [AI processing] layer and understand what people are saying, where people are going, computer vision models that can watch every single road potentially and, you know, capture every single piece of jaywalking that occurs. So you either you either update the law, which I think it needs to occur, or every single person is fined in every situation, which is going to be like untenable, or you introduce some human back into the loop, but in an abstract way. So instead of a police officer using human judgment at the moment of when the incident occurred going, ‘ah yeah it's probably fine’, instead you have somebody in a bureaucracy somewhere selectively deciding where the law should be enforced, where it shouldn't, which I think is extraordinarily dangerous. What's to prevent that person [or group] selectively punishing political opponents, or people that they don't like, or based on the basis of race, or whatever, in plausible way. You see what I mean? Anish: Yeah. This is an incredibly rich topic. There's so much to say here. The quick point I want to get across before we dive too deep into this. I think this is actually very related to what I was saying earlier about bureaucracy at scale, right? When we're doing software engineering, we're taking forms and filling out other forms. This is actually very similar. We've made this very concrete. The Social Security Administration, is bunch of forms that are filled out by hand. They can't digitize it because there are bunch of special cases that are not written in the letter of the law or the letter of the forms. When it comes to our judicial system, you've got a jury of your peers that's making a decision. This is a decision that people are making. They don't always have to follow the letter, right? Sure, people get very upset when you mention jury nullification, but an important feature of our sort of judicial system where you have a jury of one’s peers, and judge that makes a decision, and a prosecutor that decides whether or not to prosecute, is that if the law suggests something completely heinous—this doesn't always happen, but there's so many opportunities for people to be like, ‘hold on we're not doing that’. You know, the law says to do it, but like the buck stops here. Adam: I refuse to convict. Conscience, and in a way human values are… Anish: Exactly. Human values are embedded in every piece of the system. Look at the US Constitution. It's being reinterpreted. Every court has a slightly different interpretation of what all these words mean, and what that means for, how we should run our society based off that. There's some exegesis that says ‘this is kind of what people meant when they put it down’. I think if you took the framers of the Constitution and teleported them to 2025, I think they would be amazed at the way it's being applied. But obviously it has to adapt... Adam: Yeah, it's no bad thing, right? Like you can argue, oh well, political…. Anish: I don’t know. I'm not sure if it's a bad thing. I really haven't figured this out. I think this is a deep question. (laughter) Adam: Okay. Well, maybe we'll save that one for the next podcast. Okay, there's an awful lot there. I think, very clearly, people, institutions, governments, should be deeply considering these problems, getting teams together to be thinking about these problems. Whether we take one end of the problem in terms of actual machine learning research, or even just considering the implications locally of the kinds of problems that I’ve [outlined]. This is just one case of this legibility problem, where the way that that plays out given New Zealand's jurisprudence is very different from that of the States, and we're going to need a separate team on that. Hiring in 2025 So I'm keen to turn to somewhat more mundane matters. Just thinking about the models that we have today. These reasoning models, maybe capabilities that we might be coming across soon, you know, under development now that you'll be aware of. Basically, if we take today's models, and then we assume that the user experience is just going to get a little bit more seamless. These things are going to become a little more integrated, and let's also assume that there's going to be better accuracy, longer context windows, you can put more stuff into them. So this is a very mundane world. Not too different to what we see today. But on the other hand, I say it's very mundane, actually extraordinarily impactful in ways that I think people [are not yet aware of]. Maybe there are applications of these kinds of models… there will be all manner of things that in 10 years time will be standard, that today nobody's thought of yet right? We were recently at this Manifest conference. One of the talks here was from ML [machine learning] engineers from Substack and from Anthropic, one of the major AI labs, talking about the way that they've changed hiring as a result of AI. The focus in that talk was very much about software developers, and how deep expertise in a particular language is becoming less important as a result of these models. You can kind of translate expertise from one framework, or one language, to another much more easily than you you could in the past. You can imagine how this might work in prompting: ‘take this code, turn it into Python, or into C, or whatever it might be’. It might not be perfect every time, but like a talented engineer is going to be able to pick up, translate, and change it pretty easily. As a consequence of this, they were talking about how, increasingly, they're hiring less for deep expertise in a particular language, and more of [different] sorts of ideas like ‘you need people with a degree of agency’. You want people that can take some really high level objective, and then work through in their minds, ‘okay, how do we get to that objective step by step’ and then be able to execute on the full step stack of that without constantly coming back for approval or whatever. You want people that can just take something and see it, and go away, and come back once it's done, or if they meet some impassible roadblock. Then [they also mentioned] this idea of ‘taste’ as well. The ability to just sort of go, ‘Mmmm, that meets the technical definition of an answer that achieves this thing, but I can see also that it's going to have like implications for the company later if I just like pass on that crappy code that's like really slow. My immediate managers, they'll probably not even notice because the task is done, but like I need to do better because I like my own personal sense of taste is: this is not good enough’. So that's quite an interesting shift, right? Just in terms of a hiring market, and I just wonder if like you had any thoughts about this. Does this generalize you know is it I mean that I asked a few questions about how we might think about this sort of transition and sorts of skills sorts of people that we're looking for not just in terms of developers but in terms of decision-makers, institutions more generally, people who aren't working on like formal software, but of working in the space of, you know, changing things, and more classical things, civil engineering [etc.]. How do you see these things changing across society more generally? Anish: So, I think there's a couple ways I see them changing, and a couple ways I see them not changing at all. When it comes to what's changing, I think people need to be much more flexible, right? I think the flexibility is going to become much more important. It's going to be much more difficult to have this sort of like linear career progression where like, ‘I have a job, I'm going to do really well at it four years, I'm going to get a promotion, and then I'm going to keep doing that four years, and get another promotion, and so on’. It might be that this sort of linear career progression where you keep getting better at the job, and you keep getting paid more, I think we're starting to see the end of that era. I think people are going to have even less… I mean we've already seen in the post-2008 world that people have less stable careers, and I think this is going to make that a little bit worse. I think you're gonna have some expectation that you're gonna need to change hats quite a bit, and you're gonna do some job for a little while, and then you're going to be like, ‘ah well you know this all the interesting bits have been automated. Now you have a completely different job for a little while.’ That level of flexibility I think for the labour side, for the people who are looking for jobs, I think is something to keep in mind and strive for. Adam: Yeah. Just on that note, I'm personally of the view that there's actually an enormous amount of things that need doing in society, needs and services people want, and so on. I think the idea that jobs—at least given the models that I'm talking about, who knows in five years time we might get some total step-change in terms of the capabilities, but at least given what we're seeing today and what we might see this year or early next—I see actually more types of work, and more sorts of things happening than ever. Yet, like you say, we should expect the turnover of those roles to be greater because like, you come in, you do some particular task, you're using an AI to do that reasonably quickly, come up to speed relatively quicker, and you've got to be in a position to kind of grow, and adapt, and move across to different types of things as the economy generally changes and evolves more rapidly perhaps than it ever has, right? On the other hand, 100, 200 years ago, 30% of people were farmers. Today it's like 2%. We've automated all the farmers away. But like, actually, nobody really wants to—some people enjoy it, and I respect them a lot for it—but the vast majority of people today do not want to do the backbreaking labour of going back to farming, or going back to steel manufacturing type work, right? There's all kinds of other things that we've, as the economy has grown, we've moved into all manner of different types of roles. I think with a little more perspective, people perhaps would realise like, ‘oh, I would much rather be doing this than like breaking my body and having to like almost retire age 50 because I'm doing some repetitive task in like [classical] manufacturing’. So just a general defense there, I suppose, of AI automation, which I'm generally quite excited about. I think it's going to be, on the whole, quite positive for humanity but will obviously create winners and losers, and the turnover from what you're saying the turnover of those winners and losers in different sorts of areas is probably going to become more rapid as well. There’s a new kind of cast [mould] of job that's opening up and there's a lot of opportunity. It's really highly paid, and people move into it, but all else being the same, we should expect that that kind of role to sunset much sooner than it has like, let's say, 50 years ago. The Gig Economy Anish: Yeah. I think the people who really get screwed here are actually the people who don't have the impulse control to save. Honestly, the thing we're seeing more and more, we're already seeing this in the software engineering market, where like people are like you'll get a role and it's going to it's going to pay quite well. Adam: Like a million dollars a year. Anish: Yeah, but you're not going to be able to hold on to that role for very long because you're going to be automated away, or the market is going to change. So you're going to have a series of short roles that pay you very well, and then you need to make that money last for when you're not working. Adam: By that model, it's almost like the Uber of the economy generally. Anish: Yeah. Uberisation applied to the professional class, and to everyone else. On the hiring side, I think things are actually changing very little on the hiring side. Joel Spolski has this classic blog post on hiring devs, but I think it applies to hiring anyone. It's that, when you when you want to hire someone, you really just want two things. You want someone who's smart, and someone who gets things done. If you have that, then that's really all you need to hire for. I think that it stays the same. You want someone who's smart enough to understand the problem, understand how to solve the problem, and someone who has enough ‘get up and go’ to actually do it. Adam: So one of the things that is raised as a concern every now and again, but I haven't seen any firm conclusions of yet is just that: for people that are into their career in a particular field, understanding, ‘okay here's how I need to think about this problem in a way that's highly contextual, and here's how I might increasingly prompt an AI in order to help me solve that problem, those skills are often dependent on potentially years of experience in that field. Classically lawyers, before they become a principal, there's all manner of work required. You enter a law firm as a junior employee, and then somebody will tell you ‘okay, now please go read through these 8,000 pages. I'm looking for six points that are related to this case out of this enormous pile of documents. At the moment, or up until recently, that kind of role produced economic value because the principal, who's the one truly being hired, has a pretty good idea of the case, but the client's not going to accept ‘I reckon it's probably X’ (unless they're paying for that particular service). Instead, they're paying for the principal to be able to provide general advice to some junior associate, who's going to go through an enormous amount of work in order to extract some information, and provide rigor to the 5% maybe of time that [that information] might slightly alter the principal's decision. Law firms are a particularly extreme example, but I can think of where you might have an internship, or you might hire graduates into engineering roles, like in civil engineering where, again, the kinds of tasks that you're giving them are increasingly automatable. In providing that kind of economic output in days gone by, they're also being inducted into how bureaucracies work, where the loos are, and all manner of things that college, university, doesn't really train you for because it's all theoretical. You enter the workforce and you're starting from scratch in terms of ‘here's how things actually work’. The difference between formal process or what's written on a page, and what actually happens. These kinds of things that a graduate picks up by osmosis, and then 4-5-6 years on they're actually getting into a position where they can effectively navigate that world. I don't see exactly how it is that we solve this problem. You know, the world of that professional career progression where people can come at an early start, do work that nobody else really wants to do, and thus get like an education in bureaucracy or their field. If they're no longer producing any economic value, then a law firm that doesn't hire any interns is going to outcompete one who does—all else being the same. Then you've got a generational problem where suddenly the economy is no longer training people in those kind of high context roles such that they can then [make effective] use of the LLMs, and actually generate economic value later into their career (regardless of how quickly they [shift roles], how flexible they might need to be, and shift around). So it seems to me that there's like a period here that we're going to struggle. Anish: I'm actually not too worried about this. There are two reasons not to worry about this. One is maybe a little bit glib and it's, ‘if the automation comes fast enough, then you don't really need to worry about training the next generation because the AI will level up [and fill those roles] before the people you wanted to level up level up. So this is sort of fine [in this narrow sense]. I don't know. Who knows exactly? I think when it comes to training up people who provide zero economic value, we do this already. Junior engineers, their first couple of years in industry provide negative value to the team. Not only do they cost you money, but also often times when you hire a junior engineer, they're actually creating more work for other people than they are actually getting done themselves. It probably doesn't work necessarily this way in every industry, but [certainly] in super heavily knowledge based industries that don't have a lot of manual work. Law firms are an example of a knowledge based industry where you have a lot of manual work for the juniors to do. That is somewhat academically valuable, but in software engineering you get a junior, they don't know what they're doing, they're going to make a bunch of mistakes, and you just need to fix them and teach them how—onboard them basically. We figured out how to do it for software. I think we can figure out how to do it for other things. Adam: Can you just dive into that a little bit? Why does that work for software? Anish: Yeah. The idea is that the expected value of this engineer, once you train them, is going to be higher. Now we're not doing that so much anymore. We're not hiring junior engineers. There's a couple of reasons why. One is, yeah, we don't need them as much. But again, they were never really providing that much economic value. When we replace them with an AI, we're not missing out that much. But the reason we're not hiring them is because we don't expect to need them. We expect the mid-level engineers they would evolve into are going to be replaced faster than we need to create them from junior engineers, right? So we were talking before the podcast about how everyone talks about software engineering as the one example because it's the one they're familiar with. I think this also applies to like sort of more classic craftsman, right? Like if you're an artisan of like fine wares, you're building, you're like a potter, right? Today, or even back in the day, right? If you were an artisan, you'd have an apprentice potter, and the first many pots that the apprentice potter makes are shit. You can't sell them. You've just used up some material, right? That's fine, you're training your apprentice. Adam: But they're also providing value to you, right? Probably quite a bit, because the majority of what you'd be getting to do is not throwing [the clay], but rather like cleaning, and sweeping up, and going and meeting the salesman of the clay, and then like hauling back the clay. Tasks that you could do yourself, but you just don't necessarily want to. So even in this position where, if you were an accountant [analysing the value of the role], and you went actually this person, they take longer for everything, blah blah blah, but the master is now in a position where they just don't have to do things that they don't want to anymore, you know? Anish: Yeah, there's this old joke about interns picking up coffee and look, in the next two years, I don't think it is going to advance to the point where it goes to Starbucks, picks up your coffee order, and brings it to your desk. There's a bunch of stupid shit basically, right? You hauling the clay, all these like physical world tasks, if we're talking about the next couple years, that aren't going to be automated in that time frame and you still have the intern do these sorts of random labour tasks. Adam: Okay, so let's imagine though the world that I talked about earlier where AI like stabilizes but maybe continues to improve on like certain dimensions around accuracy and context and so on. Anish: So we freeze at 2025 AI. Adam: Yeah or like February 2026, we kind of get to there, right? So I take your point that there are still a whole host of niches, at least in your term, but what's the long-term equilibrium in this space? Like you're talking about how people aren't hiring junior software engineers. Yeah, there are still some positions for like interns to do coffee or whatever, but there's less people on the tools learning the tools of the trade. So let's say we freeze at 2026 AI. What happens in 2040? Do we just have some collapse [due to skilled labour shortages] or? Anish: Actually I think you start hiring juniors again. If you freeze. If the freeze becomes apparent, the reason we're not hiring junior engineers is because we expect this to continue. So I think if you freeze at 2026 for a couple of years and it's like okay we've run into the next AI winter, we're 5 years into the next AI winter, we know that this isn't going anywhere, then we're going to start hiring junior engineers again because we need to train them up. Adam: Okay. But consider though what you were saying earlier, which is that like all else being the same, people should expect to need to be more flexible, right? Because the ability to deliver value, come into a particular place, you know, there's some new startup doing something novel. People should, and we're already seeing that, expect to be come up to speed faster, deliver economic value faster, and essentially automate themselves out of their task faster, right? This is already the case. So yeah, I have no doubt there’s some degree to which the hiring pause—there was also quite a bit of, as I understand it, overhiring during the days of 0% interest rates and so on. We’ve seen a lot of layoffs after massive [over] hiring. Anish: In the US this is also response to tax policy right? We had this R&D tax change. Adam: Yeah so there's some ephemeral changes in the economy at the moment, so we're seeing less hiring. But, all else being the same, it seems to me the implications of what you're saying here are: if you're in a position where your general model of the economy and like particularly knowledge work is you come in, you work for a year or two, and produce enormous amounts of value, and then maybe you automate your job, you're out of work for a while, you're pursuing some side projects. GDP's higher, everybody's wealthier, this is great because you've got more leisure time to pursue side projects, whatever it is. Yeah, that's all great, but, considering the case of the junior software developer, even just in that world—let's say 5-10 years ago, you hire a junior engineer with the idea that in four years time they're going to become productive in your company. But, if we're in a world where the window for any one person is shorter than that. So even if we go back to the world where we need… ‘oh gosh, there's a big shortage of mid-tier engineers’ because we didn't hire them properly, we didn't hire enough juniors. There's still I think maybe a missing gap there, where the economy as a whole is just not incentivised to hire juniors in the same way that they once were when the median length of a role was significantly longer, because it took more time to spin up and so on. Anish: I actually I do sort of envision this working out. The way I envision this working out is that—I hate to use software as the only example— but if you have a bunch of like tiny boutique software firms, right? Like the idea is that like I've got like a four person software company, say, right? I don't need to say. I literally have a four person software company right now. But imagine I'm doing something like making custom software right? What happens is that the company gets jobs and goes, but we're working together the four of us. Then the most junior trains up a little bit on every job. So the income is inconsistent, and the jobs we're actually doing are inconsistent, but we're working together as a team [over the longer term]. So the human element is still [powerful]. I think you're still going to have people who group together, and you're going to have people who work together over a long period of time, but they're not going to be doing the same thing [the entire time]. Adam: Okay. So let's imagine the case of a civil engineer of some kind and you've got, by analogy, this boutique firm doing small-scale developments helping do the engineering work for small scale developments of like housing or some like commercial properties or something like this. So where does the junior fit into that space? Anish: Yeah so, you've got a junior in the firm that you're training up that you want to… I mean, I think one nice thing about… I think socialists hate this, capitalists love this, where as a small business owner, you hire someone, you train them to do your job, and you're like, ‘Okay, now I own the firm, but this guy does all the work.’ Adam: Right. So the turnover in this case is by the founders making themselves obsolete through the humans that they've trained. That's the incentive structure. People are moving on faster than ever before like exiting from the firm or whatever it is, but they're doing that having extracted savings at a higher level of economic value than they did previously. [Hopefully the idea Anish is expressing here is clear in the edit/transcript, because clearly I was on a totally different wavelength to him at the time] Anish: The turnover I was referring to earlier was like for the firm as a whole, right? It's like: I've got a four person firm, and it's doing a bunch of different jobs. Each subsequent job this firm gets is going to be more different than the previous one. They're going to be more spaced apart, but they're going to pay more before they’re disrupted. Adam: Right, then they're being disrupted by some other four person over here who's using AI better, or something. [But the people stick together through each round of disruption because they like each other, and are incentivised for all the reasons mentioned above to train young people and so on] Anish: Exactly. The economy is going to keep changing, but one of the most important things we do as humans is work with other humans, right? So when it comes to labor. Well, labour or not, right? I think the being able to form relationships with people who you work well with is is still going to be important, right? Actually, there's, you know, this fantastic Nobel Prize winning paper from Ronald Coase from the 1920s that talks about optimal firm size, and how optimal firm size changes with transaction costs, right? So you have this transaction cost of how do you… I mean, it's not exactly transaction cost but you can see the analogy here, where any two people working together is going to be difficult. So if you want to hire labour at an individual level this becomes much harder because you have to figure out ‘okay what are this guy's skills, how good is he?’ You have to do the whole hiring process for him, and then you have to do this for every individual on your team. Then you got to negotiate the contract, get a lawyer to draft the contract, and so on. Adam: So the more that systems become capable of dealing with those kind of transaction costs we might think of like: suddenly you need HR, suddenly you need like a payroll officer, and so on. The more limited firm size becomes, the higher those transaction costs, and if those can be brought down then actually you can hire more people, is that? Anish: Well, the higher transaction costs mean that you need to form bigger firms so you can have fewer transactions. Adam: Oh, I see, between firms. I was thinking within firms. There’s some interesting [tension and] dynamics there. Anish: In the limit, every person is his own firm, right? This is the naive consequence of marginal economics. ‘We should have infinitely many firms competing on infinitely made products, and then every person negotiates for his labour for task to task.’ You show up in the morning, and you have a negotiation with your boss. Adam: You're like, ‘Okay, today I say “I'm gonna take like a 30 minute poo this morning.”’ Anish: Exactly. It's like, ‘why do you not live in this world?’ Adam: We kind of already do in some respects, right? Like you know what I mean? Anish: Yeah. On Uber you do, right? Adam: No, but even I think in existing institutions, in a certain sense. This already occurs, but instead of happening monetarily because the, shall we say, the transaction costs of constantly changing salaries is too high. Instead, you end up in a position where at the extreme end you get this like ‘quiet quitting’, but also, ‘Oh god I really hate this project I've been put on at the moment, and they don’t really pay me enough to deal with it like properly, so I'm just going to put it on the back burner and play World of Warcraft half the time’. There’s a friend of mine in Sydney who basically does this. He's in a position where he's just like—sorry I don't know if I can say too much without doxxing this person—but basically he's well paid but, for one reason or another, the firm hasn't really incentivised him to fully exploit his economic value. I think his manager is almost certainly aware of the fact that it's like, ‘yeah, we could get a lot more out of this person in theory, but if we started pushing him an awful lot more, then he would go and find another job that was either the same level of cruisiness for the same salary, or a much higher salary for the [level of work now being asked of him], do you see what I mean? So there's quite a bit of calibration, particularly post-Covid, between the amount of work that people do and their salaries. This is negotiated already, let's say, roughly speaking week to week, or month to month. Certainly year to year through like performance reviews and so on. Anish: Yeah. There's there's all sorts of different ways in which you can think about transactions, right? If we think about transactions in the classic economic sense, the thing that we see most from this theory is really like Uber, right? You are literally negotiating every ride. That's because the transaction cost—because of the Uber marketplace—has dropped so much. When we talk about sort of playing World of Warcraft at your desk, right? I had a friend at Google who I was like, ‘Why are you using a MacBook? The servers run Linux. You should run Linux on your laptop.’ He's like, ‘No, so when I joined the team there was a senior engineer on the team who told me, “Look, you're going to spend most of your time watching movies, so just get the laptop that's best for watching movies. If something horrible happens and you need to write some code, you've got a desktop.” Adam: So basically Google's purchased an option on his time. Anish: Exactly, and I mean that's another way… in Silicon Valley [the satirical TV show] there's a scene where a bunch of people on the roof at Hooli, because Hooli is still paying them, because it's like ‘we might assign you to something eventually’. This was how I felt like Google worked for a while. You'd have a bunch of people that were just on the payroll in case we needed them, and then every so often you get assigned to something. Adam: Sure, and in part as well so they didn't end up working for a competitor. Anish: No, they're fully on the payroll. Adam: I'm saying part of the reason they were kept on the payroll is so that they don't end up going work for a competitor. Anish: Yeah, absolutely. Adam: So again, there’s something of an option purchased. Anish: Yeah. Then, of course, this reassignment happens via a bit of negotiation, right? If you get assigned to a really cool project, you deliver a lot of value, you deliver like 10 years worth of your salary, 20 years, 100 years worth of salary to the company, and they're like, "Okay, you're gonna get promotion, even though you didn't do any work for a year.’ They put you on a project and you did a hundred times your salaries worth of value, that's still worth it, right? They want to keep you around for the next time. To be clear, this isn't all software engineering jobs. There's a lot of people—especially at Meta—who are working all day, every day. This sort of ‘option on your time’ is more of a Google thing, but I don't think that they operate that way anymore. Adam: So maybe it's not indicative of some broad trend as a result of AI. Legibility Anish: Yeah I don't think it's an indicative of broader trend as a result of AI, but I think it gets to the core of this question about legibility. To some degree AI has already made things more legible, but when it comes to developers, when it comes to laws, there's actually two [opposing] forces pushing. It's like a U-shaped curve. It's like an upside down U-shaped curve where you add a little bit of AI, and you get to process all this unstructured data and gain some legibility into it, but you keep adding it and you end up with weird stuff like chain of thoughts in these reasoning models. Sometimes they switch to Chinese for three characters and switch back. I don't know why that's happening. Nobody knows why that's happening. So the more you use AI, the more you end up looping back around to the other side and like, ‘I don't know why the model's working. I don't know why anything's happening.’ Adam: Right. Then thinking about this at a societal level, do you end up in a position where…[starting] at small scale like, ‘okay this institution is doing something. I don't necessarily entirely understand it, but I can use an AI to make it more legible to myself’ [e.g. by translating professional jargon and obfuscatory statements, or extracting noteable information buried deeply in large reports]. But then, as this becomes more prevalent, the institutions adapt. They begin to, like, you put in an AI request for information that's then responded to by an [corporate-speak] AI from the other side. So you've got AI intermediaries talking to one another increasingly, and legibility doesn't actually increase because it's just this AI abstraction layer on top of and between everything. So this gets to perhaps what will be my final question. Do you think institutions in general are going to become more transparent as a result of all of this, or more opaque? Anish: If we froze at current year AI, would end up, on balance, being a little bit more transparent. I think that, like you suggested earlier, it'd be a level of transparency we need to adjust to. As things keep going, I think we're going to get actually less legibility because as we get more and more advanced AI, we get more and more AI talking to AI. Adam: Beyond human [comprehension]. As we move from a smart human level, which you can kind of work through, and sort of understand how it's responded to your prompt. Instead, it becomes this kind of von Neumann-type character potentially where nobody really understands [the derivation of the outputs]. He could in theory go through his reasoning and outline how he came to a particular conclusion, but in practice he’s probably just going to give you some answer and you're like, ‘Ok, I don't know where that came from.’ Anish: Fundamentally, the problem is that a computer cannot be responsible for making a decision because a computer cannot be held responsible. Adam: Critically important I think not only like philosophically, but even practically I would say in America we recently nobody's going to accept, if you're a lawyer, like ‘Oh my AI was wrong’. Like, I don't give a shit what kind of models you're using here, like I'm paying you! Anish: When you're talking about institutions in America until recently I think ProPublica had a thing on this with AI sentencing, right? There was an AI that was [guiding] sentencing. A proprietary model that people weren't even allowed to view. There was some private firm that was just doing this black box, right? Yeah, so when it comes to race, it's like ‘Okay, well, how do you know that they're not just giving people sentences based on race?’ You don't know, right? You don't know what's what the model is actually doing. Adam: Unless you unless you look inside, and pay [Anish, and dmodel.ai]. Anish: It's justt, I think it's absurd to allow us to outsource the core human functions of society, of judging other people, of deciding how we should live our lives, to a computer [especially] that we don't have any view into how it's making these decisions and that don't have any accountability. Now I think, in the long run, if we solve all of these questions of like, ‘how do we get large language models to respect human values? How do we get them to do what we want, not what we don't want?’ How do we effectively work with these things in order to deal with the appropriate level of transparency? Then, I think it's very reasonable to use these tools in the long run. Ultimately, we need to be instilling the correct values, the values that we actually care about, into anything that we want to use, in these critical applications. Adam: I also think the corollary to that, in terms of instilling values into the models themselves, is also to be much more deliberate in the way that we use them. Instilling like values into the prompts, being much more conscious of like, ‘okay our capabilities in general are growing, as they have for many centuries, and so it only makes it more important to be thinking about, well what is it that we should be doing with them?’ We're going to have to leave it there, but Anish Tondwalkar, thanks so much for coming on. Anish: Thanks for having me.

Anish Tondwalkar - The Societal Implications of Reasoning Models

Episode Transcript

Never lose your place, on any device