
Prompt Engineering Advice | CXOtalk 883
Episode Transcript
Prompting is the secret skill that taps into AI's real capabilities, transforming large language models from flashy demos into engines of real-world productivity.
Today on CXO Talk number 883, we unpack how prompting works, what it is, why it matters, and how to get it right.
I'm Michael Krigsman, and with me is Nate B. Jones, a widely respected AI expert whose sharp insights and no-nonsense advice have earned him nearly 300,000 TikTok followers.
We need to talk a lot about prompting for two reasons.
One, human language is fairly vague.
That's why we invented computer languages back when we were programming computers in the first place, because they're much more precise.
And now we're using effectively natural language to program computers again, and that's challenging.
The second reason is that even though these models are very intelligent in certain respects, they are not incredibly reliable yet at inferring your intent.
If you are not precise about what you mean or want, they don't do that reliably.
They guess, and they might guess right, and they might guess wrong.
And so both because we have to get clear with our language and because models don't yet infer with tremendous precision, prompting is what bridges that gap.
So when we are prompting, we're programming the AI.
This is really going to take you back, but in the old days in the '60s with punch card computing, you would literally bring your little punch card, put it into the computer, run it, and see in 20 or 30 minutes whether you got it right or not, or maybe longer than that if it was a big program.
We're doing exactly the same thing with natural language now.
We're handing the prompt to an inference model, maybe O3 Pro.
It does take that long, 20 or 30 minutes.
And we're going to come back and we're going to see if our little natural language program did anything.
It's fascinating how time is a circle in that regard, where we're back to where we started.
So the logic of prompting is effectively the logic of software development.
Is that a correct way to say it?
You could say it as like the marriage of software development and business intent.
So in a sense, software development has been predicated primarily on building interfaces that allow business operations to be conducted, business logic to be encoded, etcetera.
But now, because these models have the ability to sort of bring intelligence to bear, you're not just asking it to do one specific thing, and you're not spending your time engineering a specific interface.
Instead, you're asking the model to think with you.
And so it's this weird mix of the principles of engineering with the business clarity of intent that has always characterized a very strong executive brief, for example.
As the models get better, what does that do to prompting?
Does it make prompting easier or more difficult?
On the one hand, you don't have to do some of the stage management that you had to do in 2022 and 2023 anymore.
You'll recall when ChatGPT first came out, the prompting guides were like, OK, tell it to pretend it's the best editor in the world.
Tell it to pretend this or that.
And then it began to sort of turn into chain of thought prompting.
Tell it to think step by step, do this and do this and do this and do this.
That stage management is thankfully no longer really necessary.
You can instantiate the model by saying you are in a particular space, like you're in a consultative strategist space, you're in a CFO space, whatever.
You can say that, but you don't have to like put in the adjectives and hope and pray that the model understands what you mean.
You can just say this is where we are.
You don't have to specify chain of thought anymore.
The frontier models know to use chain of thought when they need to.
And so in that regard, prompting has gotten simpler.
On the other hand, the importance of specifying what you're looking for, what success criteria looks like, what the constraints are, that's only gotten more important because these models are much more powerful.
And so before, like if it was a very simple ask that you had for a smaller model, you could go back and forth a few times and figure out what you wanted and it was fine.
But if you give something to a frontier model and it's running for 6 minutes, 8 minutes, 10 minutes, 20 minutes, and it comes back and you just did not clearly specify the scope, you're going to be frustrated because you wasted all of that compute.
And so in that sense, some of the stage management and scripting that you're used to, you don't have to do anymore, but the importance of specifying the work very clearly has grown.
Like you have to really take that seriously now.
So you really need to match the prompt to the model.
A lot of the art of it is in figuring out what is this subject, what is my intent, what is the right model for that?
And once I have all of that figured out, now how do I craft a prompt and then bring in the context the model needs so it can do a good job for me?
For example, with OpenAI, a number of the models allow you to either include deep reasoning or research or not.
And other companies' LLMs do the same thing.
So give us some examples of this.
This is one of the things where model makers have not done a great job at the product surface of explaining what their models do.
For example, Deep Research is really a very narrow web agent that is trained as a research assistant to go out and look across the entire browsable web (it doesn't yet look behind paywalls) and to come back with a consolidated view.
And they train it specifically on citation.
So it's good at citations.
It lists what it knows and why.
OpenAI pioneered this with Deep Research, but Deep Research is now available on Perplexity.
It's available on Claude.
It's available with Google.
Lots of others have picked this up because it turns out that reasoning across the web is a lot of what we do.
And so there's just inherent value in report generation.
But people don't realize that all you're getting with Deep Research, if it's ChatGPT, is the O3 model specifically tuned to web search.
And that is different from whatever else you've been talking about with whatever model you've been talking about in ChatGPT previously.
So if you've been having a conversation with 4o for a bit and then you turn on Deep Research, it's not that 4o suddenly picks up a cape and becomes a superhero and turns into Deep Research.
It's that you are invoking a separate agentic tool, getting a separate prompt in, starting a new flow, and then that report is going to come back and you're going to be able to continue the chat.
And I think that a lot of people don't think about it that way.
And it's become even more confusing in the last week because O3 Pro on the surface looks very, very similar.
It's got a long thinking time.
You give it a prompt, it goes away, it comes back.
And so people have asked me, did they just release a clone of Deep Research and rename it?
And the answer is no.
The answer is that O3 Pro is a generalizable model with a lot of different tool calls under the surface.
But precisely because it's under the surface, it's difficult to know that staring at the chat window when it takes a similar amount of time and comes back.
And so I think that some of what I do is just try and convey the nuances of these models and how understanding them with a little bit of a fingertip feel can shape the way we prompt.
Again, there's a level of confusion here.
I mean, I use so many different models every single day and I am on an ongoing basis having to kind of experiment.
You know, it's like this whole domain is very immature because the models are changing and the models give indeterminate results in any case.
And that means you keep having to adjust your prompts on an ongoing basis.
It's really a waste of time.
I think if it was a waste of time, we wouldn't be seeing the kind of tremendous groundswell of usage we see with these models.
One of the biggest challenges with IT and security this year is shadow IT where people are finding these models so useful for the work that they do that they are using them even outside traditional IT security practices.
And in that sense I share your frustration.
I find that when I am not getting what I want, there's nothing more frustrating than sort of pounding my head on the wall and trying to figure out what the model needs to hear from me so that it can give me what I want.
But net-net, if I look across my overall productivity for the day, for the week, I am so much more productive now, even with all of that factored in, than I was two years ago.
And it's because I'm learning enough about how to work with these models that I'm able to get a tremendous amount of value back.
And I think a lot of people are having that experience.
And maybe I shouldn't say it's a waste of time, although I do think it's a waste of time, but let's just say that there's a lot of overhead that seems like it shouldn't be there.
That's a really fair call.
That's basically a complaint and not that it makes any difference at all, because that's the nature of the maturity of these models as products at this point in time.
What I am curious to see answered by the model makers in the next probably 18 months is the extent to which prompting remains a durable skill set that provides tremendous alpha to people who know how to use it well versus the extent to which it commoditizes.
Not necessarily because everyone learns the same amount, but because models get very, very good at inferring intent across a range of prompts for the same subject, and people are widely divergent on what they think will happen.
My own view is I'm trying to take seriously the fact that I expected initially prompting to be a very one off 2022-2023 edge, and that's not been the case.
It's been stronger and stronger over time instead.
So I tend to lean toward the idea that at least for the intermediate term, prompting is going to continue to have a tremendous amount of value because that's what we've seen as a trend so far.
There are people who think that if we can get to a level of generalizability with these models, we will suddenly unlock a tipping point and we will find a way to infer very reliably where we haven't before.
And that might be.
And if that's the case, then suddenly prompting will become less painful and less needed somewhere in the next 18 months.
Subscribe to the CXO Talk newsletter so you can be part of our community.
We have amazing shows coming up.
What makes a good prompt?
Do you have any practical advice?
Number one, be really clear about the outcome that you are looking for and about how the model can know that it's done.
I think a lot of people will be fairly loose about specifying the outcome or they'll be loose about the goal.
They'll be very, very loose or non existent about how the model can know that it's finished adequately.
And the more you can specify and be clear about what you're looking for and what good looks like, the better off you're going to be for the rest of the prompt.
Number two, you want the model to have all the context that it needs to do that job, and you would prefer it to not have any extra context that it doesn't need.
A lot of what we call hallucinations are effectively models reasoning outside your desired context window.
And so if you can be more clean and clear about this is what I want you to focus on in a web search or here's some documents I want you to review.
I want you to keep your thinking focused around this particular, you know, set of meeting transcripts or whatever it is.
It will really help the model to be confident that it's doing the right job and able to deliver a reasoned result that closely matches the kind of work you were looking for.
And so the context piece is another one.
And then the third is really making sure that you understand the constraints and guardrails that you want to put around it.
So if you have an outcome or goal, and if you have context, you feed it in.
You then need to make sure that the model knows don't do this.
Where do I not go?
And I find that that is often one that people either barely put in or tend to avoid because we tend to be thinking in a positive stance of like, hey, this is what I want done.
Let me just give the task and go.
And maybe this is because we're anthropomorphizing models.
Anthropomorphizing models.
We don't tend to regard a senior colleague as someone who needs a tremendous number of warnings and constraints for a task.
We just say, hey, go tackle this.
I'm sure you'll do a great job.
Come back and let me think about what you get.
These models need those constraints still.
Even if they in many ways are very senior in their thinking, they still need helpful constraints so that they know where the guardrails are in the space.
And they don't start to reason off the rails into a direction that isn't helpful.
Because at the end of the day, what they're really trying to do is just infer from your utterance what they think you mean.
Figure out where in latent space they can go and get a reasonable pattern match.
Do some searching across the web.
In the case of an inference model, do a lot of that iteratively so they can figure out what's best and then put together something.
And so they do need those guardrails to constrain.
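(As a rough illustration of the structure being described here, below is a minimal prompt skeleton in Python. The section labels and the example text in the placeholders are illustrative assumptions, not a prompt quoted from the episode.)

```python
# A minimal prompt skeleton: goal, success criteria, context, constraints.
# The wording and section labels are illustrative, not an official template.
PROMPT_TEMPLATE = """\
Goal:
{goal}

What "done" looks like (success criteria):
{success_criteria}

Context to use (and nothing outside it):
{context}

Constraints / do NOT do:
{constraints}
"""

prompt = PROMPT_TEMPLATE.format(
    goal="Summarize the attached meeting transcripts into a decision log.",
    success_criteria="Every decision has an owner, a date, and a one-line rationale.",
    context="Only the three transcripts pasted below; ignore outside knowledge.",
    constraints="Do not invent attendees or dates; flag anything ambiguous instead.",
)
print(prompt)
```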
Models are people too.
Just as you can't expect your friend or your spouse to read your mind, how can we expect models to anticipate every possibility that's out there and map it to what happens to be in your mind at this given time, what you want when you write this prompt?
And that's the need to be explicit.
And that's why I say models are people too.
We as humans are very, very good at retaining long context from multiple conversations with our colleagues, extracting what's really important out of that, and getting to clear points of discussion.
Like, I can talk with a software development manager about a project that's been going for six months, and we can have a really meaningful discussion about the sticking points, the decisions we've made in the past, and what we need to change.
That is how humans have done work for a long time.
We iterate over time; effectively, the prompt evolves through conversation.
It's shared work together.
With the model, we can't have the same iterative conversation.
We actually have to front-load all of that thinking and give it to the model in a really clean prompt so we can get a really clean answer.
And I think part of what's hard for us about prompting is we're conversational people.
We like to chat just like you and I are chatting.
We make meaning that way, but the model needs us to compress that semantic meaning into a really clean initial prompt that will help it to work effectively.
I think this is a very important point that you're making: there is the need, as you said, to compress the context into a digestible set of words and chunks that the model can then use to execute the explicit task.
Going back to an earlier comment you made, in effect you are programming the model and driving the conversation through that programming, essentially.
Right, because we humans effectively, collectively derive intent and collectively reach decisions through conversation.
But the model needs you to be the one that provides the intent, that provides the driving force.
There's a higher expectation of human agency in prompting.
Let's jump to some questions.
If you're watching on Twitter, pop your questions into Twitter using the hashtag CXO Talk.
If you're watching on LinkedIn, pop your questions into the LinkedIn chat.
And so this first question on Twitter X comes from Arsalan Khan, who says it seems like asking specific questions in your industry to the AI would deter fake experts.
But how would an end client know the difference between real experts using AI versus just good or fake experts or salespeople using AI?
And if I can restate that question in terms of prompting, if somebody is really, really good at prompting, can't they present the appearance of being an expert where it's almost impossible to tell them apart from somebody that has the PhD in whatever the subject might be?
That is more true than many of us would like to admit.
I think it's part of why there are so many, many consultants springing up, so many tools springing up.
The industry has a need for authenticity, but AI by its nature is enabling many people to claim expertise that they don't genuinely have.
And just like there's not a silver-bullet solution for detecting text in student essays and saying who wrote which bit of text, there's also not a silver-bullet solution for detecting expertise.
I find in practice, what tends to be most helpful at distinguishing a true expert from the sort of AI-generated straw-man expert is to acknowledge the source material.
It's probably going to be very good because AI helped prepare it, it's very thorough, etcetera.
Make sure you understand it, and then ask a question that's designed to push them off balance, push them out of the comfort zone, and a true expert will be able to adjust and have an interesting and thoughtful perspective and not get too frustrated or flustered.
And someone who's depending heavily on the prompting is often going to struggle because they won't be able to actually have that flexible intelligence across the domain that characterizes true expertise.
So you're saying that there is a level of depth that the model doesn't have, or that a person who's simply relying on the model doesn't have.
Is that another way of saying it?
I'll give you an example.
So I was doing an article on prompting with O3 Pro versus O3, because O3 Pro came out this week, and I asked it to prepare a roadmap because I'm very familiar with roadmaps; I came up through product management.
I've seen more of them than I would care to admit.
And I asked for that because I knew I could judge it.
I knew I had the expertise to assess it.
And I was talking with someone afterward, and I was saying O3 Pro did a much better job on the roadmap.
And they were like, well, how did you know?
Aren't roadmaps subjective?
And I immediately pulled up three or four reasons why roadmaps are not subjective, why it's actually a craft you can understand, and nine out of ten experts will agree with you that a particular roadmap is better than another because it takes a proactive stance.
It takes into account all of the strategic advantages the company has.
It thoroughly understands the marketplace.
I could just go on and on.
It's all at the top of my head.
And so having that expertise helps you to assess the true quality of the model's response.
And in a sense, what we're seeing here is that these models are getting to a level of intelligence where their very best work takes an expert to truly understand and appreciate.
We have some really interesting questions that are coming in on LinkedIn right now.
And Greg Walters is responding to the point you made earlier, Nate, where you were describing the need for a compressed, highly efficient prompt.
And Greg says, isn't the magic in prompt iteration?
Instead of having one compressed, highly efficient, and explicit task or prompt, shouldn't we be prompting iteratively?
It depends on the kind of task that you're looking for.
So this gets back to the relationship between prompting and model selection.
For certain kinds of models, they're more suitable to iterative thinking and iterative brainstorming.
We haven't really talked about the relationship between model and interface, but I find if I'm using advanced voice mode, it's just a very different experience for my brain because I'm talking instead of typing and I am much looser and it's much more conversational.
It is in a sense much more iterative and I keep it that way on purpose.
But if I'm working with a long-inference-time model, and it's not that ChatGPT has a monopoly on those.
Opus 4 is a great example from Claude.
I want to be clear in what I'm looking for because frankly, it is expensive to iterate when the cycle times take that long.
And so I pick the problem and I pick the model and that guides me to a prompting style.
We have a really interesting question from Wayne Anderson on LinkedIn and he says this.
How would you address the fear that leaders using large language models could inhibit and erode decision making and critical thinking?
When does effective prompting help, and when do you think leaders should avoid using AI?
It's kind of like asking, do you want your doctor to avoid using AI?
If we have studies that show that medical reasoning is something that these models are very good at, I would love my doctor to use AI as long as my doctor understands how to use it well.
And so in that sense, my response is I want leaders to be using AI all the time.
I just want them to understand the limitations of these models and where they need to think beyond the edges.
And so really, I think it's more precise to say these are extraordinary models.
In some places, they are advancing the far edges of human thought and research.
We have AI-developed drugs in the pipeline, but these models are narrow.
They have like particular ways in which we can prompt them that generate extraordinarily effective results.
And the strength of a good leader is not being only narrow; it's that T-shaped leader, where you have that breadth of experience as well.
And so what I would look for a great leader to do with AI is to know when he or she needs to go to AI for a deep, precise, thoughtful perspective on something, and then to bring that generalized experience of the business to bear and say, this is how I would contextualize that and understand it for my broader problem set.
But let me just go back to the comment that Wayne Anderson made.
When I write certain things, I'll write something and I'll ask ChatGPT or whatever the model is.
What do you think?
And it will make suggestions, and this canvas feature of ChatGPT, I guess it's not so new anymore, makes it really easy to drill down to very small segments.
It produces good results.
But in the back of my mind, I'm thinking to myself, it's giving me kind of the least-common-denominator, mass-market, generalized solution.
Not necessarily.
That might be your prompting.
And I think what's interesting about these models is that you are correct: if you're not intentional about how you frame the model's position in latent space, it will default towards something that's more highly probable, which we often translate as the least common denominator.
If you are intentional though, and you want to lean in and say I don't want a mid answer, I don't want a common answer.
I want a really creative answer.
I want a really thoughtful answer.
I want an answer that you haven't heard or seen elsewhere.
Models are perfectly capable of going that far and thinking more creatively, thinking more substantively, but they don't do it by default because the way they're trained is to be helpful for as much of the population as possible.
And so in a sense, our own population distribution shapes the way the model makers are tuning these models for general helpfulness.
And so it's up to us if we want something more on the far side of the distribution to push for it.
We're drifting from prompting here.
Oh, you do that with prompting.
Is prompting going to create the next, you know, set of Bach inventions?
No, I don't think so.
And I think especially in the creative arts. I would say that humans have tried; I'm actually a huge fan of Bach and the cello suites.
I love them.
I listen to them almost every weekend.
And people have tried to expand on, to invent after Bach, even through the 20th century.
And in my view, no one has done for the cello what Bach has done for the cello.
And so no, I don't believe that we are in any danger of a machine coming along and doing a better job than Bach at cello suites.
Let's jump over to Twitter from Chris Peterson, who says tokens and time measures are all very well, but doesn't every round trip of prompting eat up more electricity and water for cooling, thus making some of the numbers from Open AI and others highly misleading?
No. Or rather, yes, it does, and yes, it matters in aggregate, and yes, we should talk about power use in aggregate.
I think it's an appropriate conversation to have.
But individual prompt usage by people doesn't compare to some of the other things we do day-to-day that use energy and water.
So taking a hot bath is much, much more expensive in water than any kind of ChatGPT prompt you're going to run.
Watching an hour of football on the big screen is much, much more expensive in electricity; I think it runs up to a couple of hundred ChatGPT prompts.
And so does it matter in aggregate?
Yes, because suddenly a billion of us are using this.
It's important, we should talk about it.
Not saying that we don't have relevant conversations to have, but I think the idea that an individual prompt is fantastically expensive is incorrect when we actually factor in the energy usage of a day-to-day life.
Let's jump over to another question.
This is from Chris Chablonsky on LinkedIn, who says: do you have any tips for using generative AI for a data analyst to process a large data set and generate visualizations?
I'm not sure if it's the right tool for the job.
I've sort of talked about this a little bit with folks who are managing large data sets.
And what I find AI extraordinarily good at is handling data sets that don't have clean numeric data, right?
If you have clean numeric data, we have fantastic tools for that, and they may include machine learning or they may be just traditional SQL, but we're very, very good at handling that efficiently with compute.
I don't know why we would switch that out and ask a large language model to do that when the language model wasn't even designed primarily to be numbers driven.
They use Python and other tools to handle numbers now, and that's great.
But if you're talking about a truly large data set, we have tools that handle those data sets and visualizations really effectively.
And what I find people using in practice when they're looking at large data sets and AI is they're using AI to help them craft SQL statements.
They're using AI to help them think through the data schema that they want to set up.
Sometimes they're using AI to help them prototype visualizations that they will want to get to quickly.
Claude is great for that.
And all of those are sort of, by the way, uses of AI that help you to use that data more effectively.
But that's different from the traditional assumption that you can just sort of type the query in and you will magically get a better answer than you would get with really efficient SQL.
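(A sketch of the division of labor described above: the model drafts the SQL, and a conventional engine runs it over the full data set. The helper functions, the OpenAI model name, and the schema are illustrative assumptions, not a prescribed workflow.)

```python
import sqlite3
from openai import OpenAI  # assumes the OpenAI Python SDK is installed and configured

client = OpenAI()

def draft_sql(schema: str, question: str, model: str = "gpt-4o") -> str:
    """Ask a model to draft a read-only SQL query; the model name is an assumption."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Schema:\n{schema}\n\nWrite one SQLite SELECT statement "
                       f"(no DML/DDL) answering: {question}. Return only the SQL.",
        }],
    )
    return resp.choices[0].message.content.strip()

def run_readonly(db_path: str, sql: str):
    """Execute the drafted query with a traditional engine, not the LLM."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Refusing to run non-SELECT SQL drafted by the model.")
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

# Millions of rows stay in the database; only the schema and question go to the model:
# rows = run_readonly("sales.db",
#                     draft_sql("orders(id, region, amount, ts)",
#                               "Total amount by region in 2024"))
```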
It's a really, really good point that you've got to have an understanding of the particular tool that you're using and what will be the most effective use of that tool.
And as you said, prompts are great if you have a body of data and you're trying to figure out what have I got here and how can I present it?
And is there something that I'm missing?
Prototyping, as you say.
But there are tools out there that are designed for, you know, millions of records and that do it really well.
I don't think you'd want to put millions of records into ChatGPT.
It's not really designed, if you think about what we mean when we talk about context and prompts.
It's designed to look across the overall picture.
And oftentimes with data, we don't just want an overall picture.
We want precision.
And that's something that we have existing tools that do very, very well at.
Let's get into the structure, the nature of large language models, how they think (think in quotes) and operate, and what that means for prompts.
Maybe take us down that path a little bit.
It's probably worth calling out that a lot of the difference in how prompting has evolved is being driven by the movement away from large language models that are what I would call vanilla, where the model is just coming back with a response based on weights and vector space developed through pre-training data, which is what we had into 2024.
The newer version is inference-time compute models, where they have that same underlying architecture, but at the time you press enter and send in your query, they are running threads in the background trying to figure out what the correct response is.
And there's different ways of doing that.
Sometimes it's a combination of expert models in the background that are sort of coming up with answers and deciding amongst themselves.
Sometimes it's running the same query multiple times in parallel in the background trying to find the most common answer.
Regardless of the underlying architecture, the effect of having more time to run cycles on your query is tremendous.
It's a night-and-day difference in terms of the intelligence that the model is able to respond with.
And so that is a lot of what has shaped different prompting.
A lot of the reason we don't have to give chain of thought instructions anymore is because the models already have a way of deeply processing the queries we give them when they are inference models and they don't need our help to do so anymore.
And so when I say you don't need chain of thought, but you want to be clear on your goals, I'm basically saying: try and write a prompt that understands the model is going to be running multiple parallel streams of thought, or multiple parallel streams of tokens, in the background.
Constrain it.
Like, if you have 10 that are going to run (you don't know it's 10, but let's pretend it's 10 for simplicity's sake), make sure all 10 are focused on what you care about, because you want to constrain the scope of the query so that it's actually focused on where you want to go with the conversation.
And so that's why I emphasize so much.
Set a goal, make sure the model knows what good looks like.
Make sure you set guardrails, etcetera, etcetera.
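(One way to picture the "multiple parallel streams" idea is classic self-consistency sampling, shown below as a sketch: the same well-scoped prompt is sampled several times and the most common answer is kept. The model name, temperature setting, and the assumption that the model accepts these parameters are illustrative; newer reasoning models do this sort of thing internally.)

```python
from collections import Counter
from openai import OpenAI  # assumes the OpenAI Python SDK is configured

client = OpenAI()

def most_common_answer(prompt: str, n: int = 10, model: str = "gpt-4o") -> str:
    """Sample the same prompt n times and return the majority answer (self-consistency)."""
    resp = client.chat.completions.create(
        model=model,          # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        n=n,                  # n parallel completions of the same query
        temperature=0.8,      # enough variation that the samples can disagree
    )
    answers = [choice.message.content.strip() for choice in resp.choices]
    return Counter(answers).most_common(1)[0][0]

# The tighter the prompt's goal and constraints, the more the n samples converge
# on the answer you actually wanted.
```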
Describe to us what you mean by a chain of thought prompt.
It's where you said to a traditional model, I want you to answer my query using your pre-training data weights, and it would come back and answer.
But you wanted the token stream to go through a particular sequence.
And so it's going to go through.
And from a transformer perspective, the transformer is there; it's basically using your query and matching it in vector space.
Once it vectorizes it against what it has for weights and pre-training data, it's coming back.
And you're basically saying, let me give you a deeper query with a lot of things I want you to think about and do.
So start with, this is who you are.
You're an expert on marketing.
Second, I want you to think very deliberately about this campaign that I want to launch.
Third, these are the steps of thinking I want you to go through.
First, develop a plan.
Second, critique your plan.
Third, understand the consequences of the plan in the market and you can kind of go through.
And that's like chain of thought, right?
When you do that, you're basically being very particular about the places in vector space that you want the model to go and hit when it's generating the response.
And because models read like humans do (they read top-down), when the model hits that point, it's going to be effectively sequentially reasoning back to you because of the way you programmed it.
And so this gets back earlier in our conversation, Michael, when we talked about this idea of natural language programming, that we are effectively programming the model, that was sort of what we were doing.
And all we're saying now is we still have to program the model.
We don't have to program it quite that way anymore.
How do we program it today?
Today when we program the model, we want to be focused more on outcomes and goals.
And in the past it was focused more on process.
And so today, if I'm looking for a report (like the 130- or 140-page economic report I digested this morning from, I think, the World Economic Forum, something like that), I wanted the model to understand what I wanted out of the report and the goal of the summary.
I didn't just want a vanilla summary.
And so my focus was on making sure it knew the angle I wanted on the report.
And I trusted it to know how to read, digest, summarize, think through all the things I would have had to specify earlier.
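(To make the shift concrete, here is a hedged sketch contrasting the older chain-of-thought scripting with the outcome-focused style described here; both prompts are invented for illustration and are not quoted from the episode.)

```python
# 2022/2023-style "stage management": scripting the steps the model should take.
legacy_cot_prompt = """You are the world's best marketing strategist.
Think step by step.
Step 1: Develop a launch plan for the campaign below.
Step 2: Critique your own plan.
Step 3: Revise the plan based on your critique.
Campaign brief: {brief}"""  # {brief} is a placeholder to be filled in

# Current style: specify the outcome, the angle, and the success criteria,
# and trust the reasoning model to choose its own steps.
outcome_prompt = """Read the attached 140-page economic report.
Goal: a summary for a CFO deciding whether to hedge commodity exposure this quarter.
Success looks like: the five findings most relevant to that decision, each with the
page it came from and why it matters.
Do not produce a generic chapter-by-chapter summary."""
```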
In that case, again, the context, giving it the background and the goals, becomes the key focus of the prompt, as opposed to telling the LLM how to do its job.
That's right, we have another question from Arsalan Khan on Twitter.
Arsalan says to prompt or not to prompt?
When is it appropriate and when is it just a rabbit hole for your confirmation bias?
I think one of the biggest differences in the way people use models right now is people who are focused with their models can use the model as a mirror that focuses on a particular subject really effectively.
And there are people who are less focused and the mirror becomes a scatterer for them.
It scatters their thinking.
They become more confused as they use it.
And I've seen both.
What I find interesting is the critical thinking piece.
Imagine the mirror; typically it faces you, right?
Then it becomes a reflection of yourself.
You're absolutely right.
There's no critical thinking there.
It's just coming back with confirmation.
But if you're smart, you can turn the mirror away from yourself and you can focus it on something else and you can come back with a disconfirming or divergent opinion.
And so I will frequently ask the model to fight with me.
I will ask it to disagree.
I will ask it to come up with a Steel Man argument because I think it's much more interesting and my thinking gets sharpened when I do that.
Actually I do the same thing.
I very often will say to the model, be very critical.
Don't worry about hurting my feelings, be sharp.
That's right, like an iron-sharpens-iron vibe is what I like to go for.
Makes sense.
What's the best way, again, to craft that prompt if we were trying to accomplish something?
Shall I show my screen?
Would it be helpful to just kind of take a peek at a prompt I wrote?
Sure, let's do that.
All right.
This is a real prompt that I wrote and this is an example of me picking something where I feel pretty good about sort of my overall ability to assess quality of response.
But I don't have a direct answer to this question.
And this was part of a Substack article that I was writing to test O3 Pro.
So this is for O3 Pro.
You can see it up there and I'm asking it to step through this analysis with me.
So, think: you're a senior product leader brought in to design a 12-month AI adoption roadmap for a real firm.
First, I could have given the model the choice of firm, and I tried a separate prompt where I gave it that option.
That was very interesting.
In the end, I wanted something with a company that I was familiar with since I was working on it for testing purposes.
So I used Datadog.
I ask it to do some very specific information gathering.
So build the source corpus.
I want publicly available information, I want 10-Ks, I want job postings, I want SEC and FINRA guidance.
And then I want 3 responses.
And I actually specify the word count output and I specify what I want there, right?
There's a strategy memo first, there's a tech stack overview, and there's a regulatory constraints piece.
And so what's interesting is, by using the word internal, I am suggesting to the model that it can craft these inside the chain of thought that it's running behind the scenes, without me having to see it.
And then Step 2, produce.
Now I'm starting to ask for output.
I'm starting to ask the model to come back with one document with an executive summary, month by month road map, a KPI per quarter, anticipated failure modes and mitigations, and an advisor briefing.
And then I'm giving it styling that I want.
So I want it to be really brutally honest.
This is an example of not looking for confirmatory thinking.
I do not want tables, I want just bullets if need be, and I would like to get a sense of what shaped your recommendations, right? I want to know where you got some of this thinking from.
And then I give it a limit at the top.
It can't be more than 7500 words.
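(A hedged reconstruction of how a prompt with this shape could be assembled programmatically. The section text paraphrases what is shown on screen, and the helper below is an illustration, not the exact prompt from the episode.)

```python
def build_roadmap_prompt(company: str = "Datadog", word_limit: int = 7500) -> str:
    """Assemble the prompt sections described in the walkthrough (paraphrased, not verbatim)."""
    sections = [
        f"You are a senior product leader designing a 12-month AI adoption roadmap for {company}.",
        f"Hard limit: the final answer must be under {word_limit} words.",
        "Step 1 (internal, do not show): build a source corpus from publicly available "
        "information (10-Ks, job postings, SEC and FINRA guidance) and draft three internal "
        "documents: a strategy memo, a tech-stack overview, and a regulatory-constraints piece.",
        "Step 2 (produce): one document with an executive summary, a month-by-month roadmap, "
        "KPIs per quarter, anticipated failure modes and mitigations, and an advisor briefing.",
        "Style: be brutally honest, no tables, bullets if needed, and explain what shaped "
        "your recommendations.",
    ]
    return "\n\n".join(sections)

# The assembled string would then go to a long-inference model in one shot,
# since iterating on a run that takes several minutes is expensive.
print(build_roadmap_prompt())
```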
So it ran, it thought about it, it was a 6-7 minute, basically a 7 minute run.
So it chooses Datadog, which I specified. It does a little bit of basic research on Datadog.
It builds the source corpus, so it gives you a sense of what's in the box there.
Market context, Datadog's edge.
It's starting to adopt the persona.
So it's saying where we lag and talking about sort of other updates in the competitive space, getting into growth goals.
And what I love here is that it actually called out the statement by the CEO, Olivier, around what they're looking for and why.
And it's taking that into account the way, frankly, a good roadmap builder should.
It's looking at client mix.
And this is a situation where it's done its own research to come up with that assessment and given sort of a very rough assessment of that.
It's looking at the AI aspirations it can find from each of the different C-Suite members.
It's looking at strategic gaps to close.
This is all just in preparation and hasn't even really started the assignment yet.
It's just kind of thinking it through.
It's now going into the current stack in great detail, looking at vendor contracts, security posture.
You can see where we're going here.
Eventually it's actually going to get to what it wants to say.
And it's actually a very cogent thesis.
It talks about how you sort of dominate the data exhaust space and what that means.
And then it starts to get into the roadmap piece.
But my point here is, we could go through this, but we don't have time.
The point is, basically, because I structured the prompt carefully, I got exactly what I was looking for back.
What if you're not trying to get a research report, but you're trying to do, say, small research?
Find out the answer to some set of questions, for example.
I did one.
I don't know if I have it handy or not; I will see if it's there. Maybe the one on emerging trends in investing.
Yeah, I think it's this one.
This is a much shorter report.
See that?
That's the whole thing right there like that.
It feels short comparatively and it was a very short ask.
Please analyze this economic report and I'm really interested in again, I'm trying to push it.
I want to understand emerging trends and I want to understand areas not commonly discussed, right?
I'm looking for it to sort of push beyond, but it's not a very long prompt per se.
And then it just jumps right in.
It reads all 138 pages.
It gives me a snapshot.
It talks about commodities and where they're at globally, mispricing issues driven by LNG, and it basically goes through themes it's seen.
And then at the end, it nets it out: this is the big-picture assessment for the next six months.
This is the macro assessment and this is how you start to lean in.
And what's interesting is this is much more specific.
I ran the same prompt with O3 and O3 Pro, and it was interesting to see the relationship between the two, because O3 focused on this sort of bifurcation piece and O3 Pro had a slightly different perspective.
And I know we have like 8 minutes so we probably don't have time to get into it, but I thought it was fascinating to run a short prompt on both and see the differences.
We have a question from LinkedIn, and I was going to ask something very similar. This is from Laura Finlayson, and she says: with a prompt like this, which model will do the best at retaining the prompt information for future use?
She built and refined her job application prompt in Gemini, but it seems to forget some of the deliverables each time she goes back with a new job description.
Claude allows her to save a project, but she doesn't love its writing.
And so we need to talk about which model is the better model to use and how do you choose.
And we only have a few minutes left, but this, you know, we could go on forever here.
So what do we do?
There are two things going on there.
The first is memory.
And ChatGPT really has a killer feature edge with memory right now because they do have, it's not perfect, but they have a memory feature that enables the models to start to actually have a living context of information about other chats you've had inside the same model surface, right?
So if it's in ChatGPT, it doesn't matter if you're talking with O3 or 4o or whatever.
There's going to be a loose understanding of recent conversations you've had along with some specific facts that the model has remembered about you that you can actually audit and check in the settings section.
That turns out to be very useful for problems like this where you want it to do a repetitive task and you want it to have a sense that it's done the task before.
Even so, I still find I want to be precise about each of the assets I need it to process, if I need that.
And that is one of the reasons why I do tend to favor long prompts that I will keep in a Notion page or keep elsewhere that I can just copy and paste in as needed because I don't want it to forget anything.
I don't want to go to that trouble of writing out that prompt again.
I just wanted to remember every single thing and do it again.
And I wish that I had an answer for you, that these were going to be flexible deep memory models that just would remember that you did exactly like this and never forget that step.
We're just not there yet.
And so prompts are part of how we bridge that gap.
One of the problems that I have is I like to try prompts on different models to see the results and compare the results.
I think it leads to a lot of creative thinking, and it becomes a real burden and an obstacle, because I've interacted for 20 minutes with Model A and now I want to go to Model B and I've got to start all over again.
There is a way to make that slightly less painful.
So what I like to do is if I want to transition, I like to ask the model I've been chatting with.
Could you please give me a very detailed summary of our conversation so far and make sure it's as clean and clear as you possibly can make it?
And then it will do that and it will give me a great summary of the conversation.
I then pull that summary into a new model conversation I'm starting and say, here's where we're at right now.
I would love to continue this conversation with you.
And this is what I'm looking for.
And it's still a little bit painful, but it's less painful than it would be otherwise.
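(A minimal sketch of that handoff, assuming the OpenAI Python SDK and illustrative model names; the summarize-then-reseed pattern is the point, not the specific calls, and the second model could be from any provider.)

```python
from openai import OpenAI

client = OpenAI()

def summarize_conversation(messages: list[dict], model: str = "gpt-4o") -> str:
    """Ask the current model for a detailed, clean summary of the conversation so far."""
    resp = client.chat.completions.create(
        model=model,
        messages=messages + [{
            "role": "user",
            "content": "Give me a very detailed summary of our conversation so far, "
                       "as clean and clear as you possibly can make it.",
        }],
    )
    return resp.choices[0].message.content

def continue_elsewhere(summary: str, ask: str, model: str = "o3") -> str:
    """Seed a fresh conversation on another model with the summary, then continue."""
    resp = client.chat.completions.create(
        model=model,  # illustrative; swap in whichever second model you are moving to
        messages=[{
            "role": "user",
            "content": f"Here's where we're at right now:\n{summary}\n\n"
                       f"I'd love to continue this conversation with you. {ask}",
        }],
    )
    return resp.choices[0].message.content
```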
Which model or which company do you gravitate towards the most?
You have access to everything.
What do you use the most?
The memory feature is one of the most powerful product features I can remember on ChatGPT because I find that the fact that it remembers something about me drives a recursive behavioral loop for me.
I'm very aware of it, right?
Like I've I've worked in product for a long time, I know what they're doing, but it still works because I find that having a model that remembers me a bit is super, super helpful.
And so ChatGPT drives a lot of interest for me.
O3 is a daily driver for me.
I pick it up by default, but that doesn't stop me from going other places.
Like when I am working on a complex piece of writing, I will use perplexity.
I will use both of the new Claude models, Opus 4 and Sonnet 4. I will sometimes go to Gemini 2.5 Pro.
And so I almost look at those as like additional pieces that I want to go to for specific things.
Sonnet 4 is great for writing.
I love it.
Opus 4, I love the way it does like very thoughtfully considered reading and research.
There's something qualitative about it that's very strong there.
And so even though I end up in ChatGPT a lot, that doesn't stop me from reaching my fingers into the rest of the ecosystem and grabbing what's useful.
I also really like ChatGPT in general.
They make it pretty easy.
And that memory feature, you know, you start with your prompt and you feel like, it's so bizarre to say this, it has like this intuition of what makes sense for you.
Right.
And that sense of being recognized, I think is very powerful from a product experience perspective and people respond to it.
Marketers talk about personalization and usually personalization is well, we've watched their shopping cart and in the past they've bought XYZ product and so we'll recommend the next product that they'll really, really like or the next movie or whatever.
But we're talking here about a level of subtlety with personalization that's light years beyond the typical marketing personalization that we know of.
It really is, and it's going to be super interesting in the next 6 months or 18 months to see how the product platform evolves for ChatGPT as they build on this memory feature and add, you know, new models, etcetera.
My sense is, especially as they lean into the partnership with Shopify, they're going to lean more into commerce.
There are going to be opportunities for personalization with commerce that we've never seen before.
But we'll just have to see how that evolves.
Should I pay the $200 a month for ChatGPT pro?
I know you do, but should I?
Is it worth it or should I just pay the 20 bucks a month that I pay right now?
That depends on the kind of user that you are.
And so I have seen the article that came out, I think it was this week that basically said ChatGPT has done such a phenomenal job pushing value down the chain to the free tier.
Why would we pay at all?
Because it's so impressive.
And I think for a lot of average daily use, that is the correct assessment. That's not the case for me, though.
It's not even just that I want it so I can test it and show it to people.
It's that I want to have no token limits and no usage limits on the smartest models out there because I find myself doing better with my own brain if I have the smartest thinking partner possible.
And so we've talked a lot around the edges of what sort of thinking and intelligence means.
That's probably a conversation for another day, but from an economics perspective, if I have a thinking partner like that and I push the edges like that and it helps me make one or two better decisions in a given month, the ROI is off the charts.
At $200 a month, it's very, very easy to do that math.
And so I think it depends on what you're looking for it to achieve.
I want the best possible results from the thinking, because if I'm spending time...
Pay the 200.
You'll get O3 Pro, and it's qualitatively better in a way that you will notice.
It has a resonance to it where like the insights it has stick in my head and I'm like chewing them over in a way that I haven't had with other models, which is super interesting.
Well, I'll have to try it.
Well, with that, we are out of time.
A huge thank you to Nate Jones.
Thank you so much for taking your time to be with us today.
It's so valuable for us when you're here.
It was such a delight.
I enjoyed it.
A tremendous thank you for having me, Michael.
I'm glad we got to talk about prompting.
I felt like we could have gone on for hours because it's such an interesting topic, but I think we really got to a lot of cool stuff over the course of this 60 minutes together.
And thank you to everybody who watched.
Now, before you Go, subscribe to the CXO Talk newsletter so you can be part of our community.
We have amazing shows coming up.
We have the chief technology officer of AMD coming up, and we have all kinds of amazing people.
So go to cxotalk.com, subscribe to the newsletter, and we'll see you again next time.
Thanks so much everybody and hope you have a great day.