Turner – Using AI in Medical Education

Episode Transcript

I like to always begin here with Alan Turing because I kind of think when Alan Turing first asked, can machines think?

He probably didn't envision a future where a simple voice command to a phone could reveal the best restaurant, say, in Cincinnati, where I am.

But last week was restaurant week here, and I enjoyed a meal that was recommended by AI.

And when we start to really think about this, the advancement of AI from when Turing actually wrote his seminal paper to today, it's pretty remarkable.

We've witnessed chess and Jeopardy champions bow to the intellect of machines like Deep Blue and Watson.

And when Siri was introduced, AI became not just a tool, but a companion.

Learning our routines and our taste in music, even, you know, monitoring our health through the sort of devices that so many of us wear.

In my own kitchen, Alexa actually orchestrates the daily rhythms.

She's the reason that alarms get set or lights get turned off.

She's also the reason that gummy bears tend to always make it on to my grocery list.

She orchestrates all of my children's impromptu dance parties.

And while their musical tastes are still developing, it's really impressive to see how naturally they interact with technology that, when I was much younger, felt like science fiction.

So I guess my point here is that we're not just spectators in this AI evolution.

We're participants, especially in medicine.

Today, I'm not here to ask you if machines can think, but instead, I wanna share how I believe AI can be harnessed to revolutionize the way that we practice medicine and how we are training our future physicians.

But before we go into all of the transformative effects of AI in medical education, I think that it's really important to level set.

We all need a very clear understanding of what AI is.

You can think of AI as the art of creating smart machines.

It's a little bit like computers trying to play the role of the human mind.

It's not just about following a set of instructions; instead, AI is machines learning from experience, adapting to new inputs, and doing tasks that have historically been done by humans.

When we're talking today, we're not talking about robots taking over the world.

We're talking about the type of AI that can suggest what type of restaurant you might like to eat at or the kind of AI that helps a doctor diagnose a disease faster than ever before.

This type of AI is not just programmed.

It learns, it grows, and it evolves.

In the context of AI, you may or may not have heard terms like machine learning, deep learning, natural language processing, and more recently, probably neural networks.

Each of these is a different type of technique in artificial intelligence.

So machine learning has a lot of parallels with the way that we train future pulmonologists or critical care fellows or residents.

And so think about it this way.

You give a system a whole lot of data.

So this could be anything from chest radiographs to residency evaluations.

And the system learns to identify specific patterns and make decisions just as a resident learns to navigate the multifaceted nature of diseases.

This process of learning can happen in four different ways.

There are actually more, but these are the four main ways that you'll probably encounter.

Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Interestingly, several similarities can also be drawn between these processes and the way that trainees are rated on ACGME subcompetency milestones.

Supervised learning uses labeled data.

It's like having a seasoned physician looking over your shoulder and saying, this here's pulmonary edema, and that's a pneumothorax.

In this context, the AI is provided with labeled data from which to learn.

The AI then takes this data and learns to make its own accurate predictions or diagnoses.

With unsupervised learning, however, there are no labels at all, just raw data.

It's like the AI being on a solo rotation without any labels or direct oversight.

The AI has to sift through everything and learn to identify trends or patterns, finding hidden structures.

And in this way, the AI can start to uncover insights that maybe even humans can't identify.

Semi supervised learning is exactly what it sounds like.

It's a combination of both.

Right?

And this can be thought of as the blend of instruction and autonomy that we give our fellows.

And finally, then we have reinforcement learning.

This is like the AI being on night call in the ICU.

It learns in real time through trial and error, and it receives feedback similar to the natural consequences of clinical decisions.

Every interaction, every outcome sharpens that predictive power.

So let's take a couple of examples, and I like to use images because I think it's really easy to understand.

And AI experts historically love to use images of fruit or, as you'll see in a couple of examples, cats.

So for supervised learning, imagine that we have a database with a bunch of images of apples, and each image of an apple, though different, is labeled as apple.

And so we train a model on this and we say, look at all of these things.

These are all apples, and the model learns.

So then when you actually use the model and show it a novel picture of an apple, you can say, what is this?

It'll say, it's an apple.
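
To make that concrete, here's a minimal sketch of supervised learning in Python with scikit-learn. The feature vectors and labels are invented stand-ins for images, not a real dataset.

```python
# A minimal sketch of supervised learning with scikit-learn.
# The "images" are invented two-number feature vectors: [redness, roundness].
from sklearn.tree import DecisionTreeClassifier

features = [[0.90, 0.90], [0.80, 0.95], [0.20, 0.30], [0.85, 0.88]]
labels = ["apple", "apple", "banana", "apple"]  # the seasoned physician's labels

model = DecisionTreeClassifier()
model.fit(features, labels)  # learn from labeled examples

# Show the trained model a novel "picture" and ask: what is this?
print(model.predict([[0.88, 0.92]]))  # -> ['apple']
```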

Unsupervised learning is a similar concept.

You have a bunch of pictures of different types of fruit, apples, bananas, peaches, and you expose the AI to all of these.

But the AI has to figure out the trends.

Right?

So it figures out, oh, these things are yellow.

These things are round.

These things have leaves.

These things are a shade of orange, and it can categorize them.

It doesn't know that an apple is an apple, but it knows it goes with this category.

It doesn't know that a banana is a banana, but it knows it goes with that category.
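
A rough sketch of the same idea in code, again with made-up feature vectors standing in for fruit photos: KMeans groups the data into categories without ever seeing a label.

```python
# A minimal sketch of unsupervised learning: no labels, just raw data.
from sklearn.cluster import KMeans

# Unlabeled "images", reduced to [yellowness, roundness] for illustration
fruit = [[0.10, 0.90], [0.15, 0.95],   # round, not yellow (apples, unbeknownst to it)
         [0.90, 0.20], [0.95, 0.25],   # yellow, elongated (bananas)
         [0.50, 0.80], [0.55, 0.85]]   # in between (peaches)

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(fruit)
print(model.labels_)  # e.g. [1 1 0 0 2 2]: categories, but no names for them
```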

With reinforcement learning, it's a little bit different.

It's all in real time.

So you train a model on a bunch of different images, and then you show it a picture of an apple.

You say, what is this?

And it probably gets it wrong the first few times.

Like here, it says it's a mango, and you provide reinforcement.

And you can think of this as like cookies and zaps.

Right?

So it says it's a mango.

No.

And you zap it.

Right?

Not really, but that's kind of the idea: reward or punishment.

And so the model learns from that.

Okay.

And so then the next time you show it an apple and you ask it, what is this?

It'll say, it's an apple.

Yes.

It has learned, and this occurs over and over again.
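
Here's a toy sketch of that reward-and-punishment loop. Real reinforcement learning uses much richer algorithms, but the core update, nudging a score up for a cookie and down for a zap, looks like this:

```python
# A toy "cookies and zaps" loop: nudge each answer's score up on reward,
# down on punishment, and always guess the current best answer.
import random

scores = {"apple": 0.0, "mango": 0.0, "banana": 0.0}

def guess():
    best = max(scores.values())
    return random.choice([label for label, s in scores.items() if s == best])

true_label = "apple"
for _ in range(10):
    answer = guess()
    reward = 1.0 if answer == true_label else -1.0  # cookie or zap
    scores[answer] += 0.5 * reward                  # learn from the feedback

print(scores)  # "apple" ends up with the highest score
```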

The next level of this, if you kind of drill down, is deep learning, and this takes the whole process a step further.

It's a specialized approach within machine learning that uses what we call neural networks.

And you can think of these as a series of algorithms modeled after the human brain.

These neural networks are designed to recognize patterns with incredible depth and nuance.

They're made up of layers of interconnected nodes or artificial neurons, although that is where the similarities with the brain end.

And each of these layers process information and learn from it.

So let's take another example of images, cats and dogs.

So, imagine we have a stack of photographs, a mix of cats and dogs, and we want our AI to sort through these and be able to recognize which is which and also identify any novel picture it encounters.

K?

So here we have a very simplistic neural network, and it has layers of interconnected nodes.

And again, each of these layers is gonna be a specialist.

One layer may be an expert in teasing out textures, another in discerning shapes.

Another could be on, I don't know, identifying colors.

So each of these layers builds on the work of the previous layer and refines the AI's understanding until it processes the entire image and can say, oh, it's a cat, with the same confidence that a child would.
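
In code, a deliberately simplistic version of that layered network might look like this PyTorch sketch; the layer sizes are arbitrary and the "photos" are random tensors, so it illustrates only the structure:

```python
# A simplistic neural network: each layer builds on the previous one.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),             # turn a 28x28 image into 784 numbers
    nn.Linear(28 * 28, 64),   # layer 1: low-level features (think textures)
    nn.ReLU(),
    nn.Linear(64, 32),        # layer 2: builds on layer 1 (think shapes)
    nn.ReLU(),
    nn.Linear(32, 2),         # output: two classes, cat vs. dog
)

fake_photos = torch.randn(8, 1, 28, 28)  # 8 random stand-in images
print(model(fake_photos).shape)          # torch.Size([8, 2])
```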

So in pulmonary medicine, for example, deep learning with neural networks can analyze thousands of chest x-rays, learning to detect subtleties and distinguishing between a wide array of conditions, from pneumonias to pulmonary fibrosis, potentially with even greater accuracy than experienced radiologists.

So in a very short time, deep learning has become the powerhouse behind the most cutting edge AI applications that we're seeing today.

And this takes us to a pivotal moment in the AI timeline.

The founding of OpenAI in 2015.

So OpenAI's mission has always been to push the boundary of what AI can achieve, and they define this as trying to achieve what's called AGI, artificial general intelligence.

We can get to that later.

But they focus on building and refining a special series of neural networks, and these are called generative pre-trained transformers, or GPTs.

Interestingly enough, the transformer architecture was not invented by OpenAI, but by Google.

But it's the way OpenAI has applied it that has been so transformative.

Now these types of neural networks, they're not just algorithms.

They are vast reservoirs of knowledge, capable of understanding and generating human-like text.

And in late 2022, OpenAI introduced ChatGPT to the world, a tool so advanced that it could converse, answer queries, and even simulate reasoning.

And I'm sure that each of you has at least heard of ChatGPT and likely has interacted with it to some extent.

But if we take a step back, this reminds me of a time when I experienced something completely new that really shifted my perspective.

This was back in college when Napster first burst onto the scene, and it's dating me, but at that point I didn't really conceptualize the impact that something like Napster would have.

Instead, I was really, really excited about how awesome my summer of 2000 playlist was going to be.

And then everything changed.

The release of Napster was a watershed moment for the music industry.

It completely upended traditional business models, paved the way for the rise of digital media, and completely transformed how we consume it.

For me, the release of ChatGPT feels like even more of a seismic shift.

This isn't just another step forward in something like natural language processing.

It is the kind of wave that disrupts our notion of what AI can do, setting the stage for a new era of human-machine interaction.

And what I'm most excited about is how this new technology could transform medical education.

So what's the first thing that we do when we encounter such a tool?

We play with it.

Right?

We test its limits, and we ask it what the meaning of life is.

So I wanna give you an example.

I did this too, but because I really, really like numbers, I asked it to write me a proof of the infinitude of primes, with every line that rhymes.

And what amazed me at the time, and mind you, this was last year, was that it did it, and not only was it correct, but it was also witty.

Yes.

I think I can, though it may take a clever plan.

I'll start by noting Euclid's proof, which shows that primes aren't just aloof.

Assume we have a finite list of primes, that none have been missed, multiply them all together, and add one just to be clever.

So it goes on, flawlessly rhyming, marrying the rhyme of verse and the rigor of mathematical proof.

So I show you this not just because I think it's, you know, kind of fun, but I want to use it as an example to explain how AI can accomplish something like this.

At its core, GPT is a language model, and it's designed to predict the next word.

At its core, it is just a fancy autocomplete.

So a user inputs a prompt into GPT and GPT generates coherent and fluent text that reads like it was written by a person.

GPTs can do this because they are trained on massive, massive datasets of text such as the Common Crawl.

The Common Crawl is a collection of billions and billions of web pages, from Reddit to Twitter to all sorts of things.

Right?

And it has a vast amount of data, and this allows GPT to learn the patterns of human language and predict the next word in a sentence, similar to the way that your phone can, based on, you know, texts that you tend to send frequently.

But what sets GPT apart is that it can generate entire sentences, paragraphs, or even articles that sound as though they were written by humans.
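
You can watch the "fancy autocomplete" at work with GPT-2, a small, openly available ancestor of ChatGPT. This sketch, using the Hugging Face transformers library, asks the model for the single most likely next token:

```python
# Next-word prediction with GPT-2 (requires the transformers library).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The patient was admitted to the intensive care"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits     # a score for every possible next token

next_token = logits[0, -1].argmax()     # take the single most likely one
print(tokenizer.decode(next_token))     # likely " unit"
```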

The last year of technological advancements with generative AI has been an absolute whirlwind.

OpenAI's GPT series is now up to GPT-4 Turbo, and we're about to get 5. Anthropic's Claude, with Opus, is rivaling that.

There are the open-source models, like the French Mistral series, Falcon, and Meta's Llama series.

Last week, Meta's Llama 3 dropped, and it's absolutely amazing.

So this goes on and on and on.

These models have progressed at a really, really rapid rate, and it's not just in the area of text, but also images.

The majority of the slides that I've shown today, and will show to you, have been developed with the aid of AI models such as DALL-E 3, Midjourney, and Stable Diffusion.

This image here, which looks like a professional headshot of me, was generated using a combination of text prompts and real-life photos of myself.

But AI can generate more than text and images.

Let's consider for a moment the unique qualities of human speech.

Its ability to convey not just information, but emotion, intention, and personality.

We are now capable of harnessing AI to replicate these qualities.

In fact, with less than 60 seconds of audio recording, it's possible to train an AI model on your own voice, enabling it to generate highly realistic speech in a range of languages and tones.

Now here's the twist.

For the past few minutes, you haven't been listening to me.

Or to be more precise, you haven't been listening to the real me.

The voice you've been hearing is an AI generated replica of my voice.

So that was, you know, something that we made, and we actually ended up using and testing it originally by putting it into a chatbot, creating a bot that we could type to and it would just talk.

And we actually had it call my mother, and my mother talked to this bot thinking that it was me for half an hour.

And after that half an hour, it ended, not because she figured out that it actually wasn't me.

I just felt so badly for Turing testing my mom that we had to stop.

But my mom has kind of become our litmus test for all of our new technology.

We just basically test it on my mom and we see what happens.

Here we go.

So another example of this: just a couple of months ago, Sora, the most advanced text-to-video model, was released.

Once again, accelerating our paradigm of what's possible with AI.

Now I wanna start to shift.

I want you to start to think about the possibilities for AI in medical education.

Imagine a world where medical trainees can interact with virtual patients, practice complex procedures on digital models, and receive real time feedback from an AI system.

The traditional models and methods of acquiring knowledge are being transformed, ushering in a new era of medical education that was once only imagined.

So you may have heard about the GPT models taking and passing Step 1.

There's been tons and tons of papers published on this.

I particularly like this one because it goes through the different models and shows it kinda longitudinally.

Also, this was done in 2023 by a bunch of people from Microsoft and OpenAI.

And what's remarkable to me was that it didn't achieve perfect scores, of course, but GPT-4 did surpass the national average of medical students.

Moreover, you know, this blew my mind the first time I saw it, because GPT-4 wasn't, like, exposed to a bunch of USMLE-style questions or even old USMLE questions during its training.

Instead, it learned from vast Internet datasets like the Common Crawl, and these contain the best and the worst of humanity.

Right?

It wasn't fed issue after issue of the New England Journal of Medicine, but it did watch every single episode of House.

So it goes further though.

In this paper, they actually go and look at the questions that GPT got wrong.

And when asked to explain its reasoning behind the incorrect answer, the rationales provided by GPT-4 were given to physicians, and those physicians actually agreed that under certain circumstances, GPT's answer could be correct.

And not too long ago, early in 2023, the New England Journal of Medicine started putting out this new podcast.

It's called AI Grand Rounds.

This particular episode is really, really good.

Some of the other ones are not too bad.

At the end of March last year, they did an entire episode on GPT, and this had a bunch of different use cases in medicine.

And this particular interview focused on Peter Lee, who's a corporate vice president at Microsoft.

And he was describing how, in Boston, they put GPT into the hospitals and asked physicians to use it in their day-to-day workflow.

And one particular case really stood out to me.

They were describing how they, you know, asked physicians to use this, and there was this one oncologist.

And this oncologist had a patient with late stage pancreatic cancer.

And this patient really wanted surgery and experimental immunotherapy.

But the oncologist, you know, had determined that this really wasn't the right path, and it wouldn't lead to extending the patient's life.

In fact, it could have negative effects.

But the physician was having a really, really hard time kind of communicating this to the patient, because the patient was very fixated on this one form of treatment.

So the doctor came back to Lee's group, and Lee's group was like, hey.

Like, why don't you interact with GPT and see, you know, what comes out of it?

And so the oncologist told GPT everything.

And GPT, you know, provided some examples of how to talk to the patient and explain to her why they weren't gonna go with the surgery.

And it produced great results and great ideas, and the oncologist used these and went back to the patient.

And ultimately, you know, they were able to come to an understanding, and the patient agreed to not go for the experimental treatment.

And that's amazing in and of itself.

There have been a lot of studies showing that patients actually prefer explanations by large language models like GPT.

But what was really stunning to me was that at the end of all of this, Peter Lee reported that the oncologist went back to GPT and thanked it.

And GPT responded, well, what about you?

How are you holding on?

Are you getting all of the help that you need?

So I've, you know, given you some examples of some pretty cool emergent behaviors that have been exhibited by AI, specifically large language models.

But it would be really unbalanced and irresponsible if we didn't also take some time to address the undesirable behaviors which have surfaced.

One such behavior that you're probably most familiar with would be hallucinations.

And hallucinations are a phenomenon in which a model generates text that appears to be coherent and meaningful, but it's not grounded in reality.

And this is really important because the models produce this information with such confidence.

It's a really important thing, I think, for us to think about, especially in medicine, as it can lead to misinformation or misguided recommendations.

And this could have detrimental effects.

Right?

Not only in physician training and education, but also in patient care.

So let's start with a more benign example.

Early in 2023, I asked GPT, and this was the 3.5 model, before 4 existed, for a list of articles on natural language processing in medical education, because I was doing a lit review.

And it responded really confidently, and it gave me a bunch of citations from real medical education journals authored by real people.

So here's what it returned.

And I'm looking through these, and I was like, oh, okay.

And I noticed this one here.

So Dan Schumacher.

Dan Schumacher is a colleague of mine.

He works right across the street from me at Cincinnati Children's Hospital.

And I'm looking at this and I'm like, I know Dan really well.

Like, we're in a lab together.

I know his work.

I don't remember seeing this paper.

That's odd.

And then I saw this one by Sanjay Desai, who's the head of the AMA.

And again, like, I know his work, but I don't remember seeing him ever publish anything about machine learning.

So I was a little bit taken aback, and it turns out that it responded with references to imaginary papers authored by real people with relevant expertise.

Right?

So did it lie to me?

Let's consider why this happens.

At its core, remember, these large language models are just fancy autocompletes.

The training data included information like doctor Desai's affiliation with the AMA, and it included information about how the AMA is deeply interested in medical education and precision education.

It included doctor Schumacher's scholarly pursuits in qualitative analysis, which he is known for, and narrative assessments in medical education.

But remember, these models don't actually know anything the way that humans do.

They don't understand truth.

They can't verify the authenticity of the information that they're generating.

When GPT or any large language model creates a citation or generates any output, it's not checking databases or confirming with PubMed, right?

It's estimating, in this example, what a citation ought to look like based on the patterns it had been trained on.

Sometimes it gets it right.

Other times it combines real elements, genuine journal titles, real researchers' names, into a citation that sounds plausible but is entirely fictitious.

This is a major barrier and we have to be able to navigate it.

And ultimately, we have to be able to control for it if we are actually going to apply large language models, generative AI, or any AI to high-stakes, complex environments like patient care and medical education.
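
One simple guardrail, sketched below: before trusting a citation a model hands you, look the title up in PubMed through NCBI's public E-utilities API. The example title is hypothetical.

```python
# Check whether a model-supplied article title actually exists in PubMed.
import requests

def pubmed_has_title(title: str) -> bool:
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"},
        timeout=10,
    )
    return int(resp.json()["esearchresult"]["count"]) > 0

# A hallucinated citation comes back with zero hits
print(pubmed_has_title("Machine learning for narrative assessment in GME"))
```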

But this seems to be something that is really hard to tackle.

So how do we begin?

I'm gonna give you one example of how we have started to approach this.

So one night, I was sitting around with many of my colleagues.

We have this kind of, like, big monthly meeting of our lab, and this included Dan Schumacher.

Right?

And we were kinda half joking about how AI is, like, padding his already impressive publication record.

But this ended up leading to a really serious conversation.

Right?

How do we harness technology responsibly when it shows traits like hallucinations or, I mean, potentially worse, right, like power-seeking behavior, if you can imagine?

So ultimately, we considered scenario-based strategic planning.

This is a method that's embraced by visionaries and strategists from the boardrooms of Shell to the command centers of the Department of Defense.

It's designed to navigate the uncharted potential of the future.

And we thought, hey, this approach is good enough for them.

Like, maybe we can apply it here and offer a unique lens to envision the role of AI in medical education, you know, in the future.

So let me take a minute to describe exactly how this works.

Think about it as the art of preparing for multiple tomorrows.

It begins with developing a detailed description of multiple possible futures.

Right?

So you're gonna make a lot of these.

And these aren't just wild guesses.

Right?

But these are informed, structured narratives that explore different future outcomes based on the current data that we have at hand.

From there, the forces that shape our world are brought together.

So they'll bring in people from science, agriculture, finance, medicine, etcetera.

And they're brought together and they're assigned to one of these scenarios.

You have a group of all of these people assigned to scenario 1, a group of all of these people to scenario 2, and they tease out the opportunities and the risks that each of them might hold.

It's a collective effort, right, to kind of forecast and strategize and adapt.

And by examining each of these results, of each possible future, side by side, you can start to see common threads across all of them.

And the trends that emerge are then used to shape policies and practice and even set research agendas.

Now we're a bunch of academics and we don't have access to the same resources as the Department of Defense, so we tried to do the next best thing that we could come up with.

We actually took several different large language models, and we had them play the various roles in this process.
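
As a rough sketch of what that looked like, each "stakeholder" can be a system prompt reacting to one scenario. This assumes the OpenAI Python SDK and an API key; the model name and prompts are illustrative, not our exact setup.

```python
# LLMs standing in for an expert panel in scenario-based strategic planning.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

scenario = "By 2040, AI tutors deliver most preclinical medical education."
stakeholders = ["medical school dean", "practicing intensivist",
                "health economist", "medical ethicist"]

for role in stakeholders:
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": f"You are a {role} in this scenario."},
            {"role": "user", "content": f"Scenario: {scenario} "
                                        "List the top opportunities and risks."},
        ],
    )
    print(role, "->", reply.choices[0].message.content)
```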

And what happened was it resulted in four distinct scenarios of what 2040 might look like in the realm of medical training, with AI having a very strong role in it.

So I'll kind of share really briefly what each of these were.

The first world, or future, was one of AI harmony.

And in this future, AI was embraced across all society.

It enhanced medical education and health care.

It provided opportunities for personalized learning and personalized health advice.

But the only way this was possible was because there was an extreme focus on ethical management, and this was crucial for the responsible use of AI and also the equal distribution of benefits.

The second world was exactly the opposite.

It was one of AI conflict.

And in this future, AI had been weaponized, leading to compromised health care.

Trust was eroded.

Physician stress was through the roof.

Disinformation was being spread across the world and disrupted medical knowledge, medical education, and health care.

And that ended up suppressing critical thinking and diverse perspectives.

The next one was one of ecological balance.

And in this future, there was a very heavy focus on AI's impact on the environment.

AI aided in informed decisions that were directed at societal benefits, not necessarily the individual.

And this transformed health care to prioritize wellness.

However, it also resulted in a lot of ethical dilemmas in how we align global health and AI initiatives, but still balance individualized care.

The last one that arose was one of existential risk, where uncontrolled AI propagated and, you know, became an existential threat.

And what resulted was a shift away from reliance on technology.

And doctors went back to things like pen and paper.

Right?

Abandoning electronic health records.

And this had a significant impact on health care efficiency, and ethical dilemmas arose for physicians trying to balance this global crisis with individual patient care.

We then analyzed the resulting themes across these worlds, which were much, much more detailed than what I just showed you, with another large language model.

And it identified benefits, such as streamlining health care systems, accelerating research, and enhancing personalized learning, but also risks: misinformation, loss of privacy, the erosion of human expertise.

So our exercise here resulted in four distinct, but, I mean, admittedly exaggerated, right, futures.

But here's the thing.

Like, this wasn't an attempt to predict anything.

Right?

We were just trying to provoke thought and challenge ourselves to consider the many different ways that AI might be able to transform medical education.

And ultimately, this process that we went through helped us create a focused call to action.

And this urged people in medical education to develop ethical frameworks, foster collaboration, and assess AI's impact on an ongoing basis.

And we've already seen all of this playing out.

It wasn't just us.

There have been a lot of papers, not exactly like this, but these kinds of calls to action on how we can navigate this.

And over the past year, we've seen a lot of examples of AI being integrated into medical education.

And so, for example, the AAMC has started to form these task forces.

I'm actually part of one.

I'm part of the technology advancement committee for selection.

Right?

And we're writing guidelines on how institutions can ethically and responsibly integrate AI into the selection process, such as medical school acceptance, the Match, and beyond.

Also, we're already seeing examples in training in ways that amplify it rather than diminish it.

And I actually got to take a really deep dive into these types of things, like how it is being integrated into medical education now, when I was invited to contribute to a supplement of Academic Medicine called The Next Era of Assessment, using precision education to center on the equitable care of patients.

And this just came out.

So in this paper, we were asked to kind of reimagine medical education assessment through the lens of AI, and specifically focus on precision medical education, which is what the AMA is really focused on right now.

And so we used this to explore how current research in AI, especially in machine learning and deep learning, could be used to augment this model and tackle inherent limitations in traditional assessment methods.

A lot of them were really powerful, and so I picked a few to share with you right now.

So first, we looked at the category of proactive data collection, because collecting comprehensive longitudinal data on learners is really important, both for competency assessment and also for implementing a vision of precision education.

But it's been really hard historically, right, because of lack of resources, lack of faculty, and also lack of time.

So at NYU, Verity Schaye and her colleagues wrote this paper, which is super cool, and it describes the development of a high-performing machine learning model.

And what this thing does is it classifies residents' admission notes into different levels of quality.

So how good is it?

Right?

And it does this within the clinical environment.

So this study actually introduces a scalable and reasonably objective way to assess and provide feedback on clinical reasoning documentation, a task that has historically been really hard to do because it's time-consuming and also really subjective.

It paves the way for a more personalized and data driven approach to medical education.
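
The general shape of that kind of model, and this is a bare-bones sketch with invented notes and labels, not NYU's actual system, is text in, quality level out:

```python
# A bare-bones text classifier for note quality (invented data, toy model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "Differential includes PE vs pneumonia; D-dimer and CXR ordered to discriminate.",
    "Pt SOB. Will treat.",
    "Hypoxia likely multifactorial; weighing heart failure against COPD exacerbation.",
    "Admit. Continue home meds.",
]
quality = ["high", "low", "high", "low"]  # hypothetical reviewer labels

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(notes, quality)
print(clf.predict(["Considering ACS versus GERD; serial troponins pending."]))
```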

And so here we see a step forward where technology and pedagogy are converging.

The next category we looked at was predictive outcomes.

And in the paper, there are tons of examples.

I'm just picking out really cool ones to share with you right now.

They're all cool.

But here, we wanted to take a look into how trainee performance can impact patient outcomes.

Right?

So looking at that relationship, because one of the goals in medical education is to produce competent, compassionate physicians.

This is hard to do in today's complex, team-based care environment because, you know, attribution of individual contribution is a hard thing to tease out of performance.

And then how do you make decisions about that?

So in this example, this paper was published in Nature, and it describes an approach to assessing surgical performance by analyzing specific gestures during a nerve-sparing robot-assisted radical prostatectomy.

So how it does this is it breaks down the surgery into discrete gestures.

And what they discovered was that certain gestures correlated with improved patient outcomes.

So utilizing machine learning models, the gestures that were seen to correlate were then used to predict patient outcomes more accurately than the traditional clinical features that had been used.
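
Conceptually, and this is a toy sketch rather than the Nature paper's model, the prediction step looks like fitting a classifier on per-case gesture counts and reading out outcome probabilities:

```python
# Toy sketch: predict a binary patient outcome from surgical gesture counts.
from sklearn.ensemble import RandomForestClassifier

# Each row: invented counts of [retraction, dissection, cautery, suture]
gesture_counts = [[12, 30, 5, 8], [20, 22, 14, 6], [10, 35, 3, 9], [25, 18, 16, 5]]
good_outcome = [1, 0, 1, 0]  # hypothetical recovery labels

model = RandomForestClassifier(random_state=0).fit(gesture_counts, good_outcome)
print(model.predict_proba([[11, 33, 4, 8]]))  # probability of each outcome
```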

And what's really interesting to me is that there's growing evidence linking technical skills to patient outcomes.

And this has led certifying bodies such as the American Board of Surgery to explore the value of, like, video-based assessment as an adjunct to the existing mechanisms for board certification.

Another one we looked at was bias in assessment.

So, you know, while AI may help solve a lot of the challenges in assessment, we also have to be vigilant about bias.

Right?

Because we know that bias is in textual data.

And we know that these models are trained on vast amounts of textual data that is created by humans, and humans are biased.

And so we need to be careful, because biased data used to train AI models could lead to skewed interpretations or outcomes, perpetuating stereotypes, discrimination, or other forms of unfair treatment.

And something that's kinda neat, though, is that recent studies have actually used AI to detect and mitigate bias, like this 2023 study.

It describes the development of a comprehensive and robust framework called NBIAS that's able to identify words and phrases in text that may be biased.

And so you can imagine running text through something like this before it's chosen to train an AI.

Some of our current work is actually using this, and we are developing a bias-detection AI agent.

And this agent is going to be sitting outside of our database of narrative assessments of medical students in the clinical year.

And it will, in real time, read those narrative assessments and flag potentially biased statements for CCCs to review.
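
A deliberately oversimplified sketch of that flagging step is below; our actual agent uses a trained model rather than a word list, and the watch-list phrases here are just illustrative:

```python
# Flag narrative-assessment phrases for human (CCC) review.
import re

FLAG_PATTERNS = [r"\bfor a woman\b", r"\bsurprisingly articulate\b",
                 r"\baggressive\b"]  # hypothetical watch-list

def flag_for_review(assessment: str) -> list[str]:
    """Return any phrases in the narrative that should be queued for the CCC."""
    return [p for p in FLAG_PATTERNS
            if re.search(p, assessment, flags=re.IGNORECASE)]

note = "She is surprisingly articulate and performed well for a woman in surgery."
print(flag_for_review(note))  # both phrases get flagged
```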

And so lastly, I wanna talk a little bit about personalized analytics and precision medical education.

This is something that I am very passionate about.

So individualizing the educational journey to match a learner's needs.

It's a component of many, many educational theories.

Right?

And it's a core tenet of precision medical education.

But the prevailing paradigm in medical education operates under a one-size-fits-all approach.

And there's been tons of discussion in the literature about the potential that AI has in solving this issue.

However, most of the projected benefits and challenges are just speculative.

Right?

They're perspective pieces.

There are not a lot of actual studies out there, and the ones that do exist are at a very localized level.

Like, somebody has ChatGPT on their computer, and they made some clinical scenarios, printed them out, and then somebody used them.

Right?

They're not, like, using these platforms at scale.

They're not scaling them to large numbers of learners or across institutions.

So I'd like to end by sharing some of the work that we are doing to address this gap.

So, again, the medical education experience often fails to address the diverse learning needs of trainees.

This results in hidden gaps in learning, which can compromise readiness for residency and future practice, and ultimately impact the quality of patient care.

And this isn't a new phenomenon.

Right?

So the educational philosopher and psychologist Benjamin Bloom demonstrated in the 1980s that one-on-one tutoring could actually augment learner performance by two standard deviations, or two sigma, making a below-average student above average and an average student exceptional.

However, offering one-on-one experiences to every single student has been historically impractical due to resource constraints, faculty availability, and financial barriers.

Even if we had piles of money laying around, there's just not enough faculty or time to accomplish this.

Over the last couple of decades, though, there's been an emergence of and a push toward individualized learning.

And this has resulted in a bunch of different theories, like the master adaptive learner, competency-based medical education, and now precision medical education.

So there are a lot of calls to action to address these barriers and challenges, but not much headway on a scalable solution, because these are all really great ideas but very challenging to actualize.

So in 2023, we developed 2 Sigma, and this is the prototype version that I'm showing you here, developed in an effort to start to address this need.

Two Sigma is a generative AI platform, and it personalizes learning in medical education with the aim of actually advancing precision medical education.

It utilizes intelligent algorithms for natural human-computer interaction.

And this particular one did it through text-to-text, but our more advanced version does voice-to-text, and we even have some other multimodal processes right now.

This specific prototype focused on the generation of clinical scenarios.

So it was designed to simulate real world interaction and foster clinical decision making skills.

Each session began with the AI introducing a patient case with observable details.

The goal was to mirror real-life clinical situations where additional information had to be sought out by the student.

The students then interacted with this AI as if it were a real patient, making decisions and requesting actions like vitals, IV placement, and diagnostic tests.

The AI would respond conversationally as the patient if the student talked to the patient, and it would also return the results of requested actions or tests.

Basically, it continuously challenged the student to diagnose based on evolving information.

Most cases ended upon the identification of a diagnosis, with a couple of cases continuing on for management practice.

Following each of the interactions, the AI delivered comprehensive feedback on various competencies, including diagnostic accuracy, decision making, efficiency, cost-effectiveness, etcetera.
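
The interaction pattern, not the 2 Sigma codebase itself, can be sketched as a simple loop in which an LLM holds the patient role and also reports the results of requested actions. This assumes the OpenAI SDK; the prompt and model name are illustrative:

```python
# A sketch of the virtual-patient loop: the LLM plays the patient and
# returns plausible results for requested vitals, labs, or imaging.
from openai import OpenAI

client = OpenAI()
history = [{
    "role": "system",
    "content": ("You are a standardized patient: a 58-year-old with acute "
                "shortness of breath. Reveal details only when asked. If the "
                "student orders vitals or tests, report plausible results."),
}]

while True:
    student = input("Student: ")  # e.g. "Any chest pain?" or "Order a CXR"
    if student.lower() == "done":
        break
    history.append({"role": "user", "content": student})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Patient:", answer)
```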

And given that the pre-clerkship students at the University of Cincinnati often have very limited opportunities for real-world clinical interactions, they were identified as the ideal candidates for piloting the 2 Sigma platform.

So following rigorous development and AI training, we deployed it to nearly 200 second-year medical students for testing.

And this was funded by the American Medical Association.

And in just over a month's time, the students generated almost 2,000 unique sessions lasting an average of 16 minutes each.

And this was exciting because it established proof of concept for leveraging generative AI to create novel experiences with tutoring to enhance medical education.

We were excited because now we had proved that we could actually scale this.

And one anecdote that I like to tell, though we clearly have a lot of work to do to see if this pans out, involves my colleague who did this with me, doctor Matt Kelleher. At the end of second year, all of the students have to take this in-person OSCE.

And the results are reviewed, like, in real time, one-on-one with the students.

And Matt, you know, does this.

He's the director, and he, you know, kept noticing that, oddly, this year, unlike any of the other years he had been doing this, students were really hesitant to order too many labs or tests.

They were very concerned about cost-effectiveness, and he couldn't figure out why.

And then finally, at the end of all of these interviews, there were, like, one or two last students.

And finally, one of the students said, well, yeah.

Like, I didn't wanna order too many tests, because 2 Sigma kept giving me feedback that when I was ordering extraneous tests, that wasn't cost-effective.

So I've been trying to, like, tone it down.

Not proof, not even correlation, but interesting nonetheless.

So we believe, yes, that this initial deployment represents a significant step forward.

We've got a lot of work to do.

And we're in kind of the next phase, where we are scaling across the curriculum of undergraduate medical education and going into residency and even fellowship.

But we have work to do in understanding accuracy and reliability, case demographics, and safety.

So what we've been doing is expanding the platform beyond just the clinical scenarios, which I explained; we have question banks now that generate NBME-style questions.

We have a tutoring function and patient presentation skills.

I talked a little bit about our bias agent.

We've made something called the supervisor, which will go in and actually pause these sessions at certain points and then kind of query the student about their clinical decision making.

We're actually creating a coach, a reflective coach that will learn all about the student in real time, understand all of their assessments, and be able to help them debrief in a compassionate way about their interactions with patients.

So we have a lot of these different modules that are currently in process.

And we're actually trying to scale this in a way that we can create what's called an agentic AI system.

So this is something that is becoming very popular in AI circles.

And this paper came out from OpenAI about the development of agentic systems.

So an agentic system is a system that integrates AI and can be trusted, under certain circumstances, to carry out actions completely autonomously, without the supervision of a human.

So why is all this important?

This is important because we believe that this type of AI can be a solution to actualizing precision medical education in a meaningful and scalable manner.

By doing this, we can empower trainees and students to take hold of their own learning.

And by highlighting areas of strength and opportunity in a safe way that actually aligns with growth objectives, we can offer every learner personalized learning experiences, allowing them to reach their full potential.

We also believe this has implications at the system level.

We can improve medical education, aligning assessments with growth, providing new levels of understanding about the development of complex skills like clinical reasoning, using more relevant data sources like conversations instead of just, you know, multiple choice tests.

And in turn, these advancements could positively impact transition points across the medical education continuum, reducing the risk of learning plateaus and making handoffs more transparent.

And finally, we believe it has the potential to promote equity.

The AI doesn't know the background of a student.

The AI is actually a lot cheaper to run than buying something like UWorld, if we can get it to generate questions.

And beyond that, you know, we could use it to detect potential bias in narrative assessment.

And finally, we believe that, you know, AI could be the tipping point that actualizes precision medical education, ensuring that every physician is exceptional and improving health care outcomes for everybody.

Thank you.
