
Hard Drugs
·E2
Proteins: Weird blobs that do important things
Episode Transcript
1
00:00:00,320 --> 00:00:04,640
In today's episode, we're going to talk
about the wonderful world of proteins.
2
00:00:04,640 --> 00:00:08,000
Proteins are all around our body.
We use them in our daily lives,
3
00:00:08,000 --> 00:00:10,800
and they do amazing things to keep us going.
4
00:00:10,800 --> 00:00:17,120
Protein design just won a Nobel Prize
and we are going to do a mini-series of
5
00:00:17,120 --> 00:00:23,920
episodes here to talk about AlphaFold and
other AI systems used to design proteins,
6
00:00:23,920 --> 00:00:29,520
whether people can increasingly design dangerous
proteins, not just medicines, and whether protein
7
00:00:29,520 --> 00:00:36,720
design can help us get cures for some of the
toughest diseases that still plague humanity.
8
00:00:39,600 --> 00:00:45,280
But first, let's start with the basics. You might
remember being in high school biology and seeing
9
00:00:45,280 --> 00:00:51,440
a simple diagram of a cell. It probably looked a
bit like a fried egg or a sunny side up. There was
10
00:00:51,440 --> 00:00:56,080
the nucleus, which was a bit like the egg yolk.
And then there were a few other things scattered
11
00:00:56,080 --> 00:01:01,600
around, like mitochondria and ribosomes,
but that was a massive simplification.
12
00:01:01,600 --> 00:01:07,840
In reality, cells are incredibly busy. There
are billions of molecules in every cell,
13
00:01:07,840 --> 00:01:13,680
including loads of proteins, which have different
functions. So let me just think about what are
14
00:01:13,680 --> 00:01:18,400
the different things that the proteins are
doing? Well, there are structural proteins;
15
00:01:18,400 --> 00:01:23,920
they provide shape and strength to cells.
There are storage proteins; they store
16
00:01:23,920 --> 00:01:28,640
little molecules. There are signalling proteins
that help cells communicate with each other.
17
00:01:28,640 --> 00:01:34,240
So insulin, for example, is a hormone, and
it's made in the pancreas and it tells cells
18
00:01:34,240 --> 00:01:39,920
to take up glucose from the bloodstream,
and that lowers blood sugar after eating.
19
00:01:39,920 --> 00:01:45,600
There are also transport proteins that
move molecules between cells. Haemoglobin,
20
00:01:45,600 --> 00:01:50,640
for example, is a protein in red blood cells
that binds to oxygen and carries it around in
21
00:01:50,640 --> 00:01:56,320
the blood. There are also enzymes — enzymes speed
up chemical reactions in our body, by lowering
22
00:01:56,320 --> 00:02:02,720
the activation energy needed for them. There are
regulatory proteins that control other proteins
23
00:02:02,720 --> 00:02:08,800
and pathways. And there are defence proteins
that protect us from attack; so antibodies are
24
00:02:08,800 --> 00:02:13,760
a type of protein. Snakes and spiders have
venoms, which are proteins that help them
25
00:02:13,760 --> 00:02:20,640
disable their threats. There are so many different
types of jobs that a protein might have, and many
26
00:02:20,640 --> 00:02:26,320
proteins have multiple jobs at the same time.
And this means that this basic diagram view,
27
00:02:26,320 --> 00:02:32,640
that you might've had of a cell, was quite
simple. In reality, the cell is extremely
28
00:02:32,640 --> 00:02:39,840
busy. It's more like a bustling city, and there
are literally billions of molecules, proteins,
29
00:02:39,840 --> 00:02:46,560
DNA, RNA, fats, sugars, and ions — all moving
around, reacting and interacting with each other.
30
00:02:46,560 --> 00:02:53,520
Every part of the cell has its own job and it's
a bit like different districts in the city.
31
00:02:53,520 --> 00:02:59,840
There's a great blog post by Niko McCarty where he
describes this, and I thought it would be helpful
32
00:02:59,840 --> 00:03:06,480
just to have a sense of what's going on. He says,
"A microbe's guts are a veritable Times Square,
33
00:03:06,480 --> 00:03:12,080
crowded with sugars, proteins, and water molecules
that ricochet and smash into each other billions
34
00:03:12,080 --> 00:03:19,040
of times each second. Space is limited. A
bacterium's insides are 70% water by mass;
35
00:03:19,040 --> 00:03:25,760
the other 30% is dominated by proteins first,
followed by RNA and lipids. DNA accounts for
36
00:03:25,760 --> 00:03:34,080
just 1%. And all of this stuff fits inside a
volume that is one quadrillionth of a litre."
37
00:03:34,080 --> 00:03:37,920
That's a lot of proteins and I
can't even see one of them.
38
00:03:37,920 --> 00:03:46,160
Right? They're so small. And so if you think
of this city — of each cell — the nucleus is
39
00:03:46,160 --> 00:03:51,600
something like the city hall, it's managing the
information; it has instructions for what should
40
00:03:51,600 --> 00:03:57,840
happen. There are mitochondria; the power stations
of the cell. There are ribosomes that construct
41
00:03:57,840 --> 00:04:02,240
new proteins. And then there are proteins, that
are the workers and the machines of the city,
42
00:04:02,240 --> 00:04:07,120
but they're also the structural components and the
signalling molecules and all of these things.
43
00:04:07,120 --> 00:04:11,600
Our body is doing so much
with all of those proteins.
44
00:04:11,600 --> 00:04:19,520
Are proteins used outside of the body too?
They are! In fact, if you've done any cooking,
45
00:04:19,520 --> 00:04:25,520
you would know, for example, that chemical
reactions change the proteins that you're cooking
46
00:04:25,520 --> 00:04:32,720
with. So, for example, if you cook an egg white,
it becomes firm when it's cooked. That's because
47
00:04:32,720 --> 00:04:38,880
the heat denatures the proteins — it makes them
unfold — and then it makes them coagulate into a
48
00:04:38,880 --> 00:04:44,400
different kind of mesh, and that makes it opaque.
There's also gluten, which is a protein that gives
49
00:04:44,400 --> 00:04:50,160
bread its stretchy texture — that's made of two
proteins. There are also lots of proteins that
50
00:04:50,160 --> 00:04:57,360
are used in industry and biotechnology. If you've
done your laundry recently, you might have used
51
00:04:57,360 --> 00:05:04,880
a detergent that was made of enzymes, and the
enzymes break down stains, like fat or blood.
52
00:05:04,880 --> 00:05:10,240
Then there are a bunch of proteins that are used
in baking and brewing and textile manufacturing.
53
00:05:10,240 --> 00:05:14,720
Of course there are lots of proteins that are
used in medicine as well. So I mentioned that
54
00:05:14,720 --> 00:05:21,600
antibodies are a type of protein, and lots of
medicines are types of antibodies. There's also
55
00:05:21,600 --> 00:05:29,680
insulin, which people use in diabetes; it's
a protein that is also a therapeutic drug.
56
00:05:29,680 --> 00:05:37,120
What actually are proteins? What do
they look like and how do they form?
57
00:05:37,120 --> 00:05:45,680
Proteins are long chains of amino acids. You
can sort of think of that as like beads on a
58
00:05:45,680 --> 00:05:53,200
string. And then that string, or that chain, is
folded into some kind of 3D shape. The string
59
00:05:53,200 --> 00:06:00,720
is the protein's backbone, and each bead is an
amino acid. Each amino acid has unique features.
60
00:06:00,720 --> 00:06:06,240
So as this string falls into a structure, you
can kind of imagine that maybe happening at a
61
00:06:06,240 --> 00:06:12,640
small scale — maybe there's like a little helix
of the string in some place, or maybe there are
62
00:06:12,640 --> 00:06:18,320
two parallel strings next to each other. But
imagine that... we have to kind of zoom out
63
00:06:18,320 --> 00:06:25,200
and this whole 3D shape of the protein could also
be connected to another protein; it could be two
64
00:06:25,200 --> 00:06:33,200
proteins together, making a protein complex.
How is that made? I know I eat some protein,
65
00:06:33,200 --> 00:06:38,880
but I think we make some too.
That's right. So you have lots of
66
00:06:38,880 --> 00:06:46,320
DNA in your cells, and the DNA, which is the
code of life, is the instructions for which
67
00:06:46,320 --> 00:06:52,320
proteins to make and how they should look.
The DNA is transcribed into RNA, which is
68
00:06:52,960 --> 00:07:00,800
typically this temporary molecule, and then the
RNA is then translated into protein by ribosomes.
69
00:07:01,840 --> 00:07:10,240
They sort of form one-by-one into this chain, and
then rapidly fold into a much bigger structure.
70
00:07:10,800 --> 00:07:16,640
This was kind of interesting to me because when
I was reading this, I was thinking, okay, how did
71
00:07:16,640 --> 00:07:23,520
the first protein that was ever discovered look?
What did people think when they first saw it?
72
00:07:24,080 --> 00:07:32,960
And that was fascinating because the first protein
whose structure was determined was in 1958,
73
00:07:32,960 --> 00:07:40,560
and that was myoglobin. This was determined
by John Kendrew, a British scientist. When he
74
00:07:40,560 --> 00:07:47,360
discovered this, it was only four years after
the discovery of DNA's structure — DNA is of
75
00:07:47,360 --> 00:07:55,840
course very beautiful; it has this symmetrical
structure, of this helix. And he was really
76
00:07:55,840 --> 00:08:01,920
disappointed when he figured out what myoglobin
looked like. He wrote in this paper: "Perhaps the
77
00:08:01,920 --> 00:08:07,040
most remarkable features of the molecule are
its complexity and its lack of symmetry."
78
00:08:07,040 --> 00:08:13,520
Oh no, it's ugly.
But in hindsight, the irregularity is exactly what
79
00:08:13,520 --> 00:08:23,360
makes proteins so powerful. It's not really like
DNA, which has this kind of linear messaging — it
80
00:08:23,360 --> 00:08:30,480
has the code, and then the code just linearly
turns into RNA. But a protein is actually doing
81
00:08:30,480 --> 00:08:37,600
multiple things. It's in the cell being bombarded
sometimes with lots of different molecules,
82
00:08:37,600 --> 00:08:43,600
and it needs to be able to recognise these
different shapes and structures, and sometimes,
83
00:08:43,600 --> 00:08:50,720
it has multiple functions — and this function
of every protein depends on that 3D structure.
84
00:08:51,760 --> 00:08:57,360
The folded shape means that there are like
little pockets, grooves and surfaces that
85
00:08:57,360 --> 00:09:04,240
the protein uses to bind to other molecules,
or carry out specific chemical reactions,
86
00:09:04,240 --> 00:09:10,960
or even receive signals and then change shape in
response. That means the same protein molecule
87
00:09:10,960 --> 00:09:16,160
might be doing multiple things at once. It could
be doing a chemical reaction, but also binding
88
00:09:16,160 --> 00:09:21,360
to something else, and then when it gets some
regulatory signal, it could be changing shape and
89
00:09:21,360 --> 00:09:28,240
stopping that chemical reaction from happening.
So there's benefits to being a weird blob. There's
90
00:09:28,240 --> 00:09:39,440
nothing wrong with being a weird blob.
I thought it would be fun if we both share
91
00:09:39,440 --> 00:09:48,240
some fun facts about proteins. I found these from
the book Cell Biology by the Numbers, which is a great
92
00:09:48,240 --> 00:09:56,400
textbook, and it's also free online. The authors
create these rough estimates and pull together key
93
00:09:56,400 --> 00:10:03,920
numbers on lots of different things related to
cell biology. Some of them are rough estimates,
94
00:10:03,920 --> 00:10:09,120
but they're kind of our best guess right now.
Hit me.
95
00:10:09,120 --> 00:10:15,520
Alright, first one, how many
proteins are in a human cell?
96
00:10:15,520 --> 00:10:21,600
They're busy, so I'm going to guess a lot.
And I'm going to guess it depends on the cell,
97
00:10:21,600 --> 00:10:30,320
but I will go with a hundred million.
That is a lot, and it does depend on
98
00:10:30,320 --> 00:10:38,480
the cell. But the estimate for the average
number is ten billion proteins per cell.
99
00:10:38,480 --> 00:10:44,320
Oh no. Two orders of magnitude wrong, not
a good start. Okay, well, I've got one.
100
00:10:47,040 --> 00:10:54,320
Which is bigger: the protein or the
mRNA that codes for the protein?
101
00:10:54,320 --> 00:11:04,873
Um... surely the protein is bigger, no? Why would
the instructions be bigger than the protein?
102
00:11:04,873 --> 00:11:10,480
That's what I always think, and it's the other
way around. So the mRNA is bigger — you look at
103
00:11:10,480 --> 00:11:18,720
them side by side - well, images of 'em - and
the mRNA is like 10 times bigger. Because each
104
00:11:18,720 --> 00:11:23,840
amino acid is coded for by three nucleotides,
and the nucleotides themselves are bigger and
105
00:11:23,840 --> 00:11:31,520
heavier. So it's counterintuitive to me,
but you know, it makes sense, I guess,
106
00:11:31,520 --> 00:11:36,480
when you think about it physically.
That does make sense... well,
107
00:11:36,480 --> 00:11:39,600
I don't know if that makes sense. I feel
like I need to think about this more.
108
00:11:39,600 --> 00:11:44,160
Yeah, it doesn't make sense from a computer
science point of view, but from a physical point
109
00:11:44,160 --> 00:11:48,240
of view it feels like, yeah.
Right.
110
00:11:48,240 --> 00:11:54,080
I have one. So, you know, as a small person,
I wanted to find out which protein was the
111
00:11:54,080 --> 00:11:59,040
smallest. Do you have any guesses?
The protein that's the smallest? Well,
112
00:11:59,040 --> 00:12:05,840
the definition of a protein... I wonder if I'm
allowed to have- it's got to have at least two
113
00:12:05,840 --> 00:12:12,000
amino acids, so I know it's not going to be
less than two, but that probably wouldn't
114
00:12:12,000 --> 00:12:16,320
count as a protein because it wouldn't fold
into anything, wouldn't have much function.
115
00:12:16,320 --> 00:12:23,920
So I'm going to guess philosophically,
two, and then, literally, more than two.
116
00:12:23,920 --> 00:12:31,920
Well, you're right. I think the typical definition
of a protein is something that floats on its own
117
00:12:31,920 --> 00:12:37,920
in water and can fold into a stable shape. If
you use that definition, then the smallest ones
118
00:12:37,920 --> 00:12:44,080
are some 20 to 30 amino acids long. There are
actually lots of really tiny proteins, and these
119
00:12:44,080 --> 00:12:51,680
tiny proteins are called "microproteins", and
they're less than a hundred amino acids or so. One
120
00:12:51,680 --> 00:13:01,600
example that's actually even smaller than 20 or 30
is somatostatin, which is a hormone that controls
121
00:13:01,600 --> 00:13:07,680
other hormones — so it controls growth hormone and
insulin, and that's only 14 amino acids long.
122
00:13:07,680 --> 00:13:12,640
Oh wow, it's that small. Oh okay.
Right. It still has a stable shape,
123
00:13:12,640 --> 00:13:18,320
because parts of the chain are connected to each
other. So it's not considered a typical protein,
124
00:13:18,320 --> 00:13:23,680
but it's a peptide and it's very small.
Got it, okay. What's the biggest? I think you
125
00:13:23,680 --> 00:13:28,160
know the answer to this one.
I think I do. Is it titin?
126
00:13:28,160 --> 00:13:32,640
It's titin. That's the biggest human protein at
least, I don't know outside of humans. But that
127
00:13:32,640 --> 00:13:37,840
one is 33,000 amino acids long.
I got one. What's the most
128
00:13:37,840 --> 00:13:43,120
abundant protein on earth?
I am going to guess it has something
129
00:13:43,120 --> 00:13:49,040
to do with photosynthesis, because that seems
like one of the biggest functions on earth.
130
00:13:49,040 --> 00:13:56,480
Very good guess. So it's kind of a tie, and
we're not really sure which one is more abundant,
131
00:13:56,480 --> 00:13:59,200
so that was a bit of a trick question.
Oh wow.
132
00:13:59,200 --> 00:14:08,080
But one of them is RuBisCO, and that is used in
photosynthesis; it's used to grab carbon from the
133
00:14:08,080 --> 00:14:17,040
air and turn it into useful organic material. And
that's used by all photosynthetic organisms. And
134
00:14:17,040 --> 00:14:22,720
scientists estimate that there are about five
kilogrammes of RuBisCO per person on earth.
135
00:14:22,720 --> 00:14:28,880
Oh my god. What?! Wow.
I guess there are a lot of plants.
136
00:14:28,880 --> 00:14:32,640
Yeah, fair enough. They're winning.
They're winning... for now...
137
00:14:32,640 --> 00:14:37,040
There's actually the second, which
might be ahead. We're not sure-
138
00:14:37,040 --> 00:14:42,480
Oh right.
-and that is collagen. That is
139
00:14:42,480 --> 00:14:51,040
used as a kind of structural protein, and it makes
up about 30% of the protein mass in your body — so
140
00:14:51,040 --> 00:14:57,280
about three kilogrammes of collagen per person.
But it's not just humans that have collagen,
141
00:14:57,280 --> 00:15:05,920
it's also the livestock and all animals. That
means there's- well, the total number- the total
142
00:15:05,920 --> 00:15:12,320
mass of livestock is also enormous, right? And so
this means there's roughly four to six kilogrammes
143
00:15:12,320 --> 00:15:17,520
of collagen per person on earth.
Ready for another fun fact?
144
00:15:17,520 --> 00:15:20,240
Yes.
Well, enzymes are a
145
00:15:20,240 --> 00:15:28,480
type of protein that speed up reactions... so how
much do you think enzymes speed up reactions?
146
00:15:28,480 --> 00:15:37,520
Mmm... a thousand times, maybe? Two thousand?
I feel like... a lot. But I don't know.
147
00:15:37,520 --> 00:15:45,920
A lot. A lot. And I bet some do a thousand, but
if you're really looking at the best of the best,
148
00:15:45,920 --> 00:15:54,400
we're talking billions of times, and possibly
trillions of times, so we're talking millions
149
00:15:54,400 --> 00:16:01,040
of reactions per second per enzyme in some
cases, and just totally changing what is
150
00:16:01,040 --> 00:16:06,640
happening at the molecular level.
That's crazy. That means, I guess,
151
00:16:06,640 --> 00:16:10,640
some reactions just wouldn't happen
if the enzymes weren't there.
152
00:16:10,640 --> 00:16:14,000
Oh, absolutely. Yeah. I mean,
statistically speaking, yeah.
153
00:16:14,000 --> 00:16:18,720
So we were talking about protein folding
the other day, and I was thinking: well,
154
00:16:18,720 --> 00:16:22,480
how fast do proteins fold into
shape? Do you have any guesses?
155
00:16:22,480 --> 00:16:25,758
Oh... that is a tough one because, well, we just
had a very long protein that took forever, but
156
00:16:25,758 --> 00:16:29,680
I bet most proteins don't take long at all. The
folding has to happen quickly, otherwise they'll
157
00:16:29,680 --> 00:16:40,560
get distracted by other forces. So I will go with
tenths of seconds, no, hundredths of seconds.
158
00:16:40,560 --> 00:16:44,480
Pretty close. So, on average,
proteins fold in milliseconds,
159
00:16:44,480 --> 00:16:52,800
but some proteins fold really quickly, in
microseconds, which are millionths of a second. And
160
00:16:52,800 --> 00:16:57,440
I guess you're right that it really does have
to happen fast, because there's so much other
161
00:16:57,440 --> 00:17:03,280
stuff going on in the cell. It could just be
bombarded with something else before it folds.
162
00:17:03,280 --> 00:17:04,868
Yeah, well, no fun. One final one
from me. How quick do they move? Let's
163
00:17:04,868 --> 00:17:04,941
say you're in a cell. How quick does
the protein move across the cell?
164
00:17:04,941 --> 00:17:05,035
I love the idea that I've shrunk myself to the
size that I can fit inside a cell. And now I'm
165
00:17:05,035 --> 00:17:05,119
trying to race with these little proteins.
To get across a cell... uh... I dunno. A
166
00:17:05,119 --> 00:17:05,200
second? Maybe half a second? I dunno.
A small protein could be 10 milliseconds
167
00:17:05,200 --> 00:17:05,300
to get across a cell. The thing, though, is that
cells are small. So if you haven't shrunk yourself
168
00:17:05,300 --> 00:17:05,385
all the way down, and are just visualising
the human scale, how long would it take a
169
00:17:05,385 --> 00:17:05,479
protein to move a whole centimetre? Well, then
you'd need 20 days for some of the proteins.
170
00:17:05,479 --> 00:17:05,559
Well, so at first I thought you said -
okay, that's quite fast - they're taking
171
00:17:05,559 --> 00:17:05,646
10 milliseconds to cross the cell. But 20
days to travel one centimetre is quite slow,
172
00:17:05,646 --> 00:17:05,714
I could do that much faster.
Yeah, I think you're going to win.
173
00:17:05,714 --> 00:17:10,560
... but maybe not if I'm shrunk to that size.
Okay, I got another one. How fast are enzymes
174
00:17:10,560 --> 00:17:15,600
colliding with other molecules in the cell? Or
how many collisions are there per second?
175
00:17:15,600 --> 00:17:20,640
Okay. I have the sense that things are
just crazy up in there and everyone's
176
00:17:20,640 --> 00:17:26,720
sort of bumping around. So I'm going to
say a thousand collisions a second.
177
00:17:26,720 --> 00:17:33,600
Well, you were right with the idea.
Oh no, I should have just said "A lot."
178
00:17:33,600 --> 00:17:42,640
But I think the estimate is 500,000 molecules
are colliding with an enzyme per second.
179
00:17:42,640 --> 00:17:47,920
Wow.
And that's a lot! And that makes me think that
180
00:17:47,920 --> 00:17:54,000
proteins have to be really specific in how they
bind to their targets. It's like, you know, if
181
00:17:54,000 --> 00:18:00,720
you're at a really crowded party and you're trying
to find a friend, you would just bump into so many
182
00:18:00,720 --> 00:18:05,600
people before you actually find your friend. So
you have to actually be able to recognise them
183
00:18:05,600 --> 00:18:12,160
among the 500,000 random strangers around you.
Yep. That's tricky. Okay, Saloni,
184
00:18:12,160 --> 00:18:18,000
what's your favourite protein?
My favourite protein is tubulin. It's part
185
00:18:18,000 --> 00:18:25,680
of microtubules. The microtubules are kinda the
skeletons of your cells... That sounds a bit grim,
186
00:18:25,680 --> 00:18:33,840
actually. But they are basically formed of these
hollow tubes that are made of this protein,
187
00:18:33,840 --> 00:18:41,040
and each of the little structures is kind of
like a tiny corn kernel. That tube can sort of
188
00:18:41,040 --> 00:18:47,920
assemble and disassemble in response to signals,
and that means that the entire skeleton can kind
189
00:18:47,920 --> 00:18:54,480
of assemble and disassemble... which means the
whole cell can change its shape or its size and
190
00:18:54,480 --> 00:19:02,880
move around, because of these microtubules.
The microtubules also act as tracks to move
191
00:19:02,880 --> 00:19:09,360
things around, so they're a bit like a cellular
railway or something, which I think is just super
192
00:19:09,360 --> 00:19:14,720
cool. And I remember learning about this in
my undergrad and just seeing some diagrams
193
00:19:14,720 --> 00:19:18,320
and thinking, wow, that's amazing.
That's a good one. I have an even
194
00:19:18,320 --> 00:19:25,680
better one though, which is gluten
in bread! Woo! I'm a bread guy.
195
00:19:25,680 --> 00:19:29,760
That's a good one.
We each have our favourites.
196
00:19:29,760 --> 00:19:36,960
This was the first of a series of mini episodes
we're doing on proteins. Stay tuned for our next
197
00:19:36,960 --> 00:19:45,280
episode on the history of insulin. And if you like
this, share it with your friends and subscribe.