
localfirst.fm · E23
#23 – Sujay Jayakar: Dropbox, Convex
Episode Transcript
1
00:00:00,254 --> 00:00:04,304
There's another kind of interesting
decision here: Dropbox by
2
00:00:04,304 --> 00:00:06,824
design was always like a sidecar.
3
00:00:06,824 --> 00:00:09,974
It's always something that just
sits and it looks at your files.
4
00:00:09,974 --> 00:00:12,434
Your files are just regular
files on the file system.
5
00:00:12,794 --> 00:00:17,408
And if Dropbox, the app isn't running,
your files are there and they're safe,
6
00:00:17,408 --> 00:00:21,638
and it's something that you know,
regular apps can just read and write
7
00:00:21,638 --> 00:00:26,988
to, and in some sense like Dropbox
was unintentionally local-first
8
00:00:27,008 --> 00:00:28,508
from that perspective, right?
9
00:00:28,538 --> 00:00:31,658
Because it's saying that no
matter what happens, your data
10
00:00:31,658 --> 00:00:32,918
is just there and you own it.
11
00:00:33,984 --> 00:00:36,084
Welcome to the localfirst.fm podcast.
12
00:00:36,444 --> 00:00:39,174
I'm your host, Johannes Schickling,
and I'm a web developer, a
13
00:00:39,174 --> 00:00:42,234
startup founder, and love the
craft of software engineering.
14
00:00:42,654 --> 00:00:46,194
For the past few years, I've been on a
journey to build a modern high quality
15
00:00:46,194 --> 00:00:50,034
music app using web technologies, and
in doing so, I've been following down
16
00:00:50,034 --> 00:00:51,984
the rabbit hole of local-first software.
17
00:00:52,494 --> 00:00:55,374
This podcast is your invitation
to join me on that journey.
18
00:00:56,154 --> 00:00:58,924
In this episode, I'm
speaking to Sujay Jayakar.
19
00:00:59,439 --> 00:01:02,319
Co-founder of Convex and
Early Engineer at Dropbox.
20
00:01:02,739 --> 00:01:06,609
In this conversation, Sujay shares
the story on how the Sync Engine
21
00:01:06,669 --> 00:01:11,169
powering Dropbox was built initially
and later redesigned to address all
22
00:01:11,169 --> 00:01:13,209
sorts of distributed systems problems.
23
00:01:13,689 --> 00:01:18,999
Before getting started, also a big thank
you to Jazz for supporting this podcast.
24
00:01:19,299 --> 00:01:21,099
And now my interview with Sujay.
25
00:01:22,211 --> 00:01:23,051
Hey, Sujay.
26
00:01:23,081 --> 00:01:24,701
So nice to have you on the show.
27
00:01:24,701 --> 00:01:25,421
How are you doing?
28
00:01:25,901 --> 00:01:26,501
Doing great.
29
00:01:26,506 --> 00:01:26,736
Great.
30
00:01:26,981 --> 00:01:27,911
Really happy to be here.
31
00:01:28,361 --> 00:01:30,461
I'm super excited to have you on the show.
32
00:01:30,491 --> 00:01:35,244
I've been using your work really
since over a decade ago at this point,
33
00:01:35,244 --> 00:01:39,351
when I was really getting into
using computers productively.
34
00:01:39,681 --> 00:01:45,124
And we just recently had another
really interesting guest, Seph Gentle, on
35
00:01:45,124 --> 00:01:50,534
the podcast, who has worked on a really
fascinating tool, called Google Wave
36
00:01:50,534 --> 00:01:52,904
back then that had a big impact on me.
37
00:01:53,204 --> 00:01:56,264
And you've been working on another
technology that had a big impact
38
00:01:56,264 --> 00:02:01,351
on me, which is Dropbox and still
has a very positive impact on me.
39
00:02:01,531 --> 00:02:05,431
That was all the way back then
over 10 years ago in 2014.
40
00:02:05,731 --> 00:02:11,181
I don't think I need to explain to
the audience what Dropbox is, but, I
41
00:02:11,181 --> 00:02:15,441
want to hear it from you, like what
led you to join Dropbox, I think very
42
00:02:15,441 --> 00:02:19,851
early on and just hearing a little
bit just embedded in your personal
43
00:02:20,001 --> 00:02:24,231
context when you joined it, and then
we're gonna go dive really deep into
44
00:02:24,231 --> 00:02:27,201
all things syncing related, et cetera.
45
00:02:27,201 --> 00:02:27,951
How does that sound?
46
00:02:28,731 --> 00:02:29,721
Yeah, that sounds great.
47
00:02:30,021 --> 00:02:31,761
It's actually a really funny story.
48
00:02:31,803 --> 00:02:34,533
my career here in
technology started in 2012.
49
00:02:34,853 --> 00:02:37,883
I was actually studying mathematics.
50
00:02:37,883 --> 00:02:44,243
I was going to go work at the NSA doing
cryptography, and I was born in India.
51
00:02:44,408 --> 00:02:48,273
but I'm a naturalized citizen of
the United States, and you have to
52
00:02:48,273 --> 00:02:52,343
have security clearance to go do
these types of cryptography things.
53
00:02:52,993 --> 00:02:58,063
And you know, my clearance kept
on dragging on and on and on and
54
00:02:58,223 --> 00:03:01,873
they like interviewed my roommates,
and apparently I'm just a very sketchy
55
00:03:01,873 --> 00:03:06,113
guy so I had an offer to go work
there, but it kept on dragging on.
56
00:03:06,323 --> 00:03:10,543
And then my roommate at the time was a
computer science major who wanted
57
00:03:10,833 --> 00:03:15,459
someone to go with him to the career
fair and just started chatting with the
58
00:03:15,459 --> 00:03:18,909
Dropbox people and you know, it's about
like a hundred people around that time.
59
00:03:19,329 --> 00:03:23,786
And, chatting turned into hanging out
at dinner, turned into interviewing
60
00:03:23,786 --> 00:03:26,906
and being a math person, I did
my interviews all in Haskell and
61
00:03:26,906 --> 00:03:28,466
didn't know any real programming.
62
00:03:28,909 --> 00:03:33,619
and then yeah, that turned into doing an
internship, dropping out of undergrad and.
63
00:03:34,090 --> 00:03:35,391
just following the dream.
64
00:03:35,391 --> 00:03:38,841
And so I worked on, at Dropbox,
I worked on a bunch of things.
65
00:03:38,841 --> 00:03:40,971
I started off working on
our, like, growth team.
66
00:03:40,971 --> 00:03:43,761
So I did a lot of like email system.
67
00:03:43,761 --> 00:03:47,051
Like I worked on this thing
called the Space Race, like a promotion.
68
00:03:47,361 --> 00:03:48,771
Oh, I remember that.
69
00:03:48,771 --> 00:03:49,161
Yes.
70
00:03:49,161 --> 00:03:53,631
I think I've, I've earned quite a lot
of like free storage, which I think
71
00:03:53,631 --> 00:03:55,701
over the time has like gone down.
72
00:03:56,001 --> 00:03:58,641
But that was a very smart
and effective mechanism.
73
00:03:58,641 --> 00:04:01,371
I surely invited all my friends back then.
74
00:04:01,371 --> 00:04:05,901
I couldn't afford a premium plan
being a broke student, so that worked.
75
00:04:07,381 --> 00:04:11,381
And then from there worked on
the sync engine for some time.
76
00:04:11,411 --> 00:04:15,961
And then right now I'm the co-founder and
chief scientist of a startup called Convex
77
00:04:16,141 --> 00:04:20,251
and my three co-founders and I met working
on this project called Magic Pocket,
78
00:04:20,251 --> 00:04:25,474
where Dropbox stores hundreds of petabytes,
now exabytes, of files for users.
79
00:04:25,474 --> 00:04:26,854
And we used to do that in S3.
80
00:04:27,064 --> 00:04:32,284
And so the three of us worked together on
a team to build Amazon S3, but in-house
81
00:04:32,284 --> 00:04:33,994
and migrate all of the data over.
82
00:04:34,364 --> 00:04:39,116
so we did that for a few years and then
worked on rewriting the entirety of
83
00:04:39,116 --> 00:04:42,866
Dropbox, the sync engine, the thing that
runs on all of our desktop computers.
84
00:04:43,139 --> 00:04:46,919
we rewrote it to be really correct
and scalable and very flexible.
85
00:04:47,279 --> 00:04:48,369
and shipped that.
86
00:04:48,650 --> 00:04:52,673
After that, I left Dropbox in 2020, and I
was trying to decide if I wanted
87
00:04:52,673 --> 00:04:54,353
to get back to academics or not.
88
00:04:54,353 --> 00:04:59,513
So I did some research in networking
and then decided to start Convex in 2021.
89
00:04:59,843 --> 00:05:03,531
Certainly curious, which sort of
research has had your interest the
90
00:05:03,531 --> 00:05:07,611
most in this sort of transitionary
period, but maybe we stash that
91
00:05:07,672 --> 00:05:11,494
for a moment and go back to the
beginning when you joined Dropbox.
92
00:05:11,805 --> 00:05:15,451
you mentioned there were around a
hundred people working there at the time.
93
00:05:15,728 --> 00:05:20,078
how do I need to imagine the technology
behind Dropbox at this point?
94
00:05:20,378 --> 00:05:26,714
it clearly started out with, like, a
desktop-focused daemon project,
95
00:05:27,054 --> 00:05:33,204
like a daemon process that's running on your
machine, somehow keeps track of the files
96
00:05:33,294 --> 00:05:36,774
on your system and then applies the magic.
97
00:05:37,044 --> 00:05:43,138
So explain to me how things worked
back then and what was it like to
98
00:05:43,138 --> 00:05:46,258
work at Dropbox when there were
around about a hundred people.
99
00:05:46,888 --> 00:05:49,678
Yeah, I mean, it was
pretty magical, right?
100
00:05:49,678 --> 00:05:54,058
Because the company had, I think gotten
so many things right on the product side
101
00:05:54,058 --> 00:05:55,968
and then those showed up in technology.
102
00:05:55,968 --> 00:06:00,238
But just this feeling of like Dropbox
being this product that just worked right?
103
00:06:00,238 --> 00:06:01,468
It was for everyone.
104
00:06:01,738 --> 00:06:05,758
It was not just for technologists, but
anyone should be able, anyone who's
105
00:06:05,758 --> 00:06:09,478
comfortable using a computer should
be able to install Dropbox and have
106
00:06:09,538 --> 00:06:12,118
a folder of theirs become magical.
107
00:06:12,508 --> 00:06:15,748
And without understanding anything
about how it works, they should
108
00:06:15,748 --> 00:06:19,198
just think of it as like an
extension of what they know already.
109
00:06:19,468 --> 00:06:19,708
yeah.
110
00:06:19,708 --> 00:06:22,948
And so like the ways that that showed
up I think were really interesting.
111
00:06:22,948 --> 00:06:26,048
At the time there was a very strong
culture of like reverse engineering.
112
00:06:26,678 --> 00:06:29,998
So to have this daemon that runs locally.
113
00:06:30,238 --> 00:06:34,348
You know, one of the amazing
early moments in Dropbox was that,
114
00:06:34,564 --> 00:06:38,614
if, like, you open up Finder or Explorer
and you have the overlays on it.
115
00:06:39,094 --> 00:06:43,294
Like that used to be done by
like attaching to the Finder
116
00:06:43,294 --> 00:06:45,124
process and injecting code into it
117
00:06:47,854 --> 00:06:51,994
and to the point where, uh, when some
folks had gone to talk to Apple at the
118
00:06:51,994 --> 00:06:57,364
time and about like working with the
file system and everything like the,
119
00:06:57,874 --> 00:07:02,584
there were teams at Apple that asked
Dropbox, how did you do that in Finder?
120
00:07:05,374 --> 00:07:08,674
So you wanted to offer the
most native experience.
121
00:07:08,674 --> 00:07:11,044
There weren't the necessary APIs for that.
122
00:07:11,314 --> 00:07:12,724
And so you just made it happen.
123
00:07:12,754 --> 00:07:13,444
That's amazing.
124
00:07:13,449 --> 00:07:13,459
Yeah.
125
00:07:14,164 --> 00:07:14,464
Yeah.
126
00:07:14,804 --> 00:07:19,744
And so that that idea of like, how do
you create the best user experience,
127
00:07:19,744 --> 00:07:26,464
something that you know, for the purpose
of making non-technical users feel very
128
00:07:26,464 --> 00:07:28,654
confident and feel very safe using it.
129
00:07:28,834 --> 00:07:32,584
That was another, I think, really
deep like company value of like
130
00:07:32,584 --> 00:07:36,544
being worthy of trust and taking
people's files very seriously.
131
00:07:36,604 --> 00:07:39,544
You know, I like remember having a
friend who was in residency at the
132
00:07:39,544 --> 00:07:44,314
time and he was telling me that he
keeps all of his, like some of his non-
133
00:07:44,314 --> 00:07:50,014
HIPAA stuff, but like his things that
he looks at on Dropbox and you know,
134
00:07:50,014 --> 00:07:51,754
pulls them up and he's consulting 'em.
135
00:07:51,754 --> 00:07:54,374
And there's a part of me which
is terrified by that, right?
136
00:07:54,374 --> 00:07:57,934
Like we think of software as
something where like throwing a 500
137
00:07:57,934 --> 00:07:59,554
error is fine every once in a while.
138
00:08:00,004 --> 00:08:03,054
And at Dropbox there was
a culture of making users feel
139
00:08:03,054 --> 00:08:04,284
like they could really trust us.
140
00:08:04,314 --> 00:08:08,274
And then that showed up for things
like making sure that, like when
141
00:08:08,274 --> 00:08:11,604
we give feedback to users, if we
put that green overlay in Finder.
142
00:08:12,189 --> 00:08:16,689
They know that no matter what happens,
they could throw their laptop in a pool.
143
00:08:16,689 --> 00:08:20,739
They could like they, anything could
happen and their files are safe.
144
00:08:20,869 --> 00:08:24,609
Like if their house burns down, they
don't have to worry about that thing.
145
00:08:24,879 --> 00:08:29,469
And that's like all of that reverse
engineering and all of the emphasis
146
00:08:29,469 --> 00:08:31,269
on correctness and durability.
147
00:08:31,479 --> 00:08:34,209
It was all in service of that feeling,
which I think was really cool.
148
00:08:34,763 --> 00:08:38,273
so on the engineering side, at the
time it was like in hyper growth mode.
149
00:08:38,273 --> 00:08:40,793
So they had a Python desktop client.
150
00:08:40,973 --> 00:08:43,433
Almost all of Dropbox was
in Python at the time.
151
00:08:43,793 --> 00:08:49,943
And so there's a pre-mypy, like, big
rapidly changing desktop client that
152
00:08:50,283 --> 00:08:53,693
needed to support Mac, Windows, and Linux
and all these different file systems.
153
00:08:53,963 --> 00:08:58,313
and then on the server, it was like we
had one big server called Meta Server,
154
00:08:58,646 --> 00:08:59,936
meta, I think was from metadata.
155
00:09:00,243 --> 00:09:03,513
and that like ran almost all of Dropbox.
156
00:09:03,573 --> 00:09:06,223
We stored the metadata in MySQL.
157
00:09:06,673 --> 00:09:11,463
The files were stored in S3, and then
we had a separate notification server
158
00:09:11,463 --> 00:09:13,713
for managing pushes and things like that.
159
00:09:13,923 --> 00:09:17,433
And so it was like kind of classic
architecture, and it was
160
00:09:17,463 --> 00:09:20,403
starting to reach the limits of
its scaling even at that time.
161
00:09:20,913 --> 00:09:24,106
And, those were a lot of the things
we worked on over the next 10 years.
162
00:09:24,616 --> 00:09:25,126
Wow.
163
00:09:25,366 --> 00:09:27,766
So was the server also written in Python?
164
00:09:27,766 --> 00:09:29,566
So it was all one big python shop.
165
00:09:30,076 --> 00:09:30,436
Yeah.
166
00:09:30,886 --> 00:09:32,836
And the server was all written in Python.
167
00:09:33,373 --> 00:09:39,043
we, had some pretty funny bugs
that were due to it's kind of
168
00:09:39,043 --> 00:09:40,213
crazy to think about it now.
169
00:09:40,213 --> 00:09:44,743
You know, you working in TypeScript
full time, and to think of, like,
170
00:09:44,743 --> 00:09:48,433
back in the day we just had these like
hundreds of thousands, millions of lines
171
00:09:48,433 --> 00:09:54,383
of code with no type safety and with
all types of crazy meta programming and
172
00:09:55,053 --> 00:09:57,553
decorators and meta classes and stuff.
173
00:09:57,553 --> 00:10:00,193
And yeah, so it was
all in Python when I showed up.
174
00:10:00,235 --> 00:10:04,453
It was not all in Python and not all in
one big monolithic service when I left.
175
00:10:04,814 --> 00:10:09,644
So you mentioned joining when there
were around a hundred people and you
176
00:10:09,644 --> 00:10:14,894
probably already at this point had
like multitudes more in terms of users.
177
00:10:15,274 --> 00:10:21,004
Being in hypergrowth, it is sort of this
race against time where you only have
178
00:10:21,004 --> 00:10:26,344
so much time to work on something, but
growth may be outrunning you already and
179
00:10:26,344 --> 00:10:28,624
things are already starting to break.
180
00:10:28,624 --> 00:10:33,514
Or, you know, like, okay, if things are
gonna grow like this, this system will
181
00:10:33,514 --> 00:10:36,508
break and it's gonna be pretty bad.
182
00:10:36,808 --> 00:10:42,171
So tell me more about how you were
dealing with, like, the constant
183
00:10:42,321 --> 00:10:48,778
race against time to rebuild systems,
redesign systems, putting out fires.
184
00:10:49,018 --> 00:10:49,948
What was that like?
185
00:10:50,224 --> 00:10:53,374
Yeah, and I think there's like kind of
an interesting place to take this on.
186
00:10:53,374 --> 00:10:56,584
I think like the normal things
were on scale, right?
187
00:10:56,584 --> 00:10:57,274
Those were, like,
188
00:10:57,619 --> 00:11:00,416
one kinda class of problems, of
being able to handle the load.
189
00:11:00,626 --> 00:11:04,849
But I think one kind of really
interesting, dimension of this that led
190
00:11:04,849 --> 00:11:09,829
to our decision to start rewriting all
of the sync engine in 2016 was actually
191
00:11:09,829 --> 00:11:11,749
just like customer debugging load.
192
00:11:12,619 --> 00:11:17,449
You know, we had hundreds of
millions of active users and they were
193
00:11:17,539 --> 00:11:20,149
using Dropbox in all types of crazy ways.
194
00:11:20,389 --> 00:11:24,019
Like one of the stories is someone
was using Dropbox with like, I think
195
00:11:24,019 --> 00:11:27,559
it was running on some, I don't know
if it was like a Raspberry Pi or
196
00:11:27,559 --> 00:11:28,849
something, something on his tractor.
197
00:11:28,879 --> 00:11:32,749
Like the guy ran a farm and he
was using Dropbox to sync like
198
00:11:32,749 --> 00:11:35,089
pads in text files to his tractor.
199
00:11:35,533 --> 00:11:37,633
And I might be getting some
of the details wrong, but
200
00:11:37,633 --> 00:11:38,353
it's something like that.
201
00:11:38,353 --> 00:11:43,243
And so people would just use Dropbox
in all types of crazy ways on crazy
202
00:11:43,243 --> 00:11:47,913
file systems with kernel modules
running that are messing things around
203
00:11:47,913 --> 00:11:52,251
or so I think, You know, in terms of
getting ahead of scale, I think we found
204
00:11:52,251 --> 00:11:58,644
ourselves around 2015, 2016, in the
place where for the sync engine on the
205
00:11:58,644 --> 00:12:03,864
desktop client, the entire team just
spent all of its time debugging issues.
206
00:12:04,644 --> 00:12:08,934
We had this principle of like anything
that's possible, anything that a
207
00:12:08,934 --> 00:12:13,254
protocol allows, any
threading race condition that's
208
00:12:13,254 --> 00:12:15,864
theoretically possible will be possible.
209
00:12:16,404 --> 00:12:17,934
And then we would see it, right?
210
00:12:17,934 --> 00:12:20,514
Like users would write in
saying, my files aren't syncing.
211
00:12:20,814 --> 00:12:24,414
And then we would look at it and we would
spend months debugging each one of these
212
00:12:24,414 --> 00:12:30,218
issues and trying to read the tea leaves
from traces and reports and reproductions.
213
00:12:30,218 --> 00:12:33,878
And it'll be like, oh they
mounted this file system over here
214
00:12:33,878 --> 00:12:36,278
and then this one and this one
are in a different file system.
215
00:12:36,278 --> 00:12:40,188
So moving the file actually did
a copy, but then the xattrs
216
00:12:40,188 --> 00:12:42,138
weren't preserved, this and that.
217
00:12:42,478 --> 00:12:46,168
You know, in terms of that theme of like
getting ahead of scale, like I think there
218
00:12:46,168 --> 00:12:51,238
was first this realization that like the
set of possible things that can happen in
219
00:12:51,238 --> 00:12:54,508
the system is just astronomically large.
220
00:12:54,598 --> 00:12:57,298
And all of them will happen
if they're allowed to.
221
00:12:57,718 --> 00:13:01,498
And we do not have, no matter
how much like incremental time
222
00:13:01,498 --> 00:13:04,798
we put into debugging things, we
will never be able to keep up.
223
00:13:05,128 --> 00:13:08,188
And the cost of doing that is
that the entire team is working
224
00:13:08,188 --> 00:13:09,628
on maintenance like this.
225
00:13:09,628 --> 00:13:11,098
We couldn't build any new features.
226
00:13:11,578 --> 00:13:15,958
So I think that was a motivation then for
the rewrite: can we find like points
227
00:13:15,958 --> 00:13:20,758
of leverage where if we just invest a
little bit in technology upfront, like by
228
00:13:20,758 --> 00:13:25,768
architecting things a particular way, can
we just eliminate a much bigger set of
229
00:13:25,768 --> 00:13:29,638
potential work from debugging and working
with customers and stuff like that.
230
00:13:29,974 --> 00:13:33,974
So maybe this is a good time
to take a step back and try to
231
00:13:33,974 --> 00:13:38,354
better understand what the Dropbox
sync engine actually was back then?
232
00:13:38,654 --> 00:13:45,041
So from just thinking about it through
like a user's perspective, I have maybe
233
00:13:45,041 --> 00:13:48,094
two computers, and I have files over here.
234
00:13:48,094 --> 00:13:53,179
I want to make sure that I have the
files synced over from here to here.
235
00:13:53,569 --> 00:13:59,299
So I could now think about this as
sort of like a Git-style approach.
236
00:13:59,539 --> 00:14:01,219
Maybe there's other ways as well.
237
00:14:01,489 --> 00:14:05,329
walk me through sort of like through the
solution space, how this could have been
238
00:14:05,389 --> 00:14:07,459
approached and how was it approached?
239
00:14:07,592 --> 00:14:12,032
is there some sort of like diffing
involved between different file states
240
00:14:12,032 --> 00:14:14,222
over time that are being synced around?
241
00:14:14,462 --> 00:14:17,792
Do you sync around the
actual file content itself?
242
00:14:18,062 --> 00:14:19,142
Help me to understand.
243
00:14:19,142 --> 00:14:24,752
Building a mental model, what does it mean
back then for the sync engine to work?
244
00:14:25,187 --> 00:14:25,697
Yeah.
245
00:14:25,757 --> 00:14:26,087
Yeah.
246
00:14:26,207 --> 00:14:28,427
It's a super interesting question, right?
247
00:14:28,427 --> 00:14:31,517
Because I think like you're saying,
there's so many different paths one
248
00:14:31,517 --> 00:14:34,667
can take and it's, I think one of
those things where like if someone
249
00:14:34,667 --> 00:14:37,307
asks, like design Dropbox in an
interview question, there's like
250
00:14:37,517 --> 00:14:39,767
definitely not one right answer, right?
251
00:14:39,797 --> 00:14:44,417
It's like there are so many trade-offs and
like different forks in the decision tree.
252
00:14:44,777 --> 00:14:48,767
I think one of the first things is
that, so you have your desktop A and you
253
00:14:48,767 --> 00:14:52,547
have your, maybe you have your desktop
and your laptop, and one of the first
254
00:14:52,547 --> 00:14:55,877
decisions for Dropbox is that we would
have a central server in the middle,
255
00:14:56,417 --> 00:15:01,097
that there would be a Dropbox file system
in the middle that Dropbox, the company
256
00:15:01,097 --> 00:15:05,897
ran, and we did that from this trust
perspective, we wanted to say that we
257
00:15:05,897 --> 00:15:10,217
will run this infallibly: when you get
that green check mark, it's there.
258
00:15:11,177 --> 00:15:15,077
You know, even if an asteroid destroys
the eastern side of the United
259
00:15:15,077 --> 00:15:17,737
States, like we will have things
replicated in multiple data centers.
260
00:15:18,267 --> 00:15:22,367
And that you know, and then
also it's accessible anywhere
261
00:15:22,367 --> 00:15:23,027
on the internet, right?
262
00:15:23,027 --> 00:15:24,287
You can go to the library.
263
00:15:24,347 --> 00:15:26,897
This is not so common these days but
I remember when I was a student, like,
264
00:15:26,897 --> 00:15:29,747
go to the library, log into Dropbox
and read all your things right?
265
00:15:30,030 --> 00:15:31,800
rather than having to
bring a USB stick around.
266
00:15:32,180 --> 00:15:36,570
And so I think that is the first
decision, but it's not necessary, right?
267
00:15:36,570 --> 00:15:39,450
Like there were plenty of
distributed, entirely peer to
268
00:15:39,450 --> 00:15:42,090
peer file syncing, designs, right?
269
00:15:42,420 --> 00:15:44,700
And so that was the first decision.
270
00:15:44,970 --> 00:15:48,680
And I think the kind of second decision
was that if we imagine our desktop and
271
00:15:48,680 --> 00:15:52,760
our laptop and you have the server in
the middle, the desktop might be on
272
00:15:52,760 --> 00:15:55,760
Windows, the laptop might be on Mac OS.
273
00:15:56,030 --> 00:15:59,360
So I think that decision to
support multiple platforms.
274
00:15:59,705 --> 00:16:01,745
Is like another really interesting one.
275
00:16:02,105 --> 00:16:05,855
This is like where I think Git and
Dropbox can be a little bit different.
276
00:16:06,065 --> 00:16:09,395
And that Git is at the end of
the day quite Linux centric.
277
00:16:09,605 --> 00:16:11,965
It's case sensitive for its file system.
278
00:16:12,275 --> 00:16:15,875
It deals with directories and it
makes particular assumptions about
279
00:16:15,875 --> 00:16:17,195
how directories should behave.
280
00:16:17,555 --> 00:16:19,335
And that was something with Dropbox.
281
00:16:19,335 --> 00:16:22,575
We wanted to be consumer, we wanted
to support everything and we wanted
282
00:16:22,575 --> 00:16:24,225
it to feel very automatic, right?
283
00:16:24,225 --> 00:16:28,095
That like, someone shouldn't have to
understand, like, what a Unicode
284
00:16:28,095 --> 00:16:29,895
normalization disagreement means.
285
00:16:29,895 --> 00:16:30,285
Right?
286
00:16:30,495 --> 00:16:34,275
Where in Git like in really bad settings,
like you might have to understand
287
00:16:34,275 --> 00:16:38,205
that you write a u with an
accent differently on Mac and Windows.
288
00:16:38,732 --> 00:16:40,555
so I think that's the
kind of like next side of it.
289
00:16:40,555 --> 00:16:43,645
So then Dropbox has its design
for a file system and it's a
290
00:16:43,645 --> 00:16:47,925
central; it's like the hub, and all
the spokes are your phone, your
291
00:16:48,258 --> 00:16:49,925
desktop, your laptop and whatnot.
292
00:16:50,385 --> 00:16:53,288
and then so to kind of get
down to the details a bit more.
293
00:16:53,468 --> 00:16:56,618
So then, yeah, we have a process that
runs on your computer, that's the
294
00:16:56,618 --> 00:17:02,528
Dropbox app, and that watches all of
the files on your file system, and then
295
00:17:02,528 --> 00:17:07,088
it looks at what's happened and then
syncs them up to the Dropbox server.
296
00:17:07,268 --> 00:17:10,448
And then whenever changes happen on
the Dropbox server, it syncs them down.
297
00:17:11,062 --> 00:17:15,112
there's another kind of interesting
decision here: Dropbox by
298
00:17:15,112 --> 00:17:17,632
design was always like a sidecar.
299
00:17:17,632 --> 00:17:20,782
It's always something that just
sits and it looks at your files.
300
00:17:20,782 --> 00:17:23,242
Your files are just regular
files on the file system.
301
00:17:23,602 --> 00:17:28,215
And if Dropbox, the app isn't running,
your files are there and they're safe,
302
00:17:28,215 --> 00:17:32,445
and it's something that you know,
regular apps can just read and write
303
00:17:32,445 --> 00:17:37,795
to, and in some sense like Dropbox
was unintentionally local-first
304
00:17:37,815 --> 00:17:39,315
from that perspective, right?
305
00:17:39,345 --> 00:17:42,465
Because it's saying that no
matter what happens, your data
306
00:17:42,465 --> 00:17:43,725
is just there and you own it.
307
00:17:44,257 --> 00:17:46,957
and you know, there are
other systems, right?
308
00:17:46,957 --> 00:17:52,597
Like if you use NFS, like a network
file system, then if you unmount it or
309
00:17:52,597 --> 00:17:53,987
if you lose connection to the server.
310
00:17:54,657 --> 00:17:58,447
You might not be able to actually open
any files that you have the metadata for.
311
00:17:58,897 --> 00:17:59,227
Right.
312
00:17:59,227 --> 00:18:04,493
And I remember from a user perspective,
the local-first aspect, I really went
313
00:18:04,513 --> 00:18:08,083
through like all the stages where I
had a computer that wasn't connected
314
00:18:08,083 --> 00:18:11,983
to the internet yet, and that at some
point I had an internet connection.
315
00:18:12,313 --> 00:18:16,957
But, files were always where like
everything depended on files.
316
00:18:16,957 --> 00:18:20,647
Like if I didn't have a
file, things wouldn't work.
317
00:18:20,647 --> 00:18:22,127
Everything depended on files.
318
00:18:22,127 --> 00:18:26,627
There were barely websites
where you could do meaningful things.
319
00:18:26,947 --> 00:18:30,130
certainly web apps
weren't very common yet.
320
00:18:30,640 --> 00:18:35,410
And then Dropbox made everything
seamlessly work together.
321
00:18:35,950 --> 00:18:41,140
And then when web apps and SaaS
software came along more, I was a
322
00:18:41,140 --> 00:18:43,240
bit confused, because I felt: okay,
323
00:18:43,240 --> 00:18:48,639
it gives me some collaboration, but it seems
to be a different kind of collaboration
324
00:18:48,639 --> 00:18:50,499
since I had collaboration before.
325
00:18:50,889 --> 00:18:56,085
But I also understood the limitations
of, when I'm working on the same doc
326
00:18:56,085 --> 00:19:00,762
file, through Dropbox, which gets
sort of like the first copy, second
327
00:19:00,762 --> 00:19:05,729
copy, third copy, and now I need
to somehow manually reconcile that.
328
00:19:05,789 --> 00:19:08,519
And when I saw Google
Docs for the first time.
329
00:19:09,149 --> 00:19:14,609
That was really like a revelation because,
oh, now we can do this at the same time.
330
00:19:14,609 --> 00:19:19,079
But at the same time, while I saw that,
I still remember the feeling
331
00:19:19,079 --> 00:19:20,969
where, but where are my files?
332
00:19:20,969 --> 00:19:22,409
This is my stuff now.
333
00:19:22,409 --> 00:19:23,459
Where, where is it?
334
00:19:23,909 --> 00:19:29,899
And that trust that you've mentioned
with Dropbox, I felt like I lost some,
335
00:19:30,109 --> 00:19:35,549
some control here and it required a
lot of trust, in those tools that I
336
00:19:35,549 --> 00:19:37,529
started now step by step, embracing.
337
00:19:37,559 --> 00:19:41,279
And frankly, I think a lot of those tools
didn't deserve my trust in hindsight.
338
00:19:41,879 --> 00:19:48,254
I still feel like we've lost something
by no longer being able to like call
339
00:19:48,254 --> 00:19:50,624
the foundation our own in a way.
340
00:19:50,954 --> 00:19:54,764
And I'm still hoping that we kind of
find the best of both worlds where
341
00:19:54,764 --> 00:19:58,634
we get that seamless collaboration
that we now take for granted.
342
00:19:58,634 --> 00:20:00,344
Something like what Figma gives us.
343
00:20:00,682 --> 00:20:06,080
but also the control and just being
ready for whatever happens, that's
344
00:20:06,080 --> 00:20:08,330
something Dropbox gave us out of the box.
345
00:20:08,617 --> 00:20:12,037
I just wanna share this sort of
like anecdote and like almost
346
00:20:12,037 --> 00:20:15,817
emotional confusion as I walk
through those different stages
347
00:20:16,117 --> 00:20:17,997
of how we work with software.
348
00:20:18,837 --> 00:20:19,257
Totally.
349
00:20:19,257 --> 00:20:22,617
And we've ended up in a place
that's not great in a lot of ways.
350
00:20:22,617 --> 00:20:22,977
Right.
351
00:20:22,977 --> 00:20:27,867
And I think you know, I think part
of the sad thing, and maybe from
352
00:20:27,897 --> 00:20:32,907
even like an operating systems design
perspective is that I feel like files
353
00:20:32,907 --> 00:20:35,007
have lots of design decisions that are
354
00:20:35,472 --> 00:20:36,672
packaged up together.
355
00:20:36,972 --> 00:20:39,972
You know, like one of the amazing
things about files is that
356
00:20:39,972 --> 00:20:41,322
they're self-contained, right?
357
00:20:41,502 --> 00:20:44,712
Like on Google, I don't know what
Google's backend looks like for Google
358
00:20:44,712 --> 00:20:49,322
Docs, but they probably have like all
of the metadata and pieces of the data
359
00:20:49,322 --> 00:20:53,622
spread across different rows in a database
and different things in an object store.
360
00:20:53,922 --> 00:20:57,192
And just even thinking about like
the physical implementation of that
361
00:20:57,192 --> 00:21:00,522
data, it's like scattered around
probably a bunch of servers, right?
362
00:21:00,522 --> 00:21:01,842
Maybe in different data centers.
363
00:21:02,172 --> 00:21:05,712
And there's something really nice
about a file where a file is just
364
00:21:05,712 --> 00:21:08,292
like a piece of state, right?
365
00:21:08,292 --> 00:21:09,552
That is just self-contained.
366
00:21:09,912 --> 00:21:13,632
And I think one of the things
that I think is very
367
00:21:13,632 --> 00:21:18,042
unfortunate, like from an operating
systems perspective, is that that decision
368
00:21:18,042 --> 00:21:24,312
has also been coupled with a very
anemic API: with files, they're just
369
00:21:24,342 --> 00:21:30,072
sequences of bytes that can be read
and written to and appended, and there's
370
00:21:30,072 --> 00:21:31,992
no additional structure beyond that.
371
00:21:32,532 --> 00:21:33,452
And I think, like,
372
00:21:33,782 --> 00:21:37,882
the way that things have
evolved is that, to
373
00:21:37,902 --> 00:21:41,082
have more structure, to make things
like Google Docs, to be able to
374
00:21:41,082 --> 00:21:46,032
reconcile and have collaboration and
interpret things as more than just bytes,
375
00:21:46,302 --> 00:21:49,092
we've also given up this ability
to package things together.
376
00:21:49,585 --> 00:21:53,873
Mac OS had like a very kind of
baby step in this direction with I
377
00:21:53,873 --> 00:21:54,803
think they're called bundles.
378
00:21:54,833 --> 00:21:57,863
Like the things where like if
you have like your .app, they're
379
00:21:57,863 --> 00:21:59,363
actually a zip file, right?
380
00:21:59,610 --> 00:22:03,510
And there's all types of ways, all
types of brain damage for how this
381
00:22:03,510 --> 00:22:05,130
like, doesn't actually work well.
382
00:22:05,130 --> 00:22:05,670
You know?
383
00:22:05,670 --> 00:22:07,740
But the idea is kind
of interesting, right?
384
00:22:07,740 --> 00:22:10,950
It's like what if files had some
more structure and what if you still
385
00:22:10,950 --> 00:22:15,300
considered something, an atomic unit,
but then it had pieces of it that
386
00:22:15,300 --> 00:22:17,100
weren't just uninterpretable bytes.
387
00:22:17,400 --> 00:22:20,705
And I think that's like, the
path dependent, way that we've
388
00:22:20,705 --> 00:22:21,665
ended up where we are today.
389
00:22:22,295 --> 00:22:23,075
That makes sense.
390
00:22:23,165 --> 00:22:28,525
So going back to the sync engine
implementation, did the Python process
391
00:22:28,525 --> 00:22:34,195
back in the day mostly index
all of the files and then actually
392
00:22:34,218 --> 00:22:39,394
send across the actual bytes, probably
in some chunks, across the wire?
393
00:22:39,394 --> 00:22:45,468
Or was there some more intelligent
diffing happening client side, so
394
00:22:45,588 --> 00:22:50,601
that you would only send kind of the
changes across the wire and how do I
395
00:22:50,601 --> 00:22:55,341
need to think about what is a change
when I'm dealing with like a ton of
396
00:22:55,341 --> 00:22:57,561
bytes before and a ton of bytes after?
397
00:22:57,924 --> 00:22:58,074
Yeah.
398
00:22:58,074 --> 00:22:59,364
Those are really, really good questions.
399
00:22:59,364 --> 00:23:03,784
I think maybe like the first
starting point is that like files
400
00:23:03,784 --> 00:23:07,464
in Dropbox were stored, just broken
up into four megabyte chunks.
401
00:23:07,734 --> 00:23:10,764
And that was just a decision at the
very beginning to pick some size.
402
00:23:11,394 --> 00:23:15,384
And on the server, the way that those
chunks were stored is that they,
403
00:23:15,414 --> 00:23:20,364
each four megabyte chunk was stored
keyed by its SHA-256 hash.
404
00:23:20,764 --> 00:23:22,834
So we would assume that
those are globally unique.
405
00:23:23,074 --> 00:23:27,514
So then if you had the same copy
of a file in a bunch of places, or you had
406
00:23:27,514 --> 00:23:30,964
a file copied many times in your
Dropbox, we would only store it once.
407
00:23:31,504 --> 00:23:34,654
And that would just happen
organically because we would say
408
00:23:34,654 --> 00:23:38,974
like, okay, I looked at this file,
it has three chunks A, B, and C.
409
00:23:39,364 --> 00:23:42,964
And then the client would ask the
server, do you have A, B, and C?
410
00:23:43,294 --> 00:23:47,734
Like the server would say, yes, I have
B and C already, please send A, then we
411
00:23:47,734 --> 00:23:52,098
would upload A. So there was already, like,
at the file level, this
412
00:23:52,098 --> 00:23:55,728
kind of very coarse-grained delta sync
413
00:23:56,071 --> 00:23:57,708
at the four megabyte chunk layer.
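To make the chunking and upload handshake described here concrete, below is a minimal sketch in Python. The 4 MB chunk size and SHA-256 keying come from the description above; the `server` object and its `missing_chunks`, `put_chunk`, and `commit_file` methods are hypothetical stand-ins for the metadata service API, not Dropbox's actual interface.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed 4 MB chunks, as described above


def chunk_hashes(path):
    """Split a file into 4 MB chunks and key each one by its SHA-256 hash."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def upload_file(path, server):
    """Ask the server which chunks it is missing, then upload only those.

    `server` is a hypothetical client for the metadata service exposing
    missing_chunks(hashes) -> set of hashes, put_chunk(hash, data), and
    commit_file(path, hashes).
    """
    hashes = chunk_hashes(path)
    missing = server.missing_chunks(hashes)    # e.g. "I have B and C, send A"
    with open(path, "rb") as f:
        for h in hashes:
            chunk = f.read(CHUNK_SIZE)
            if h in missing:
                server.put_chunk(h, chunk)     # only the missing bytes go over the wire
    # The file itself is then committed as an ordered list of chunk hashes.
    server.commit_file(path, hashes)
```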
414
00:23:58,231 --> 00:24:01,748
and then the kind of, it's funny,
these things evolve, right?
415
00:24:01,748 --> 00:24:05,228
Like then the next thing we layered on
up top was that in that setting where
416
00:24:05,228 --> 00:24:09,398
you decided B and C were there already
and you needed to upload A, then with
417
00:24:09,428 --> 00:24:15,308
A, the desktop client could use rsync
to know that there was previously an A
418
00:24:15,308 --> 00:24:19,568
prime and do a patch between the two
and then send just those contents.
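And a deliberately simplified sketch of that second layer: compare the new chunk A against the previous version A′ block by block and ship only the blocks that changed. Real rsync uses a rolling weak checksum plus a strong hash so matches are found at arbitrary offsets; this fixed-offset version only illustrates the "send a patch, not the whole chunk" idea, and the block size is arbitrary.

```python
import hashlib

BLOCK = 64 * 1024  # illustrative block size within a chunk


def _block_hashes(data):
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def make_patch(old, new):
    """Return (offset, bytes) pairs for blocks of `new` that differ from `old`."""
    old_hashes = _block_hashes(old)
    patch = []
    for offset in range(0, len(new), BLOCK):
        block = new[offset:offset + BLOCK]
        i = offset // BLOCK
        if i >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[i]:
            patch.append((offset, block))
    return patch


def apply_patch(old, patch, new_len):
    """Rebuild the new chunk from the old one plus the shipped blocks."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, block in patch:
        out[offset:offset + len(block)] = block
    return bytes(out)


# apply_patch(a_prime, make_patch(a_prime, a), len(a)) == a
```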
419
00:24:19,918 --> 00:24:23,578
the kind of thing that was pretty
interesting is that a lot of the content
420
00:24:23,578 --> 00:24:29,588
on Dropbox was very incompressible
stuff like video, images, so the
421
00:24:29,784 --> 00:24:34,314
benefits of deduplication both
across users or even within a user.
422
00:24:34,524 --> 00:24:39,984
And the benefit of like rsync was not
actually as much as one might think,
423
00:24:40,434 --> 00:24:43,824
at least in, like, terms of
bandwidth going through the system.
424
00:24:43,854 --> 00:24:47,534
It wasn't that reductive because a lot of
this content was just kind of unique and
425
00:24:48,104 --> 00:24:50,364
not getting updated in small patches.
426
00:24:51,429 --> 00:24:56,559
And on your server side, blob store, now
that you had those hashes for those four
427
00:24:56,559 --> 00:25:02,619
megabyte chunks, that also means that you
could probably deduplicate some content
428
00:25:02,679 --> 00:25:08,979
across users, which makes me think of
all sorts of other implications of that.
429
00:25:09,069 --> 00:25:12,369
When do you know it's
safe to let go of a chunk?
430
00:25:12,736 --> 00:25:16,876
do you also now know that, you
could kind of go backwards and
431
00:25:16,876 --> 00:25:20,506
say like, oh, from this hash, we
know this is sensitive content.
432
00:25:20,986 --> 00:25:25,993
And have some further implications
for whatever. We don't need to go too
433
00:25:25,993 --> 00:25:28,659
much into depth on that now, but, yeah.
434
00:25:28,659 --> 00:25:32,259
I'm curious like how you thought
of those design decisions and
435
00:25:32,259 --> 00:25:33,549
the possible implications.
436
00:25:34,119 --> 00:25:34,479
Yeah.
437
00:25:34,539 --> 00:25:38,289
Yeah, for the first one yeah,
like distributed garbage collection
438
00:25:38,289 --> 00:25:39,759
was a very hard problem for us.
439
00:25:39,819 --> 00:25:44,349
We called it vacuuming, and in terms
of making Dropbox economics work out,
440
00:25:44,349 --> 00:25:48,963
like, we couldn't afford to
keep a lot of content that was deleted
441
00:25:48,963 --> 00:25:50,283
that we couldn't charge users for.
442
00:25:50,583 --> 00:25:54,453
So that was you know, there's all
additional complexity where different
443
00:25:54,453 --> 00:25:58,389
users would have like the ability to
restore for different periods of time.
444
00:25:58,509 --> 00:26:01,689
So we would say like, anything
that's deleted, it doesn't actually
445
00:26:01,689 --> 00:26:05,199
get deleted for 30 days or a year
or whatnot based on their plan.
446
00:26:05,583 --> 00:26:09,813
so then, yeah, like having to do
this like big distributed mark and
447
00:26:09,813 --> 00:26:14,043
sweep garbage collection algorithm
across hundreds of petabytes,
448
00:26:14,043 --> 00:26:18,243
exabytes of content that was something
that we had to get pretty good at.
449
00:26:18,243 --> 00:26:23,006
And when we designed Magic Pocket,
where we, implemented S3 in-house, we
450
00:26:23,006 --> 00:26:28,226
had specific primitives for making it a
little bit easier to avoid race conditions
451
00:26:28,226 --> 00:26:31,016
where like, if a file was deleted.
452
00:26:31,961 --> 00:26:34,601
And we decided that no
one needed it anymore.
453
00:26:34,631 --> 00:26:38,241
But then just at that point in time,
someone uploads it again, making sure
454
00:26:38,241 --> 00:26:40,421
that we don't accidentally delete it.
455
00:26:40,781 --> 00:26:43,481
So that was like, yeah,
definitely a very tricky problem.
456
00:26:43,531 --> 00:26:48,614
And I think in retrospect this is like
an interesting design exercise, right?
457
00:26:48,614 --> 00:26:52,784
And that if deduplication wasn't actually
that valuable for us, we could have
458
00:26:52,934 --> 00:26:57,464
eliminated a lot of complexity for this
garbage collection by not doing it, right?
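A rough sketch of the vacuuming shape being described: a mark-and-sweep pass over chunk hashes with a retention window, plus a last-moment recheck so a concurrent re-upload does not lose its chunk. The `block_store` and `metadata_db` interfaces are hypothetical; the real Magic Pocket primitives are only loosely paraphrased here.

```python
import time

RETENTION_SECONDS = 30 * 24 * 3600  # e.g. a 30-day restore window; plan-dependent


def vacuum(block_store, metadata_db, now=None):
    """Sweep chunks that have been unreferenced for longer than the retention
    window, re-checking the reference right before deletion to avoid racing a
    concurrent re-upload of the same hash.

    Hypothetical interfaces:
      block_store.list_chunks() -> iterable of chunk hashes
      metadata_db.unreferenced_since(h) -> timestamp, or None if still referenced
      block_store.delete_if(h, predicate) -> atomically delete when predicate() holds
    """
    now = now or time.time()
    for chunk_hash in block_store.list_chunks():
        since = metadata_db.unreferenced_since(chunk_hash)
        if since is None:
            continue                          # still referenced by some file version
        if now - since < RETENTION_SECONDS:
            continue                          # a user could still restore this
        # Re-check at delete time: if an upload re-referenced the hash just now,
        # the predicate fails and the chunk survives.
        block_store.delete_if(
            chunk_hash,
            lambda h=chunk_hash: metadata_db.unreferenced_since(h) is not None,
        )
```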
459
00:26:58,001 --> 00:26:59,671
I think for the second thing, yeah.
460
00:26:59,671 --> 00:27:06,731
So at the beginning when Dropbox started,
if you had a file with A, B and C and you
461
00:27:06,731 --> 00:27:10,831
uploaded it, it would just check, does
A, B and C exist anywhere in Dropbox?
462
00:27:11,351 --> 00:27:16,958
And, that got changed over time
to be: do you, as the user,
463
00:27:17,348 --> 00:27:18,948
have access to A, B, and C?
464
00:27:19,448 --> 00:27:24,143
And you know, 'cause otherwise you could
use this for all types of purposes, right?
465
00:27:24,143 --> 00:27:27,583
To see if there exists some
content anywhere in Dropbox.
466
00:27:27,613 --> 00:27:32,573
And, that was something where we
would in the case where the user was
467
00:27:32,573 --> 00:27:38,033
uploading A, B, and C, say none of them
were present in their account, we would
468
00:27:38,033 --> 00:27:42,833
actually force them to upload it, incur
the bandwidth for doing so, and then
469
00:27:42,833 --> 00:27:45,173
discard it if B and C existed elsewhere.
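A tiny sketch of how that change reshapes the handshake: the server only answers "already have it" for chunks the uploading user can already see, and otherwise the client pays the upload bandwidth even if the bytes exist elsewhere. The names are illustrative, not Dropbox's actual API.

```python
def chunks_to_upload(user_id, chunk_hashes, metadata_db):
    """Chunks this client must physically send.

    The early version asked "does this hash exist anywhere in Dropbox?";
    the later version asks "does *this user* already have access to it?",
    so the endpoint can't be used as a global existence oracle.
    metadata_db.user_has_chunk(user_id, h) is a hypothetical lookup.
    """
    return [h for h in chunk_hashes
            if not metadata_db.user_has_chunk(user_id, h)]


def store_uploaded_chunk(chunk_hash, data, block_store):
    """The bandwidth is already spent; if the bytes already exist under this
    hash, quietly discard the duplicate instead of revealing that fact."""
    if not block_store.contains(chunk_hash):
        block_store.put(chunk_hash, data)
```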
470
00:27:46,085 --> 00:27:46,345
Yeah.
471
00:27:46,345 --> 00:27:47,091
Very interesting.
472
00:27:47,121 --> 00:27:50,878
I mean, this would be an interesting
rabbit hole just to go down just the
473
00:27:50,878 --> 00:27:54,658
kind of second order effects of that
design decision, particularly at
474
00:27:54,658 --> 00:27:56,783
the scale and importance of Dropbox.
475
00:27:57,083 --> 00:27:59,213
But maybe we save that for another time.
476
00:27:59,513 --> 00:28:04,359
So going back to the sync engine, now that
we have a better understanding of, how it
477
00:28:04,359 --> 00:28:06,999
worked in that shape and form back then.
478
00:28:07,449 --> 00:28:12,219
You've already been mentioning before,
like, as usage went through
479
00:28:12,219 --> 00:28:16,813
the roof, all sorts of different
usage scenarios also expanded.
480
00:28:17,268 --> 00:28:22,749
you had all sorts of more esoteric
ways of using it that you didn't kind of even think
481
00:28:22,809 --> 00:28:25,209
before that it would be used this way.
482
00:28:25,239 --> 00:28:27,369
Now all of that came to light.
483
00:28:28,099 --> 00:28:33,216
I'm curious which sort of, helper
systems you put in place that you could
484
00:28:33,216 --> 00:28:39,446
even have a grasp of what's going on
since a part of the trust that Dropbox
485
00:28:39,586 --> 00:28:44,476
owned, or that it earned over time, was
probably also related to privacy.
486
00:28:44,716 --> 00:28:49,126
So you, you couldn't just like read
everything that's going on in someone's
487
00:28:49,126 --> 00:28:54,766
system, so you're probably also relying
to some degree on the help of a user
488
00:28:55,036 --> 00:28:57,076
that they like send something over.
489
00:28:57,076 --> 00:28:57,406
Yeah.
490
00:28:57,436 --> 00:29:02,716
Walk me through like the evolution
of that. Because, like, as
491
00:29:02,716 --> 00:29:06,376
an engineer, if there's a bug,
reproducing that bug is everything.
492
00:29:07,006 --> 00:29:09,316
So walk me through that process.
493
00:29:09,766 --> 00:29:13,306
Yeah, and you know, like we had a very
strict rule, right, where it just,
494
00:29:13,366 --> 00:29:15,316
we do not look at content, right?
495
00:29:15,773 --> 00:29:20,323
and so that was the thing when
debugging issues, the saving grace is
496
00:29:20,323 --> 00:29:22,573
that for most of the issues we saw.
497
00:29:22,923 --> 00:29:28,003
They were more metadata issues around
like sync not converging, or getting
498
00:29:28,003 --> 00:29:32,383
to the client thinking it's in sync
with the server, but them disagreeing.
499
00:29:32,691 --> 00:29:35,799
so we had a few pretty,
yeah, like pretty interesting
500
00:29:35,799 --> 00:29:37,539
supporting algorithms for this.
501
00:29:37,569 --> 00:29:41,769
So one of them was just simple like
hang detection, like making sure, like
502
00:29:41,949 --> 00:29:45,249
if, when should a client reasonably
expect that they are in sync?
503
00:29:45,869 --> 00:29:49,439
And if they're online and if
they've downloaded all the recent
504
00:29:49,439 --> 00:29:53,189
versions and things are getting
stuck, why are they getting stuck?
505
00:29:53,189 --> 00:29:55,649
So are they getting stuck because
they can't read stuff from the
506
00:29:55,649 --> 00:29:57,749
server, either metadata or data?
507
00:29:57,959 --> 00:30:00,509
Are they getting stuck because they
can't write to the file system and
508
00:30:00,509 --> 00:30:01,819
there's some permission errors?
509
00:30:02,079 --> 00:30:06,683
So I think having very fine-grained
classification of that and having the
510
00:30:06,683 --> 00:30:11,653
client do that in a way that's like not
including any private information and
511
00:30:11,653 --> 00:30:14,753
sending that up for reports and then
aggregating that over all of the clients
512
00:30:14,753 --> 00:30:19,643
and being able to classify was a big part
of us being able to get a handle on it.
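As a small illustration of that classification idea: decide why a client that should be in sync is stuck, and report only a coarse, content-free category that can be aggregated across all clients. The categories and the probe methods on `client` are made up for the sketch.

```python
from enum import Enum


class HangCause(Enum):
    IN_SYNC = "in_sync"
    OFFLINE = "offline"
    SERVER_METADATA_UNREACHABLE = "server_metadata_unreachable"
    SERVER_DATA_UNREACHABLE = "server_data_unreachable"
    LOCAL_PERMISSION_ERROR = "local_permission_error"
    UNKNOWN = "unknown"


def classify_hang(client):
    """`client` is a hypothetical handle exposing coarse health probes only."""
    if client.is_in_sync():
        return HangCause.IN_SYNC
    if not client.is_online():
        return HangCause.OFFLINE
    if not client.can_read_server_metadata():
        return HangCause.SERVER_METADATA_UNREACHABLE
    if not client.can_read_server_data():
        return HangCause.SERVER_DATA_UNREACHABLE
    if client.has_local_permission_errors():
        return HangCause.LOCAL_PERMISSION_ERROR
    return HangCause.UNKNOWN


def report_hang(client, telemetry):
    # Only the category goes up -- no paths, no content -- and the server
    # aggregates counts per category across the whole fleet.
    telemetry.send({"hang_cause": classify_hang(client).value})
```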
513
00:30:20,059 --> 00:30:23,699
And I think this is just generally
very useful for these sync engines.
514
00:30:23,996 --> 00:30:27,056
the biggest return on investment we
got was from consistency checkers.
515
00:30:27,676 --> 00:30:32,949
So part of sync is that there's the same
data duplicated in many places, right?
516
00:30:33,219 --> 00:30:36,849
Like, so we had the data that's
on the user's local file system.
517
00:30:37,179 --> 00:30:41,199
We had all of the metadata that we stored
in SQLite, where we would store, like, what
518
00:30:41,199 --> 00:30:42,939
we think should be on the file system.
519
00:30:43,689 --> 00:30:46,569
We would store what the latest
view from the server was.
520
00:30:46,569 --> 00:30:49,509
We would store things that were
in progress, and then we have
521
00:30:49,509 --> 00:30:50,589
what's stored on the server.
522
00:30:50,799 --> 00:30:55,269
And for each one of those like hops, we
would have a consistency checker that
523
00:30:55,269 --> 00:30:57,639
would go and see if those two matched.
524
00:30:57,969 --> 00:31:02,139
And those would, that was like the
highest return on investment we got.
525
00:31:02,139 --> 00:31:05,649
Because before we had that, people
would write in and they would
526
00:31:05,649 --> 00:31:07,179
complain that Dropbox wasn't working.
527
00:31:07,779 --> 00:31:10,509
And until we had these consistency
checkers, we had no idea the
528
00:31:10,509 --> 00:31:13,419
order of magnitude of how
many issues were happening.
529
00:31:13,869 --> 00:31:16,029
And when we started doing
it, we're like, wow.
530
00:31:16,599 --> 00:31:17,379
There's actually a lot.
531
00:31:18,026 --> 00:31:22,886
So a consistency check in this regard
was mostly like a hash over some
532
00:31:22,886 --> 00:31:24,506
packets that you're sending around.
533
00:31:24,866 --> 00:31:30,326
And with that you could verify, okay, up
until like from A to B to C to D, we're
534
00:31:30,326 --> 00:31:35,816
all seeing the same hash, but suddenly
on the hop from D to E, the hash changes.
535
00:31:35,876 --> 00:31:36,266
Ah-huh.
536
00:31:36,296 --> 00:31:37,196
Let's investigate.
537
00:31:37,736 --> 00:31:38,396
Exactly.
538
00:31:38,726 --> 00:31:42,926
And so, and to do that in a way
that's respectful of the users,
539
00:31:42,986 --> 00:31:45,356
even like resources on their system.
540
00:31:45,356 --> 00:31:50,006
Like we wouldn't just go and blast their
CPU and their disk and their network to go
541
00:31:50,006 --> 00:31:51,836
and, like, churn through a bunch of things.
542
00:31:51,836 --> 00:31:54,896
So we would have like a sampling
process where we like sample a random
543
00:31:54,896 --> 00:31:58,166
path in the tree on the client
and do the same on the server.
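A minimal sketch of that sampled consistency check: pick a random path, fingerprint what the client-side state and the server-side state say about it, and count any mismatch. `client_view` and `server_view` are hypothetical stand-ins for two of the hops described above (say, the local SQLite state and the server's metadata); no file contents are read.

```python
import hashlib
import random


def fingerprint(entry):
    """Hash only the metadata the two sides are expected to agree on."""
    material = f"{entry.path}|{entry.size}|{entry.content_hash}".encode()
    return hashlib.sha256(material).hexdigest()


def sample_check(client_view, server_view, rng=random):
    """Compare one randomly sampled path across two replicas of the metadata.

    Returns None when consistent, otherwise a small, content-free mismatch
    record. Run rarely and on samples so the user's CPU, disk, and network
    aren't hammered.
    """
    path = rng.choice(client_view.all_paths())
    local, remote = client_view.lookup(path), server_view.lookup(path)
    if local is None and remote is None:
        return None
    if local is None or remote is None or fingerprint(local) != fingerprint(remote):
        # Report a hash of the path, not the path itself.
        return {"path_hash": hashlib.sha256(path.encode()).hexdigest(),
                "kind": "metadata_mismatch"}
    return None
```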
544
00:31:58,463 --> 00:32:02,333
we would have stuff with like Merkle
trees and then when things would diverge,
545
00:32:02,333 --> 00:32:07,643
we would try to see like, is there a way
we can compare on the client and see like
546
00:32:07,643 --> 00:32:12,004
for example one of the kind of really
important, goals for us as an operational
547
00:32:12,004 --> 00:32:14,494
team was to have like the power of zero.
548
00:32:14,764 --> 00:32:17,464
I think it might be from AWS or something.
549
00:32:17,464 --> 00:32:19,294
My co-founder James, has
a really good talk on it.
550
00:32:19,764 --> 00:32:25,704
but we would want to have a metric of
saying that the number of unexplained
551
00:32:25,764 --> 00:32:28,790
inconsistencies is zero, 'cause
552
00:32:28,790 --> 00:32:31,730
then the nice thing, right, is that
if it's at zero and it regresses,
553
00:32:31,730 --> 00:32:33,080
you know that it's a regression.
554
00:32:33,350 --> 00:32:38,780
If it's at like fluctuating at like 15
or like a hundred thousand and it kind
555
00:32:38,780 --> 00:32:42,530
of goes up by 5%, it's very hard to know
when evaluating a new release, right?
556
00:32:42,530 --> 00:32:44,390
That like that's actually safe or not.
557
00:32:44,824 --> 00:32:49,204
so then that would mean that whenever we
would have an inconsistency due to a bit
558
00:32:49,204 --> 00:32:55,234
flip, which we would see all the time
on client devices, then we would have to
559
00:32:55,444 --> 00:32:57,454
categorize that and then bucket that out.
560
00:32:57,604 --> 00:32:58,804
So we would have a baseline
561
00:32:59,659 --> 00:33:03,319
expectation of how many bit flips there
are across all of the devices on Dropbox.
562
00:33:03,679 --> 00:33:06,589
And we would see that that's
staying consistent or increasing or
563
00:33:06,589 --> 00:33:09,829
decreasing, and that the number of
unexplained things was still at zero.
564
00:33:10,215 --> 00:33:12,885
Now let's take that detour,
since you got me curious.
565
00:33:13,125 --> 00:33:16,065
Uh, what would cause bit
flips on a local device?
566
00:33:16,602 --> 00:33:20,982
I think a few, few causes, one of them
is just that in the data center, most
567
00:33:20,982 --> 00:33:24,822
memory uses error correction and you have
to pay more for it, usually have to pay
568
00:33:24,822 --> 00:33:26,472
more for a motherboard that supports it.
569
00:33:26,862 --> 00:33:27,672
at least back then.
570
00:33:27,736 --> 00:33:30,532
now like on client
devices we don't have that.
571
00:33:30,602 --> 00:33:34,302
So this is a little bit above
my pay grade for hardware cosmic
572
00:33:34,302 --> 00:33:36,632
rays or thermal noise or whatever.
573
00:33:36,632 --> 00:33:40,002
But memory is much more
resilient in the data center.
574
00:33:40,315 --> 00:33:44,355
I think another is just that, storage
devices vary greatly in quality.
575
00:33:44,415 --> 00:33:49,335
Like your SSDs and your hard drives are
much higher quality inside the data
576
00:33:49,335 --> 00:33:51,495
center than they are on local devices.
577
00:33:51,855 --> 00:33:52,515
And so.
578
00:33:53,160 --> 00:33:54,150
You know, there's that.
579
00:33:54,447 --> 00:33:57,297
it also could be like I had
mentioned that people have all
580
00:33:57,297 --> 00:33:58,797
types of weird configurations.
581
00:33:59,097 --> 00:34:03,387
Like on Mac there are all these
kernel extensions; on Windows, there's
582
00:34:03,387 --> 00:34:05,007
all of these minifilter drivers.
583
00:34:05,007 --> 00:34:07,437
There are all these things
that are interposing between
584
00:34:07,827 --> 00:34:11,127
Dropbox, the user space process
and writing to the file system.
585
00:34:11,427 --> 00:34:15,297
And if those have any memory safety
issues where they're corrupting memory
586
00:34:15,387 --> 00:34:19,434
'cause they're written in archaic C,
you know, or something, that's
587
00:34:19,454 --> 00:34:20,654
the way things can get corrupted.
588
00:34:20,834 --> 00:34:22,244
I mean, we've seen all types of things.
589
00:34:22,244 --> 00:34:26,709
We've seen network routers
corrupting data, but usually
590
00:34:26,924 --> 00:34:28,394
that fails some checksum, right?
591
00:34:28,424 --> 00:34:33,464
Or we've seen even registers on CPUs
being bad where the memory gets replaced
592
00:34:33,614 --> 00:34:38,114
and the memory seems like it's fine, but
then it just turns out the CPU has its
593
00:34:38,114 --> 00:34:40,214
own registers on-chip that are busted.
594
00:34:40,214 --> 00:34:44,204
And so all of that stuff I
think just can happen at scale.
595
00:34:44,234 --> 00:34:44,624
Right.
596
00:34:45,050 --> 00:34:45,770
that makes sense.
597
00:34:45,770 --> 00:34:51,774
And I'm happy to say that I haven't
yet had to worry about bit flips, whether
598
00:34:51,774 --> 00:34:56,824
it's for storage or other things,
but huge respect to whoever has had
599
00:34:56,824 --> 00:34:59,591
to tame those parts of the system.
600
00:34:59,951 --> 00:35:05,444
So, you mentioned the consistency check
as probably the biggest lever that you
601
00:35:05,444 --> 00:35:11,324
had to understand which health state
your sync engine is in, in the first place.
602
00:35:11,698 --> 00:35:18,928
was this the only kind of metric and
proxy for understanding how well
603
00:35:18,928 --> 00:35:22,618
the sync system is working, or were
there some other aspects that gave
604
00:35:22,618 --> 00:35:25,618
you visibility both macro and micro?
605
00:35:26,071 --> 00:35:30,511
Yeah, I mean, I think this yeah,
the kind of hangs, so like knowing
606
00:35:30,511 --> 00:35:33,991
that something gets to a sync state
and knowing the duration, right?
607
00:35:33,991 --> 00:35:38,514
So the kind of performance of that
was one of our top line metrics.
608
00:35:38,514 --> 00:35:40,474
And the other one was
this consistency check.
609
00:35:40,814 --> 00:35:43,524
And then for specific,
like operations, right?
610
00:35:43,524 --> 00:35:47,374
Like uploading a file, like how much
bandwidth are people able to use
611
00:35:47,624 --> 00:35:53,124
because, like, people wanted to
use Dropbox and upload lots,
612
00:35:53,124 --> 00:35:57,324
like huge data, like huge number of
files where each file is really large.
613
00:35:57,594 --> 00:36:01,584
And then they might do it in
Australia or Japan where they're
614
00:36:01,944 --> 00:36:03,234
far away from a data center.
615
00:36:03,234 --> 00:36:06,774
So latency is high, but bandwidth
is very high too, right?
616
00:36:06,774 --> 00:36:09,914
So making sure that we could
fully saturate their pipes and all
617
00:36:09,914 --> 00:36:12,114
types of stuff with debugging
618
00:36:12,724 --> 00:36:13,654
things in the internet, right?
619
00:36:13,654 --> 00:36:16,774
People having really bad
routes to AWS and all that.
620
00:36:16,974 --> 00:36:18,324
so we would track things like that.
621
00:36:18,568 --> 00:36:20,968
I think other than that it was
mostly just the usual quality stuff,
622
00:36:20,968 --> 00:36:25,298
like just exceptions and making
sure that features all work.
623
00:36:25,388 --> 00:36:30,154
I think when we rewrote this system
and we designed it to be very correct,
624
00:36:30,274 --> 00:36:34,404
We moved a lot of these things into
testing before we would release.
625
00:36:35,024 --> 00:36:38,734
So this is, I think, to
jump ahead a little bit: we
626
00:36:38,794 --> 00:36:44,974
decided to rewrite Dropbox's sync engine
from this big Python code base into Rust.
627
00:36:45,304 --> 00:36:49,294
And one of the specific design decisions
was to make things extremely testable.
628
00:36:49,729 --> 00:36:53,239
So we would have everything be
deterministic on a single thread,
629
00:36:53,509 --> 00:36:56,989
have all of the reads and writes
to the network and file system,
630
00:36:56,989 --> 00:36:59,119
be through a virtualized API.
631
00:36:59,416 --> 00:37:03,616
So then we could run all of these
simulations of exploring what would
632
00:37:03,616 --> 00:37:08,026
happen if you uploaded a file here and
deleted it concurrently and then had a
633
00:37:08,026 --> 00:37:09,976
network issue that forced you to retry.
634
00:37:10,306 --> 00:37:14,716
And so by simulating all of those in
CI, we would be able to then have very
635
00:37:14,716 --> 00:37:18,466
strong invariants about them,
knowing that like a file should never
636
00:37:18,466 --> 00:37:21,796
get deleted in this case, or that
it should always converge, or things
637
00:37:21,796 --> 00:37:26,326
like, with sharing, that this file should
never get exposed to this other viewer.
638
00:37:26,904 --> 00:37:31,043
I think having stronger
guarantees was something
639
00:37:31,043 --> 00:37:36,443
that we could only really do effectively
once we designed the system to make
640
00:37:36,443 --> 00:37:38,093
it easy to test those guarantees.
641
00:37:38,828 --> 00:37:39,188
Right.
642
00:37:39,188 --> 00:37:40,268
That makes a lot of sense.
643
00:37:40,268 --> 00:37:43,568
And I think we're seeing more
and more systems, also in the
644
00:37:43,568 --> 00:37:45,704
database world, embrace this.
645
00:37:45,704 --> 00:37:49,012
I think TigerBeetle is,
is quite popular for that.
646
00:37:49,394 --> 00:37:53,828
I think the folks at Torso are
now also embracing this approach.
647
00:37:54,102 --> 00:37:56,772
I think it goes under the
umbrella of simulation testing.
648
00:37:57,218 --> 00:37:58,448
that sounds very interesting.
649
00:37:58,448 --> 00:38:03,788
Can you explain a little bit more how
maybe in a much smaller program would
650
00:38:03,788 --> 00:38:08,318
this basically be Just that every
assumption and any potential branch,
651
00:38:08,348 --> 00:38:13,958
any sort of side effect thing that might
impact the execution of my program.
652
00:38:13,958 --> 00:38:19,868
Now I need to make explicit and it's
almost like a parameter that I put into
653
00:38:19,868 --> 00:38:25,735
the arguments of my functions and now I
call it under these circumstances, and I
654
00:38:25,735 --> 00:38:31,375
can therefore simulate, oh, if that file
suddenly gives me an unexpected error.
655
00:38:31,675 --> 00:38:33,385
Then this is how we're gonna handle it.
656
00:38:33,865 --> 00:38:34,795
Yeah, exactly.
657
00:38:34,795 --> 00:38:38,845
So it's like and there's techniques
that like the TigerBeetle folks, like
658
00:38:38,845 --> 00:38:42,745
we, we do this at Convex in rust with the
right, like abstractions, there's like
659
00:38:42,745 --> 00:38:45,235
techniques to make it not so awkward.
660
00:38:45,235 --> 00:38:50,815
But yeah, it is like this idea of like,
can you pin all of the non-determinism in
661
00:38:50,815 --> 00:38:54,895
the system can, whether it's like reading
from a random number generator, whether
662
00:38:54,895 --> 00:38:58,765
it's looking at time, whether it's reading
and writing to files or the network.
663
00:38:58,945 --> 00:39:04,425
Can that all be like pulled out so
that in, production it's just using the
664
00:39:04,425 --> 00:39:06,865
random AP or the regular APIs for it.
665
00:39:07,258 --> 00:39:10,558
so there's like for any of these
sync engines, there's a core
666
00:39:10,558 --> 00:39:13,318
of the system which represents
all the sync rules, right?
667
00:39:13,318 --> 00:39:16,198
Like when I get a new file
from the server, what do I do?
668
00:39:16,528 --> 00:39:19,468
You know, if there's a concurrent
edit to this, what do I do?
669
00:39:19,748 --> 00:39:23,953
And that core of the code is often
the part that has the most bugs, right?
670
00:39:23,953 --> 00:39:27,403
It doesn't think about
some of the corner cases or if
671
00:39:27,403 --> 00:39:30,853
there are errors or needs retries
or doesn't handle concurrency.
672
00:39:30,853 --> 00:39:32,053
It might have race conditions.
673
00:39:32,323 --> 00:39:36,883
So I think the core idea
for deterministic
674
00:39:36,883 --> 00:39:43,033
simulation testing is to take that core
and just kind of like pull out all of the
675
00:39:43,033 --> 00:39:45,283
non-determinism from it into an interface.
676
00:39:45,403 --> 00:39:49,213
So time randomness, reading and
writing to the network, reading
677
00:39:49,213 --> 00:39:52,753
and writing to the file system, and
making it so that in production,
678
00:39:52,933 --> 00:39:54,703
those are just using the regular APIs.
679
00:39:55,033 --> 00:39:58,873
But in a testing situation,
those can be using mocks.
680
00:39:59,023 --> 00:40:02,383
Like they could be using things
that for a particular test
681
00:40:02,383 --> 00:40:06,253
and wants to test a scenario or
setting it up in a specific way.
682
00:40:06,673 --> 00:40:09,223
Or it could be randomized, right?
683
00:40:09,223 --> 00:40:14,543
Where it might be that reading from
Like time, the test framework might
684
00:40:14,603 --> 00:40:18,923
decide pseudo randomly to advance it
or to keep it at the current time or
685
00:40:18,923 --> 00:40:20,873
might serialize things differently.
686
00:40:21,143 --> 00:40:27,293
And that type of ability to have random
search explore the state space of
687
00:40:27,353 --> 00:40:30,833
all the things that are possible is
just one of those like unreasonably
688
00:40:30,833 --> 00:40:32,813
effective ideas, I think for testing.
689
00:40:33,203 --> 00:40:37,373
And then getting a
system to pass that type of
690
00:40:37,373 --> 00:40:38,963
deterministic simulation testing:
691
00:40:39,503 --> 00:40:42,893
It's not at the threshold of having
formal verification, but in our
692
00:40:42,893 --> 00:40:47,457
experience it's pretty close, and with
a much, much smaller amount of work.
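Below is a minimal TypeScript sketch of the pattern Sujay describes: pinning time, randomness, file I/O, and the network behind one interface so the same sync core runs against real APIs in production and against a seeded simulation in CI. All names (Environment, simEnv, the example URL) are illustrative assumptions, not Dropbox's or Convex's actual code.

```typescript
import { readFile } from "node:fs/promises";

// Every source of non-determinism the sync core is allowed to touch.
interface Environment {
  now(): number;                            // time
  random(): number;                         // randomness
  readFile(path: string): Promise<string>;  // file system
  send(msg: string): Promise<void>;         // network
}

// Production: delegate straight to the regular APIs.
const realEnv: Environment = {
  now: () => Date.now(),
  random: () => Math.random(),
  readFile: (path) => readFile(path, "utf8"),
  send: async (msg) => {
    await fetch("https://example.com/sync", { method: "POST", body: msg }); // placeholder URL
  },
};

// Simulation: a seeded PRNG plus an in-memory world the test controls,
// including pseudo-random time advancement and injected faults.
function simEnv(seed: number, files: Map<string, string>): Environment {
  let t = 0;
  let s = seed;
  const rand = () => ((s = (s * 1664525 + 1013904223) >>> 0) / 2 ** 32);
  return {
    now: () => (t += Math.floor(rand() * 1000)),
    random: rand,
    readFile: async (path) => {
      if (rand() < 0.1) throw new Error("simulated I/O error");
      return files.get(path) ?? "";
    },
    send: async () => {
      if (rand() < 0.2) throw new Error("simulated network failure");
    },
  };
}

// The core only ever talks to an Environment, so a CI run with simEnv(seed)
// is fully deterministic and replayable from the seed.
async function syncOnce(env: Environment, path: string): Promise<void> {
  const contents = await env.readFile(path);
  await env.send(JSON.stringify({ path, contents, at: env.now() }));
}
```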
693
00:40:48,117 --> 00:40:50,427
And you mentioned
Haskell at the beginning?
694
00:40:50,457 --> 00:40:55,467
I still remember, after a lot of
time spent writing unit tests in
695
00:40:55,517 --> 00:41:00,017
JavaScript, and back then, in that
order, I first had JavaScript and then I
696
00:41:00,017 --> 00:41:04,817
learned Haskell, and then I found quick
test, or was it QuickCheck?
697
00:41:05,183 --> 00:41:06,113
which one was it?
698
00:41:06,563 --> 00:41:07,493
I think it was QuickCheck, right?
699
00:41:07,873 --> 00:41:08,383
Well, right.
700
00:41:08,383 --> 00:41:13,424
So I found QuickCheck and I could express
sort of like, Hey, this is this type.
701
00:41:13,664 --> 00:41:18,614
It has sort of those aspects to it,
those invariants and then would just
702
00:41:18,614 --> 00:41:20,534
go along and test all of those things.
703
00:41:20,534 --> 00:41:23,564
Like, wait, I never thought
of that, but of course, yes.
704
00:41:23,864 --> 00:41:27,824
And then you combine those and you
would get way too lazy to write unit
705
00:41:27,824 --> 00:41:32,354
tests for the combinatorial explosion
of like all of your different things.
706
00:41:32,354 --> 00:41:36,494
And then you can say, sample it
like that, and like, focus on this.
707
00:41:36,778 --> 00:41:40,958
And so I actually also started
embracing this practice a lot more in the
708
00:41:40,958 --> 00:41:45,488
TypeScript work that I'm doing through
a great project called Prop Check.
709
00:41:45,994 --> 00:41:52,069
And that is picking up the same
ideas, particularly for those
710
00:41:52,069 --> 00:41:56,509
sort of scenarios where, okay,
Murphy's Law will come and haunt you.
711
00:41:56,969 --> 00:41:58,829
this is in distributed systems.
712
00:41:58,829 --> 00:42:00,509
That is typically the case.
713
00:42:00,796 --> 00:42:05,623
Building things in such a way where
all the aspects can be, specifically
714
00:42:05,623 --> 00:42:07,873
injected and the, the sweet spot.
715
00:42:07,873 --> 00:42:12,043
If you can do so still in an ergonomic
way, I think that's the way to go.
716
00:42:13,063 --> 00:42:15,373
It's so, so valuable, right?
717
00:42:15,373 --> 00:42:15,643
And yeah.
718
00:42:15,643 --> 00:42:20,323
And yeah, the ability for prop tests,
for QuickCheck, for all of these to
719
00:42:20,323 --> 00:42:23,113
also minimize is just magical, right?
720
00:42:23,113 --> 00:42:27,023
Like it comes up with this crazy
counter example and it might be
721
00:42:27,143 --> 00:42:31,693
like a list with 700 elements, but
then is able to shrink it down to
722
00:42:31,693 --> 00:42:33,613
the, like, real core of the bug.
723
00:42:33,913 --> 00:42:35,483
It's magic, right?
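As a concrete illustration of the property-based testing and shrinking being discussed, here is a tiny TypeScript sketch using the fast-check library (an assumption on my part; the "Prop Check" project named above may be something else). The merge function and the invariant are made up for illustration.

```typescript
import fc from "fast-check";

// Toy last-writer-wins merge over (timestamp, value) pairs.
type Edit = { ts: number; value: string };
const merge = (a: Edit, b: Edit): Edit => (b.ts >= a.ts ? b : a);

// Property: merge always returns one of its inputs, and is symmetric up to ties.
fc.assert(
  fc.property(
    fc.record({ ts: fc.integer(), value: fc.string() }),
    fc.record({ ts: fc.integer(), value: fc.string() }),
    (a, b) => {
      const m = merge(a, b);
      return (m === a || m === b) && merge(b, a).ts === m.ts;
    }
  )
);

// If a property fails, fast-check shrinks the random counterexample toward a
// minimal one -- the "shrink it down to the real core of the bug" behavior
// described above.
```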
724
00:42:35,803 --> 00:42:38,038
And you know, I mean, I think
this is something, like, you know,
725
00:42:38,653 --> 00:42:40,453
a totally different theme, right?
726
00:42:40,453 --> 00:42:44,353
Like one thing at Convex we're exploring
a lot is like coding has changed a lot
727
00:42:44,353 --> 00:42:46,423
in the past year with AI coding tools.
728
00:42:46,693 --> 00:42:50,413
And one of the things we've observed
for getting coding tools to work very
729
00:42:50,413 --> 00:42:54,763
well with Convex is that these types
of like very succinct tests that can
730
00:42:54,763 --> 00:42:59,863
be generated easily and have like a
really high strength to weight or power
731
00:42:59,863 --> 00:43:03,449
to weight ratio are just really good
for like autonomous coding, right?
732
00:43:03,449 --> 00:43:06,629
Like, if you are gonna take, like,
a Cursor agent and let it go wild,
733
00:43:06,839 --> 00:43:10,499
like what does it take to just let it
operate without you doing anything?
734
00:43:10,589 --> 00:43:13,229
It takes something like a prop test
because then it can just continuously
735
00:43:13,229 --> 00:43:18,149
make changes, run the test, and not know
that it's done until that test passes.
736
00:43:18,846 --> 00:43:20,316
Yeah, that makes a lot of sense.
737
00:43:20,316 --> 00:43:25,356
So let's go back for a moment to the
point where you were just transitioning
738
00:43:25,686 --> 00:43:32,016
from the previous Python based sync
engine to the Rust based sync engine.
739
00:43:32,016 --> 00:43:36,963
So you're embracing simulation
testing to have a better sense of
740
00:43:36,963 --> 00:43:41,253
like all the different aspects that
might influence the outcome here.
741
00:43:41,579 --> 00:43:44,289
walk me through like how you, went about.
742
00:43:44,559 --> 00:43:46,479
Deploying that new system.
743
00:43:46,659 --> 00:43:52,119
Were there any sort of big headaches
associated with migrating from the
744
00:43:52,119 --> 00:43:54,249
previous system to the new system?
745
00:43:54,549 --> 00:43:57,849
Since, for everything, you
had sort of a de facto source
746
00:43:57,849 --> 00:43:59,979
of truth, which is the files.
747
00:43:59,979 --> 00:44:04,659
So could you maybe just forget everything
the old system has done and you just
748
00:44:04,659 --> 00:44:09,646
treat it as like, oh, the user would've
just installed this fresh? Walk me
749
00:44:09,646 --> 00:44:14,056
through like how you thought about
that, since migrating systems at such
750
00:44:14,056 --> 00:44:16,970
a big scale is typically quite dreadful.
751
00:44:17,340 --> 00:44:19,495
Yeah, dreadsome is, yeah, an
752
00:44:19,575 --> 00:44:20,415
appropriate word.
753
00:44:20,720 --> 00:44:26,585
I think one of the biggest challenges was
that by design we had a very different
754
00:44:26,675 --> 00:44:29,765
data model for the old sync engine.
755
00:44:29,765 --> 00:44:31,135
We called it Sync Engine Classic.
756
00:44:31,465 --> 00:44:32,085
Affectionately.
757
00:44:32,225 --> 00:44:34,505
And then we had Nucleus, which was the new one.
758
00:44:34,745 --> 00:44:39,695
Nucleus had a very different data model,
and the motivation for that was that
759
00:44:40,535 --> 00:44:46,145
Sync Engine Classic just had a ton of
possible states that were illegitimate.
760
00:44:46,505 --> 00:44:50,855
If you had, like, the server
update a file and the client update
761
00:44:50,855 --> 00:44:54,665
a file, but then a shared folder gets
mounted above it, things could get
762
00:44:54,665 --> 00:45:00,005
into all of these really weird states
that were legal but would cause bugs.
763
00:45:00,395 --> 00:45:04,595
And then I think that was like one
of the big guiding principles more
764
00:45:04,595 --> 00:45:09,335
than even just like Rust or Python,
was just like designing what states
765
00:45:09,335 --> 00:45:14,795
should the system be allowed to be
in and design away everything else,
766
00:45:14,795 --> 00:45:16,955
make illegal states unrepresentable.
767
00:45:17,555 --> 00:45:21,215
And so what that then
meant is, once we had that,
768
00:45:21,515 --> 00:45:26,225
when we needed to migrate, we had a long
tail of really weird starting positions.
769
00:45:27,855 --> 00:45:33,065
So where you basically realized, okay,
this system is in this state; A, how the
770
00:45:33,065 --> 00:45:35,195
heck did it ever get into that state?
771
00:45:35,255 --> 00:45:40,175
And B, what are we gonna do about
it now, where basically,
772
00:45:40,175 --> 00:45:43,145
it's like, from a mapping function's perspective,
this is like invalid input.
773
00:45:44,105 --> 00:45:49,862
So can you explain a little bit of like,
how you constrained the space of, and how
774
00:45:49,862 --> 00:45:56,075
you designed the space of, legitimate,
valid states and what were some of the,
775
00:45:56,075 --> 00:46:00,665
if you think about this as like a big
matrix of combinations, what are some
776
00:46:00,665 --> 00:46:06,165
of the more intuitive ones that were,
not allowed that you saw quite a bit?
777
00:46:06,975 --> 00:46:13,005
Yeah, so I think part of the difficulty
for Dropbox, like, in syncing things
778
00:46:13,005 --> 00:46:17,085
from the file system is that file
system APIs are really anemic.
779
00:46:17,400 --> 00:46:19,980
File system APIs don't have transactions.
780
00:46:19,980 --> 00:46:23,010
They don't, and things can get
reordered in all types of ways.
781
00:46:23,190 --> 00:46:26,370
So we would just read and write to
files from the local file system, and
782
00:46:26,370 --> 00:46:30,450
we would use file system events on
Mac, we would use the equivalent on
783
00:46:30,450 --> 00:46:32,773
Windows and Linux to get updates.
784
00:46:32,983 --> 00:46:36,403
But everything can be reordered
and racy and everything.
785
00:46:36,493 --> 00:46:40,990
So one, like common invariant
would be that if you have a
786
00:46:40,990 --> 00:46:44,497
directory you know, like files
have to exist within directories.
787
00:46:44,767 --> 00:46:47,887
If a file exists, then its
parent directory exists.
788
00:46:48,397 --> 00:46:51,727
And like simultaneously, if you
delete a directory, it shouldn't
789
00:46:51,727 --> 00:46:52,817
have any files within it.
790
00:46:53,967 --> 00:46:57,727
And that invariant guarantees
that the file system is a tree.
791
00:46:57,847 --> 00:46:58,207
Right?
792
00:46:58,537 --> 00:47:03,787
And then it's very easy to come
up with settings, with reads from the
793
00:47:03,787 --> 00:47:07,687
local file system where if you just
naively take that and write it into
794
00:47:07,687 --> 00:47:12,187
your SQLite database, you will end up
with data that does not form a tree.
795
00:47:12,815 --> 00:47:16,435
And then especially even with
things like uniqueness, right?
796
00:47:16,435 --> 00:47:22,435
Like if I move a file from A to B, then
I might observe the add for it at B
797
00:47:23,825 --> 00:47:28,225
way before the delete at A, or I might
observe it vice versa, where the file
798
00:47:28,225 --> 00:47:31,435
is transiently gone and disappeared and
we definitely don't wanna sync that.
799
00:47:31,795 --> 00:47:37,318
And then with directories, if I have
like A as a directory and then B as
800
00:47:37,318 --> 00:47:43,528
a directory, and then I move them, I
could observe a state where A moves into
801
00:47:43,528 --> 00:47:48,498
B, which then without doing the right
bookkeeping, might introduce a cycle in
802
00:47:48,498 --> 00:47:52,188
the graph and a cycle for directories
would be really bad news, right?
803
00:47:52,482 --> 00:47:57,072
so all of these invariants were things
that the file system APIs, they don't
804
00:47:57,072 --> 00:48:00,732
respect, even though the file system
internally has these invariants, right?
805
00:48:01,752 --> 00:48:04,422
You cannot create a directory
cycle on any file system.
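Here is a rough TypeScript sketch of the two invariants just described (every entry's parent exists and is a directory; parent pointers never form a cycle), as a check a sync engine could run over its local metadata before trusting reordered file system events. This is not Dropbox's actual data model; all names are illustrative.

```typescript
type NodeId = string;
interface Entry { id: NodeId; parent: NodeId | null; isDir: boolean }

function checkTreeInvariants(entries: Map<NodeId, Entry>): string[] {
  const violations: string[] = [];
  for (const e of entries.values()) {
    if (e.parent !== null) {
      const parent = entries.get(e.parent);
      // Invariant 1: if an entry exists, its parent exists and is a directory.
      if (!parent) violations.push(`${e.id} has missing parent ${e.parent}`);
      else if (!parent.isDir) violations.push(`${e.id} has non-directory parent ${e.parent}`);
    }
    // Invariant 2: following parent pointers never loops (no directory cycles).
    const seen = new Set<NodeId>([e.id]);
    let cur = e.parent;
    while (cur !== null) {
      if (seen.has(cur)) { violations.push(`cycle through ${cur}`); break; }
      seen.add(cur);
      cur = entries.get(cur)?.parent ?? null;
    }
  }
  return violations;
}

// Naively applying reordered events (an add at B observed before the delete at A,
// or "A moved into B" without the matching bookkeeping) would be caught here
// before it ever lands in the synced database.
```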
806
00:48:05,412 --> 00:48:05,802
Definitely.
807
00:48:05,802 --> 00:48:09,989
I mean, certainly not without root. And
all of these invariants exist but
808
00:48:09,989 --> 00:48:12,863
are not observable through the APIs.
809
00:48:12,863 --> 00:48:16,583
And so then Sync Engine Classic
would get into the state where its
810
00:48:16,583 --> 00:48:19,793
local SQLite file would have
all types of violations like that.
811
00:48:20,303 --> 00:48:24,473
So then how do we read the tea
leaves when the database is in
812
00:48:24,473 --> 00:48:26,933
a really weird state and we can't lose data.
813
00:48:26,933 --> 00:48:30,263
And to go back to, I think what you had
talked about at the beginning of this was
814
00:48:30,263 --> 00:48:36,293
that we always had the nuclear option of
dropping all of our local state and doing
815
00:48:36,293 --> 00:48:38,753
a full resync from the files themselves.
816
00:48:39,143 --> 00:48:42,443
But then the problem is that we
would entirely lose user intent.
817
00:48:42,863 --> 00:48:48,323
So if, for example, I was offline for
a month and I had a bunch of files,
818
00:48:48,803 --> 00:48:53,153
and then during that month other
people in my team deleted those files.
819
00:48:53,791 --> 00:48:58,838
If I came back online and didn't have
my local database, we would have to
820
00:48:58,838 --> 00:49:02,828
recreate those files and people would
complain about this all the time, because
821
00:49:03,418 --> 00:49:05,738
they would delete something and want it
deleted, and then Dropbox would
822
00:49:05,738 --> 00:49:07,358
just randomly decide to resurrect it.
823
00:49:07,808 --> 00:49:12,441
So those types of decisions we, we tried
to avoid that as much as possible, but
824
00:49:12,441 --> 00:49:17,271
then that meant having to look at a
potentially really confusing database and
825
00:49:17,271 --> 00:49:19,041
read what the user intent might have been.
826
00:49:19,761 --> 00:49:20,211
Right.
827
00:49:20,481 --> 00:49:24,411
I wanna dig a little bit more
into the topic of user intent.
828
00:49:24,441 --> 00:49:30,201
Since with Dropbox you've built a sync
engine very specifically for the use
829
00:49:30,201 --> 00:49:36,231
case of file management, et cetera, where
user intent has a particular meaning that
830
00:49:36,231 --> 00:49:41,181
might be very different from moving a
cursor around in a Google Docs document.
831
00:49:41,511 --> 00:49:47,618
So can you explain a little bit, what
are some of the, common scenarios of, and
832
00:49:47,618 --> 00:49:54,408
maybe subtle scenarios of user intent,
when it comes to the Dropbox design space?
833
00:49:55,218 --> 00:49:56,178
Yeah, totally.
834
00:49:56,535 --> 00:50:01,515
and I think the for regular
things like say editing files.
835
00:50:01,830 --> 00:50:06,420
I think we saw that like people just
generally did not, maybe because
836
00:50:06,420 --> 00:50:09,690
of the way the system was even
its capabilities, people did not
837
00:50:09,690 --> 00:50:11,820
edit the same files all too often.
838
00:50:12,090 --> 00:50:17,033
So maintaining user intent when file,
when everyone is online, just kind of
839
00:50:17,333 --> 00:50:21,563
taking last writer wins Where I think
user intent became very interesting is
840
00:50:21,593 --> 00:50:26,583
if someone went offline, like they're on
an airplane before wifi and airplanes
841
00:50:27,026 --> 00:50:30,746
And they worked on their document and
someone else worked on the same time.
842
00:50:31,346 --> 00:50:35,906
In that case, we observed that users
always wanted to see the conflicted
843
00:50:35,906 --> 00:50:39,956
copy and that they wanted to get
the opportunity to say, like, I did.
844
00:50:39,956 --> 00:50:43,046
I put in a lot of effort into working
on this when I was on the plane.
845
00:50:43,346 --> 00:50:47,970
Someone else, put in probably a similar
amount of effort when they were online and
846
00:50:48,170 --> 00:50:50,830
you know, so last writer wins policies.
847
00:50:50,830 --> 00:50:55,700
There violated user expectations
quite a lot because either a person
848
00:50:55,700 --> 00:50:58,460
had to win and then the person
who lost would be really upset.
849
00:50:58,900 --> 00:51:00,970
so I think those were pretty interesting.
850
00:51:00,970 --> 00:51:05,113
I think with Moose, like with more
metadata operations I think people
851
00:51:05,130 --> 00:51:06,420
were a little bit more permissive.
852
00:51:06,420 --> 00:51:10,680
Like if I moved something from one
folder to another, another person
853
00:51:10,680 --> 00:51:12,180
moved it to a different folder.
854
00:51:12,496 --> 00:51:15,076
having it just converge on
something, as long as it converges,
855
00:51:15,136 --> 00:51:18,586
we observed that people
didn't worry about it too much.
856
00:51:18,810 --> 00:51:21,480
I think the place where user
intent is really interesting
857
00:51:21,480 --> 00:51:23,300
with moves is with sharing.
858
00:51:23,666 --> 00:51:26,983
So I think thinking about this
from like the distributed systems
859
00:51:26,983 --> 00:51:31,333
perspective on causality, there would
be like someone might have like,
860
00:51:31,423 --> 00:51:33,103
I dunno, their HR folder, right?
861
00:51:33,823 --> 00:51:38,353
And I don't know, like, let's say that
someone is transferring to the HR team, so
862
00:51:38,383 --> 00:51:40,423
they're getting added to the HR folder.
863
00:51:41,158 --> 00:51:44,038
But then say before they were
on the team, they were on a
864
00:51:44,158 --> 00:51:45,358
performance improvement plan.
865
00:51:46,061 --> 00:51:50,958
So then the administrator for HR
would delete that file, make sure it's
866
00:51:50,958 --> 00:51:53,838
deleted, and then add them to the folder.
867
00:51:54,438 --> 00:51:59,178
And so their user intent is
expressed in a very specific
868
00:51:59,178 --> 00:52:00,978
sequencing of operations, right?
869
00:52:01,158 --> 00:52:04,038
That like this causally depended on this.
870
00:52:04,188 --> 00:52:08,238
I would not have invited 'em to the folder
unless the delete was stably synced.
871
00:52:08,848 --> 00:52:12,648
And that making sure that gets
preserved throughout the system,
872
00:52:12,798 --> 00:52:16,428
even when people are going online
and offline and everything is a very
873
00:52:16,428 --> 00:52:18,048
hard distributed systems problem.
874
00:52:18,078 --> 00:52:18,468
Right.
875
00:52:18,901 --> 00:52:22,441
and it was intimately related
with the details of the product.
876
00:52:22,958 --> 00:52:23,378
Right.
877
00:52:23,421 --> 00:52:23,661
yeah.
878
00:52:23,661 --> 00:52:29,571
How did you capture that causality
chain of events since you probably also
879
00:52:29,571 --> 00:52:32,151
couldn't quite trust the system clock?
880
00:52:32,451 --> 00:52:33,681
How did you go about that?
881
00:52:34,085 --> 00:52:36,348
Yeah, this became even
more difficult, right?
882
00:52:36,348 --> 00:52:41,118
Where file system metadata was partitioned
across many shards in the database.
883
00:52:41,568 --> 00:52:45,528
So then we ended up using something like
Lamport timestamps, where every single
884
00:52:45,528 --> 00:52:47,328
operation would get assigned a timestamp.
885
00:52:47,448 --> 00:52:50,745
And those operations were usually
only reading and writing to their
886
00:52:50,745 --> 00:52:55,153
particular shard and for whatever
timestamp the client had observed.
887
00:52:55,423 --> 00:52:59,677
But then in these cases where there
were potentially cross-shard, they
888
00:52:59,677 --> 00:53:03,397
weren't transactions, but like causal
dependencies, we would be able to say
889
00:53:03,397 --> 00:53:07,597
like, the operation to mount this or
to add someone to the shared folder
890
00:53:07,657 --> 00:53:11,917
and then them mounting it within
their file system has to have a higher
891
00:53:11,917 --> 00:53:14,887
timestamp than any write within that folder,
892
00:53:15,532 --> 00:53:16,582
writes including deletes.
893
00:53:16,948 --> 00:53:21,628
so then that way when the client is
syncing it would be able to know that when
894
00:53:21,628 --> 00:53:26,998
I am merging operation logs across all of
the different shards, I need to assemble
895
00:53:26,998 --> 00:53:28,828
them in a causally consistent order.
896
00:53:29,288 --> 00:53:33,058
And that would then respect all
of these particular invariants.
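Here is a minimal TypeScript sketch of the Lamport-clock idea just described: each shard stamps its operations, stamps observed from elsewhere push the clock forward, and the client merges per-shard logs in timestamp order to get a causally consistent replay. This is illustrative only; Dropbox's real protocol is more involved than this.

```typescript
class LamportClock {
  private t = 0;
  // Local event (e.g. a write on this shard): just advance the clock.
  tick(): number { return ++this.t; }
  // Observing a timestamp from another shard or client: jump past it, so
  // anything stamped afterwards is ordered after what was observed.
  observe(remote: number): number { this.t = Math.max(this.t, remote); return ++this.t; }
}

interface Op { shard: string; ts: number; kind: "write" | "delete" | "mount" }

// Merging per-shard operation logs on the client: sort by timestamp, breaking
// ties by shard id. Because the mount of the shared folder was stamped after
// the delete it depended on, the delete is always applied first.
const mergeLogs = (logs: Op[][]): Op[] =>
  logs.flat().sort((a, b) => a.ts - b.ts || a.shard.localeCompare(b.shard));
```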
897
00:53:33,438 --> 00:53:33,828
Right.
898
00:53:34,098 --> 00:53:38,448
So you have thought through those
different scenarios for Dropbox and
899
00:53:38,448 --> 00:53:43,758
made very intentional design decisions
that, for example, in one scenario
900
00:53:43,758 --> 00:53:46,728
last writer wins is not desirable,
901
00:53:46,728 --> 00:53:51,415
since that might lead to a very sad
person stepping off the plane because
902
00:53:51,415 --> 00:53:54,955
all of your data is suddenly gone,
or the other person's data is gone.
903
00:53:55,262 --> 00:53:58,292
so you make very specific
design trade-offs here when it
904
00:53:58,292 --> 00:54:03,032
comes to somehow squaring the
circle of distributed systems.
905
00:54:03,182 --> 00:54:08,222
What sort of advice would you have for
application developers or people even
906
00:54:08,552 --> 00:54:12,362
who are sitting inside of a company
and are now thinking about, oh, maybe
907
00:54:12,362 --> 00:54:17,552
we should have our own Dropbox-style,
Linear-style sync engine internally?
908
00:54:17,552 --> 00:54:21,122
What sort of advice would
you give them when they, yeah,
909
00:54:21,122 --> 00:54:23,132
start thinking this through to the detail?
910
00:54:23,987 --> 00:54:28,505
Yeah, I'll talk through kind of how we
structured things at Dropbox to be able
911
00:54:28,505 --> 00:54:30,275
to navigate these types of problems.
912
00:54:30,395 --> 00:54:33,335
And I think the patterns
here, can be quite general.
913
00:54:33,605 --> 00:54:37,815
I think what we ended up with was
that like thinking like distributed
914
00:54:37,815 --> 00:54:39,945
systems syncing is hard, right?
915
00:54:40,185 --> 00:54:45,645
So we would have the kind of base layer
of the sync protocol and how state
916
00:54:45,645 --> 00:54:49,245
gets moved around between the clients
and the servers and all the shards.
917
00:54:49,695 --> 00:54:52,575
We would have very strong
consistency guarantees there.
918
00:54:52,875 --> 00:54:57,345
So we would not use any of the
knowledge of the product at that layer.
919
00:54:57,725 --> 00:55:02,475
So, like, thinking of Dropbox
and the file system as a CRDT:
920
00:55:03,660 --> 00:55:06,420
Dropbox allows, like moves
to happen concurrently.
921
00:55:06,420 --> 00:55:09,690
It allows you to add something
while another thing is happening.
922
00:55:10,020 --> 00:55:12,780
But at the protocol level,
we kept things very strict.
923
00:55:12,780 --> 00:55:17,437
We kept them very close to being
serializable, so that every view of the
924
00:55:17,437 --> 00:55:20,857
system was identified by a very small
amount of state, like a timestamp.
925
00:55:21,067 --> 00:55:24,127
And that would fully determine the
state of the system and like the
926
00:55:24,127 --> 00:55:26,077
amount of entropy in that was very low.
927
00:55:26,497 --> 00:55:30,067
And then whenever you are modifying
it, you would say, here's what I expect
928
00:55:30,067 --> 00:55:34,267
the data to be, and if it doesn't match
exactly, it will reject the operation.
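A small TypeScript sketch of that "here's what I expect the data to be" pattern: the client sends the version it based its change on, and the server rejects the commit on any mismatch, essentially a compare-and-swap. This is illustrative, not the actual Dropbox wire protocol; all names are assumptions.

```typescript
interface CommitRequest { path: string; expectedVersion: number; newContentHash: string }
interface ServerRecord { version: number; contentHash: string }

function tryCommit(
  store: Map<string, ServerRecord>,
  req: CommitRequest
): { ok: true; version: number } | { ok: false; current: ServerRecord | undefined } {
  const current = store.get(req.path);
  const currentVersion = current?.version ?? 0;
  if (currentVersion !== req.expectedVersion) {
    // Stale view: the caller must resync and decide what to do (conflicted copy,
    // retry, drop) -- that product policy lives above this strict layer.
    return { ok: false, current };
  }
  const next = { version: currentVersion + 1, contentHash: req.newContentHash };
  store.set(req.path, next);
  return { ok: true, version: next.version };
}
```

Keeping the base layer this strict is what lets the looser, product-specific reconciliation rules be layered on top without re-solving distributed systems problems each time.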
929
00:55:34,597 --> 00:55:39,727
And then by structuring things
in that way, we made it very easy
930
00:55:39,727 --> 00:55:45,037
for product teams and for even us
working on sync to embed all of these like
931
00:55:45,067 --> 00:55:47,677
looser, more product-focused requirements
932
00:55:47,677 --> 00:55:51,247
that also may wanna change over time,
into the endpoints, like layered on top.
933
00:55:51,247 --> 00:55:57,157
So every time we wanted to change a policy
on how, like, a delete reconciles with an,
934
00:55:57,787 --> 00:55:59,647
you know, add for a folder or something.
935
00:55:59,887 --> 00:56:02,707
We didn't have to solve any distributed
systems problems to do that.
936
00:56:03,487 --> 00:56:07,897
So I think that like pattern of saying
that, like is there a good abstraction?
937
00:56:07,897 --> 00:56:11,467
Is there something that is like very
powerful that could solve a large
938
00:56:11,467 --> 00:56:16,267
class of problems, doing that well at
the lowest layer and then potentially
939
00:56:16,627 --> 00:56:18,577
weakening the consistency above it.
940
00:56:19,297 --> 00:56:24,217
I actually really like the Rocicorp
folks have a really great description of
941
00:56:24,217 --> 00:56:28,897
their consistency model for Replicache of
it being like session plus consistency.
942
00:56:29,227 --> 00:56:34,087
And it's like a very similar idea
where like when we build things on
943
00:56:34,087 --> 00:56:38,977
a platform, we may, with our
product hats on, like want users to
944
00:56:38,977 --> 00:56:42,607
not have to think about conflicts and
merging and all that in a lot of cases.
945
00:56:42,757 --> 00:56:45,397
But those decisions might be
very particular to our app.
946
00:56:45,397 --> 00:56:48,187
And that's something that holds
for everything on the platform.
947
00:56:48,457 --> 00:56:52,177
And then there's always a way to
embed those decisions onto, say.
948
00:56:52,552 --> 00:56:56,842
Session consistency and Replicache
or serializability and other systems.
949
00:56:57,082 --> 00:57:00,435
And so I think that's like that
separation of concerns I
950
00:57:00,435 --> 00:57:03,615
think is something that can
apply to a lot of systems.
951
00:57:04,105 --> 00:57:04,495
Right.
952
00:57:04,555 --> 00:57:09,895
So maybe we use this also as a transition
to talk a bit more about what you're
953
00:57:09,895 --> 00:57:12,295
now designing and working on Convex.
954
00:57:12,655 --> 00:57:19,225
What were some of the key insights that
you've taken with you from Dropbox that
955
00:57:19,225 --> 00:57:22,195
ultimately led to you co-founding Convex?
956
00:57:22,975 --> 00:57:27,068
Yeah, when we first were starting
Convex we were looking at how apps
957
00:57:27,068 --> 00:57:28,238
are getting built today, right?
958
00:57:28,298 --> 00:57:32,498
Like web apps are easier
to build than ever.
959
00:57:32,653 --> 00:57:37,013
Even in 2021, it's incredible
how much, like more productive
960
00:57:37,483 --> 00:57:39,703
that was compared to 10 years before.
961
00:57:39,793 --> 00:57:40,093
Right.
962
00:57:40,093 --> 00:57:45,613
It was, and I think we noticed that
the hard part for so many discussions
963
00:57:45,853 --> 00:57:50,110
was managing state and like how
state propagates. I think it was from
964
00:57:50,110 --> 00:57:54,370
the Riffle paper, right, on how like
so many issues in app development
965
00:57:54,370 --> 00:57:58,330
are kind of database problems in
disguise, and how techniques
966
00:57:58,330 --> 00:58:00,340
from databases might be able to help.
967
00:58:00,610 --> 00:58:05,797
So with Convex we were saying like, well
if we start with the idea of designing
968
00:58:05,797 --> 00:58:10,213
a database from first principles, can we
apply some of those database solutions
969
00:58:10,393 --> 00:58:11,923
to things across the whole stack?
970
00:58:12,253 --> 00:58:17,173
So say for example, when I'm reading
data from it within in my app, I have
971
00:58:17,173 --> 00:58:20,743
all of these React components that are
all reading different pieces of data.
972
00:58:21,193 --> 00:58:24,643
It'd be really nice if all of them
just executed at the same timestamp
973
00:58:24,703 --> 00:58:29,563
and I never had to handle consistency
issues where one component knows
974
00:58:29,563 --> 00:58:30,973
about a user or the other one doesn't.
975
00:58:31,423 --> 00:58:36,823
Similarly, like, why isn't it possible
that I just use queries across
976
00:58:36,823 --> 00:58:40,753
all my components and they just all
live update whenever I read anything,
977
00:58:40,753 --> 00:58:42,133
it's automatically reactive.
978
00:58:42,403 --> 00:58:46,753
So those were some of the like
the initial kind of thought
979
00:58:46,753 --> 00:58:48,613
experiments for what led to Convex.
980
00:58:48,883 --> 00:58:52,243
I think the other one that was
really motivated from our time at
981
00:58:52,243 --> 00:58:56,143
Dropbox and I think is like kind
of both a blessing and a curse.
982
00:58:56,143 --> 00:58:59,833
It's kind of like one of the key
design decisions for Convex is
983
00:58:59,833 --> 00:59:03,133
that Convex is very opinionated
about there being a separation
984
00:59:03,133 --> 00:59:04,523
between the client and the server.
985
00:59:05,303 --> 00:59:09,463
So we saw this at Dropbox where they
were just different teams, right?
986
00:59:09,853 --> 00:59:13,393
And you know, as we've seen with like
even the origin of GraphQL, right?
987
00:59:13,393 --> 00:59:16,153
Like that ability to
decouple development between.
988
00:59:16,830 --> 00:59:20,505
teams working on user facing features
and the way that the data fetching
989
00:59:20,505 --> 00:59:23,175
is implemented on the backend,
it's gonna be really powerful.
990
00:59:23,805 --> 00:59:27,615
And so kind of the kind of thought
experiment with Convex is, can we
991
00:59:27,722 --> 00:59:32,522
maintain a very strong separation while
still getting like live updating, while
992
00:59:32,522 --> 00:59:36,752
still getting a really good ergonomics
for both consuming data on the client
993
00:59:36,752 --> 00:59:38,372
and like fetching it on the server.
994
00:59:39,015 --> 00:59:39,435
Right.
995
00:59:39,435 --> 00:59:44,175
So yeah, walk me through a little bit
more through the evolution of Convex then.
996
00:59:44,235 --> 00:59:49,158
And so, in, in terms of all the other
options that are out there in terms
997
00:59:49,158 --> 00:59:55,698
of state management and I think most
what applications are using is probably
998
00:59:55,818 --> 01:00:01,892
something that at least to some degree is
somewhat customized and hand rolled and
999
01:00:01,892 --> 01:00:04,682
comes with its own huge set of trade-offs.
1000
01:00:05,348 --> 01:00:08,228
Help me better understand sort
of the, where you mentioned the,
1001
01:00:08,390 --> 01:00:11,135
opinionated nature of Convex.
1002
01:00:11,435 --> 01:00:13,272
What are the, benefits of that?
1003
01:00:13,362 --> 01:00:16,262
What are the downsides of
that and other implications?
1004
01:00:16,752 --> 01:00:20,562
Yeah, so when you write an app
on Convex we can use maybe
1005
01:00:20,562 --> 01:00:22,242
like a basic to do app, right?
1006
01:00:22,602 --> 01:00:24,072
The linear clone, everyone does.
1007
01:00:24,355 --> 01:00:26,695
you write endpoints like
you might be used to, right?
1008
01:00:26,695 --> 01:00:30,805
Where it's like list all the to-dos in a
project like update a to-do in a project.
1009
01:00:31,182 --> 01:00:34,902
and those get pushed as your
API to your Convex server.
1010
01:00:35,602 --> 01:00:39,292
the implementations of that API can
then read and write to the database
1011
01:00:39,292 --> 01:00:43,492
and Convex has like a, kinda like Mongo
or Firebase, like API for doing so.
1012
01:00:44,008 --> 01:00:48,688
I think the main benefit then of
Convex relative to more traditional
1013
01:00:48,688 --> 01:00:53,172
architectures is that if you're on the
client, the only thing you need to do
1014
01:00:53,412 --> 01:00:55,722
is call the, like the use query hook.
1015
01:00:56,067 --> 01:01:00,957
You're saying like, I am looking at a
project I just do use like use query
1016
01:01:01,347 --> 01:01:07,857
list tasks and project that will then
talk to the server, run that query, but
1017
01:01:07,857 --> 01:01:12,057
then also set up the subscription and
then whenever any data that that query
1018
01:01:12,057 --> 01:01:16,227
looked at changes, it will efficiently
determine that and then push the update.
1019
01:01:16,857 --> 01:01:21,567
So part of what is like been nice
with Convex is that you are getting
1020
01:01:21,917 --> 01:01:26,307
a client that has a web socket
protocol, it has a sync engine built in.
1021
01:01:26,637 --> 01:01:30,297
You're getting infrastructure for
running JavaScript at scale and for
1022
01:01:30,297 --> 01:01:32,517
handling sandboxing and all of that.
1023
01:01:32,757 --> 01:01:35,757
And then you're also getting a
database, which is, you know.
1024
01:01:36,102 --> 01:01:39,342
One, supporting transactions
or reading and writing to it.
1025
01:01:39,552 --> 01:01:43,212
But then it also supports this
efficient like being able to subscribe
1026
01:01:43,212 --> 01:01:47,652
on, I ran this query, this query
just ran a bunch of JavaScript.
1027
01:01:47,652 --> 01:01:50,752
It looked at different rows
and it ran some queries.
1028
01:01:51,235 --> 01:01:55,965
the system will automatically efficiently
determine if any right overlaps with that.
1029
01:01:56,385 --> 01:01:59,805
So the combination of all of those
things is like part of the benefit of
1030
01:01:59,805 --> 01:02:03,735
Convex, you just write TypeScript and
you write it in a way that's, feels
1031
01:02:03,735 --> 01:02:06,645
very natural and everything just works.
1032
01:02:07,335 --> 01:02:12,825
And I think some of the like downsides is
that it's it is a different set of APIs.
1033
01:02:13,098 --> 01:02:16,658
it's not using sql, it's doing
things a little bit differently
1034
01:02:16,658 --> 01:02:17,858
than they've been done before.
1035
01:02:18,342 --> 01:02:22,842
yeah, it's like kind of interesting
even today to see like what you know.
1036
01:02:23,262 --> 01:02:24,942
Talking about AI code gen, right?
1037
01:02:24,942 --> 01:02:28,422
Like models have been trained,
pre-trained on this huge corpus
1038
01:02:28,422 --> 01:02:29,322
of stuff on the internet.
1039
01:02:29,592 --> 01:02:32,412
And when are they good at
adopting new technologies?
1040
01:02:32,682 --> 01:02:35,202
Technologies that might be
after their knowledge cutoff.
1041
01:02:35,562 --> 01:02:38,887
And when are they like it's better just
to stick to things that they know already.
1042
01:02:39,592 --> 01:02:39,952
Right.
1043
01:02:39,997 --> 01:02:45,428
So, what you've mentioned before, where you
say Convex is rather opinionated: for me,
1044
01:02:45,858 --> 01:02:49,668
let's say five years ago,
I might've been much more of
1045
01:02:49,668 --> 01:02:53,028
like, oh, but maybe there's a
technology that's less opinionated
1046
01:02:53,028 --> 01:02:54,468
and I can use it for everything.
1047
01:02:54,828 --> 01:02:58,518
But the more experience I got,
the more I realized no, actually.
1048
01:02:58,848 --> 01:03:02,478
I want something that's very
opinionated, but opinionated
1049
01:03:02,538 --> 01:03:04,338
and I share those opinions.
1050
01:03:04,518 --> 01:03:06,378
Those are exactly for my use case.
1051
01:03:06,378 --> 01:03:08,448
So I think that is much better.
1052
01:03:08,448 --> 01:03:12,648
This is why we have different technologies
and they are great for different
1053
01:03:12,648 --> 01:03:17,208
scenarios, and I think the more a
technology tries to say, no, we're,
1054
01:03:17,208 --> 01:03:22,912
we're best for everything, I think the,
less it's actually good at anything.
1055
01:03:23,392 --> 01:03:26,932
And so I greatly appreciate you
standing your ground and saying
1056
01:03:26,932 --> 01:03:30,872
like, Hey, those are, our design,
decisions that we've made.
1057
01:03:31,022 --> 01:03:35,615
And those are the use cases where,
you'd be really well served building
1058
01:03:35,615 --> 01:03:37,355
on top of something like Convex.
1059
01:03:37,685 --> 01:03:42,522
And, I particularly like for now where
TypeScript is really the, default
1060
01:03:42,522 --> 01:03:44,772
language to build full stack applications.
1061
01:03:45,042 --> 01:03:48,732
And it's also increasingly
becoming the default for.
1062
01:03:48,933 --> 01:03:51,250
ai, based applications as well.
1063
01:03:51,430 --> 01:03:57,040
And AI based systems speak type
script, just as well as English.
1064
01:03:57,640 --> 01:04:02,090
And given that Convex makes
that full stack super easy.
1065
01:04:02,450 --> 01:04:07,893
And also I think you can, when
you build local-first apps, it can
1066
01:04:07,893 --> 01:04:11,913
sometimes get really tricky because
you empower the client so much.
1067
01:04:11,913 --> 01:04:15,453
You give the client so much
responsibility and therefore there's
1068
01:04:15,453 --> 01:04:17,193
many, many things that can go wrong.
1069
01:04:17,223 --> 01:04:21,653
And I think Convex therefore, takes
a more conservative approach and says
1070
01:04:21,653 --> 01:04:25,881
like, Hey, everything that happens on
the server is like highly privileged
1071
01:04:25,881 --> 01:04:27,501
and this is your safe environment.
1072
01:04:27,831 --> 01:04:31,491
And the client will try to give
you the best user experience and
1073
01:04:31,491 --> 01:04:33,081
developer experience out of the box.
1074
01:04:33,831 --> 01:04:37,551
But the client could be in a
more adversarial environment.
1075
01:04:37,611 --> 01:04:39,831
And I think those are
great design trade offs.
1076
01:04:40,071 --> 01:04:45,208
So, I think that is a fantastic foundation
for tons of different applications.
1077
01:04:45,818 --> 01:04:46,238
Yeah.
1078
01:04:46,701 --> 01:04:49,011
talking about some of these
strong opinions being both
1079
01:04:49,011 --> 01:04:50,271
blessings and curses, right?
1080
01:04:50,271 --> 01:04:54,681
Like over the past few months, one
thing we've been working on is trying
1081
01:04:54,681 --> 01:04:58,401
to bridge the gap between those
two points in the spectrum, right?
1082
01:04:58,705 --> 01:05:02,675
we wrote a blog post on it a few months
ago of like working on what we're calling
1083
01:05:02,675 --> 01:05:08,135
our like Object sync engine, trying to
take a lot of the principles from more of
1084
01:05:08,135 --> 01:05:14,270
a local-first type approach of having a
data model that it is synced to the client
1085
01:05:14,450 --> 01:05:18,020
and the only interaction between the
server and the client is through the sync.
1086
01:05:18,440 --> 01:05:22,580
And the client then can always render
its UI just looking at the local
1087
01:05:22,580 --> 01:05:24,380
database and it can be offline.
1088
01:05:24,530 --> 01:05:28,040
It's also fully describes the
app stage so it can be exported
1089
01:05:28,040 --> 01:05:29,600
and rehydrated or whatever.
1090
01:05:29,904 --> 01:05:33,564
it's very interesting design exercise
we've been on to say like, can
1091
01:05:33,564 --> 01:05:39,804
you structure a protocol on a sync
engine in a way such that the UI
1092
01:05:39,834 --> 01:05:42,984
is still reading and writing to a
local store that is authoritative.
1093
01:05:43,344 --> 01:05:47,514
But then that local store is like to kind
of use like an electric SQL terminology is
1094
01:05:47,514 --> 01:05:52,584
like that is a shape that is some mapping
of a strongly separated server data model.
1095
01:05:52,794 --> 01:05:56,754
So we still have a client data model
and server data model, which might be
1096
01:05:56,754 --> 01:06:01,419
owned by different teams and evolve
independently and, we also have that
1097
01:06:01,419 --> 01:06:06,159
strong separation where the implementation
of the shape is privileged and running
1098
01:06:06,159 --> 01:06:10,925
on the server and has authorization rules
built in and get the best of both worlds.
1099
01:06:10,925 --> 01:06:16,255
And we've kind of, we have a like beta
that we've not released publicly thought
1100
01:06:16,375 --> 01:06:19,585
open, sourced out there, but kind
of a thing where we, I think they're
1101
01:06:19,585 --> 01:06:21,355
still figuring out like the DX for it.
1102
01:06:21,355 --> 01:06:24,055
And I think we have something
that like algorithmically works
1103
01:06:24,355 --> 01:06:28,165
and it's like the protocol works,
but it's like, it's kind of hard.
1104
01:06:28,165 --> 01:06:28,315
Right.
1105
01:06:28,315 --> 01:06:32,395
It kind of reminds me a lot of writing
GraphQL resolvers, of like saying: how do I
1106
01:06:32,395 --> 01:06:35,215
take the messages table from my chat app?
1107
01:06:35,710 --> 01:06:39,280
Then under the hood that might be
joining stuff from many different
1108
01:06:39,280 --> 01:06:43,060
tables and filtering rows, or might
even be doing a full tech search
1109
01:06:43,060 --> 01:06:45,250
query in another view or something.
1110
01:06:45,547 --> 01:06:48,817
and coming up with the right
ergonomics to make that feel
1111
01:06:48,847 --> 01:06:50,767
great for a day one experience.
1112
01:06:50,767 --> 01:06:53,047
I think that's something we're
still working on, still
1113
01:06:53,047 --> 01:06:53,902
kinda like a research project,
1114
01:06:54,097 --> 01:06:54,577
right?
1115
01:06:54,637 --> 01:06:58,837
Well, when it comes to data, there is no
free lunch, but I'd much rather to have
1116
01:06:58,837 --> 01:07:03,787
it be done in the order and sequencing
that you're going through, which is
1117
01:07:03,787 --> 01:07:09,307
having a solid foundation that I can
trust and then figuring out the right
1118
01:07:09,307 --> 01:07:14,047
ergonomics afterwards, since I think
there's many, many tools that start with
1119
01:07:14,047 --> 01:07:19,747
great ergonomics, but later realize that
it's on a built, on a unsound foundation.
1120
01:07:19,957 --> 01:07:24,137
So when it comes to data, I want a
trustworthy foundation, and I think
1121
01:07:24,137 --> 01:07:25,964
you're going about in the right order.
1122
01:07:26,529 --> 01:07:31,209
Hey, Sujay, I've been learning
so much about one of my favorite
1123
01:07:31,209 --> 01:07:33,099
products of all time, Dropbox.
1124
01:07:33,789 --> 01:07:39,599
I've learned so much of like how the
sausage was actually made, how it evolved
1125
01:07:39,599 --> 01:07:45,119
over time and I'm really excited that
you got to share the story today and
1126
01:07:45,419 --> 01:07:48,272
many me included, got to, learn from it.
1127
01:07:48,452 --> 01:07:51,002
Thank you so much for taking the
time and sharing all of this.
1128
01:07:51,572 --> 01:07:52,202
Thanks for having me.
1129
01:07:52,202 --> 01:07:53,207
This is super, super fun.
1130
01:07:54,159 --> 01:07:56,739
Thank you for listening to
the localfirst.fm podcast.
1131
01:07:56,919 --> 01:08:00,009
If you've enjoyed this episode and
haven't done so already, please
1132
01:08:00,009 --> 01:08:01,299
subscribe and leave a review.
1133
01:08:01,689 --> 01:08:04,209
Please also share this episode
with your friends and colleagues.
1134
01:08:04,599 --> 01:08:07,599
Spreading the word about the
podcast is a great way to support
1135
01:08:07,599 --> 01:08:09,309
it and to help me keep it going.
1136
01:08:09,969 --> 01:08:13,389
A special thanks again to Jazz
for supporting this podcast.
1137
01:08:13,689 --> 01:08:14,649
I'll see you next time.