
localfirst.fm · E23
#23 – Sujay Jayakar: Dropbox, Convex
Episode Transcript
1
00:00:00,254 --> 00:00:04,304
There's another kind of interesting
decision here: Dropbox by
2
00:00:04,304 --> 00:00:06,824
design was always like a sidecar.
3
00:00:06,824 --> 00:00:09,974
It's always something that just
sits and it looks at your files.
4
00:00:09,974 --> 00:00:12,434
Your files are just regular
files on the file system.
5
00:00:12,794 --> 00:00:17,408
And if Dropbox, the app isn't running,
your files are there and they're safe,
6
00:00:17,408 --> 00:00:21,638
and it's something that you know,
regular apps can just read and write
7
00:00:21,638 --> 00:00:26,988
to, and in some sense like Dropbox
was unintentionally local-first
8
00:00:27,008 --> 00:00:28,508
from that perspective, right?
9
00:00:28,538 --> 00:00:31,658
Because it's saying that no
matter what happens, your data
10
00:00:31,658 --> 00:00:32,918
is just there and you own it.
11
00:00:33,984 --> 00:00:36,084
Welcome to the localfirst.fm podcast.
12
00:00:36,444 --> 00:00:39,174
I'm your host, Johannes Schickling,
and I'm a web developer, a
13
00:00:39,174 --> 00:00:42,234
startup founder, and love the
craft of software engineering.
14
00:00:42,654 --> 00:00:46,194
For the past few years, I've been on a
journey to build a modern high quality
15
00:00:46,194 --> 00:00:50,034
music app using web technologies, and
in doing so, I've been following down
16
00:00:50,034 --> 00:00:51,984
the rabbit hole of local-first software.
17
00:00:52,494 --> 00:00:55,374
This podcast is your invitation
to join me on that journey.
18
00:00:56,154 --> 00:00:58,924
In this episode, I'm
speaking to Sujay Jayakar.
19
00:00:59,439 --> 00:01:02,319
Co-founder of Convex and
Early Engineer at Dropbox.
20
00:01:02,739 --> 00:01:06,609
In this conversation, Sujay shares
the story on how the Sync Engine
21
00:01:06,669 --> 00:01:11,169
powering Dropbox was built initially
and later redesigned to address all
22
00:01:11,169 --> 00:01:13,209
sorts of distributed systems problems.
23
00:01:13,689 --> 00:01:18,999
Before getting started, also a big thank
you to Jazz for supporting this podcast.
24
00:01:19,299 --> 00:01:21,099
And now my interview with Sujay.
25
00:01:22,211 --> 00:01:23,051
Hey, Sujay.
26
00:01:23,081 --> 00:01:24,701
So nice to have you on the show.
27
00:01:24,701 --> 00:01:25,421
How are you doing?
28
00:01:25,901 --> 00:01:26,501
Doing great.
29
00:01:26,506 --> 00:01:26,736
Great.
30
00:01:26,981 --> 00:01:27,911
Really happy to be here.
31
00:01:28,361 --> 00:01:30,461
I'm super excited to have you on the show.
32
00:01:30,491 --> 00:01:35,244
I've been using your work really
since over a decade ago at this point,
33
00:01:35,244 --> 00:01:39,351
when I was really getting into
using computers productively.
34
00:01:39,681 --> 00:01:45,124
And we just recently had another
really interesting guest, Seph Gentle, on
35
00:01:45,124 --> 00:01:50,534
the podcast, who has worked on a really
fascinating tool, called Google Wave
36
00:01:50,534 --> 00:01:52,904
back then that had a big impact on me.
37
00:01:53,204 --> 00:01:56,264
And you've been working on another
technology that had a big impact
38
00:01:56,264 --> 00:02:01,351
on me, which is Dropbox and still
has a very positive impact on me.
39
00:02:01,531 --> 00:02:05,431
That was all the way back then
over 10 years ago in 2014.
40
00:02:05,731 --> 00:02:11,181
I don't think I need to explain to
the audience what Dropbox is, but, I
41
00:02:11,181 --> 00:02:15,441
want to hear it from you, like what
led you to join Dropbox, I think very
42
00:02:15,441 --> 00:02:19,851
early on and just hearing a little
bit just embedded in your personal
43
00:02:20,001 --> 00:02:24,231
context when you joined it, and then
we're gonna go dive really deep into
44
00:02:24,231 --> 00:02:27,201
all things syncing related, et cetera.
45
00:02:27,201 --> 00:02:27,951
How does that sound?
46
00:02:28,731 --> 00:02:29,721
Yeah, that sounds great.
47
00:02:30,021 --> 00:02:31,761
It's actually a really funny story.
48
00:02:31,803 --> 00:02:34,533
my career here in
technology started in 2012.
49
00:02:34,853 --> 00:02:37,883
I was actually studying mathematics.
50
00:02:37,883 --> 00:02:44,243
I was going to go work at the NSA doing
cryptography, and I was born in India.
51
00:02:44,408 --> 00:02:48,273
but I'm a naturalized citizen of
the United States, and you have to
52
00:02:48,273 --> 00:02:52,343
have security clearance to go do
these types of cryptography things.
53
00:02:52,993 --> 00:02:58,063
And you know, my clearance kept
on dragging on and on and on and
54
00:02:58,223 --> 00:03:01,873
they like interviewed my roommates,
and apparently I'm just a very sketchy
55
00:03:01,873 --> 00:03:06,113
guy so I had an offer to go work
there, but it kept on dragging on.
56
00:03:06,323 --> 00:03:10,543
And then my roommate at the time was a
computer science major who wanted
57
00:03:10,833 --> 00:03:15,459
someone to go with him to the career
fair and just started chatting with the
58
00:03:15,459 --> 00:03:18,909
Dropbox people and you know, it's about
like a hundred people around that time.
59
00:03:19,329 --> 00:03:23,786
And, chatting turned into hanging out
at dinner, turned into interviewing
60
00:03:23,786 --> 00:03:26,906
and being a math person, I did
my interviews all in Haskell and
61
00:03:26,906 --> 00:03:28,466
didn't know any real programming.
62
00:03:28,909 --> 00:03:33,619
and then yeah, that turned into doing an
internship, dropping out of undergrad and.
63
00:03:34,090 --> 00:03:35,391
just following the dream.
64
00:03:35,391 --> 00:03:38,841
And so I worked on, at Dropbox,
I worked on a bunch of things.
65
00:03:38,841 --> 00:03:40,971
I started off working on
our, like, growth team.
66
00:03:40,971 --> 00:03:43,761
So I did a lot of like email system.
67
00:03:43,761 --> 00:03:47,051
Like I worked on this thing
called the Space Race, like a promotion.
68
00:03:47,361 --> 00:03:48,771
Oh, I remember that.
69
00:03:48,771 --> 00:03:49,161
Yes.
70
00:03:49,161 --> 00:03:53,631
I think I've, I've earned quite a lot
of like free storage, which I think
71
00:03:53,631 --> 00:03:55,701
over the time has like gone down.
72
00:03:56,001 --> 00:03:58,641
But that was a very smart
and effective mechanism.
73
00:03:58,641 --> 00:04:01,371
I surely invited all my friends back then.
74
00:04:01,371 --> 00:04:05,901
I couldn't afford a premium plan
being a broke student, so that worked.
75
00:04:07,381 --> 00:04:11,381
And then from there worked on
the sync engine for some time.
76
00:04:11,411 --> 00:04:15,961
And then right now I'm the co-founder and
chief scientist of a startup called Convex
77
00:04:16,141 --> 00:04:20,251
and my three co-founders and I met working
on this project called Magic Pocket,
78
00:04:20,251 --> 00:04:25,474
where Dropbox stores hundreds of petabytes,
now exabytes, of files for users.
79
00:04:25,474 --> 00:04:26,854
And we used to do that in S3.
80
00:04:27,064 --> 00:04:32,284
And so the three of us worked together on
a team to build Amazon S3, but in-house
81
00:04:32,284 --> 00:04:33,994
and migrate all of the data over.
82
00:04:34,364 --> 00:04:39,116
so we did that for a few years and then
worked on rewriting the entirety of
83
00:04:39,116 --> 00:04:42,866
Dropbox, the sync engine, the thing that
runs on all of our desktop computers.
84
00:04:43,139 --> 00:04:46,919
we rewrote it to be really correct
and scalable and very flexible.
85
00:04:47,279 --> 00:04:48,369
and shipped that.
86
00:04:48,650 --> 00:04:52,673
After that, I left Dropbox in 2020, and I
was trying to decide if I wanted
87
00:04:52,673 --> 00:04:54,353
to get back to academics or not.
88
00:04:54,353 --> 00:04:59,513
So I did some research in networking
and then decided to start Convex in 2021.
89
00:04:59,843 --> 00:05:03,531
Certainly curious, which sort of
research has had your interest the
90
00:05:03,531 --> 00:05:07,611
most in this sort of transitionary
period, but maybe we stash that
91
00:05:07,672 --> 00:05:11,494
for a moment and go back to the
beginning when you joined Dropbox.
92
00:05:11,805 --> 00:05:15,451
you mentioned there were around a
hundred people working there at the time.
93
00:05:15,728 --> 00:05:20,078
how do I need to imagine the technology
behind Dropbox at this point?
94
00:05:20,378 --> 00:05:26,714
it clearly started out with, like, a
desktop-focused daemon project,
95
00:05:27,054 --> 00:05:33,204
like a daemon process that's running on your
machine, somehow keeps track of the files
96
00:05:33,294 --> 00:05:36,774
on your system and then applies the magic.
97
00:05:37,044 --> 00:05:43,138
So explain to me how things worked
back then and what was it like to
98
00:05:43,138 --> 00:05:46,258
work at Dropbox when there were
around about a hundred people.
99
00:05:46,888 --> 00:05:49,678
Yeah, I mean, it was
pretty magical, right?
100
00:05:49,678 --> 00:05:54,058
Because the company had, I think gotten
so many things right on the product side
101
00:05:54,058 --> 00:05:55,968
and then those showed up in technology.
102
00:05:55,968 --> 00:06:00,238
But just this feeling of like Dropbox
being this product that just worked right?
103
00:06:00,238 --> 00:06:01,468
It was for everyone.
104
00:06:01,738 --> 00:06:05,758
It was not just for technologists, but
anyone should be able, anyone who's
105
00:06:05,758 --> 00:06:09,478
comfortable using a computer should
be able to install Dropbox and have
106
00:06:09,538 --> 00:06:12,118
a folder of theirs become magical.
107
00:06:12,508 --> 00:06:15,748
And without understanding anything
about how it works, they should
108
00:06:15,748 --> 00:06:19,198
just think of it as like an
extension of what they know already.
109
00:06:19,468 --> 00:06:19,708
yeah.
110
00:06:19,708 --> 00:06:22,948
And so like the ways that that showed
up I think were really interesting.
111
00:06:22,948 --> 00:06:26,048
At the time there was a very strong
culture of like reverse engineering.
112
00:06:26,678 --> 00:06:29,998
So to have this daemon that runs locally.
113
00:06:30,238 --> 00:06:34,348
You know, one of the amazing
early moments in Dropbox was that,
114
00:06:34,564 --> 00:06:38,614
if, like, you open up Finder or Explorer
and you have the overlays on it.
115
00:06:39,094 --> 00:06:43,294
Like that used to be done by
like attaching to the Finder
116
00:06:43,294 --> 00:06:45,124
process and injecting code into it
117
00:06:47,854 --> 00:06:51,994
and to the point where, uh, when some
folks had gone to talk to Apple at the
118
00:06:51,994 --> 00:06:57,364
time and about like working with the
file system and everything like the,
119
00:06:57,874 --> 00:07:02,584
there were teams at Apple that asked
Dropbox, how did you do that in Finder?
120
00:07:05,374 --> 00:07:08,674
So you wanted to offer the
most native experience.
121
00:07:08,674 --> 00:07:11,044
There weren't the necessary APIs for that.
122
00:07:11,314 --> 00:07:12,724
And so you just made it happen.
123
00:07:12,754 --> 00:07:13,444
That's amazing.
124
00:07:13,449 --> 00:07:13,459
Yeah.
125
00:07:14,164 --> 00:07:14,464
Yeah.
126
00:07:14,804 --> 00:07:19,744
And so that that idea of like, how do
you create the best user experience,
127
00:07:19,744 --> 00:07:26,464
something that you know, for the purpose
of making non-technical users feel very
128
00:07:26,464 --> 00:07:28,654
confident and feel very safe using it.
129
00:07:28,834 --> 00:07:32,584
That was another, I think, really
deep like company value of like
130
00:07:32,584 --> 00:07:36,544
being worthy of trust and taking
people's files very seriously.
131
00:07:36,604 --> 00:07:39,544
You know, I like remember having a
friend who was in residency at the
132
00:07:39,544 --> 00:07:44,314
time and he was telling me that he
keeps all of his, like some of his non-
133
00:07:44,314 --> 00:07:50,014
HIPAA stuff, but like his things that
he looks at on Dropbox and you know,
134
00:07:50,014 --> 00:07:51,754
pulls them up and he's consulting 'em.
135
00:07:51,754 --> 00:07:54,374
And there's a part of me which
is terrified by that, right?
136
00:07:54,374 --> 00:07:57,934
Like we think of software as
something where like throwing a 500
137
00:07:57,934 --> 00:07:59,554
error is fine every once in a while.
138
00:08:00,004 --> 00:08:03,054
And at Dropbox there was
a culture of making users feel
139
00:08:03,054 --> 00:08:04,284
like they could really trust us.
140
00:08:04,314 --> 00:08:08,274
And then that showed up for things
like making sure that, like when
141
00:08:08,274 --> 00:08:11,604
we give feedback to users, if we
put that green overlay in Finder.
142
00:08:12,189 --> 00:08:16,689
They know that no matter what happens,
they could throw their laptop in a pool.
143
00:08:16,689 --> 00:08:20,739
They could like they, anything could
happen and their files are safe.
144
00:08:20,869 --> 00:08:24,609
Like if their house burns down, they
don't have to worry about that thing.
145
00:08:24,879 --> 00:08:29,469
And that's like all of that reverse
engineering and all of the emphasis
146
00:08:29,469 --> 00:08:31,269
on correctness and durability.
147
00:08:31,479 --> 00:08:34,209
It was all in service of that feeling,
which I think was really cool.
148
00:08:34,763 --> 00:08:38,273
so on the engineering side, at the
time it was like in hyper growth mode.
149
00:08:38,273 --> 00:08:40,793
So they had a Python desktop client.
150
00:08:40,973 --> 00:08:43,433
Almost all of Dropbox was
in Python at the time.
151
00:08:43,793 --> 00:08:49,943
And so there's a pre-mypy, like, big
rapidly changing desktop client that
152
00:08:50,283 --> 00:08:53,693
needed to support Mac, Windows, and Linux
and all these different file systems.
153
00:08:53,963 --> 00:08:58,313
and then on the server, it was like we
had one big server called Meta Server,
154
00:08:58,646 --> 00:08:59,936
meta, I think was from metadata.
155
00:09:00,243 --> 00:09:03,513
and that like ran almost all of Dropbox.
156
00:09:03,573 --> 00:09:06,223
We stored the metadata in MySQL.
157
00:09:06,673 --> 00:09:11,463
The files were stored in S3, and then
we had a separate notification server
158
00:09:11,463 --> 00:09:13,713
for managing pushes and things like that.
159
00:09:13,923 --> 00:09:17,433
And so it was like kind of classic
architecture, and it was
160
00:09:17,463 --> 00:09:20,403
starting to reach the limits of
its scaling even at that time.
161
00:09:20,913 --> 00:09:24,106
And, those were a lot of the things
we worked on over the next 10 years.
162
00:09:24,616 --> 00:09:25,126
Wow.
163
00:09:25,366 --> 00:09:27,766
So was the server also written in Python?
164
00:09:27,766 --> 00:09:29,566
So it was all one big python shop.
165
00:09:30,076 --> 00:09:30,436
Yeah.
166
00:09:30,886 --> 00:09:32,836
And the server was all written in Python.
167
00:09:33,373 --> 00:09:39,043
we, had some pretty funny bugs
that were due to it's kind of
168
00:09:39,043 --> 00:09:40,213
crazy to think about it now.
169
00:09:40,213 --> 00:09:44,743
You know, you working in TypeScript
full time, and to think of, like,
170
00:09:44,743 --> 00:09:48,433
back in the day we just had these like
hundreds of thousands, millions of lines
171
00:09:48,433 --> 00:09:54,383
of code with no type safety and with
all types of crazy meta programming and
172
00:09:55,053 --> 00:09:57,553
decorators and meta classes and stuff.
173
00:09:57,553 --> 00:10:00,193
And yeah, so it was
all in Python when I showed up.
174
00:10:00,235 --> 00:10:04,453
It was not all in Python and not all in
one big monolithic service when I left.
175
00:10:04,814 --> 00:10:09,644
So you mentioned joining when there
were around a hundred people and you
176
00:10:09,644 --> 00:10:14,894
probably already at this point had
like multitudes more in terms of users.
177
00:10:15,274 --> 00:10:21,004
Being in hypergrowth, it is sort of this
race against time where you only have
178
00:10:21,004 --> 00:10:26,344
so much time to work on something, but
growth may be outrunning you already and
179
00:10:26,344 --> 00:10:28,624
things are already starting to break.
180
00:10:28,624 --> 00:10:33,514
Or, you know, like, okay, if things are
gonna grow like this, this system will
181
00:10:33,514 --> 00:10:36,508
break and it's gonna be pretty bad.
182
00:10:36,808 --> 00:10:42,171
So tell me more about how you were
dealing with, like, the constant
183
00:10:42,321 --> 00:10:48,778
race against time to rebuild systems,
redesign systems, putting out fires.
184
00:10:49,018 --> 00:10:49,948
What was that like?
185
00:10:50,224 --> 00:10:53,374
Yeah, and I think there's like kind of
an interesting place to take this on.
186
00:10:53,374 --> 00:10:56,584
I think like the normal things
were on scale, right?
187
00:10:56,584 --> 00:10:57,274
Those were, like,
188
00:10:57,619 --> 00:11:00,416
one kinda class of problems, of
being able to handle the load.
189
00:11:00,626 --> 00:11:04,849
But I think one kind of really
interesting, dimension of this that led
190
00:11:04,849 --> 00:11:09,829
to our decision to start rewriting all
of the sync engine in 2016 was actually
191
00:11:09,829 --> 00:11:11,749
just like customer debugging load.
192
00:11:12,619 --> 00:11:17,449
You know, we had hundreds of
millions of active users and they were
193
00:11:17,539 --> 00:11:20,149
using Dropbox in all types of crazy ways.
194
00:11:20,389 --> 00:11:24,019
Like one of the stories is someone
was using Dropbox with like, I think
195
00:11:24,019 --> 00:11:27,559
it was running on some, I don't know
if it was like a Raspberry Pi or
196
00:11:27,559 --> 00:11:28,849
something, something on his tractor.
197
00:11:28,879 --> 00:11:32,749
Like the guy ran a farm and he
was using Dropbox to sync like
198
00:11:32,749 --> 00:11:35,089
pads in text files to his tractor.
199
00:11:35,533 --> 00:11:37,633
And I might be getting some
of the details wrong, but
200
00:11:37,633 --> 00:11:38,353
it's something like that.
201
00:11:38,353 --> 00:11:43,243
And so people would just use Dropbox
in all types of crazy ways on crazy
202
00:11:43,243 --> 00:11:47,913
file systems with kernel modules
running that are messing things around
203
00:11:47,913 --> 00:11:52,251
or so I think, You know, in terms of
getting ahead of scale, I think we found
204
00:11:52,251 --> 00:11:58,644
ourselves around 2015, 2016, in the
place where for the sync engine on the
205
00:11:58,644 --> 00:12:03,864
desktop client, the entire team just
spent all of its time debugging issues.
206
00:12:04,644 --> 00:12:08,934
We had this principle of like anything
that's possible, anything that a
207
00:12:08,934 --> 00:12:13,254
protocol allows, any
threading race condition that's
208
00:12:13,254 --> 00:12:15,864
theoretically possible will be possible.
209
00:12:16,404 --> 00:12:17,934
And then we would see it, right?
210
00:12:17,934 --> 00:12:20,514
Like users would write in
saying, my files aren't syncing.
211
00:12:20,814 --> 00:12:24,414
And then we would look at it and we would
spend months debugging each one of these
212
00:12:24,414 --> 00:12:30,218
issues and trying to read the tea leaves
from traces and reports and reproductions.
213
00:12:30,218 --> 00:12:33,878
And it'll be like, oh they
mounted this file system over here
214
00:12:33,878 --> 00:12:36,278
and then this one and this one
are in a different file system.
215
00:12:36,278 --> 00:12:40,188
So moving the file actually did
a copy, but then the xattrs
216
00:12:40,188 --> 00:12:42,138
weren't preserved, this and that.
217
00:12:42,478 --> 00:12:46,168
You know, in terms of that theme of like
getting ahead of scale, like I think there
218
00:12:46,168 --> 00:12:51,238
was first this realization that like the
set of possible things that can happen in
219
00:12:51,238 --> 00:12:54,508
the system is just astronomically large.
220
00:12:54,598 --> 00:12:57,298
And all of them will happen
if they're allowed to.
221
00:12:57,718 --> 00:13:01,498
And we do not have, no matter
how much like incremental time
222
00:13:01,498 --> 00:13:04,798
we put into debugging things, we
will never be able to keep up.
223
00:13:05,128 --> 00:13:08,188
And the cost of doing that is
that the entire team is working
224
00:13:08,188 --> 00:13:09,628
on maintenance like this.
225
00:13:09,628 --> 00:13:11,098
We couldn't build any new features.
226
00:13:11,578 --> 00:13:15,958
So I think that was a motivation then for
the rewrite: can we find like points
227
00:13:15,958 --> 00:13:20,758
of leverage where if we just invest a
little bit in technology upfront, like by
228
00:13:20,758 --> 00:13:25,768
architecting things a particular way, can
we just eliminate a much bigger set of
229
00:13:25,768 --> 00:13:29,638
potential work from debugging and working
with customers and stuff like that.
230
00:13:29,974 --> 00:13:33,974
So maybe this is a good time
to take a step back and try to
231
00:13:33,974 --> 00:13:38,354
better understand what the Dropbox
sync engine actually was back then?
232
00:13:38,654 --> 00:13:45,041
So from just thinking about it through
like a user's perspective, I have maybe
233
00:13:45,041 --> 00:13:48,094
two computers, and I have files over here.
234
00:13:48,094 --> 00:13:53,179
I want to make sure that I have the
files synced over from here to here.
235
00:13:53,569 --> 00:13:59,299
So I could now think about this as
sort of like a Git-style approach.
236
00:13:59,539 --> 00:14:01,219
Maybe there's other ways as well.
237
00:14:01,489 --> 00:14:05,329
walk me through sort of like through the
solution space, how this could have been
238
00:14:05,389 --> 00:14:07,459
approached and how was it approached?
239
00:14:07,592 --> 00:14:12,032
is there some sort of like diffing
involved between different file states
240
00:14:12,032 --> 00:14:14,222
over time that are being synced around?
241
00:14:14,462 --> 00:14:17,792
Do you sync around the
actual file content itself?
242
00:14:18,062 --> 00:14:19,142
Help me to understand.
243
00:14:19,142 --> 00:14:24,752
Building a mental model, what does it mean
back then for the sync engine to work?
244
00:14:25,187 --> 00:14:25,697
Yeah.
245
00:14:25,757 --> 00:14:26,087
Yeah.
246
00:14:26,207 --> 00:14:28,427
It's a super interesting question, right?
247
00:14:28,427 --> 00:14:31,517
Because I think like you're saying,
there's so many different paths one
248
00:14:31,517 --> 00:14:34,667
can take and it's, I think one of
those things where like if someone
249
00:14:34,667 --> 00:14:37,307
asks, like design Dropbox in an
interview question, there's like
250
00:14:37,517 --> 00:14:39,767
definitely not one right answer, right?
251
00:14:39,797 --> 00:14:44,417
It's like there are so many trade-offs and
like different forks in the decision tree.
252
00:14:44,777 --> 00:14:48,767
I think one of the first things is
that, so you have your desktop A and you
253
00:14:48,767 --> 00:14:52,547
have your, maybe you have your desktop
and your laptop, and one of the first
254
00:14:52,547 --> 00:14:55,877
decisions for Dropbox is that we would
have a central server in the middle,
255
00:14:56,417 --> 00:15:01,097
that there would be a Dropbox file system
in the middle that Dropbox, the company
256
00:15:01,097 --> 00:15:05,897
ran, and we did that from this trust
perspective, we wanted to say that we
257
00:15:05,897 --> 00:15:10,217
will run this infallibly: when you get
that green check mark, it's there.
258
00:15:11,177 --> 00:15:15,077
You know, even if an asteroid destroys
the eastern side of the United
259
00:15:15,077 --> 00:15:17,737
States, like we will have things
replicated in multiple data centers.
260
00:15:18,267 --> 00:15:22,367
And that you know, and then
also it's accessible anywhere
261
00:15:22,367 --> 00:15:23,027
on the internet, right?
262
00:15:23,027 --> 00:15:24,287
You can go to the library.
263
00:15:24,347 --> 00:15:26,897
This is not so common these days but
I remember when I was a student, like,
264
00:15:26,897 --> 00:15:29,747
go to the library, log into Dropbox
and read all your things right?
265
00:15:30,030 --> 00:15:31,800
rather than having to
bring a USB stick around.
266
00:15:32,180 --> 00:15:36,570
And so I think that is the first
decision, but it's not necessary, right?
267
00:15:36,570 --> 00:15:39,450
Like there were plenty of
distributed, entirely peer to
268
00:15:39,450 --> 00:15:42,090
peer file syncing, designs, right?
269
00:15:42,420 --> 00:15:44,700
And so that was the first decision.
270
00:15:44,970 --> 00:15:48,680
And I think the kind of second decision
was that if we imagine our desktop and
271
00:15:48,680 --> 00:15:52,760
our laptop and you have the server in
the middle, the desktop might be on
272
00:15:52,760 --> 00:15:55,760
Windows, the laptop might be on Mac OS.
273
00:15:56,030 --> 00:15:59,360
So I think that decision to
support multiple platforms.
274
00:15:59,705 --> 00:16:01,745
Is like another really interesting one.
275
00:16:02,105 --> 00:16:05,855
This is like where I think Git and
Dropbox can be a little bit different.
276
00:16:06,065 --> 00:16:09,395
And that Git is at the end of
the day quite Linux centric.
277
00:16:09,605 --> 00:16:11,965
It's case sensitive for its file system.
278
00:16:12,275 --> 00:16:15,875
It deals with directories and it
makes particular assumptions about
279
00:16:15,875 --> 00:16:17,195
how directories should behave.
280
00:16:17,555 --> 00:16:19,335
And that was something with Dropbox.
281
00:16:19,335 --> 00:16:22,575
We wanted to be consumer, we wanted
to support everything and we wanted
282
00:16:22,575 --> 00:16:24,225
it to feel very automatic, right?
283
00:16:24,225 --> 00:16:28,095
That like, someone shouldn't have to
understand, like, what a Unicode
284
00:16:28,095 --> 00:16:29,895
normalization disagreement means.
285
00:16:29,895 --> 00:16:30,285
Right?
286
00:16:30,495 --> 00:16:34,275
Where in Git like in really bad settings,
like you might have to understand
287
00:16:34,275 --> 00:16:38,205
that you write a u with an
accent differently on Mac and Windows.
288
00:16:38,732 --> 00:16:40,555
so I think that's the
kind of like next side of it.
289
00:16:40,555 --> 00:16:43,645
So then Dropbox has its design
for a file system and it's a
290
00:16:43,645 --> 00:16:47,925
central; it's like the hub, and all
the spokes are your phone, your
291
00:16:48,258 --> 00:16:49,925
desktop, your laptop and whatnot.
292
00:16:50,385 --> 00:16:53,288
and then so to kind of get
down to the details a bit more.
293
00:16:53,468 --> 00:16:56,618
So then, yeah, we have a process that
runs on your computer, that's the
294
00:16:56,618 --> 00:17:02,528
Dropbox app, and that watches all of
the files on your file system, and then
295
00:17:02,528 --> 00:17:07,088
it looks at what's happened and then
syncs them up to the Dropbox server.
296
00:17:07,268 --> 00:17:10,448
And then whenever changes happen on
the Dropbox server, it syncs them down.
297
00:17:11,062 --> 00:17:15,112
there's another kind of interesting
decision here: Dropbox by
298
00:17:15,112 --> 00:17:17,632
design was always like a sidecar.
299
00:17:17,632 --> 00:17:20,782
It's always something that just
sits and it looks at your files.
300
00:17:20,782 --> 00:17:23,242
Your files are just regular
files on the file system.
301
00:17:23,602 --> 00:17:28,215
And if Dropbox, the app isn't running,
your files are there and they're safe,
302
00:17:28,215 --> 00:17:32,445
and it's something that you know,
regular apps can just read and write
303
00:17:32,445 --> 00:17:37,795
to, and in some sense like Dropbox
was unintentionally local-first
304
00:17:37,815 --> 00:17:39,315
from that perspective, right?
305
00:17:39,345 --> 00:17:42,465
Because it's saying that no
matter what happens, your data
306
00:17:42,465 --> 00:17:43,725
is just there and you own it.
307
00:17:44,257 --> 00:17:46,957
and you know, there are
other systems, right?
308
00:17:46,957 --> 00:17:52,597
Like if you use NFS, like a network
file system, then if you unmount it or
309
00:17:52,597 --> 00:17:53,987
if you lose connection to the server.
310
00:17:54,657 --> 00:17:58,447
You might not be able to actually open
any files that you have the metadata for.
311
00:17:58,897 --> 00:17:59,227
Right.
312
00:17:59,227 --> 00:18:04,493
And I remember from a user perspective,
the local-first aspect, I really went
313
00:18:04,513 --> 00:18:08,083
through like all the stages where I
had a computer that wasn't connected
314
00:18:08,083 --> 00:18:11,983
to the internet yet, and that at some
point I had an internet connection.
315
00:18:12,313 --> 00:18:16,957
But, files were always where like
everything depended on files.
316
00:18:16,957 --> 00:18:20,647
Like if I didn't have a
file, things wouldn't work.
317
00:18:20,647 --> 00:18:22,127
Everything depended on files.
318
00:18:22,127 --> 00:18:26,627
There were barely websites
where you could do meaningful things.
319
00:18:26,947 --> 00:18:30,130
certainly web apps
weren't very common yet.
320
00:18:30,640 --> 00:18:35,410
And then Dropbox made everything
seamlessly work together.
321
00:18:35,950 --> 00:18:41,140
And then when web apps and SaaS
software came along more, I was a
322
00:18:41,140 --> 00:18:43,240
bit confused, because I felt: okay,
323
00:18:43,240 --> 00:18:48,639
it gives me some collaboration, but it seems
to be a different kind of collaboration
324
00:18:48,639 --> 00:18:50,499
since I had collaboration before.
325
00:18:50,889 --> 00:18:56,085
But I also understood the limitations
of, when I'm working on the same doc
326
00:18:56,085 --> 00:19:00,762
file, through Dropbox, which gets
sort of like the first copy, second
327
00:19:00,762 --> 00:19:05,729
copy, third copy, and now I need
to somehow manually reconcile that.
328
00:19:05,789 --> 00:19:08,519
And when I saw Google
Docs for the first time.
329
00:19:09,149 --> 00:19:14,609
That was really like a revelation because,
oh, now we can do this at the same time.
330
00:19:14,609 --> 00:19:19,079
But at the same time, while I saw that,
I still remember the feeling
331
00:19:19,079 --> 00:19:20,969
where, but where are my files?
332
00:19:20,969 --> 00:19:22,409
This is my stuff now.
333
00:19:22,409 --> 00:19:23,459
Where, where is it?
334
00:19:23,909 --> 00:19:29,899
And that trust that you've mentioned
with Dropbox, I felt like I lost some,
335
00:19:30,109 --> 00:19:35,549
some control here and it required a
lot of trust, in those tools that I
336
00:19:35,549 --> 00:19:37,529
started now step by step, embracing.
337
00:19:37,559 --> 00:19:41,279
And frankly, I think a lot of those tools
didn't deserve my trust in hindsight.
338
00:19:41,879 --> 00:19:48,254
I still feel like we've lost something
by no longer being able to like call
339
00:19:48,254 --> 00:19:50,624
the foundation our own in a way.
340
00:19:50,954 --> 00:19:54,764
And I'm still hoping that we kind of
find the best of both worlds where
341
00:19:54,764 --> 00:19:58,634
we get that seamless collaboration
that we now take for granted.
342
00:19:58,634 --> 00:20:00,344
Something like what Figma gives us.
343
00:20:00,682 --> 00:20:06,080
but also the control and just being
ready for whatever happens, that's
344
00:20:06,080 --> 00:20:08,330
something Dropbox gave us out of the box.
345
00:20:08,617 --> 00:20:12,037
I just wanna share this sort of
like anecdote and like almost
346
00:20:12,037 --> 00:20:15,817
emotional confusion as I walk
through those different stages
347
00:20:16,117 --> 00:20:17,997
of how we work with software.
348
00:20:18,837 --> 00:20:19,257
Totally.
349
00:20:19,257 --> 00:20:22,617
And we've ended up in a place
that's not great in a lot of ways.
350
00:20:22,617 --> 00:20:22,977
Right.
351
00:20:22,977 --> 00:20:27,867
And I think you know, I think part
of the sad thing, and maybe from
352
00:20:27,897 --> 00:20:32,907
even like an operating systems design
perspective is that I feel like files
353
00:20:32,907 --> 00:20:35,007
have lots of design decisions that are
354
00:20:35,472 --> 00:20:36,672
packaged up together.
355
00:20:36,972 --> 00:20:39,972
You know, like one of the amazing
things about files is that
356
00:20:39,972 --> 00:20:41,322
they're self-contained, right?
357
00:20:41,502 --> 00:20:44,712
Like on Google, I don't know what
Google's backend looks like for Google
358
00:20:44,712 --> 00:20:49,322
Docs, but they probably have like all
of the metadata and pieces of the data
359
00:20:49,322 --> 00:20:53,622
spread across different rows in a database
and different things in an object store.
360
00:20:53,922 --> 00:20:57,192
And just even thinking about like
the physical implementation of that
361
00:20:57,192 --> 00:21:00,522
data, it's like scattered around
probably a bunch of servers, right?
362
00:21:00,522 --> 00:21:01,842
Maybe in different data centers.
363
00:21:02,172 --> 00:21:05,712
And there's something really nice
about a file where a file is just
364
00:21:05,712 --> 00:21:08,292
like a piece of state, right?
365
00:21:08,292 --> 00:21:09,552
That is just self-contained.
366
00:21:09,912 --> 00:21:13,632
And I think one of the things
that I think is very
367
00:21:13,632 --> 00:21:18,042
unfortunate, like from an operating
systems perspective, is that that decision
368
00:21:18,042 --> 00:21:24,312
has also been coupled with a very
anemic API: with files, they're just
369
00:21:24,342 --> 00:21:30,072
sequences of bytes that can be read
and written to and appended, and there's
370
00:21:30,072 --> 00:21:31,992
no additional structure beyond that.
371
00:21:32,532 --> 00:21:33,452
And I think, like,
372
00:21:33,782 --> 00:21:37,882
the way that things have
evolved is that, to
373
00:21:37,902 --> 00:21:41,082
have more structure, to make things
like Google Docs, to be able to
374
00:21:41,082 --> 00:21:46,032
reconcile and have collaboration and
interpret things as more than just bytes,
375
00:21:46,302 --> 00:21:49,092
we've also given up this ability
to package things together.
376
00:21:49,585 --> 00:21:53,873
Mac OS had like a very kind of
baby step in this direction with I
377
00:21:53,873 --> 00:21:54,803
think they're called bundles.
378
00:21:54,833 --> 00:21:57,863
Like the things where like if
you have like your .app, they're
379
00:21:57,863 --> 00:21:59,363
actually a zip file, right?
380
00:21:59,610 --> 00:22:03,510
And there's all types of ways, all
types of brain damage for how this
381
00:22:03,510 --> 00:22:05,130
like, doesn't actually work well.
382
00:22:05,130 --> 00:22:05,670
You know?
383
00:22:05,670 --> 00:22:07,740
But the idea is kind
of interesting, right?
384
00:22:07,740 --> 00:22:10,950
It's like what if files had some
more structure and what if you still
385
00:22:10,950 --> 00:22:15,300
considered something, an atomic unit,
but then it had pieces of it that
386
00:22:15,300 --> 00:22:17,100
weren't just uninterpretable bytes.
387
00:22:17,400 --> 00:22:20,705
And I think that's like, the
path dependent, way that we've
388
00:22:20,705 --> 00:22:21,665
ended up where we are today.
389
00:22:22,295 --> 00:22:23,075
That makes sense.
390
00:22:23,165 --> 00:22:28,525
So going back to the sync engine
implementation, did the Python process
391
00:22:28,525 --> 00:22:34,195
back in the day mostly index
all of the files and then actually
392
00:22:34,218 --> 00:22:39,394
send across the actual bytes, probably
in some chunks, across the wire?
393
00:22:39,394 --> 00:22:45,468
Or was there some more intelligent
diffing happening client side, so
394
00:22:45,588 --> 00:22:50,601
that you would only send kind of the
changes across the wire and how do I
395
00:22:50,601 --> 00:22:55,341
need to think about what is a change
when I'm dealing with like a ton of
396
00:22:55,341 --> 00:22:57,561
bytes before and a ton of bytes after?
397
00:22:57,924 --> 00:22:58,074
Yeah.
398
00:22:58,074 --> 00:22:59,364
Those are really, really good questions.
399
00:22:59,364 --> 00:23:03,784
I think maybe like the first
starting point is that like files
400
00:23:03,784 --> 00:23:07,464
in Dropbox were stored, just broken
up into four megabyte chunks.
401
00:23:07,734 --> 00:23:10,764
And that was just a decision at the
very beginning to pick some size.
402
00:23:11,394 --> 00:23:15,384
And on the server, the way that those
chunks were stored is that they,
403
00:23:15,414 --> 00:23:20,364
each four megabyte chunk was stored
keyed by its SHA-256 hash.
404
00:23:20,764 --> 00:23:22,834
So we would assume that
those are globally unique.
405
00:23:23,074 --> 00:23:27,514
So then if you had the same copy
of a file in a bunch of places, or you had
406
00:23:27,514 --> 00:23:30,964
a file copied many times in your
Dropbox, we would only store it once.
407
00:23:31,504 --> 00:23:34,654
And that would just happen
organically because we would say
408
00:23:34,654 --> 00:23:38,974
like, okay, I looked at this file,
it has three chunks A, B, and C.
409
00:23:39,364 --> 00:23:42,964
And then the client would ask the
server, do you have A, B, and C?
410
00:23:43,294 --> 00:23:47,734
Like the server would say, yes, I have
B and C already, please send A, then we
411
00:23:47,734 --> 00:23:52,098
would upload A. So there was already, like,
at the file level, this
412
00:23:52,098 --> 00:23:55,728
kind of very coarse-grained delta sync
413
00:23:56,071 --> 00:23:57,708
at the four megabyte chunk layer.
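To make the chunking and upload handshake described here concrete, below is a minimal sketch in Python. The 4 MB chunk size and SHA-256 keying come from the description above; the `server` object and its `missing_chunks`, `put_chunk`, and `commit_file` methods are hypothetical stand-ins for the metadata service API, not Dropbox's actual interface.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed 4 MB chunks, as described above


def chunk_hashes(path):
    """Split a file into 4 MB chunks and key each one by its SHA-256 hash."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def upload_file(path, server):
    """Ask the server which chunks it is missing, then upload only those.

    `server` is a hypothetical client for the metadata service exposing
    missing_chunks(hashes) -> set of hashes, put_chunk(hash, data), and
    commit_file(path, hashes).
    """
    hashes = chunk_hashes(path)
    missing = server.missing_chunks(hashes)    # e.g. "I have B and C, send A"
    with open(path, "rb") as f:
        for h in hashes:
            chunk = f.read(CHUNK_SIZE)
            if h in missing:
                server.put_chunk(h, chunk)     # only the missing bytes go over the wire
    # The file itself is then committed as an ordered list of chunk hashes.
    server.commit_file(path, hashes)
```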
414
00:23:58,231 --> 00:24:01,748
and then the kind of, it's funny,
these things evolve, right?
415
00:24:01,748 --> 00:24:05,228
Like then the next thing we layered on
up top was that in that setting where
416
00:24:05,228 --> 00:24:09,398
you decided B and C were there already
and you needed to upload A, then with
417
00:24:09,428 --> 00:24:15,308
A, the desktop client could use rsync
to know that there was previously an A
418
00:24:15,308 --> 00:24:19,568
prime and do a patch between the two
and then send just those contents.
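And a deliberately simplified sketch of that second layer: compare the new chunk A against the previous version A′ block by block and ship only the blocks that changed. Real rsync uses a rolling weak checksum plus a strong hash so matches are found at arbitrary offsets; this fixed-offset version only illustrates the "send a patch, not the whole chunk" idea, and the block size is arbitrary.

```python
import hashlib

BLOCK = 64 * 1024  # illustrative block size within a chunk


def _block_hashes(data):
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def make_patch(old, new):
    """Return (offset, bytes) pairs for blocks of `new` that differ from `old`."""
    old_hashes = _block_hashes(old)
    patch = []
    for offset in range(0, len(new), BLOCK):
        block = new[offset:offset + BLOCK]
        i = offset // BLOCK
        if i >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[i]:
            patch.append((offset, block))
    return patch


def apply_patch(old, patch, new_len):
    """Rebuild the new chunk from the old one plus the shipped blocks."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, block in patch:
        out[offset:offset + len(block)] = block
    return bytes(out)


# apply_patch(a_prime, make_patch(a_prime, a), len(a)) == a
```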
419
00:24:19,918 --> 00:24:23,578
the kind of thing that was pretty
interesting is that a lot of the content
420
00:24:23,578 --> 00:24:29,588
on Dropbox was very incompressible
stuff like video, images, so the
421
00:24:29,784 --> 00:24:34,314
benefits of deduplication both
across users or even within a user.
422
00:24:34,524 --> 00:24:39,984
And the benefit of like rsync was not
actually as much as one might think,
423
00:24:40,434 --> 00:24:43,824
at least in, like, terms of
bandwidth going through the system.
424
00:24:43,854 --> 00:24:47,534
It wasn't that reductive because a lot of
this content was just kind of unique and
425
00:24:48,104 --> 00:24:50,364
not getting updated in small patches.
426
00:24:51,429 --> 00:24:56,559
And on your server side, blob store, now
that you had those hashes for those four
427
00:24:56,559 --> 00:25:02,619
megabyte chunks, that also means that you
could probably deduplicate some content
428
00:25:02,679 --> 00:25:08,979
across users, which makes me think of
all sorts of other implications of that.
429
00:25:09,069 --> 00:25:12,369
When do you know it's
safe to let go of a chunk?
430
00:25:12,736 --> 00:25:16,876
do you also now know that, you
could kind of go backwards and
431
00:25:16,876 --> 00:25:20,506
say like, oh, from this hash, we
know this is sensitive content.
432
00:25:20,986 --> 00:25:25,993
And have some further implications
for whatever. We don't need to go too
433
00:25:25,993 --> 00:25:28,659
much into depth on that now, but, yeah.
434
00:25:28,659 --> 00:25:32,259
I'm curious like how you thought
of those design decisions and
435
00:25:32,259 --> 00:25:33,549
the possible implications.
436
00:25:34,119 --> 00:25:34,479
Yeah.
437
00:25:34,539 --> 00:25:38,289
Yeah, for the first one yeah,
like distributed garbage collection
438
00:25:38,289 --> 00:25:39,759
was a very hard problem for us.
439
00:25:39,819 --> 00:25:44,349
We called it vacuuming, and in terms
of making Dropbox economics work out,
440
00:25:44,349 --> 00:25:48,963
like, we couldn't afford to
keep a lot of content that was deleted
441
00:25:48,963 --> 00:25:50,283
that we couldn't charge users for.
442
00:25:50,583 --> 00:25:54,453
So that was you know, there's all
additional complexity where different
443
00:25:54,453 --> 00:25:58,389
users would have like the ability to
restore for different periods of time.
444
00:25:58,509 --> 00:26:01,689
So we would say like, anything
that's deleted, it doesn't actually
445
00:26:01,689 --> 00:26:05,199
get deleted for 30 days or a year
or whatnot based on their plan.
446
00:26:05,583 --> 00:26:09,813
so then, yeah, like having to do
this like big distributed mark and
447
00:26:09,813 --> 00:26:14,043
sweep garbage collection algorithm
across hundreds of petabytes,
448
00:26:14,043 --> 00:26:18,243
exabytes of content that was something
that we had to get pretty good at.
449
00:26:18,243 --> 00:26:23,006
And when we designed Magic Pocket,
where we, implemented S3 in-house, we
450
00:26:23,006 --> 00:26:28,226
had specific primitives for making it a
little bit easier to avoid race conditions
451
00:26:28,226 --> 00:26:31,016
where like, if a file was deleted.
452
00:26:31,961 --> 00:26:34,601
And we decided that no
one needed it anymore.
453
00:26:34,631 --> 00:26:38,241
But then just at that point in time,
someone uploads it again, making sure
454
00:26:38,241 --> 00:26:40,421
that we don't accidentally delete it.
455
00:26:40,781 --> 00:26:43,481
So that was like, yeah,
definitely a very tricky problem.
456
00:26:43,531 --> 00:26:48,614
And I think in retrospect this is like
an interesting design exercise, right?
457
00:26:48,614 --> 00:26:52,784
And that if deduplication wasn't actually
that valuable for us, we could have
458
00:26:52,934 --> 00:26:57,464
eliminated a lot of complexity for this
garbage collection by not doing it, right?
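A rough sketch of the vacuuming shape being described: a mark-and-sweep pass over chunk hashes with a retention window, plus a last-moment recheck so a concurrent re-upload does not lose its chunk. The `block_store` and `metadata_db` interfaces are hypothetical; the real Magic Pocket primitives are only loosely paraphrased here.

```python
import time

RETENTION_SECONDS = 30 * 24 * 3600  # e.g. a 30-day restore window; plan-dependent


def vacuum(block_store, metadata_db, now=None):
    """Sweep chunks that have been unreferenced for longer than the retention
    window, re-checking the reference right before deletion to avoid racing a
    concurrent re-upload of the same hash.

    Hypothetical interfaces:
      block_store.list_chunks() -> iterable of chunk hashes
      metadata_db.unreferenced_since(h) -> timestamp, or None if still referenced
      block_store.delete_if(h, predicate) -> atomically delete when predicate() holds
    """
    now = now or time.time()
    for chunk_hash in block_store.list_chunks():
        since = metadata_db.unreferenced_since(chunk_hash)
        if since is None:
            continue                          # still referenced by some file version
        if now - since < RETENTION_SECONDS:
            continue                          # a user could still restore this
        # Re-check at delete time: if an upload re-referenced the hash just now,
        # the predicate fails and the chunk survives.
        block_store.delete_if(
            chunk_hash,
            lambda h=chunk_hash: metadata_db.unreferenced_since(h) is not None,
        )
```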
459
00:26:58,001 --> 00:26:59,671
I think for the second thing, yeah.
460
00:26:59,671 --> 00:27:06,731
So at the beginning when Dropbox started,
if you had a file with A, B and C and you
461
00:27:06,731 --> 00:27:10,831
uploaded it, it would just check, does
A, B and C exist anywhere in Dropbox?
462
00:27:11,351 --> 00:27:16,958
And, that got changed over time
to be: do you, as the user,
463
00:27:17,348 --> 00:27:18,948
have access to A, B, and C?
464
00:27:19,448 --> 00:27:24,143
And you know, 'cause otherwise you could
use this for all types of purposes, right?
465
00:27:24,143 --> 00:27:27,583
To see if there exists some
content anywhere in Dropbox.
466
00:27:27,613 --> 00:27:32,573
And, that was something where we
would in the case where the user was
467
00:27:32,573 --> 00:27:38,033
uploading A, B, and C, say none of them
were present in their account, we would
468
00:27:38,033 --> 00:27:42,833
actually force them to upload it, incur
the bandwidth for doing so, and then
469
00:27:42,833 --> 00:27:45,173
discard it if B and C existed elsewhere.
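A tiny sketch of how that change reshapes the handshake: the server only answers "already have it" for chunks the uploading user can already see, and otherwise the client pays the upload bandwidth even if the bytes exist elsewhere. The names are illustrative, not Dropbox's actual API.

```python
def chunks_to_upload(user_id, chunk_hashes, metadata_db):
    """Chunks this client must physically send.

    The early version asked "does this hash exist anywhere in Dropbox?";
    the later version asks "does *this user* already have access to it?",
    so the endpoint can't be used as a global existence oracle.
    metadata_db.user_has_chunk(user_id, h) is a hypothetical lookup.
    """
    return [h for h in chunk_hashes
            if not metadata_db.user_has_chunk(user_id, h)]


def store_uploaded_chunk(chunk_hash, data, block_store):
    """The bandwidth is already spent; if the bytes already exist under this
    hash, quietly discard the duplicate instead of revealing that fact."""
    if not block_store.contains(chunk_hash):
        block_store.put(chunk_hash, data)
```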
470
00:27:46,085 --> 00:27:46,345
Yeah.
471
00:27:46,345 --> 00:27:47,091
Very interesting.
472
00:27:47,121 --> 00:27:50,878
I mean, this would be an interesting
rabbit hole just to go down just the
473
00:27:50,878 --> 00:27:54,658
kind of second order effects of that
design decision, particularly at
474
00:27:54,658 --> 00:27:56,783
the scale and importance of Dropbox.
475
00:27:57,083 --> 00:27:59,213
But maybe we save that for another time.
476
00:27:59,513 --> 00:28:04,359
So going back to the sync engine, now that
we have a better understanding of, how it
477
00:28:04,359 --> 00:28:06,999
worked in that shape and form back then.
478
00:28:07,449 --> 00:28:12,219
You've already been mentioning before,
like, as usage went through
479
00:28:12,219 --> 00:28:16,813
the roof, all sorts of different
usage scenarios also expanded.
480
00:28:17,268 --> 00:28:22,749
you had all sorts of more esoteric
ways of using it that you didn't kind of even think
481
00:28:22,809 --> 00:28:25,209
before that it would be used this way.
482
00:28:25,239 --> 00:28:27,369
Now all of that came to light.
483
00:28:28,099 --> 00:28:33,216
I'm curious which sort of, helper
systems you put in place that you could
484
00:28:33,216 --> 00:28:39,446
even have a grasp of what's going on
since a part of the trust that Dropbox
485
00:28:39,586 --> 00:28:44,476
owned, or that it earned over time, was
probably also related to privacy.
486
00:28:44,716 --> 00:28:49,126
So you, you couldn't just like read
everything that's going on in someone's
487
00:28:49,126 --> 00:28:54,766
system, so you're probably also relying
to some degree on the help of a user
488
00:28:55,036 --> 00:28:57,076
that they like send something over.
489
00:28:57,076 --> 00:28:57,406
Yeah.
490
00:28:57,436 --> 00:29:02,716
Walk me through like the evolution
of that. Because, like, as
491
00:29:02,716 --> 00:29:06,376
an engineer, if there's a bug,
reproducing that bug is everything.
492
00:29:07,006 --> 00:29:09,316
So walk me through that process.
493
00:29:09,766 --> 00:29:13,306
Yeah, and you know, like we had a very
strict rule, right, where it just,
494
00:29:13,366 --> 00:29:15,316
we do not look at content, right?
495
00:29:15,773 --> 00:29:20,323
and so that was the thing when
debugging issues, the saving grace is
496
00:29:20,323 --> 00:29:22,573
that for most of the issues we saw.
497
00:29:22,923 --> 00:29:28,003
They were more metadata issues around
like sync not converging, or getting
498
00:29:28,003 --> 00:29:32,383
to the client thinking it's in sync
with the server, but them disagreeing.
499
00:29:32,691 --> 00:29:35,799
so we had a few pretty,
yeah, like pretty interesting
500
00:29:35,799 --> 00:29:37,539
supporting algorithms for this.
501
00:29:37,569 --> 00:29:41,769
So one of them was just simple like
hang detection, like making sure, like
502
00:29:41,949 --> 00:29:45,249
if, when should a client reasonably
expect that they are in sync?
503
00:29:45,869 --> 00:29:49,439
And if they're online and if
they've downloaded all the recent
504
00:29:49,439 --> 00:29:53,189
versions and things are getting
stuck, why are they getting stuck?
505
00:29:53,189 --> 00:29:55,649
So are they getting stuck because
they can't read stuff from the
506
00:29:55,649 --> 00:29:57,749
server, either metadata or data?
507
00:29:57,959 --> 00:30:00,509
Are they getting stuck because they
can't write to the file system and
508
00:30:00,509 --> 00:30:01,819
there's some permission errors?
509
00:30:02,079 --> 00:30:06,683
So I think having very fine-grained
classification of that and having the
510
00:30:06,683 --> 00:30:11,653
client do that in a way that's like not
including any private information and
511
00:30:11,653 --> 00:30:14,753
sending that up for reports and then
aggregating that over all of the clients
512
00:30:14,753 --> 00:30:19,643
and being able to classify was a big part
of us being able to get a handle on it.
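As a small illustration of that classification idea: decide why a client that should be in sync is stuck, and report only a coarse, content-free category that can be aggregated across all clients. The categories and the probe methods on `client` are made up for the sketch.

```python
from enum import Enum


class HangCause(Enum):
    IN_SYNC = "in_sync"
    OFFLINE = "offline"
    SERVER_METADATA_UNREACHABLE = "server_metadata_unreachable"
    SERVER_DATA_UNREACHABLE = "server_data_unreachable"
    LOCAL_PERMISSION_ERROR = "local_permission_error"
    UNKNOWN = "unknown"


def classify_hang(client):
    """`client` is a hypothetical handle exposing coarse health probes only."""
    if client.is_in_sync():
        return HangCause.IN_SYNC
    if not client.is_online():
        return HangCause.OFFLINE
    if not client.can_read_server_metadata():
        return HangCause.SERVER_METADATA_UNREACHABLE
    if not client.can_read_server_data():
        return HangCause.SERVER_DATA_UNREACHABLE
    if client.has_local_permission_errors():
        return HangCause.LOCAL_PERMISSION_ERROR
    return HangCause.UNKNOWN


def report_hang(client, telemetry):
    # Only the category goes up -- no paths, no content -- and the server
    # aggregates counts per category across the whole fleet.
    telemetry.send({"hang_cause": classify_hang(client).value})
```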
513
00:30:20,059 --> 00:30:23,699
And I think this is just generally
very useful for these sync engines.
514
00:30:23,996 --> 00:30:27,056
the biggest return on investment we
got was from consistency checkers.
515
00:30:27,676 --> 00:30:32,949
So part of sync is that there's the same
data duplicated in many places, right?
516
00:30:33,219 --> 00:30:36,849
Like, so we had the data that's
on the user's local file system.
517
00:30:37,179 --> 00:30:41,199
We had all of the metadata that we stored
in SQLite, where we would store, like, what
518
00:30:41,199 --> 00:30:42,939
we think should be on the file system.
519
00:30:43,689 --> 00:30:46,569
We would store what the latest
view from the server was.
520
00:30:46,569 --> 00:30:49,509
We would store things that were
in progress, and then we have
521
00:30:49,509 --> 00:30:50,589
what's stored on the server.
522
00:30:50,799 --> 00:30:55,269
And for each one of those like hops, we
would have a consistency checker that
523
00:30:55,269 --> 00:30:57,639
would go and see if those two matched.
524
00:30:57,969 --> 00:31:02,139
And those would, that was like the
highest return on investment we got.
525
00:31:02,139 --> 00:31:05,649
Because before we had that, people
would write in and they would
526
00:31:05,649 --> 00:31:07,179
complain that Dropbox wasn't working.
527
00:31:07,779 --> 00:31:10,509
And until we had these consistency
checkers, we had no idea the
528
00:31:10,509 --> 00:31:13,419
order of magnitude of how
many issues were happening.
529
00:31:13,869 --> 00:31:16,029
And when we started doing
it, we're like, wow.
530
00:31:16,599 --> 00:31:17,379
There's actually a lot.
531
00:31:18,026 --> 00:31:22,886
So a consistency check in this regard
was mostly like a hash over some
532
00:31:22,886 --> 00:31:24,506
packets that you're sending around.
533
00:31:24,866 --> 00:31:30,326
And with that you could verify, okay, up
until like from A to B to C to D, we're
534
00:31:30,326 --> 00:31:35,816
all seeing the same hash, but suddenly
on the hop from D to E, the hash changes.
535
00:31:35,876 --> 00:31:36,266
Ah-huh.
536
00:31:36,296 --> 00:31:37,196
Let's investigate.
537
00:31:37,736 --> 00:31:38,396
Exactly.
538
00:31:38,726 --> 00:31:42,926
And so, and to do that in a way
that's respectful of the users,
539
00:31:42,986 --> 00:31:45,356
even like resources on their system.
540
00:31:45,356 --> 00:31:50,006
Like we wouldn't just go and blast their
CPU and their disk and their network to go
541
00:31:50,006 --> 00:31:51,836
and, like, churn through a bunch of things.
542
00:31:51,836 --> 00:31:54,896
So we would have like a sampling
process where we like sample a random
543
00:31:54,896 --> 00:31:58,166
path in the tree on the client
and do the same on the server.
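A minimal sketch of that sampled consistency check: pick a random path, fingerprint what the client-side state and the server-side state say about it, and count any mismatch. `client_view` and `server_view` are hypothetical stand-ins for two of the hops described above (say, the local SQLite state and the server's metadata); no file contents are read.

```python
import hashlib
import random


def fingerprint(entry):
    """Hash only the metadata the two sides are expected to agree on."""
    material = f"{entry.path}|{entry.size}|{entry.content_hash}".encode()
    return hashlib.sha256(material).hexdigest()


def sample_check(client_view, server_view, rng=random):
    """Compare one randomly sampled path across two replicas of the metadata.

    Returns None when consistent, otherwise a small, content-free mismatch
    record. Run rarely and on samples so the user's CPU, disk, and network
    aren't hammered.
    """
    path = rng.choice(client_view.all_paths())
    local, remote = client_view.lookup(path), server_view.lookup(path)
    if local is None and remote is None:
        return None
    if local is None or remote is None or fingerprint(local) != fingerprint(remote):
        # Report a hash of the path, not the path itself.
        return {"path_hash": hashlib.sha256(path.encode()).hexdigest(),
                "kind": "metadata_mismatch"}
    return None
```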
544
00:31:58,463 --> 00:32:02,333
we would have stuff with like Merkle
trees and then when things would diverge,
545
00:32:02,333 --> 00:32:07,643
we would try to see like, is there a way
we can compare on the client and see like
546
00:32:07,643 --> 00:32:12,004
for example one of the kind of really
important, goals for us as an operational
547
00:32:12,004 --> 00:32:14,494
team was to have like the power of zero.
548
00:32:14,764 --> 00:32:17,464
I think it might be from AWS or something.
549
00:32:17,464 --> 00:32:19,294
My co-founder James, has
a really good talk on it.
550
00:32:19,764 --> 00:32:25,704
but we would want to have a metric of
saying that the number of unexplained
551
00:32:25,764 --> 00:32:28,790
inconsistencies is zero, 'cause
552
00:32:28,790 --> 00:32:31,730
then the nice thing, right, is that
if it's at zero and it regresses,
553
00:32:31,730 --> 00:32:33,080
you know that it's a regression.
554
00:32:33,350 --> 00:32:38,780
If it's at like fluctuating at like 15
or like a hundred thousand and it kind
555
00:32:38,780 --> 00:32:42,530
of goes up by 5%, it's very hard to know
when evaluating a new release, right?
556
00:32:42,530 --> 00:32:44,390
That like that's actually safe or not.
557
00:32:44,824 --> 00:32:49,204
so then that would mean that whenever we
would have an inconsistency due to a bit
558
00:32:49,204 --> 00:32:55,234
flip, which we would see all the time
on client devices, then we would have to
559
00:32:55,444 --> 00:32:57,454
categorize that and then bucket that out.
560
00:32:57,604 --> 00:32:58,804
So we would have a baseline
561
00:32:59,659 --> 00:33:03,319
expectation of how many bit flips there
are across all of the devices on Dropbox.
562
00:33:03,679 --> 00:33:06,589
And we would see that that's
staying consistent or increasing or
563
00:33:06,589 --> 00:33:09,829
decreasing, and that the number of
unexplained things was still at zero.
564
00:33:10,215 --> 00:33:12,885
Now let's take that detour,
since you got me curious.
565
00:33:13,125 --> 00:33:16,065
Uh, what would cause bit
flips on a local device?
566
00:33:16,602 --> 00:33:20,982
I think a few, few causes, one of them
is just that in the data center, most
567
00:33:20,982 --> 00:33:24,822
memory uses error correction and you have
to pay more for it, usually have to pay
568
00:33:24,822 --> 00:33:26,472
more for a motherboard that supports it.
569
00:33:26,862 --> 00:33:27,672
at least back then.
570
00:33:27,736 --> 00:33:30,532
now like on client
devices we don't have that.
571
00:33:30,602 --> 00:33:34,302
So this is a little bit above
my pay grade for hardware cosmic
572
00:33:34,302 --> 00:33:36,632
rays or thermal noise or whatever.
573
00:33:36,632 --> 00:33:40,002
But memory is much more
resilient in the data center.
574
00:33:40,315 --> 00:33:44,355
I think another is just that, storage
devices vary greatly in quality.
575
00:33:44,415 --> 00:33:49,335
Like your SSDs and your hard drives are
much higher quality inside the data
576
00:33:49,335 --> 00:33:51,495
center than they are on local devices.
577
00:33:51,855 --> 00:33:52,515
And so.
578
00:33:53,160 --> 00:33:54,150
You know, there's that.
579
00:33:54,447 --> 00:33:57,297
it also could be like I had
mentioned that people have all
580
00:33:57,297 --> 00:33:58,797
types of weird configurations.
581
00:33:59,097 --> 00:34:03,387
Like on Mac there are all these
kernel extensions; on Windows, there's
582
00:34:03,387 --> 00:34:05,007
all of these minifilter drivers.
583
00:34:05,007 --> 00:34:07,437
There are all these things
that are interposing between
584
00:34:07,827 --> 00:34:11,127
Dropbox, the user space process
and writing to the file system.
585
00:34:11,427 --> 00:34:15,297
And if those have any memory safety
issues where they're corrupting memory
586
00:34:15,387 --> 00:34:19,434
'cause they're written in archaic C,
you know, or something, that's
587
00:34:19,454 --> 00:34:20,654
the way things can get corrupted.
588
00:34:20,834 --> 00:34:22,244
I mean, we've seen all types of things.
589
00:34:22,244 --> 00:34:26,709
We've seen network routers
corrupting data, but usually
590
00:34:26,924 --> 00:34:28,394
that fails some checksum, right?
591
00:34:28,424 --> 00:34:33,464
Or we've seen even registers on CPUs
being bad where the memory gets replaced
592
00:34:33,614 --> 00:34:38,114
and the memory seems like it's fine, but
then it just turns out the CPU has its
593
00:34:38,114 --> 00:34:40,214
own registers on-chip that are busted.
594
00:34:40,214 --> 00:34:44,204
And so all of that stuff I
think just can happen at scale.
595
00:34:44,234 --> 00:34:44,624
Right.
596
00:34:45,050 --> 00:34:45,770
that makes sense.
597
00:34:45,770 --> 00:34:51,774
And I'm happy to say that I haven't
yet had to worry about bit flips, whether
598
00:34:51,774 --> 00:34:56,824
it's for storage or other things,
but huge respect to whoever has had
599
00:34:56,824 --> 00:34:59,591
to tame those parts of the system.
600
00:34:59,951 --> 00:35:05,444
So, you mentioned the consistency check
as probably the biggest lever that you
601
00:35:05,444 --> 00:35:11,324
had to understand which health state
your sync engine is in, in the first place.
602
00:35:11,698 --> 00:35:18,928
was this the only kind of metric and
proxy for understanding how well
603
00:35:18,928 --> 00:35:22,618
the sync system is working, or were
there some other aspects that gave
604
00:35:22,618 --> 00:35:25,618
you visibility both macro and micro?
605
00:35:26,071 --> 00:35:30,511
Yeah, I mean, I think this yeah,
the kind of hangs, so like knowing
606
00:35:30,511 --> 00:35:33,991
that something gets to a sync state
and knowing the duration, right?
607
00:35:33,991 --> 00:35:38,514
So the kind of performance of that
was one of our top line metrics.
608
00:35:38,514 --> 00:35:40,474
And the other one was
this consistency check.
609
00:35:40,814 --> 00:35:43,524
And then for specific,
like operations, right?
610
00:35:43,524 --> 00:35:47,374
Like uploading a file, like how much
bandwidth are people able to use
611
00:35:47,624 --> 00:35:53,124
because, like, people wanted to
use Dropbox and upload lots,
612
00:35:53,124 --> 00:35:57,324
like huge data, like huge number of
files where each file is really large.
613
00:35:57,594 --> 00:36:01,584
And then they might do it in
Australia or Japan where they're
614
00:36:01,944 --> 00:36:03,234
far away from a data center.
615
00:36:03,234 --> 00:36:06,774
So latency is high, but bandwidth
is very high too, right?
616
00:36:06,774 --> 00:36:09,914
So making sure that we could
fully saturate their pipes and all
617
00:36:09,914 --> 00:36:12,114
types of stuff with debugging
618
00:36:12,724 --> 00:36:13,654
things in the internet, right?
619
00:36:13,654 --> 00:36:16,774
People having really bad
routes to AWS and all that.
620
00:36:16,974 --> 00:36:18,324
so we would track things like that.
621
00:36:18,568 --> 00:36:20,968
I think other than that it was
mostly just the usual quality stuff,
622
00:36:20,968 --> 00:36:25,298
like just exceptions and making
sure that features all work.
623
00:36:25,388 --> 00:36:30,154
I think when we rewrote this system
and we designed it to be very correct,
624
00:36:30,274 --> 00:36:34,404
We moved a lot of these things into
testing before we would release.
625
00:36:35,024 --> 00:36:38,734
So this is, I think, to
jump ahead a little bit: we
626
00:36:38,794 --> 00:36:44,974
decided to rewrite Dropbox's sync engine
from this big Python code base into Rust.
627
00:36:45,304 --> 00:36:49,294
And one of the specific design decisions
was to make things extremely testable.
628
00:36:49,729 --> 00:36:53,239
So we would have everything be
deterministic on a single thread,
629
00:36:53,509 --> 00:36:56,989
have all of the reads and writes
to the network and file system,
630
00:36:56,989 --> 00:36:59,119
be through a virtualized API.
631
00:36:59,416 --> 00:37:03,616
So then we could run all of these
simulations of exploring what would
632
00:37:03,616 --> 00:37:08,026
happen if you uploaded a file here and
deleted it concurrently and then had a
633
00:37:08,026 --> 00:37:09,976
network issue that forced you to retry.
634
00:37:10,306 --> 00:37:14,716
And so by simulating all of those in
CI, we would be able to then have very
635
00:37:14,716 --> 00:37:18,466
strong invariants about them,
knowing that like a file should never
636
00:37:18,466 --> 00:37:21,796
get deleted in this case, or that
it should always converge, or things
637
00:37:21,796 --> 00:37:26,326
like, with sharing, that this file should
never get exposed to this other viewer.
638
00:37:26,904 --> 00:37:31,043
I think having stronger
guarantees was something
639
00:37:31,043 --> 00:37:36,443
that we could only really do effectively
once we designed the system to make
640
00:37:36,443 --> 00:37:38,093
it easy to test those guarantees.
641
00:37:38,828 --> 00:37:39,188
Right.
642
00:37:39,188 --> 00:37:40,268
That makes a lot of sense.
643
00:37:40,268 --> 00:37:43,568
And I think we're seeing more
and more systems, also in the
644
00:37:43,568 --> 00:37:45,704
database world, embrace this.
645
00:37:45,704 --> 00:37:49,012
I think TigerBeetle is,
is quite popular for that.
646
00:37:49,394 --> 00:37:53,828
I think the folks at Torso are
now also embracing this approach.
647
00:37:54,102 --> 00:37:56,772
I think it goes under the
umbrella of simulation testing.
648
00:37:57,218 --> 00:37:58,448
that sounds very interesting.
649
00:37:58,448 --> 00:38:03,788
Can you explain a little bit more how
maybe in a much smaller program would
650
00:38:03,788 --> 00:38:08,318
this basically be Just that every
assumption and any potential branch,
651
00:38:08,348 --> 00:38:13,958
any sort of side effect thing that might
impact the execution of my program.
652
00:38:13,958 --> 00:38:19,868
Now I need to make explicit and it's
almost like a parameter that I put into
653
00:38:19,868 --> 00:38:25,735
the arguments of my functions and now I
call it under these circumstances, and I
654
00:38:25,735 --> 00:38:31,375
can therefore simulate, oh, if that file
suddenly gives me an unexpected error.
655
00:38:31,675 --> 00:38:33,385
Then this is how we're gonna handle it.
656
00:38:33,865 --> 00:38:34,795
Yeah, exactly.
657
00:38:34,795 --> 00:38:38,845
So it's like and there's techniques
that like the TigerBeetle folks, like
658
00:38:38,845 --> 00:38:42,745
we, we do this at Convex in rust with the
right, like abstractions, there's like
659
00:38:42,745 --> 00:38:45,235
techniques to make it not so awkward.
660
00:38:45,235 --> 00:38:50,815
But yeah, it is like this idea of like,
can you pin all of the non-determinism in
661
00:38:50,815 --> 00:38:54,895
the system can, whether it's like reading
from a random number generator, whether
662
00:38:54,895 --> 00:38:58,765
it's looking at time, whether it's reading
and writing to files or the network.
663
00:38:58,945 --> 00:39:04,425
Can that all be like pulled out so
that in, production it's just using the
664
00:39:04,425 --> 00:39:06,865
random AP or the regular APIs for it.
665
00:39:07,258 --> 00:39:10,558
so there's like for any of these
sync engines, there's a core
666
00:39:10,558 --> 00:39:13,318
of the system which represents
all the sync rules, right?
667
00:39:13,318 --> 00:39:16,198
Like when I get a new file
from the server, what do I do?
668
00:39:16,528 --> 00:39:19,468
You know, if there's a concurrent
edit to this, what do I do?
669
00:39:19,748 --> 00:39:23,953
And that core of the code is often
the part that has the most bugs, right?
670
00:39:23,953 --> 00:39:27,403
It doesn't think about
some of the corner cases or if
671
00:39:27,403 --> 00:39:30,853
there are errors or needs retries
or doesn't handle concurrency.
672
00:39:30,853 --> 00:39:32,053
It might have race conditions.
673
00:39:32,323 --> 00:39:36,883
So I think the core idea
for deterministic
674
00:39:36,883 --> 00:39:43,033
simulation testing is to take that core
and just kind of like pull out all of the
675
00:39:43,033 --> 00:39:45,283
non-determinism from it into an interface.
676
00:39:45,403 --> 00:39:49,213
So time randomness, reading and
writing to the network, reading
677
00:39:49,213 --> 00:39:52,753
and writing to the file system, and
making it so that in production,
678
00:39:52,933 --> 00:39:54,703
those are just using the regular APIs.
679
00:39:55,033 --> 00:39:58,873
But in a testing situation,
those can be using mocks.
680
00:39:59,023 --> 00:40:02,383
Like they could be using things
that for a particular test
681
00:40:02,383 --> 00:40:06,253
and wants to test a scenario or
setting it up in a specific way.
682
00:40:06,673 --> 00:40:09,223
Or it could be randomized, right?
683
00:40:09,223 --> 00:40:14,543
Where it might be that reading from
Like time, the test framework might
684
00:40:14,603 --> 00:40:18,923
decide pseudo randomly to advance it
or to keep it at the current time or
685
00:40:18,923 --> 00:40:20,873
might serialize things differently.
686
00:40:21,143 --> 00:40:27,293
And that type of ability to have random
search explore the state space of
687
00:40:27,353 --> 00:40:30,833
all the things that are possible is
just one of those like unreasonably
688
00:40:30,833 --> 00:40:32,813
effective ideas, I think for testing.
689
00:40:33,203 --> 00:40:37,373
And then getting a
system to pass that type of
690
00:40:37,373 --> 00:40:38,963
deterministic simulation testing:
691
00:40:39,503 --> 00:40:42,893
It's not at the threshold of having
formal verification, but in our
692
00:40:42,893 --> 00:40:47,457
experience it's pretty close, and with
a much, much smaller amount of work.
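Below is a minimal TypeScript sketch of the pattern Sujay describes: pinning time, randomness, file I/O, and the network behind one interface so the same sync core runs against real APIs in production and against a seeded simulation in CI. All names (Environment, simEnv, the example URL) are illustrative assumptions, not Dropbox's or Convex's actual code.

```typescript
import { readFile } from "node:fs/promises";

// Every source of non-determinism the sync core is allowed to touch.
interface Environment {
  now(): number;                            // time
  random(): number;                         // randomness
  readFile(path: string): Promise<string>;  // file system
  send(msg: string): Promise<void>;         // network
}

// Production: delegate straight to the regular APIs.
const realEnv: Environment = {
  now: () => Date.now(),
  random: () => Math.random(),
  readFile: (path) => readFile(path, "utf8"),
  send: async (msg) => {
    await fetch("https://example.com/sync", { method: "POST", body: msg }); // placeholder URL
  },
};

// Simulation: a seeded PRNG plus an in-memory world the test controls,
// including pseudo-random time advancement and injected faults.
function simEnv(seed: number, files: Map<string, string>): Environment {
  let t = 0;
  let s = seed;
  const rand = () => ((s = (s * 1664525 + 1013904223) >>> 0) / 2 ** 32);
  return {
    now: () => (t += Math.floor(rand() * 1000)),
    random: rand,
    readFile: async (path) => {
      if (rand() < 0.1) throw new Error("simulated I/O error");
      return files.get(path) ?? "";
    },
    send: async () => {
      if (rand() < 0.2) throw new Error("simulated network failure");
    },
  };
}

// The core only ever talks to an Environment, so a CI run with simEnv(seed)
// is fully deterministic and replayable from the seed.
async function syncOnce(env: Environment, path: string): Promise<void> {
  const contents = await env.readFile(path);
  await env.send(JSON.stringify({ path, contents, at: env.now() }));
}
```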
693
00:40:48,117 --> 00:40:50,427
And you mentioned
Haskell at the beginning?
694
00:40:50,457 --> 00:40:55,467
I still remember, after a lot of
time spent writing unit tests in
695
00:40:55,517 --> 00:41:00,017
JavaScript, and back then, in that
order, I first had JavaScript and then I
696
00:41:00,017 --> 00:41:04,817
learned Haskell, and then I found quick
test, or was it QuickCheck?
697
00:41:05,183 --> 00:41:06,113
which one was it?
698
00:41:06,563 --> 00:41:07,493
I think it was QuickCheck, right?
699
00:41:07,873 --> 00:41:08,383
Well, right.
700
00:41:08,383 --> 00:41:13,424
So I found QuickCheck and I could express
sort of like, Hey, this is this type.
701
00:41:13,664 --> 00:41:18,614
It has sort of those aspects to it,
those invariants and then would just
702
00:41:18,614 --> 00:41:20,534
go along and test all of those things.
703
00:41:20,534 --> 00:41:23,564
Like, wait, I never thought
of that, but of course, yes.
704
00:41:23,864 --> 00:41:27,824
And then you combine those and you
would get way too lazy to write unit
705
00:41:27,824 --> 00:41:32,354
tests for the combinatorial explosion
of like all of your different things.
706
00:41:32,354 --> 00:41:36,494
And then you can say, sample it
like that, and like, focus on this.
707
00:41:36,778 --> 00:41:40,958
And so I actually also started
embracing this practice a lot more in the
708
00:41:40,958 --> 00:41:45,488
TypeScript work that I'm doing through
a great project called Prop Check.
709
00:41:45,994 --> 00:41:52,069
And that is picking up the same
ideas, particularly for those
710
00:41:52,069 --> 00:41:56,509
sort of scenarios where, okay,
Murphy's Law will come and haunt you.
711
00:41:56,969 --> 00:41:58,829
this is in distributed systems.
712
00:41:58,829 --> 00:42:00,509
That is typically the case.
713
00:42:00,796 --> 00:42:05,623
Building things in such a way where
all the aspects can be, specifically
714
00:42:05,623 --> 00:42:07,873
injected and the, the sweet spot.
715
00:42:07,873 --> 00:42:12,043
If you can do so still in an ergonomic
way, I think that's the way to go.
716
00:42:13,063 --> 00:42:15,373
It's so, so valuable, right?
717
00:42:15,373 --> 00:42:15,643
And yeah.
718
00:42:15,643 --> 00:42:20,323
And yeah, the ability for prop tests,
for QuickCheck, for all of these to
719
00:42:20,323 --> 00:42:23,113
also minimize is just magical, right?
720
00:42:23,113 --> 00:42:27,023
Like it comes up with this crazy
counter example and it might be
721
00:42:27,143 --> 00:42:31,693
like a list with 700 elements, but
then is able to shrink it down to
722
00:42:31,693 --> 00:42:33,613
the, like, real core of the bug.
723
00:42:33,913 --> 00:42:35,483
It's magic, right?
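As a concrete illustration of the property-based testing and shrinking being discussed, here is a tiny TypeScript sketch using the fast-check library (an assumption on my part; the "Prop Check" project named above may be something else). The merge function and the invariant are made up for illustration.

```typescript
import fc from "fast-check";

// Toy last-writer-wins merge over (timestamp, value) pairs.
type Edit = { ts: number; value: string };
const merge = (a: Edit, b: Edit): Edit => (b.ts >= a.ts ? b : a);

// Property: merge always returns one of its inputs, and is symmetric up to ties.
fc.assert(
  fc.property(
    fc.record({ ts: fc.integer(), value: fc.string() }),
    fc.record({ ts: fc.integer(), value: fc.string() }),
    (a, b) => {
      const m = merge(a, b);
      return (m === a || m === b) && merge(b, a).ts === m.ts;
    }
  )
);

// If a property fails, fast-check shrinks the random counterexample toward a
// minimal one -- the "shrink it down to the real core of the bug" behavior
// described above.
```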
724
00:42:35,803 --> 00:42:38,038
And you know, I mean, I think
this is something, like, you know,
725
00:42:38,653 --> 00:42:40,453
a totally different theme, right?
726
00:42:40,453 --> 00:42:44,353
Like one thing at Convex we're exploring
a lot is like coding has changed a lot
727
00:42:44,353 --> 00:42:46,423
in the past year with AI coding tools.
728
00:42:46,693 --> 00:42:50,413
And one of the things we've observed
for getting coding tools to work very
729
00:42:50,413 --> 00:42:54,763
well with Convex is that these types
of like very succinct tests that can
730
00:42:54,763 --> 00:42:59,863
be generated easily and have like a
really high strength to weight or power
731
00:42:59,863 --> 00:43:03,449
to weight ratio are just really good
for like autonomous coding, right?
732
00:43:03,449 --> 00:43:06,629
Like, if you are gonna take, like,
a Cursor agent and let it go wild,
733
00:43:06,839 --> 00:43:10,499
like what does it take to just let it
operate without you doing anything?
734
00:43:10,589 --> 00:43:13,229
It takes something like a prop test
because then it can just continuously
735
00:43:13,229 --> 00:43:18,149
make changes, run the test, and not know
that it's done until that test passes.
736
00:43:18,846 --> 00:43:20,316
Yeah, that makes a lot of sense.
737
00:43:20,316 --> 00:43:25,356
So let's go back for a moment to the
point where you were just transitioning
738
00:43:25,686 --> 00:43:32,016
from the previous Python based sync
engine to the Rust based sync engine.
739
00:43:32,016 --> 00:43:36,963
So you're embracing simulation
testing to have a better sense of
740
00:43:36,963 --> 00:43:41,253
like all the different aspects that
might influence the outcome here.
741
00:43:41,579 --> 00:43:44,289
walk me through like how you, went about.
742
00:43:44,559 --> 00:43:46,479
Deploying that new system.
743
00:43:46,659 --> 00:43:52,119
Were there any sort of big headaches
associated with migrating from the
744
00:43:52,119 --> 00:43:54,249
previous system to the new system?
745
00:43:54,549 --> 00:43:57,849
Since, for everything, you
had sort of a de facto source
746
00:43:57,849 --> 00:43:59,979
of truth, which is the files.
747
00:43:59,979 --> 00:44:04,659
So could you maybe just forget everything
the old system has done and you just
748
00:44:04,659 --> 00:44:09,646
treat it as like, oh, the user would've
just installed this fresh? Walk me
749
00:44:09,646 --> 00:44:14,056
through like how you thought about
that, since migrating systems at such
750
00:44:14,056 --> 00:44:16,970
a big scale is typically quite dreadful.
751
00:44:17,340 --> 00:44:19,495
Yeah, dreadsome is, yeah, an
752
00:44:19,575 --> 00:44:20,415
appropriate word.
753
00:44:20,720 --> 00:44:26,585
I think one of the biggest challenges was
that by design we had a very different
754
00:44:26,675 --> 00:44:29,765
data model for the old sync engine.
755
00:44:29,765 --> 00:44:31,135
We called it Sync Engine Classic.
756
00:44:31,465 --> 00:44:32,085
Affectionately.
757
00:44:32,225 --> 00:44:34,505
And then we had Nucleus, which was the new one.
758
00:44:34,745 --> 00:44:39,695
Nucleus had a very different data model,
and the motivation for that was that
759
00:44:40,535 --> 00:44:46,145
Sync Engine Classic just had a ton of
possible states that were illegitimate.
760
00:44:46,505 --> 00:44:50,855
If you had, like, the server
update a file and the client update
761
00:44:50,855 --> 00:44:54,665
a file, but then a shared folder gets
mounted above it, things could get
762
00:44:54,665 --> 00:45:00,005
into all of these really weird states
that were legal but would cause bugs.
763
00:45:00,395 --> 00:45:04,595
And then I think that was like one
of the big guiding principles more
764
00:45:04,595 --> 00:45:09,335
than even just like Rust or Python,
was just like designing what states
765
00:45:09,335 --> 00:45:14,795
should the system be allowed to be
in and design away everything else,
766
00:45:14,795 --> 00:45:16,955
make illegal states unrepresentable.
767
00:45:17,555 --> 00:45:21,215
And so what that then
meant is, once we had that,
768
00:45:21,515 --> 00:45:26,225
when we needed to migrate, we had a long
tail of really weird starting positions.
769
00:45:27,855 --> 00:45:33,065
So where you basically realized, okay,
this system is in this state; A, how the
770
00:45:33,065 --> 00:45:35,195
heck did it ever get into that state?
771
00:45:35,255 --> 00:45:40,175
And B, what are we gonna do about
it now, where basically,
772
00:45:40,175 --> 00:45:43,145
it's like, from a mapping function's perspective,
this is like invalid input.
773
00:45:44,105 --> 00:45:49,862
So can you explain a little bit of like,
how you constrained the space of, and how
774
00:45:49,862 --> 00:45:56,075
you designed the space of, legitimate,
valid states and what were some of the,
775
00:45:56,075 --> 00:46:00,665
if you think about this as like a big
matrix of combinations, what are some
776
00:46:00,665 --> 00:46:06,165
of the more intuitive ones that were,
not allowed that you saw quite a bit?
777
00:46:06,975 --> 00:46:13,005
Yeah, so I think part of the difficulty
for Dropbox, like, in syncing things
778
00:46:13,005 --> 00:46:17,085
from the file system is that file
system APIs are really anemic.
779
00:46:17,400 --> 00:46:19,980
File system APIs don't have transactions.
780
00:46:19,980 --> 00:46:23,010
They don't, and things can get
reordered in all types of ways.
781
00:46:23,190 --> 00:46:26,370
So we would just read and write to
files from the local file system, and
782
00:46:26,370 --> 00:46:30,450
we would use file system events on
Mac, we would use the equivalent on
783
00:46:30,450 --> 00:46:32,773
Windows and Linux to get updates.
784
00:46:32,983 --> 00:46:36,403
But everything can be reordered
and racy and everything.
785
00:46:36,493 --> 00:46:40,990
So one, like common invariant
would be that if you have a
786
00:46:40,990 --> 00:46:44,497
directory you know, like files
have to exist within directories.
787
00:46:44,767 --> 00:46:47,887
If a file exists, then its
parent directory exists.
788
00:46:48,397 --> 00:46:51,727
And like simultaneously, if you
delete a directory, it shouldn't
789
00:46:51,727 --> 00:46:52,817
have any files within it.
790
00:46:53,967 --> 00:46:57,727
And that invariant guarantees
that the file system is a tree.
791
00:46:57,847 --> 00:46:58,207
Right?
792
00:46:58,537 --> 00:47:03,787
And then it's very easy to come
up with settings, with reads from the
793
00:47:03,787 --> 00:47:07,687
local file system where if you just
naively take that and write it into
794
00:47:07,687 --> 00:47:12,187
your SQLite database, you will end up
with data that does not form a tree.
795
00:47:12,815 --> 00:47:16,435
And then especially even with
things like uniqueness, right?
796
00:47:16,435 --> 00:47:22,435
Like if I move a file from A to B, then
I might observe the add for it at B
797
00:47:23,825 --> 00:47:28,225
way before the delete at A, or I might
observe it vice versa, where the file
798
00:47:28,225 --> 00:47:31,435
is transiently gone and disappeared and
we definitely don't wanna sync that.
799
00:47:31,795 --> 00:47:37,318
And then with directories, if I have
like A as a directory and then B as
800
00:47:37,318 --> 00:47:43,528
a directory, and then I move them, I
could observe a state where A moves into
801
00:47:43,528 --> 00:47:48,498
B, which then without doing the right
bookkeeping, might introduce a cycle in
802
00:47:48,498 --> 00:47:52,188
the graph and a cycle for directories
would be really bad news, right?
803
00:47:52,482 --> 00:47:57,072
so all of these invariants were things
that the file system APIs, they don't
804
00:47:57,072 --> 00:48:00,732
respect, even though the file system
internally has these invariants, right?
805
00:48:01,752 --> 00:48:04,422
You cannot create a directory
cycle on any file system.
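Here is a rough TypeScript sketch of the two invariants just described (every entry's parent exists and is a directory; parent pointers never form a cycle), as a check a sync engine could run over its local metadata before trusting reordered file system events. This is not Dropbox's actual data model; all names are illustrative.

```typescript
type NodeId = string;
interface Entry { id: NodeId; parent: NodeId | null; isDir: boolean }

function checkTreeInvariants(entries: Map<NodeId, Entry>): string[] {
  const violations: string[] = [];
  for (const e of entries.values()) {
    if (e.parent !== null) {
      const parent = entries.get(e.parent);
      // Invariant 1: if an entry exists, its parent exists and is a directory.
      if (!parent) violations.push(`${e.id} has missing parent ${e.parent}`);
      else if (!parent.isDir) violations.push(`${e.id} has non-directory parent ${e.parent}`);
    }
    // Invariant 2: following parent pointers never loops (no directory cycles).
    const seen = new Set<NodeId>([e.id]);
    let cur = e.parent;
    while (cur !== null) {
      if (seen.has(cur)) { violations.push(`cycle through ${cur}`); break; }
      seen.add(cur);
      cur = entries.get(cur)?.parent ?? null;
    }
  }
  return violations;
}

// Naively applying reordered events (an add at B observed before the delete at A,
// or "A moved into B" without the matching bookkeeping) would be caught here
// before it ever lands in the synced database.
```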
806
00:48:05,412 --> 00:48:05,802
Definitely.
807
00:48:05,802 --> 00:48:09,989
I mean, certainly not without root. And
all of these invariants exist but
808
00:48:09,989 --> 00:48:12,863
are not observable through the APIs.
809
00:48:12,863 --> 00:48:16,583
And so then Sync Engine Classic
would get into the state where its
810
00:48:16,583 --> 00:48:19,793
local SQLite file would have
all types of violations like that.
811
00:48:20,303 --> 00:48:24,473
So then how do we read the tea
leaves when the database is in
812
00:48:24,473 --> 00:48:26,933
a really weird state and we can't lose data.
813
00:48:26,933 --> 00:48:30,263
And to go back to, I think what you had
talked about at the beginning of this was
814
00:48:30,263 --> 00:48:36,293
that we always had the nuclear option of
dropping all of our local state and doing
815
00:48:36,293 --> 00:48:38,753
a full resync from the files themselves.
816
00:48:39,143 --> 00:48:42,443
But then the problem is that we
would entirely lose user intent.
817
00:48:42,863 --> 00:48:48,323
So if, for example, I was offline for
a month and I had a bunch of files,
818
00:48:48,803 --> 00:48:53,153
and then during that month other
people in my team deleted those files.
819
00:48:53,791 --> 00:48:58,838
If I came back online and didn't have
my local database, we would have to
820
00:48:58,838 --> 00:49:02,828
recreate those files and people would
complain about this all the time, because
821
00:49:03,418 --> 00:49:05,738
they would delete something and want it
deleted, and then Dropbox would
822
00:49:05,738 --> 00:49:07,358
just randomly decide to resurrect it.
823
00:49:07,808 --> 00:49:12,441
So those types of decisions we, we tried
to avoid that as much as possible, but
824
00:49:12,441 --> 00:49:17,271
then that meant having to look at a
potentially really confusing database and
825
00:49:17,271 --> 00:49:19,041
read what the user intent might have been.
826
00:49:19,761 --> 00:49:20,211
Right.
827
00:49:20,481 --> 00:49:24,411
I wanna dig a little bit more
into the topic of user intent.
828
00:49:24,441 --> 00:49:30,201
Since with Dropbox you've built a sync
engine very specifically for the use
829
00:49:30,201 --> 00:49:36,231
case of file management, et cetera, where
user intent has a particular meaning that
830
00:49:36,231 --> 00:49:41,181
might be very different from moving a
cursor around in a Google Docs document.
831
00:49:41,511 --> 00:49:47,618
So can you explain a little bit, what
are some of the, common scenarios of, and
832
00:49:47,618 --> 00:49:54,408
maybe subtle scenarios of user intent,
when it comes to the Dropbox design space?
833
00:49:55,218 --> 00:49:56,178
Yeah, totally.
834
00:49:56,535 --> 00:50:01,515
and I think the for regular
things like say editing files.
835
00:50:01,830 --> 00:50:06,420
I think we saw that like people just
generally did not, maybe because
836
00:50:06,420 --> 00:50:09,690
of the way the system was even
its capabilities, people did not
837
00:50:09,690 --> 00:50:11,820
edit the same files all too often.
838
00:50:12,090 --> 00:50:17,033
So maintaining user intent when file,
when everyone is online, just kind of
839
00:50:17,333 --> 00:50:21,563
taking last writer wins Where I think
user intent became very interesting is
840
00:50:21,593 --> 00:50:26,583
if someone went offline, like they're on
an airplane before wifi and airplanes
841
00:50:27,026 --> 00:50:30,746
And they worked on their document and
someone else worked on the same time.
842
00:50:31,346 --> 00:50:35,906
In that case, we observed that users
always wanted to see the conflicted
843
00:50:35,906 --> 00:50:39,956
copy and that they wanted to get
the opportunity to say, like, I did.
844
00:50:39,956 --> 00:50:43,046
I put in a lot of effort into working
on this when I was on the plane.
845
00:50:43,346 --> 00:50:47,970
Someone else, put in probably a similar
amount of effort when they were online and
846
00:50:48,170 --> 00:50:50,830
you know, so last writer wins policies.
847
00:50:50,830 --> 00:50:55,700
There violated user expectations
quite a lot because either a person
848
00:50:55,700 --> 00:50:58,460
had to win and then the person
who lost would be really upset.
849
00:50:58,900 --> 00:51:00,970
so I think those were pretty interesting.
850
00:51:00,970 --> 00:51:05,113
I think with Moose, like with more
metadata operations I think people
851
00:51:05,130 --> 00:51:06,420
were a little bit more permissive.
852
00:51:06,420 --> 00:51:10,680
Like if I moved something from one
folder to another, another person
853
00:51:10,680 --> 00:51:12,180
moved it to a different folder.
854
00:51:12,496 --> 00:51:15,076
having it just converge on
something, as long as it converges,
855
00:51:15,136 --> 00:51:18,586
we observed that people
didn't worry about it too much.
856
00:51:18,810 --> 00:51:21,480
I think the place where user
intent is really interesting
857
00:51:21,480 --> 00:51:23,300
with moves is with sharing.
858
00:51:23,666 --> 00:51:26,983
So I think thinking about this
from like the distributed systems
859
00:51:26,983 --> 00:51:31,333
perspective on causality, there would
be like someone might have like,
860
00:51:31,423 --> 00:51:33,103
I dunno, their HR folder, right?
861
00:51:33,823 --> 00:51:38,353
And I don't know, like, let's say that
someone is transferring to the HR team, so
862
00:51:38,383 --> 00:51:40,423
they're getting added to the HR folder.
863
00:51:41,158 --> 00:51:44,038
But then say before they were
on the team, they were on a
864
00:51:44,158 --> 00:51:45,358
performance improvement plan.
865
00:51:46,061 --> 00:51:50,958
So then the administrator for HR
would delete that file, make sure it's
866
00:51:50,958 --> 00:51:53,838
deleted, and then add them to the folder.
867
00:51:54,438 --> 00:51:59,178
And so their user intent is
expressed in a very specific
868
00:51:59,178 --> 00:52:00,978
sequencing of operations, right?
869
00:52:01,158 --> 00:52:04,038
That like this causally depended on this.
870
00:52:04,188 --> 00:52:08,238
I would not have invited 'em to the folder
unless the delete was stably synced.
871
00:52:08,848 --> 00:52:12,648
And that making sure that gets
preserved throughout the system,
872
00:52:12,798 --> 00:52:16,428
even when people are going online
and offline and everything is a very
873
00:52:16,428 --> 00:52:18,048
hard distributed systems problem.
874
00:52:18,078 --> 00:52:18,468
Right.
875
00:52:18,901 --> 00:52:22,441
and it was intimately related
with the details of the product.
876
00:52:22,958 --> 00:52:23,378
Right.
877
00:52:23,421 --> 00:52:23,661
yeah.
878
00:52:23,661 --> 00:52:29,571
How did you capture that causality
chain of events since you probably also
879
00:52:29,571 --> 00:52:32,151
couldn't quite trust the system clock?
880
00:52:32,451 --> 00:52:33,681
How did you go about that?
881
00:52:34,085 --> 00:52:36,348
Yeah, this became even
more difficult, right?
882
00:52:36,348 --> 00:52:41,118
Where file system metadata was partitioned
across many shards in the database.
883
00:52:41,568 --> 00:52:45,528
So then we ended up using something like
Lamport timestamps, where every single
884
00:52:45,528 --> 00:52:47,328
operation would get assigned a timestamp.
885
00:52:47,448 --> 00:52:50,745
And those operations were usually
only reading and writing to their
886
00:52:50,745 --> 00:52:55,153
particular shard and for whatever
timestamp the client had observed.
887
00:52:55,423 --> 00:52:59,677
But then in these cases where there
were potentially cross-shard, they
888
00:52:59,677 --> 00:53:03,397
weren't transactions, but like causal
dependencies, we would be able to say
889
00:53:03,397 --> 00:53:07,597
like, the operation to mount this or
to add someone to the shared folder
890
00:53:07,657 --> 00:53:11,917
and then them mounting it within
their file system has to have a higher
891
00:53:11,917 --> 00:53:14,887
timestamp than any write within that folder,
892
00:53:15,532 --> 00:53:16,582
writes including deletes.
893
00:53:16,948 --> 00:53:21,628
so then that way when the client is
syncing it would be able to know that when
894
00:53:21,628 --> 00:53:26,998
I am merging operation logs across all of
the different shards, I need to assemble
895
00:53:26,998 --> 00:53:28,828
them in a causally consistent order.
896
00:53:29,288 --> 00:53:33,058
And that would then respect all
of these particular invariants.
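Here is a minimal TypeScript sketch of the Lamport-clock idea just described: each shard stamps its operations, stamps observed from elsewhere push the clock forward, and the client merges per-shard logs in timestamp order to get a causally consistent replay. This is illustrative only; Dropbox's real protocol is more involved than this.

```typescript
class LamportClock {
  private t = 0;
  // Local event (e.g. a write on this shard): just advance the clock.
  tick(): number { return ++this.t; }
  // Observing a timestamp from another shard or client: jump past it, so
  // anything stamped afterwards is ordered after what was observed.
  observe(remote: number): number { this.t = Math.max(this.t, remote); return ++this.t; }
}

interface Op { shard: string; ts: number; kind: "write" | "delete" | "mount" }

// Merging per-shard operation logs on the client: sort by timestamp, breaking
// ties by shard id. Because the mount of the shared folder was stamped after
// the delete it depended on, the delete is always applied first.
const mergeLogs = (logs: Op[][]): Op[] =>
  logs.flat().sort((a, b) => a.ts - b.ts || a.shard.localeCompare(b.shard));
```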
897
00:53:33,438 --> 00:53:33,828
Right.
898
00:53:34,098 --> 00:53:38,448
So you have thought through those
different scenarios for Dropbox and
899
00:53:38,448 --> 00:53:43,758
made very intentional design decisions
that, for example, in one scenario
900
00:53:43,758 --> 00:53:46,728
last writer wins is not desirable,
901
00:53:46,728 --> 00:53:51,415
since that might lead to a very sad
person stepping off the plane because
902
00:53:51,415 --> 00:53:54,955
all of your data is suddenly gone,
or the other person's data is gone.
903
00:53:55,262 --> 00:53:58,292
so you make very specific
design trade-offs here when it
904
00:53:58,292 --> 00:54:03,032
comes to somehow squaring the
circle of distributed systems.
905
00:54:03,182 --> 00:54:08,222
What sort of advice would you have for
application developers or people even
906
00:54:08,552 --> 00:54:12,362
who are sitting inside of a company
and are now thinking about, oh, maybe
907
00:54:12,362 --> 00:54:17,552
we should have our own Dropbox-style,
Linear-style sync engine internally?
908
00:54:17,552 --> 00:54:21,122
What sort of advice would
you give them when they, yeah,
909
00:54:21,122 --> 00:54:23,132
start thinking this through to the detail?
910
00:54:23,987 --> 00:54:28,505
Yeah, I'll talk through kind of how we
structured things at Dropbox to be able
911
00:54:28,505 --> 00:54:30,275
to navigate these types of problems.
912
00:54:30,395 --> 00:54:33,335
And I think the patterns
here, can be quite general.
913
00:54:33,605 --> 00:54:37,815
I think what we ended up with was
that like thinking like distributed
914
00:54:37,815 --> 00:54:39,945
systems syncing is hard, right?
915
00:54:40,185 --> 00:54:45,645
So we would have the kind of base layer
of the sync protocol and how state
916
00:54:45,645 --> 00:54:49,245
gets moved around between the clients
and the servers and all the shards.
917
00:54:49,695 --> 00:54:52,575
We would have very strong
consistency guarantees there.
918
00:54:52,875 --> 00:54:57,345
So we would not use any of the
knowledge of the product at that layer.
919
00:54:57,725 --> 00:55:02,475
So, like, thinking of Dropbox
and the file system as a CRDT:
920
00:55:03,660 --> 00:55:06,420
Dropbox allows, like moves
to happen concurrently.
921
00:55:06,420 --> 00:55:09,690
It allows you to add something
while another thing is happening.
922
00:55:10,020 --> 00:55:12,780
But at the protocol level,
we kept things very strict.
923
00:55:12,780 --> 00:55:17,437
We kept them very close to being
serializable, so that every view of the
924
00:55:17,437 --> 00:55:20,857
system was identified by a very small
amount of state, like a timestamp.
925
00:55:21,067 --> 00:55:24,127
And that would fully determine the
state of the system and like the
926
00:55:24,127 --> 00:55:26,077
amount of entropy in that was very low.
927
00:55:26,497 --> 00:55:30,067
And then whenever you are modifying
it, you would say, here's what I expect
928
00:55:30,067 --> 00:55:34,267
the data to be, and if it doesn't match
exactly, it will reject the operation.
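A small TypeScript sketch of that "here's what I expect the data to be" pattern: the client sends the version it based its change on, and the server rejects the commit on any mismatch, essentially a compare-and-swap. This is illustrative, not the actual Dropbox wire protocol; all names are assumptions.

```typescript
interface CommitRequest { path: string; expectedVersion: number; newContentHash: string }
interface ServerRecord { version: number; contentHash: string }

function tryCommit(
  store: Map<string, ServerRecord>,
  req: CommitRequest
): { ok: true; version: number } | { ok: false; current: ServerRecord | undefined } {
  const current = store.get(req.path);
  const currentVersion = current?.version ?? 0;
  if (currentVersion !== req.expectedVersion) {
    // Stale view: the caller must resync and decide what to do (conflicted copy,
    // retry, drop) -- that product policy lives above this strict layer.
    return { ok: false, current };
  }
  const next = { version: currentVersion + 1, contentHash: req.newContentHash };
  store.set(req.path, next);
  return { ok: true, version: next.version };
}
```

Keeping the base layer this strict is what lets the looser, product-specific reconciliation rules be layered on top without re-solving distributed systems problems each time.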
929
00:55:34,597 --> 00:55:39,727
And then by structuring things
in that way, we made it very easy
930
00:55:39,727 --> 00:55:45,037
for product teams and for even us
working on sync to embed all of these like
931
00:55:45,067 --> 00:55:47,677
looser, more product-focused requirements
932
00:55:47,677 --> 00:55:51,247
that also may wanna change over time,
into the endpoints, like layered on top.
933
00:55:51,247 --> 00:55:57,157
So every time we wanted to change a policy
on how, like, a delete reconciles with an,
934
00:55:57,787 --> 00:55:59,647
you know, add for a folder or something.
935
00:55:59,887 --> 00:56:02,707
We didn't have to solve any distributed
systems problems to do that.
936
00:56:03,487 --> 00:56:07,897
So I think that like pattern of saying
that, like is there a good abstraction?
937
00:56:07,897 --> 00:56:11,467
Is there something that is like very
powerful that could solve a large
938
00:56:11,467 --> 00:56:16,267
class of problems, doing that well at
the lowest layer and then potentially
939
00:56:16,627 --> 00:56:18,577
weakening the consistency above it.
940
00:56:19,297 --> 00:56:24,217
I actually really like the Rocicorp
folks have a really great description of
941
00:56:24,217 --> 00:56:28,897
their consistency model for Replicache of
it being like session plus consistency.
942
00:56:29,227 --> 00:56:34,087
And it's like a very similar idea
where like when we build things on
943
00:56:34,087 --> 00:56:38,977
a platform, we may, with our
product hats on, like want users to
944
00:56:38,977 --> 00:56:42,607
not have to think about conflicts and
merging and all that in a lot of cases.
945
00:56:42,757 --> 00:56:45,397
But those decisions might be
very particular to our app.
946
00:56:45,397 --> 00:56:48,187
And that's something that holds
for everything on the platform.
947
00:56:48,457 --> 00:56:52,177
And then there's always a way to
embed those decisions onto, say.
948
00:56:52,552 --> 00:56:56,842
Session consistency and Replicache
or serializability and other systems.
949
00:56:57,082 --> 00:57:00,435
And so I think that's like that
separation of concerns I
950
00:57:00,435 --> 00:57:03,615
think is something that can
apply to a lot of systems.
951
00:57:04,105 --> 00:57:04,495
Right.
952
00:57:04,555 --> 00:57:09,895
So maybe we use this also as a transition
to talk a bit more about what you're
953
00:57:09,895 --> 00:57:12,295
now designing and working on Convex.
954
00:57:12,655 --> 00:57:19,225
What were some of the key insights that
you've taken with you from Dropbox that
955
00:57:19,225 --> 00:57:22,195
ultimately led to you co-founding Convex?
956
00:57:22,975 --> 00:57:27,068
Yeah, when we first were starting
Convex we were looking at how apps
957
00:57:27,068 --> 00:57:28,238
are getting built today, right?
958
00:57:28,298 --> 00:57:32,498
Like web apps are easier
to build than ever.
959
00:57:32,653 --> 00:57:37,013
Even in 2021, it's incredible
how much, like more productive
960
00:57:37,483 --> 00:57:39,703
that was compared to 10 years before.
961
00:57:39,793 --> 00:57:40,093
Right.
962
00:57:40,093 --> 00:57:45,613
It was, and I think we noticed that
the hard part for so many discussions
963
00:57:45,853 --> 00:57:50,110
was managing state and like how
state propagates. I think it was from
964
00:57:50,110 --> 00:57:54,370
the Riffle paper, right, on how like
so many issues in app development
965
00:57:54,370 --> 00:57:58,330
are kind of database problems in
disguise, and how techniques
966
00:57:58,330 --> 00:58:00,340
from databases might be able to help.
967
00:58:00,610 --> 00:58:05,797
So with Convex we were saying like, well
if we start with the idea of designing
968
00:58:05,797 --> 00:58:10,213
a database from first principles, can we
apply some of those database solutions
969
00:58:10,393 --> 00:58:11,923
to things across the whole stack?
970
00:58:12,253 --> 00:58:17,173
So say for example, when I'm reading
data from it within in my app, I have
971
00:58:17,173 --> 00:58:20,743
all of these React components that are
all reading different pieces of data.
972
00:58:21,193 --> 00:58:24,643
It'd be really nice if all of them
just executed at the same timestamp
973
00:58:24,703 --> 00:58:29,563
and I never had to handle consistency
issues where one component knows
974
00:58:29,563 --> 00:58:30,973
about a user or the other one doesn't.
975
00:58:31,423 --> 00:58:36,823
Similarly, like, why isn't it possible
that I just use queries across
976
00:58:36,823 --> 00:58:40,753
all my components and they just all
live update whenever I read anything,
977
00:58:40,753 --> 00:58:42,133
it's automatically reactive.
978
00:58:42,403 --> 00:58:46,753
So those were some of the like
the initial kind of thought
979
00:58:46,753 --> 00:58:48,613
experiments for what led to Convex.
980
00:58:48,883 --> 00:58:52,243
I think the other one that was
really motivated from our time at
981
00:58:52,243 --> 00:58:56,143
Dropbox and I think is like kind
of both a blessing and a curse.
982
00:58:56,143 --> 00:58:59,833
It's kind of like one of the key
design decisions for Convex is
983
00:58:59,833 --> 00:59:03,133
that Convex is very opinionated
about there being a separation
984
00:59:03,133 --> 00:59:04,523
between the client and the server.
985
00:59:05,303 --> 00:59:09,463
So we saw this at Dropbox where they
were just different teams, right?
986
00:59:09,853 --> 00:59:13,393
And you know, as we've seen with like
even the origin of GraphQL, right?
987
00:59:13,393 --> 00:59:16,153
Like that ability to
decouple development between.
988
00:59:16,830 --> 00:59:20,505
teams working on user facing features
and the way that the data fetching
989
00:59:20,505 --> 00:59:23,175
is implemented on the backend,
it's gonna be really powerful.
990
00:59:23,805 --> 00:59:27,615
And so kind of the kind of thought
experiment with Convex is, can we
991
00:59:27,722 --> 00:59:32,522
maintain a very strong separation while
still getting like live updating, while
992
00:59:32,522 --> 00:59:36,752
still getting a really good ergonomics
for both consuming data on the client
993
00:59:36,752 --> 00:59:38,372
and like fetching it on the server.
994
00:59:39,015 --> 00:59:39,435
Right.
995
00:59:39,435 --> 00:59:44,175
So yeah, walk me through a little bit
more through the evolution of Convex then.
996
00:59:44,235 --> 00:59:49,158
And so, in, in terms of all the other
options that are out there in terms
997
00:59:49,158 --> 00:59:55,698
of state management and I think most
what applications are using is probably
998
00:59:55,818 --> 01:00:01,892
something that at least to some degree is
somewhat customized and hand rolled and
999
01:00:01,892 --> 01:00:04,682
comes with its own huge set of trade-offs.
1000
01:00:05,348 --> 01:00:08,228
Help me better understand sort
of the, where you mentioned the,
1001
01:00:08,390 --> 01:00:11,135
opinionated nature of Convex.
1002
01:00:11,435 --> 01:00:13,272
What are the, benefits of that?
1003
01:00:13,362 --> 01:00:16,262
What are the downsides of
that and other implications?
1004
01:00:16,752 --> 01:00:20,562
Yeah, so when you write an app
on Convex we can use maybe
1005
01:00:20,562 --> 01:00:22,242
like a basic to do app, right?
1006
01:00:22,602 --> 01:00:24,072
The linear clone, everyone does.
1007
01:00:24,355 --> 01:00:26,695
you write endpoints like
you might be used to, right?
1008
01:00:26,695 --> 01:00:30,805
Where it's like list all the to-dos in a
project like update a to-do in a project.
1009
01:00:31,182 --> 01:00:34,902
and those get pushed as your
API to your Convex server.
1010
01:00:35,602 --> 01:00:39,292
the implementations of that API can
then read and write to the database
1011
01:00:39,292 --> 01:00:43,492
and Convex has like a, kinda like Mongo
or Firebase, like API for doing so.
1012
01:00:44,008 --> 01:00:48,688
I think the main benefit then of
Convex relative to more traditional
1013
01:00:48,688 --> 01:00:53,172
architectures is that if you're on the
client, the only thing you need to do
1014
01:00:53,412 --> 01:00:55,722
is call the, like the use query hook.
1015
01:00:56,067 --> 01:01:00,957
You're saying like, I am looking at a
project I just do use like use query
1016
01:01:01,347 --> 01:01:07,857
list tasks and project that will then
talk to the server, run that query, but
1017
01:01:07,857 --> 01:01:12,057
then also set up the subscription and
then whenever any data that that query
1018
01:01:12,057 --> 01:01:16,227
looked at changes, it will efficiently
determine that and then push the update.
1019
01:01:16,857 --> 01:01:21,567
So part of what is like been nice
with Convex is that you are getting
1020
01:01:21,917 --> 01:01:26,307
a client that has a web socket
protocol, it has a sync engine built in.
1021
01:01:26,637 --> 01:01:30,297
You're getting infrastructure for
running JavaScript at scale and for
1022
01:01:30,297 --> 01:01:32,517
handling sandboxing and all of that.
1023
01:01:32,757 --> 01:01:35,757
And then you're also getting a
database, which is, you know.
1024
01:01:36,102 --> 01:01:39,342
One, supporting transactions
or reading and writing to it.
1025
01:01:39,552 --> 01:01:43,212
But then it also supports this
efficient like being able to subscribe
1026
01:01:43,212 --> 01:01:47,652
on, I ran this query, this query
just ran a bunch of JavaScript.
1027
01:01:47,652 --> 01:01:50,752
It looked at different rows
and it ran some queries.
1028
01:01:51,235 --> 01:01:55,965
the system will automatically efficiently
determine if any right overlaps with that.
1029
01:01:56,385 --> 01:01:59,805
So the combination of all of those
things is like part of the benefit of
1030
01:01:59,805 --> 01:02:03,735
Convex, you just write TypeScript and
you write it in a way that's, feels
1031
01:02:03,735 --> 01:02:06,645
very natural and everything just works.
1032
01:02:07,335 --> 01:02:12,825
And I think some of the like downsides is
that it's it is a different set of APIs.
1033
01:02:13,098 --> 01:02:16,658
it's not using sql, it's doing
things a little bit differently
1034
01:02:16,658 --> 01:02:17,858
than they've been done before.
1035
01:02:18,342 --> 01:02:22,842
yeah, it's like kind of interesting
even today to see like what you know.
1036
01:02:23,262 --> 01:02:24,942
Talking about AI code gen, right?
1037
01:02:24,942 --> 01:02:28,422
Like models have been trained,
pre-trained on this huge corpus
1038
01:02:28,422 --> 01:02:29,322
of stuff on the internet.
1039
01:02:29,592 --> 01:02:32,412
And when are they good at
adopting new technologies?
1040
01:02:32,682 --> 01:02:35,202
Technologies that might be
after their knowledge cutoff.
1041
01:02:35,562 --> 01:02:38,887
And when are they like it's better just
to stick to things that they know already.
1042
01:02:39,592 --> 01:02:39,952
Right.
1043
01:02:39,997 --> 01:02:45,428
So, what you've mentioned before, where you
say Convex is rather opinionated: for me,
1044
01:02:45,858 --> 01:02:49,668
let's say five years ago,
I might've been much more of
1045
01:02:49,668 --> 01:02:53,028
like, oh, but maybe there's a
technology that's less opinionated
1046
01:02:53,028 --> 01:02:54,468
and I can use it for everything.
1047
01:02:54,828 --> 01:02:58,518
But the more experience I got,
the more I realized no, actually.
1048
01:02:58,848 --> 01:03:02,478
I want something that's very
opinionated, but opinionated
1049
01:03:02,538 --> 01:03:04,338
and I share those opinions.
1050
01:03:04,518 --> 01:03:06,378
Those are exactly for my use case.
1051
01:03:06,378 --> 01:03:08,448
So I think that is much better.
1052
01:03:08,448 --> 01:03:12,648
This is why we have different technologies
and they are great for different
1053
01:03:12,648 --> 01:03:17,208
scenarios, and I think the more a
technology tries to say, no, we're,
1054
01:03:17,208 --> 01:03:22,912
we're best for everything, I think the,
less it's actually good at anything.
1055
01:03:23,392 --> 01:03:26,932
And so I greatly appreciate you
standing your ground and saying
1056
01:03:26,932 --> 01:03:30,872
like, Hey, those are, our design,
decisions that we've made.
1057
01:03:31,022 --> 01:03:35,615
And those are the use cases where,
you'd be really well served building
1058
01:03:35,615 --> 01:03:37,355
on top of something like Convex.
1059
01:03:37,685 --> 01:03:42,522
And, I particularly like for now where
TypeScript is really the, default
1060
01:03:42,522 --> 01:03:44,772
language to build full stack applications.
1061
01:03:45,042 --> 01:03:48,732
And it's also increasingly
becoming the default for.
1062
01:03:48,933 --> 01:03:51,250
ai, based applications as well.
1063
01:03:51,430 --> 01:03:57,040
And AI based systems speak type
script, just as well as English.
1064
01:03:57,640 --> 01:04:02,090
And given that Convex makes
that full stack super easy.
1065
01:04:02,450 --> 01:04:07,893
And also I think you can, when
you build local-first apps, it can
1066
01:04:07,893 --> 01:04:11,913
sometimes get really tricky because
you empower the client so much.
1067
01:04:11,913 --> 01:04:15,453
You give the client so much
responsibility and therefore there's
1068
01:04:15,453 --> 01:04:17,193
many, many things that can go wrong.
1069
01:04:17,223 --> 01:04:21,653
And I think Convex therefore, takes
a more conservative approach and says
1070
01:04:21,653 --> 01:04:25,881
like, Hey, everything that happens on
the server is like highly privileged
1071
01:04:25,881 --> 01:04:27,501
and this is your safe environment.
1072
01:04:27,831 --> 01:04:31,491
And the client will try to give
you the best user experience and
1073
01:04:31,491 --> 01:04:33,081
developer experience out of the box.
1074
01:04:33,831 --> 01:04:37,551
But the client could be in a
more adversarial environment.
1075
01:04:37,611 --> 01:04:39,831
And I think those are
great design trade offs.
1076
01:04:40,071 --> 01:04:45,208
So, I think that is a fantastic foundation
for tons of different applications.
1077
01:04:45,818 --> 01:04:46,238
Yeah.
1078
01:04:46,701 --> 01:04:49,011
talking about some of these
strong opinions being both
1079
01:04:49,011 --> 01:04:50,271
blessings and curses, right?
1080
01:04:50,271 --> 01:04:54,681
Like over the past few months, one
thing we've been working on is trying
1081
01:04:54,681 --> 01:04:58,401
to bridge the gap between those
two points in the spectrum, right?
1082
01:04:58,705 --> 01:05:02,675
we wrote a blog post on it a few months
ago of like working on what we're calling
1083
01:05:02,675 --> 01:05:08,135
our like Object sync engine, trying to
take a lot of the principles from more of
1084
01:05:08,135 --> 01:05:14,270
a local-first type approach of having a
data model that it is synced to the client
1085
01:05:14,450 --> 01:05:18,020
and the only interaction between the
server and the client is through the sync.
1086
01:05:18,440 --> 01:05:22,580
And the client then can always render
its UI just looking at the local
1087
01:05:22,580 --> 01:05:24,380
database and it can be offline.
1088
01:05:24,530 --> 01:05:28,040
It's also fully describes the
app stage so it can be exported
1089
01:05:28,040 --> 01:05:29,600
and rehydrated or whatever.
1090
01:05:29,904 --> 01:05:33,564
it's very interesting design exercise
we've been on to say like, can
1091
01:05:33,564 --> 01:05:39,804
you structure a protocol on a sync
engine in a way such that the UI
1092
01:05:39,834 --> 01:05:42,984
is still reading and writing to a
local store that is authoritative.
1093
01:05:43,344 --> 01:05:47,514
But then that local store is like to kind
of use like an electric SQL terminology is
1094
01:05:47,514 --> 01:05:52,584
like that is a shape that is some mapping
of a strongly separated server data model.
1095
01:05:52,794 --> 01:05:56,754
So we still have a client data model
and server data model, which might be
1096
01:05:56,754 --> 01:06:01,419
owned by different teams and evolve
independently and, we also have that
1097
01:06:01,419 --> 01:06:06,159
strong separation where the implementation
of the shape is privileged and running
1098
01:06:06,159 --> 01:06:10,925
on the server and has authorization rules
built in and get the best of both worlds.
1099
01:06:10,925 --> 01:06:16,255
And we've kind of, we have a like beta
that we've not released publicly thought
1100
01:06:16,375 --> 01:06:19,585
open, sourced out there, but kind
of a thing where we, I think they're
1101
01:06:19,585 --> 01:06:21,355
still figuring out like the DX for it.
1102
01:06:21,355 --> 01:06:24,055
And I think we have something
that like algorithmically works
1103
01:06:24,355 --> 01:06:28,165
and it's like the protocol works,
but it's like, it's kind of hard.
1104
01:06:28,165 --> 01:06:28,315
Right.
1105
01:06:28,315 --> 01:06:32,395
It kind of reminds me a lot of writing
GraphQL resolvers, of like saying: how do I
1106
01:06:32,395 --> 01:06:35,215
take the messages table from my chat app?
1107
01:06:35,710 --> 01:06:39,280
Then under the hood that might be
joining stuff from many different
1108
01:06:39,280 --> 01:06:43,060
tables and filtering rows, or might
even be doing a full tech search
1109
01:06:43,060 --> 01:06:45,250
query in another view or something.
1110
01:06:45,547 --> 01:06:48,817
and coming up with the right
ergonomics to make that feel
1111
01:06:48,847 --> 01:06:50,767
great for a day one experience.
1112
01:06:50,767 --> 01:06:53,047
I think that's something we're
still working on, still
1113
01:06:53,047 --> 01:06:53,902
kinda like a research project,
1114
01:06:54,097 --> 01:06:54,577
right?
1115
01:06:54,637 --> 01:06:58,837
Well, when it comes to data, there is no
free lunch, but I'd much rather to have
1116
01:06:58,837 --> 01:07:03,787
it be done in the order and sequencing
that you're going through, which is
1117
01:07:03,787 --> 01:07:09,307
having a solid foundation that I can
trust and then figuring out the right
1118
01:07:09,307 --> 01:07:14,047
ergonomics afterwards, since I think
there's many, many tools that start with
1119
01:07:14,047 --> 01:07:19,747
great ergonomics, but later realize that
it's on a built, on a unsound foundation.
1120
01:07:19,957 --> 01:07:24,137
So when it comes to data, I want a
trustworthy foundation, and I think
1121
01:07:24,137 --> 01:07:25,964
you're going about in the right order.
1122
01:07:26,529 --> 01:07:31,209
Hey, Sujay, I've been learning
so much about one of my favorite
1123
01:07:31,209 --> 01:07:33,099
products of all time, Dropbox.
1124
01:07:33,789 --> 01:07:39,599
I've learned so much of like how the
sausage was actually made, how it evolved
1125
01:07:39,599 --> 01:07:45,119
over time and I'm really excited that
you got to share the story today and
1126
01:07:45,419 --> 01:07:48,272
many me included, got to, learn from it.
1127
01:07:48,452 --> 01:07:51,002
Thank you so much for taking the
time and sharing all of this.
1128
01:07:51,572 --> 01:07:52,202
Thanks for having me.
1129
01:07:52,202 --> 01:07:53,207
This is super, super fun.
1130
01:07:54,159 --> 01:07:56,739
Thank you for listening to
the localfirst.fm podcast.
1131
01:07:56,919 --> 01:08:00,009
If you've enjoyed this episode and
haven't done so already, please
1132
01:08:00,009 --> 01:08:01,299
subscribe and leave a review.
1133
01:08:01,689 --> 01:08:04,209
Please also share this episode
with your friends and colleagues.
1134
01:08:04,599 --> 01:08:07,599
Spreading the word about the
podcast is a great way to support
1135
01:08:07,599 --> 01:08:09,309
it and to help me keep it going.
1136
01:08:09,969 --> 01:08:13,389
A special thanks again to Jazz
for supporting this podcast.
1137
01:08:13,689 --> 01:08:14,649
I'll see you next time.