localfirst.fm

A podcast about local-first software development

#23 – Sujay Jayakar: Dropbox, Convex


This episode's guest is Sujay Jayakar, co-founder of Convex and early engineer at Dropbox. In this conversation, Sujay shares the story of how the sync engine powering Dropbox was initially built and later redesigned to address all sorts of distributed systems problems.


Thank you to Jazz for supporting the podcast.

Transcript

#23 – Sujay Jayakar: Dropbox, Convex
00:00There's another kind of interesting decision here: Dropbox by
00:04design was always like a sidecar.
00:06It's always something that just sits and it looks at your files.
00:09Your files are just regular files on the file system.
00:12And if Dropbox, the app isn't running, your files are there and they're safe,
00:17and it's something that you know, regular apps can just read and write
00:21to, and in some sense like Dropbox was unintentionally local-first
00:27from that perspective, right?
00:28Because it's saying that no matter what happens, your data
00:31is just there and you own it.
00:33Welcome to the localfirst.fm podcast.
00:36I'm your host, Johannes Schickling, and I'm a web developer, a
00:39startup founder, and love the craft of software engineering.
00:42For the past few years, I've been on a journey to build a modern high quality
00:46music app using web technologies, and in doing so, I've been following down
00:50the rabbit hole of local-first software.
00:52This podcast is your invitation to join me on that journey.
00:56In this episode, I'm speaking to Sujay Jayakar.
00:59co-founder of Convex and early engineer at Dropbox.
01:02In this conversation, Sujay shares the story on how the Sync Engine
01:06powering Dropbox was built initially and later redesigned to address all
01:11sorts of distributed systems problems.
01:13Before getting started, also a big thank you to Jazz for supporting this podcast.
01:19And now my interview with Sujay.
01:22Hey, Sujay.
01:23So nice to have you on the show.
01:24How are you doing?
01:25Doing great.
01:26Great.
01:26Really happy to be here.
01:28I'm super excited to have you on the show.
01:30I've been using your work for over a decade at this point,
01:35since when I was really getting into using computers productively.
01:39And we recently had another really interesting guest, Seph Gentle, on
01:45the podcast, who worked on a really fascinating tool called Google Wave
01:50back then that had a big impact on me.
01:53And you've been working on another technology that had a big impact
01:56on me, which is Dropbox and still has a very positive impact on me.
02:01That was all the way back then over 10 years ago in 2014.
02:05I don't think I need to explain to the audience what Dropbox is, but, I
02:11want to hear it from you, like what led you to join Dropbox, I think very
02:15early on and just hearing a little bit just embedded in your personal
02:20context when you joined it, and then we're gonna go dive really deep into
02:24all things syncing related, et cetera.
02:27How does that sound?
02:28Yeah, that sounds great.
02:30It's actually a really funny story.
02:31My career here in technology started in 2012.
02:34I was actually studying mathematics.
02:37I was going to go work at the NSA doing cryptography. I was born in India,
02:44but I'm a naturalized citizen of the United States, and you have to
02:48have security clearance to go do these types of cryptography things.
02:52And you know, my clearance kept on dragging on and on and on, and
02:58they interviewed my roommates, and apparently I'm just a very sketchy
03:01guy. So I had an offer to go work there, but it kept on dragging on.
03:06And then my roommate at the time was a computer science major who wanted
03:10someone to go with him to the career fair, and I just started chatting with the
03:15Dropbox people, and you know, it was about a hundred people around that time.
03:19And chatting turned into hanging out at dinner, turned into interviewing.
03:23And being a math person, I did my interviews all in Haskell and
03:26didn't know any real programming.
03:28And then, yeah, that turned into doing an internship, dropping out of undergrad,
03:34and just following the dream.
03:35And so at Dropbox I worked on a bunch of things.
03:38I started off working on our, like, growth team.
03:40So I did a lot of, like, email systems.
03:43And I worked on this thing called the Space Race, like a promotion.
03:47Oh, I remember that.
03:48Yes.
03:49I think I've earned quite a lot of free storage, which I think
03:53over time has gone down.
03:56But that was a very smart and effective mechanism.
03:58I surely invited all my friends back then.
04:01Being a broke student, I couldn't afford a premium plan, so that worked.
04:07And then from there worked on the sync engine for some time.
04:11And then right now I'm the co-founder and chief scientist of a startup called Convex
04:16and my three co-founders and I met working on this project called Magic Pocket,
04:20where Dropbox stores hundreds of petabytes, now exabytes, of files for users.
04:25And we used to do that in S3.
04:27And so the three of us worked together on a team to build Amazon S3, but in-house
04:32and migrate all of the data over.
04:34So we did that for a few years and then worked on rewriting the entirety of
04:39the Dropbox sync engine, the thing that runs on all of our desktop computers.
04:43We rewrote it to be really correct and scalable and very flexible,
04:47and shipped that.
04:48After that I left Dropbox in 2020. I was trying to decide if I wanted
04:52to get back to academics or not.
04:54So I did some research in networking and then decided to start Convex in 2021.
04:59Certainly curious which sort of research had your interest the
05:03most in this sort of transitional period, but maybe we stash that
05:07for a moment and go back to the beginning when you joined Dropbox.
05:11You mentioned there were around a hundred people working there at the time.
05:15How do I need to imagine the technology behind Dropbox at this point?
05:20It clearly all started out with, like, a desktop-focused daemon process
05:27that's running on your machine, somehow keeps track of the files
05:33on your system, and then applies the magic.
05:37So explain to me how things worked back then and what it was like to
05:43work at Dropbox when there were about a hundred people.
05:46Yeah, I mean, it was pretty magical, right?
05:49Because the company had, I think gotten so many things right on the product side
05:54and then those showed up in technology.
05:55But just this feeling of like Dropbox being this product that just worked right?
06:00It was for everyone.
06:01It was not just for technologists, but anyone should be able, anyone who's
06:05comfortable using a computer should be able to install Dropbox and have
06:09a folder of theirs become magical.
06:12And without understanding anything about how it works, they should
06:15just think of it as like an extension of what they know already.
06:19yeah.
06:19And so like the ways that that showed up I think were really interesting.
06:22At the time there was a very strong culture of, like, reverse engineering,
06:26to have this daemon that runs locally.
06:30You know, one of the amazing early moments in Dropbox was that,
06:34if you open up Finder or Explorer, you have the overlays on it.
06:39That used to be done by, like, attaching to the Finder
06:43process and injecting code into it,
06:47to the point where, when some folks had gone to talk to Apple at the
06:51time about working with the file system and everything,
06:57there were teams at Apple that asked Dropbox, how did you do that in Finder?
07:05So you wanted to offer the most native experience.
07:08There weren't the necessary APIs for that.
07:11And so you just made it happen.
07:12That's amazing.
07:13Yeah.
07:14Yeah.
07:14And so there was that idea of, how do you create the best user experience,
07:19for the purpose of making non-technical users feel very
07:26confident and very safe using it.
07:28That was another, I think, really deep like company value of like
07:32being worthy of trust and taking people's files very seriously.
07:36You know, I remember having a friend who was in residency at the
07:39time, and he was telling me that he keeps some of his
07:44non-HIPAA stuff, the things that he looks at, on Dropbox, and you know,
07:50pulls them up when he's consulting them.
07:51And there's a part of me which is terrified by that, right?
07:54Like we think of software as something where like throwing a 500
07:57error is fine every once in a while.
08:00And at Dropbox there was a culture of making users feel
08:03like they could really trust us.
08:04And then that showed up in things like making sure that, when
08:08we give feedback to users, if we put that green overlay in Finder,
08:12they know that no matter what happens, they could throw their laptop in a pool,
08:16anything could happen, and their files are safe.
08:20Like if their house burns down, they don't have to worry about that thing.
08:24And that's like all of that reverse engineering and all of the emphasis
08:29on correctness and durability.
08:31It was all in service of that feeling, which I think was really cool.
08:34so on the engineering side, at the time it was like in hyper growth mode.
08:38So they had a Python desktop client.
08:40Almost all of Dropbox was in Python at the time.
08:43And so there's this pre-mypy, big, rapidly changing desktop client that
08:50needed to support Mac, Windows, and Linux and all these different file systems.
08:53And then on the server side, we had one big server called Metaserver;
08:58meta, I think, was from metadata.
09:00And that ran almost all of Dropbox.
09:03We stored the metadata in MySQL.
09:06The files were stored in S3, and then we had a separate notification server
09:11for managing pushes and things like that.
09:13And so it was kind of a classic architecture, and it was
09:17starting to reach the limits of its scaling even at that time.
09:20And, those were a lot of the things we worked on over the next 10 years.
09:24Wow.
09:25So was the server also written in Python?
09:27So it was all one big python shop.
09:30Yeah.
09:30And the server was all written in Python.
09:33We had some pretty funny bugs that were due to that. It's kind of
09:39crazy to think about it now.
09:40You know, working in TypeScript full time now, to think that, like,
09:44back in the day we just had these hundreds of thousands, millions of lines
09:48of code with no type safety and with all types of crazy metaprogramming and
09:55decorators and metaclasses and stuff.
09:57And yeah, so it was all in Python when I showed up.
10:00It was not all in Python and not all in one big monolithic service when I left.
10:04So you mentioned joining when there were around a hundred people, and you
10:09probably already at this point had multitudes more in terms of users.
10:15Being in hypergrowth, it is sort of this race against time where you only have
10:21so much time to work on something, but growth may already outrun you and
10:26things are already starting to break.
10:28Or, you know, okay, if things are going to grow like this, this system will
10:33break and it's gonna be pretty bad.
10:36So tell me more about how you were dealing with the constant
10:42race against time to rebuild systems, redesign systems, putting out fires.
10:49What was that like?
10:50Yeah, and I think there's kind of an interesting place to take this.
10:53I think the normal things were on scale, right?
10:56Those were, like,
10:57one kind of class of problems, of being able to handle the load.
11:00But I think one kind of really interesting, dimension of this that led
11:04to our decision to start rewriting all of the sync engine in 2016 was actually
11:09just like customer debugging load.
11:12You know, we have we had hundreds of millions of active users and they were
11:17using Dropbox in all types of crazy ways.
11:20Like one of the stories is someone was using Dropbox with like, I think
11:24it was running on some, I don't know if it was like a raspberry pie or
11:27something, something on his tractor.
11:28Like the guy ran a farm and he was using Dropbox to sink like
11:32pads in text files to his tractor.
11:35And I might be getting some of the details wrong, but
11:37it's something like that.
11:38And so people would just use Dropbox in all types of crazy ways, on crazy
11:43file systems, with kernel modules running that are messing things around.
11:47So, you know, in terms of getting ahead of scale, I think we found
11:52ourselves around 2015, 2016, in the place where, for the sync engine on the
11:58desktop client, the entire team just spent all of its time debugging issues.
12:04We had this principle of like anything that's possible, anything that a
12:08protocol allows, anything that some threading race condition that's
12:13theoretically possible will be possible.
12:16And then we would see it, right?
12:17Like users would write in saying, my files aren't sinking.
12:20And then we would look at it and we would spend months debugging each one of these
12:24issues and trying to read the tea leaves from traces and reports and reproductions.
12:30And it'll be like, oh they mounted this file system over here
12:33and then this one and this one are in a different file system.
12:36So moving the file actually did a copy, but then the X adders
12:40were in, preserved this and that.
12:42You know, in terms of that theme of like getting ahead of scale, like I think there
12:46was first this realization that like the set of possible things that can happen in
12:51the system is just astronomically large.
12:54And all of them will happen if they're allowed to.
12:57And we do not have, no matter how much like incremental time
13:01we put into debugging things, we will never be able to keep up.
13:05And the cost of doing that is that the entire team is working
13:08on maintenance like this.
13:09We couldn't build any new features.
13:11So I think that was a motivation then for the rewrite to is can we find like points
13:15of leverage where if we just invest a little bit in technology upfront, like by
13:20architecting things a particular way, can we just eliminate a much bigger set of
13:25potential work from debugging and working with customers and stuff like that.
13:29So maybe this is a good time to take a step back and try to
13:33better understand what was Dropbox sync Engine actually back then?
13:38So from just thinking about it through like a user's perspective, I have maybe
13:45two computers, and I have files over here.
13:48I. I want to make sure that I have the files synced over from here to here.
13:53So I could now think about this as sort of like a Git style, approach.
13:59Maybe there's other ways as well.
14:01walk me through sort of like through the solution space, how this could have been
14:05approached and how was it approached?
14:07is there some sort of like diffing involved between different file states
14:12over time, those are being synced around.
14:14Do you sync around the actual file content itself?
14:18Help me to understand.
14:19Building a mental model, what does it mean back then for the sync engine to work?
14:25Yeah.
14:25Yeah.
14:26It's a super interesting question, right?
14:28Because I think like you're saying, there's so many different paths one
14:31can take and it's, I think one of those things where like if someone
14:34asks, like design Dropbox in an interview question, there's like
14:37definitely not one right answer, right?
14:39It's like there are so many trade-offs and like different forks in the decision tree.
14:44I think one of the first things is that, so you have your desktop A and you
14:48have your, maybe you have your desktop and your laptop, and one of the first
14:52decisions for Dropbox is that we would have a central server in the middle,
14:56that there would be a Dropbox file system in the middle that Dropbox, the company
15:01ran, and we did that from this trust perspective, we wanted to say that we
15:05will run this infallibly when you get that green check mark when it's there.
15:11You know, even if an asteroid destroys the eastern side of the United
15:15States, like we will have things replicated in multiple data centers.
15:18And that you know, and then also it's accessible anywhere
15:22on the internet, right?
15:23You can go to the library.
15:24This is not so common these days but I remember when I was a student, like,
15:26go to the library, log into Dropbox and read all your things right?
15:30rather than having to bring a USB stick around.
15:32And so I think that is the first decision, but it's not necessary, right?
15:36Like there were plenty of distributed, entirely peer to
15:39peer file syncing, designs, right?
15:42And so that was the first decision.
15:44And I think the kind of second decision was that if we imagine our desktop and
15:48our laptop and you have the server in the middle, the desktop might be on
15:52Windows, the laptop might be on Mac OS.
15:56So I think that decision to support multiple platforms.
15:59Is like another really interesting one.
16:02This is like where I think Git and Dropbox can be a little bit different.
16:06And that Git is at the end of the day quite Linux centric.
16:09It's case sensitive for its file system.
16:12It deals with directories and it makes particular assumptions about
16:15how directories should behave.
16:17And that was something with Dropbox.
16:19We wanted to be consumer, we wanted to support everything and we wanted
16:22it to feel very automatic, right?
16:24That like, someone shouldn't have to understand like what a, like unicode,
16:28normalization disagreement means.
16:29Right?
16:30Where in Git like in really bad settings, like you might have to understand
16:34that, that you're right, you with an accent differently on Mac and Windows.
16:38so I think that's the kind of like next, side.
16:40So then Dropbox has its design for a file system and it's a
16:43central, it's like the hub and all those folks are your phone, your.
16:48desktop, your laptop and whatnot.
16:50and then so to kind of get down to the details a bit more.
16:53So then, yeah, we have a process that runs on your computer, that's the
16:56Dropbox app, and that watches all of the files on your file system, and then
17:02it looks at what's happened and then syncs them up to the Dropbox server.
17:07And then whenever changes happen on the Dropbox server, it syncs them down.
17:11there's another kind of interesting decision here on Dropbox by
17:15design was always like a sidecar.
17:17It's always something that just sits and it looks at your files.
17:20Your files are just regular files on the file system.
17:23And if Dropbox, the app isn't running, your files are there and they're safe,
17:28and it's something that you know, regular apps can just read and write
17:32to, and in some sense like Dropbox was unintentionally local-first
17:37from that perspective, right?
17:39Because it's saying that no matter what happens, your data
17:42is just there and you own it.
17:44And you know, there are other systems, right?
17:46Like, if you use NFS, a network file system, then if you unmount it or
17:52if you lose connection to the server,
17:54you might not be able to actually open any files that you have the metadata for.
17:58Right.
17:59And I remember, from a user perspective, the local-first aspect. I really went
18:04through all the stages, where I had a computer that wasn't connected
18:08to the internet yet, and then at some point I had an internet connection.
18:12But files were always where everything happened; everything depended on files.
18:16Like, if I didn't have a file, things wouldn't work.
18:20Everything depended on files.
18:22There were barely any websites where you could do meaningful things.
18:26Certainly web apps weren't very common yet.
18:30And then Dropbox made everything seamlessly work together.
18:35And then when web apps and SaaS software more came along, I was a
18:41bit confused, because I felt, okay,
18:43it gives me some collaboration, but it seems to be a different kind of collaboration,
18:48since I had collaboration before.
18:50But I also understood the limitations of when I'm working on the same doc
18:56file through Dropbox, which gets sort of the first copy, second
19:00copy, third copy, and now I need to somehow manually reconcile that.
19:05And when I saw Google Docs for the first time.
19:09That was really like a revelation because, oh, now we can do this at the same time.
19:14But at the same time, while I saw that, I still remember the feeling
19:19where, but where are my files?
19:20This is my stuff now.
19:22Where, where is it?
19:23And that trust that you've mentioned with Dropbox, I felt like I lost
19:30some control here, and it required a lot of trust in those tools that I
19:35now started, step by step, embracing.
19:37And frankly, I think a lot of those tools didn't deserve my trust in hindsight.
19:41I still feel like we've lost something by no longer being able to call
19:48the foundation our own in a way.
19:50And I'm still hoping that we kind of find the best of both worlds where
19:54we get that seamless collaboration that we now take for granted.
19:58Something like what Figma gives us,
20:00but also the control and just being ready for whatever happens, that's
20:06something Dropbox gave us out of the box.
20:08I just wanna share this sort of anecdote and almost
20:12emotional confusion as I walk through those different stages
20:16of how we work with software.
20:18Totally.
20:19And we've ended up in a place that's not great in a lot of ways.
20:22Right.
20:22And I think, you know, part of the sad thing, maybe
20:27even from an operating systems design perspective, is that I feel like files
20:32have lots of design decisions that are
20:35packaged up together.
20:36You know, like one of the amazing things about files is that
20:39they're self-contained, right?
20:41Like on Google, I don't know what Google's backend looks like for Google
20:44Docs, but they probably have like all of the metadata and pieces of the data
20:49spread across different rows in a database and different things in an object store.
20:53And just even thinking about like the physical implementation of that
20:57data, it's like scattered around probably a bunch of servers, right?
21:00Maybe in different data centers.
21:02And there's something really nice about a file where a file is just
21:05like a piece of state, right?
21:08That is just self-contained.
21:09And I think the thing that is very
21:13unfortunate, from an operating systems perspective, is that that decision
21:18has also been coupled with a very anemic API. With files, they're just
21:24sequences of bytes that can be read and written to and appended, and there's
21:30no additional structure beyond that.
21:32And I think, like,
21:33the way that things have evolved is that, to
21:37have more structure, to make things like Google Docs, to be able to
21:41reconcile and have collaboration and interpret things as more than just bytes,
21:46we've also given up this ability to package things together.
21:49macOS had a very kind of baby step in this direction with, I
21:53think they're called bundles.
21:54Like the things where, if you have your .app, they're
21:57actually a zip file, right?
21:59And there's all types of ways, all types of brain damage for how this
22:03doesn't actually work well.
22:05You know?
22:05But the idea is kind of interesting, right?
22:07It's like what if files had some more structure and what if you still
22:10considered something an atomic unit, but then it had pieces of it that
22:15weren't just uninterpretable bytes?
22:17And I think that's the path-dependent way that we've
22:20ended up where we are today.
22:22That makes sense.
22:23So going back to the sync engine implementation: did the Python process
22:28back in the day mostly index all of the files and then actually
22:34send the actual bytes, probably in some chunks, across the wire?
22:39Or was there some more intelligent diffing happening client-side,
22:45so that you would only send the changes across the wire? And how do I
22:50need to think about what a change is when I'm dealing with a ton of
22:55bytes before and a ton of bytes after?
22:57Yeah.
22:58Those are really, really good questions.
22:59I think maybe the first starting point is that files
23:03in Dropbox were stored just broken up into four-megabyte chunks.
23:07And that was just a decision at the very beginning to pick some size.
23:11And on the server, the way that those chunks were stored is that
23:15each four-megabyte chunk was stored keyed by its SHA-256 hash.
23:20So we would assume that those are globally unique.
23:23So then if you had the same copy of a file, or you had
23:27a file copied many times in your Dropbox, we would only store it once.
23:31And that would just happen organically because we would say
23:34like, okay, I looked at this file, it has three chunks A, B, and C.
23:39And then the client would ask the server, do you have A, B, and C?
23:43Like the server would say, yes, I have B and C already, please send A, then we
23:47would upload A. So at the file level there was already this
23:52kind of very coarse-grained delta sync
23:56at the four-megabyte chunk layer.
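The chunk handshake Sujay describes, split into 4 MiB chunks, key each by its SHA-256 hash, and only upload the chunks the server doesn't already have, can be sketched roughly like this. The class and method names are hypothetical stand-ins, not the real Dropbox protocol:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # the fixed 4 MiB chunk size described above

def chunk_hashes(data: bytes) -> list[tuple[str, bytes]]:
    """Split a file into 4 MiB chunks and key each chunk by its SHA-256 hash."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]

class BlobStore:
    """Stand-in for the server-side content-addressed chunk store."""
    def __init__(self):
        self.blobs: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        # "Do you have A, B, and C?" -> "I have B and C; please send A."
        return {h for h in hashes if h not in self.blobs}

    def put(self, h: str, data: bytes) -> None:
        assert hashlib.sha256(data).hexdigest() == h
        self.blobs[h] = data

def upload(store: BlobStore, data: bytes) -> list[str]:
    hashed = chunk_hashes(data)
    need = store.missing([h for h, _ in hashed])
    for h, c in hashed:
        if h in need:  # only ship chunks the server lacks
            store.put(h, c)
    # The file itself is just its ordered list of chunk hashes.
    return [h for h, _ in hashed]
```

With this shape, uploading a file whose chunks already exist transfers no chunk data at all, which is where the deduplication within and across accounts falls out "organically."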
23:58and then the kind of, it's funny, these things evolve, right?
24:01Like, then the next thing we layered on top was that, in that setting where
24:05you decided B and C were there already and you needed to upload A, then with
24:09A, the desktop client could use rsync to know that there was previously an
24:15A-prime and do a patch between the two and then send just those contents.
24:19The kind of thing that was pretty interesting is that a lot of the content
24:23on Dropbox was very incompressible stuff like video and images, so the
24:29benefit of deduplication, both across users or even within a user,
24:34and the benefit of rsync, was not actually as much as one might think,
24:40at least in terms of bandwidth going through the system.
24:43It didn't reduce that much, because a lot of this content was just kind of unique and
24:48not getting updated in small patches.
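For the rsync-style patch between A-prime and A, here is a toy version of the idea: the receiver publishes per-block weak and strong hashes of its old copy, and the sender emits "copy this old block" instructions where blocks match and literal bytes otherwise. Real rsync uses a true rolling checksum and much larger blocks; this sketch uses a tiny block size and recomputes the weak hash at each position for readability:

```python
import hashlib

BLOCK = 4  # tiny block size to keep the demo readable

def weak(buf: bytes) -> int:
    # Adler-style weak checksum (simplified; no incremental rolling update here)
    a = sum(buf) % 65521
    b = sum((len(buf) - i) * c for i, c in enumerate(buf)) % 65521
    return (b << 16) | a

def signature(old: bytes):
    """Per-block weak+strong hashes of the receiver's old copy (A-prime)."""
    sig = {}
    for off in range(0, len(old), BLOCK):
        blk = old[off:off + BLOCK]
        sig.setdefault(weak(blk), {})[hashlib.md5(blk).digest()] = off
    return sig

def delta(sig, new: bytes):
    """Scan the new file: emit ('copy', offset) for known blocks, else literals."""
    ops, i, lit = [], 0, b""
    while i < len(new):
        blk = new[i:i + BLOCK]
        match = sig.get(weak(blk), {}).get(hashlib.md5(blk).digest())
        if len(blk) == BLOCK and match is not None:
            if lit:
                ops.append(("data", lit)); lit = b""
            ops.append(("copy", match)); i += BLOCK
        else:
            lit += new[i:i + 1]; i += 1
    if lit:
        ops.append(("data", lit))
    return ops

def apply_delta(old: bytes, ops) -> bytes:
    out = b""
    for op, arg in ops:
        out += old[arg:arg + BLOCK] if op == "copy" else arg
    return out
```

The payoff is that only the literal bytes cross the wire; as Sujay notes, for mostly unique, incompressible media content that saving was smaller in practice than one might expect.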
24:51And in your server-side blob store, now that you had those hashes for those
24:56four-megabyte chunks, that also means that you could probably deduplicate some content
25:02across users, which makes me think of all sorts of other implications of that.
25:09When do you know it's safe to let go of a chunk?
25:12Do you also now know that you could kind of go backwards and
25:16say, like, oh, from this hash, we know this is sensitive content?
25:20And that might have some further implications. Whatever, we don't need to go too
25:25much into depth on that now, but, yeah.
25:28I'm curious how you thought about those design decisions and
25:32the possible implications.
25:34Yeah.
25:34Yeah, for the first one, distributed garbage collection
25:38was a very hard problem for us.
25:39We called it vacuuming, and it mattered for making Dropbox economics work out:
25:44we couldn't afford to keep a lot of content that was deleted,
25:48that we couldn't charge users for.
25:50So, you know, there's all this additional complexity where different
25:54users would have like the ability to restore for different periods of time.
25:58So we would say like, anything that's deleted, it doesn't actually
26:01get deleted for 30 days or a year or whatnot based on their plan.
26:05So then, yeah, having to do this big distributed mark-and-sweep
26:09garbage collection algorithm across hundreds of petabytes,
26:14exabytes of content, that was something that we had to get pretty good at.
26:18And when we designed Magic Pocket, where we implemented S3 in-house, we
26:23had specific primitives for making it a little bit easier to avoid race conditions
26:28where, like, if a file was deleted
26:31and we decided that no one needed it anymore,
26:34but then just at that point in time someone uploads it again, making sure
26:38that we don't accidentally delete it.
26:40So that was like, yeah, definitely a very tricky problem.
26:43And I think in retrospect this is an interesting design exercise, right?
26:48In that, if deduplication wasn't actually that valuable for us, we could have
26:52eliminated a lot of complexity in this garbage collection by not doing it, right?
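The vacuuming idea, deletion as a tombstone plus a retention window, followed by a mark phase over every chunk reference and a sweep of the rest, might be sketched like this. All names are hypothetical; the real system ran distributed across many machines, which is exactly where the upload-during-sweep race Sujay mentions comes from:

```python
import time

RETENTION_SECONDS = 30 * 24 * 3600  # e.g. a 30-day restore window

class FileStore:
    def __init__(self):
        self.blobs: dict[str, bytes] = {}      # chunk hash -> chunk bytes
        self.files: dict[str, list[str]] = {}  # path -> chunk hashes
        # path -> (chunk hashes, deletion time); deletion is just a tombstone
        self.deleted: dict[str, tuple[list[str], float]] = {}

    def delete(self, path: str) -> None:
        # Chunks stay around for the restore window; nothing is freed yet.
        self.deleted[path] = (self.files.pop(path), time.time())

    def vacuum(self, now: float) -> set[str]:
        """Mark-and-sweep: drop chunks unreachable from any live file
        or from any deletion newer than the retention window."""
        # Mark phase: collect every still-referenced chunk hash.
        live: set[str] = set()
        for hashes in self.files.values():
            live.update(hashes)
        for hashes, when in self.deleted.values():
            if now - when < RETENTION_SECONDS:
                live.update(hashes)
        # Sweep phase: remove everything else.
        dead = set(self.blobs) - live
        for h in dead:
            del self.blobs[h]
        return dead
```

In a single process this is trivially safe; in a distributed store, mark and sweep don't happen atomically, so a concurrent upload that re-references a "dead" chunk mid-sweep has to be guarded against, the race condition Magic Pocket's primitives were built to avoid.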
26:58I think for the second thing, yeah.
26:59So at the beginning when Dropbox started, if you had a file with A, B and C and you
27:06uploaded it, it would just check, does A, B and C exist anywhere in Dropbox?
27:11And that got changed over time to: do you, as the user,
27:17have access to A, B, and C?
27:19And you know, 'cause otherwise you could use this for all types of purposes, right?
27:24To see if there exists some content anywhere in Dropbox.
27:27And, that was something where we would in the case where the user was
27:32uploading A, B, and C, say none of them were present in their account, we would
27:38actually force them to upload it, incur the bandwidth for doing so, and then
27:42discard it if B and C existed elsewhere.
27:46Yeah.
27:46Very interesting.
27:47I mean, this would be an interesting rabbit hole just to go down just the
27:50kind of second order effects of that design decision, particularly at
27:54the scale and importance of Dropbox.
27:57But maybe we save that for another time.
27:59So going back to the sync engine, now that we have a better understanding of, how it
28:04worked in that shape and form back then.
28:07You already mentioned before that, as usage went through
28:12the roof, all sorts of different usage scenarios also expanded.
28:17You had all sorts of more esoteric usage patterns that you hadn't
28:22even thought of before.
28:25Now all of that came to light.
28:28I'm curious which sort of helper systems you put in place so that you could
28:33even have a grasp of what's going on, since a part of the trust that Dropbox
28:39earned over time was probably also related to privacy.
28:44So you couldn't just read everything that's going on in someone's
28:49system, so you're probably also relying to some degree on the help of a user,
28:55that they send something over.
28:57Yeah.
28:57Walk me through the evolution of that, and that, like, as
29:02an engineer, if there's a bug, reproducing that bug is everything.
29:07So walk me through that process.
29:09Yeah, and you know, we had a very strict rule where, just,
29:13we do not look at content, right?
29:15And so when debugging issues, the saving grace is
29:20that most of the issues we saw
29:22were more metadata issues, around sync not converging, or
29:28the client thinking it's in sync with the server but them disagreeing.
29:32So we had a few pretty interesting
29:35supporting algorithms for this.
29:37One of them was just simple hang detection: like,
29:41when should a client reasonably expect that they are in sync?
29:45And if they're online and if they've downloaded all the recent
29:49versions and things are getting stuck, why are they getting stuck?
29:53So are they getting stuck because they can't read stuff from the
29:55server, either metadata or data?
29:57Are they getting stuck because they can't write to the file system and
30:00there's some permission errors?
30:02So I think having very fine-grained classification of that and having the
30:06client do that in a way that's like not including any private information and
30:11sending that up for reports and then aggregating that over all of the clients
30:14and being able to classify was a big part of us being able to get a handle on it.
30:20And I think this is just generally very useful for these sync engines.
30:23The biggest return on investment we got was from consistency checkers.
30:27So part of sync is that there's the same data duplicated in many places, right?
30:33Like, so we had the data that's on the user's local file system.
30:37We had all of the metadata that we stored in SQLite or we would store like what
30:41we think should be on the file system.
30:43We would store what the latest view from the server was.
30:46We would store things that were in progress, and then we have
30:49what's stored on the server.
30:50And for each one of those like hops, we would have a consistency checker that
30:55would go and see if those two matched.
30:57That was, like, the highest return on investment we got.
31:02Because before we had that, people would write in and they would
31:05complain that Dropbox wasn't working.
31:07And until we had these consistency checkers, we had no idea the
31:10order of magnitude of how many issues were happening.
31:13And when we started doing it, we're like, wow.
31:16There's actually a lot.
31:18So a consistency check in this regard was mostly like a hash over some
31:22packets that you're sending around.
31:24And with that you could verify, okay, from A to B to C to D, we're
31:30all seeing the same hash, but suddenly on the hop from D to E, the hash changes.
31:35Ah-huh.
31:36Let's investigate.
31:37Exactly.
31:38And so, to do that in a way that's respectful of the users,
31:42even of, like, the resources on their system:
31:45we wouldn't just go and blast their CPU and their disk and their network to go
31:50and, like, churn through a bunch of things.
31:51So we would have, like, a sampling process where we sample a random
31:54path in the tree on the client and do the same on the server.
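To illustrate the idea, here is a hypothetical TypeScript sketch (not Dropbox's actual code; all names are invented): sample a random path, hash only its metadata on both sides of a hop, and compare, never touching file contents.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of sampled consistency checking. Only metadata is
// hashed, never file contents, mirroring the strict "we do not look at
// content" rule described above.
type Meta = { path: string; size: number; contentHash: string };

function metaDigest(meta: Meta): string {
  return createHash("sha256")
    .update(`${meta.path}|${meta.size}|${meta.contentHash}`)
    .digest("hex");
}

// Pick one random path to check, so we never churn through the whole tree.
function samplePath(paths: string[], rng: () => number): string {
  return paths[Math.floor(rng() * paths.length)];
}

// Compare one hop (e.g. client's local DB view vs. server's view) for a path.
function checkHop(
  clientView: Map<string, Meta>,
  serverView: Map<string, Meta>,
  path: string
): boolean {
  const a = clientView.get(path);
  const b = serverView.get(path);
  if (!a || !b) return a === b; // consistent only if missing on both sides
  return metaDigest(a) === metaDigest(b);
}
```

Sampling keeps the check cheap on the client, at the cost of only catching divergence probabilistically per run.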
31:58We would have stuff with, like, Merkle trees, and then when things would diverge,
32:02we would try to see, is there a way we can compare on the client?
32:07And, for example, one of the really important goals for us as an operational
32:12team was to have, like, the power of zero.
32:14I think that idea might be from AWS or something.
32:17My co-founder James has a really good talk on it.
32:19But we would want to have a metric saying that the number of unexplained
32:25inconsistencies is exactly zero.
32:28'Cause then the nice thing, right, is that if it's at zero and it regresses,
32:31you know that it's a regression.
32:33If it's fluctuating at, like, 15 or, like, a hundred thousand and it kind
32:38of goes up by 5%, it's very hard to know, when evaluating a new release,
32:42whether that's actually safe or not.
32:44so then that would mean that whenever we would have an inconsistency due to a bit
32:49flip, which we would see all the time on client devices, then we would have to
32:55categorize that and then bucket that out.
32:57So we would have a baseline
32:59expectation of how many bit flips there are across all of the devices on Dropbox.
33:03And we would see that that's staying consistent or increasing or
33:06decreasing, and that the number of unexplained things was still at zero.
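As a rough sketch of that "power of zero" bookkeeping (illustrative TypeScript; the bit-flip rule is deliberately simplified: a single-bit difference between two recorded values counts as a flip, everything else stays unexplained):

```typescript
// Illustrative sketch of a "power of zero" metric: every inconsistency is
// either bucketed under a known explanation or counted as unexplained.
// The target is unexplained === 0, so any regression is unmistakable.
type Inconsistency = { clientValue: string; serverValue: string }; // hex bytes

// Hamming distance in bits between two equal-length hex strings.
function bitDiff(a: string, b: string): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a[i], 16) ^ parseInt(b[i], 16);
    while (x) {
      d += x & 1;
      x >>= 1;
    }
  }
  return d;
}

function classify(issues: Inconsistency[]): { bitFlips: number; unexplained: number } {
  let bitFlips = 0;
  let unexplained = 0;
  for (const i of issues) {
    const sameLength = i.clientValue.length === i.serverValue.length;
    if (sameLength && bitDiff(i.clientValue, i.serverValue) === 1) {
      bitFlips++; // explained bucket: exactly one flipped bit
    } else {
      unexplained++; // must stay at zero between releases
    }
  }
  return { bitFlips, unexplained };
}
```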
33:10Now let's take that detour, since you got me curious:
33:13what would cause bit flips on a local device?
33:16I think a few causes. One of them is just that in the data center, most
33:20memory uses error correction, and you have to pay more for it, usually have to pay
33:24more for a motherboard that supports it,
33:26at least back then.
33:27Now, on client devices, we don't have that.
33:30So this is a little bit above my pay grade on the hardware side: cosmic
33:34rays or thermal noise or whatever.
33:36But memory is much more resilient in the data center.
33:40I think another is just that storage devices vary greatly in quality.
33:44Like your SSDs and your hard drives are much higher quality inside the data
33:49center than they are on local devices.
33:51And so,
33:53you know, there's that.
33:54It also could be, like I had mentioned, that people have all
33:57types of weird configurations.
33:59Like on Mac there are all these kernel extensions; on Windows, there's
34:03all of these minifilter drivers.
34:05There are all these things that are interposing between
34:07Dropbox, the user space process and writing to the file system.
34:11And if those have any memory safety issues where they're corrupting memory
34:15because they're written in archaic C or something,
34:19that's a way things can get corrupted.
34:20I mean, we've seen all types of things.
34:22We've seen network routers corrupting data, but usually
34:26that fails some checksum, right?
34:28Or we've even seen registers on CPUs being bad, where the memory gets replaced
34:33and the memory seems like it's fine, but then it turns out the CPU has its
34:38own registers on chip that are busted.
34:40And so all of that stuff I think just can happen at scale.
34:44Right.
34:45that makes sense.
34:45And I'm happy to say that I haven't yet had to worry about bit flips, whether
34:51for storage or other things, but huge respect to whoever has already
34:56had to tame those parts of the system.
34:59So, you mentioned the consistency check as probably the biggest lever that you
35:05had to understand what health state your sync engine is in in the first place.
35:11Was this the only kind of metric and proxy for understanding how well
35:18the sync system is working, or were there some other aspects that gave
35:22you visibility, both macro and micro?
35:26Yeah, I mean, there were the hangs: knowing
35:30that something gets to a synced state, and knowing the duration, right?
35:33The performance of that was one of our top-line metrics.
35:38And the other one was this consistency check.
35:40And then for specific operations, right?
35:43Like uploading a file: how much bandwidth are people able to use?
35:47Because people wanted to use Dropbox to
35:53upload huge amounts of data, like a huge number of files where each file is really large.
35:57And then they might do it in Australia or Japan, where they're
36:01far away from a data center.
36:03So latency is high, but bandwidth is very high too, right?
36:06So making sure that we could fully saturate their pipes, and all
36:09types of stuff with debugging
36:12things in the internet, right?
36:13People having really bad routes to AWS and all that.
36:16so we would track things like that.
36:18I think other than that it was mostly just the usual quality stuff,
36:20like just exceptions and making sure that features all work.
36:25I think when we rewrote this system, we designed it to be very correct.
36:30We moved a lot of these things into testing before we would release.
36:35So, to jump ahead a little bit: we
36:38decided to rewrite Dropbox's sync engine from this big Python code base into Rust.
36:45And one of the specific design decisions was to make things extremely testable.
36:49So we would have everything be deterministic on a single thread,
36:53and have all of the reads and writes to the network and file system
36:56go through a virtualized API.
36:59So then we could run all of these simulations of exploring what would
37:03happen if you uploaded a file here and deleted it concurrently and then had a
37:08network issue that forced you to retry.
37:10And so by simulating all of those in CI, we would be able to have very
37:14strong invariants: knowing that a file should never
37:18get deleted in this case, or that it should always converge, or, things
37:21like sharing, that this file should never get exposed to this other viewer.
37:26I think having stronger guarantees was something
37:31that we could only really do effectively once we designed the system to make
37:36it easy to test those guarantees.
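A minimal sketch of that shape in TypeScript (illustrative names, not Dropbox's or Convex's actual abstractions): the core only reaches time, randomness, and I/O through an injected environment, and a seeded in-memory environment makes every test run replayable.

```typescript
// Sketch of deterministic simulation testing: the sync core only touches
// time, randomness, and I/O through this interface. Production wires in the
// real APIs; tests wire in a seeded, in-memory simulation.
interface Environment {
  now(): number;
  random(): number;
  readFile(path: string): string | undefined;
  writeFile(path: string, data: string): void;
}

// mulberry32: a tiny seeded PRNG, so the same seed replays the same run.
function seededRng(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

class SimEnv implements Environment {
  private clock = 0;
  private fs = new Map<string, string>();
  private rng: () => number;
  constructor(seed: number) {
    this.rng = seededRng(seed);
  }
  now() { return this.clock++; } // virtual time: advances only when observed
  random() { return this.rng(); }
  readFile(p: string) { return this.fs.get(p); }
  writeFile(p: string, d: string) { this.fs.set(p, d); }
}

// A toy "sync step" that retries a simulated flaky upload via the environment.
function syncStep(env: Environment): string {
  for (let attempt = 0; ; attempt++) {
    const transientFailure = env.random() < 0.5 && attempt < 3;
    if (!transientFailure) {
      env.writeFile("/synced", `at:${env.now()}`);
      return env.readFile("/synced")!;
    }
  }
}
```

Because nothing in `syncStep` touches a real clock, file, or RNG, a failing run can be replayed exactly from its seed.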
37:38Right.
37:39That makes a lot of sense.
37:40And I think we're seeing more and more systems, also in the
37:43database world, embrace this.
37:45I think TigerBeetle is quite popular for that.
37:49I think the folks at Turso are now also embracing this approach.
37:54I think it goes under the umbrella of simulation testing.
37:57That sounds very interesting.
37:58Can you explain a little bit more how this would work, maybe in a much smaller program?
38:03Is it basically that every assumption and any potential branch,
38:08any sort of side effect that might impact the execution of my program,
38:13now needs to be made explicit? It's almost like a parameter that I put into
38:19the arguments of my functions, and now I call it under these circumstances and
38:25can therefore simulate: oh, if that file suddenly gives me an unexpected error,
38:31then this is how we're gonna handle it.
38:33Yeah, exactly.
38:34So there's techniques, like what the TigerBeetle folks do, and
38:38we do this at Convex in Rust: with the right abstractions, there's
38:42techniques to make it not so awkward.
38:45But yeah, it is this idea of, can you pin all of the non-determinism in
38:50the system, whether it's reading from a random number generator, whether
38:54it's looking at time, whether it's reading and writing to files or the network?
38:58Can that all be pulled out, so that in production it's just using
39:04the regular APIs for it?
39:07so there's like for any of these sync engines, there's a core
39:10of the system which represents all the sync rules, right?
39:13Like when I get a new file from the server, what do I do?
39:16You know, if there's a concurrent edit to this, what do I do?
39:19And that core of the code is often the part that has the most bugs, right?
39:23It doesn't think about some of the corner cases, or whether
39:27there are errors or retries needed, or it doesn't handle concurrency;
39:30it might have race conditions.
39:32So I think the core idea for deterministic
39:36simulation testing is to take that core and pull out all of the
39:43non-determinism from it into an interface.
39:45So time randomness, reading and writing to the network, reading
39:49and writing to the file system, and making it so that in production,
39:52those are just using the regular APIs.
39:55But in a testing situation, those can be using mocks.
39:59Like they could be using things that, for a particular test
40:02that wants to test a scenario, set it up in a specific way.
40:06Or it could be randomized, right?
40:09Where, for reading, like, time, the test framework might
40:14decide pseudorandomly to advance it or to keep it at the current time, or
40:18might serialize things differently.
40:21And that type of ability to have random search explore the state space of
40:27all the things that are possible is just one of those like unreasonably
40:30effective ideas, I think for testing.
40:33And then, getting a system to pass that type of
40:37deterministic simulation testing:
40:39it's not at the threshold of having formal verification, but in our
40:42experience it's pretty close, and with a much, much smaller amount of work.
40:48And you mentioned Haskell at the beginning?
40:50I still remember, after a lot of time spent writing unit tests in
40:55JavaScript (back then, in that order: I first had JavaScript and then I
41:00learned Haskell), I found QuickTest, or QuickCheck.
41:05Which one was it?
41:06I think it was QuickCheck, right?
41:07Well, right.
41:08So I found QuickCheck, and I could express, sort of: hey, this is this type,
41:13it has those aspects to it, those invariants, and then it would just
41:18go along and test all of those things.
41:20Like, wait, I never thought of that, but of course, yes.
41:23And then you combine those, and you would be way too lazy to write unit
41:27tests for the combinatorial explosion of all of your different things.
41:32And then you can say, sample it like that, and, like, focus on this.
41:36And so I actually also started embracing this practice a lot more in the
41:40TypeScript work that I'm doing, through a great project called Prop Check.
41:45And that is picking up the same ideas, particularly for those
41:52sorts of scenarios where, okay, Murphy's Law will come and haunt you,
41:56which in distributed systems
41:58is typically the case.
42:00Building things in such a way where all the aspects can be specifically
42:05injected, and, the sweet spot,
42:07if you can do so in a still ergonomic way, I think that's the way to go.
42:13It's so, so valuable, right?
42:15And yeah.
42:15And yeah, the ability of prop tests, of QuickCheck, of all of these to
42:20also minimize is just magical, right?
42:23Like, it comes up with this crazy counterexample, and it might be,
42:27like, a list with 700 elements, but then it's able to shrink it down to
42:31the, like, real core of the bug.
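That shrinking loop can be hand-rolled in a few lines (an illustrative TypeScript sketch in the spirit of QuickCheck, not any library's real implementation): once a random input fails the property, greedily try smaller variants until none still fail.

```typescript
// Illustrative sketch of counterexample shrinking: once an input fails the
// property, keep removing elements while the result still fails.
function shrinkCandidates(xs: number[]): number[][] {
  // Try removing each single element.
  const out: number[][] = [];
  for (let i = 0; i < xs.length; i++) {
    out.push([...xs.slice(0, i), ...xs.slice(i + 1)]);
  }
  return out;
}

function shrink(failing: number[], property: (xs: number[]) => boolean): number[] {
  let current = failing;
  let shrunk = true;
  while (shrunk) {
    shrunk = false;
    for (const candidate of shrinkCandidates(current)) {
      if (!property(candidate)) {
        current = candidate; // still failing: keep shrinking from here
        shrunk = true;
        break;
      }
    }
  }
  return current;
}

// A buggy sum that silently drops negative numbers, and the property it breaks.
const buggySum = (xs: number[]) => xs.filter((x) => x >= 0).reduce((a, b) => a + b, 0);
const trueSum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const sumsAgree = (xs: number[]) => buggySum(xs) === trueSum(xs);
```

Starting from a messy failing input like `[3, -1, 7, 2, -5]`, the loop strips everything irrelevant and leaves a single negative number, the real core of the bug.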
42:33It's magic, right?
42:35And you know, I mean, I think this is something,
42:38a totally different theme, right?
42:40Like one thing at Convex we're exploring a lot is like coding has changed a lot
42:44in the past year with AI coding tools.
42:46And one of the things we've observed for getting coding tools to work very
42:50well with Convex is that these types of like very succinct tests that can
42:54be generated easily and have like a really high strength to weight or power
42:59to weight ratio are just really good for like autonomous coding, right?
43:03Like, if you are gonna take like cursor agent and let it go wild,
43:06like what does it take to just let it operate without you doing anything?
43:10It takes something like a prop test because then it can just continuously
43:13make changes, run the test, and not know that it's done until that test passes.
43:18Yeah, that makes a lot of sense.
43:20So let's go back for a moment to the point where you were just transitioning
43:25from the previous Python based sync engine to the Rust based sync engine.
43:32So you're embracing simulation testing to have a better sense of
43:36like all the different aspects that might influence the outcome here.
43:41Walk me through how you went about
43:44deploying that new system.
43:46Were there any sort of big headaches associated with migrating from the
43:52previous system to the new system?
43:54Since for everything you had sort of a de facto source
43:57of truth, which is the files.
43:59So could you maybe just forget everything the old system had done and just
44:04treat it as if the user had just installed it fresh? Walk me
44:09through how you thought about that, since migrating systems on such
44:14a big scale is typically quite dread...
44:17Yeah, dreadsome is, yeah,
44:19an appropriate word.
44:20I think one of the biggest challenges was that by design we had a very different
44:26data model for the old sync engine.
44:29We called it Sync Engine Classic,
44:31affectionately.
44:32And then Nucleus was the new one.
44:34Nucleus had a very different data model, and the motivation for that was that
44:40Sync Engine Classic just had a ton of possible states that were illegitimate.
44:46If you had, like, the server update a file and the client update
44:50a file, but then a shared folder gets mounted above it, things could get
44:54into all of these really weird states that were legal but would cause bugs.
45:00And then I think that was like one of the big guiding principles more
45:04than even just like Rust or Python, was just like designing what states
45:09should the system be allowed to be in and design away everything else,
45:14make illegal states unrepresentable.
45:17And so what that then meant is, once we had that,
45:21when we needed to migrate, we had a long tail of really weird starting positions.
45:27So you basically realized: okay, this system is in this state. A, how the
45:33heck did it ever get into that state?
45:35And B, what are we gonna do about it now? Where basically,
45:40like for a mapping function, this is invalid input.
45:44So can you explain a little bit how you constrained and
45:49designed the space of legitimate, valid states? And,
45:56if you think about this as a big matrix of combinations, what are some
46:00of the more intuitive ones that were not allowed but that you saw quite a bit?
46:06Yeah, so I think part of the difficulty for Dropbox, syncing things
46:13from the file system, is that file system APIs are really anemic.
46:17File system APIs don't have transactions,
46:19and things can get reordered in all types of ways.
46:23So we would just read and write to files from the local file system, and
46:26we would use file system events on Mac, we would use the equivalent on
46:30Windows and Linux to get, updates.
46:32But everything can be reordered and racy and everything.
46:36So one common invariant would be that if you have a
46:40directory, you know, files have to exist within directories:
46:44if a file exists, then its parent directory exists.
46:48And simultaneously, if you delete a directory, it shouldn't
46:51have any files within it.
46:53And that invariant guarantees that the file system is a tree,
46:57right?
46:58And then it's very easy to come up with settings, with reads from the
47:03local file system where if you just naively take that and write it into
47:07your SQLite database, you will end up with data that does not form a tree.
47:12And then especially even with, like, inodes being unique, right?
47:16Like if I move a file from A to B, then I might observe the add for it at B
47:23way before the delete at A, or I might observe it vice versa, where the file
47:28is transiently gone and disappeared, and we definitely don't wanna sync that.
47:31And then with directories: if I have A as a directory and B as
47:37a directory, and then I move them, I could observe a state where A moves into
47:43B, which then, without doing the right bookkeeping, might introduce a cycle in
47:48the graph, and a cycle for directories would be really bad news, right?
47:52So all of these invariants were things that the file system APIs don't
47:57respect, even though the file system internally has these invariants, right?
48:01You cannot create a directory cycle on any file system.
48:05Definitely.
48:05I mean, certainly not without root. And all of these invariants exist but
48:09are not observable through the APIs.
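These tree invariants can be checked mechanically before an observed event batch is written to the local database. A hypothetical TypeScript sketch (illustrative, not Dropbox's data model):

```typescript
// Sketch of validating the tree invariants the file system APIs don't expose:
// every entry's parent must exist (except the root), and following parent
// links must never loop. Events that would violate this are held back rather
// than committed to the local database.
type Entry = { id: string; parent: string | null };

function isValidTree(entries: Map<string, Entry>): boolean {
  for (const e of entries.values()) {
    if (e.parent !== null && !entries.has(e.parent)) return false; // orphan
    // Walk toward the root; revisiting a node means a directory cycle.
    const seen = new Set<string>();
    let cur: Entry | undefined = e;
    while (cur && cur.parent !== null) {
      if (seen.has(cur.id)) return false; // cycle detected
      seen.add(cur.id);
      cur = entries.get(cur.parent);
    }
  }
  return true;
}
```

Naively applying reordered move events (A into B while B moved into A) produces exactly the kind of cycle this check rejects.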
48:12And so then Sync Engine Classic would get into a state where its
48:16local SQLite file would have all types of violations like that.
48:20So then, how do we read the tea leaves when the database is in
48:24a really weird state that we can't lose?
48:26And to go back to, I think what you had talked about at the beginning of this was
48:30that we always had the nuclear option of dropping all of our local state and doing
48:36a full resync from the files themselves.
48:39But then the problem is that we would entirely lose user intent.
48:42So if, for example, I was offline for a month and I had a bunch of files,
48:48and then during that month other people in my team deleted those files.
48:53If I came back online and didn't have my local database, we would have to
48:58recreate those files and people would complain about this all the time because.
49:03They would delete something and wanna delete it, and then Dropbox would
49:05just randomly decide to resurrect it.
49:07So those types of decisions we, we tried to avoid that as much as possible, but
49:12then that meant having to look at a potentially really confusing database and
49:17read what the user intent might have been.
49:19Right.
49:20I wanna dig a little bit more into the topic of user intent.
49:24Since with Dropbox you've built a sync engine very specifically for the use
49:30case of file management, et cetera, where user intent has a particular meaning that
49:36might be very different from moving a cursor around in a Google Docs document.
49:41So can you explain a little bit: what are some of the common, and
49:47maybe subtle, scenarios of user intent when it comes to the Dropbox design space?
49:55Yeah, totally.
49:56And I think for regular things, like, say, editing files,
50:01I think we saw that people just generally did not, maybe because
50:06of the way the system was, even its capabilities, people did not
50:09edit the same files all too often.
50:12So maintaining user intent when everyone is online just meant kind of
50:17taking last writer wins. Where I think user intent became very interesting is
50:21if someone went offline, like they're on an airplane, before wifi on airplanes,
50:27and they worked on their document and someone else worked on it at the same time.
50:31In that case, we observed that users always wanted to see the conflicted
50:35copy, and that they wanted to get the opportunity to say, like:
50:39I put in a lot of effort into working on this when I was on the plane.
50:43Someone else put in probably a similar amount of effort when they were online.
50:48And so, you know, last-writer-wins policies
50:50there violated user expectations quite a lot, because one person
50:55had to win, and then the person who lost would be really upset.
50:58so I think those were pretty interesting.
51:00I think with moves, like with more metadata operations, I think people
51:05were a little bit more permissive.
51:06Like if I moved something from one folder to another, and another person
51:10moved it to a different folder,
51:12having it just converge on something, as long as it converges,
51:15we observed that people didn't worry about it too much.
51:18I think the place where user intent is really interesting
51:21with moves is with sharing.
51:23So I think thinking about this from like the distributed systems
51:26perspective on causality, there would be like someone might have like,
51:31I dunno, their HR folder, right?
51:35And, I don't know, let's say that someone is transferring to the HR team, so
51:38they're getting added to the HR folder.
51:41But then say before they were on the team, they were on a
51:44performance improvement plan.
51:46So then the administrator for HR would delete that file, make sure it's
51:50deleted, and then add them to the folder.
51:54And so their user intent is expressed in a very specific
51:59sequencing of operations, right?
52:01That like this causally depended on this.
52:04I would not have invited 'em to the folder unless the delete was stably synced.
52:08And that making sure that gets preserved throughout the system,
52:12even when people are going online and offline and everything is a very
52:16hard distributed systems problem.
52:18Right.
52:18and it was intimately related with the details of the product.
52:22Right.
52:23yeah.
52:23How did you capture that causality chain of events since you probably also
52:29couldn't quite trust the system clock?
52:32How did you go about that?
52:34Yeah, this became even more difficult, right?
52:36Where file system metadata was partitioned across many shards in the database.
52:41So then we ended up using something like Lamport timestamps, where every single
52:45operation would get assigned a timestamp.
52:47And those timestamps were usually only for reading and writing to their
52:50particular shard, for whatever timestamp the client had observed.
52:55But then in these cases where there were potentially cross-shard,
52:59not transactions, but, like, causal dependencies, we would be able to say:
53:03the operation to mount this, or to add someone to the shared folder
53:07and then them mounting it within their file system, has to have a higher
53:11timestamp than any write within that folder,
53:15writes including deletes.
53:16so then that way when the client is syncing it would be able to know that when
53:21I am merging operation logs across all of the different shards, I need to assemble
53:26them in a causally consistent order.
53:29And that would then respect all of these particular invariants.
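A tiny sketch of that merge in TypeScript (the shard names, timestamps, and record shape are invented for the example): because a causally later operation is always assigned a larger timestamp, sorting the merged log by timestamp replays causes before effects.

```typescript
// Illustrative sketch of merging per-shard operation logs with Lamport-style
// timestamps: an operation gets a timestamp greater than anything it causally
// depends on, so a global sort by (timestamp, shard) is causally consistent.
type Op = { shard: string; ts: number; desc: string };

// Assign the next timestamp after everything this client has observed.
function nextTimestamp(observedMax: number): number {
  return observedMax + 1;
}

function mergeLogs(logs: Op[][]): Op[] {
  return logs
    .flat()
    .sort((a, b) => (a.ts - b.ts) || a.shard.localeCompare(b.shard));
}
```

In the HR example above, the mount would be timestamped after the delete it depends on, so no client can ever replay the mount first.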
53:33Right.
53:34So, you thought through those different scenarios for Dropbox and
53:38made very intentional design decisions: that, for example, in one scenario
53:43last writer wins is not desirable,
53:46since that might lead to a very sad person stepping off the plane because
53:51all of their data is suddenly gone, or the other person's data is gone.
53:55So you make very specific design trade-offs here when it
53:58comes to somehow squaring the circle of distributed systems.
54:03Which sort of advice would you have for application developers or people even
54:08who are sitting inside of a company and are now thinking about, oh, maybe
54:12we should have our own Dropbox style, linear style sync engine internally.
54:17What sort of advice would you give them when they, yeah,
54:21start thinking this through in detail?
54:23Yeah, I'll talk through kind of how we structured things at Dropbox to be able
54:28to navigate these types of problems.
54:30And I think the patterns here, can be quite general.
54:33I think where we ended up was this: thinking, like, distributed
54:37systems syncing is hard, right?
54:40So we would have the kind of base layer of the sync protocol and how state
54:45gets moved around between the clients and the servers and all the shards.
54:49We would have very strong consistency guarantees there.
54:52So we would not use any of the knowledge of the product at that layer.
54:57So, thinking of Dropbox and the file system as a CRDT:
55:03Dropbox allows, like, moves to happen concurrently.
55:06It allows you to add something while another thing is happening.
55:10But at the protocol level, we kept things very strict.
55:12We kept them very close to being serializable: every view of the
55:17system was identified by a very small amount of state, like a timestamp.
55:21And that would fully determine the state of the system, and the
55:24amount of entropy in that was very low.
55:26And then whenever you are modifying it, you would say, here's what I expect
55:30the data to be, and if it doesn't match exactly, it will reject the operation.
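That strict write path can be sketched as an optimistic compare-and-swap (illustrative TypeScript, not Dropbox's protocol): the caller names the exact version it saw, and any mismatch rejects the write.

```typescript
// Sketch of the strict base-layer write: the client states the exact version
// it expects; the server applies the write only if that expectation holds.
// Looser product policies (like last writer wins) are layered on top of this
// primitive, not baked into it.
type Versioned = { version: number; value: string };

class StrictStore {
  private state: Versioned = { version: 0, value: "" };

  read(): Versioned {
    return { ...this.state };
  }

  // Returns true and applies the write only if the caller's view is current.
  write(expectedVersion: number, value: string): boolean {
    if (expectedVersion !== this.state.version) return false; // stale view
    this.state = { version: this.state.version + 1, value };
    return true;
  }
}
```

A rejected write forces the client to re-read, reconcile however the product layer wants, and retry, which is what keeps all conflict policy out of the protocol itself.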
55:34And then by structuring things in that way, we made it very easy
55:39for product teams, and even for us working on sync, to embed all of these
55:45looser, more product-focused requirements,
55:47which also may wanna change over time, into the endpoints layered on top.
55:51So every time we wanted to change a policy on how, like, a delete reconciles with an,
55:57you know, add for a folder or something,
55:59we didn't have to solve any distributed systems problems to do that.
56:03So I think that like pattern of saying that, like is there a good abstraction?
56:07Is there something that is like very powerful that could solve a large
56:11class of problems, doing that well at the lowest layer and then potentially
56:16weakening the consistency above it.
56:19I actually really like how the Rocicorp folks have a really great description of
56:24their consistency model for Replicache, as being, like, session-plus consistency.
56:29And it's a very similar idea where, when we build things on
56:34a platform, we may, with our product hats on, want users to
56:38not have to think about conflicts and merging and all that in a lot of cases.
56:42But those decisions might be very particular to our app.
56:45And that's something that holds for everything on the platform.
56:48And then there's always a way to embed those decisions onto, say,
56:52session consistency in Replicache, or serializability in other systems.
56:57And so I think that's like that separation of concerns I
57:00think is something that can apply to a lot of systems.
57:04Right.
57:04So maybe we use this also as a transition to talk a bit more about what you're
57:09now designing and working on Convex.
57:12What were some of the key insights that you've taken with you from Dropbox that
57:19ultimately led to you co-founding Convex?
57:22Yeah, when we first were starting Convex we were looking at how apps
57:27are getting built today, right?
57:28Like web apps are easier to build than ever.
58:32Even in 2021, it's incredible how much more productive
58:37it was compared to 10 years before.
58:39Right.
58:40And I think we noticed that the hard part of so many discussions
58:45was managing state and how state propagates. I think it was from
58:50the Riffle paper, right, that so many issues in app development
58:54are kind of database problems in disguise, and that techniques
58:58from databases might be able to help.
58:00So with Convex we were saying like, well if we start with the idea of designing
58:05a database from first principles, can we apply some of those database solutions
58:10to things across the whole stack?
58:12So say, for example, when I'm reading data within my app, I have
58:17all of these React components that are all reading different pieces of data.
58:21It'd be really nice if all of them just executed at the same timestamp,
58:24and I never had to handle consistency issues where one component knows
58:29about a user and the other one doesn't.
58:31Similarly, why isn't it possible that I just use queries across
58:36all my components and they all just live update? Whenever I read anything,
58:40it's automatically reactive.
58:42So those were some of the initial thought
58:46experiments that led to Convex.
58:48I think the other one was really motivated by our time at
58:52Dropbox, and I think it's kind of both a blessing and a curse.
58:59that Convex is very opinionated about there being a separation
59:03between the client and the server.
59:05So we saw this at Dropbox where they were just different teams, right?
59:09And you know, as we've seen with like even the origin of GraphQL, right?
59:13Like, that ability to decouple development between
59:16teams working on user-facing features and the way that the data fetching
59:20is implemented on the backend is really powerful.
59:23And so the kind of thought experiment with Convex is: can we
59:27maintain a very strong separation while still getting live updating, while
59:32still getting really good ergonomics for both consuming data on the client
59:36and fetching it on the server?
59:39Right.
59:39So yeah, walk me through the evolution of Convex a little bit more,
59:44in terms of all the other options that are out there
59:49for state management. I think what most applications are using is probably
59:55something that, at least to some degree, is somewhat customized and hand-rolled and
1:00:01comes with its own huge set of trade-offs.
1:00:05Help me better understand, since you mentioned it, the
1:00:08opinionated nature of Convex.
1:00:11What are the, benefits of that?
1:00:13What are the downsides of that and other implications?
1:00:16Yeah, so when you write an app on Convex, we can use maybe
1:00:20like a basic to-do app, right?
1:00:22The Linear clone everyone does.
1:00:24You write endpoints like you might be used to, right?
1:00:26Where it's like, list all the to-dos in a project, update a to-do in a project.
1:00:31And those get pushed as your API to your Convex server.
1:00:35The implementations of that API can then read and write to the database,
1:00:39and Convex has kind of a Mongo- or Firebase-like API for doing so.
1:00:44I think the main benefit then of Convex relative to more traditional
1:00:48architectures is that if you're on the client, the only thing you need to do
1:00:53is call the useQuery hook.
1:00:56You're saying, I am looking at a project, I just do useQuery on
1:01:01list tasks in project. That will then talk to the server, run that query, but
1:01:07then also set up the subscription, and then whenever any data that that query
1:01:12looked at changes, it will efficiently determine that and then push the update.
1:01:16So part of what has been nice with Convex is that you are getting
1:01:21a client that has a WebSocket protocol; it has a sync engine built in.
1:01:26You're getting infrastructure for running JavaScript at scale and for
1:01:30handling sandboxing and all of that.
1:01:32And then you're also getting a database, which, you know,
1:01:36one, supports transactions for reading and writing to it.
1:01:39But then it also supports this efficient subscription
1:01:43model: I ran this query, this query just ran a bunch of JavaScript,
1:01:47it looked at different rows and it ran some queries,
1:01:51and the system will automatically and efficiently determine if any write overlaps with that.
1:01:56So the combination of all of those things is part of the benefit of
1:01:59Convex: you just write TypeScript, and you write it in a way that feels
1:02:03very natural, and everything just works.
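The subscription mechanism described here — the server remembers what each query read, and a write only re-runs queries whose read set it overlaps — could be sketched roughly like this. This is a simplified, hypothetical illustration in plain TypeScript, not Convex's actual implementation; all names are made up.

```typescript
type Doc = Record<string, unknown>;

class ReactiveStore {
  private docs = new Map<string, Doc>();
  private subscriptions: { readSet: Set<string>; rerun: () => void }[] = [];

  // Run a query function while recording every key it reads, and register
  // it so overlapping writes re-run it and push the new result.
  subscribe<T>(
    queryFn: (get: (key: string) => Doc | undefined) => T,
    onUpdate: (result: T) => void
  ): T {
    const run = () => {
      const readSet = new Set<string>();
      const get = (key: string) => {
        readSet.add(key); // record the read
        return this.docs.get(key);
      };
      return { readSet, result: queryFn(get) };
    };
    const first = run();
    const sub = {
      readSet: first.readSet,
      rerun: () => {
        const next = run();
        sub.readSet = next.readSet; // the read set may change between runs
        onUpdate(next.result);
      },
    };
    this.subscriptions.push(sub);
    return first.result;
  }

  // A write only re-runs subscriptions whose read set overlaps the key.
  write(key: string, doc: Doc): void {
    this.docs.set(key, doc);
    for (const sub of this.subscriptions) {
      if (sub.readSet.has(key)) sub.rerun();
    }
  }
}
```

The point of the sketch is the asymmetry Sujay describes: the query code just reads whatever it wants, and the system, not the developer, figures out which writes invalidate it.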
1:02:07And I think some of the downsides are that it is a different set of APIs.
1:02:13It's not using SQL; it's doing things a little bit differently
1:02:16than they've been done before.
1:02:18Yeah, it's kind of interesting even today to see, you know,
1:02:23talking about AI code gen, right?
1:02:24Models have been pre-trained on this huge corpus
1:02:28of stuff on the internet.
1:02:29And when are they good at adopting new technologies,
1:02:32technologies that might be after their knowledge cutoff?
1:02:35And when is it better to just stick to things that they know already?
1:02:39Right.
1:02:39So what you've mentioned before, that Convex is rather opinionated: for me,
1:02:45let's say five years ago, I might've been much more of
1:02:49like, oh, but maybe there's a technology that's less opinionated
1:02:53and I can use it for everything.
1:02:54But the more experience I got, the more I realized, no, actually,
1:02:58I want something that's very opinionated, but opinionated
1:03:02in a way where I share those opinions.
1:03:04Those are exactly for my use case.
1:03:06So I think that is much better.
1:03:08This is why we have different technologies and they are great for different
1:03:12scenarios, and I think the more a technology tries to say, no,
1:03:17we're best for everything, the less it's actually good at anything.
1:03:23And so I greatly appreciate you standing your ground and saying,
1:03:26hey, those are the design decisions that we've made,
1:03:31and those are the use cases where you'd be really well served building
1:03:35on top of something like Convex.
1:03:37And I particularly like that for now, where TypeScript is really the default
1:03:42language to build full-stack applications.
1:03:45And it's also increasingly becoming the default for
1:03:48AI-based applications as well.
1:03:51And AI-based systems speak TypeScript just as well as English.
1:03:57And given that, Convex makes that full stack super easy.
1:04:02And also I think, when you build local-first apps, it can
1:04:07sometimes get really tricky because you empower the client so much.
1:04:11You give the client so much responsibility, and therefore there are
1:04:15many, many things that can go wrong.
1:04:17And I think Convex therefore, takes a more conservative approach and says
1:04:21like, Hey, everything that happens on the server is like highly privileged
1:04:25and this is your safe environment.
1:04:27And the client will try to give you the best user experience and
1:04:31developer experience out of the box.
1:04:33But the client could be in a more adversarial environment.
1:04:37And I think those are great design trade-offs.
1:04:40So, I think that is a fantastic foundation for tons of different applications.
1:04:45Yeah.
1:04:46talking about some of these strong opinions being both
1:04:49blessings and curses, right?
1:04:50Like over the past few months, one thing we've been working on is trying
1:04:54to bridge the gap between those two points in the spectrum, right?
1:04:58We wrote a blog post on it a few months ago, on what we're calling
1:05:02our object sync engine: trying to take a lot of the principles from more of
1:05:08a local-first-type approach of having a data model that is synced to the client,
1:05:14where the only interaction between the server and the client is through the sync.
1:05:18And the client then can always render its UI just looking at the local
1:05:22database and it can be offline.
1:05:24It also fully describes the app state, so it can be exported
1:05:28and rehydrated or whatever.
1:05:29It's a very interesting design exercise we've been on, to say, can
1:05:33you structure a protocol and a sync engine in a way such that the UI
1:05:39is still reading and writing to a local store that is authoritative.
1:05:43But then that local store, to kind of use ElectricSQL terminology,
1:05:47is a shape that is some mapping of a strongly separated server data model.
1:05:52So we still have a client data model and a server data model, which might be
1:05:56owned by different teams and evolve independently, and we also have that
1:06:01strong separation where the implementation of the shape is privileged, running
1:06:06on the server, with authorization rules built in, and we get the best of both worlds.
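As a rough illustration of this shape idea, a privileged server-side mapping from a server data model to a client data model might look like the following. All of the names and types here are hypothetical, invented for illustration — this is not the Convex or ElectricSQL API.

```typescript
// Server-side data model: normalized tables with internal fields.
interface ServerMessage { id: string; authorId: string; body: string; deleted: boolean }
interface ServerUser { id: string; displayName: string }

// Client-side data model: denormalized, containing only what the UI renders.
interface ClientMessage { id: string; authorName: string; body: string }

// The "shape" runs on the server with full privileges: it joins users into
// messages and applies visibility rules, so the client's local store only
// ever holds rows it is allowed to see.
function chatShape(
  messages: ServerMessage[],
  users: Map<string, ServerUser>
): ClientMessage[] {
  return messages
    .filter((m) => !m.deleted) // visibility rule stays server-side
    .map((m) => ({
      id: m.id,
      authorName: users.get(m.authorId)?.displayName ?? "unknown",
      body: m.body,
    }));
}
```

Because the client only ever sees `ClientMessage`, the server tables can be refactored freely as long as the shape keeps producing the same client model — which is the decoupling between teams described above.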
1:06:10And we've kind of, we have a beta that we've not released publicly or
1:06:16open-sourced out there, but it's kind of a thing where I think we're
1:06:19still figuring out the DX for it.
1:06:21And I think we have something that algorithmically works
1:06:24and the protocol works, but it's kind of hard.
1:06:28Right.
1:06:28It kind of reminds me a lot of writing GraphQL resolvers, of saying, how do I
1:06:32take the messages table from my chat app?
1:06:35Then under the hood that might be joining stuff from many different
1:06:39tables and filtering rows, or might even be doing a full-text search
1:06:43query in another view or something.
1:06:45And coming up with the right ergonomics to make that feel
1:06:48great for a day-one experience
1:06:50is something that we're still working on, still
1:06:53kind of a research project,
1:06:54right?
1:06:54Well, when it comes to data, there is no free lunch, but I'd much rather have
1:06:58it be done in the order and sequencing that you're going through, which is
1:07:03having a solid foundation that I can trust and then figuring out the right
1:07:09ergonomics afterwards, since I think there are many, many tools that start with
1:07:14great ergonomics but later realize they're built on an unsound foundation.
1:07:19So when it comes to data, I want a trustworthy foundation, and I think
1:07:24you're going about it in the right order.
1:07:26Hey, Sujay, I've been learning so much about one of my favorite
1:07:31products of all time, Dropbox.
1:07:33I've learned so much of how the sausage was actually made, how it evolved
1:07:39over time, and I'm really excited that you got to share the story today and
1:07:45many, me included, got to learn from it.
1:07:48Thank you so much for taking the time and sharing all of this.
1:07:51Thanks for having me.
1:07:52This is super, super fun.
1:07:54Thank you for listening to the localfirst.fm podcast.
1:07:56If you've enjoyed this episode and haven't done so already, please
1:08:00subscribe and leave a review.
1:08:01Please also share this episode with your friends and colleagues.
1:08:04Spreading the word about the podcast is a great way to support
1:08:07it and to help me keep it going.
1:08:09A special thanks again to Jazz for supporting this podcast.
1:08:13I'll see you next time.