00:00There's another kind of interesting
decision here: Dropbox by
00:04design was always like a sidecar.
00:06It's always something that just
sits and it looks at your files.
00:09Your files are just regular
files on the file system.
00:12And if Dropbox, the app isn't running,
your files are there and they're safe,
00:17and it's something that you know,
regular apps can just read and write
00:21to, and in some sense like Dropbox
was unintentionally local-first
00:27from that perspective, right?
00:28Because it's saying that no
matter what happens, your data
00:31is just there and you own it.
00:33Welcome to the localfirst.fm podcast.
00:36I'm your host, Johannes Schickling,
and I'm a web developer, a
00:39startup founder, and love the
craft of software engineering.
00:42For the past few years, I've been on a
journey to build a modern high quality
00:46music app using web technologies, and
in doing so, I've been going down
00:50the rabbit hole of local-first software.
00:52This podcast is your invitation
to join me on that journey.
00:56In this episode, I'm
speaking to Sujay Jayakar,
00:59co-founder of Convex and
early engineer at Dropbox.
01:02In this conversation, Sujay shares
the story of how the sync engine
01:06powering Dropbox was built initially
and later redesigned to address all
01:11sorts of distributed systems problems.
01:13Before getting started, also a big thank
you to Jazz for supporting this podcast.
01:19And now my interview with Sujay.
01:22Hey, Sujay.
01:23So nice to have you on the show.
01:24How are you doing?
01:25Doing great.
01:26Great.
01:26Really happy to be here.
01:28I'm super excited to have you on the show.
01:30I've been using your work for
over a decade at this point, ever
01:35since I was really getting into
using computers productively.
01:39And just the other day we had another
really interesting guest, Seph Gentle, on
01:45the podcast, who worked on a really
fascinating tool called Google Wave
01:50back then that had a big impact on me.
01:53And you've been working on another
technology that had a big impact
01:56on me, which is Dropbox and still
has a very positive impact on me.
02:01That was all the way back,
over 10 years ago, in 2014.
02:05I don't think I need to explain to
the audience what Dropbox is, but I
02:11want to hear it from you: what
led you to join Dropbox, I think very
02:15early on, and to hear it a little
bit embedded in your personal
02:20context when you joined, and then
we're gonna go dive really deep into
02:24all things syncing related, et cetera.
02:27How does that sound?
02:28Yeah, that sounds great.
02:30It's actually a really funny story.
02:31My career here in
technology started in 2012.
02:34I was actually studying mathematics.
02:37I was going to go work at the NSA doing
cryptography, and I was born in India.
02:44but I'm a naturalized citizen of
the United States, and you have to
02:48have a security clearance to do
these types of cryptography things.
02:52And you know, my clearance kept
dragging on and on and on, and
02:58they interviewed my roommates, and
apparently I'm just a very sketchy
03:01guy, so I had an offer to go work
there, but it kept on dragging on.
03:06And then my roommate at the time was a
computer science major who wanted
03:10someone to go with him to the career
fair, and I just started chatting with the
03:15Dropbox people, and you know, it was about
a hundred people around that time.
03:19And chatting turned into hanging out
at dinner, which turned into interviewing.
03:23And being a math person, I did
my interviews all in Haskell and
03:26didn't know any real programming.
03:28And then, yeah, that turned into doing an
internship, dropping out of undergrad, and
03:34just following the dream.
03:35And so at Dropbox,
I worked on a bunch of things.
03:38I started off working on
our, like, growth team.
03:40So I did a lot of, like, email systems.
03:43And I worked on this thing
called Space Race, a promotion.
03:47Oh, I remember that.
03:48Yes.
03:49I think I've earned quite a lot
of free storage, which I think
03:53over time has gone down.
03:56But that was a very smart
and effective mechanism.
03:58I surely invited all my friends back then.
04:01Being a broke student, I couldn't afford
a premium plan, so that worked.
04:07And then from there worked on
the sync engine for some time.
04:11And then right now I'm the co-founder and
chief scientist of a startup called Convex
04:16and my three co-founders and I met working
on this project called Magic Pocket,
04:20where Dropbox stores hundreds of petabytes
now exabytes of files, for users.
04:25And we used to do that in S3.
04:27And so the three of us worked together on
a team to build Amazon S3, but in-house
04:32and migrate all of the data over.
04:34so we did that for a few years and then,
Worked on rewriting the entirety of
04:39Dropbox, the sync engine, the thing that
runs on all of our desktop computers.
04:43we rewrote it to be really correct
and scalable and very flexible.
04:47and shipped that.
04:48after that left Dropbox in 2020 I
was trying to decide if I wanted
04:52to get back to academics or not.
04:54So I did some research in networking
and then decided to start Convex in 2021.
04:59I'm certainly curious which sort of
research has had your interest the
05:03most in this sort of transitionary
period, but maybe we stash that
05:07for a moment and go back to the
beginning when you joined Dropbox.
05:11You mentioned there were around a
hundred people working there at the time.
05:15How do I need to imagine the technology
behind Dropbox at this point?
05:20It clearly all started out as a
desktop-focused daemon process
05:27that's running on your
machine, somehow keeps track of the files
05:33on your system, and then applies the magic.
05:37So explain to me how things worked
back then, and what was it like to
05:43work at Dropbox when there were
around a hundred people?
05:46Yeah, I mean, it was
pretty magical, right?
05:49Because the company had, I think gotten
so many things right on the product side
05:54and then those showed up in technology.
05:55But just this feeling of like Dropbox
being this product that just worked right?
06:00It was for everyone.
06:01It was not just for technologists;
anyone who's
06:05comfortable using a computer should
be able to install Dropbox and have
06:09a folder of theirs become magical.
06:12And without understanding anything
about how it works, they should
06:15just think of it as like an
extension of what they know already.
06:19Yeah.
06:19And so the ways that that showed
up I think were really interesting.
06:22At the time there was a very strong
culture of, like, reverse engineering
06:26around having this daemon that runs locally.
06:30You know, one of the amazing
early moments in Dropbox was that
06:34if you open up Finder or Explorer,
you have the overlays on your files.
06:39That used to be done by,
like, attaching to the Finder
06:43process and injecting code into it,
06:47to the point where, uh, when some
folks had gone to talk to Apple at the
06:51time about, like, working with the
file system and everything,
06:57there were teams at Apple that asked
07:05Dropbox, how did you do that in Finder?
07:05So you wanted to offer the
most native experience.
07:08There weren't the necessary APIs for that.
07:11And so you just made it happen.
07:12That's amazing.
07:13Yeah.
07:14Yeah.
07:14And so there was that idea of, like, how do
you create the best user experience,
07:19for the purpose
of making non-technical users feel very
07:26confident and very safe using it.
07:28That was another, I think, really
deep like company value of like
07:32being worthy of trust and taking
people's files very seriously.
07:36You know, I remember having a
friend who was in residency at the
07:39time, and he was telling me that he
keeps some of his stuff, non-
07:44HIPAA stuff, but the things that
he looks at, on Dropbox, and you know,
07:50pulls them up when he's consulting.
07:51And there's a part of me which
is terrified by that, right?
07:54Like we think of software as
something where like throwing a 500
07:57error is fine every once in a while.
08:00And at Dropbox there was
a culture of making users feel
08:03like they could really trust us.
08:04And then that showed up for things
like making sure that, like when
08:08we give feedback to users, if we
08:08put that green overlay in Finder,
08:12they know that no matter what happens,
they could throw their laptop in a pool.
08:16Anything could
happen and their files are safe.
08:20Like if their house burns down, they
don't have to worry about that thing.
08:24And that's like all of that reverse
engineering and all of the emphasis
08:29on correctness and durability.
08:31It was all in service of that feeling,
which I think was really cool.
08:34So on the engineering side, at the
time it was, like, in hypergrowth mode.
08:38So they had a Python desktop client.
08:40Almost all of Dropbox was
in Python at the time.
08:43And so there's this pre-mypy, big,
rapidly changing desktop client that
08:50needed to support Mac, Windows, and Linux
and all these different file systems.
08:53And then on the server side, we
had one big server called Meta Server,
08:58meta, I think, being from metadata,
09:00and that ran almost all of Dropbox.
09:03We stored the metadata in MySQL.
09:06The files were stored in S3, and then
we had a separate notification server
09:11for managing pushes and things like that.
09:13And so it was kind of a classic
architecture, and it was
09:17starting to reach the limits of
its scaling even at that time.
09:20And, those were a lot of the things
we worked on over the next 10 years.
09:24Wow.
09:25So was the server also written in Python?
09:27So it was all one big python shop.
09:30Yeah.
09:30And the server was all written in Python.
09:33We had some pretty funny bugs
that were due to that. It's kind of
09:39crazy to think about it now,
09:40you know, working in TypeScript
full time, to think of, like,
09:44back in the day we just had these
hundreds of thousands, millions of lines
09:48of code with no type safety, and with
all types of crazy metaprogramming and
09:55decorators and metaclasses and stuff.
09:57And yeah, it was
all in Python when I showed up.
10:00It was not all in Python, and not all in
one big monolithic service, when I left.
10:04So you mentioned joining when there
were around a hundred people, and you
10:09probably already at this point had
multitudes more in terms of users.
10:15Being in hypergrowth, it is sort of this
race against time where you only have
10:21so much time to work on something, but
growth may already outrun you and
10:26things are already starting to break.
10:28Or, you know, like, okay, if things are
gonna grow like this, this system will
10:33break and it's gonna be pretty bad.
10:36So tell me more about how you were
dealing with the constant
10:42race against time to rebuild systems,
redesign systems, putting out fires.
10:49What was that like?
10:50Yeah, and I think there's kind of
an interesting place to take this.
10:53I think the normal things
were around scale, right?
10:56Those were, like,
10:57one kind of class of problems,
being able to handle the load.
11:00But I think one kind of really
interesting, dimension of this that led
11:04to our decision to start rewriting all
of the sync engine in 2016 was actually
11:09just like customer debugging load.
11:12You know, we have we had hundreds of
millions of active users and they were
11:17using Dropbox in all types of crazy ways.
11:20Like, one of the stories is someone
was using Dropbox with, I think
11:24it was running on, I don't know
if it was a Raspberry Pi or
11:27something, something on his tractor.
11:28The guy ran a farm and he
was using Dropbox to sync, like,
11:32paths in text files to his tractor.
11:35And I might be getting some
of the details wrong, but
11:37it's something like that.
11:38And so people would just use Dropbox
in all types of crazy ways, on crazy
11:43file systems, with kernel modules
running that are messing things around.
11:47So, you know, in terms of
getting ahead of scale, I think we found
11:52ourselves around 2015, 2016, in the
place where, for the sync engine on the
11:58desktop client, the entire team just
spent all of its time debugging issues.
12:04We had this principle that anything
that's possible, anything that a
12:08protocol allows, anything that some
threading race condition makes
12:13theoretically possible, will happen.
12:16And then we would see it, right?
12:17Like, users would write in
saying, my files aren't syncing.
12:20And then we would look at it and we would
spend months debugging each one of these
12:24issues and trying to read the tea leaves
from traces and reports and reproductions.
12:30And it'll be like, oh they
mounted this file system over here
12:33and then this one and this one
are in a different file system.
12:36So moving the file actually did
a copy, but then the xattrs
12:40weren't preserved, this and that.
12:42You know, in terms of that theme of like
getting ahead of scale, like I think there
12:46was first this realization that like the
set of possible things that can happen in
12:51the system is just astronomically large.
12:54And all of them will happen
if they're allowed to.
12:57And no matter
how much incremental time
13:01we put into debugging things, we
will never be able to keep up.
13:05And the cost of doing that is
that the entire team is working
13:08on maintenance like this.
13:09We couldn't build any new features.
13:11So I think that was then a motivation for
the rewrite: can we find points
of leverage where, if we just invest a
little bit in technology upfront, like by
13:20architecting things a particular way, can
we just eliminate a much bigger set of
13:25potential work from debugging and working
with customers and stuff like that.
13:29So maybe this is a good time
to take a step back and try to
13:33better understand what Dropbox's
sync engine actually was back then.
13:38So just thinking about it through,
like, a user's perspective, I have maybe
13:45two computers, and I have files over here.
13:48I want to make sure that I have the
files synced over from here to here.
13:53So I could now think about this as
sort of, like, a Git-style approach.
13:59Maybe there's other ways as well.
14:01Walk me through the
solution space, how this could have been
14:05approached, and how was it approached?
14:07Is there some sort of diffing
involved between different file states
14:12over time that are being synced around?
14:14Do you sync around the
actual file content itself?
14:18Help me understand.
14:19Building a mental model: what did it mean
back then for the sync engine to work?
14:25Yeah.
14:25Yeah.
14:26It's a super interesting question, right?
14:28Because I think like you're saying,
there's so many different paths one
14:31can take, and it's, I think, one of
those things where if someone
14:34asks you to, like, design Dropbox as an
interview question, there's
14:37definitely not one right answer, right?
14:39It's like there are so many trade-offs and
like different forks in the decision tree.
14:44I think one of the first things is
that, so, maybe you have your desktop
14:48and your laptop,
and one of the first
14:52decisions for Dropbox is that we would
have a central server in the middle,
14:56that there would be a Dropbox file system
in the middle that Dropbox, the company,
15:01ran. And we did that from this trust
perspective: we wanted to say that we
15:05will run this infallibly, that when you get
that green check mark, it's there.
15:11You know, even if an asteroid destroys
the eastern side of the United
15:15States, like we will have things
replicated in multiple data centers.
15:18And then, you know,
also it's accessible anywhere
15:22on the internet, right?
15:23You can go to the library.
15:24This is not so common these days but
I remember when I was a student, like,
15:26go to the library, log into Dropbox,
and read all your things, right,
15:30rather than having to
bring a USB stick around.
15:32And so I think that is the first
decision, but it's not necessary, right?
15:36Like, there were plenty of
distributed, entirely peer-to-
15:39peer file syncing designs, right?
15:42And so that was the first decision.
15:44And I think the kind of second decision
was that if we imagine our desktop and
15:48our laptop and you have the server in
the middle, the desktop might be on
15:52Windows, the laptop might be on macOS.
15:56So I think that decision to
support multiple platforms
15:59is another really interesting one.
16:02This is where I think Git and
Dropbox are a little bit different,
16:06in that Git is, at the end of
the day, quite Linux-centric.
16:09It's case-sensitive for its file system.
16:12It deals with directories and it
makes particular assumptions about
16:15how directories should behave.
16:17And that was different with Dropbox.
16:19We wanted to be consumer; we wanted
to support everything and we wanted
16:22it to feel very automatic, right?
16:24That, like, someone shouldn't have to
understand what a Unicode
16:28normalization disagreement means.
16:29Right?
16:30Where in Git, in really bad settings,
you might have to understand
16:34that you write a u with an
accent differently on Mac and Windows.
16:38So I think that's the
kind of, like, next side.
16:40So then Dropbox has its design
for a file system, and it's
16:43central, it's like the hub, and all
the spokes are your phone, your
16:48desktop, your laptop, and whatnot.
16:50And then, to get
down to the details a bit more:
16:53So then, yeah, we have a process that
runs on your computer, that's the
16:56Dropbox app, and that watches all of
the files on your file system, and then
17:02it looks at what's happened and then
syncs them up to the Dropbox server.
17:07And then whenever changes happen on
the Dropbox server, it syncs them down.
17:11There's another kind of interesting
decision here: Dropbox by
17:15design was always like a sidecar.
17:17It's always something that just
sits and it looks at your files.
17:20Your files are just regular
files on the file system.
17:23And if Dropbox, the app isn't running,
your files are there and they're safe,
17:28and it's something that you know,
regular apps can just read and write
17:32to, and in some sense like Dropbox
was unintentionally local-first
17:37from that perspective, right?
17:39Because it's saying that no
matter what happens, your data
17:42is just there and you own it.
17:44And you know, there are
other systems, right?
17:46Like, if you use NFS, a network
file system, then if you unmount it or
17:52if you lose connection to the server,
17:54you might not be able to actually open
any files that you have the metadata for.
17:58Right.
17:59And I remember from a user perspective,
the local-first aspect, I really went
18:04through all the stages, where I
had a computer that wasn't connected
18:08to the internet yet, and then at some
point I had an internet connection.
18:12But files were always there, and
everything depended on files.
18:16Like, if I didn't have a
file, things wouldn't work.
18:20Everything depended on files.
18:22There were barely any websites
where you could do meaningful things.
18:26Certainly, web apps
weren't very common yet.
18:30And then Dropbox made everything
seamlessly work together.
18:35And then when web apps and SaaS
software came along, I was a
18:41bit confused, because I felt, okay,
18:43it gives me some collaboration, but it seems
to be a different kind of collaboration,
18:48since I had collaboration before.
18:50But I also understood the limitations
when I'm working on the same doc
18:56file through Dropbox, which gets
sort of like the first copy, second
19:00copy, third copy, and now I need
to somehow manually reconcile that.
19:05And when I saw Google
Docs for the first time,
19:09that was really a revelation, because,
oh, now we can do this at the same time.
19:14But at the same time, while I saw that,
I still remember the feeling
19:19of, but where are my files?
19:20This is my stuff now.
19:22Where, where is it?
19:23And that trust that you've mentioned
with Dropbox, I felt like I lost
19:30some control here, and it required a
lot of trust in those tools that I
19:35now started, step by step, embracing.
19:37And frankly, I think a lot of those tools
didn't deserve my trust in hindsight.
19:41I still feel like we've lost something
by no longer being able to like call
19:48the foundation our own in a way.
19:50And I'm still hoping that we kind of
find the best of both worlds where
19:54we get that seamless collaboration
that we now take for granted.
19:58Something like that Figma gives us.
20:00but also the control and just being
ready for whatever happens, that's
20:06something Dropbox gave us out of the box.
20:08I just wanted to share this
anecdote and the almost
20:12emotional confusion as I walked
through those different stages
20:16of how we work with software.
20:18Totally.
20:19And we've ended up in a place
that's not great in a lot of ways.
20:22Right.
20:22And I think you know, I think part
of the sad thing, and maybe from
20:27even like an operating systems design
perspective is that I feel like files
have lots of design decisions that are
20:35packaged up together.
20:36You know, like one of the amazing
things about files is that
20:39they're self-contained, right?
20:41Like on Google, I don't know what
Google's backend looks like for Google
20:44Docs, but they probably have like all
of the metadata and pieces of the data
20:49spread and different rows in a database
and different things in an object store.
20:53And just even thinking about like
the physical implementation of that
20:57data, it's like scattered around
probably a bunch of servers, right?
21:00Maybe in different data centers.
21:02And there's something really nice
about a file where a file is just
21:05like a piece of state, right?
21:08That is just self-contained.
21:09And I think one of the things
that is very
21:13unfortunate, from an operating
systems perspective, is that that decision
21:18then has also been coupled with a very
anemic API. With files, they're just
21:24sequences of bytes that can be read
and written to and appended, and there's
21:30no additional structure beyond that.
21:32And I think, like,
21:33the way that things have
evolved is that in order to
21:37have more structure, to make things
like Google Docs, to be able to
21:41reconcile and have collaboration and
interpret things as more than just bytes,
21:46we've also given up this ability
to package things together.
21:49macOS had a very kind of
baby step in this direction with, I
21:53think they're called bundles.
21:54Like, the things where if
you have your .app, they're
21:57actually a zip file, right?
21:59And there's all types of ways, all
types of brain damage, where this,
22:03like, doesn't actually work well.
22:05You know?
22:05But the idea is kind
of interesting, right?
22:07It's like, what if files had some
more structure, and what if you still
22:10considered something an atomic unit,
but then it had pieces of it that
22:15weren't just uninterpretable bytes?
22:17And I think that's the
path-dependent way that we've
22:20ended up where we are today.
22:22That makes sense.
22:23So going back to the sync engine
implementation: did the Python process
22:28back in the day mostly index
all of the files and then actually
22:34send across the actual bytes, probably
in some chunks, across the wire?
22:39Or was there some more intelligent
diffing happening client side,
22:45so that you would only send the
changes across the wire? And how do I
22:50need to think about what a change is
when I'm dealing with, like, a ton of
22:55bytes before and a ton of bytes after?
22:57Yeah.
22:58Those are really, really good questions.
22:59I think maybe the first
starting point is that files
23:03in Dropbox were stored just broken
up into four-megabyte chunks.
23:07And that was just a decision at the
very beginning to pick some size.
23:11And on the server, the way that those
chunks were stored is that
23:15each four-megabyte chunk was keyed
by its SHA-256 hash.
23:20So we would assume that
those are globally unique.
23:23So then if you had the same copy
of a bunch of files, or you had
23:27a file copied many times in your
Dropbox, we would only store it once.
23:31And that would just happen
organically because we would say
23:34like, okay, I looked at this file,
it has three chunks A, B, and C.
23:39And then the client would ask the
server, do you have A, B, and C?
23:43Like the server would say, yes, I have
B and C already, please send A, then we
23:47would upload A. So already,
at the file level, there was this
23:52kind of very coarse-grained delta sync
23:56at the four-megabyte chunk layer.
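To make that concrete, here is a minimal sketch of that chunk-level negotiation in Python; the `server` methods are hypothetical stand-ins, not Dropbox's actual protocol:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed 4 MiB chunks, as described above

def chunk_hashes(path):
    """Split a file into fixed-size chunks and hash each with SHA-256."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def upload(path, server):
    """Ask the server which chunk hashes it is missing; send only those."""
    hashes = chunk_hashes(path)
    missing = set(server.missing_chunks(hashes))  # hypothetical API
    with open(path, "rb") as f:
        for i, h in enumerate(hashes):
            if h in missing:
                f.seek(i * CHUNK_SIZE)
                server.put_chunk(h, f.read(CHUNK_SIZE))
    # commit the file's metadata as an ordered list of chunk hashes
    server.commit_file(path, hashes)
```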
23:58And then, it's funny,
these things evolve, right?
24:01The next thing we layered on
top was that in that setting, where
24:05you decided B and C were there already
and you needed to upload A,
24:09the desktop client could use rsync
to know that there was previously an A
24:15prime, do a patch between the two,
and then send just those contents.
24:19The thing that was pretty
interesting is that a lot of the content
24:23on Dropbox was very incompressible
stuff like video and images, so the
24:29benefit of deduplication, both
across users or even within a user,
24:34and the benefit of rsync was not
actually as much as one might think,
24:40at least in terms of
bandwidth going through the system.
24:43It didn't reduce much because a lot of
this content was just kind of unique and
24:48not getting updated in small patches.
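A drastically simplified sketch of the rsync idea, for intuition only: real rsync uses a weak rolling checksum so matches can be found at any byte offset, while this version only matches whole blocks at fixed boundaries:

```python
import hashlib

BLOCK = 64 * 1024  # arbitrary block size for illustration

def signatures(old: bytes) -> dict:
    """Strong hash of each block of the previous version (the A-prime)."""
    return {hashlib.sha256(old[i:i + BLOCK]).digest(): i
            for i in range(0, len(old), BLOCK)}

def delta(new: bytes, sigs: dict) -> list:
    """Emit ('copy', old_offset) for blocks the receiver already has,
    and ('data', literal_bytes) for everything else."""
    ops = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        digest = hashlib.sha256(block).digest()
        ops.append(("copy", sigs[digest]) if digest in sigs
                   else ("data", block))
    return ops
```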
24:51And on your server-side blob store, now
that you had those hashes for those four-
24:56megabyte chunks, that also means that you
could probably deduplicate some content
25:02across users, which makes me think of
all sorts of other implications of that.
25:09When do you know it's
safe to let go of a chunk?
25:12Do you also now know that you
could kind of go backwards and
25:16say, like, oh, from this hash we
know this is sensitive content?
25:20That may have some further implications;
we don't need to go too
25:25much into depth on that now, but, yeah.
25:28I'm curious how you thought
of those design decisions and
25:32the possible implications.
25:34Yeah.
25:34Yeah, for the first one,
distributed garbage collection
25:38was a very hard problem for us.
25:39We called it vacuuming, and it mattered
for making Dropbox's economics work out,
25:44because we couldn't afford to
keep a lot of content that was deleted,
25:48that we couldn't charge users for.
25:50And there was, you know, all this
additional complexity where different
25:54users would have the ability to
restore for different periods of time.
25:58So we would say like, anything
that's deleted, it doesn't actually
26:01get deleted for 30 days or a year
or whatnot based on their plan.
26:05So then, yeah, having to do
this big distributed mark-and-sweep
26:09garbage collection algorithm
across hundreds of petabytes,
26:14exabytes of content that was something
that we had to get pretty good at.
26:18And when we designed Magic Pocket,
where we implemented S3 in-house, we
26:23had specific primitives for making it a
little bit easier to avoid race conditions
26:28where, like, a file was deleted,
26:31and we decided that no
one needed it anymore,
26:34but then just at that point in time,
someone uploads it again, and we had to make sure
26:38that we don't accidentally delete it.
26:40So that was like, yeah,
definitely a very tricky problem.
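For intuition, one common way to avoid that delete/re-upload race is a grace window on unreferenced chunks. This sketch is purely illustrative and not how Magic Pocket actually implemented its primitives:

```python
import time

GRACE_SECONDS = 24 * 3600  # illustrative retention for unreferenced chunks

class ChunkStore:
    def __init__(self):
        self.chunks = {}      # hash -> bytes
        self.tombstones = {}  # hash -> time the last reference disappeared

    def put(self, content_hash, data):
        self.chunks[content_hash] = data
        # a concurrent re-upload resurrects the chunk
        self.tombstones.pop(content_hash, None)

    def mark_unreferenced(self, content_hash):
        self.tombstones.setdefault(content_hash, time.time())

    def sweep(self):
        """Physically delete only chunks that stayed unreferenced for the
        whole grace window, so an upload racing the sweep is never lost."""
        now = time.time()
        for h, t in list(self.tombstones.items()):
            if now - t > GRACE_SECONDS:
                del self.chunks[h]
                del self.tombstones[h]
```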
26:43And I think in retrospect this is like
an interesting design exercise, right?
26:48And if deduplication wasn't actually
that valuable for us, we could have
26:52eliminated a lot of complexity for this
garbage collection by not doing it, right?
26:58I think for the second thing, yeah.
26:59So at the beginning when Dropbox started,
if you had a file with A, B and C and you
27:06uploaded it, it would just check: do
A, B, and C exist anywhere in Dropbox?
27:11And that got changed over time
to be: do you, as the user,
27:17have access to A, B, and C?
27:19And you know, 'cause otherwise you could
use this for all types of purposes, right?
27:24To see if there exists some
content anywhere in Dropbox.
27:27And that was something where,
in the case where the user was
27:32uploading A, B, and C, say none of them
were present in their account, we would
27:38actually force them to upload it, incur
the bandwidth for doing so, and then
27:42discard it if B and C existed elsewhere.
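A tiny sketch of the policy change being described, with hypothetical data structures: the dedup decision consults only the chunks the uploading user can already access, even when the hash exists globally:

```python
def chunks_client_must_send(user, hashes, user_chunks, global_chunks):
    """Return the chunk hashes the client has to upload.

    Deduplicating against global_chunks would leak whether *anyone* on
    the service has a given hash, so the upload is forced (and the bytes
    discarded server-side afterwards) unless the user already has access.
    Note that global_chunks is deliberately NOT consulted here.
    """
    accessible = user_chunks.get(user, set())
    return [h for h in hashes if h not in accessible]

# Example: "h_a" and "h_c" exist globally but not in this user's account,
# so the client still pays the bandwidth to upload them.
user_chunks = {"alice": {"h_b"}}
global_chunks = {"h_a", "h_b", "h_c"}
print(chunks_client_must_send("alice", ["h_a", "h_b", "h_c"],
                              user_chunks, global_chunks))  # ['h_a', 'h_c']
```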
27:46Yeah.
27:46Very interesting.
27:47I mean, this would be an interesting
rabbit hole to go down, just the
27:50kind of second-order effects of that
design decision, particularly at
27:54the scale and importance of Dropbox.
27:57But maybe we save that for another time.
27:59So going back to the sync engine, now that
we have a better understanding of how it
28:04worked in that shape and form back then.
28:07You've already mentioned
that as usage went through
28:12the roof, all sorts of different
usage scenarios also expanded.
28:17You had all sorts of more esoteric
usage patterns, ways you didn't even think,
28:22beforehand, that it would be used.
28:25Now all of that came to light.
28:28I'm curious which sort of, helper
systems you put in place that you could
28:33even have a grasp of what's going on
since a part of the trust that Dropbox
28:39owned or that earned over time, was
probably also related to privacy.
28:44So you, you couldn't just like read
everything that's going on in someone's
28:49system, so you're probably also relying
to some degree on the help of a user
28:55that they like send something over.
28:57Yeah.
28:57Walk me through like the evolution
of that and that you, like as
29:02an engineer, if there's a bug
reproducing that bug is everything.
29:07So walk me through that process.
29:09Yeah, and you know, like we had a very
strict rule, right, where it just,
29:13we do not look at content, right?
29:15and so that was the thing when
debugging issues, the saving grace is
29:20that for most of the issues we saw.
29:22They were more metadata issues around
like sync, not converging or sync, getting
29:28to the client thinking it's in sync
with the server, but them disagreeing.
29:32so we had a few pretty,
yeah, like pretty interesting
29:35supporting algorithms for this.
29:37So one of them was just simple like
hang detection, like making sure, like
29:41if, when should a client reasonably
expect that they are in sync?
29:45And if they're online and if
they've downloaded all the recent
29:49versions and things are getting
stuck, why are they getting stuck?
29:53So are they getting stuck because
they can't read stuff from the
29:55server, either metadata or data?
29:57Are they getting stuck because they
can't write to the file system and
30:00there's some permission errors?
30:02So I think having very fine-grained
classification of that and having the
30:06client do that in a way that's like not
including any private information and
30:11sending that up for reports and then
aggregating that over all of the clients
30:14and being able to classify was a big part
of us being able to get a handle on it.
30:20And I think this is just generally
very useful for these sync engines.
30:23the biggest return on investment we
got was from consistency checkers.
30:27So part of sync is that there's the same
data duplicated in many places, right?
30:33Like, so we had the data that's
on the user's local file system.
30:37We had all of the metadata that we stored
in SQLite or we would store like what
30:41we think should be on the file system.
30:43We would store what the latest
view from the server was.
30:46We would store things that were
in progress, and then we have
30:49what's stored on the server.
30:50And for each one of those like hops, we
would have a consistency checker that
30:55would go and see if those two matched.
30:57And those would, that was like the
highest return on investment we got.
31:02Because before we had that, people
would write in and they would
31:05complain that Dropbox wasn't working.
31:07And until we had these consistency
checkers, we had no idea the
31:10order of magnitude of how
many issues were happening.
31:13And when we started doing
it, we're like, wow.
31:16There's actually a lot.
31:18So a consistency check in this regard
was mostly like a hash over some
31:22packets that you're sending around.
31:24And with that you could verify, okay, up
until like from A to B to C to D, we're
31:30all seeing the same hash, but suddenly
on the hop from D to E, the hash changes.
31:35Ah-huh.
31:36Let's investigate.
31:37Exactly.
31:38And we had to do that in a way
that's respectful of the users,
31:42even of, like, resources on their system.
31:45We wouldn't just go and blast their
CPU and their disk and their network to go
31:50and churn through a bunch of things.
31:51So we would have a sampling
process where we sample a random
31:54path in the tree on the client
and do the same on the server.
31:58We would have stuff with, like, Merkle
trees, and then when things would diverge,
32:02we would try to see, is there a way
we can compare on the client?
32:07For example, one of the really
important goals for us as an operational
32:12team was to have the power of zero.
32:14I think it might be from AWS or something.
32:17My co-founder James has
a really good talk on it.
32:19But we would want to have a metric
saying that the number of unexplained
32:25inconsistencies is zero. And one, 'cause
32:28then the nice thing, right, is that
if it's at zero and it regresses,
32:31you know that it's a regression.
32:33If it's fluctuating at, like, 15
or a hundred thousand, and it kind
32:38of goes up by 5%, it's very hard to know,
when evaluating a new release,
32:42whether that's actually safe or not.
32:44so then that would mean that whenever we
would have an inconsistency due to a bit
32:49flip, which we would see all the time
on client devices, then we would have to
32:55categorize that and then bucket that out.
32:57So we would have a baseline
32:59expectation of how many bit flips there
are across all of the devices on Dropbox.
33:03And we would see that that's
staying consistent or increasing or
33:06decreasing, and that the number of
unexplained things was still at zero.
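As a rough sketch of what such a checker can look like, assuming a hypothetical `list_subtree` accessor for each hop (disk, local database, server view):

```python
import hashlib

def tree_fingerprint(entries):
    """Hash a sorted (path, content_hash) listing so two views of the
    same subtree can be compared with a single value."""
    h = hashlib.sha256()
    for path, content_hash in sorted(entries):
        h.update(path.encode())
        h.update(content_hash.encode())
    return h.hexdigest()

def check_sampled_path(path, views):
    """Compare one randomly sampled subtree across every hop.

    `views` maps a hop name ('disk', 'local_db', 'server') to an object
    with a hypothetical list_subtree(path) -> [(path, hash)] method.
    Any mismatch counts toward the unexplained-inconsistency metric,
    which operationally should sit at zero."""
    prints = {name: tree_fingerprint(v.list_subtree(path))
              for name, v in views.items()}
    if len(set(prints.values())) > 1:
        return ("inconsistent", path, prints)
    return ("consistent", path, None)
```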
33:10Now let's take one of those detours,
since you got me curious.
33:13Uh, what would cause bit
flips on a local device?
33:16I think there are a few causes. One of them
is just that in the data center, most
33:20memory uses error correction and you have
to pay more for it, usually have to pay
33:24more for a motherboard that supports it.
33:26At least back then.
33:27Now, on client
devices we don't have that.
33:30So, this is a little bit above
my pay grade for hardware,
33:34cosmic rays or thermal noise or whatever,
33:36but memory is much more
resilient in the data center.
33:40I think another is just that storage
devices vary greatly in quality.
33:44Like your SSDs and your hard drives are
much higher quality inside the data
33:49center than they are on local devices.
33:51And so,
33:53you know, there's that.
33:54It also could be, like I had
mentioned, that people have all
33:57types of weird configurations.
33:59Like, on Mac there are all these
kernel extensions; on Windows, there's
34:03all of these minifilter drivers.
34:05There are all these things
that are interposing between
34:07Dropbox, the user-space process,
and writing to the file system.
34:11And if those have any memory safety
issues where they're corrupting memory,
34:15'cause they're written in archaic C,
you know, or something, that's
34:19the way things can get corrupted.
34:20I mean, we've seen all types of things.
34:22We've seen network routers
corrupting data, but usually
34:26that fails some checksum, right?
34:28Or we've even seen registers on CPUs
being bad, where the memory gets replaced
34:33and the memory seems like it's fine, but
then it just turns out the CPU has its
34:38own on-chip registers that are busted.
34:40And so all of that stuff I
think just can happen at scale.
34:44Right.
34:45That makes sense.
34:45And I'm happy to say that I haven't
yet had to worry about bit flips, whether
34:51it's for storage or other things,
but huge respect to whoever has had
34:56to tame those parts of the system.
34:59So, you mentioned the consistency check
as probably the biggest lever that you
35:05had to understand what health state
your sync engine is in, in the first place.
35:11Was this the only kind of metric and
proxy for understanding how well
35:18the sync system is working, or were
there some other aspects that gave
35:22you visibility, both macro and micro?
35:26Yeah, I mean, there were
the hangs, so knowing
35:30that something gets to a synced state
and knowing the duration, right?
35:33So the kind of performance of that
was one of our top line metrics.
35:38And the other one was
this consistency check.
35:40And then, for specific
operations, right,
35:43like uploading a file: how much
bandwidth are people able to use?
35:47Because people wanted to
use Dropbox to upload lots of data,
35:53like a huge number of
files where each file is really large.
35:57And then they might do it in
Australia or Japan, where they're
36:01far away from a data center.
36:03So latency is high, but bandwidth
is very high too, right?
36:06So, making sure that we could
fully saturate their pipes, and all
36:09types of stuff with debugging
36:12things in the internet, right?
36:13People having really bad
routes to AWS and all that.
36:16So we would track things like that.
36:18I think other than that it was
mostly just the usual quality stuff,
36:20like just exceptions and making
sure that features all work.
36:25I think when we rewrote this system,
we designed it to be very correct.
36:30We moved a lot of these things into
testing before we would release.
36:35So, to
jump ahead a little bit, we
36:38decided to rewrite Dropbox's sync engine
from this big Python code base into Rust.
36:45And one of the specific design decisions
was to make things extremely testable.
36:49So we would have everything be
deterministic on a single thread,
36:53have all of the reads and rights
to the network and file system,
36:56be, through a virtualized API.
36:59So then we could run all of these
simulations of exploring what would
37:03happen if you uploaded a file here and
deleted it concurrently and then had a
37:08network issue that forced you to retry.
37:10And so by simulating all of those in
CI, we would be able to then have very
37:14strong invariants about them,
knowing that, like, a file should never
37:18get deleted in this case, or that
it should always converge, or, for
37:21sharing, that this file should
never get exposed to this other viewer.
37:26I think
having stronger guarantees was something
37:31that we only could really do effectively
once we designed the system to make
37:36it easy to test those guarantees.
37:38Right.
37:39That makes a lot of sense.
37:40And I think we're seeing more
and more systems, also in the
37:43database world, embrace this.
37:45I think TigerBeetle
is quite popular for that.
37:49I think the folks at Turso are
now also embracing this approach.
37:54I think it goes under the
umbrella of simulation testing.
37:57that sounds very interesting.
37:58Can you explain a little bit more how,
maybe in a much smaller program, this
38:03would basically work? Is it that every
assumption and any potential branch,
38:08any sort of side effect that might
impact the execution of my program,
38:13I now need to make explicit, so it's
almost like a parameter that I put into
38:19the arguments of my functions? And now I
call it under these circumstances, and I
38:25can therefore simulate: oh, if that file
suddenly gives me an unexpected error,
38:31then this is how we're gonna handle it.
38:33Yeah, exactly.
38:34So, there's techniques
that, like the TigerBeetle folks,
38:38we do this at Convex in Rust, with the
right abstractions, there are
38:42techniques to make it not so awkward.
38:45But yeah, it is this idea of, like,
can you pin all of the non-determinism in
38:50the system, whether it's reading
from a random number generator, whether
38:54it's looking at time, whether it's reading
and writing to files or the network?
38:58Can that all be pulled out, so
that in production it's just using the
39:04real random API or the regular APIs for it?
39:07so there's like for any of these
sync engines, there's a core
39:10of the system which represents
all the sync rules, right?
39:13Like when I get a new file
from the server, what do I do?
39:16You know, if there's a concurrent
edit to this, what do I do?
39:19and that I. Core of the code is often
the part that has the most bugs, right?
39:23It has the, it doesn't think about
some of the corner cases or if
39:27there are errors or needs retries
or doesn't handle concurrency.
39:30It might have race conditions.
39:30So I think the core idea
of deterministic
39:36simulation testing is to take that core
and just pull out all of the
39:43non-determinism from it into an interface.
39:45So time randomness, reading and
writing to the network, reading
39:49and writing to the file system, and
making it so that in production,
39:52those are just using the regular APIs.
39:55But in a testing situation,
those can be using mocks.
39:59Like, they could be using things
where a particular test
40:02wants to test a scenario by
setting it up in a specific way.
40:06Or it could be randomized, right?
40:09Where, when reading from,
like, time, the test framework might
40:14decide pseudo-randomly to advance it
or to keep it at the current time, or
40:18might serialize things differently.
40:21And that type of ability to have random
search explore the state space of
40:27all the things that are possible is
just one of those like unreasonably
40:30effective ideas, I think for testing.
40:33And then, getting a
system to pass that type of
40:37deterministic simulation testing,
40:39it's not at the threshold of having
formal verification, but in our
40:42experience it's pretty close and with
a much, much smaller amount of work.
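In Python terms, a minimal sketch of that shape might look like this; the engine and its `converged` flag are hypothetical, and the point is only that a single seed pins every source of non-determinism:

```python
import random

class SimEnv:
    """Test double for the environment interface: one seed fully
    determines time, randomness, and injected I/O failures, so any
    failing run can be replayed exactly from its seed."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.clock = 0.0
        self.files = {}

    def now(self):
        # the simulator, not the OS, decides how far time advances
        self.clock += self.rng.choice([0.0, 0.001, 1.0])
        return self.clock

    def read_file(self, path):
        if self.rng.random() < 0.05:
            raise OSError("injected transient I/O failure")
        return self.files[path]

def fuzz(run_sync_engine, trials=1000):
    """Random search over the state space: run the single-threaded
    engine against many seeded environments and check invariants."""
    for seed in range(trials):
        final_state = run_sync_engine(SimEnv(seed))  # hypothetical engine
        assert final_state.converged, f"replay with seed={seed}"
```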
40:48And you mentioned
Haskell at the beginning?
40:50I still remember, after a lot of
time spent writing unit tests in
40:55JavaScript, and back then, in the other
order, I first had JavaScript and then I
41:00learned Haskell, and then I found
QuickTest, or was it QuickCheck?
41:05Which one was it?
41:06I think it was QuickCheck, right?
41:07Well, right.
41:08So I found QuickCheck, and I could express
sort of, like, hey, this is this type,
41:13it has those aspects to it,
those invariants, and it would just
41:18go along and test all of those things.
41:20Like, wait, I never thought
of that, but of course, yes.
41:23And then you combine those, where you
would be way too lazy to write unit
41:27tests for the combinatorial explosion
of all of your different things.
41:32And then you can say, sample it
like that, and like, focus on this.
41:36And so I actually also started
embracing this practice a lot more in the
41:40TypeScript work that I'm doing, through
a great project called Prop Check.
41:45And that is picking up the same
ideas, particularly for those
41:52sorts of scenarios where, okay,
Murphy's Law will come and haunt you.
41:56In distributed systems,
41:58that is typically the case.
42:00Building things in such a way where
all the aspects can be specifically
42:05injected, and hitting the sweet spot
42:07where you can do so in an ergonomic
way, I think that's the way to go.
42:13It's so, so valuable, right?
42:15And yeah,
the ability, for proptest,
for QuickCheck, for all of these, to
42:20also minimize is just magical, right?
42:23Like, it comes up with this crazy
counterexample, and it might be
42:27a list with 700 elements, but
then it's able to shrink it down to
42:31the, like, real core of the bug.
42:33It's magic, right?
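In Python, the same idea lives in the Hypothesis library; here is a small real example of a property test with automatic shrinking, using a toy merge function as the system under test:

```python
from hypothesis import given, strategies as st

def merge(a, b):
    """Toy reconciliation: the merged view of two edit sets."""
    return sorted(set(a) | set(b))

@given(st.lists(st.integers()), st.lists(st.integers()))
def test_merge_is_commutative(a, b):
    # Hypothesis generates inputs, and when a case fails it shrinks
    # the counterexample down to a minimal one automatically.
    assert merge(a, b) == merge(b, a)
```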
42:35And you know, I mean, I think
this is something,
42:38a totally different theme, right?
42:40One thing at Convex we're exploring
a lot is, like, coding has changed a lot
42:44in the past year with AI coding tools.
42:46And one of the things we've observed
for getting coding tools to work very
42:50well with Convex is that these types
of like very succinct tests that can
42:54be generated easily and have like a
really high strength to weight or power
42:59to weight ratio are just really good
for like autonomous coding, right?
43:03Like, if you are gonna take the
Cursor agent and let it go wild,
43:06what does it take to just let it
operate without you doing anything?
43:10It takes something like a property test,
because then it can just continuously
43:13make changes, run the test, and know it's
not done until that test passes.
43:18Yeah, that makes a lot of sense.
43:20So let's go back for a moment to the
point where you were just transitioning
43:25from the previous Python-based sync
engine to the Rust-based sync engine.
43:32So you're embracing simulation
testing to have a better sense of
43:36like all the different aspects that
might influence the outcome here.
43:41Walk me through how you went about
43:44deploying that new system.
43:46Were there any sort of big headaches
associated with migrating from the
43:52previous system to the new system?
43:54Since, for everything, you
had sort of a de facto source
43:57of truth, which are the files,
43:59could you maybe just forget everything
the old system has done and
44:04treat it as, like, oh, the user would've
just installed this fresh? Walk me
44:09through how you thought about
that, since migrating systems on such
44:14a big scale is typically quite dreadful.
44:17Yeah, dreadful is, yeah, the
44:19appropriate word.
44:20I think one of the biggest challenges was
that by design we had a very different
44:26data model for the old sync engine.
44:29We called it Sync Engine Classic,
44:31affectionately.
44:32And then we had Nucleus, which was the new one.
44:34Nucleus had a very different data model,
and the motivation for that was that
44:40Sync Engine Classic just had a ton of
possible states that were illegitimate.
44:46If you had, like, the server
update a file and the client update
44:50a file, but then a shared folder gets
mounted above it, things could get
44:54into all of these really weird states
that were legal but would cause bugs.
45:00And then I think that was like one
of the big guiding principles more
45:04than even just like Rust or Python,
was just like designing what states
45:09should the system be allowed to be
in and design away everything else,
45:14make illegal states unrepresentable.
45:17And what that then
meant is, once we had that,
45:21when we needed to migrate, we had a long
tail of really weird starting positions.
45:27Where you basically realized, okay,
this system is in this state; A, how the
45:33heck did it ever get into that state?
45:35And B, what are we gonna do about
it now? Where, basically,
45:40it's like, for a mapping function,
45:44this is invalid input.
45:44So can you explain a little bit how
you constrained, and how
45:49you designed, the space of legitimate,
valid states? And,
45:56if you think about this as a big
matrix of combinations, what are some
46:00of the more intuitive ones that were
not allowed, that you saw quite a bit?
46:06Yeah, so I think part of the difficulty
for Dropbox, syncing things
46:13from the file system, is that file
system APIs are really anemic.
46:17File system APIs don't have transactions.
46:19Things can get
reordered in all types of ways.
46:23So we would just read and write to
files from the local file system, and
46:26we would use file system events on
Mac, and we would use the equivalent on
46:30Windows and Linux to get updates.
46:32But everything can be reordered
and racy and everything.
46:36So one common invariant
would be that, if you have a
46:40directory, you know, files
have to exist within directories.
46:44If a file exists, then its
parent directory exists.
46:48And like simultaneously, if you
delete a directory, it shouldn't
46:51have any files within it.
46:53And that invariant guarantees
that the file system is a tree.
46:57Right?
46:58And it's very easy to come
up with settings, with reads from the
47:03local file system where if you just
naively take that and write it into
47:07your SQLite database, you will end up
with data that does not form a tree.
47:12And then especially even with, like,
inodes being unique, right?
47:16Like, if I move a file from A to B, then
I might observe the add for it at B
47:23way before the delete at A, or I might
observe it vice versa, where the file
47:28is transiently gone and disappeared, and
we definitely don't wanna sync that.
47:31And then with directories: if I have
A as a directory and B as
47:37a directory, and then I move them, I
could observe a state where A moves into
47:43B, which then, without doing the right
bookkeeping, might introduce a cycle in
47:48the graph, and a cycle for directories
would be really bad news, right?
47:52So all of these invariants were things
that the file system APIs don't
47:57respect, even though the file system
internally has these invariants, right?
48:01You cannot create a directory
cycle on any file system.
48:05Definitely.
48:05I mean, certainly not without root. And
all of these invariants exist but
48:09are not observable through the APIs.
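A small sketch of enforcing those invariants when ingesting raw events; the event fields here are hypothetical, and the real system's bookkeeping was far more involved:

```python
def is_tree(parents):
    """parents: node_id -> parent_id (None for the root namespace).
    Checks both invariants: every parent exists, and no directory cycle."""
    for node, parent in parents.items():
        seen = {node}
        while parent is not None:
            if parent not in parents:
                return False  # orphan: parent directory is missing
            if parent in seen:
                return False  # directory cycle
            seen.add(parent)
            parent = parents[parent]
    return True

def apply_event(parents, node, new_parent):
    """Apply a possibly reordered file system event only if the result
    is still a tree; otherwise defer it until its racing events arrive."""
    candidate = dict(parents)
    candidate[node] = new_parent
    if not is_tree(candidate):
        return parents, "deferred"
    return candidate, "applied"
```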
48:12And so Sync Engine Classic
would get into the state where its
48:16local SQLite file would have
all types of violations like that.
48:20So then, how do we read the tea
leaves when the database is in
48:24a really weird state we can't lose?
48:26And to go back to, I think what you had
talked about at the beginning of this was
48:30that we always had the nuclear option of
dropping all of our local state and doing
48:36a full resync from the files themselves.
48:39But then the problem is that we
would entirely lose user intent.
48:42So if, for example, I was offline for
a month and I had a bunch of files,
48:48and then during that month other
people in my team deleted those files.
48:53If I came back online and didn't have
my local database, we would have to
48:58recreate those files, and people would
complain about this all the time, because
49:03they would delete something, wanting it
deleted, and then Dropbox would
49:05just randomly decide to resurrect it.
49:07So those types of decisions, we tried
to avoid them as much as possible, but
49:12then that meant having to look at a
potentially really confusing database and
49:17read what the user intent might have been.
49:19Right.
49:20I wanna dig a little bit more
into the topic of user intent.
49:24Since with Dropbox you've built a sync
engine very specifically for the use
49:30case of file management, et cetera, where
user intent has a particular meaning that
49:36might be very different from moving a
cursor around in a Google Docs document.
49:41So can you explain a little bit, what
are some of the common, and
49:47maybe subtle, scenarios of user intent
when it comes to the Dropbox design space?
49:55Yeah, totally.
49:56And I think, for regular
things like, say, editing files,
50:01I think we saw that people just
generally did not, maybe because
50:06of the way the system was, even
its capabilities, people did not
50:09edit the same files all too often.
50:12So for maintaining user intent
when everyone is online, we were just
50:17taking last writer wins. Where I think
user intent became very interesting is
50:21if someone went offline, like they're on
an airplane, before wifi on airplanes,
50:27and they worked on their document, and
someone else worked on it at the same time.
50:31In that case, we observed that users
always wanted to see the conflicted
50:35copy, and that they wanted to get
the opportunity to say, like,
50:39I put in a lot of effort into working
on this when I was on the plane.
50:43Someone else put in probably a similar
amount of effort when they were online.
50:48And, you know, so last-writer-wins policies
50:50there violated user expectations
quite a lot, because one person
50:55had to win, and then the person
who lost would be really upset.
50:58so I think those were pretty interesting.
51:00I think with moves, like with more
metadata operations, I think people
51:05were a little bit more permissive.
51:06Like, if I moved something from one
folder to another, and another person
51:10moved it to a different folder,
51:12having it just converge on
something, as long as it converges,
51:15we observed that people
didn't worry about it too much.
51:18I think the place where user
intent is really interesting
51:21with moves is with sharing.
51:23So I think thinking about this
from like the distributed systems
51:26perspective on causality: someone
might have, like,
51:31I dunno, their HR folder, right?
51:33And, let's say that
someone is transferring to the HR team, so
51:38they're getting added to the HR folder.
51:41But then say before they were
on the team, they were on a
51:44performance improvement plan.
51:46So then the administrator for HR
would delete that file, make sure it's
51:50deleted, and then add them to the folder.
51:54And so their user intent is
expressed in a very specific
51:59sequencing of operations, right?
52:01That like this causally depended on this.
52:04I would not have invited 'em to the folder
unless the delete was stably synced.
52:08And that making sure that gets
preserved throughout the system,
52:12even when people are going online
and offline and everything is a very
52:16hard distributed systems problem.
52:18Right.
52:18And it was intimately related
with the details of the product.
52:22Right.
52:23Yeah.
52:23How did you capture that causality
chain of events since you probably also
52:29couldn't quite trust the system clock?
52:32How did you go about that?
52:34Yeah, this became even
more difficult, right?
52:36Where file system metadata was partitioned
across many shards in the database.
52:41So then we ended up using something like
Lamport timestamps, where every single
52:45operation would get assigned a timestamp.
52:47And those timestamps usually involved
only reading and writing to their
52:50particular shard, and whatever
timestamp the client had observed.
52:55But then in these cases where there
were potentially cross-shard, not
52:59transactions, but causal
dependencies, we would be able to say,
53:03like, the operation to mount this, or
to add someone to the shared folder
53:07and then them mounting it within
their file system, has to have a higher
53:11timestamp than any write within that folder,
53:15writes including deletes.
53:16So then, that way, when the client is
syncing, it would be able to know that when
53:21I am merging operation logs across all of
the different shards, I need to assemble
53:26them in a causally consistent order.
53:29And that would then respect all
of these particular invariants.
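Sketched in Python, with details invented for illustration: each shard keeps a Lamport clock, a writer ticks past whatever timestamps it has observed, and a client merges shard logs in timestamp order:

```python
class Shard:
    """One metadata shard with a Lamport clock over its operation log."""
    def __init__(self):
        self.clock = 0
        self.log = []  # (timestamp, op)

    def write(self, op, observed=0):
        # Lamport rule: the new timestamp exceeds anything already seen,
        # including timestamps observed on *other* shards, which is how
        # a cross-shard causal edge (delete before mount) is encoded.
        self.clock = max(self.clock, observed) + 1
        self.log.append((self.clock, op))
        return self.clock

def merged_log(shards):
    """Client-side merge: replaying in timestamp order yields a
    causally consistent view across shards."""
    return sorted((e for s in shards for e in s.log), key=lambda e: e[0])

# Example: the mount on the sharing shard is forced above the delete on
# the folder's shard, so no client ever replays the mount before the delete.
folder, sharing = Shard(), Shard()
t_delete = folder.write("delete /hr/pip.doc")
sharing.write("mount /hr for new member", observed=t_delete)
print(merged_log([folder, sharing]))
```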
53:33Right.
53:34So, you thought through those
different scenarios for Dropbox and
53:38made very intentional design decisions:
that, for example, in one scenario,
53:43last writer wins is not desirable,
53:46since that might lead to a very sad
person stepping off the plane because
53:51all of their data is suddenly gone,
or the other person's data is gone.
53:55So you made very specific
design trade-offs here when it
comes to somehow squaring the
54:03circle of distributed systems.
54:03What sort of advice would you have for
application developers, or people even
54:08who are sitting inside of a company
and are now thinking, oh, maybe
54:12we should have our own Dropbox-style,
Linear-style sync engine internally?
54:17What sort of advice would you give them
54:21when they start thinking this through in detail?
54:23Yeah, I'll talk through kind of how we
structured things at Dropbox to be able
54:28to navigate these types of problems.
54:30And I think the patterns
here can be quite general.
54:33I think where we ended up was
thinking that distributed
54:37systems syncing is hard, right?
54:40So we would have the kind of base layer
of the sync protocol and how state
54:45gets moved around between the clients
and the servers and all the shards.
54:49We would have very strong
consistency guarantees there.
54:52So we would not use any of the
knowledge of the product at that layer.
54:57So, thinking of Dropbox and
the file system as a CRDT:
55:03Dropbox allows moves
to happen concurrently.
55:06It allows you to add something
while another thing is happening.
55:10But at the protocol level,
we kept things very strict.
55:12We kept them very close to being
serializable, so that every view of the
55:17system was identified by a very small
amount of state, like a timestamp.
55:21And that would fully determine the
state of the system, and the
55:24amount of entropy in that was very low.
55:26And then whenever you are modifying
it, you would say, here's what I expect
55:30the data to be, and if it doesn't match
exactly, it will reject the operation.
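[Editor's note: that modify path is essentially a compare-and-swap. Below is a minimal TypeScript sketch of the pattern under assumed names; the real Dropbox protocol is of course far more involved.]

```ts
// Compare-and-swap sketch of the strict base layer: every write
// states the view (timestamp) it was computed against, and the
// server rejects it if the data has moved on. Names are
// illustrative, not the actual Dropbox protocol.

interface ServerState {
  timestamp: number;
  entries: Map<string, string>; // path -> content hash
}

interface Mutation {
  expectedTimestamp: number; // the view the client observed
  path: string;
  newHash: string;
}

function applyMutation(
  state: ServerState,
  m: Mutation
): { ok: true; timestamp: number } | { ok: false; current: number } {
  // Strict check: the client must have seen the latest state exactly.
  if (m.expectedTimestamp !== state.timestamp) {
    return { ok: false, current: state.timestamp };
  }
  state.entries.set(m.path, m.newHash);
  state.timestamp += 1;
  return { ok: true, timestamp: state.timestamp };
}

// On rejection, the client pulls the latest state, re-applies its
// product-level merge policy (which can change freely over time),
// and retries. The distributed-systems core never changes.
```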
55:34And then by structuring things
in that way, we made it very easy
55:39for product teams, and even for us
working on sync, to embed all of these
55:45looser, more product-focused requirements,
55:47which also may want to change over time,
into the endpoints layered on top.
55:51So every time we wanted to change a policy
on how, like, a delete reconciles with,
55:57you know, an add for a folder or something,
55:59we didn't have to solve any distributed
systems problems to do that.
56:03So I think that pattern of asking:
is there a good abstraction?
56:07Is there something very
powerful that could solve a large
56:11class of problems? Doing that well at
the lowest layer, and then potentially
56:16weakening the consistency above it.
56:19I actually really like how the Rocicorp
folks have a really great description of
56:24their consistency model for Replicache
as being session-plus consistency.
56:29And it's a very similar idea,
where when we build things on
56:34a platform, we may, with our
product hats on, want users to
56:38not have to think about conflicts and
merging and all that in a lot of cases.
56:42But those decisions might be
very particular to our app,
56:45and not something that holds
for everything on the platform.
56:48And then there's always a way to
embed those decisions onto, say,
56:52session consistency in Replicache
or serializability in other systems.
56:57And so I think that
separation of concerns
57:00is something that can
apply to a lot of systems.
57:04Right.
57:04So maybe we use this also as a transition
to talk a bit more about what you're
57:09now designing and working on: Convex.
57:12What were some of the key insights that
you've taken with you from Dropbox that
57:19ultimately led to you co-founding Convex?
57:22Yeah, when we first were starting
Convex we were looking at how apps
57:27are getting built today, right?
57:28Like web apps are easier
to build than ever.
57:32Even in 2021, it's incredible
how much more productive
57:37that was compared to 10 years before.
57:39Right.
57:40And I think we noticed that
the hard part for so many discussions
57:45was managing state and how
state propagates. I think it was from
57:50the Riffle paper, right, on how
so many issues in app development
57:54are kind of database problems in
disguise, and how techniques
57:58from databases might be able to help.
58:00So with Convex we were saying like, well
if we start with the idea of designing
58:05a database from first principles, can we
apply some of those database solutions
58:10to things across the whole stack?
58:12So say, for example, when I'm reading
data within my app, I have
58:17all of these React components that are
all reading different pieces of data.
58:21It'd be really nice if all of them
just executed at the same timestamp,
58:24and I never had to handle consistency
issues where one component knows
58:29about a user and the other one doesn't.
58:31Similarly, why isn't it possible
that I just use queries across
58:36all my components and they just all
live update? Whenever I read anything,
58:40it's automatically reactive.
58:42So those were some of
the initial kind of thought
58:46experiments for what led to Convex.
58:48I think the other one was
really motivated from our time at
58:52Dropbox, and I think it's
kind of both a blessing and a curse.
58:56One of the key
design decisions for Convex is
58:59that Convex is very opinionated
about there being a separation
59:03between the client and the server.
59:05So we saw this at Dropbox, where they
were just different teams, right?
59:09And you know, as we've seen with
even the origin of GraphQL, right?
59:13That ability to
decouple development between
59:16teams working on user-facing features
and the way that the data fetching
59:20is implemented on the backend
is really powerful.
59:23And so the thought
experiment with Convex is: can we
59:27maintain a very strong separation while
still getting live updating, while
59:32still getting really good ergonomics,
both for consuming data on the client
59:36and for fetching it on the server?
59:39Right.
59:39So yeah, walk me through the
evolution of Convex a bit more then,
59:44in terms of all the other
options that are out there for
59:49state management. I think what most
applications are using is probably
59:55something that, at least to some degree, is
somewhat customized and hand-rolled and
1:00:01comes with its own huge set of trade-offs.
1:00:05Help me better understand the
1:00:08opinionated nature of Convex
that you mentioned.
1:00:11What are the benefits of that?
1:00:13What are the downsides of
that, and other implications?
1:00:16Yeah, so when you write an app
on Convex, we can use maybe
1:00:20a basic to-do app, right?
1:00:22The Linear clone everyone does.
1:00:24You write endpoints like
you might be used to, right?
1:00:26Like, list all the to-dos in a
project, update a to-do in a project.
1:00:31And those get pushed as your
API to your Convex server.
1:00:35The implementations of that API can
then read and write to the database,
1:00:39and Convex has, kinda like Mongo
or Firebase, an API for doing so.
1:00:44I think the main benefit then of
Convex, relative to more traditional
1:00:48architectures, is that if you're on the
client, the only thing you need to do
1:00:53is call the useQuery hook.
1:00:56You're saying, I am looking at a
project, I just do a useQuery on
1:01:01list tasks in project. That will then
talk to the server, run that query, but
1:01:07then also set up the subscription, and
then whenever any data that query
1:01:12looked at changes, it will efficiently
determine that and push the update.
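[Editor's note: for a concrete feel, here is roughly what that looks like, based on Convex's public API. The table name "tasks", the index "by_project", and the field names are assumptions for this example, not from the conversation.]

```ts
// convex/tasks.ts — server side, runs inside Convex.
import { query } from "./_generated/server";
import { v } from "convex/values";

export const list = query({
  args: { projectId: v.id("projects") },
  handler: async (ctx, args) => {
    // Read from the database; the reads this query performs also
    // define its subscription set.
    return await ctx.db
      .query("tasks")
      .withIndex("by_project", (q) => q.eq("projectId", args.projectId))
      .collect();
  },
});
```

```tsx
// TaskList.tsx — client side. useQuery runs the server query and
// subscribes: when any data the query read changes, it re-renders.
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";
import { Id } from "../convex/_generated/dataModel";

export function TaskList({ projectId }: { projectId: Id<"projects"> }) {
  const tasks = useQuery(api.tasks.list, { projectId });
  if (tasks === undefined) return <p>Loading…</p>;
  return (
    <ul>
      {tasks.map((t) => (
        <li key={t._id}>{t.text}</li>
      ))}
    </ul>
  );
}
```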
1:01:16So part of what has been nice
with Convex is that you are getting
1:01:21a client that has a web socket
protocol, it has a sync engine built in.
1:01:26You're getting infrastructure for
running JavaScript at scale and for
1:01:30handling sandboxing and all of that.
1:01:32And then you're also getting a
database, which, you know,
1:01:36one, supports transactions
for reading and writing to it.
1:01:39But it also supports this
efficient subscription: I ran this
1:01:43query, this query
just ran a bunch of JavaScript,
1:01:47it looked at different rows
and it ran some queries,
1:01:51and the system will automatically and
efficiently determine if any write overlaps with that.
1:01:56So the combination of all of those
things is part of the benefit of
1:01:59Convex: you just write TypeScript, and
you write it in a way that feels
1:02:03very natural, and everything just works.
1:02:07And I think some of the downsides are
that it is a different set of APIs.
1:02:13It's not using SQL; it's doing
things a little bit differently
1:02:16than they've been done before.
1:02:18And yeah, it's kind of interesting
even today to see, you know,
1:02:23talking about AI codegen, right?
1:02:24Models have been
pre-trained on this huge corpus
1:02:28of stuff on the internet.
1:02:29When are they good at
adopting new technologies,
1:02:32technologies that might be
after their knowledge cutoff?
1:02:35And when is it better for them just
to stick to things that they know already?
1:02:39Right.
1:02:39So, what you've mentioned before, where
you say Convex is rather opinionated: for me,
1:02:45let's say five years ago,
I might've been much more
1:02:49like, oh, but maybe there's a
technology that's less opinionated
1:02:53and I can use it for everything.
1:02:54But the more experience I got,
the more I realized, no, actually,
1:02:58I want something that's very
opinionated, but opinionated
1:03:02in a way where I share those opinions,
1:03:04where they fit exactly my use case.
1:03:06So I think that is much better.
1:03:08This is why we have different technologies
and they are great for different
1:03:12scenarios, and I think the more a
technology tries to say, no,
1:03:17we're best for everything, the
less it's actually good at anything.
1:03:23And so I greatly appreciate you
standing your ground and saying,
1:03:26hey, those are the design
decisions that we've made,
1:03:31and those are the use cases where
you'd be really well served building
1:03:35on top of something like Convex.
1:03:37And I particularly like it for now, where
TypeScript is really the default
1:03:42language to build full-stack applications.
1:03:45And it's also increasingly
becoming the default for
1:03:48AI-based applications as well.
1:03:51And AI-based systems speak TypeScript
just as well as English.
1:03:57And Convex makes
that full stack super easy.
1:04:02Also, I think, when
you build local-first apps, it can
1:04:07sometimes get really tricky because
you empower the client so much.
1:04:11You give the client so much
responsibility, and therefore there are
1:04:15many, many things that can go wrong.
1:04:17And I think Convex therefore takes
a more conservative approach and says,
1:04:21hey, everything that happens on
the server is highly privileged,
1:04:25and this is your safe environment.
1:04:27And the client will try to give
you the best user experience and
1:04:31developer experience out of the box.
1:04:33But the client could be in a
more adversarial environment.
1:04:37And I think those are
great design trade-offs.
1:04:40So I think that is a fantastic foundation
for tons of different applications.
1:04:45Yeah.
1:04:46Talking about some of these
strong opinions being both
1:04:49blessings and curses, right?
1:04:50Over the past few months, one
thing we've been working on is trying
1:04:54to bridge the gap between those
two points on the spectrum, right?
1:04:58We wrote a blog post on it a few months
ago, working on what we're calling
1:05:02our object sync engine, trying to
take a lot of the principles from more of
1:05:08a local-first type approach: having a
data model that is synced to the client,
1:05:14where the only interaction between the
server and the client is through the sync.
1:05:18And the client then can always render
its UI just looking at the local
1:05:22database, and it can be offline.
1:05:24It also fully describes the
app state, so it can be exported
1:05:28and rehydrated or whatever.
1:05:29It's a very interesting design exercise
we've been on, to say: can
1:05:33you structure a protocol and a sync
engine in a way such that the UI
1:05:39is still reading and writing to a
local store that is authoritative?
1:05:43But then that local store, to
use Electric SQL terminology,
1:05:47is a shape, some mapping of a
strongly separated server data model.
1:05:52So we still have a client data model
and a server data model, which might be
1:05:56owned by different teams and evolve
independently, and we also have that
1:06:01strong separation where the implementation
of the shape is privileged, running
1:06:06on the server, with authorization rules
built in, and we get the best of both worlds.
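[Editor's note: as a purely hypothetical sketch of that idea (none of these names come from Convex's actual beta), a shape might look like a privileged, server-defined mapping from the server data model to the document the client syncs and renders from.]

```ts
// Hypothetical sketch of a server-defined "shape". All names are
// invented for illustration; this is not Convex's real API.

type UserId = string;

interface ServerMessage {
  id: string;
  channelId: string;
  authorId: UserId;
  body: string;
  deletedAt: number | null;
}

// The client data model: what actually lives in the local store.
interface ClientMessage {
  id: string;
  author: string; // display name, joined in on the server
  body: string;
}

// The shape runs on the server, so it can join, filter, and apply
// authorization rules before anything reaches the client.
function chatShape(
  viewer: UserId,
  messages: ServerMessage[],
  canRead: (viewer: UserId, channelId: string) => boolean,
  displayName: (id: UserId) => string
): ClientMessage[] {
  return messages
    .filter((m) => m.deletedAt === null && canRead(viewer, m.channelId))
    .map((m) => ({ id: m.id, author: displayName(m.authorId), body: m.body }));
}
```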
1:06:10And we have a beta
that we've not released publicly, though
1:06:16it's open sourced out there, but it's
kind of a thing where we're
1:06:19still figuring out the DX for it.
1:06:21And I think we have something
that algorithmically works,
1:06:24and the protocol works,
but it's kind of hard.
1:06:28Right.
1:06:28It kind of reminds me a lot of writing
GraphQL resolvers, of saying: how do I
1:06:32take the messages table from my chat app,
1:06:35when under the hood that might be
joining stuff from many different
1:06:39tables and filtering rows, or might
even be doing a full-text search
1:06:43query in another view or something?
1:06:45And coming up with the right
ergonomics to make that feel
1:06:48great for a day-one experience
1:06:50is something that we're
still working on, still
1:06:53kind of a research project,
right?
1:06:54Well, when it comes to data, there is no
free lunch, but I'd much rather have
1:06:58it be done in the order and sequencing
that you're going through, which is
1:07:03having a solid foundation that I can
trust and then figuring out the right
1:07:09ergonomics afterwards, since I think
there are many, many tools that start with
1:07:14great ergonomics, but later realize that
they're built on an unsound foundation.
1:07:19So when it comes to data, I want a
trustworthy foundation, and I think
1:07:24you're going about it in the right order.
1:07:26Hey, Sujay, I've been learning
so much about one of my favorite
1:07:31products of all time, Dropbox.
1:07:33I've learned so much about how the
sausage was actually made, how it evolved
1:07:39over time, and I'm really excited that
you got to share the story today and that
1:07:45many, me included, got to learn from it.
1:07:48Thank you so much for taking the
time and sharing all of this.
1:07:51Thanks for having me.
1:07:52This was super, super fun.
1:07:54Thank you for listening to
the localfirst.fm podcast.
1:07:56If you've enjoyed this episode and
haven't done so already, please
1:08:00subscribe and leave a review.
1:08:01Please also share this episode
with your friends and colleagues.
1:08:04Spreading the word about the
podcast is a great way to support
1:08:07it and to help me keep it going.
1:08:09A special thanks again to Jazz
for supporting this podcast.
1:08:13I'll see you next time.