#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance
00:00So I think it's a lot easier to
develop when your queries are
00:03co located with your components.
00:05So every component is responsible
for getting the data it needs.
00:08And this makes your app like
super composable, right?
00:11Like you can add and remove components
to your app or your component tree.
00:16And because they're responsible
for getting their data, like, you
00:19don't have to do any additional
wiring or prop drilling.
00:21Welcome to the localfirst.fm podcast.
00:24I'm your host, Johannes Schickling,
and I'm a web developer, a
00:27startup founder, and love the
craft of software engineering.
00:30For the past few years, I've been on a
journey to build a modern, high quality
00:34music app using web technologies.
00:37And in doing so, I've been falling down
the rabbit hole of local-first software.
00:41This podcast is your invitation
to join me in that journey.
00:44In this episode, I'm speaking to Matt
Wanlaw, a prolific local-first tool
00:48builder who's behind projects such
as Vlcn, CR-SQLite, and Materialite.
00:53Most recently, Matt also joined
Rocicorp to work on their new product.
00:57In this conversation, we go deep on
his projects covering CRDTs, SQLite,
01:02and incremental view maintenance.
01:04Before getting started, also a
big thank you to Rocicorp and
01:08Expo for supporting this podcast.
01:10And now my interview with Matt.
01:13Hey Matt, welcome to the show.
01:14How are you doing?
01:16Good.
01:16Yeah.
01:17Thanks for having me.
01:17I'm super excited to have you on the show.
01:20You and I have been collaborating
on a few projects now over the
01:23course of the last few years.
01:25Most notably, you've been
working on CR-SQLite and also
01:29Materialite most recently.
01:31But for those of you who don't
know Matt would you briefly
01:35mind introducing yourself?
01:36Yeah um, Matt Wonlaw, and I guess
I've been involved in the local-first
01:41community for about two years now.
01:43It originally started with me trying
to resurrect an ancient project
01:46of mine and trying to add, like,
multiplayer and sync between devices.
01:51And I saw, like, that's actually a pretty
hard problem, and there weren't, like,
01:54many off the shelf solutions, so then I
just, like, dove into this , and that's
01:59where all these projects like CR-SQLite,
Materialite and other things came from.
02:02Yeah, and I think this is also how
you and I have met quite a while ago.
02:07And before that, you've been
working at Facebook for many years.
02:10And even before that, I think you
had your first engineering role
02:13working on some software that by
definition had to be very local.
02:18There was no local-first yet.
02:20Would you mind telling us a little bit
more about what those first apps were that
02:25you've been working on professionally?
02:26Yeah, so like my first job out of
college was at a defense contractor
02:31called Lockheed Martin, but essentially
what we were doing was we were building
02:35software for submarines, so like U.
02:37S.
02:37Navy submarines, and yeah, obviously a
submarine has no internet connection,
02:41everything has to be self contained, but
even beyond that, so the software I built
02:45was for, Like these thick clients for the
workstations that sonar operators used.
02:50And, you know, there, there are servers
on the sub, you know, servers that
02:54provide storage and some that provide
the signal processing and whatnot.
02:58But like an extra requirement was, you
know, a sub is in a hostile environment.
03:03Things can break.
03:04So the work, workstations need to be able
to work even if all the servers were down.
03:08So, you know, they need to
hold the last 24 hours of data.
03:10So the sonar operators can, you know.
03:12If every other system's down, they
can at least work with the last
03:1624 hours of data that they had.
03:18So that, you know, I guess was my
first brief introduction to local
03:23apps that eventually do need to do
some sort of syncing once connections
03:26are re established and, you know,
reconcile the changes that were made.
03:29That's fascinating.
03:30Have you learned some patterns back then
that you still see out in the wild today?
03:36Or do you think the approach that
you've taken back then is entirely
03:39different compared to today?
03:41Uh, Back then the data
was very well segmented.
03:44So there was like never really conflicts.
03:46But event sourcing was the main pattern.
03:49So just have an event log of
everything that has occurred.
03:52And when connectivity is
reestablished, do you know.
03:55Play that event log forward
from, you know, the time you lost
03:57connection to the current time.
03:59Yeah.
04:00I mean, event sourcing, I feel
like is still a super active topic.
04:04I'm actually looking into it much more
closely myself for the use case of
04:09building Overtone, and I'm also exploring
to embrace it a bit more for like syncing
04:14and migration strategy for LiveStore.
04:16But I also think that there might
be a renaissance for event sourcing.
04:20We had it for a while with Redux in a
much more self contained space, but I
04:26think there's a, that's an interesting
topic for research and production as well.
04:31Curious to hear your thoughts
on event sourcing and how it
04:34could fit into local-first.
04:36I mean, I really like it.
04:37, you have a source of truth,
like of all the facts, all
04:41the things that ever happened.
04:43Whereas, you know, a regular
app or state or database that
04:46doesn't do event sourcing.
04:48You're losing all the
things that happen, right?
04:51And you just have this snapshot of
what things are right at this instant.
04:55So yeah there's no way to rewind.
04:57There's no way to, merge other people's
changes if they did come in the past.
05:01, so I guess event sourcing, you know,
probably fell out of favor in some
05:05ways because, you know, storage
requirements, like are you really
05:08going to store all the event logs?
05:10And then also, you know,
processing those event logs.
05:12You do eventually need to process
them into some you know, snapshot of
05:16state so your app can work with it.
05:18So I think, you know, people seeing
that, oh, I have to process the events
05:20and turn it into like these tables
anyway, like why even do the event logs?
05:23Why not just mutate in place?
05:25Yeah, I think in case anyone of the
listeners is interested in local-first
05:30and event sourcing, please get in touch.
05:32I think this is a very interesting
topic to research a little bit After
05:37building those apps for like submarines
which sounds super fascinating, you've
05:43at some point also decided to move
on and join Facebook nowadays, Meta.
05:49I'm curious to hear more what
you've been working on there and
05:52whether there was any intersection
with your local-first topics now.
05:58Yeah.
05:58So Meta,
05:59, I guess, like, it shaped the way I
approached the local-first problems after
06:04I left, but the problems there were very
, , not related to local-first at all.
06:08So I guess, yeah, what I worked on,
it was like three different teams, but
06:11they all kind of shared the same theme.
06:12And a lot of what I was doing, we had a
system called Download Your Information.
06:16So this is like any Facebook user
can go click a button and get
06:19an archive of all their data.
06:21So we'd have to like crawl
your entire Facebook graph
06:25and find Everything you own.
06:27So we call it the ownership graph.
06:29And then another thing one of my team's
built was the deletion framework.
06:33So this is like you delete your account.
06:35So go delete every post you've ever
uploaded, every like you've ever
06:38made, every comment, every like ad
you've ever run every message you've
06:43ever sent across all the products,
Instagram, Messenger, Facebook.
06:46What's I guess not WhatsApp.
06:47And then, yeah, so that, that was
like traversing this deletion graph.
06:50And then a third product, like,
variation on all these themes was
06:54like investigations and human review.
06:56So, like, obviously there's
lots of abuse on Facebook.
06:59And we had to have a way for, like,
every product that's ever created, like,
07:03Can we make their content reviewable
by like a human review team at Meta?
07:08So this is like, yeah, trying to fan out
from the content and traverse the graph
07:12of things associated with it for review.
07:14So I guess like, all this like
graph walking was very, goes
07:18very hand in hand with query
languages and also schematization.
07:23So, I guess I mentioned like
three different areas of like
07:27quote unquote compliance, right?
07:28Deletion, download your
information, and content review.
07:32Well, if you're a product team
at Meta, you don't want to
07:35be slowed down with having to
integrate into all these systems.
07:37You don't have to learn about them.
07:38You just want to like ship your product.
07:40So like our key focus was how
can we make it so developers can
07:43just make their product and not
have to understand these things.
07:46So we kind of, we built it
into the schema language.
07:49So as you're defining your schema for
your product, you can say like for a
07:53given edge, like this is an ownership
edge, or this is like a deletion edge.
07:57And then just like by declaratively
specifying all this stuff all these
08:01other systems could just work.
08:03And.
08:03Yeah, eventually when I pivoted to
local-first problem, I was like, wow, this
08:06like developer experience of everything
being declarative and the schema layer
08:12down, taking care of it for you maybe
we can do this for local-first too.
08:16I couldn't agree more.
08:17Did you eventually land on something
where you got those declarative
08:21benefits for some local-first problems?
08:24Yeah, so, the project I ended up
working on was called CR-SQLite,
08:29so I was, like, taking these
ideas and adding them to SQLite.
08:32So, yeah, SQLite's already, you know,
a relational database, and it's already
08:36pretty declarative in terms of you
can define your schema for your table,
08:40and you can specify an index, and you
can specify foreign key relations, and
08:43the database manages all that for you.
08:45But it didn't have any primitives
for collaboration, right?
08:49So, like What if I give you a copy of
my database and I have a copy of my
08:54own database and we both go offline
and we both make a bunch of changes?
08:58What if we want to like sync our
changes together or merge our DBs?
09:01So what I started adding to
SQLite was this concept of CRDTs.
09:05So you could say for a given table,
what type of CRDT should we modeled as?
09:10Like if it's a grow only set, or if
it's like an add or move set, like
09:15technical terms in CRDT literature.
09:17And then for specific columns
in the table, you could also say
09:21like, what sort of CRDTs should
be used to merge those columns.
09:24So yeah, the idea was any developer who
has an app backed by SQLite, they can
09:30just go in and make some schema changes.
09:33And then that app can become
collaborative by allowing merging
09:36databases with other collaborators.
09:39Got it.
09:39And I think the parallel there is that
many things about SQL can be also seen
09:44as rather declarative and the schema
modeling, et cetera, those is where
09:49I think what we gain with local-first
software is the the collaborative
09:53nature of the apps working together.
09:56And I think this is where you.
09:58Brought some of the nice experiences
that you've seen at Meta, where there's
10:02such a strong engineering culture for
having a great developer experience,
10:07bringing that to the new topic that
you were interested in, local-first.
10:12Meta had a really interesting
philosophy when I got there.
10:15I think like one of my first questions
I asked was like, oh, how do we like
10:18make sure people use our stuff that
our teams, our security infrastructure?
10:22And they're building things to make
sure Meta's code was more secure.
10:25That was my first team.
10:26And like, security at Lockheed
was always Everybody hated it.
10:29It was like, nobody wants to do it.
10:31And there's always this
like mandated thing.
10:33So I got to Facebook.
10:35I'm like, how are we, I'm on
security for, how are we going to
10:37make sure people use our stuff?
10:39And it was interesting.
10:40Like the manager at the time was just
like, yeah, like we can't do anything.
10:45Facebook's bottom's up.
10:46We can't do anything.
10:47Make sure anybody uses our stuff.
10:48The only way we can do it is make the
safe way, the default, and easiest way.
10:53Like, people are going to
choose the secure way because
10:55we've made it the easiest.
10:57And like, there's no other way
we can make them choose it.
10:59So yeah, there's a huge, always a huge
focus on everything we did from DevX.
11:03So, that like, developers
would just pick that solution.
11:07Because it was easiest, not necessarily
because it was secure, just because
11:09it was the easiest thing around.
11:10And then it was also secure.
11:12Which is like the nice added benefit.
11:14So after having worked for many years
at Facebook, you've later then started
11:19working on CR-SQLite, but you didn't
arrive at CR-SQLite immediately.
11:24I think you were for a little
bit also still working on
11:28bringing back an app you've been
working on before called Strut.
11:32So I'd be curious to hear a little
bit more about that and particularly
11:36given that you worked on it quite a
bit before you joined Facebook and then
11:41you started working on it again after.
11:43So tell me more about the app and
how that led you to local-first.
11:47Yeah.
11:47So it was my first, I guess,
web app I was building.
11:51So yeah, at Lockheed, I was doing
all these like Java rich clients,
11:54and I wanted to get into web dev.
11:56So I started building this
thing called Strut.io.
11:58It's a presentation editor.
12:00And I knew people's devices,
you know, they have storage and
12:03compute, and the browser had,
you know, some storage back then.
12:07And I didn't want to, like,
run servers for people.
12:09I just wanted them to be able to
work and edit presentations if
12:13they're online or offline and not
have to sign up for an account.
12:17Like you go to the site and you
can immediately start doing stuff.
12:20So yeah, like I guess without knowing it,
I was building a presentation editor that
12:25was, you know, local-first in some ways.
12:28And at that time, it was like 2011, 2012,
there was, you know, I guess the prequel
12:33to the local-first movement, there's
like remoteStorage.io and ownCloud.
12:37And I started getting involved with
those, but then, yeah, eventually
12:40I got the Facebook offer and I
just put Strut on the shelf to be
12:43resurrected sometime in the future.
12:45And then, yeah, when I was leaving
Meta or Facebook, yeah, I didn't
12:49know exactly what I was going to do.
12:50So I figured, oh, I can just resurrect
Strut.io real quick and you know,
12:54it'll bring in some side income.
12:56In the early days before I abandoned
it, it had like 10, 000 monthly actives.
12:59So I was like, Oh maybe a decade
later we can get back to that again.
13:03But yeah, as I resurrected it, kind
of the landscape had changed a bit.
13:06Like people had a lot more devices and
also multiplayer was an expectation where
13:12like, You know, Google Slides, right?
13:13You can have multiple
people on the same deck.
13:16So I was, you know, looking into how I
was going to solve that problem for Strut.
13:19And, you know, rather than I guess I've
been building frameworks for so long at
13:23Meta, like, rather than making this a one
off thing in Strut to solve the problem.
13:29I was like, Oh, like, can I
solve it at a lower level?
13:32And that's when I went down the rabbit
hole of CRDTs investigating SQLite,
13:36adding stuff to SQLite and so on.
13:40So the data architecture you had for
the first version of Strut around, like,
13:45yeah, before Facebook, you mentioned
like around 2012 to when you've left
13:50Facebook and what you've arrived at.
13:53Very different.
13:54So I'm curious, like how you went
from like whether you took rather
13:59some smaller steps or some much bigger
steps altogether to rethink the data
14:05architecture and how you went about that.
14:07Yeah, I mean, I guess beyond just
collaboration, I wanted to support
14:12Yeah, arbitrarily large presentations.
14:15So right, the original version of Strut,
it would load everything off disk and
14:19into memory, and then when you're closing
the tab, or periodically, it would save.
14:23A big dump of memory to disk and yeah,
I think that works fine, probably
14:27for the vast majority of cases.
14:29But I wanted to be able to handle
presentations with hundreds of slides
14:33and tons of binary content or like images
and videos and all sorts of things.
14:39So like at first I was like, okay, how do
I like lazily load this stuff off of disk?
14:43And, you know, as you're starting
to implement that in your own data
14:46model, you're like, Oh, well, this
is kind of like a database, which,
14:50you know, knows how to page in
and out things in and out for me.
14:54So I think, you know, that was also an
indication that, okay, like, something
14:58more like a database would be good.
15:00And then I don't know how I
found that SQLite was better.
15:04I guess I saw James Long's
Absurd-SQL and Actual Budget.
15:08That's when I first realized that, yeah,
like, I mean, I'd always known Wasm was
15:12a thing but that like SQLite was being
compiled to Wasm and could run in the
15:16browser and it was actually faster than
IndexedDB when you were, I guess there's
15:21some caveats there, but SQLite would store
to IndexedDB, but it would store pages.
15:26So it would store four kilobyte chunks.
15:28IndexedDB, when you're like.
15:29Reading things in and out
individually, like single objects
15:32by key value is like super slow.
15:34And the fact that like, okay, you're
using SQLite and reading in chunks
15:38rather than individual objects, like
it would be faster than IndexedDB.
15:42Like if you change your IndexedDB
access to be paged, okay, it
15:45could be faster in SQLite.
15:48SQLite, you can still do point queries and
be faster than a point query in IndexedDB.
15:52So that was something else that turned me
on, like, oh, maybe I should use SQLite
15:55in the browser rather than IndexedDB,
so I don't have to implement all this
15:58crazy stuff to make IndexedDB go fast.
16:00And then, you know, there's all the
problem of, like, collaboration still
16:04and my justification for picking SQLite
then was, you know, not only is SQLite
16:09running the browser now, if I ever want
to take this app and make it native
16:14well, SQLite, Everything I build for
collaboration in the SQLite, like it'll
16:18work on native, it'll work on desktop
it'll work in the cloud if you want
16:24some like cloud peer or something rather
than like a JavaScript only solution.
16:29So you made the decision, SQLite is the
way forward, SQLite on top of IndexDB or
16:34on top of like some persistence mechanism,
but you now also need collaboration.
16:39You looked a little bit into
CRDTs, but you still decided
16:42instead of going CRDTs first in
terms of the developer experience.
16:46You still want to give a developer
yourself SQLite as the primary way to
16:52think about the developer experience,
but you started to use CRDTs or you
16:57wanted to use CRDTs as a implementation
detail to bring into SQLite.
17:02So how did you go about that?
17:03How did you solve that puzzle of making
SQLite collaborative out of the box?
17:09Yeah, so SQLite has a bunch of
mechanisms for extending it.
17:13You can make new virtual tables.
17:16yeah, a virtual table
is essentially a table.
17:18It presents itself as a table in SQLite,
except that The implementation of that
17:24table is any arbitrary code you want to
write, so C code, Rust code, whatever.
17:28You can make functions, new functions,
like a mat, you know, like min or max
17:32or whatever function you want to define.
17:34You can make new virtual file
systems, so it's pretty extensible
17:38from the plugin side of things.
17:39And yeah, one of the teams I had worked
with at Meta, they built this thing called
17:44osquery way back, I think, You know,
some other company may have seen osquery
17:48now, but it basically lets you query all
details about your OS through SQLite,
17:53and they did that through virtual tables.
17:55So yeah, for adding CRDTs to SQLite,
17:57it was kind of a combination of
a bunch of special functions.
18:01That like when you create a
table so, plus special functions
18:06and special virtual tables.
18:07So you can like, create a table, and
then you can call a function that
18:11converts that table to a CRR, or
Conflict-Free Replicated Relation.
18:16So just a fancy name for
a table that's a CRDT.
18:20Or you could use this like virtual
table syntax where you're like, I want
18:22to create this table as a CRR and then
you list all the columns and in the
18:31columns, so when you're creating a
virtual table, like you can customize that
18:34syntax all you want in your extension.
18:37So I could like have a
slot for custom data types.
18:40So you can say this is a string.
18:42And then like another addition to
the data type is like the CRDT type.
18:46So it's not only text, it's like a
perryText, or it's like, not just an
18:50int, it's an int that's a counter.
18:53Or it's not just a date, it's
a date that slash right wins.
18:56Yeah, and then and then I guess the
last thing, right, like, it's great
19:00to have a database that's, you know, a
CRDT but how do you merge efficiently?
19:06. There had been some earlier work, actually
by a guy named Iver and we actually worked
19:11together pretty early on, on CR-SQLite.
19:15He did his master's thesis on essentially
adding CRDTs to SQLite but some of the,
19:20like, Downsides of like that original
approach was there's, there was no
19:24way to incrementally merge databases.
19:25You had to literally send the
entire copy of the SQLiteDB to
19:29somebody else, and it would like
scan every row and merge them all.
19:32So yeah, one of the key problems
we had to solve was like how do we
19:34incrementally sync And that was done
essentially like implementing a global
19:41version that gets tagged on every row
and a virtual table that knows how
19:45to find all the rows past the given
version and just send you those to sync.
19:51Was there any prior art to combining
a relational database and CRDTs or
19:58another mechanism to synchronize?
20:00Yeah, so there's Iver's work, and
then, I guess, James Long's work on
20:05well, Absurd-SQL, but I guess Yeah, I
don't know if he ever gave a name to
20:08the layer above it that did the CRDTs.
20:10But yeah, James work, if I recall
correctly, was all in JavaScript.
20:15So it wasn't built into SQLite itself.
20:17You couldn't just run it
anywhere SQLite could run.
20:20It had to be in a JavaScript environment.
20:22And he took a different approach.
20:24I think he used hybrid
logical clocks and Merkle DAG.
20:29So yeah, I guess we started
getting some trade offs of like,
20:32Strut was the first use case.
20:34Which actually informed a lot of design
decisions for CR-SQLite and why I
20:37didn't want to use something like James
Long's approach or, I don't know other
20:41solutions that were available at the time.
20:43Yeah.
20:44I'm curious to hear more about
your your judgment of the trade
20:48offs and what made it a good versus
not so good fit for your use case.
20:52Yeah.
20:52So I think James's approach yeah, I hope
I'm remembering everything correctly.
20:55This was like a couple of years ago,
but when I looked into it, every
20:58single message for all time was stored.
21:01So this is, you know, kind of
like event sourcing, right?
21:03But like, one of my key requirements,
like, I was using Strut to drive
21:08requirements for CR-SQLite.
21:10And I wanted the ability for people
to be offline as long as they wanted.
21:14For it to be able to run on
constrained, low memory devices.
21:18And so those two in combination with
the fact that it's like a presentation
21:22editor where you can like, you can drag
and drop components, you can spin them,
21:26rotate them type, like I didn't want
every single event for every keystroke
21:31and every drag and all these things
to be logged for all time, right?
21:35Especially if you're offline indefinitely.
21:37So if you're offline for like, I
don't know, a couple days editing
21:41a really big presentation, you
come back offline, I want it to
21:43be able to sync almost instantly.
21:45So the approach I chose was the set of
CRDTs that allow you to collapse history.
21:51So, yeah, if you've been off The
database size never grows more than
21:56a constant factor of the base data.
21:59So no matter how much you're editing,
you're always going to be within a
22:01constant factor of the regular DB size.
22:04I don't know, I think that constant
factor was like three or four times,
22:07I don't recall the exact number.
22:09Whereas the other systems, if you're
the longer offline, the more and
22:12more the data grew which was, you
know, not a trade off I liked.
22:15Yeah, I guess the downside to my
approach is there's a few less
22:18transactional guarantees and you
can't, you know, rewind history.
22:22But if somebody really wanted
rewinding history, I was like, Oh,
22:24they could do it in user space.
22:26Like, that's not the
problem I'm trying to solve.
22:28That makes perfect sense.
22:29And I think I really like, you focusing
on the use case that you're most familiar
22:34with Strut and the requirements around
that, Since we're dealing with like
22:38distributed systems and programming
is hard enough, that you can't really
22:43focus on all use cases at the same time.
22:47And this way you design for
the use case that you know of.
22:51And I think this is where you have like
a canvas app or something like that
22:55where you move things around constantly.
22:58If you work for multiple days, you
don't really care exactly about
23:02the change you've made over like
five seconds, three days ago.
23:06You just care about, okay, this is
the state it is in right now and that
23:10other users converge to the same thing.
23:13Whereas I think a different set of
requirements might be for a chat app
23:19where you might not send around, like,
which sort of keystrokes someone has
23:25taken to arrive at the final message.
23:27You send the final message, but
then those final messages, you
23:30might actually care about, okay,
this came at this point in time.
23:32One message does rarely
override another one.
23:35So this is, I think, where
that might be a better fit.
23:39And that makes also sense that in James
use case with Actual Budget that's all
23:44about, like, historic transactions.
23:46You want exactly those
like records and time.
23:49There's probably not so many,
and even if there are many,
23:52they're all kind of worth it.
23:54They need to be accurate.
23:56So I think that makes perfect sense, and
I think there will be like many different
23:59solutions for many different use cases.
24:02And I think there's even, you know, an
approach how you could apply a bit of like
24:06that compaction, even to event sourcing.
24:08I think there's like a concept
called subsumption, I think.
24:12And that allows you to also like
flatten some of those historic events.
24:16So that's super interesting
topic of research, I think.
24:20. Yeah, I know there for certain
CRDT algorithms and certain
24:23event sourcing things, like.
24:26Yeah, there are ways to compact
history, but it did seem like, still
24:29a pretty evolving area of research.
24:32I think now it's a lot more
understood especially in the,
24:35like, list based CRDT space.
24:37Sure, like, I don't know, I guess,
Yjs kind of pioneered a lot of
24:41that work of, like, run length
encoding, everything, and then stuff.
24:45Yeah, I guess what I want to
quickly mention, so there's, when
24:47I was first getting started, there
was a community called Braid.org
24:51which I got involved with early on.
24:54And yeah, I don't know, that was
an amazing community in terms
24:56of like learning and being open.
24:59So I just want to call them out, like
they helped me a ton to understand
25:03all these CRDT algorithms and when
history can be compacted, when it can't.
25:07What kind of weird circumstances you
get into if you know you're emerging
25:11changes peer-to-peer versus, you
know, client server and whatnot.
25:14Yeah.
25:15There this entire community, I'm
not yet a part of, but given that
25:19this year I'm also looking much more
into implementing, syncing this.
25:24I feel like there's a lot I have
to learn about and I would love
25:28to also have some folks from that
community here on the podcast.
25:32Yeah they're trying to add.
25:34Syncing to the HTTP protocol, so
essentially adding a new verb rather
25:40than get, put, and post to allow you to
merge documents or resources together.
25:46That sounds fascinating.
25:48How far along is that effort?
25:49They are part of IETF now.
25:52I don't remember if they're a research,
there's like two separate types of groups.
25:56There's like research groups and
maybe there's standards groups.
25:59I think those are the two.
26:01Yeah, I don't know if they're
still in the research phase or if
26:03they're on the standards track yet.
26:05Very interesting.
26:06I have to look more into that.
26:08So, besides choosing the right CRDTs
to fit into SQLite through virtual
26:14tables I think this, like, just picking
SQLite to run in the browser quite a
26:21few years ago, that's probably you've
must have been running into a bunch of
26:25challenges and issues along the way.
26:27So I'm curious to hear first getting
sort of like a broad overview of
26:31the different challenges you were
running into and then going into them.
26:35Yeah, I think I wasted months of my
life, like fighting SQLite issues.
26:39Cause yeah, it was really early where
the official SQLite WASM project
26:44had kind of like just started.
26:46And I guess some of the first
problems I ran into was.
26:51I guess compiling an
extension into the WASM build.
26:55So yeah, it's one thing to
write a SQLite extension.
26:57It's another thing to get
it to run correctly in WASM.
27:03It wasn't so hard when the extension
was originally written in C, but like,
27:06I don't know, like trying to code review
contributions in C and make sure there's
27:10no like memory issues was a nightmare.
27:13So I eventually like bit the bullet,
started learning Rust and re implemented
27:16the extension in Rust, which I think
was, yeah, it became like a third
27:21or two thirds less code and just way
faster to write after doing that.
27:24I think it was worth the time.
27:26But that introduced this whole new
complication of like getting the
27:29Rust compiler toolchain to compile an
extension that can be linked to the
27:33SQLite compiler toolchain that then goes
through Emscripten to build a Wasm bundle.
27:39And let me see what else was there.
27:42Transactions was an interesting thing.
27:45So the inner, the originally
the interfaces to SQLite
27:49in the browser were async.
27:51And if you ever had Well, I guess
there's two builds of SQLite.
27:55The official SQLite build
was always synchronous.
27:57There's this unofficial build, which I
think is better, called wa-sqlite, which
28:02all the interfaces were asynchronous.
28:03But it had these interesting caveats,
like if you await two calls to
28:07SQLite at once, it would deadlock.
28:09So you had to make sure like there's
only one Call to SQLite ever at once.
28:14So, you know, working
around that challenge.
28:16Transaction interleaving was another,
so like if it's asynchronous and you
28:20start a transaction and you have all
the statements in the transaction, okay.
28:24You, you began the transaction
and then you like do a read,
28:26you're waiting the read.
28:28Well, while you're waiting in the read,
you delegate control of the event loop,
28:31so somebody else could come in and,
you know, kick off a new task, right?
28:36A new statement to do
a write or something.
28:39Well, now that's somebody else's
transaction that's trying to run while
28:41your other transaction is running, so
you get this, like, weird interleaving.
28:44So, yeah trying to lock out transactions
in the JavaScript layer that was
28:48another annoying thing to deal with.
28:49And then, yeah, this is not related
to WASM, but just getting SQLite
28:56with the extension built for all
the platforms I wanted to target.
28:59So, like, eventually Expo wants to
include it in their Expo SDK, so figuring
29:04out how to do Android and iOS builds in
addition to, you know, the WASM build.
29:09And Mac and Linux builds.
29:12Those sound like quite the buffet of
different challenges that some sounds
29:18familiar to me as well as I think some
you've been pioneering a tad earlier.
29:23And by now, some of those
are maybe a bit more.
29:27A well trodden path, but I
think there's probably even more
29:31that we don't understand yet.
29:33One more, I know one more interesting
one I think more interesting than
29:36the other ones is since SQLite
in the browser, at least the one
29:41I'm using, is IndexedDB backed.
29:43IndexedDB has a very high cost
for opening a transaction which
29:48I was unaware of at the time.
29:49So yeah, there's a lot of work.
29:51And anytime you open a SQLite transaction,
it would open an IndexedDB transaction.
29:56So there's a lot of work making sure
that, like, for reads Yeah, normally
30:00you just issue reads, you wouldn't
necessarily batch them into a transaction.
30:03Like normally, you know, you'd batch
your writes into a transaction.
30:06But yeah, to make SQLite in the browser
go fast if you're doing a whole bunch
30:09of reads, like making sure a transaction
was open automatically all the reads
30:14would happen and then the transaction
would close once all the reads are done.
30:16So it's like way of batching that was like
a 10x speed up in the browser for SQLite.
30:21So once you've solved all of those
performance challenges and also like
30:26correctness, syncing challenges,
et cetera, then it's also needs
30:30to be competitive somehow with
the typical developer experience
30:34you have in a React app or in a
different framework you're using.
30:40How did you go about designing that?
30:42Yeah, so, yeah, I've developed a set of
hooks, so, I don't know, it's, I guess
30:47like the React integration was probably
the least amount of, I don't know,
30:51I'd say it's the least amount of work.
30:52Maybe not once I like think about
all the strict code problems and
30:55stuff but yeah, essentially the
DevEx was like a use query hook.
30:59So I think it's a lot easier to
develop when your queries are
31:03co located with your components.
31:04So every component is responsible
for getting the data it needs.
31:08And this makes your app like
super composable, right?
31:10Like you can add and remove components
to your app or your component tree.
31:15And because they're responsible
for getting their data, like, you
31:18don't have to do any additional
wiring or prop drilling.
31:20And yeah, they're not going to break
other components because suddenly,
31:26Some data dependencies are missing.
31:28So I really like that, yeah.
31:29Co located queries, so every component
you could use this hook called UseQuery.
31:34And in that UseQuery, you just
literally write your SQL that fetches
31:39the data required by that component.
31:41And that SQL could be as complex
or as simple as you wanted.
31:44So, you know, like, select from issue
where id equals issue id or something
31:49simple or, you know, something
complex like select star from issue,
31:52order by modify, join labels join
owner, and yeah, all these things.
31:58So you've managed to tame most challenges
with SQLite in the browser, making sure
32:03it works in Wasm, making sure that your
own extension is working, et cetera.
32:08And SQLite can be super, super fast
for most workloads, but once your
32:14cardinality grows, et cetera, your
tables get really big then queries can
32:19also start Taking a little bit longer.
32:22And that is because SQLite always kind
of, besides a little bit of caching,
32:26always starts with your results for
your queries from scratch, kind of.
32:31And you know, I've been chatting about
this for quite a while back then.
32:35This is how I know about
all of this context.
32:37And that has led you to look into.
32:40incrementally maintaining the
query results and the views.
32:43So I'm very curious to hear more
what you've learned back then
32:46and which path this led you down.
32:49Yeah.
32:50Yeah.
32:50So I guess, you know, I was talking
earlier about each component
32:53fetches its own data, has this
use query hook, or just, you know,
32:56specifies a SQL query it's running.
32:58So when you're developing like a rich
client side application like this,
33:02ideally your database is reactive, right?
33:04So any, anytime some data changes.
33:07It's like you, you work
on Overtone, right?
33:09The music app.
33:10If somebody presses play, then, you
know, all the components that care about
33:16play state need to immediately update.
33:18And yeah, as you're saying in SQLite, if
you're doing this somebody changes some
33:22state, well, you know, SQLite doesn't have
any, you know, facilities for reactivity.
33:27It has a few but they're not very good.
33:29Like they don't give you many fine
grained notions about what changed
33:32or like what queries it impacts.
33:34So what you're left doing is rerunning.
33:38Essentially from scratch the queries
that could have been affected.
33:42So, like, the SQLite change notifications,
they'll tell you the row ID that
33:46changed, but not the contents, and
they'll tell you, like, the table.
33:49So it's usually not enough to figure
out exactly which queries to invalidate.
33:52And yeah, so you rerun tons
of queries throughout the app.
33:56And those are all rerunning from
scratch, so like, if you have, like,
34:00for your track list, you're showing a
few hundred tracks, and that track list
34:03has to do a number of joins, right?
34:05Has to, for a track, has to join the
album to get the album title, has to
34:08join the artist to get the artist names.
34:12Yeah, I don't know if
there's any other joins.
34:13Maybe there could be joins about,
like, like status, or favorite status,
34:18or something about the track, right?
34:20And, yeah, rerunning that join to
select, you know, 200 some items.
34:25And yeah, maybe they just want to
sort, and sorting it doing that from
34:28scratch every time somebody mutates
something severely limits I guess how
34:32fast you can interact with the app.
34:34And yeah, I'd worked, tried a
bunch with like, okay, you know,
34:38maybe a purely in memory SQLite,
and that gets you pretty fast.
34:42But yeah, somebody was saying,
oh, like, The goal for a reactive
34:45database is it's memory fast.
34:47And then when I started, like, I
don't know, that idea, a lot just
34:50helped my brain and I started like
actually benchmarking, you know,
34:54if I'm just observing a value
in JavaScript, how fast is that?
34:57And if I'm like, Observing a
query in SQLite and updating the
35:02row and re running the query.
35:03Like, what is that?
35:04And I have a observable notebook
somewhere that like compares all this
35:08and like the difference was massive.
35:09So like just seeing how far I was
from memory fast, like started
35:13making me a bit sad on SQLite.
35:15And also like benchmarking the original
strut and the new one, which use
35:20SQLite and these use query hooks.
35:23Yeah, like the original one I could
throttle my Chrome, you know, but in
35:27the dev tools you can like downgrade
your CPU right to like 6x slower or
35:31something and I could go all the way down
to the slowest possible and everything
35:35was buttery smooth, but in the new
one, like if I downgraded it too much,
35:40like you could see some visible lag.
35:42SQLite
35:45is fundamentally built in this
request response style, right?
35:50This is the era of the LAMP stack
where, you know, databases request
35:55response made sense, right?
35:56A user would go to the website
it would do a query against the
35:59DB and it would render, right?
36:01There was no, like, rich interactivity.
36:03It was always a full
page refresh every time.
36:04So request response made sense.
36:06Yeah, but for these rich apps, request
response no longer makes sense.
36:09And trying to fit a request response
sort of DB designed around that,
36:13it into this reactive scenario.
36:15And I think maybe one day
somebody can get there.
36:19But I think right now it didn't seem
to make much sense to me and the juice
36:23didn't seem to be worth the squeeze.
36:25So I started, I guess my love affair with
SQLite has slowly been coming to a close.
36:30And I started, you know,
investigating other projects.
36:33Yeah maybe one, one day
port it if there's time or.
36:36porting some of these projects
back to SQLite and contributing
36:39incremental data flow, differential
data flow to them or something.
36:43But yeah, I guess I should describe
what these projects are, right?
36:45, So I like the idea of queries, being able
to like, Issue a complex declarative query
36:52against some set of relational data to
get back the data you need for your view.
36:56So I wanted to like, how can
I , incrementally maintain an
37:00arbitrary query against some data?
37:02So like that track list example
you query the, Tracks and their
37:06artists, and their albums, and the
play state, and all these things.
37:10How can we, when somebody does
it right, rather than re running
37:13the query to get the track list
it knows exactly which queries.
37:17Should be invalidated by that write,
and rather than re running them,
37:22knows exactly how to patch up the
rows that are impacted by the write.
37:27So, you know, at first this problem
seems absurdly complex and hard.
37:32And you're like, Oh, yeah, I
want to like invalidate and
37:34patch up any arbitrary SQL query.
37:37But yeah, I started
reading some of the papers.
37:39There's like a paper called DBSP.
37:41It's a differential data flow paper.
37:43I think Materialize is based on that.
37:47And then, yeah, started implementing it.
37:49And then I realized like, I don't know,
this, once you've read it and started
37:52implementing it and you realize like,
this is very similar to other stream
37:56processing stuff , like, and other query,
you know, builders I've done,, right,
38:02where rather than, you know, you create
a series of operations and rather than
38:06them asking a database for the data,
they're like taking streaming data in.
38:12So yeah, like some of the work I'd done
at Meta was on real time abuse systems
38:17and that was all streaming systems.
38:20So I was like, oh, like this
problem isn't that mysterious.
38:22Like I've done stuff like this before.
38:25It seems tractable.
38:26So I, you know.
38:27went full on, decided like, I think we,
I think I can implement this yeah, based
38:32on prior experience, based on having
done query languages before so yeah,
38:36MaterialLite was born which is bringing
differential data flow to JavaScript, so
38:43you can compose, filter, map, reduce, and
a join operator to create these pretty
38:48rich queries and have them reactively
updated anytime there's a write.
38:53Right.
38:53And so for those of the listeners who
have not yet built their own database
38:58and maybe are familiar with like SQL
where in SQL, you have like select from
39:04where, and all of those and JavaScript
or other programming languages who you
39:08might've like, Map, filter, et cetera.
39:11They might have different
names, but they're sort of
39:13like conceptually very similar.
39:15And this is where you're basically just
now like trying to recreate the same
39:20semantics that we have from SQL, where you
can say select star from this where so,
39:25and you can basically have like an array
where it now say .Map .Filter, et cetera,
39:32and where you flip the trade offs from
before you owned in SQLite, you might get
39:38this query just once and then it's done.
39:40You no longer are interested in like
subsequent changes and then like, smaller
39:45updates, but in JavaScript where our
app stays warm, we click the button.
39:49Something's changed slightly.
39:52You want to change the trade
offs quite significantly.
39:55And that has led you to Materialite.
39:57Yeah, so like just to build off on like
how SQL concepts map to like filter,
40:02map, and reduce, or like map represents
select or function application, like
40:08you're selecting some columns, well
that's mapping the original object to some
40:12other object with a subset of the fields.
40:14Filter is the same as where Reduce
is what you use for any aggregate
40:19function, like group by, sum, count.
40:21Yeah, and I guess one other
operator would be concat.
40:24So if you want to do or, you just,
you fork your stream that you filtered
40:28or mapped over, and then after you've
filtered it, you concat the streams back
40:32together, and then you run a distinct.
40:34So you can do an or and then join,
joins a special thing where essentially.
40:39You kind of maintain what you've
seen from one stream and then as you
40:43see results from another stream, you
link them together based on some key.
40:46At first, it like, Seemed like a large
barrier, but then I don't know, once you
40:51get into it and you realize, Oh, like
I've done all this before and filter,
40:53map, reduce, like these have direct
analogs in SQL, like it's not so bad.
40:58So we've so far like touched
the most common usage of SQL.
41:04But SQL has, or SQLite, Postgres,
et cetera, but like sticking with
41:08SQLite for a moment there's quite
a bit more than like the common
41:12like select from where, et cetera.
41:15There's like various kinds of
aggregations, various kinds of ways
41:18to combine queries, sub queries,
joins, various kinds of joins.
41:23Is the system that you've came up
with MaterialLite, is that on feature
41:27parity on what SQLite is able to do?
41:31Or are there some parts where you
say, okay, that's out of scope because
41:35there's years worth, maybe decades
worth of optimization have gone into
41:40SQLite into making that fast and
that's out of scope for Materialite.
41:44Yeah.
41:44So there's no window functions
and there's no recursive queries.
41:48And base Materialite is
strictly for streaming.
41:52So, like, if you're processing a
stream of writes or events, like
41:57Materialite is a great fit because
it's gonna, you know, patch up your
42:01query results as those events come in.
42:04But say you have a table of a
million items and you want to
42:09query it from scratch, right?
42:11Like you didn't see the events,
you couldn't maintain any queries.
42:15You're just like, I want to
know what's in this table.
42:17So, so you're going to have to run that
query from scratch, even in Materialite.
42:21But for Materialite, what that means
running a query from scratch is like
42:24setting up the Dataflow pipeline and
then feeding literally every row,
42:27all million rows into that pipeline.
42:30Which, yeah, since everything's
implemented with, like, filter and map,
42:34where, like, it's opaque, like, when
you do a filter, you provide a lambda.
42:38It's opaque.
42:38That's what fields you're filtering on
which makes this from scratch case hard
42:42because, like, Oh, if I don't know what
fields you're filtering on, then I don't
42:45know if I can apply some sort of index
that you might have against this data.
42:49So, yeah, I guess after working
on Material 8 and getting it to
42:54work pretty well, like, okay,
that was the next problem.
42:57And it's like, we need an actual
query language because in a
43:01query language is declarative.
43:02You express, like, This is the
field name I'm filtering on.
43:05This is the operator being
used and this is the value.
43:07So the engine knows exactly where the
fields are ordering by the tables you're
43:12using the fields you're filtering on.
43:15And once you have that knowledge, you
can also make that like first query
43:19case fast, where I guess you're doing
the traditional database thing of like
43:24figuring out what indices to pick.
43:26I don't think we'll ever, Be as
fast as SQLite for that case.
43:29But I think, you know, For an
app like, for a rich client
43:32like, Overtone or Strut, right?
43:35, Most of your queries are
subscriptions, they're not this
43:37like from scratch Sort of thing.
43:38Like, You're gonna subscribe, you're
gonna like, You're gonna like, Set up
43:42a query that subscribes to the slide
list, set up a query that subscribes
43:45to the component list on a slide.
43:48And then as writes happen, you just
want to update those rather than,
43:52yeah, always, like, rather than
having to have first class support
43:55for running a query from scratch.
43:58So, yeah, so that latter thing
needs to be just fast enough, but
44:00it doesn't need to be, like, Yeah,
the fastest thing in the world.
44:04I totally agree that there's a very
interesting, different kinds of trade
44:08off that should be explored and should be
built that makes so much more sense for a
44:14live active application where everything
stays up to date and then changes
44:20quite minimally as opposed to request
response, which is how SQLite works today.
44:26That being said, you can still get quite
a bit of mileage out of SQLite today.
44:31If you are a little bit more mindful
about like how you work with it.
44:36So some patterns that I found to Makes
SQLite work for me with Overtone.
44:42Is that what makes SQLite the slowest
in a browser context is shoveling data
44:48from within the SQLite database into
JavaScript land and back and forth as
44:54this is where you need to cross the
memory boundaries from memory staying
44:59within Wasm to how it's being like
marshaled into JavaScript objects.
45:05And what I found as a pattern that helps
quite a lot is just keeping that small.
45:11So if you need like a, and this is where
I think you can apply quite a few of the
45:15tricks that you've now mentioned and you
embrace with Materialite, you can actually
45:20bring back and layer on top of SQLite.
45:23Probably still not quite
as fast as Materialite.
45:27But probably fast enough to make a use
case like Overtone work within that
45:31paradigm as well, that when the list
changes, instead of like, getting a
45:36new list of 5, 000 tracks for that
playlist, every time those 5, 000
45:40tracks, you just get a change in terms
of like, Oh, this new track was added.
45:46And getting that SQLite, I think can
also be done with a few tricks namely
45:52also like a temporary table where you
can save the previous result of your
45:58query before and after the change.
46:01And then you query that in a diffing way.
46:05And then you still need to
sort of like work with that
46:08diff information in JavaScript.
46:11But this is how you can work
around that performance challenge.
46:14But your system with Materialite.
46:17All of that is absorbed from
you, which is very attractive.
46:20Yeah, and I think right, so Materialite,
right now, it can be backed by
46:24an in memory collection, or it's
just some little interface that's
46:28called a source that you implement.
46:30So I think, Yeah, SQLite could be
that source, and then that would
46:33solve the like, first query problem.
46:36So I just compile the Materialite
queries to SQL queries.
46:40and then the incremental updates are
maintained by Materialite going forward.
46:44Yeah, obviously you have to like,
listen to the transaction failures
46:46and somehow roll those back.
46:48But yeah, I think it's doable to
like, Marry those two together and
46:51have kind of the best of both worlds.
46:54And I know there's like
a project called GRDB.
46:56I think it's like a iOS SQLite wrapper
which adds a bunch of reactivity.
47:02it's not like completely general.
47:04It doesn't support like completely
arbitrary SQL queries, at
47:06least in an efficient way.
47:08But I think it gets you close enough
with the queries it does support.
47:11One more thing about like, moving
from the request response style to
47:15like, queries being subscriptions.
47:17It gives you these really
cool opportunities to optimize
47:19queries at a global level.
47:22So like, if all your queries
are subscriptions, you can see
47:24what subscriptions are open.
47:26And then you can start seeing
which queries are either
47:29identical and deduplicate or
which queries share operations.
47:34So like, maybe you have a whole bunch
of queries that all do the same join.
47:37While in the streaming system.
47:39You can optimize that so that join happens
once and then the results of the join are
47:44fanned out to all the queries that use it.
47:46So yeah, you can do some really neat
tricks at a global level once you
47:49model your queries or subscriptions.
47:50Yeah, that's super interesting.
47:52And a few of those things I'm exploring
for LiveStore myself where they're
47:58basically the problem of reactivity.
48:01And efficiency, trying to keep the
performance as good as possible,
48:05but also squeezing that into a frame
budget since LiveStore and Overton
48:10all run within the main thread,
which I'm not sure that's probably
48:14also the way how MaterialLite works.
48:16There is Interesting to also see it
through a lens of like a different
48:21paradigm shift evolution in the JavaScript
front end world which is about signals.
48:27And signals is also about like, I think
a more lighter weight reactivity system
48:32that is all about being composable.
48:35And and I think those worlds are
now finding, like, a way together
48:39within live store with the prior work
with Jeffrey and Nicholas on Riffle.
48:45This was like some early implementations
based on a paper called Adapt On.
48:50And from my understanding all of like
the research that has gone into adapt on.
48:56And sort of like in parallel,
what's been evolving as signals.
49:00Those are actually the same thing.
49:02And I think now we can do some really
interesting work on combining the
49:07power that like more substantial state
management primitives, Like a database
49:13gives us or something like material
light gives us and bring that together
49:17with something like a signal system that
also kind of out of the box address the
49:23thing you've just mentioned, which is
reusing parts of the computational graph.
49:28The only thing signals is missing.
49:29I'm surprised nobody's done this yet.
49:31Is Incremental computation
against collections, right?
49:34So you think of like, I have an array,
and I map the array, then I filter
49:37the array, then I reduce the array.
49:39That's like creating a new
copy of the array each time.
49:42And if you like, add one element
to the array, you like, rerun all
49:45those against the full array again.
49:47So it's kind of like the query
from scratch case, right?
49:49Whereas Materialite, right, You
add an element to the array, well,
49:52it's only going to run the filter
on that one element and the map
49:55on that one element rather than
running against the full array.
49:58So there's some interesting benchmarks
maybe I can share with you after of
50:01like, yeah, incrementally maintaining
a map filter reduce pipeline
50:05against an array of a million items.
50:07Yeah, so it'd be really cool if like
the signals implementers would even
50:10implement this sort of stuff eventually.
50:12So even though your love affair with
SQLite has for now come to an end.
50:18I think the problems and goals
you're chasing after are still very
50:22similar to what I think is feasible
with SQLite, just maybe with like
50:27a different performance footprint.
50:29I think with SQLite you
also get the benefit.
50:32of just that being like a very
familiar and trusted thing that's
50:35like been proven and battle
tested for literally decades now.
50:40And even though you said like you
were running into some performance
50:44issues SQLite is still so ridiculously
fast that it's very feasible to do
50:49write those applications in the,
in a browser context, et cetera.
50:52And I think even if you now on
a modern CPU straddle your your
50:57CPU, it's still manageable.
51:00But I think there's multiple
paths to arrive at the same goal.
51:05And so as my understanding is that the
project Materialite opened a couple of
51:11really interesting follow up conversations
for you which is now opening a new
51:16chapter for your professional life.
51:19So do you want to tell
us a bit more about that?
51:21Yeah, I guess, yeah, Materialite
is my least known project,
51:25least complete project.
51:27But yeah, it did pick up some interest
by Rocicorp specifically Aaron, who I
51:32met at the local-first conference in St.
51:35Louis.
51:36And yeah, we're talking about, you know,
the model of subscript, those queries as
51:39subscriptions and like having a reactive
app and I think he's the one that said
51:42like, oh, things need to be memory fast.
51:45And yeah, so, so I started working on
MaterialLite and we did some explorations
51:49of like, if you powered one of their
demo apps with MaterialLite, like they
51:54have a linear clone that's a demo
app and they want to see like, Oh,
51:56how many issues can we scale this to?
51:58If we like use this
differential data flow approach.
52:02Yeah.
52:02And they really liked kind
of the initial results.
52:05And yeah, we worked together, I guess.
52:08quite a bit over the last
four slash six months.
52:10There was like a two month
break for me on paternity leave.
52:13And yeah, so I'll be starting a job with
Rocicorp here pretty soon as a partner.
52:18Yeah, I'm super excited about that.
52:20That's amazing.
52:21I mean, we had Aaron on the show.
52:23I think that the second show for
localfirst.fm, and there were so many
52:28deep insights there by Aaron, and I'm a
big fan of the products that's already
52:33been built there, Replicache, Reflect,
and sounds like your work with Materialite
52:40is also going to be incorporated
into making the products even better.
52:44So I'm curious to hear more of your
thoughts on you know, what do you feel
52:49like where's local-first going with the
products you're working on with Rosicorp?
52:55Local-first provides a superior
DevX because you're no longer
52:59worried about APIs of, like, how
do I get data from the server?
53:03Like, a sync engine, or your database is
solving, you have a sync engine solving
53:07that for you, right, where you're just
coding against local data rather than
53:11setting up all these weird REST APIs
that you have to call to get the data.
53:14And yeah, I think that's the key value
of local-first and the main thing
53:19that Rocicorp is pushing forward.
53:21It's like making that sync
engine experience as best, as
53:24good as it can possibly be.
53:25So you're now joining Rocicorp where
you're building Replicache and Reflect,
53:30but the local-first space has grown
quite significantly to a point that
53:34can be almost a bit disorienting.
53:36For newcomers who are trying to first
figure out what does it mean to build
53:41the app local-first and then also
choose the right tool for the job.
53:45Can you provide a little bit of guidance
which how someone should think about
53:49use cases and picking the right tool?
53:52Yeah.
53:52So I guess maybe when you're somebody,
when the first question somebody
53:56asks, should ask themselves is, "Do
they want servers involved at all?"
54:01Is the first question.
54:02So, like, if you want all sync to
go through a central authoritative
54:06server, and you want that server to
be able to override decisions made
54:10by the client apply data integrity
rules revert changes made by a client
54:16Replicash is a very good choice for that.
54:18If you want something, Where, you
know, you don't want to preclude the
54:22ability for peer, for nodes to sync
peer to peer, or maybe the data they're
54:27syncing is end to end encrypted.
54:29I think something like
CR-SQLite actually shines there.
54:33CR-SQLite lets you merge in any
topology you want, whether it be
54:37peer to peer or hub and spoke.
54:39So we've seen a couple of companies that
are, have like end to end encryption,
54:44where they use CR-SQLite to sync,
or they have literally a pub sub
54:48channel, where all peers just publish
their messages, and any peer who
54:51receives the messages integrates it.
54:54In those use cases and then I guess
if you're really looking for, like,
54:57if your data model is like document
based, I guess going back to syncing,
55:01so not just CR-SQLite for peer to
peer, but also Yjs and AutoMerge
55:05is capable of peer to peer sync.
55:07They're also, of course, if you
can sync peer to peer, you can
55:09sync through central server too.
55:11If you're looking for, you know, you
just want to support a collaborative text
55:16field collaborative text doc and you don't
need, you know, ways to query over your
55:22documents, like you're fine just storing
this doc in a blob in your Whatever your
55:26storage is you know, Yjs is a very good
choice because it's primarily targeting
55:30it at text editing, collaborative
text but they don't have, you know,
55:33facilities like built in for persistence
or querying and these sorts of things.
55:38So I haven't kept super up to date with
AutoMerge, but from what I've seen if
55:42you want like, the ability to fork and
look at different portions of history.
55:47So like, say people are editing a
document somebody wants to fork their
55:50document off, make a bunch of changes
and merge the fork back in I mean, I see
55:55AutoMerge talks about this concept a lot.
55:57So I think if it's not there already,
AutoMerge would be your best bet for some.
56:03for that sort of behavior.
56:04Yeah, then there's ElectricSQL.
56:06So that is like a, you know,
Postgres centric solution.
56:11So if your database is Postgres and you're
okay with, like, merging based on CRDT
56:17rules You know, that's a good choice.
56:18So yeah, so I say like your
back end is Postgres and you're
56:21okay like using CRDT rules.
56:23Like yeah, what does that mean?
56:24So a really interesting thing about
Replicache is it can work with any
56:27back end, pretty much any back end.
56:30So Postgres, MySQL, all sorts of things.
56:34And the way this works is that it uses
sort of a rebase model where , the
56:39developer, defines their mutations.
56:41And those mutations get run in a specific
order on the server and clients when
56:47they receive updates from the server,
any mutations they have outstanding.
56:50They apply the server updates
and any mutations that are
56:52outstanding, they rebase on top.
56:54And this is like super flexible in that
you can write any, like, if you want to
56:59change merge and conflict behavior, well,
you just write your mutations however
57:03you like to get the behavior you want.
57:05Whereas something like CR-SQLite
or electricsql to get the merge
57:09behavior you want, you have
to pick specific CRDT types.
57:12And it, it seems a little bit
less flexible since you can't
57:15write like a custom mutation.
57:16But ElectricSQL does seem to have a
pretty good end to end story of like,
57:19you set up your Postgres DB, that
schema gets replicated down into a
57:22local SQLite DB, and as you make schema
changes upstream, they flow downstream.
57:27So in terms of like, Having a
good story between two specific
57:32databases it seems pretty good.
57:34And then I guess the
last thing is PowerSync.
57:37So PowerSync, uh, works on a different
model that's not rebase or CRDTs.
57:43So with PowerSync, a client will
accumulate changes, and if a
57:47client has local changes, it won't
take any changes from the server.
57:50So in the rebase model, if a
client has changes, it'll take
57:53changes from the server and then
replay its local changes on top.
57:56Uh, but in the PowerSync model, the client
has local changes, it won't take any
57:59server changes until those local changes
have been incorporated by the server,
58:02and then it can take server changes.
58:04Um, So yeah, to me, I've never built an
app with PowerSync, but this seems like
58:07it'd be a lot higher latency, uh, for
the syncing, because if you have any
58:12outstanding local changes, you simply
cannot take a change from the server.
58:15Whereas rebase model, if you have
outstanding changes, it's fine.
58:18You get the server changes, you
just replay your local changes
58:21on top of the server changes.
58:22That's a really great overview,
and there's many other tools that
58:26more more tools than we have time
to exhaustively can cover here.
58:31But I think that's a really great
orientation, particularly with Asking
58:34yourself the first question, like where
should the authority originate from?
58:38Do you want to have a server?
58:40That's, I guess, more traditional
how web apps work today.
58:43And that can simplify things a lot because
in that server you can enforce if there's
58:48like a merge conflict or if there's
like maybe a user has set some data in
58:54a way That might be not compliant with
the app that you want to build or for
58:59other simplification reasons, et cetera.
59:02So I think picking that,
where does the authority live?
59:04Is there an authority?
59:06And , which role should a server play?
59:09I think that's a really important one.
59:11And then also the two kinds of
like ways to go about syncing or
59:15at least two ways going more about
certain spacing, everything on CRDTs.
59:21Or going with a event rebase
model, our Replicash is working.
59:27I'm curious how someone who has used
neither and might have not really
59:32a lot of intuition for what are the
implications of choosing one or the other.
59:37Can you provide a little bit of an
intuition which kind of app use cases
59:42are a good fit for one or the other?
59:44Yeah.
59:44So with CRDTs, you don't.
59:47Have code on the right path, like
the CRDT algorithm is going to
59:53determine, I guess I should say you
don't have code on the merge path.
59:55The CRDT algorithm is going
to determine how things merge.
59:58And you're just picking a set of CRDT
algorithms where something like Replicash,
1:00:02where you write custom mutators.
1:00:04You can write a set of mutators that run
on the client, and you can write mutators
1:00:08with the same name, same args, but do
something totally different on the server.
1:00:11Which, one it's, since you're writing code
in the mutation, it's giving you control
1:00:16over, kind of, how things merge and sync.
1:00:20And two, since The code doesn't have
to be identical on every node because
1:00:24the server is the authoritative answer.
1:00:26You could do something in the server
mutation like check permissions or,
1:00:30you know, set last modified times
or something from the server's
1:00:33time rather than the client's time.
1:00:35Yeah, but I guess the one downside to that
is a CRDT, like, you can sync messages in
1:00:40any order, like, yeah, if you have message
A, B, and C, if you sync them B, C, A,
1:00:46or C, B, A, like, you always get the same
state, which means that you don't have to
1:00:50have a single server being the authority.
1:00:52You can have every peer be on equal
footing and essentially have no server
1:00:56whereas like this, you know, Replicache
model, you do have to have, Someone
1:01:00that is that authority that's going to
provide the total ordering of messages.
1:01:04And to me, it sounds like at least
the latter, the rebasing model with
1:01:09that authority living in a server, for
example, that is a bit easier to get
1:01:15into less of a radical shift of like
thinking, changing the way, how you
1:01:20think about data modeling, et cetera,
and it's probably the easier one to
1:01:25get into and might be a better catch
all solution for many app use cases.
1:01:30Like if you look at yeah, I think
it's the easier, it's like, it's more
1:01:33familiar and if you have an existing app
that you're trying to add multiplayer
1:01:38to Like, yeah this can work on your
existing data model without much
1:01:41changes or with little to no changes.
1:01:44Whereas, oh, you want to switch to CRDTs,
like you might have to make a new data
1:01:47model specifically for this collaboration
thing and migrate your stuff over.
1:01:51Right.
1:01:51And I mean, the advice I would
give on top of that is that.
1:01:55There are now like so many interesting
and easy to use options out there,
1:02:00whether it's Replicache, whether it's
AutoMerge or others just maybe try
1:02:04building the same mini version of your
app in different technology stacks,
1:02:09see what works best for you, and you'll
develop that intuition just by yourself.
1:02:14And I think that's a
really fun thing to do.
1:02:16Cool.
1:02:17Matt, thank you so much for sharing
about like all those different
1:02:21projects you've been involved with.
1:02:22I'm really excited to see what
you'll be building at Rosicorp.
1:02:27Is there anything else you want
to share with the audience now?
1:02:30Yeah, just stay tuned.
1:02:31We're building the next.
1:02:32Kind of iteration of
Replicache and Reflect.
1:02:34And , it was exciting enough to make
me like drop everything I was doing
1:02:39and join the team and yeah, go back
to, I yeah, work, working for somebody,
1:02:44I guess, rather than just, you know,
following whatever research interests,
1:02:49wherever they took me from day to day.
1:02:50So yeah, it's, I think what they're
building is super compelling and
1:02:53You'll hear more, you know, over
the summer and coming months.
1:02:57That's incredible.
1:02:59I'm really excited to hear more about
what you all have been working on.
1:03:03And yeah, thank you so much
for coming on the show.
1:03:06Yeah.
1:03:06Thanks for having me.
1:03:08Thank you for listening to
the localfirst.fm podcast.
1:03:11If you've enjoyed this episode and haven't
done so already, please subscribe and
1:03:15leave a review wherever you're listening.
1:03:17Please also tell your friends about it.
1:03:18If you think they could be interested
in local-first, if you have feedback,
1:03:22questions or ideas for the podcast,
please get in touch via hello at
1:03:26localfirst.fm or use the feedback form on
our website, special thanks to Expo and
1:03:31Crab Nebula for supporting this podcast.
1:03:34See you next time.