XTDB compared to other databases

Jacob O'Bryant | 31 Oct 2023

XTDB, the database that Biff uses by default, is still fairly niche. Why not go with Postgres? Datomic is another relevant option since it has a lot in common with XTDB and has been around longer. I’ve used and like all three databases, but XTDB is my personal preference. I’d like to explain that here, especially for anyone who’s considering using Biff and is wondering if XTDB is a good fit for them.

As always, I’m speaking from the standpoint of a solo developer; my analysis would be different if I were, say, the head of an engineering department. In particular, I’m making the assumption that any differences in query and transaction performance will be unimportant for my apps (with the exception of latency differences, discussed in the next section). If you’ll be deploying something at scale, that’s not an assumption you should make. These databases all have vastly different implementations; better to run some tests and see what the performance actually looks like for your use case.

As another disclosure, I should mention that JUXT (the company behind XTDB) sponsors Biff.

XTDB vs. Postgres

The main architectural difference from Postgres is that XTDB is immutable. When you submit a transaction, XTDB appends it to a transaction log. That log could be stored in Kafka, the local filesystem, Postgres*, … the implementation is pluggable. Then you can have any number of XTDB nodes (JVM processes) which consume the transaction log and use it to create queryable indexes on each node’s local filesystem (typically with RocksDB). These indexes let you query the database at any point in time: transactions that were indexed after the point you specify will be ignored. This architecture comes with several benefits and at least one downside you should be aware of.

\*And yes, this means that using XTDB “instead” of Postgres sometimes technically means using XTDB on top of Postgres.
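The log-then-index flow described above can be sketched with an in-process node. This is a minimal sketch, assuming XTDB 1.x (`com.xtdb/xtdb-core`) is on the classpath; the empty config keeps everything in memory, so it stands in for a real Kafka or Postgres backed log rather than showing a production setup. The `:bob` document is made up for illustration.

```clojure
(require '[xtdb.api :as xt])

;; An empty config starts an in-memory node: the transaction log, document
;; store and indexes all live inside this JVM process.
(def node (xt/start-node {}))

;; submit-tx appends to the log; await-tx blocks until this node has indexed
;; the transaction, then returns the transaction map.
(def tx-1
  (xt/await-tx node
    (xt/submit-tx node [[::xt/put {:xt/id :bob :user/age 30}]])))

(def tx-2
  (xt/await-tx node
    (xt/submit-tx node [[::xt/put {:xt/id :bob :user/age 31}]])))

;; A snapshot of the latest indexed state sees the second put.
(def age-now
  (:user/age (xt/entity (xt/db node) :bob)))

;; A snapshot as of tx-1 ignores everything indexed after that transaction.
(def age-as-of-tx-1
  (:user/age (xt/entity (xt/db node {::xt/tx tx-1}) :bob)))
```

Here `age-now` is 31 while `age-as-of-tx-1` is still 30, even though both reads go through the same node.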

First of all, since queries are served from local indexes, you get “read replicas”/caches for free. If your app is written in Clojure or another JVM language, it can run in the same process as the XTDB node, which means queries don’t even require a network hop. That makes it easier to write HTTP endpoints with low response times, especially when there’s complex logic involved. I’ve found this particularly helpful in my own work building recommender systems, which are very read heavy.

You also have the option of running your XTDB nodes in dedicated processes and querying them over HTTP. You could still get some latency benefits from this setup if you put your app servers and XTDB nodes close together, e.g. with Fly. (Speaking of Fly, these latency benefits are the same as what you get from distributed SQLite.)

Another benefit of immutability is that it’s easy to recover data without going through a separate backup system, in the same way that it’s easy to recover deleted code from old git commits. Even when you “delete” a record, you can still go back and see all the previous versions of that record and restore it if needed. When you do need to permanently delete data—e.g. when a user wants to delete their account—there is an “evict” operation.
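As a rough illustration of recovering a "deleted" record, here is a sketch that assumes XTDB 1.x with an in-memory node. The `submit!` helper and the `:bob` document are hypothetical names used only for this example.

```clojure
(require '[xtdb.api :as xt])

(def node (xt/start-node {}))

;; Hypothetical helper: submit a transaction and wait until it is indexed.
(defn submit! [tx-ops]
  (xt/await-tx node (xt/submit-tx node tx-ops)))

(submit! [[::xt/put {:xt/id :bob :user/email "hello@example.com"}]])
(submit! [[::xt/delete :bob]])

;; The entity is gone from the current snapshot...
(def after-delete (xt/entity (xt/db node) :bob))

;; ...but every earlier version is still in the entity's history.
(def history
  (xt/entity-history (xt/db node) :bob :asc {:with-docs? true}))

;; Restore by re-putting the most recent version that still had a document.
(def last-version (->> history (keep ::xt/doc) last))
(submit! [[::xt/put last-version]])

(def after-restore (xt/entity (xt/db node) :bob))

;; When the data really has to go, evict removes the documents themselves.
(submit! [[::xt/evict :bob]])
```

After the delete, `after-delete` is nil; after the restore, `after-restore` is the original document again.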

In a similar vein, you also don’t have to worry about what timestamps you might need to store. If your records have a foo column and you later decide you need a foo_added_at column, you can run a migration that inspects each record’s history and backfills foo_added_at for existing records.
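The backfill logic can be sketched as a pure function over an entity's history. This assumes the history is in ascending order and shaped like the result of `xtdb.api/entity-history` with `:with-docs? true`; `first-seen-at` and the sample data are hypothetical, and the sketch uses transaction time as the "added at" value, which is one reasonable choice among several.

```clojure
(defn first-seen-at
  "Returns the transaction time of the earliest version in `history` whose
  document contains attribute `k`, or nil if no version has it."
  [history k]
  (->> history
       (filter #(contains? (:xtdb.api/doc %) k))
       first
       :xtdb.api/tx-time))

;; Hand-written sample history for one record, oldest version first.
(def example-history
  [{:xtdb.api/tx-time #inst "2023-01-01" :xtdb.api/doc {:xt/id 1}}
   {:xtdb.api/tx-time #inst "2023-03-15" :xtdb.api/doc {:xt/id 1 :foo "a"}}
   {:xtdb.api/tx-time #inst "2023-06-01" :xtdb.api/doc {:xt/id 1 :foo "b"}}])

(def foo-added-at (first-seen-at example-history :foo))
;; => #inst "2023-03-15T00:00:00.000-00:00"
```

A migration would call something like this for each existing record and then put the record back with the new attribute added.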

I’ve only actually done either of those things a few times, but day to day I still benefit from the peace of mind of not even having to think about it.

Finally, immutability can be handy for debugging. For example, your HTTP request logs can include the point in time at which the database was being queried within that request, which makes it trivial to rerun the queries and get the same results later on. In Clojure, I’ll often save an incoming HTTP request to a var so I can manipulate it in the REPL; it’s convenient that the request includes a snapshot of the database. Pure functions can take the database snapshot as a parameter and remain pure.
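To make the debugging idea concrete, here is a sketch assuming XTDB 1.x. `user-count` is a hypothetical pure function that takes a database snapshot; the basis returned by `xt/db-basis` is the piece you would write to your request log.

```clojure
(require '[xtdb.api :as xt])

;; A pure function of a database snapshot: same snapshot in, same answer out.
(defn user-count [db]
  (count (xt/q db '{:find [user]
                    :where [[user :user/email]]})))

(def node (xt/start-node {}))

(xt/await-tx node
  (xt/submit-tx node [[::xt/put {:xt/id :bob
                                 :user/email "hello@example.com"}]]))

;; During a request: take a snapshot and record its basis (a small map
;; containing the valid time and the latest transaction it can see).
(def db (xt/db node))
(def basis (xt/db-basis db))
(def count-during-request (user-count db))

;; Meanwhile the database moves on...
(xt/await-tx node
  (xt/submit-tx node [[::xt/put {:xt/id :alice
                                 :user/email "alice@example.com"}]]))

;; ...but the logged basis still reproduces the original snapshot.
(def count-when-replayed (user-count (xt/db node basis)))
(def count-now (user-count (xt/db node)))
```

Both `count-during-request` and `count-when-replayed` are 1, while `count-now` is 2.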

With all that being said, immutability isn’t free. Reads may be trivial to horizontally scale, but your write performance in XTDB will be bottlenecked by how fast transactions can get onto the transaction log and be indexed by the XTDB nodes. It’s effectively a single-writer system. As a solo developer I have no experience with hitting XTDB’s scaling limits, so if you’re worried about that, I’ll refer you to the XTDB team. My impression is that you should be able to get pretty far with Kafka as the transaction log. ¯\_(ツ)_/¯

Bitemporality

This is the whole reason XTDB exists, which makes it somewhat amusing that I haven’t mentioned it until now. Although the “time travel” benefits of immutability (being able to query past snapshots of the database) can be useful for operations, doing time travel in your business logic is fraught with peril. Suffice it to say that bitemporality addresses that problem.

I haven’t yet needed bitemporality in any of my apps. Fortunately, XTDB was designed so that the bitemporality features stay out of the way until you need them. You can use XTDB as a general-purpose database, and if you ever find that you would benefit from bitemporality, it’s there.
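For readers who haven't seen bitemporality in action, a small sketch may help. It assumes XTDB 1.x and uses a made-up `:user/plan` attribute. Valid time says when a fact was true in the real world; transaction time says when the database learned about it.

```clojure
(require '[xtdb.api :as xt])

(def node (xt/start-node {}))

;; First version of the story: free plan from 2022, pro plan from 2023.
;; The third element of a put is the valid time the document starts at.
(def tx-1
  (xt/await-tx node
    (xt/submit-tx node
      [[::xt/put {:xt/id :bob :user/plan :free} #inst "2022-01-01"]
       [::xt/put {:xt/id :bob :user/plan :pro}  #inst "2023-01-01"]])))

;; Later we learn the upgrade really happened on 2022-03-01, so we correct
;; that stretch of the timeline (start and end valid times).
(def tx-2
  (xt/await-tx node
    (xt/submit-tx node
      [[::xt/put {:xt/id :bob :user/plan :pro}
        #inst "2022-03-01" #inst "2023-01-01"]])))

;; Which plan was Bob on in June 2022, according to what we know now?
(def plan-known-now
  (:user/plan
   (xt/entity (xt/db node {::xt/valid-time #inst "2022-06-01"}) :bob)))

;; Same question, according to what we knew right after tx-1.
(def plan-known-at-tx-1
  (:user/plan
   (xt/entity (xt/db node {::xt/valid-time #inst "2022-06-01"
                           ::xt/tx tx-1})
              :bob)))
```

`plan-known-now` is `:pro` and `plan-known-at-tx-1` is `:free`: the correction changed the business timeline without erasing the record of what the database used to say.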

Quality of life

Aside from the high-level architecture, I also find the combination of Datalog and pull expressions to be more ergonomic than SQL. I will admit though that the majority of queries I do would be equally compact in either query language.
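Here is a small sketch of what Datalog combined with a pull expression looks like, assuming XTDB 1.x. The `:user/friend` attribute and the documents are invented for the example.

```clojure
(require '[xtdb.api :as xt])

(def node (xt/start-node {}))

(xt/await-tx node
  (xt/submit-tx node
    [[::xt/put {:xt/id :alice
                :user/name "alice"}]
     [::xt/put {:xt/id :bob
                :user/name "bob"
                :user/email "hello@example.com"
                :user/friend :alice}]]))

;; The :where clause finds the entity; the pull expression describes the
;; shape of the result, following the :user/friend reference one level deep.
(def result
  (xt/q (xt/db node)
        '{:find [(pull user [:user/name {:user/friend [:user/name]}])]
          :in [email]
          :where [[user :user/email email]]}
        "hello@example.com"))
;; => #{[{:user/name "bob", :user/friend {:user/name "alice"}}]}
```

The SQL equivalent would be a join plus some reshaping of the rows in application code.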

In addition, XTDB is quite flexible when it comes to data modeling—you don’t need a separate table to model many-to-many relationships, for example. XTDB is also schemaless, or as I think of it, “bring your own schema.” Biff lets you define your schema with Malli and then ensures your transactions conform before passing them to XTDB. I like that I don’t need migrations just for adding new columns.
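As an illustration of the many-to-many point, here is a sketch assuming XTDB 1.x, with hypothetical user and group documents. A set-valued attribute holds the references, and each element of the set is matched individually in queries.

```clojure
(require '[xtdb.api :as xt])

(def node (xt/start-node {}))

(xt/await-tx node
  (xt/submit-tx node
    [[::xt/put {:xt/id :group/admins  :group/name "admins"}]
     [::xt/put {:xt/id :group/editors :group/name "editors"}]
     ;; No join table: each user document lists its groups directly.
     [::xt/put {:xt/id :bob   :user/groups #{:group/admins :group/editors}}]
     [::xt/put {:xt/id :alice :user/groups #{:group/editors}}]]))

(def db (xt/db node))

;; Users in a given group.
(def editors
  (xt/q db '{:find [user]
             :where [[user :user/groups :group/editors]]}))
;; => #{[:bob] [:alice]}

;; Names of the groups a given user belongs to.
(def bobs-groups
  (xt/q db '{:find [group-name]
             :where [[:bob :user/groups group]
                     [group :group/name group-name]]}))
;; => #{["admins"] ["editors"]}
```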

XTDB vs. Datomic

Much of the previous section also applies to Datomic. It’s an immutable database with support for Datalog queries and graph data modeling, although it isn’t bitemporal.

I used Datomic throughout 2019 but switched to XTDB in 2020 (back when it was still called Crux). I didn’t want to run my apps on AWS anymore (I switched to DigitalOcean), so Datomic Cloud was no longer a good fit, and the $5k/year license fee for Datomic On-Prem was a nonstarter. On top of that, setting up Datomic was less convenient than it was for XTDB, since Datomic required two separate JVM processes (the transactor and a peer).

A lot has changed in the past few years! There are no more licensing fees, and you can run single-process systems with Datomic Local. Datomic is much more feasible now for solo developers, and while I still prefer XTDB, you can’t go wrong with either.

Main factors

In my mind there are three big factors that might point you to one or the other:

If you know that bitemporality will be useful for you, consider XTDB.
If you’re deploying on AWS and/or want tight integration with AWS, consider Datomic Cloud.
If you want first-class SQL support, then wait with bated breath for XTDB 2. I will probably make SQL the default in Biff since lots of people are already familiar with it, and those who prefer Datalog can switch easily.

If you’re ambivalent about those points, then again, they’re both good choices. Take them each for a spin and see what feels best. Below are a few minor factors I’ve thought about myself.

Operation

For a certain class of solo-developed app, XTDB is still slightly more convenient operations-wise (unless you’re already planning to run on AWS, as mentioned above). You can deploy your app to a single VM and then use a managed Postgres instance as the storage backend—a nice setup for apps that are serious enough that you’d be uncomfortable storing all your data on the filesystem, but still early-stage enough that keeping the number of moving parts to a minimum is helpful. Like a part-time business.

You can also use managed Postgres as the backend for Datomic Pro, but besides that you’ll still have to run at least two separate JVM processes, probably on separate VMs: the transactor (which processes new transactions and sends them to the storage backend) and the peer (which contains the query indexes and can run in the same JVM as your application, similar to an XTDB node). XTDB avoids the need for a separate transactor because transactions are submitted directly to the storage backend and all transaction processing is done subsequently on the XTDB nodes.

Transaction lifecycle
=====================
XTDB:    App server --> storage backend --> XTDB node/app server
Datomic: App server --> transactor --> storage backend
                            |
                            --> Datomic peer/app server

Documents vs. datoms

In Datomic the atomic unit of data is a “datom” (which is like the value for a single row and column, such as “User 12’s email address is hello@example.com”). Transactions are mostly collections of datoms.

{:db/id 1234
 :user/email "hello@example.com" ; <-- that's a datom
 :user/name "bob"                ; <-- there's another one
 :user/age 30}                   ; <-- and another
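In Datomic's transaction data, a map like the one above is shorthand for a list of individual `[:db/add entity attribute value]` assertions. The toy function below shows that correspondence in plain Clojure. `map->assertions` is a hypothetical helper written for this example, not part of Datomic or a picture of how Datomic expands maps internally.

```clojure
(defn map->assertions
  "Expands a transaction map into one [:db/add e a v] vector per attribute."
  [m]
  (let [eid (:db/id m)]
    (vec (for [[attr value] (dissoc m :db/id)]
           [:db/add eid attr value]))))

(def assertions
  (map->assertions {:db/id 1234
                    :user/email "hello@example.com"
                    :user/name "bob"
                    :user/age 30}))
;; Three assertions, one per attribute, for example:
;; [:db/add 1234 :user/email "hello@example.com"]

;; Changing one attribute later only needs one assertion; the other
;; attributes of entity 1234 are left alone.
(def email-update
  [[:db/add 1234 :user/email "different@example.com"]])
```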

In XTDB, transactions operate on entire documents (similar to rows), and XTDB provides only a few relatively low-level operations. If you want to update a single attribute in a document, you have to resubmit the entire document.

;; Old document
(def bob {:xt/id 1234
          :user/email "hello@example.com"
          :user/name "bob"
          :user/age 30})

(xt/submit-tx node
  [[:xtdb.api/put (assoc bob :user/email "different@example.com")]])

And if you do that naively, there’s a possibility that concurrent transactions will clobber each other.

;; If we run this at the same time as the submit-tx call
;; above, the email address might end up as
;; hello@example.com instead of different@example.com.
(xt/submit-tx node
  [[:xtdb.api/put (assoc bob :user/age 31)]])

You can fix that by using a combination of match operations and transaction functions. Biff does this already via its submit-tx wrapper; if you’re not using Biff, you’ll need to roll your own. Though I am planning to release Biff’s submit-tx as a standalone library, hopefully soon….
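For anyone rolling their own, here is a sketch of the two building blocks, assuming XTDB 1.x with an in-memory node. It is not Biff's implementation; `submit!` and the `:assoc-attr` function ID are hypothetical names.

```clojure
(require '[xtdb.api :as xt])

(def node (xt/start-node {}))

;; Hypothetical helper: submit a transaction and wait until it is indexed.
(defn submit! [tx-ops]
  (xt/await-tx node (xt/submit-tx node tx-ops)))

(def bob {:xt/id 1234
          :user/email "hello@example.com"
          :user/name "bob"
          :user/age 30})

(submit! [[::xt/put bob]])

;; Building block 1: match. The whole transaction is aborted unless the
;; stored document still equals the one we read.
(def tx-a
  (submit! [[::xt/match 1234 bob]
            [::xt/put (assoc bob :user/email "different@example.com")]]))

;; This one was built from the same stale read, so its match fails.
(def tx-b
  (submit! [[::xt/match 1234 bob]
            [::xt/put (assoc bob :user/age 31)]]))

(def a-committed? (xt/tx-committed? node tx-a)) ; true
(def b-committed? (xt/tx-committed? node tx-b)) ; false: re-read and retry

;; Building block 2: a transaction function. It runs during indexing, so it
;; always sees the latest version of the document.
(submit! [[::xt/put {:xt/id :assoc-attr
                     :xt/fn '(fn [ctx eid k v]
                               (let [db  (xtdb.api/db ctx)
                                     doc (xtdb.api/entity db eid)]
                                 [[:xtdb.api/put (assoc doc k v)]]))}]])

(submit! [[::xt/fn :assoc-attr 1234 :user/age 31]])

(def final-bob (xt/entity (xt/db node) 1234))
```

With either approach the email change survives: `final-bob` has both the new email address and the new age.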

In any case, once you have something in place to handle concurrent XTDB transactions, I haven’t found the datoms-vs-documents difference to be significant. They both get the job done.

Schema

Datomic schema is baked into the database, similar to Postgres and other RDBMSs. To add new schema, you run a transaction. As mentioned above, XTDB leaves schema enforcement to you. So Biff uses Malli for schema definitions, and transactions are checked against the schema before they’re submitted. Similar to the documents-vs-datoms issue above, there’s a bit of extra setup involved if you’re using XTDB (unless you’re using it via Biff), but after that the difference is moot in my experience. I do really like being able to define schema with Malli.
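To compare the two styles side by side, here is a sketch that assumes `metosin/malli` is on the classpath. The Datomic part is shown only as transaction data. `User` and `checked-put` are hypothetical names, and this is not how Biff wires Malli in; it just shows the general shape of validating before submitting.

```clojure
(require '[malli.core :as m])

;; Datomic: schema is itself data that you transact into the database.
(def datomic-schema
  [{:db/ident       :user/email
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :user/age
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}])

;; XTDB with Malli: schema lives in application code.
(def User
  [:map {:closed true}
   [:xt/id :uuid]
   [:user/email :string]
   [:user/age {:optional true} :int]])

(defn checked-put
  "Returns XTDB transaction operations for `doc` if it conforms to `User`;
  otherwise throws, so nothing invalid is ever submitted."
  [doc]
  (if (m/validate User doc)
    [[:xtdb.api/put doc]]
    (throw (ex-info "Document doesn't match schema"
                    {:doc doc
                     :explain (m/explain User doc)}))))

(def good-doc {:xt/id (java.util.UUID/randomUUID)
               :user/email "hello@example.com"})

;; Wrong type for :user/age, so validation fails.
(def bad-doc (assoc good-doc :user/age "thirty"))
```

`(checked-put good-doc)` returns a put operation ready to hand to `xt/submit-tx`, while `(checked-put bad-doc)` throws before anything reaches the database.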

One point possibly worth mentioning: in Biff’s approach, schema lives in your code (i.e. in your apps’ memory), not in the database. The Datomic team views this as unequivocally a bad thing. As a solo developer I’ve found schema-in-code to be more convenient than Datomic’s approach, but for more complex systems I could see myself wanting to store schema in the database—for example, if you have a database that’s accessed by multiple services.

Even with XTDB, you could store your Malli schemas inside the database and query for them whenever you submit a transaction. Though you would still be relying on clients to do their own schema enforcement; XTDB won’t do it at the database level.

The future

Regardless of which route you take, I’m excited to see what happens in the world of immutable, Clojurey databases. Datomic being backed by a large company (Nubank) and XTDB getting first-class SQL and more are both promising developments.