Introduction
In the last post, we have discussed what is an isolation guarantee of database transactions and different levels of isolation. The highest isolation level we discussed last time is called Serializability, where transactions are executed concurrently but in a way that is the outcome (e.g. the resulting database state) is equivalent to as if they are executed serially. We have mentioned 6 anomalies. None of these anomalies will be possible if the database system provides Serializability guarantee.
However, the discussion is mostly limited to non-replicated database systems. In this post, we will extend our discussion to the replicated or distributed database systems, where data items are replicated into different database nodes.
One-Copy Serializability
Today, many replicated database systems are implemented in a way that data is updated at the nearest database node and replicated to the remote database node asynchronous, for better performance.
For example, let’s say a website owner decided to record the number of visits of their website using a replicated database. The replicated database has 2 nodes. Node 1 is in the EU data center, Node 2 is in the US data center, and the database table row X recorded the value.
Initially, let’s say the value is 100 in both Node1 and Node2.
Then let’s say, at the same time, there are new visits to the website from the EU as well as the US. Transaction T1 is executed from the EU node to increase the number of visits and transaction T2 is executed from the US node to increase the number by 1 as well.
If T1 and T2 are executed concurrently before the data gets replicated between the two nodes, both T1 and T2 will update the value from 100 to 101, and write the new value to their local nodes. Now the new value is 101, however, the expectation is 102 since there are actually 2 more new visits in total.
What is unique here (but not possible in a single node database system) is that we need to replicate data and the replication is done asynchronously from the transaction execution. The other transaction may still see the original value although a new value is already committed in another replica.
In fact, the same problem is also possible if T1 and T2 are executed serially one after another in real time. It will happen if the asynchronous replication logic between the two nodes is done after T1 and T2 have executed.
Replicated database systems provide many advantages — high scalability, high availability, high reliability and better read throughput. However, human programmers are not good at dealing with the complexity of replicated data. Would it be nice if the database system can provide an illusion that there’s only a single logical data item even though the underlying data is replicated to multiple places.
To provide such an illusion, we extend the Serializability guarantee further with the concept of One-Copy Serializability for replicated database systems. Recall that with the Serializability guarantee, concurrent transactions are executed in a way that is equivalent to that they are executed in some serial order. We extend the idea further by stating that any reads of a data record must see the most recent write from previous transactions in that (equivalent) serial order.
Is it obscure? Let’s exemplify it with our last examples: whichever transaction (T1 or T2) is executed after the other one (in the equivalent serial order), when it reads from the replicated database, the database should guarantee that it must see the write of the first transaction.
In a simple word, it guarantees that from the outside of the system, it looks like that there’s only one copy of such data — any writes of previous transactions is seen by any reads of any following transactions. Note the “previous” and the “following” is defined by the “equivalent” serial order of the concurrent transactions, where the database system still has the full control to define which transaction happens before, and which transactions happen after.
As you can imagine that the most naive implementation of One-Copy Serializability could be that the replication database system executes one transaction at a time, and replicates the data synchronously across all nodes before executing the next transaction. However, other replicated database system implementations can also claim that they provide One-Copy Serializability guarantee as long as they provide the “a-single-copy-of-data” illusion and the outcome is equivalent to some serial order.
Anomalies in replicated database systems
In the last section, we described what is One-Copy Serializability, it extends the Serializability guarantee with an illusion that all replicated data items are behaved as a single copy of data. A replicated database system that guarantees One-Copy Serializability will not be vulnerable to any of these anomalies that we mentioned in the previous post. However, there are some other anomalies that are possible in replicated database systems.
Serializability only guarantees that the outcome of transactions (could be concurrent or not concurrent with each other) are executed in a way that is equivalent to some serial order. However, it doesn’t put any restriction on what the serial order is.
In an extreme case, for example, if transaction T1 starts and then completes, and after some period of time, transaction T2 starts and then completes. A database system that provides the Serializability guarantee will execute T1 and T2 in some serial order, and most likely in the order that T1 is executed before T2 — which is aligned with what happened in real time. However, the Serializability guarantee doesn’t disallow other equivalent serial orders — for example, T2 executes before T1.
Neither Serializability in non-replicated database systems nor One-Copy Serializability prevent this — that T1 happens before T2 in real time but is scheduled in a way that T1 is executed after T2. In a sense, T2 “time traveled” back to a point before T1 happened.
Such “time travel” is very unlikely in a single node database system. In fact it is very different or not even possible to implement such a system to do “time travel”. However, “time travel” is not negligible any more in a replicated database system. And in fact, some replicated database system allows some form of time travels. To understand why, let’s review some “time travel” anomalies.
The immortal write
It happens when a write transaction “time traveled” back before an earlier write transaction.
For example, one day a user Tim decides that he wants to change his email address to tim@outlook.com in a social media website (transaction T1). Later, he decides to change his email address again from tim@outlook.com to tim@gmail.com (transaction T2). He went to his profile page and made the change successfully. However, when he refreshes the page, he found the email address is still tim@outlook.com (transaction T3). He refreshes the page multiple times and update the profile again and again, but the page keeps showing tim@outlook.com
In this case, T2 “time traveled” to a point that is before T1. Although in real time T1 happens before T2, the system decided on the serial order that is equivalent to that T2 happened before T1, which is still a valid equivalent serial order that doesn’t violate the Serializability guarantee.
The immortal write anomaly is possible in replicated database systems where writes can happen from multiple replicas, and are replicated asynchronously. In this case, the write-write conflict is resolved in a way that some new (in term of real time) transactions are somehow placed to a point that is before some older transactions in the equivalent serial order
The stale read
The stale read anomaly is very similar to the immoral write anomaly. It happens when a read transaction “time traveled” back before an earlier write transaction.
Take the same example, in transaction T2, Tim updates his email address from tim@outlook.com to tim@gmail.com. Afterwards, in transaction T3, Tim reads his email address, but he still gets tim@outlook.com.
In this case, T3 “time-traveled” to a point that is before T2 happens. Again, although in real time T2 happens before T3, the system decided on a serial order that is equivalent to that T3 happened before T2, which doesn’t violate the Serializability guarantee.
In a non-replicated database system, it is trivial to implement a system that a read resolves to the most recent write of the data. However, in the replicated database systems, it has a performance penalty to replicate data synchronously considering network latency, so that many of them replicate data asynchronously, and read data from different asynchronous replicas. Note One-Copy Serializability is not violated as long as they do it in a way that data is visible in the same order as an equivalent serial order. In some of such serial orders, the read transactions (e.g. T3) are arranged to be placed before the write transactions (e.g. T2), although in real time it is not the case.
The causal reverse
It happens when a transaction that is caused by an earlier transaction, somehow “time traveled” back to a point that is before the earlier transaction.
For example, say that in transaction T1, Tim booked a flight ticket of $500, and the system created a new order for that. Later in transaction T2, Tim cancelled the flight ticket he just booked and the system created a cancellation order for that. In this case, T1 causes T2, since T2 cannot happen if T1 didn’t happen in the first place.
If there’s another transaction T3 that reads all orders that Tim has, depending on when T3 is executed, T3 may see (1) no orders, (2) only the booking order or (3) both the booking order and the cancellation order. However, if the T2 are time-traveled before T1, and T3 is arranged between T2 and T1 in the equivalent serial order, T3 will then only see the cancellation order but not the original booking order.
Although T1 happens before T2 in real time, the system is free to decide on a serial order that transaction T2 happens before T1 without violating the One-Copy Serializability guarantee. (assuming the application logic of creating the cancellation order does not read and verify the original order).
In fact, the causal reverse is possible in CockroachDB. In CockroachDB, data is divided into different partitions and the data write transactions on the same partition are committed and replicated synchronously. Writes are assigned to timestamps from one of the local servers of that partition. However, clocks in different machines are not consistent with each other. Although CockroachDB defined a max value of clock skew it allows across different servers in the same cluster, it doesn’t wait until the max clock skew to pass before committing a transaction. If two data write transactions are on different partitions, it is possible that the latter transaction is assigned with a timestamp that is before the earlier transaction in real time, which makes the “time travel” possible.
Isolation Levels above Serializability
As we can see from the last section, “time travel” anomalies are possible even if the database system provides One-Copy Serializability guarantee. Application developers need to be aware of those anomalies and make sure necessary waiting or check is in place to avoid application bugs that are introduced by “time travel” anomalies.
Another option is to use a replicated database system that provides a higher isolation level than One-Copy Serializability.
In fact, there is another higher isolation level — Strict Serializability — where none of those anomalies are possible. It provides all guarantees that One-Copy Serializability provides. In addition to that, it puts the real time constraints that if a transaction X is executed before another transaction Y in real time, transaction X will be placed before transaction Y in the equivalent serial order.
The Strict Serializability guarantee is available in replicated database systems such as FaunaDB/Calvin, FoundationDB, and Spanner. Note again that for non-replicated (single server) database systems it is trivial to provide the real time guarantee, therefore, the Strict Serializability level is equivalent to the Serializability level in that case.
One-copy Serializability is vulnerable to all three “time-travel” anomalies while Strict Serializability guarantees that none of the three anomalies are possible. Between them, there are some other isolation levels as well.
Firstly, Strong Session Serializability, which guarantees Strict Serializability of transactions within the same session, otherwise One-Copy Serializability. It is implemented in a way that within the same session, the transactions are executed in the real-time order.
Secondly, Strong Write Serializability, which guarantees Strict Serializability for all write transactions, and guarantees One-Copy Serializability for read transactions. It is often implemented in a way that writes can only happen from the master replica, and then are asynchronously replicated to all replicas by the order of the writes. Read can happen from all replicas so that some may receive stale data. Therefore, it is vulnerable only to the stale read anomaly but not the other two.
Lastly, Strong Partition Serializability, where data is divided into different partitions. It provides Strict Serializability for data within the same partition, but only provides One-Copy Serializability for data that is in different partitions. It is often implemented in a way that writes within the same partition are synchronously replicated to all replicas, but no such coordination for data of different partitions. CockroachDB, as we mentioned previously, falls into this category. Since immortal write and stale read anomalies will only occur for read and write the same data, which must be in the same partition, these two anomalies will not be possible under this isolation level.
As an application developer, what can I learn from this?
Sometimes, some guarantees seem obvious to you when the system is running at a single node server. However, when the system is distributed to multiple nodes. They will not be trivial any more. “real time” guarantee is one of the good examples as we mentioned in this post. And it is the fundamental problem we want to solve by introducing the Strict Serializability guarantee.
In this post, we learnt that, in a replicated database system, data is replicated to multiple replicas. We learnt One-Copy Serializability guarantee which provides the illusion for developers to view replicated data as a single copy of logic data.
We learnt that “real time” becomes a non-trivial guarantee in a replicated system, which needs to be considered and handled with cautions.
We learnt three “time travel” anomalies that are possible and the most strict isolation guarantee — Strict Serializability — that prevents all “time travel” anomalies. And three new flavors of isolation levels which can provide options between One-Copy Serializability and Strict Serializability.
What is next
In the next post, we will review another database transaction property — Consistency — which people are often confused with Isolation guarantee. After understanding what is the consistency, we will review and compare the difference between consistency and isolation. Stay tuned.