Monday, December 29, 2008

Considering consistency at Amazon

Amazon CTO Werner Vogels posted a copy of his recent ACM Queue article, "Eventually Consistent - Revisited". It is a nice overview of the trade-offs in large-scale distributed databases, focusing on availability versus consistency.

An extended excerpt:
Database systems of the late '70s ... [tried] to achieve distribution transparency -- that is, to the user of the system it appears as if there is only one system instead of a number of collaborating systems. Many systems during this time took the approach that it was better to fail the complete system than to break this transparency.

In the mid-'90s, with the rise of larger Internet systems ... people began to consider the idea that availability was perhaps the most important property ... but they were struggling with what it should be traded off against. Eric Brewer ... presented the CAP theorem, which states that of three properties of shared-data systems -- data consistency, system availability, and tolerance to network partition -- only two can be achieved at any given time .... Relaxing consistency will allow the system to remain highly available under the partitionable conditions, whereas making consistency a priority means that under certain conditions the system will not be available.

If the system emphasizes consistency, the developer has to deal with the fact that the system may not be available to take, for example, a write ... If the system emphasizes availability, it may always accept the write, but under certain conditions a read will not reflect the result of a recently completed write ... There is a range of applications that can handle slightly stale data, and they are served well under this model.

[In] weak consistency ... The system does not guarantee that subsequent accesses will return the updated value. Eventual consistency ... is a specific form of weak consistency [where] the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value ... The most popular system that implements eventual consistency is DNS (Domain Name System).

[In] read-your-writes [eventual] consistency ... [a] process ... after it has updated a data item, always accesses the updated value ... Session [eventual] consistency ... is a practical version of [read-your-writes consistency] ... where ... as long as [a] session exists, the system guarantees read-your-writes consistency. If the session terminates because of a certain failure scenario, a new session needs to be created and the guarantees do not overlap the sessions.
As Werner points out, session consistency is good enough for many web applications. When I make a change to the database, I should see it on my subsequent reads, but anyone else who looks usually does not need to see the latest value right away. And most apps are happy if this promise is violated in rare cases, as long as the violation is acknowledged explicitly by terminating the session; that way, the app can establish a new session and either wait for eventual consistency of its past writes or knowingly take the risk of a consistency violation.

Session consistency also has the advantage of being easy to implement. As long as a client reads and writes from the same replica in the cluster for the duration of the session, you have session consistency. If that node goes down, you terminate the session and force the client to start a new session on a replica that is up, as in the sketch below.
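To make that concrete, here is a minimal sketch in Python of pinning a session to one replica. All of the names here (Replica, Session, SessionTerminated) are made up for illustration; a real cluster would propagate writes to the other replicas asynchronously and have proper failure detection, but the core idea is just that reads and writes within a session hit the same node.

```python
import random

class Replica:
    """Toy in-memory replica. A real system would lazily propagate
    each put to the other replicas in the cluster."""
    def __init__(self):
        self.alive = True
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

class SessionTerminated(Exception):
    """Raised when the pinned replica fails; the caller must open a new
    session and decide whether to wait out eventual consistency."""

class Session:
    def __init__(self, replicas):
        # Pin the session to one live replica for its whole lifetime.
        self.replica = random.choice([r for r in replicas if r.alive])

    def write(self, key, value):
        if not self.replica.alive:
            raise SessionTerminated("pinned replica is down")
        self.replica.put(key, value)

    def read(self, key):
        if not self.replica.alive:
            raise SessionTerminated("pinned replica is down")
        # Reads hit the same replica as writes, so within this session
        # we always see our own writes.
        return self.replica.get(key)
```

On failure, the client catches SessionTerminated, opens a new Session against a live replica, and accepts that reads in the new session may briefly predate the old session's writes.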

Werner did not talk about it, but some implementations of session consistency can cause headaches when many clients are updating the same data and care what the previous values were. The simplest example is a counter where two clients with sessions on different replicas both try to increment a value i and end up with i+1 in the database rather than i+2. However, there are ways to deal with this kind of data. For example, just for the data that needs it, we can use multiversioning while sending writes to all replicas, or force all read-write sessions to the same replica. Moreover, a surprisingly large amount of application data does not have this issue because there is only one writer, there are only inserts and deletes rather than updates, or the updates do not depend on previous values.
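Here is a toy illustration of the lost update on a counter, along with one common fix for just the data that needs it: a compare-and-set primitive that rejects a write unless the value is still what the caller read. The CASStore class and its compare_and_set method are hypothetical stand-ins for illustration, not any particular system's API.

```python
import threading

def naive_increment(session, key):
    # Two clients with sessions on different replicas can both read i
    # and both write back i + 1; one of the two updates is silently lost.
    value = session.read(key) or 0
    session.write(key, value + 1)

class CASStore:
    """Toy store with a compare-and-set primitive. CAS turns the lost
    update into a failed write that the client can simply retry."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        return self._data.get(key, 0)

    def compare_and_set(self, key, expected, new):
        # Atomically write `new` only if the value is still `expected`.
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False
            self._data[key] = new
            return True

def cas_increment(store, key, retries=10):
    for _ in range(retries):
        old = store.get(key)
        if store.compare_and_set(key, expected=old, new=old + 1):
            return old + 1
    raise RuntimeError("too much write contention on %r" % key)
```

With CAS, the second of two racing incrementers sees its write rejected, re-reads i+1, and writes i+2, which is the behavior the naive read-then-write loses.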

Please see also Werner's older post, "Amazon's Dynamo", and the full version of their SOSP 2007 paper linked at the bottom of that post, which describe the data storage system that apparently is behind Amazon S3 and Amazon's shopping cart.

1 comment:

Anonymous said...

Regarding "session consistency", I suggest the following paper:
Khuzaima Daudjee, Kenneth Salem: Lazy Database Replication with Ordering Guarantees. ICDE 2004: 424-435.
It proposes, defines, and formalizes strong session serializability, which has been called "session consistency" by some. Its weaker counterpart is described in Khuzaima Daudjee, Kenneth Salem: Lazy Database Replication with Snapshot Isolation. VLDB 2006: 715-726.