Kill Your Database with Terracotta

Ever since the beginning of my time as a professional software developer, I've felt the whole scheme of persistence with a database is usually a kludge.

The various ORM tools available do make the conversion between objects and data structures much easier. None are ideal. This is often referred to as the ORM Impedance Mismatch. While abstracting away the database is a lofty (and ideal) goal, the fact that a relational database is underneath the covers will always leak. Joel Spolsky calls it the Law of Leaky Abstractions.

The simplest form of the disconnect shows up when mapping hierarchical objects to database tables. It absolutely can be done. The result leaves little doubt about the implementation. The effort expended designing the ideal mapping could probably be better spent on a solution to the real problem instead of the problem created by choosing a solution before examining the problem.

More evidence comes from a recent post on DZone. The author complains about a developer writing code that makes horribly inefficient use of a database. While true, this is only revealed if you know the underlying implementation. From a purely OO view, the code is just fine.

Fundamentals

I believe the fundamental problem with the database solution comes from the fact that it is often slapped on an application by default. "We need persistence." "Well, let's use a database [and the ORM du jour]".

While an RDBMS is a fine (and mature) solution, it is not always optimal. Choosing a solution before giving the domain problem some serious analysis is always a mistake.

The core issue is that we want to preserve and restore the state of certain data structures in an application. Bonus points for transparently sharing that state among various machines (for scalability).

Other Attempts to Remove the RDBMS

All of the attempts floating around to build a so-called object-oriented database provide evidence that I'm not alone in my goal to replace the RDBMS. We get some cool toys, like Apache's CouchDB, that change the way we look at the database. Specs like the JCR (Content Repository for Java) provide alternative methods of storing data that look more like the objects that we truly want to deal with.

All of these methods have one huge drawback: at some point you are mapping some other data format to your objects, whether with property/XML files, metadata (annotations), or just code. Various systems make it easier, but it's always there. It just feels wrong.

Many are just wrappers around an RDBMS. This gets exposed in the way that some queries are exceptionally slow while others are blazing fast. You won't know which is which until you understand how the database is being used. This causes code to be modified to use it in the fastest possible way. Abstraction is broken.

Several years ago, I even made an attempt to replace read-only databases with a Lucene search index. It actually worked exceptionally well. Using the Lucene search index to query for data is an order of magnitude faster than calling an RDBMS. In that particular case, it was more than 2 orders of magnitude faster, but there were other issues... The concept never really took off. It's hard to break the psychological connection with the database solution, no matter how great the discomfort.

Martin Fowler Joins the Party

Martin Fowler managed to stir things up a bit by pointing out what we've all been thinking all along with his post: Database Thaw.

The Ideal Solution

Wouldn't the ideal solution be where your application just maintains its state?

between restarts
among machines in a cluster

In such a world you don't have to acknowledge that a persistence mechanism exists at all. You just write your application code, set fields in objects, and treat processes on the various machines in the cluster as if they were threads of execution on a single machine.

Just a Dream?

We are coming to the close of 2008. You'd think by now we'd have a way to share system state among a group of computers, and a way to keep that state backed up on a file system to allow seamless restoration if a reboot is required or a box just crashes.

You should be able to write your application as if it only lived on a single machine that never crashed.

A Solution with Serialization?

What about using serialization to simply persist the state of the application? Or image based persistence, like Smalltalk?

In the days of C/C++, we could get the address of our objects in memory and just write the bytes to disk. It was an easy way to save and restore system state. Java provides a whole serialization API (addresses aren't available for security reasons).

A thread could be created that constantly keeps the serialized data file up-to-date with the objects in the application. However, such a solution probably won't scale well across a cluster. Transparency would be lost. Interfaces would be polluted (things need to implement Serializable).
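To make the idea concrete, here is a minimal sketch of snapshot-style persistence with plain Java serialization. The names AppState and SnapshotStore are made up for illustration, and the sketch assumes nothing mutates the state while save() is running. It is an experiment along the lines described above, not a recommended design.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical application state; note that it has to implement Serializable,
// which is exactly the interface pollution mentioned above.
class AppState implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, Integer> counters = new HashMap<>();
}

// Hypothetical helper that writes and reads whole-state snapshots.
class SnapshotStore {
    private final Path file;

    SnapshotStore(Path file) {
        this.file = file;
    }

    // Serialize the whole object graph to a temp file, then move it over the
    // previous snapshot. Assumption: callers do not mutate state during save.
    synchronized void save(AppState state) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Restore the last snapshot, or start fresh if none exists yet.
    synchronized AppState load() throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new AppState();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (AppState) in.readObject();
        }
    }
}
```

A background thread (for example, a scheduled executor) could call save() every few seconds. Every call rewrites the entire state, which hints at why this would struggle as the state grows or has to be shared across a cluster.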

Though simple, serialization probably would not be the best solution, but it would be an interesting experiment.

Shared Memory

The most obvious way to do this would be to set up a background process that keeps memory synchronized across a cluster of machines and a file. This would keep the various machines in sync with one another, and the files would allow state restoration if a machine crashed (if it couldn't just pull state from a neighbor).

(starting to look like the serialization solution again)

It would also seem that a virtual machine could offer the greatest chance of success to implement a solution. With a virtual machine, it's much easier to make some magic happen behind memory access than with something that has direct access to the memory space.

Solutions

So, what's out there?

Oracle Coherence

Oracle makes a good attempt with their Coherence product.

The problem with this solution lies in its implementation. Sending whole objects across the network can quickly saturate the network (as evidenced by various HTTP session sharing schemes). Coherence also pollutes interfaces a bit by requiring that objects implement Serializable (but this is pretty minor).

For all its problems, the Oracle solution could be useful in some cases and may improve as it matures. The risk is that the solution cuts into Oracle's database clustering business, so the motivation to improve the project may not be very high.

Terracotta

Terracotta appears to offer everything in the short list of requirements:

syncs across the network
keeps state sync'd with the disk
transparent
fast

Terracotta gives me everything that I asked for and manages it with an optimized, transparent solution. Instead of forcing objects to implement Serializable or requiring other types of implementation changes, it works transparently under the covers of the virtual machine. It manages to optimize network usage by only sending the diffs of objects instead of whole objects. It even keeps the state sync'd with the file system. Basically, it's the most transparent persistence system around right now.
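To see why sending diffs matters, here is a small sketch of the general idea in plain Java. It is only an illustration: FieldDiff and its methods are made-up names, an object is modeled as a map of field names to values, and none of this is Terracotta's actual API or wire format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of "send the diff, not the object".
// This is not how Terracotta is implemented; it only shows the concept.
class FieldDiff {

    // Return only the fields that are new or whose value changed.
    // Simplification: fields removed in 'after' are not tracked.
    static Map<String, Object> changedFields(Map<String, Object> before,
                                             Map<String, Object> after) {
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> entry : after.entrySet()) {
            String field = entry.getKey();
            boolean isNew = !before.containsKey(field);
            boolean changed = !Objects.equals(before.get(field), entry.getValue());
            if (isNew || changed) {
                delta.put(field, entry.getValue());
            }
        }
        return delta;
    }

    // On the receiving machine, apply the delta to the local copy.
    static void apply(Map<String, Object> replica, Map<String, Object> delta) {
        replica.putAll(delta);
    }
}
```

If an object has twenty fields and one of them changes, only that one entry has to cross the network, where a whole-object scheme would resend all twenty.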

The only caveat: this one is for the JVM only. Sorry, .NET guys. So, to exploit it you're gonna be stuck with Java, Haskell, Scala, Groovy, JRuby, Jython, JavaScript or one of the other hundred languages that run on the JVM (isn't there some C# compiler for the JVM out there somewhere?).

The Real Magic

Terracotta doesn't replicate across machines needlessly. It does just enough to provide fail-over protection and the rest is 'on demand'. It can even push unused data off a machine.

Added up, each machine added to a cluster increases the effective memory available to every machine.

When I see something like this, I wonder what I would even need a database for.

The only reason I can see is to make data available for data-mining and (so-called) business intelligence packages (or warehousing, of course). Most of these tools are already designed around a database.

So, the RDBMS effectively becomes a logging mechanism.

Kill the RDBMS

So, by using common collections (sets/lists/maps) transparently backed by Terracotta the RDBMS can be effectively ripped out of an application. The result is cleaner (more maintainable) code, more efficient use of memory, and faster execution times.
What's not to love?
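
As a rough sketch of what such code might look like, here is a small domain class built on nothing but a standard map. The Inventory class is hypothetical. How a field like this gets declared as shared, and which collection classes are supported, is Terracotta configuration that lives outside the code and is not shown here, so treat that part as an assumption to check against the Terracotta documentation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical domain class. Nothing in it refers to a database, an ORM,
// or a clustering product; it is written as if it ran on one machine.
class Inventory {

    // Assumption: a field like this is what the clustering layer would be
    // told to share and persist. That configuration is not shown.
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();

    // Add newly received units to the quantity on hand.
    void receive(String sku, int quantity) {
        stock.merge(sku, quantity, Integer::sum);
    }

    // Remove units if enough are on hand; report whether the shipment happened.
    boolean ship(String sku, int quantity) {
        boolean[] shipped = {false};
        stock.computeIfPresent(sku, (key, onHand) -> {
            if (onHand < quantity) {
                return onHand; // not enough stock, leave the entry unchanged
            }
            shipped[0] = true;
            return onHand - quantity;
        });
        return shipped[0];
    }

    int onHand(String sku) {
        return stock.getOrDefault(sku, 0);
    }
}
```

There is no mapping file, no SQL and no session handling. The state is just the map.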

Will the Concept Take Off?

Will the concept of using shared memory take off as a way to get rid of the database and not simply be a means to massively scale?

I hope so. This is the day and age where everyone is embracing simplicity. The proliferation of Ruby on Rails, Grails, Spring, Wicket, and other frameworks shows that most developers have had it with over-complex solutions.

Maybe they'll be willing to get rid of one solution altogether.

Maybe, I'm Just Completely Wrong

One case where the RDBMS could be difficult to remove is when it serves a greater purpose, such as an integration point for multiple applications (as mentioned by Martin Fowler in his Database Thaw post).

Fowler actually suggested placing an HTTP wrapper around the database. This converts it from an integration point to an application. I've actually been involved with this type of application and it does have some very powerful attributes (I'll bore you with that in another post some time).

Another area where this probably won't work is data warehousing. But, that would be an excellent application for wrapping it up in a REST layer.

As a common way for various applications, regardless of implementation, to share data, the use of an RDBMS seems hard to beat. My thoughts for a Terracotta solution could work across apps built on languages that can run on the JVM. But reaching out to others (C/C++/Smalltalk) might be a bit difficult.

When All is Said and Done

I suppose we can't kill all of the databases out there.
OTOH...
We, the designers and builders of applications should take responsibility,
we should analyze the problems we are trying to solve,
we should try to select the best solution for the specific problem at hand.

References

Terracotta
DDJ - Stateful Web Applications that Scale Like Stateless Ones
Thinking about data lifetimes
Martin Fowler's Bliki : Database Thaw
Almost 2 years ago, Taylor Gautier questioned the need for a persistence layer.
Of course, he works for Terracotta ;-)
Shameless capitalism :-)
Here's The Definitive Guide to Terracotta
Wikipedia on the Object-Relational Impedance Mismatch
Neo4j is re-thinking the database. By structuring the data as a graph, it looks more like objects.
Old is new again. Re-evaluating the Columnar Database
Gavin King's (The Hibernate Creator) post In defence of the RDBMS
Uncle Bob just wrote a post about developers taking responsibility

Thoughts? Feedback?

I'm a big fan of DZone, so I try to move discussions there (instead of hogging all the visitors to this site). DZone is a great news aggregation site for developers. The discussions there (though short) tend to be of pretty high quality and the members are for the most part a pretty intelligent bunch.

Feel free to leave your thoughts or join the discussion on DZone. You can even use your OpenID there.
