Tonight Lucas, Wouter and I went to a very interesting session at JavaOne to hear how two of my favourite Java technologies, EclipseLink and Coherence, can work together to create the ultimate scalable JPA implementation. And if the technologies weren’t enough to make this a must-see session, the speakers would have made it so: the session was presented by Doug Clarke, principal product manager for Oracle TopLink and co-lead of the Eclipse Java Persistence Platform, and Mike Keith, co-specification lead for EJB 3.0 (JPA).
I worked extensively with TopLink (back when Oracle had just purchased it, before it was donated to the open source community as EclipseLink) on a large-scale implementation at the Port of Rotterdam, and learned to love it for its out-of-the-box persistence features, but especially for the powerful optimization and tuning capabilities in the API, which we needed to achieve the very high transaction throughput that was required. My “love affair” with Coherence started at OpenWorld 2007 (when Oracle had just acquired it from Tangosol), and is based on not much more than a memorable presentation by Brian Oliver followed by a few experiments on my laptop. Not really profound, I admit, but it was love at first sight. If you don’t know what Coherence is, think of an ultra-reliable, endlessly scalable, in-memory enterprise datagrid, with a programming API that is no more complex than that of java.util.Map. Truly amazing stuff!
After learning what Coherence was and what it could do, my first thoughts were: “this would be great to use as a distributed cache for TopLink!” (immediately followed by “this would make a great, ultrafast dehydration store for Oracle BPEL”). One of TopLink’s most compelling features is a flexible and highly configurable shared cache that really rocks when running in a single JVM, and for which TopLink offers automatic, asynchronous distribution of changes when running in a clustered environment where each node has its own cache. In most scenarios this would be an adequate “distributed cache” solution, but at the aforementioned project we ran into race conditions: a change made to data on one node would (also asynchronously) trigger some processing logic that could run on any node, and there was no way to ensure that the data changes that triggered that logic had already reached the node on which it was running. Since submitting a (set of) Object(s) to a Coherence datagrid is an atomic, synchronous operation, this problem (and its incredibly complex solution/workaround) could have been avoided.
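To make that race condition a bit more concrete, here is a small, deterministic simulation in plain Java. It is only a sketch: it uses neither TopLink nor Coherence, the class name, the "ship-42" key and the status values are made up for illustration, and the asynchronous change propagation is modelled as a queue that is drained too late.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical simulation of the stale read described above; no TopLink or Coherence APIs involved.
class StaleReadSketch {

    // Each node has its own cache; changes from node A reach node B via a queue drained later.
    static String readWithAsyncReplication() {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeB = new HashMap<>();
        nodeA.put("ship-42", "AT_SEA");
        nodeB.put("ship-42", "AT_SEA");
        Queue<Runnable> pendingReplication = new ArrayDeque<>();

        // Node A commits a change; propagation to node B is only queued, not yet applied.
        nodeA.put("ship-42", "ARRIVED");
        pendingReplication.add(() -> nodeB.put("ship-42", "ARRIVED"));

        // The triggered processing logic happens to run on node B before the queue is drained.
        String seenByProcessing = nodeB.get("ship-42");

        // The change arrives afterwards, too late for the logic that needed it.
        pendingReplication.forEach(Runnable::run);
        return seenByProcessing;
    }

    // One shared store stands in for the datagrid: put() returns only after the value is stored.
    static String readWithSharedGrid() {
        Map<String, String> grid = new HashMap<>();
        grid.put("ship-42", "AT_SEA");
        grid.put("ship-42", "ARRIVED");
        // Whichever node runs the triggered logic reads from the same store.
        return grid.get("ship-42");
    }

    public static void main(String[] args) {
        System.out.println("per-node caches, async propagation: " + readWithAsyncReplication());
        System.out.println("shared grid, synchronous put:       " + readWithSharedGrid());
    }
}
```

With per-node caches the processing logic still sees the old value, while with a single store that is updated synchronously it sees the new one. A real cluster adds threads, networks and timing to the mix, which is exactly what made the original problem so hard to work around.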
I was therefore happy to see that Doug and Mike were presenting on exactly this kind of solution: EclipseLink as JPA provider with Coherence acting as a distributed L2 cache. They outlined a number of possible configurations, with the responsibility for reading and writing data from and to the database placed with EclipseLink, with Coherence, or with both. Also interesting was that they leverage not just the basic “store” and “retrieve” functionality of Coherence, but also much more advanced features such as parallel execution of queries against the content of the Coherence cache. As Doug put it when we met him after the session: “Coherence offers so many powerful features that it would almost be an insult if we only use the basic ‘get’ and ‘put’ operations”. He mentioned he had already written logic that converts a JPA Query to a Coherence Filter, which is like a query or WHERE clause for searching Objects in a Coherence datagrid.
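To give an idea of what “a filter as a WHERE clause” could look like, here is a rough sketch in plain Java. It does not use the real Coherence or EclipseLink APIs and it is not Doug’s conversion logic: a java.util.function.Predicate stands in for the filter, a ConcurrentHashMap stands in for the cache, a parallel stream stands in for parallel query execution, and the Ship entity and its fields are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for filter-based querying of a cache; not the Coherence API.
class FilterQuerySketch {

    // Made-up entity for illustration only.
    static final class Ship {
        final String name;
        final String status;
        final int tonnage;

        Ship(String name, String status, int tonnage) {
            this.name = name;
            this.status = status;
            this.tonnage = tonnage;
        }
    }

    // Each small filter corresponds to one condition of a WHERE clause.
    static Predicate<Ship> statusEquals(String status) {
        return ship -> ship.status.equals(status);
    }

    static Predicate<Ship> tonnageGreaterThan(int minimum) {
        return ship -> ship.tonnage > minimum;
    }

    // Evaluates the filter against all cached values in parallel and returns the matching names, sorted.
    static List<String> query(Map<Long, Ship> cache, Predicate<Ship> filter) {
        return cache.values().parallelStream()
                .filter(filter)
                .map(ship -> ship.name)
                .sorted()
                .collect(Collectors.toList());
    }

    static Map<Long, Ship> sampleCache() {
        Map<Long, Ship> cache = new ConcurrentHashMap<>();
        cache.put(1L, new Ship("Alpha", "ARRIVED", 80_000));
        cache.put(2L, new Ship("Bravo", "AT_SEA", 90_000));
        cache.put(3L, new Ship("Charlie", "ARRIVED", 20_000));
        cache.put(4L, new Ship("Delta", "ARRIVED", 60_000));
        return cache;
    }

    public static void main(String[] args) {
        // Roughly: SELECT s FROM Ship s WHERE s.status = 'ARRIVED' AND s.tonnage > 50000
        Predicate<Ship> filter = statusEquals("ARRIVED").and(tonnageGreaterThan(50_000));
        System.out.println(query(sampleCache(), filter));
    }
}
```

The point of the sketch is the shape of the idea: the conditions of a query become composable filter objects, and the filter is evaluated where the data lives instead of going to the database. In a real datagrid that evaluation would be spread over the nodes holding the data rather than over the threads of one JVM.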
In the session’s summary, Doug warned everyone not to see this architecture as a “performance booster”, but as a scalability solution. The final remark was that “next year, this would be a much longer presentation”. Personally, I can’t wait! And maybe next year there will also be a presentation on Coherence as a BPEL dehydration store. Just so that I can say “I thought of that two years ago, and already blogged about it last year”. Just kidding 🙂