Expert Session on Persistency Framework: Toplink

Yesterday we had an excellent meeting on Java persistency frameworks, in particular Toplink. Oracle Consultant and Java expert Peter Ebell presented his experiences with Toplink on a very complex J2EE project (over 50 developers). He answered many questions, some of them very critical, from an audience of some 25 Dutch Java/J2EE consultants, most of whom have a strong background in relational databases. The session included a Chinese dinner buffet and a hands-on workshop. In this post I try to describe the most interesting and remarkable statements from Peter and the audience.

The presentation slides can be downloaded here.
In a series of expert meetings, the so called AMIS Query sessions, a group of experts on Java/J2EE and Oracle technology gathers regularly at AMIS, to discuss hot topics in the areas of Java/J2EE and Oracle development. This session was devoted to persistency frameworks in general and Toplink in particular.

Peter made it very clear from the beginning that he was speaking from practical experience, not necessarily from a very strong theoretical background. He also stressed that he is not from product marketing, i.e. he would cover the things that are really useful for a project, not necessarily all the marketing features, and tell an honest story rather than the marketing b.s. one might fear to hear. One important statement: he does not have extensive EJB knowledge or experience; in his many Java/J2EE projects, he never really encountered a need for the benefits of EJBs, nor for the complexity they introduce, a feeling shared by many in the audience.

He did a quick poll to find out about the audience’s experience with persistency frameworks. The score was roughly as follows:

  • 12 Oracle Business Components for Java (BC4J)
  • 4 Toplink
  • 2 OJB
  • 5 Hibernate – by far the most popular open source persistency framework
  • 1 EJB
  • 3 Castor JDO
  • most also had straight JDBC experience; many had worked with JHeadstart as well

To follow the line of the remarks below it might be useful to open the presentation whose link is given above. These remarks were noted as Peter went through the presentation.

Toplink can be used against a large number of different Relational Databases (more precisely: against JDBC drivers for a number of different databases). It can also map XML datasources to OO models. Proof of that latter pudding is Toplink itself: the XML files that Toplink uses to store meta-data are managed by Toplink through a Toplink mapping to the OO model used by the Toplink IDE.

One question from the audience: can Toplink be used against Object Databases? The answer: no.

It has a Visual Mapping Workbench – both as stand-alone tool and integrated into Oracle 10g JDeveloper. Peter was convinced that both stand-alone Mapping Workbench and Toplink support within JDeveloper were equally important; they have the same functionality and Peter does not detect any preference within the product development organization for either of them. It was remarked that while Hibernate does not have a visual mapping tool of its own, Middle Gen (see for example Hibernate and MiddleGen and Middle Gen Homepage) offers a GUI that can be used to design and generate Hibernate mapping files (as well as JDO, Torque, EJB 2.0).

Toplink is often linked with J2EE (Web) Applications but can also be used with Java Client applications (two-tier mode) or Java back-end applications with no obvious Web-tier, GUI or Client Side.

Peter made an interesting observation when it was suggested that the flexibility Toplink offers in marrying OO designs to Relational Database Designs would probably carry a performance penalty: “Because virtually any class model can be mapped by Toplink to almost any Table Design, you can optimize your table design for performance without sacrificing the Business Domain Model of the OO design. That in fact probably means performance can be better than in most situations where either the table design is less than optimal in order to achieve a good fit with the OO design, or where the mapping itself causes performance degradation because of the mismatch between Table Design and OO model.”

Peter stressed how Toplink is NOT intrusive on the Object Model. Toplink imposes no constraints or requirements on the domain classes. It is one of Toplink’s core ambitions to allow the domain classes to be modelled and designed without regard to persistency; that is taken care of by the Toplink framework, based on mapping meta-data that is stored externally to the domain model in the so-called deployment descriptor file: an XML file with descriptor (mapping) definitions, one descriptor for each domain class, referring to one or more tables. He immediately indicated that there is one important exception to this rule: indirection (“lazy loading”). This basically means that when an object is instantiated from a database record, the associations this object has with other objects do not instantly require instantiation of the entire object graph: objects at the other end of associations are only instantiated when referenced. This incredibly important feature is only enabled when such references and collections are implemented using Toplink-specific classes. It seems a small price to pay for a very important piece of functionality.
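The idea behind indirection can be illustrated in plain Java. The sketch below is not Toplink code: `LazyHolder` is a hypothetical stand-in for the Toplink-specific holder classes, and the `Department` class and its loader are made up for illustration. It only shows why the holder has to appear as a field in the domain class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a Toplink-specific holder class (not Toplink's API).
final class LazyHolder<T> {
    private final Supplier<T> loader; // in a real mapping this would run a query
    private T value;
    private boolean loaded;

    LazyHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    // The target is only instantiated the first time it is referenced.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

// Domain class: the holder field is the one intrusion on the class model.
final class Department {
    private final String name;
    private final LazyHolder<List<String>> employees;

    Department(String name, Supplier<List<String>> employeeLoader) {
        this.name = name;
        this.employees = new LazyHolder<>(employeeLoader);
    }

    String getName() { return name; }
    List<String> getEmployees() { return employees.get(); }
    boolean employeesLoaded() { return employees.isLoaded(); }
}

final class IndirectionSketch {
    public static void main(String[] args) {
        Department sales = new Department("Sales", () -> Arrays.asList("King", "Blake"));
        System.out.println(sales.employeesLoaded()); // nothing fetched yet
        System.out.println(sales.getEmployees());    // first access triggers the load
        System.out.println(sales.employeesLoaded());
    }
}
```

The getter hides the holder from callers, so only the field type gives away that a persistency framework is involved.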

We wondered how Toplink and other tools seem to be able to get access at runtime to private stuff in our classes, such as private members. Maybe this post by Brian Duff on a completely unrelated matter provides a clue.
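One standard Java mechanism that makes this possible is reflection. Whether Toplink does exactly this was not discussed in the session, so treat the following as an illustration of the mechanism only; the `Customer` class and the helper names are made up.

```java
import java.lang.reflect.Field;

// Made-up domain class with a private field and no accessors.
final class Customer {
    private String name = "unset";
}

final class PrivateAccessSketch {
    // Reads a private field by name, bypassing the normal access check.
    static Object read(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes a private field by name in the same way.
    static void write(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        write(customer, "name", "Widget Inc.");
        System.out.println(read(customer, "name"));
    }
}
```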

One common observation, not just for Toplink but for most other Persistency Frameworks as well, is that a coordinated team effort of working on the mapping files is quite hard. The XML files used for storing meta-data about mappings quickly become bottleneck-resources and dependencies between these files can make it even harder to successfully work on them simultaneously. Toplink files at least seem to support merge operations (something certainly not supported for similar files for BC4J). Peter suggested that the best solution, if practical, was to have only one person manage the mapping-details (at least only one person at the same time and preferably only one person at all).

The Toplink runtime uses either an XML file with Deployment Descriptors or a Java Class with the same information. Peter could not find out how to generate the latter from the JDeveloper plugin for Toplink, though. He had no preference for either Java Class or XML file; they have the same functionality and there seems to be no performance difference either. The single advantage of the Java Class implementation of the Deployment Descriptors seems to be that it provides a good example of how you can program at runtime against the Toplink API. One interesting feature of Toplink is that Descriptors (mapping definitions) can be manipulated at runtime; you can read, inspect, create and change descriptors programmatically. One area where that is frequently done is where the Visual Mapping Tool falls short and lacks support for certain features that are available in the API (properties that can be set on descriptors in the API but that are not yet supported by the Visual tool).

For relational database developers, it was interesting to see how Toplink supports a many-to-many relation, something that in relational databases requires an intersection table.

Peter was very enthusiastic about the Toplink Query Language. Even though Toplink has support for EJB QL, JDO QL and also allows direct SQL, Peter was adamant that the Toplink QL was by far the best. Oracle seems to dissuade people from using JDO with Toplink – there is partial support and they do not advocate using it. EJB QL expressions are interpreted and translated into Toplink QL expressions at runtime.

Peter demonstrated a number of queries that were constructed in a very elegant way using Toplink query expressions. They were very readable, even though the underlying SQL was quite complex and contained a series of joins. The Toplink Query Expressions are pure OO; no reference to database tables, columns or functions is used. It is also very simple to build various independent query expressions that are subsequently combined in a single query. By using Query Parameters it is also easy to reuse Query Expressions. Peter, who has a strong background in SQL programming himself, was very pleased with the SQL as produced by Toplink. It is smart code: for example, it has the intelligence to recognize that when two query expressions reference the same table combination, it can reuse the SQL joins for those tables and does not blindly include two sets of table joins. It even seems possible to leverage user-defined SQL functions through the Toplink Query Expressions. The SQL produced by Toplink is largely Standard SQL; it can be made only slightly database aware by setting a ‘native database sql’ property. That also implies that Toplink does not benefit from vendor-specific SQL Functions such as the Analytical SQL Functions in the Oracle RDBMS. See also documentation on Building Queries.

You cannot specify query conditions on non-mapped Object properties. A typical query expression looks something like:
custExpression.anyOf("orders").anyOf("orderItems").get("product").get("productName").equal("Widget")
The references to orders, orderItems, product and productName are all references to Mapped Object Properties as defined in the Descriptor for the current (Reference)Class. You can not query on getter-methods that are not linked to mapped properties. That means for example you can not query by derived values if that derivation is done in the domain class and not as part of an underlying database view. Note that the references in the query expressions are not type safe; they are just Strings that at runtime are compared with the descriptor definition for the current class. If the names of the properties in the descriptor definition change, the query expressions become unusable, but that cannot be detected by the compiler.

Various approaches were suggested by the audience, from defining Constants in the reference class for all its properties and referring to those constants in Query Expressions rather than to hardcoded strings, to generation of Query Repositories and automated tests against all Queries in those Repositories.
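A minimal sketch of the constants idea, combined with an automated check. Everything here is hypothetical: the `Order` and `StaleOrder` classes are made up, and the check compares the constants with the field names of the class through reflection, whereas a real check would compare them with the Toplink descriptor.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain class with one constant per mapped property.
final class Order {
    static final String ORDER_ITEMS = "orderItems";
    static final String ORDER_DATE = "orderDate";

    private List<Object> orderItems;
    private Date orderDate;
}

// Same idea, but the field was renamed and the constant was forgotten.
final class StaleOrder {
    static final String ORDER_DATE = "orderDate";

    private Date placedOn;
}

final class PropertyNameCheck {
    // Returns the names of String constants that no longer match an instance field.
    static List<String> staleConstants(Class<?> type) {
        Set<String> fieldNames = new HashSet<>();
        for (Field field : type.getDeclaredFields()) {
            if (!Modifier.isStatic(field.getModifiers())) {
                fieldNames.add(field.getName());
            }
        }
        List<String> stale = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) && field.getType() == String.class) {
                try {
                    field.setAccessible(true);
                    if (!fieldNames.contains((String) field.get(null))) {
                        stale.add(field.getName());
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return stale;
    }
}
```

A query would then refer to Order.ORDER_ITEMS instead of the literal "orderItems", and a unit test calling staleConstants for every domain class catches a rename that the compiler cannot see.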

Peter demonstrated the concept of Named Query Repositories his project had worked out: special classes containing definitions of reusable queries were used to extend at runtime the Descriptors with a set of Named Queries. The application, instead of reconstructing QueryExpressions, could simply perform such predefined queries, only having to provide values for the QueryParameters.

They also made use of the Query Redirector feature: by setting a QueryRedirector on a Query, Toplink hands control to a user defined class, just prior to executing the query. At that point, the query can be manipulated in various ways. Peter used the Query Redirectors primarily for adding QueryExpressions to the Query for those optional query parameters for which a value had been specified; this eliminated a potentially long list of QueryExpressions otherwise included for search criteria that may never be defined.
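The effect Peter described, adding criteria only for the parameters that actually have a value, can be sketched without Toplink. In this hypothetical example `Predicate` stands in for a Toplink QueryExpression, and the `Product` class and parameter names are made up; it does not show the QueryRedirector API itself.

```java
import java.util.Map;
import java.util.function.Predicate;

// Made-up domain class for the illustration.
final class Product {
    final String name;
    final String category;

    Product(String name, String category) {
        this.name = name;
        this.category = category;
    }
}

final class OptionalCriteria {
    // Builds a filter from only those search parameters that were supplied.
    static Predicate<Product> build(Map<String, String> params) {
        Predicate<Product> filter = p -> true; // no criteria: everything matches

        String name = params.get("name");
        if (name != null) {
            filter = filter.and(p -> p.name.equals(name));
        }

        String category = params.get("category");
        if (category != null) {
            filter = filter.and(p -> p.category.equals(category));
        }
        return filter;
    }
}
```

A search screen with many optional fields then produces a condition only as long as the number of fields the user filled in.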

Apart from the Indirection (lazy loading) discussed before, Toplink has several other ways to maximize performance. For example, it performs minimal DML: only those columns that actually need to be updated are included in the UPDATE statement sent to the database (rather than a full update of all columns). It is also possible to perform partial reading: not all attributes of an object are populated when an object is instantiated.
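To make the minimal DML idea concrete, here is a small sketch that compares the original and the changed column values and builds an UPDATE for the differences only. It is an illustration of the principle, not of how Toplink generates its SQL; the table and column names in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class MinimalUpdate {
    // Builds an UPDATE that lists only the columns whose value changed.
    // Returns null when nothing changed, so no statement needs to be sent.
    static String buildUpdate(String table, String keyColumn,
                              Map<String, Object> original, Map<String, Object> changed) {
        List<String> assignments = new ArrayList<>();
        for (Map.Entry<String, Object> entry : changed.entrySet()) {
            if (!Objects.equals(original.get(entry.getKey()), entry.getValue())) {
                assignments.add(entry.getKey() + " = ?");
            }
        }
        if (assignments.isEmpty()) {
            return null;
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + keyColumn + " = ?";
    }
}
```

For an EMP row where only SAL differs between the two maps, the result is an UPDATE that sets SAL alone, with the key column in the WHERE clause.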

Key to the Toplink Architecture is its Shared Object Cache, also called the Identity Map. All data that is queried by Toplink through the mapping definitions from an underlying persistent datastore (i.e. RDBMS) is instantiated as objects in this Cache. Toplink guarantees that the same data will only live once, in a single object, in the cache. This object is reused by all sessions in the JVM. Note that this is completely opposite to the per-user cache employed by, for example, the Oracle Business Components for Java framework. There is support for Clustering; each JVM will contain its own Cache and Toplink asynchronously synchronizes the caches.
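The guarantee that the same data lives only once can be sketched with a map keyed on the primary key. This is a hypothetical illustration, not Toplink's implementation; `IdentityMapSketch` and its loader argument are made-up names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical identity map: one shared object per primary key.
final class IdentityMapSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    // Returns the cached object, loading it only if the key was never seen.
    V get(K primaryKey, Function<K, V> loader) {
        return cache.computeIfAbsent(primaryKey, loader);
    }

    int size() {
        return cache.size();
    }
}
```

Two sessions asking for the same key receive a reference to the very same object, which is exactly what makes the cache efficient and what makes per-user data visibility a problem.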

The Central Shared Cache turns out to be at odds with the Oracle RDBMS VPD (Virtual Private Database) or FGAC (Fine-Grained Access Control) feature. VPD limits the rows visible to a user based on policy functions. However, Toplink is not aware of these policies: if User A has queried a Department with all its Employees into the Central Cache, asking the Department object for its collection of Employees returns the collection of all Employees User A retrieved from the database. When User B asks Toplink to query the same Department, User B gets a handle to the same Department Object in the cache. When User B asks the Department object for its Employees collection, he receives the same collection that was returned to User A. In other words, the collection returned to User B contains the set of Employees visible under the VPD restrictions of User A! Oracle apparently is working on a workaround that probably involves some sort of per-user cache (perhaps on top of the central cache).

Toplink manages transactions, if needed in conjunction with a JTS, using the Unit of Work. A UoW is requested from Toplink, and objects that need to be changed (or created/deleted) are registered with the UoW. Toplink returns a clone of the object in the Cache as a working copy for the application to act on. When all changes are made, the UoW is committed: Toplink calculates the DML required on the persistent datastore to bring about the changes that were made on the Working Object Clones and then executes that DML. Only after a successful commit is complete in the database will it update the base objects in the Cache. There are several ways in which to configure the Cache, but most of them were only relevant for JDKs prior to 1.3.
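The register, change, commit cycle described above can be sketched as follows. This is a simplified, hypothetical model and not Toplink's UnitOfWork API: the `Employee` class is made up, and the `writeToDatabase` argument merely stands in for executing the DML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

// Made-up domain class.
final class Employee {
    final int id;
    String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    Employee copy() {
        return new Employee(id, name);
    }
}

// Hypothetical sketch of the register/commit cycle.
final class UnitOfWorkSketch {
    private final Map<Integer, Employee> sharedCache;
    private final Map<Integer, Employee> workingCopies = new LinkedHashMap<>();

    UnitOfWorkSketch(Map<Integer, Employee> sharedCache) {
        this.sharedCache = sharedCache;
    }

    // Registering hands out a clone; the cached object stays untouched.
    Employee register(Employee cached) {
        return workingCopies.computeIfAbsent(cached.id, id -> cached.copy());
    }

    // writeToDatabase stands in for executing the DML; false means failure.
    // Returns the ids of the objects that were changed.
    List<Integer> commit(Predicate<Employee> writeToDatabase) {
        List<Employee> dirty = new ArrayList<>();
        for (Employee copy : workingCopies.values()) {
            Employee original = sharedCache.get(copy.id);
            if (original == null || !Objects.equals(original.name, copy.name)) {
                dirty.add(copy);
            }
        }
        for (Employee copy : dirty) {
            if (!writeToDatabase.test(copy)) {
                throw new IllegalStateException("DML failed; shared cache left unchanged");
            }
        }
        // Only after the database work succeeded are the cached originals updated.
        List<Integer> changedIds = new ArrayList<>();
        for (Employee copy : dirty) {
            Employee original = sharedCache.get(copy.id);
            if (original == null) {
                sharedCache.put(copy.id, copy);
            } else {
                original.name = copy.name;
            }
            changedIds.add(copy.id);
        }
        workingCopies.clear();
        return changedIds;
    }
}
```

The sketch also shows where the burden on the developer comes from: changes made directly on the cached object, rather than on the registered clone, are never noticed by the commit.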

Due to time constraints Peter had to skip the section on Business Rules in his presentation. He stressed that Toplink expects you, the developer, to take care of the integrity of the business model. It does not offer specific support for enforcing business logic; even logic that spans objects (for example aggregation-related rules) is your very own responsibility. Toplink offers an event mechanism that allows you to register methods to be called on specific events in the transaction cycle (prior to or just after commit, for example). Logic to enforce business rules can be performed during those events. However, during these events you can no longer change the contents of the transaction, so you cannot perform additional derivations or execute so-called Change Event With DML rules. Peter demonstrates in his presentation how his current project developed a framework for implementing Business Rules using Toplink.

Toplink can work together with CDM RuleFrame in the sense that Toplink is aware of database exceptions and there probably is a way to read the RuleFrame message stack. However, DML actions performed by CDM RuleFrame, or indeed by any trigger in the database, are not picked up by Toplink; that requires an explicit refresh of the objects in the cache.

He concluded with a list of tips & tricks (see presentation for details). Among the issues: bi-directionality (automatic maintenance by Toplink of the other end of a relation when one end is manipulated, part of the EJB 2.0 specs) sounds like a very good idea, but its implementation seems unstable and support for this feature from the development team seems far from secure. This basically means that you have to implement control over both ends of relations in the Domain Model.

He warned about setCollection(Collection coll) methods that, when implemented as this.coll = coll;, may overwrite the IndirectList that is used instead of a specific Collection class to realize lazy loading. The approach he took was to implement such methods as follows:
// Be careful not to overwrite the IndirectList
public void setUsers(Collection newUsers) {
    if (newUsers == users) return; // same list: nothing to replace
    users.clear();                 // empty the existing IndirectList in place
    users.addAll(newUsers);
}

Although support for Scrollable Cursors (Queries whose results are fetched in batches, for example to subsequently populate pages with 1-10/2000, 11-20/2000 etc.) apparently is available in Toplink, Peter suggested, inspired by members of the Toplink development team, using a ReportQuery that returns only the primary keys of the objects and then, when required, instantiating the individual objects using a key-based query.
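The paging approach can be sketched independently of Toplink. In this hypothetical example the list of keys stands in for the result of the keys-only ReportQuery and the `byKey` function stands in for the key-based query; `KeyPager` is a made-up name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical pager: holds only primary keys, loads objects page by page.
final class KeyPager<K, T> {
    private final List<K> keys;         // result of the keys-only query
    private final Function<K, T> byKey; // key-based query for a single object
    private final int pageSize;

    KeyPager(List<K> keys, Function<K, T> byKey, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.keys = new ArrayList<>(keys);
        this.byKey = byKey;
        this.pageSize = pageSize;
    }

    int pageCount() {
        return (keys.size() + pageSize - 1) / pageSize;
    }

    // Instantiates only the objects on the requested page (0-based index).
    List<T> page(int index) {
        if (index < 0 || (long) index * pageSize >= keys.size()) {
            return Collections.emptyList();
        }
        int from = index * pageSize;
        int to = Math.min(from + pageSize, keys.size());
        List<T> result = new ArrayList<>();
        for (K key : keys.subList(from, to)) {
            result.add(byKey.apply(key));
        }
        return result;
    }
}
```

With 2000 keys and a page size of 10, showing the first page instantiates ten objects, and combined with the shared cache a repeated visit to the same page may not hit the database at all.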

Summary of Conclusions

  • Toplink is not intrusive on your class model (your domain classes) except for use of indirection (lazy loading)
  • Toplink is quite intrusive on the application logic on transactions: developers need to be very aware of which objects to change and how to register objects to be changed with Toplink’s unit of work; dealing with Toplink transactions and clones/working copies can be quite cumbersome (it is the price to pay for the shared object cache)
  • Toplink is completely unaware of any changes made in data in the database; there is no automatic refresh or feature such as refresh after insert or update to automatically pick up changes in the data from trigger actions inside the RDBMS
  • Toplink has support for most relational databases and also XML datastores. It seems to develop a slight preference for the Oracle RDBMS in the sense that new Oracle database features stand a better chance of being supported by Toplink than other RDBMS’s new features. Having said that, Toplink primarily relies on standard SQL and fails to make use of more recent SQL enhancements such as for example Oracle’s Analytical SQL Functions.
  • Toplink has an issue with working with Oracle’s VPD functionality; a workaround is or will become available
  • Toplink is best suited for shops where the Object Model grows more or less in parallel with the Database Design (or even where the Class Model leads the way). Oracle’s other persistency framework, BC4J, is by comparison much more data centric, database aware and much more intrusive on the object model. Peter suggested that within 15 minutes he can tell which of the two, Toplink or BC4J, is best suited for a development organisation.
  • Peter believes Toplink and BC4J are really complementary offerings, with only minor overlap. They serve different purposes. Looking back he would have used Toplink on some of the projects he did in the past using BC4J given the choice today; at the same time he would still use BC4J on many others.
  • Documentation on Toplink is a tragedy. There does not seem to be a good standard work – or any book at all for that matter – covering Toplink. The discussion forum, monitored by product development, is very useful.

Resources:
Connect Java and databases more quickly and easily using Oracle9iAS TopLink. – Donald Smith, Oracle Magazine (March 2003)
Toplink Documentation: Getting Started Guide, Mapping Workbench User’s Guide , Application Developer’s Guide
Toplink Home Page
Toplink Demos
