Java Database access

I just finished my yearly evaluation run of new techniques for quickly building web applications in Java. While doing that I also tried the two mainstream solutions for accessing the database: Hibernate and JPA. And I am starting to dislike them both. A huge problem with these frameworks is that they abstract things that cannot be properly abstracted. When you use them, lots of problems occur because the real world leaks through the abstraction, leaving it useless. In many cases the frameworks work only if the database was designed with the framework's limitations in mind, and those limitations are usually learned only through experience.

JPA and Hibernate inheritance sucks

One example of bad or leaking abstractions is the whole inheritance thing. Take the classes Figure, Circle and Rectangle, where Circle and Rectangle are specializations of Figure. Now put these classes in the database using an arc, with one table per class (what Hibernate calls table-per-subclass and JPA calls InheritanceType.JOINED): the base class Figure and its data go in the FIGURES table, the Circle class goes to CIRCLES and the Rectangle class goes to RECTANGLES.

Now we add another class, Ref, which has a reference to a Figure (which can be a Circle or a Rectangle). When we load an instance (or a list of instances) of this class from the database, these frameworks are in trouble. To get at least reasonable performance they often read only the table of the class you are actually loading: the engine does a select on the REFS table and initializes an instance for each row read.

When the "figure" property of the instance is to be filled we have a problem. To fill it we would have to do another database access (a select on FIGURES) to retrieve the value before we can (1) determine the actual type and (2) create the proper instance of Circle or Rectangle. Doing this extra access for every row quickly gets so expensive that the framework becomes unusable, so they work around it with lazy proxies. A lazy proxy is a class which gets "augmented" by the framework (usually by bytecode manipulation or by using a Dynamic Proxy) to intercept all access to its properties. This proxy would be the "same" as the Figure class but would only have the ID field filled in (that value is known from the FK relation). As long as we don't need the value, or only need the id of the Figure, we're OK. But as soon as we access any other property, the call is intercepted by the proxy and it does a lazy load: it issues a select on FIGURES (and its subclass tables) to retrieve the data, and from that point on the proxy has all the data.
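To make the mechanism concrete, here is a minimal sketch of a lazy proxy built with the JDK's java.lang.reflect.Proxy. Because JDK dynamic proxies can only implement interfaces, Figure is modelled as an interface here; Hibernate itself generates a subclass through bytecode manipulation instead. All names (LazyProxyDemo, FigureRow, lazyFigure, the loads counter) are hypothetical, and the loader lambda stands in for the select on FIGURES.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.LongFunction;

class LazyProxyDemo {

    // Hypothetical entity interface: JDK dynamic proxies can only implement interfaces.
    interface Figure {
        long getId();
        String getColor();
    }

    // The fully loaded object, as it would be built from a FIGURES row.
    static class FigureRow implements Figure {
        private final long id;
        private final String color;
        FigureRow(long id, String color) { this.id = id; this.color = color; }
        public long getId() { return id; }
        public String getColor() { return color; }
    }

    // Counts simulated database hits so the laziness is observable.
    static int loads = 0;

    // Creates a proxy that knows only the id (taken from the foreign key).
    // The loader runs on the first call to any method other than getId().
    static Figure lazyFigure(long id, LongFunction<Figure> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Figure target; // null until the lazy load happens

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getName().equals("getId")) {
                    return id; // answering this needs no database access
                }
                if (target == null) {
                    loads++;
                    target = loader.apply(id); // the simulated "select ... where ID = ?"
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real method threw
                }
            }
        };
        return (Figure) Proxy.newProxyInstance(
                Figure.class.getClassLoader(), new Class<?>[] {Figure.class}, handler);
    }

    public static void main(String[] args) {
        Figure f = lazyFigure(42L, id -> new FigureRow(id, "red"));
        System.out.println(f.getId() + " loads=" + loads);    // prints "42 loads=0"
        System.out.println(f.getColor() + " loads=" + loads); // prints "red loads=1"
    }
}
```

Calling getId() is answered from the id the proxy was created with; the first call to any other method triggers the load, and later calls reuse the loaded object.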

This is all fine and dandy for a normal relation, but it goes horribly wrong when inheritance of the above kind is used. The reason is the proxy. When it is created, the actual type of the Figure is unknown and cannot be read (that is the whole point of having the proxy in the first place). So what type should the proxy be? A Circle? A Rectangle? It turns out that Hibernate always creates a proxy of type "xxx extends Figure", which is clearly neither a Circle nor a Rectangle. And once the proxy is created it cannot be replaced, so even after the data is read the field behaves as a Figure and not as a Circle, even if it is one. It has no Circle fields, and of course instanceof Circle is false.
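The type problem can be reproduced with the same kind of JDK proxy. In this hypothetical sketch (ProxyTypeDemo, CircleRow and lazyRef are made-up names), the proxy has to be created before the row is read, so the only interface it can implement is Figure. The object it eventually delegates to is a Circle, but the proxy itself never becomes one.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

class ProxyTypeDemo {
    // Hypothetical model of the example: Circle is a specialization of Figure.
    interface Figure { long getId(); }
    interface Circle extends Figure { double getRadius(); }

    static class CircleRow implements Circle {
        public long getId() { return 7L; }
        public double getRadius() { return 2.5; }
    }

    // When the Ref row is read we only know the FK, not whether the figure
    // is a Circle or a Rectangle, so the proxy can only implement Figure.
    static Figure lazyRef(Supplier<? extends Figure> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Figure target; // filled on first access

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get(); // the loaded row turns out to be a Circle
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            }
        };
        return (Figure) Proxy.newProxyInstance(
                Figure.class.getClassLoader(), new Class<?>[] {Figure.class}, handler);
    }

    public static void main(String[] args) {
        Figure figure = lazyRef(CircleRow::new);
        System.out.println(figure.getId());           // prints 7, delegated to the loaded Circle
        System.out.println(figure instanceof Circle); // prints false, even after loading
        // ((Circle) figure).getRadius() would throw ClassCastException here.
    }
}
```

Later Hibernate releases added a Hibernate.unproxy(Object) helper that returns the underlying loaded instance, which is one way out when you really need to test the actual type.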

The horrible thing is that this is not mentioned in the docs. What's even worse is that Hibernate lets you specify this atrocity without even warning you ("This won't work"). Discovering this the hard way is quite disastrous for many people.

Problems, problems...

Another problem with Hibernate is the level of abstraction. Lots of things happen deep down in the engine, and when something goes wrong you enter Hibernate Hell. Often you get some arcane exception. Because of the Java checked exception idiocy, exceptions get wrapped over and over, so you first have to decipher what actually went wrong and where. This can be hard because the helpful JDK often hides details of nested exceptions, showing only the details of the last place where the thing was wrapped for the nth time.
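When the printed stack trace is not enough (printStackTrace abbreviates each cause's frames that it shares with the enclosing trace as "... n more"), walking the cause chain yourself at least shows every level of wrapping. A minimal sketch using only the JDK; the class and method names are hypothetical, and Apache Commons Lang's ExceptionUtils.getRootCause does the root-cause part if you already use that library.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class RootCause {
    // Follows getCause() to the innermost exception; the identity set stops
    // the walk if a badly built cause chain ever loops back on itself.
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    // One line per wrapping level, outermost first, so no level is hidden.
    static List<String> describeChain(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<String> lines = new ArrayList<>();
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            lines.add(c.getClass().getSimpleName() + ": " + c.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical chain, shaped like a framework wrapping a JDBC error twice.
        Throwable e = new IllegalStateException("transaction failed",
                new RuntimeException("could not insert",
                        new SQLException("unique constraint violated")));
        describeChain(e).forEach(System.out::println);
        System.out.println("root: " + rootCause(e).getMessage());
    }
}
```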

Once you have such an exception you're often forced to debug the Hibernate source. Luckily the source is available; I would have given up on it way earlier otherwise. For your own peace of mind, I would advise using only frameworks that come with source.

These frameworks work exceptionally badly when the database you access uses such things as triggers. Of course no one in their right mind would use triggers in a database, but many legacy databases do. When they fire they often change fields in records. Hibernate at least has some provisions for the case where a trigger changes fields in the record just stored (although they are buggy as hell and the cause of many debugging sessions). But when a trigger changes a related record that happens to be in the Hibernate session, you're in deep trouble. The only way to fix things then is to know exactly how Hibernate will write your changes and to add arcane invocations of things like session.flush() and session.refresh() in a carefully orchestrated order.

Lousy support for long transactions or dialogues

What continues to amaze me is the utter craving for pain in large parts of the Java Standards world. One of the most prevalent ways to use a database is:

  • Show a screen displaying the current customer and whatever 1->n or n<->m relations it has
  • Wait for the dude to add, edit and delete stuff on that page, possibly over many HTTP requests
  • Allow him to save or cancel his work, ending the dialogue.

You would imagine that this simple access pattern would be the easiest to implement because it is used so often. You would keep the state of the data in memory until either the save or the cancel; only at that point would changes be written to the database in a single transaction (or the memory be discarded on a cancel). Optimistic locking or field change detection at commit time would prevent, or at least detect, concurrent changes.
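The pattern described above can be sketched in plain Java. This is not something Hibernate or JPA offers as-is; it is a hypothetical, in-memory illustration (ConversationDemo, Row, Conversation and the version column are all assumptions) in which a Map stands in for the database and a synchronized block stands in for the single short transaction at save time.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class ConversationDemo {
    // Hypothetical table row: a customer name plus an optimistic-lock version column.
    static final class Row {
        final String name;
        final int version;
        Row(String name, int version) { this.name = name; this.version = version; }
    }

    // Keeps one dialogue's edits in memory; no connection or transaction is
    // held between the HTTP requests that make up the dialogue.
    static final class Conversation {
        private final Map<Long, Row> db; // stand-in for the real database
        private final Map<Long, Integer> versionRead = new HashMap<>();
        private final Map<Long, String> pending = new LinkedHashMap<>();

        Conversation(Map<Long, Row> db) { this.db = db; }

        // The value as the user sees it: the pending edit if any, else the stored one.
        // Assumes the id exists.
        String read(long id) {
            Row row = db.get(id);
            versionRead.putIfAbsent(id, row.version);
            return pending.getOrDefault(id, row.name);
        }

        void rename(long id, String newName) {
            read(id); // records the version this edit is based on
            pending.put(id, newName);
        }

        // "Save": one short transaction that first detects concurrent changes
        // and then writes everything. Returns false if someone else got there first.
        boolean commit() {
            synchronized (db) {
                for (Long id : pending.keySet()) {
                    if (db.get(id).version != versionRead.get(id)) {
                        return false; // stale data: optimistic lock failure
                    }
                }
                for (Map.Entry<Long, String> e : pending.entrySet()) {
                    int oldVersion = db.get(e.getKey()).version;
                    db.put(e.getKey(), new Row(e.getValue(), oldVersion + 1));
                }
            }
            cancel(); // the dialogue is finished; forget its state
            return true;
        }

        // "Cancel": nothing was written, so discarding memory is enough.
        void cancel() {
            pending.clear();
            versionRead.clear();
        }
    }
}
```

Two conversations that edit the same customer both work purely in memory; the first save wins and the second one finds a changed version at commit time.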

The only way this works with these frameworks is if you are able to maintain a transaction over the whole conversation. This is usually undesirable from a database standpoint: it ties up lots of connections and may cause locking and other trouble in your database. Utter stupidity. It should be possible to have the data access layer keep its state in memory while you are working with it, without a transaction being open. Only at commit time would this state be saved. If the transaction is rolled back, the state should be invalidated, causing either a reread of the data that is subsequently used or, even better, a return to the "old" values already held in the session cache.

It almost seems Hibernate has such a mode: the manual flush mode (FlushMode.MANUAL). But here again there's trouble: as soon as a query is run against a table that has records in the session cache, all those records are flushed to the database before the query executes. This is needed because Hibernate does not allow querying its own cache, but it silently ignores the manual flush mode, again causing a long and frustrating debugging session because records are being inserted or updated when I don't want them to be.
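For contrast, here is a hypothetical sketch of what querying its own cache could look like: the query is evaluated against the stored rows overlaid with the pending in-memory changes, so nothing has to be flushed. It only works because the "query" here is a Java predicate over a single table (CacheAwareQueryDemo and its methods are made-up names); evaluating arbitrary SQL with joins and aggregates against unflushed state is much harder, which is presumably why Hibernate flushes instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CacheAwareQueryDemo {
    // Stand-in for a customers table: id -> name (hypothetical).
    private final Map<Long, String> database;
    // Edits made during the dialogue that have NOT been written to the database.
    private final Map<Long, String> pending = new HashMap<>();

    CacheAwareQueryDemo(Map<Long, String> database) {
        this.database = database;
    }

    void update(long id, String name) {
        pending.put(id, name); // kept in memory only, no flush
    }

    // Evaluates the query over the stored rows overlaid with the pending edits,
    // so the result reflects unsaved changes without writing anything.
    List<Long> idsWhereName(Predicate<String> condition) {
        Map<Long, String> view = new TreeMap<>(database); // sorted by id
        view.putAll(pending);
        return view.entrySet().stream()
                .filter(e -> condition.test(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Only an explicit commit writes the pending edits.
    void commit() {
        database.putAll(pending);
        pending.clear();
    }
}
```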

Also frustrating is Hibernate's insistence that an object be valid with respect to the database checks (nullity) when a record is persisted. If you start a new record in the above pattern, you must fill in dummy values before persisting it or you'll get an exception because not-null fields are null. But you *must* persist the record before it can be used in any collection, or you'll get an exception because you are using a transient entity. This is probably due to a bad implementation of Hibernate's persist() call.

These problems are only the tip of the iceberg. It seems easy database access is still far from solved. I will start looking for more bare-bones data access frameworks; the heavyweights suck.

Comments

Use it in another way

The easiest way is just to keep all objects in transient state, and persist them when they need to be, and not any earlier.
You can easily do this with Hibernate: you just use Hibernate to load whatever data you need to present to the user.

I never use long conversations; I find it easier to just reload what's needed after user think time. I don't have to care about merging or reattaching objects: I just use the id of the detached object and reload it in a fresh Session with one transaction, using a servlet filter around the HTTP request.

I tend to avoid inheritance of data. In the end your database doesn't support it, and when your application is very data-oriented, I see little added value.

The heavyweights don't suck; just use them in a way that you can handle and that fits your needs.