SQL in Java: To batch or not to batch with Oracle JDBC
Last week I had to parse thousands of sql insert statements, read from files (one file per table), within a Java environment. These files contain complete SQL insert statements, so I used a normal simple Statement object.
Each insert statement was parsed to the normal execute method. It worked fine until I tested it with a production set containing files with 220.000 inserts. It took about 30 minutes! Hmm, that’s too long and such a simple job should run much faster.
To speed it up I switched to JDBC batch processing by using the addBatch method for each insert statement and afterwards call the executeBatch to process it. And… it still took 30 minutes! How that’s possible? Google found the answer delivered by O’reilly:
As you saw in the previous section, PreparedStatement objects eventually become more efficient than their Statement counterparts after 65-125 executions of the same statement. If you’re going to execute a given SQL statement a large number of times, it makes sense from a performance standpoint to use a PreparedStatement object. But if you’re really going to do that many executions of a statement, or perhaps more than 50, you should consider batching. Batching is more efficient because it sends multiple SQL statements to the server at one time. Although JDBC defines batching capability for Statement objects, Oracle supports batching only when Prepared-Statement objects are used. This makes some sense. A SQL statement in a PreparedStatement object is parsed once and can be reused many times. This naturally lends itself to batching.
Hey! Batch processing is not supported by Oracle using the normal Statement object! To comply with the Java API, the methods are available, but do about the same as the normal execute method.
After converting the insert sql strings to Prepared Statement strings using bind parameters (you know, the syntax with question marks as the params), I could use the PreparedStatement object.
Now the upload of the 220.000 records takes an acceptable 45 seconds.
Conversion regarding null values
To convert plain sql statements to a prepared statement syntax, simply replace all params by question marks and use the appropriate set param methods (e.g. setString) to pass the param values to the prepared statement. The only problems are null values, because the setNull method needs the sql type as a param. At first I used a hard coded null in the prepared statement syntax. This resulted in lots of prepared statements, because each combination of null values results in a different prepared statement! To solve this problem I first retrieve the metadata of the table with
select * from table where 1 = 0. No rows are returned, but the metadata is! By looping over the columns, the sql types, which are integers, are stored in a Hashtable with the colum name as key. When passing the null value with method setNull, the sql type can be retrieved using the Hashtable.
- Worst practices when using Oracle JDBC Drivers – great presentation by Douglas Surber
- Introducing Spring JDBC – frequently the best introduction of Spring in an organization
- New Oracle JDBC driver
- Getting started with Unit Testing of JDBC based DAOs with the Spring Framework (introducing DBUnit)
- EJB 3.0 Java Persistence API – "Refresh After Insert": Using EntityListeners to absorb the effects from Database Triggers