
SQL in Java: To batch or not to batch with Oracle JDBC
Last week I had to parse thousands of sql insert statements, read from files (file per table), within a Java environment. The files contain complete SQL insert statements, so I used a normal simple Statement object.
Each insert statement was parsed to the normal execute method until I tested with a production set containing file with 220.000 inserts…. it took about 30 minutes. Hmm, that’s too long and such a simple job should run much faster.

So to speed it up I switched to batch processing by using the addBatch method for each insert statement and then call the executeBatch afterwards to process it. And…..still 30 minutes! How that’s possible? Google found the answer delivered by O’reilly:
Batching
As you saw in the previous section, PreparedStatement objects eventually become more efficient than their Statement counterparts after 65-125 executions of the same statement. If you’re going to execute a given SQL statement a large number of times, it makes sense from a performance standpoint to use a PreparedStatement object. But if you’re really going to do that many executions of a statement, or perhaps more than 50, you should consider batching. Batching is more efficient because it sends multiple SQL statements to the server at one time. Although JDBC defines batching capability for Statement objects, Oracle supports batching only when Prepared-Statement objects are used. This makes some sense. A SQL statement in a PreparedStatement object is parsed once and can be reused many times. This naturally lends itself to batching.
Source: http://www.onjava.com/pub/a/onjava/excerpt/oraclejdbc_19/index.html?page=2
Hey! Batch processing is not supported by Oracle using the normal Statement object! To comply with the Java API, the methods are available, but do about the same as the normal execute method.
After converting the insert sql strings to prepared statement strings (you know, the syntax with question marks on the params), I could use the PreparedStatement object.
The upload of the 220.000 records now takes an acceptable 45 seconds.
Conversion regarding null values
To convert plain sql statements to a prepared statement syntax, simply replace all params by question marks and use the appropriate set param methods (e.g. setString) to pass the param values to the prepared statement. The only problems are null values, because the setNull method needs the sql type as a param. At first I used a hard coded null in the prepared statement syntax. This resulted in lots of prepared statements, because each combination of null values as param values results in a different prepared statement. To solve this problem I now retrieve the metadata of the table with "select * from table where 1 = 0". No rows are returned, but the metadata is. By looping over the columns, the sql types, which are integers, are stored in a Hashtable with the colum name as key. When passing the null value with method setNull, now the sql type can be retrieved using the Hashtable.



7/12/2007 - 7:41 pm
Just as a final remark.
The rest of the article of O-Reilly is also quite interesting.
7/12/2007 - 10:14 pm
For performance purposes, if you were repeating the same statement 200,000 times, you should have been using PreparedStatement from the start.
In general, I like the PreparedStatement object better because I just find it cleaner to use.
9/12/2007 - 8:25 pm
Parameter binding rules
Patrick
10/12/2007 - 5:18 pm
Emiel kun je me een plezier doen en je font style aanpassen naar “default”. Alles (de site) gaat nu mee in jou fontstyle zoals je ziet…
10/12/2007 - 5:58 pm
Recently I had to do exactly the same: parse hundred thousands of records from file to a database. To keep the performance acceptable, we had to process the file as a stream and to apply JDBC batching, precisely in the way you described. However, we soon ran into java.lang.OutOfMemory errors, due to the PreparedStatement growing beyond bounds (containing the batch with hundred thousands of records). For those who are interested in our solution, see my post Tackling JDBC incurred memory problems using JMX.
14/12/2007 - 3:59 pm
If you need/want performance, he is my experience to share;
1. No “SQL in Java” -> package your database calls on pl/sql, and just callable statements within java.
2. No “To batch or not to batch with Oracle JDBC” -> pl/sql bulk operations for batches, and again just callable statements within java.
If you are doing data intensive operations finish the task within the kernel, database itself. And SQL is the native language of databases not Java.
Best regards.
16/12/2007 - 4:27 pm
@TongucY: The uploading is part of a synchronization process. The synchronization is not straight forward, it involves also lots of extra tables (link, history and log tables) and also an Excel sheet has to exported with all the data changes (sheet per table). Because the incoming data is delivered as pure SQL insert statements into import tables, the fist option I used was adding these complete insert statements as JDBC batch statements and execute them.