Last week I had to parse thousands of SQL insert statements, read from files (one file per table), within a Java environment. The files contain complete SQL insert statements, so I used a plain, simple Statement object.
Each insert statement was passed to the normal execute method. It worked fine until I tested it with a production set containing files with 220,000 inserts. It took about 30 minutes! Hmm, that's too long; such a simple job should run much faster.
To speed it up I switched to JDBC batch processing: calling the addBatch method for each insert statement and afterwards calling executeBatch to process them. And… it still took 30 minutes! How is that possible? Google found the answer, delivered by O'Reilly:
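For context, a minimal sketch of that first batching attempt with a plain Statement, assuming the insert strings have already been read from the file into a list:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class StatementBatchLoader {

    // Queues every complete INSERT string and sends them as one batch.
    static void load(Connection conn, List<String> insertStatements) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : insertStatements) {
                stmt.addBatch(sql);   // add the complete INSERT statement to the batch
            }
            stmt.executeBatch();      // submit the whole batch to the database
        }
    }
}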
Batching
As you saw in the previous section, PreparedStatement objects eventually become more efficient than their Statement counterparts after 65-125 executions of the same statement. If you’re going to execute a given SQL statement a large number of times, it makes sense from a performance standpoint to use a PreparedStatement object. But if you’re really going to do that many executions of a statement, or perhaps more than 50, you should consider batching. Batching is more efficient because it sends multiple SQL statements to the server at one time. Although JDBC defines batching capability for Statement objects, Oracle supports batching only when PreparedStatement objects are used. This makes some sense. A SQL statement in a PreparedStatement object is parsed once and can be reused many times. This naturally lends itself to batching.
Source: http://www.onjava.com/pub/a/onjava/excerpt/oraclejdbc_19/index.html?page=2
Hey! Batch processing is not supported by Oracle for the plain Statement object! To comply with the JDBC API, the methods are available, but they do about the same as the normal execute method.
After converting the insert SQL strings to prepared statement syntax using bind parameters (you know, the syntax with question marks as the parameters), I could use the PreparedStatement object.
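A sketch of the PreparedStatement variant; the table MY_TABLE and its two columns are placeholder assumptions, the real statement is built per table/file:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PreparedBatchLoader {

    // One statement parse on the server, many executions via bind parameters.
    static void load(Connection conn, List<Object[]> rows) throws SQLException {
        String sql = "insert into MY_TABLE (ID, NAME) values (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                ps.setInt(1, (Integer) row[0]);    // bind the first column
                ps.setString(2, (String) row[1]);  // bind the second column
                ps.addBatch();                     // queue this row
            }
            ps.executeBatch();                     // send the queued rows to the database
        }
    }
}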
Now the upload of the 220,000 records takes an acceptable 45 seconds.
Conversion regarding null values
To convert plain SQL statements to prepared statement syntax, simply replace all parameter values with question marks and use the appropriate set methods (e.g. setString) to pass the parameter values to the prepared statement. The only problem is null values, because the setNull method needs the SQL type as a parameter. At first I used a hard-coded null in the prepared statement syntax. This resulted in lots of prepared statements, because each combination of null values results in a different prepared statement! To solve this problem I first retrieve the metadata of the table with select * from table where 1 = 0. No rows are returned, but the metadata is! By looping over the columns, the SQL types, which are integers, are stored in a Hashtable with the column name as key. When passing a null value with setNull, the SQL type can be retrieved from the Hashtable.
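A sketch of that metadata trick; the table name and value handling are simplified, but the where 1 = 0 query and the Hashtable of java.sql.Types values follow the description above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.Hashtable;

public class ColumnTypeCache {

    // Runs "select * from <table> where 1 = 0": no rows come back, but the
    // metadata does, giving the SQL type (java.sql.Types constant) per column.
    static Hashtable<String, Integer> loadColumnTypes(Connection conn, String table) throws SQLException {
        Hashtable<String, Integer> types = new Hashtable<>();
        try (PreparedStatement ps = conn.prepareStatement("select * from " + table + " where 1 = 0")) {
            ResultSetMetaData meta = ps.executeQuery().getMetaData();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                types.put(meta.getColumnName(i), meta.getColumnType(i));
            }
        }
        return types;
    }

    // When a value is null, look up the column's SQL type for setNull.
    static void bind(PreparedStatement ps, int index, String column, Object value,
                     Hashtable<String, Integer> types) throws SQLException {
        if (value == null) {
            ps.setNull(index, types.get(column));
        } else {
            ps.setObject(index, value);
        }
    }
}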
@TongucY: The uploading is part of a synchronization process. The synchronization is not straightforward: it also involves lots of extra tables (link, history and log tables), and an Excel sheet has to be exported with all the data changes (one sheet per table). Because the incoming data is delivered as pure SQL insert statements into import tables, the first option I tried was adding these complete insert statements as JDBC batch statements and executing them.
If you need/want performance, here is my experience to share:
1. No “SQL in Java” -> package your database calls in PL/SQL, and use just callable statements within Java.
2. No “to batch or not to batch with Oracle JDBC” -> use PL/SQL bulk operations for batches, and again just callable statements within Java (a sketch follows below).
If you are doing data-intensive operations, finish the task within the kernel, the database itself. SQL is the native language of databases, not Java.
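A minimal sketch of what this suggestion looks like from the Java side, assuming a hypothetical stored procedure load_imported_rows that does the bulk work in PL/SQL:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class PlsqlCaller {

    // Delegates the bulk load to a stored procedure instead of issuing
    // row-by-row SQL from Java.
    static void runBulkLoad(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{ call load_imported_rows() }")) {
            cs.execute();
        }
    }
}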
Best regards.
Recently I had to do exactly the same: parse hundreds of thousands of records from a file into a database. To keep the performance acceptable, we had to process the file as a stream and apply JDBC batching, precisely in the way you described. However, we soon ran into java.lang.OutOfMemoryError, due to the PreparedStatement growing beyond bounds (containing a batch with hundreds of thousands of records). For those who are interested in our solution, see my post Tackling JDBC incurred memory problems using JMX.
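This is not the JMX-based solution the linked post describes, but one common way to keep the batch from growing without bounds is to execute and clear it every N rows; a sketch with an illustrative table and batch size:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

public class ChunkedBatchLoader {

    private static final int BATCH_SIZE = 1000; // flush after this many rows

    static void load(Connection conn, Iterator<Object[]> rows) throws SQLException {
        String sql = "insert into MY_TABLE (ID, NAME) values (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            while (rows.hasNext()) {
                Object[] row = rows.next();
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();   // send and discard the accumulated rows
                }
            }
            ps.executeBatch();           // flush the remaining rows
        }
    }
}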
Emiel, can you do me a favor and change your font style to “default”? As you can see, everything (the whole site) now follows your font style…
Parameter binding rules 😉
Patrick
For performance purposes, if you were repeating the same statement 200,000 times, you should have been using PreparedStatement from the start.
In general, I like the PreparedStatement object better because I just find it cleaner to use.
Just as a final remark: the rest of the O'Reilly article is also quite interesting.