Reading files using SOA Suite is very easy, as the file adapter is a powerful adapter. However, processing large files is less trivial. You don’t want to read a huge file into memory and then process it; preferably, you process it in smaller chunks. Chunking the file using the “Read File” option of the file adapter is pretty straightforward: all you need to do is specify the publish size. Working with chunks for the “Synchronous Read File” option used from BPEL is less easy. In this blog I’ll describe how to implement processing of a large file using BPEL and the “Synchronous Read File” option of the file adapter. This blog describes the processing of large CSV files; for processing large XML files see Processing large XML files in the SOA Suite.
In this blog we will create a composite that processes a “large” file; in this case not really big, but big enough to demonstrate all the essential steps. A file adapter starts the BPEL process, and this process must read the file in chunks. For each chunk an action is performed, and after all the chunks are processed the BPEL process is done. We use the Read option of the file adapter to start the BPEL process, but this Read operation should not read the content. We configure a second file adapter, using Synchronous Read File, to read the content of the file in chunks. Each chunk is then processed individually.
Download sca_ChunkedRead_rev1.0, unzip it and import the SCA into your application. It contains the XSDs and the data files used in this blog.
Implementing the SOA Composite
The first step is to create the file adapter that triggers the composite.
- Open the imported ChunkedRead composite
- Add a File adapter to the “exposed services” swimlane, name it “PickupFile”, Next
- Interface “Define from operation and schema”, Next
- Operation type “Read” and select “Do not use file content”, Next
- Select directory name = Logical Name. Name it FILE_IN, Next. Make sure not to select the delete file option: the file must still be there when the second adapter reads its content.
- Include files with name “articles.csv”, Next
- Next, Next, Finish
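For reference, here is a minimal hand-written sketch of what PickupFile_file.jca could contain after these wizard steps. It is an approximation, not the generated file: the port type and operation names (Read_ptt/Read), the regular expression articles\.csv and the polling frequency of 60 seconds are assumed values, so compare it with the file in your own project.

```xml
<!-- Sketch of PickupFile_file.jca; port type, operation and polling value are assumptions -->
<adapter-config name="PickupFile" adapter="File Adapter" wsdlLocation="PickupFile.wsdl"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/FileAdapter"/>
  <endpoint-activation portType="Read_ptt" operation="Read">
    <activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
      <!-- Logical directory name, resolved at deployment via the config plan -->
      <property name="LogicalDirectory" value="FILE_IN"/>
      <!-- Regular expression matching the file to pick up (assumed wizard output) -->
      <property name="IncludeFiles" value="articles\.csv"/>
      <!-- "Do not use file content": only file metadata is passed on -->
      <property name="UseHeaders" value="true"/>
      <!-- Keep the file so the synchronous read can still open it -->
      <property name="DeleteFile" value="false"/>
      <!-- Polling interval in seconds (assumed value) -->
      <property name="PollingFrequency" value="60"/>
    </activation-spec>
  </endpoint-activation>
</adapter-config>
```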
The next service to create is the file adapter that reads, in chunks, the file picked up by “PickupFile”.
- Add a File adapter to the “external references” swimlane, name it “SynchChunkedFileRead”, Next
- Interface “Define from operation and schema”, Next
- Operation type “Synchronous Read File”, Next
- Select directory name = Logical Name. Name it FILE_IN, Next.
- Include files with name “overwriteme.txt”. This is only a placeholder; the actual name will be provided by the BPEL process, Next
- Click on the magnifying glass, browse to the articles.xsd in the xsd folder in the project, select Articles as Root element, Next
So far we just created a file adapter doing a synchronous read; now let’s modify it so it reads the file in chunks. This is done by modifying the SynchChunkedFileRead_file.jca file, which you’ll find in the project explorer. Open this file and change the class name of the interaction-spec to “oracle.tip.adapter.file.outbound.ChunkedInteractionSpec”. Also add a property to specify the chunk size; for now we use 55: <property name="ChunkSize" value="55"/>. The file should look like this.
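A sketch of the edited SynchChunkedFileRead_file.jca. The className, ChunkSize, FILE_IN and overwriteme.txt values come from the steps above; the port type and operation names (SynchRead_ptt/SynchRead) are assumed wizard defaults, and any other attributes the wizard generates are left out.

```xml
<!-- Sketch of SynchChunkedFileRead_file.jca after the edit; port type and operation are assumptions -->
<adapter-config name="SynchChunkedFileRead" adapter="File Adapter"
                wsdlLocation="SynchChunkedFileRead.wsdl"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/FileAdapter"/>
  <endpoint-interaction portType="SynchRead_ptt" operation="SynchRead">
    <!-- Changed from the generated read interaction spec to the chunked one -->
    <interaction-spec className="oracle.tip.adapter.file.outbound.ChunkedInteractionSpec">
      <property name="LogicalDirectory" value="FILE_IN"/>
      <!-- Placeholder; overridden at runtime by the jca.file.FileName property from BPEL -->
      <property name="FileName" value="overwriteme.txt"/>
      <!-- Records (here: CSV lines) returned per invocation -->
      <property name="ChunkSize" value="55"/>
    </interaction-spec>
  </endpoint-interaction>
</adapter-config>
```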
Save and close this file. Next we must create a BPEL process to actually read the file and process each chunk.
- Add a BPEL process to the “components” swimlane
- In the wizard, change to BPEL 2.0 specification, name it “ChunkedFileReadProcess”, template “Define Service Later” and press OK
- Go back to the composite.xml and wire “PickupFile” to “ChunkedFileReadProcess”, and “ChunkedFileReadProcess” to “SynchChunkedFileRead”, by dragging the arrows. Your composite should look like this.
- Open the BPEL process
- Create the following variables; we will use them to store the property values exchanged with the file adapter:
- dir, type is string – stores the directory where the file resides
- file, type is string – stores the name of the file to process
- isEOF, type is string – specifies whether the end of the file has been reached
- lineNumber, type is string – specifies the line number to start reading from
- columnNumber, type is string – specifies the column number to start reading from
- noDataFound, type is string – specifies whether the last read returned no data
- The BPEL process should start when the file specified in “PickupFile” is picked up by the adapter. Add a receive activity to the process and open its properties. Name it “ReceivePickupFile” and select “PickupFile” as partner link. Tick “Create Instance” and auto-create the input variable.
- Go to the Properties tab and add two properties to get jca.file.FileName and jca.file.Directory from the adapter. Store these values in the corresponding variables (file and dir).
- Click OK
- To process the file chunk by chunk we need a loop. Drag a “While Activity” below the receive. For the condition use “$isEOF = 'false'”: as long as the end of the file has not been reached, we keep reading.
- Inside the loop drag an “Invoke Activity”, open its properties and select “SynchChunkedFileRead” as partner link. Name the activity “InvokeSynchChunkedFileRead”. Auto-create the input and output variables using the + sign.
- Go to the Properties tab and add “To properties” to pass jca.file.FileName and jca.file.Directory from the variables to the adapter.
- Next add a “From property” to get the file name from the adapter. This is just a trick to create the correct property structure, which we will change in the next step. Click OK.
- Open the Source tab of the BPEL process and find the invoke activity. Copy and paste the two “To properties” you just created for the file name and directory. Rename the copies to “jca.file.LineNumber” and “jca.file.ColumnNumber” and fill them from the corresponding variables.
- Do the same for the “From properties”, so that the line number, column number, end-of-file flag and no-data flag returned by the adapter are stored in lineNumber, columnNumber, isEOF and noDataFound. Match the following picture.
- What we did here is specify which chunk to read: the “To properties” tell the adapter at which line and column to start, and the “From properties” return the line and column where the next chunk starts, whether the end of the file has been reached and whether data was found. These values are used in the next iteration of the loop and to determine whether there is data to process.
- Go back to the Design mode.
- Next, initialize the variables. Add an “Assign Activity” before the loop that assigns ‘false’ to ‘isEOF’ and ‘1’ to both ‘lineNumber’ and ‘columnNumber’.
- We have now done all the work to read the file chunk by chunk; the next step is to process the data returned when data is found.
- Add an “If Activity” inside the loop, below the “Invoke”, name it “DataFound” and set the condition to $noDataFound = 'false'. When the adapter returns false for NoDataFound, there is data to process.
- Inside the if branch you process the data. For now we just add an empty activity named “ProcessData”; inside the else branch add an empty activity named “NoData”. This shows the difference in the audit trail. Your BPEL should now look like this:
- Next, create a config plan to provide correct values for the logical file locations. Right-click the composite and choose Generate Config Plan. Open the plan and provide the correct directory for FILE_IN for both “PickupFile” and “SynchChunkedFileRead”.
- Deploy your composite to the server using the config plan.
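Taken together, the BPEL steps above should produce source roughly like the hand-written sketch below. It is not copied from the downloadable project: the bpelx prefix is assumed to be bound to http://schemas.oracle.com/bpel/extension and ns1/ns2 to the adapter WSDL namespaces, and the port types, operations, message variable names, the activity names WhileNotEOF and InitVariables, and the returned property names jca.file.IsEOF and jca.file.NoDataFound are assumptions to check against what JDeveloper generated for you.

```xml
<!-- Fragment inside <process>; prefixes, message variables, port types and operations are assumptions -->
<variables>
  <!-- The auto-created message variables for the receive and invoke are omitted here -->
  <variable name="dir" type="xsd:string"/>
  <variable name="file" type="xsd:string"/>
  <variable name="isEOF" type="xsd:string"/>
  <variable name="lineNumber" type="xsd:string"/>
  <variable name="columnNumber" type="xsd:string"/>
  <variable name="noDataFound" type="xsd:string"/>
</variables>
<sequence name="main">
  <!-- Start on the metadata-only event and keep the file name and directory -->
  <receive name="ReceivePickupFile" partnerLink="PickupFile"
           portType="ns1:Read_ptt" operation="Read"
           variable="ReceivePickupFile_Read_InputVariable" createInstance="yes">
    <bpelx:fromProperties>
      <bpelx:fromProperty name="jca.file.FileName" variable="file"/>
      <bpelx:fromProperty name="jca.file.Directory" variable="dir"/>
    </bpelx:fromProperties>
  </receive>
  <!-- Start reading at line 1, column 1 -->
  <assign name="InitVariables">
    <copy><from>'false'</from><to>$isEOF</to></copy>
    <copy><from>'1'</from><to>$lineNumber</to></copy>
    <copy><from>'1'</from><to>$columnNumber</to></copy>
  </assign>
  <while name="WhileNotEOF">
    <condition>$isEOF = 'false'</condition>
    <sequence>
      <invoke name="InvokeSynchChunkedFileRead" partnerLink="SynchChunkedFileRead"
              portType="ns2:SynchRead_ptt" operation="SynchRead"
              inputVariable="InvokeSynchChunkedFileRead_SynchRead_InputVariable"
              outputVariable="InvokeSynchChunkedFileRead_SynchRead_OutputVariable">
        <!-- Which file to read and where the chunk starts -->
        <bpelx:toProperties>
          <bpelx:toProperty name="jca.file.FileName" variable="file"/>
          <bpelx:toProperty name="jca.file.Directory" variable="dir"/>
          <bpelx:toProperty name="jca.file.LineNumber" variable="lineNumber"/>
          <bpelx:toProperty name="jca.file.ColumnNumber" variable="columnNumber"/>
        </bpelx:toProperties>
        <!-- Where the next chunk starts, and the end-of-file and no-data flags -->
        <bpelx:fromProperties>
          <bpelx:fromProperty name="jca.file.LineNumber" variable="lineNumber"/>
          <bpelx:fromProperty name="jca.file.ColumnNumber" variable="columnNumber"/>
          <bpelx:fromProperty name="jca.file.IsEOF" variable="isEOF"/>
          <bpelx:fromProperty name="jca.file.NoDataFound" variable="noDataFound"/>
        </bpelx:fromProperties>
      </invoke>
      <if name="DataFound">
        <condition>$noDataFound = 'false'</condition>
        <empty name="ProcessData"/>
        <else>
          <empty name="NoData"/>
        </else>
      </if>
    </sequence>
  </while>
</sequence>
```

Because lineNumber and columnNumber are both sent to and returned from the adapter, each iteration resumes where the previous one stopped, and the loop ends as soon as the adapter reports the end of the file.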
Now it’s time to see chunking in action. Make sure your audit level is “Development”. In the files folder of the downloaded sources you’ll find a file called articles.csv. This file contains 300 rows and the chunk size is set to 55, so we need 300/55 rounded up, i.e. 6 reads, to read the whole file. Place this file on your server in the location specified in the config plan. After the polling interval you’ll see an instance of the ChunkedRead composite. View this instance; you’ll see that SynchChunkedFileRead is invoked 6 times.
Click on the BPEL process instance and look at the invocation of SynchChunkedFileRead in the first iteration of the loop. You’ll see all the properties being exchanged. As expected, IsEOF is false and the returned line number is 56, the line where the next chunk starts. Scroll down to the bottom and look at the invocation of SynchChunkedFileRead in the last iteration. As expected, IsEOF is true and the returned line number is 301.
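The numbers in the audit trail follow from simple arithmetic. Assuming each CSV row is exactly one line, there is no separate header line, and every read starts at the line number returned by the previous one, with n = 300 rows and chunk size c = 55:

```latex
\begin{aligned}
\text{reads} &= \left\lceil \frac{n}{c} \right\rceil = \left\lceil \frac{300}{55} \right\rceil = \lceil 5.45\ldots \rceil = 6 \\
\text{lineNumber after read } k &= 1 + k\,c = 56,\ 111,\ 166,\ 221,\ 276 \qquad (k = 1,\dots,5) \\
\text{lineNumber after read } 6 &= n + 1 = 301 \qquad (\text{lines } 276 \text{ to } 300,\ \text{i.e. } 25 \text{ rows})
\end{aligned}
```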
Reading is complete, so the loop exits and the process completes.
You’ll notice the file is not deleted. If you want it removed, implement this after the loop rather than inside it.
If the file isn’t picked up, open an SSH (e.g. PuTTY) session to your server. Go to the directory /fmwhome/user_projects/domains/dev_bpm/fileftp/controlFiles and remove the ChunkedRead folder; the file will then be picked up again.
In another article I will describe how to unzip the file before reading it, using pipeline valves.
- SOA Suite 22.214.171.124
- BPEL 2.0
See also https://technology.amis.nl/2015/11/27/processing-large-xml-files-in-the-soa-suite/