Java EE 7: Creating a background download manager using Java Batch in GlassFish 4.0


One of the brand new specifications in Java EE 7 is JSR-352 Batch Applications for the Java Platform. This specification describes how Java EE containers will allow applications to run batch jobs in a standardized way. Such a batch job can be scheduled (to be started by the container) or be started as a background process (for example from a Web application or EJB).

In this article I will show a simple download manager that is implemented using this new Batch specification. The user will enter a number of URLs of files that should be downloaded and gathered into a single ZIP-file. An HTML page submits the list of files to a Servlet. The Servlet starts a batch job (that runs in the background) and returns to the browser with the identifier of the background job. The batch job meanwhile will go through a number of steps: create a temporary directory, download all files, create a zip-file, add all downloaded files to the archive and finally remove the temporary download directory.

This example will show a number of the mechanisms and features of the JSR-352 specification, including chunked processing, parameter passing, batchlets, job and step listeners, the creation of the JSL based task definition and the initiation of the task from a servlet.

Designing the Batch Job

The batch job is specified in an XML document, using the Job Specification Language (JSL). This document gives the step-by-step execution path for the job: the order of the steps, the decision and flow logic that determines whether steps should be executed, and hints to the job execution engine about which steps can be processed in parallel. For each step, we specify which classes or managed beans should be invoked to perform the actual work. As such, the JSL has many similarities with, for example, BPEL.

The job at hand can be described as follows:

  • a string with the file URLs is the dynamic input property for our job
  • the first action will be the creation of a temporary download directory for the job (a batchlet)
  • next, in a chunk step, the string with file URLs is turned into a string array and one file URL is dispensed for every iteration (chunk: reader (open, readItem)); the file URL is processed, meaning that the file is downloaded and written to the temporary directory (chunk: processor); finally each file that was processed is reported (chunk: writer)
  • the next action, also a batchlet, is the creation of a zip-file that contains all downloaded files
  • finally, another batchlet takes care of removing the temporary download directory

The next figure describes visually the steps in our job and also shows the xml file with the specification of the batch job:

image
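Pieced together from the description above, the job definition could look roughly like the sketch below. The job id multiFileBatchProcessing, the two property names and the InfoJobListener bean come from this article. The step ids, the ref values (assumed to be CDI default bean names), the item-count, the directory values and the removeDirectoryBatchlet name are assumptions made for illustration.

```xml
<job id="multiFileBatchProcessing" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <!-- Job-level properties, readable through the JobContext -->
    <properties>
        <property name="downloadDirectory" value="/tmp/downloads"/>
        <property name="archivesDirectory" value="/tmp/archives"/>
    </properties>
    <!-- Job-level listener -->
    <listeners>
        <listener ref="infoJobListener"/>
    </listeners>

    <step id="makeDirectory" next="downloadFiles">
        <listeners>
            <listener ref="infoJobListener"/>
        </listeners>
        <batchlet ref="makeDirectoryBatchlet"/>
    </step>

    <step id="downloadFiles" next="createArchive">
        <!-- One file URL per chunk; the value is an assumption -->
        <chunk item-count="1">
            <reader ref="itemReader"/>
            <processor ref="itemProcessor"/>
            <writer ref="itemWriter"/>
        </chunk>
    </step>

    <step id="createArchive" next="removeDirectory">
        <batchlet ref="fileArchiver"/>
    </step>

    <step id="removeDirectory">
        <batchlet ref="removeDirectoryBatchlet"/>
    </step>
</job>
```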

The location of this job specification file is important. In a Java Web application, the file should be in the directory WEB-INF/classes/META-INF/batch-jobs:

image

Note how at job and step level a listener is defined – the InfoJobListener bean – that will report on the job execution progress (by writing lines to the system output). This bean is defined like this:

image

The CDI annotation @Named is the link between the ref attribute in the listener object and the class InfoJobListener. The class implements the interfaces JobListener and StepListener defined in the Batch Programming specification.
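As a rough sketch of such a listener: the class below reports each job and step callback on the system output. To keep it compilable without the Java EE API on the classpath, it declares two local stand-in interfaces that mirror the method names of javax.batch.api.listener.JobListener and StepListener. The stand-ins, the class name InfoJobListenerSketch and the message texts are assumptions, not the code from the project.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for javax.batch.api.listener.JobListener and StepListener,
// declared here only so the sketch compiles outside a Java EE container.
interface JobListenerStandIn {
    void beforeJob() throws Exception;
    void afterJob() throws Exception;
}

interface StepListenerStandIn {
    void beforeStep() throws Exception;
    void afterStep() throws Exception;
}

// In the real application the class would carry @Named and implement the
// javax.batch interfaces instead of the stand-ins above.
final class InfoJobListenerSketch implements JobListenerStandIn, StepListenerStandIn {
    private final List<String> messages = new ArrayList<>();

    // Writes a progress line to the system output and remembers it.
    private void report(String message) {
        messages.add(message);
        System.out.println(message);
    }

    @Override public void beforeJob()  { report("Job starting"); }
    @Override public void afterJob()   { report("Job finished"); }
    @Override public void beforeStep() { report("Step starting"); }
    @Override public void afterStep()  { report("Step finished"); }

    List<String> messages() {
        return messages;
    }
}
```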

The job specification also contains two properties – downloadDirectory and archivesDirectory. These properties can be accessed from the JobContext object in each class that is enlisted to be a batchlet or a chunk reader, writer or processor. The properties specify the directory in which the temporary download directories are created and the directory to which the final download archives (the zip files produced by the job) are written.

image

The first step – make directory – is implemented by the class MakeDirectoryBatchlet. This class implements the Batchlet interface – with methods stop() and process(). These methods will be invoked by the batch job execution engine. The action in this batchlet is simple: create a directory called “job45” inside the designated directory for temporary downloads (where 45 is the id of the job execution). The designated directory is retrieved from a job property – specified in the job xml document and the job execution id is retrieved from the job (execution) context. This latter object is injected into the jobCtx member variable, thanks to the @Inject annotation:

image
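The core of that batchlet can be sketched in plain Java as follows. The Properties argument stands in for what JobContext.getProperties() would return and the long argument for JobContext.getExecutionId(); the class and method names are assumptions chosen for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class MakeDirectorySketch {

    // Creates <downloadDirectory>/job<executionId> and returns its path.
    // In the batchlet, both arguments would come from the injected JobContext.
    static Path makeJobDirectory(Properties jobProperties, long executionId) throws IOException {
        String base = jobProperties.getProperty("downloadDirectory");
        if (base == null) {
            throw new IllegalStateException("job property downloadDirectory is not set");
        }
        Path jobDirectory = Paths.get(base, "job" + executionId);
        // createDirectories also creates missing parents and accepts an existing directory
        return Files.createDirectories(jobDirectory);
    }
}
```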

The chunk step is next. This step consists of three stages: read, process and write. Three classes perform these stages: ItemReader, ItemProcessor and ItemWriter, which extend AbstractItemReader, implement ItemProcessor and extend AbstractItemWriter respectively. Class ItemReader has an open() method that performs the initialization for the per-item iterations that will follow. In open(), the string property fileList – passed in when the batch job is started – is split into a String array and a counter variable is reset.

For as long as readItem() returns non null results, the chunk processing will continue. In this example, readItem returns the next fileURL in the array (and increases the counter if the end of the array has not yet been reached):

image
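Stripped of the batch API, the reader logic amounts to something like the sketch below. It assumes that the URLs in fileList are separated by line breaks; the delimiter actually used in the project may differ, and the class name is made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FileListReaderSketch {
    private String[] fileUrls = new String[0];
    private int counter;

    // Counterpart of open(): split the fileList string and reset the counter.
    void open(String fileList) {
        List<String> urls = new ArrayList<>();
        if (fileList != null) {
            for (String line : fileList.split("\\r?\\n")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    urls.add(trimmed);   // skip blank lines
                }
            }
        }
        fileUrls = urls.toArray(new String[0]);
        counter = 0;
    }

    // Counterpart of readItem(): the next URL, or null once the array is exhausted.
    // Returning null is what tells the batch runtime that the chunk step is done.
    String readItem() {
        if (counter >= fileUrls.length) {
            return null;
        }
        return fileUrls[counter++];
    }
}
```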

The second stage in the chunk is the processor:

image

Processing the item (represented by the fileURL) consists of opening a URL object to the file, creating a new file in the temporary job download directory, reading the content from the URL and writing it to the new file. Most of the code is pretty standard Java I/O processing and stream handling (for which I googled around freely). Again, the property for the download directory as well as the job execution id are retrieved from the job execution context that is injected.

The final result from the ItemProcessor is the name of the file that was written. This result is returned to the Job execution engine.
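A compact way to express this download step with java.nio is sketched below. It is not the stream-handling code from the project. Deriving the local file name from the last path segment of the URL, and the fallback name "download", are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadProcessorSketch {

    // Downloads fileUrl into jobDirectory and returns the name of the file written.
    static String processItem(String fileUrl, Path jobDirectory) throws IOException {
        URI uri = URI.create(fileUrl);
        String path = uri.getPath() == null ? "" : uri.getPath();
        // Assumption: the local name is the last path segment of the URL
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "download";
        }
        Path target = jobDirectory.resolve(name);
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return name;
    }
}
```

Because the sketch accepts any URL scheme the JDK can open, it can be tried out with a file: URL without touching the network.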

The third stage is the ItemWriter. In this example, this writer has hardly anything to do. It is passed in a collection of the results returned by the ItemProcessor. In this case, all it does is write the file names to the system output:

image
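To see how the three stages fit together, here is a deliberately simplified model of the loop that the batch runtime runs for a chunk step. It is not GlassFish's implementation: checkpoints, transactions and item filtering are left out, the reader is modelled as a plain list, and itemCount plays the role of the item-count attribute in the JSL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ChunkLoopSketch {

    // Reads and processes items one by one, and hands the buffered results to
    // the "writer" every itemCount items. Returns the lists the writer received.
    static List<List<String>> run(List<String> input, Function<String, String> processor, int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be at least 1");
        }
        List<List<String>> writerCalls = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String item : input) {                     // reader stage
            buffer.add(processor.apply(item));          // processor stage
            if (buffer.size() == itemCount) {
                writerCalls.add(new ArrayList<>(buffer)); // writer stage
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            writerCalls.add(new ArrayList<>(buffer));   // final, partially filled chunk
        }
        return writerCalls;
    }
}
```

With three inputs and an itemCount of 2, the writer in this model is called twice: once with two results and once with the remaining one.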

After the chunk, another batchlet is invoked to create the archive. The class FileArchiver implements the Batchlet interface. It gets the JobContext injected. It retrieves the properties downloadDirectory (where it will find all downloaded files) and archivesDirectory (in which it should create a zip file). Next, it reads a list of all the files in the temporary download directory and creates a zip file with all those files added to it. Again, most is pretty standard Java stream processing (for which I borrowed extensively from various blog articles and forum threads).

image
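One possible shape for the archiving logic, using java.util.zip, is sketched below. Naming the zip file after the job directory is an assumption, as are the class and method names; the project's FileArchiver may do this differently.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class FileArchiverSketch {

    // Adds every regular file in jobDirectory to <archivesDirectory>/<jobDirectoryName>.zip.
    static Path archive(Path jobDirectory, Path archivesDirectory) throws IOException {
        Files.createDirectories(archivesDirectory);
        Path zipFile = archivesDirectory.resolve(jobDirectory.getFileName() + ".zip");
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             DirectoryStream<Path> files = Files.newDirectoryStream(jobDirectory)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;   // subdirectories are ignored in this sketch
                }
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return zipFile;
    }
}
```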

The last step is the batchlet that removes the temporary directory.
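Removing a directory that still contains files takes a little care in Java, because a directory can only be deleted once it is empty. A minimal sketch of the cleanup, with assumed class and method names, could walk the tree and delete the files before their parent directories:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class RemoveDirectorySketch {

    // Deletes the directory and everything below it; does nothing if it does not exist.
    static void removeRecursively(Path directory) throws IOException {
        if (!Files.exists(directory)) {
            return;
        }
        Files.walkFileTree(directory, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);   // the directory is empty by the time we get here
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```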

Starting the batch job in a Java Web Application

The download batch job is to be invoked from a servlet. The servlet is accessed through a POST request that contains a parameter – fileURLList. This parameter contains URLs from which files should be downloaded. These URLs are separated by linefeeds.

The servlet turns the fileURLList into a String[]. It then uses the method submitJobFromXML to start a new instance of the multiFileBatchProcessing job, passing the property fileList along in the context for this job instance. The JobOperator, the interface to the batch job processing engine, is retrieved from the BatchRuntime that the Java EE container makes available. Its start method is invoked with the identifier of the batch job (corresponding to the name and the identifier in the job definition file) and a Properties object:

image
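The parameter handling on the servlet side can be sketched without the servlet and batch APIs as shown below. The helper name toJobParameters and the choice to pass the cleaned-up URLs as one newline-separated string are assumptions. The call that actually starts the job is shown only as a comment, since it needs the Java EE container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class JobParametersSketch {

    // Turns the raw fileURLList request parameter into the job parameters.
    static Properties toJobParameters(String fileURLList) {
        List<String> urls = new ArrayList<>();
        if (fileURLList != null) {
            for (String line : fileURLList.split("\\r?\\n")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    urls.add(trimmed);   // drop blank lines and stray whitespace
                }
            }
        }
        Properties jobParameters = new Properties();
        jobParameters.setProperty("fileList", String.join("\n", urls));
        return jobParameters;
    }

    // Inside the container, the servlet would then start the job along these lines:
    //   JobOperator jobOperator = BatchRuntime.getJobOperator();
    //   long executionId = jobOperator.start("multiFileBatchProcessing", jobParameters);
    // and return executionId to the browser.
}
```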

Finally, the application contains a fairly simple and almost ugly static HTML file that allows the user to key in the file URLs and submit this list of URLs to the servlet:

image

Batch Job in Action

The HTML page that allows users to provide file URLs:

image

The response from the servlet:

image

The new directory created:

image

And the files downloaded from the URLs to this temporary directory:

image

The final result – the zip file with the four downloaded files:

image

The log output from the batch job is seen in the GlassFish console:

image

Resources

Download the NetBeans project with all the sources discussed in this article: DownloadBatchManager-web.


About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

4 Comments

  1. Nice post, Batch jobs are really nice framework in Java EE, but is it possible to switch default batch job database to for example mysql? I did not found nothing about it on google.

  2. Great post, a good intro to the respective API parts. One small thing I noticed though… You ought to synchronize access to the counter in the ItemReader if you want to support parallell multithreaded reads. Otherwise a great post!

  3. Jesús Alonso on

    Thanks for the post. I have a question regarding the flow. The 3rd step get the list of files reading the output directory. I guess this list could be passed by the 2nd step, that generates all the files, isn’t it?
