ItemReaders and ItemWriters in Spring Batch 2.0

ItemReader-

Although a simple concept, an ItemReader is the means for providing data from many different types of input. The most general examples include:

  • Flat File- Flat File Item Readers read lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma).
  • XML – XML ItemReaders process XML independently of technologies used for parsing, mapping and validating objects. Input data allows for the validation of an XML file against an XSD schema.
  • Database – A database resource is accessed to return resultsets which can be mapped to objects for processing. The default SQL ItemReaders invoke a RowMapper to return objects, keep track of the current row if restart is required, store basic statistics, and provide some transaction enhancements that will be explained later.

There are many more possibilities, but we’ll focus on the basic ones for this chapter. A complete list of all available ItemReaders can be found in Appendix A.
ItemReader is a basic interface for generic input operations:

public interface ItemReader<T> {

    T read() throws Exception, UnexpectedInputException, ParseException;

}

The read method defines the most essential contract of the ItemReader; calling it returns one Item or null if no more items are left. An item might represent a line in a file, a row in a database, or an element in an XML file. It is generally expected that these will be mapped to a usable domain object (i.e. Trade, Foo, etc) but there is no requirement in the contract to do so.

It is expected that implementations of the ItemReader interface will be forward only. However, if the underlying resource is transactional (such as a JMS queue) then calling read may return the same logical item on subsequent calls in a rollback scenario. It is also worth noting that a lack of items to process by an ItemReader will not cause an exception to be thrown. For example, a database ItemReader that is configured with a query that returns 0 results will simply return null on the first invocation of read.

ItemWriter-

ItemWriter is similar in functionality to an ItemReader, but with inverse operations. Resources still need to be located, opened and closed but they differ in that an ItemWriter writes out, rather than reading in. In the case of databases or queues these may be inserts, updates, or sends. The format of the serialization of the output is specific to each batch job.

As with ItemReader, ItemWriter is a fairly generic interface:

public interface ItemWriter<T> {

    void write(List<? extends T> items) throws Exception;

}

As with read on ItemReader, write provides the basic contract of ItemWriter; it will attempt to write out the list of items passed in as long as it is open. Because it is generally expected that items will be ‘batched’ together into a chunk and then output, the interface accepts a list of items, rather than an item by itself. After writing out the list, any flushing that may be necessary can be performed before returning from the write method. For example, if writing to a Hibernate DAO, multiple calls to write can be made, one for each item. The writer can then call close on the hibernate Session before returning.

ItemProcessor-

The ItemReader and ItemWriter interfaces are both very useful for their specific tasks, but what if you want to insert business logic before writing? One option for both reading and writing is to use the composite pattern: create an ItemWriter that contains another ItemWriter, or an ItemReader that contains another ItemReader. For example:

public class CompositeItemWriter<T> implements ItemWriter<T>  {

    ItemWriter<T> itemWriter;

    public CompositeItemWriter(ItemWriter<T>  itemWriter) {
        this.itemWriter = itemWriter;
    }

    public void write(List<? extends T>  items) throws Exception {
        //Add business logic here
       itemWriter.write(item);
    }

    public void setDelegate(ItemWriter<T>  itemWriter){
        this.itemWriter = itemWriter;
    }
}

The class above contains another ItemWriter to which it delegates after having provided some business logic. This pattern could easily be used for an ItemReader as well, perhaps to obtain more reference data based upon the input that was provided by the main ItemReader. It is also useful if you need to control the call to write yourself. However, if you only want to ‘transform’ the item passed in for writing before it is actually written, there isn’t much need to call write yourself: you just want to modify the item. For this scenario, Spring Batch provides the ItemProcessor interface:

public interface ItemProcessor<I, O> {

    O process(I item) throws Exception;
}

An ItemProcessor is very simple; given one object, transform it and return another. The provided object may or may not be of the same type. The point is that business logic may be applied within process, and is completely up to the developer to create. An ItemProcessor can be wired directly into a step, For example, assuming an ItemReader provides a class of type Foo, and it needs to be converted to type Bar before being written out. An ItemProcessor can be written that performs the conversion:

public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class FooProcessor implements ItemProcessor<Foo,Bar>{
    public Bar process(Foo foo) throws Exception {
        //Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarWriter implements ItemWriter<Bar>{
    public void write(List<? extends Bar> bars) throws Exception {
        //write bars
    }
}

In the very simple example above, there is a class Foo, a class Bar, and a class FooProcessor that adheres to the ItemProcessor interface. The transformation is simple, but any type of transformation could be done here. The BarWriter will be used to write out Bar objects, throwing an exception if any other type is provided. Similarly, the FooProcessor will throw an exception if anything but a Foo is provided. The FooProcessor can then be injected into a Step:

<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            &lt;chunk reader="fooReader" processor="fooProcessor" writer="barWriter" 
                   commit-interval="2"/&gt;
        </tasklet>
    </step>
</job>

<<Configuring Step in Spring Batch<< Index >> Scaling and Parallel Processing>>

Previous
Next