Configuring a Step in Spring Batch 2

A Step is a domain object that encapsulates an independent, sequential phase of a batch job and contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing.

Chunk-Oriented Processing-

Spring Batch uses a ‘Chunk Oriented’ processing style within its most common implementation. Chunk-oriented processing refers to reading the data one item at a time and creating ‘chunks’ that will be written out within a transaction boundary. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

Below is a code representation of the same concepts shown above:

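// Accumulate one chunk of processed items, then write it in a single call.
List<Object> items = new ArrayList<Object>();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
// The transaction is committed after the whole chunk has been written.
itemWriter.write(items);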
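
The loop above assumes the reader always has another item. A slightly fuller sketch, in plain Java with no Spring dependency, also stops when the input is exhausted and writes the final partial chunk. The ChunkSketch class, its run method, and the use of Iterator, Function, and Consumer as stand-ins for ItemReader, ItemProcessor, and ItemWriter are assumptions made for illustration, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration; not part of Spring Batch.
class ChunkSketch {

    // Reads items one at a time, groups them into chunks of at most
    // commitInterval items, and hands each chunk to the writer.
    // Returns the number of chunks written (in the real framework each
    // chunk is written inside its own transaction).
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int commitInterval) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> items = new ArrayList<>();
            // Fill one chunk, stopping early if the input runs out.
            while (items.size() < commitInterval && reader.hasNext()) {
                items.add(processor.apply(reader.next()));
            }
            writer.accept(items); // the transaction would commit after this call
            chunks++;
        }
        return chunks;
    }
}
```

With five items and a commit interval of 2, this sketch writes three chunks: two full ones and a final chunk holding the single leftover item.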

1. Configuring a Step-

<job id="sampleJob" job-repository="jobRepository">
    <step id="step1">
        <tasklet transaction-manager="transactionManager">
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
</job>

The configuration above represents the only required dependencies to create an item-oriented step:

  • reader – The ItemReader that provides items for processing.
  • writer – The ItemWriter that processes the items provided by the ItemReader.
  • transaction-manager – Spring’s PlatformTransactionManager that will be used to begin and commit transactions during processing.
  • job-repository – The JobRepository that will be used to periodically store the StepExecution and ExecutionContext during processing (just before committing). For an in-line <step/> (one defined within a <job/>) it is an attribute on the <job/> element; for a standalone step, it is defined as an attribute of the <tasklet/>.
  • commit-interval – The number of items that will be processed before the transaction is committed.

Note that job-repository defaults to “jobRepository” and transaction-manager defaults to “transactionManager”. Furthermore, the ItemProcessor is optional, since the item could be passed directly from the reader to the writer.

2. Inheriting from a Parent Step-

If a group of Steps share similar configurations, then it may be helpful to define a “parent” Step from which the concrete Steps may inherit properties. Similar to class inheritance in Java, the “child” Step will combine its elements and attributes with the parent’s. The child will also override any of the parent’s settings that it redefines.

In the following example, the Step “concreteStep1” will inherit from “parentStep”. It will be instantiated with ‘itemReader’, ‘itemProcessor’, ‘itemWriter’, startLimit=5, and allowStartIfComplete=true. Additionally, the commitInterval will be ‘5’ since it is overridden by “concreteStep1”:

<step id="parentStep">
    <tasklet allow-start-if-complete="true">
        <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
    </tasklet>
</step>

<step id="concreteStep1" parent="parentStep">
    <tasklet start-limit="5">
        <chunk processor="itemProcessor" commit-interval="5"/>
    </tasklet>
</step>
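
The merge rule can be pictured as overlaying the child’s attributes on the parent’s. The sketch below is plain Java and is not how Spring Batch implements inheritance internally; the StepInheritanceSketch class and the use of string maps to hold attributes are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "child overrides parent"; not Spring Batch code.
class StepInheritanceSketch {

    // Start from the parent's attributes, then let the child's values win.
    static Map<String, String> merge(Map<String, String> parent,
                                     Map<String, String> child) {
        Map<String, String> effective = new LinkedHashMap<>(parent);
        effective.putAll(child);
        return effective;
    }

    // Attributes of "parentStep" and "concreteStep1" from the example above.
    static Map<String, String> concreteStep1() {
        Map<String, String> parent = new LinkedHashMap<>();
        parent.put("allow-start-if-complete", "true");
        parent.put("reader", "itemReader");
        parent.put("writer", "itemWriter");
        parent.put("commit-interval", "10");

        Map<String, String> child = new LinkedHashMap<>();
        child.put("start-limit", "5");
        child.put("processor", "itemProcessor");
        child.put("commit-interval", "5");

        return merge(parent, child);
    }
}
```
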
  • Abstract Step-

Sometimes it may be necessary to define a parent Step that is not a complete Step configuration. If, for instance, the reader, writer, and tasklet attributes are left off of a Step configuration, then initialization will fail. If a parent must be defined without these properties, then the “abstract” attribute should be used. An “abstract” Step will not be instantiated; it is used only for extending.

In the following example, the Step “abstractParentStep” would not instantiate if it were not declared to be abstract. The Step “concreteStep2” will have ‘itemReader’, ‘itemWriter’, and commitInterval=10.

<step abstract="true" id="abstractParentStep">
    <tasklet>
       <chunk commit-interval="10"/>
    </tasklet>
</step>

<step id="concreteStep2" parent="abstractParentStep">
    <tasklet>
        <chunk reader="itemReader" writer="itemWriter"/>
    </tasklet>
</step>

3. The Commit Interval-

As mentioned above, a step reads in and writes out items, periodically committing using the supplied PlatformTransactionManager. With a commit-interval of 1, it will commit after writing each individual item. This is less than ideal in many situations, since beginning and committing a transaction is expensive. Ideally, it is preferable to process as many items as possible in each transaction, which is completely dependent upon the type of data being processed and the resources with which the step is interacting. For this reason, the number of items that are processed within a commit can be configured.

<job id="sampleJob">
    <step id="step1">
        <tasklet>
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
</job>

In the example above, 10 items will be processed within each transaction. At the beginning of processing, a transaction is begun, and each time read is called on the ItemReader, a counter is incremented. When it reaches 10, the list of aggregated items is passed to the ItemWriter, and the transaction will be committed.
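
To get a feel for the cost difference, the number of commits can be estimated with a ceiling division. This is a simplified model that counts only commits that write items and ignores any extra bookkeeping the framework may perform; the CommitIntervalSketch class and its commits method are hypothetical names.

```java
// Hypothetical helper for illustration; not part of Spring Batch.
class CommitIntervalSketch {

    // Number of chunk commits needed to write itemCount items when at most
    // commitInterval items are written per transaction (ceiling division).
    static long commits(long itemCount, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        return (itemCount + commitInterval - 1) / commitInterval;
    }
}
```

In this model, 1,000 items need 1,000 commits with a commit interval of 1, but only 100 commits with an interval of 10.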

4. Configuring a Step for Restart-

  • Setting a StartLimit-

There are many scenarios where you may want to control the number of times a Step may be started. For example, a particular Step might need to be configured so that it only runs once because it invalidates some resource that must be fixed manually before it can be run again.

<step id="step1">
    <tasklet start-limit="1">
       <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
    </tasklet>
</step>

The simple step above can be run only once. Attempting to run it again will cause an exception to be thrown. It should be noted that the default value for the start-limit is Integer.MAX_VALUE.
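
The check behind a start limit can be sketched in a few lines of plain Java. The StartLimitSketch class is hypothetical, the counter is kept in memory only for the sake of the example, and IllegalStateException is a stand-in for whatever exception the framework actually throws.

```java
// Hypothetical illustration of a start limit; not Spring Batch code.
class StartLimitSketch {
    private final int startLimit;
    private int startCount = 0;

    StartLimitSketch(int startLimit) {
        this.startLimit = startLimit;
    }

    // Records one more start, or fails if the limit has already been reached.
    void start() {
        if (startCount >= startLimit) {
            throw new IllegalStateException(
                    "start limit of " + startLimit + " exceeded");
        }
        startCount++;
    }

    int startCount() {
        return startCount;
    }
}
```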

  • Restarting a completed step-

<step id="step1">
    <tasklet allow-start-if-complete="true">
        <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
    </tasklet>
</step>

In the case of a restartable job, there may be one or more steps that should always be run, regardless of whether or not they were successful the first time. An example might be a validation step or a Step that cleans up resources before processing. During normal processing of a restarted job, any step with a status of ‘COMPLETED’, meaning it has already been completed successfully, will be skipped. Setting allow-start-if-complete to “true” overrides this so that the step will always run.
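
The restart rule described above reduces to a single boolean expression. The sketch below is plain Java; RestartDecisionSketch and shouldRun are hypothetical names, and the status is passed as a plain string for simplicity.

```java
// Hypothetical illustration of the restart decision; not Spring Batch code.
class RestartDecisionSketch {

    // Returns true if the step should run when its job is restarted.
    // A step that already completed is skipped unless allowStartIfComplete is set.
    static boolean shouldRun(String lastStatus, boolean allowStartIfComplete) {
        boolean alreadyCompleted = "COMPLETED".equals(lastStatus);
        return !alreadyCompleted || allowStartIfComplete;
    }
}
```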

5. Configuring Skip Logic-

There are many scenarios where errors encountered while processing should not result in Step failure, but should be skipped instead. This is usually a decision that must be made by someone who understands the data itself and what meaning it has. Financial data, for example, may not be skippable because it results in money being transferred, which needs to be completely accurate. Loading a list of vendors, on the other hand, might allow for skips. If a vendor is not loaded because it was formatted incorrectly or was missing necessary information, then there probably won’t be issues. Usually, these bad records are logged as well, which will be covered later when discussing listeners.

<step id="step1">
   <tasklet>
      <chunk commit-interval="10" reader="flatFileItemReader" skip-limit="10" writer="itemWriter">
         <skippable-exception-classes>
           <include class="org.springframework.batch.item.file.FlatFileParseException"/>
         </skippable-exception-classes>
      </chunk>
   </tasklet>
</step>

In this example, a FlatFileItemReader is used, and if at any point a FlatFileParseException is thrown, the item will be skipped and counted against the total skip limit of 10. Separate counts are kept of skips on read, process, and write inside the step execution, and the limit applies across all three. Once the skip limit is reached, the next exception found will cause the step to fail.
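
The counting behaviour can be sketched in plain Java. In this hypothetical SkipLimitSketch, parsing an integer stands in for parsing a flat-file record and NumberFormatException stands in for FlatFileParseException; none of it is Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a skip limit; not Spring Batch code.
class SkipLimitSketch {

    // Parses each line as an integer. Lines that fail to parse are skipped
    // until skipLimit skips have been used; the next failure is rethrown.
    static List<Integer> parseAll(List<String> lines, int skipLimit) {
        List<Integer> parsed = new ArrayList<>();
        int skipCount = 0;
        for (String line : lines) {
            try {
                parsed.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                if (skipCount >= skipLimit) {
                    throw e; // limit already used up: the step would fail here
                }
                skipCount++; // bad record dropped and counted against the limit
            }
        }
        return parsed;
    }
}
```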

<step id="step1">
    <tasklet>
        <chunk commit-interval="10" reader="flatFileItemReader" skip-limit="10" writer="itemWriter">
            <skippable-exception-classes>
                <include class="java.lang.Exception"/>
                <exclude class="java.io.FileNotFoundException"/>
            </skippable-exception-classes>
        </chunk>
    </tasklet>
</step>

By ‘including’ java.lang.Exception as a skippable exception class, the configuration indicates that all Exceptions are skippable. However, by ‘excluding’ java.io.FileNotFoundException, the configuration refines the list of skippable exception classes to be all Exceptions except FileNotFoundException. Any excluded exception classes will be fatal if encountered (i.e. not skipped).
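
One way to picture how include and exclude rules combine is to look up the thrown exception’s class and walk up its superclasses until a rule is found. The SkippableClassifierSketch class below is a hypothetical plain-Java model of that idea; the framework’s own classifier may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the framework's own classifier.
class SkippableClassifierSketch {
    private final Map<Class<?>, Boolean> rules = new HashMap<>();

    SkippableClassifierSketch include(Class<? extends Throwable> type) {
        rules.put(type, Boolean.TRUE);
        return this;
    }

    SkippableClassifierSketch exclude(Class<? extends Throwable> type) {
        rules.put(type, Boolean.FALSE);
        return this;
    }

    // Walks up the class hierarchy from the thrown type; the closest
    // ancestor that has a rule decides. No rule at all means not skippable.
    boolean isSkippable(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            Boolean rule = rules.get(c);
            if (rule != null) {
                return rule;
            }
        }
        return false;
    }
}
```

With Exception included and FileNotFoundException excluded, an IOException is skippable in this model because its nearest rule is the one for Exception, while a FileNotFoundException is not.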

6. Configuring Retry Logic-

In most cases you want an exception to cause either a skip or Step failure. However, not all exceptions are deterministic. If a FlatFileParseException is encountered while reading, it will always be thrown for that record; resetting the ItemReader will not help. However, for other exceptions, such as a DeadlockLoserDataAccessException, which indicates that the current process has attempted to update a record that another process holds a lock on, waiting and trying again might result in success. In this case, retry should be configured:

<step id="step1">
   <tasklet>
      <chunk commit-interval="2" reader="itemReader" retry-limit="3" writer="itemWriter">
         <retryable-exception-classes>
            <include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
         </retryable-exception-classes>
      </chunk>
   </tasklet>
</step>
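
The idea of retrying only selected exceptions can be sketched in plain Java. RetrySketch is a hypothetical helper, not Spring Batch code. It treats the limit as the maximum number of attempts, which is an assumption of this sketch, and it retries immediately without any waiting or back-off.

```java
import java.util.function.Supplier;

// Hypothetical illustration of bounded retry; not Spring Batch code.
class RetrySketch {

    // Calls the operation up to maxAttempts times. Only exceptions of the
    // retryable type trigger another attempt; anything else propagates at once.
    static <T> T call(Supplier<T> operation,
                      Class<? extends RuntimeException> retryable,
                      int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e)) {
                    throw e; // not a retryable failure
                }
                last = e; // possibly transient: try again if attempts remain
            }
        }
        throw last; // attempts exhausted
    }
}
```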

7. Controlling Rollback-

By default, regardless of retry or skip, any exceptions thrown from the ItemWriter will cause the transaction controlled by the Step to roll back. If a skip is configured as described above, exceptions thrown from the ItemReader will not cause a rollback. However, there are many scenarios in which exceptions thrown from the ItemWriter should not cause a rollback because no action has taken place to invalidate the transaction. For this reason, the Step can be configured with a list of exceptions that should not cause a rollback.

<step id="step1">
   <tasklet>
      <chunk commit-interval="2" reader="itemReader" writer="itemWriter">
         <no-rollback-exception-classes>
            <include class="org.springframework.batch.item.validator.ValidationException"/>
         </no-rollback-exception-classes>
      </chunk>
   </tasklet>
</step>

8. Transaction Attributes-

<tasklet>
    <chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
    <transaction-attributes isolation="DEFAULT"
                            propagation="REQUIRED"
                            timeout="30"/>
</tasklet>

9. Registering ItemStreams with the Step-

<step id="step1">
    <tasklet>
        <chunk commit-interval="2" reader="itemReader" writer="compositeWriter">
            <streams>
                <stream ref="fileItemWriter1"/>
                <stream ref="fileItemWriter2"/>
            </streams>
        </chunk>
    </tasklet>
</step>

<beans:bean class="org.springframework.batch.item.support.CompositeItemWriter" id="compositeWriter">
    <beans:property name="delegates">
        <beans:list>
            <beans:ref bean="fileItemWriter1" />
            <beans:ref bean="fileItemWriter2" />
        </beans:list>
    </beans:property>
</beans:bean>

10. Intercepting Step Execution-

<step id="step1">
    <tasklet>
        <chunk reader="reader" writer="writer" commit-interval="10"/>
        <listeners>
            <listener ref="chunkListener"/>
        </listeners>
    </tasklet>
</step>

StepExecutionListener

public interface StepExecutionListener extends StepListener {

    void beforeStep(StepExecution stepExecution);

    ExitStatus afterStep(StepExecution stepExecution);

}

ChunkListener

public interface ChunkListener extends StepListener {

    void beforeChunk();

    void afterChunk();

}

ItemReadListener

public interface ItemReadListener<T> extends StepListener {
  
    void beforeRead();

    void afterRead(T item);
    
    void onReadError(Exception ex);

}

ItemProcessListener

public interface ItemProcessListener<T, S> extends StepListener {

    void beforeProcess(T item);

    void afterProcess(T item, S result);

    void onProcessError(T item, Exception e);

}

ItemWriteListener

public interface ItemWriteListener<S> extends StepListener {

    void beforeWrite(List<? extends S> items);

    void afterWrite(List<? extends S> items);

    void onWriteError(Exception exception, List<? extends S> items);

}

SkipListener

public interface SkipListener<T,S> extends StepListener {

    void onSkipInRead(Throwable t);

    void onSkipInProcess(T item, Throwable t);

    void onSkipInWrite(S item, Throwable t);

}
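
The skip section mentioned that bad records are usually logged through a listener. A minimal sketch of such a listener follows. To keep it compilable without Spring Batch on the classpath, it declares a local interface, SkipListenerShape, that copies the three methods shown above (the real SkipListener also extends StepListener). RecordingSkipListener and its in-memory message list are hypothetical; a real listener would more likely write to a logger.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the SkipListener methods shown above, for self-containment only.
interface SkipListenerShape<T, S> {
    void onSkipInRead(Throwable t);
    void onSkipInProcess(T item, Throwable t);
    void onSkipInWrite(S item, Throwable t);
}

// Hypothetical listener that records one message per skipped record.
class RecordingSkipListener implements SkipListenerShape<String, String> {
    final List<String> messages = new ArrayList<>();

    @Override
    public void onSkipInRead(Throwable t) {
        // No item is available here because reading itself failed.
        messages.add("skipped on read: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(String item, Throwable t) {
        messages.add("skipped on process: " + item + " (" + t.getMessage() + ")");
    }

    @Override
    public void onSkipInWrite(String item, Throwable t) {
        messages.add("skipped on write: " + item + " (" + t.getMessage() + ")");
    }
}
```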

TaskletStep-

The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. Each call to the Tasklet is wrapped in a transaction. Tasklet implementors might call a stored procedure, a script, or a simple SQL update statement. To create a TaskletStep, the ‘ref’ attribute of the <tasklet/> element should reference a bean defining a Tasklet object; no <chunk/> element should be used within the <tasklet/>:

<step id="step1">
    <tasklet ref="myTasklet"/>
</step>
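
The repeat-until-finished contract can be sketched in plain Java. The types below are simplified, hypothetical shapes: the real Tasklet.execute method takes arguments and may throw checked exceptions, and the real RepeatStatus is provided by Spring Batch. Only the looping behaviour is modelled here.

```java
// Hypothetical, simplified model of the TaskletStep loop; not Spring Batch code.
class TaskletLoopSketch {

    enum RepeatStatus { CONTINUABLE, FINISHED }

    // Simplified stand-in for the Tasklet interface (no arguments).
    interface SimpleTasklet {
        RepeatStatus execute();
    }

    // Calls the tasklet until it reports FINISHED and returns the number of
    // calls made. A runtime exception from execute() propagates to the
    // caller, which corresponds to the step failing.
    static int run(SimpleTasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            status = tasklet.execute(); // each call would run in its own transaction
            calls++;
        } while (status != RepeatStatus.FINISHED);
        return calls;
    }
}
```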

Controlling Step Flow-

With the ability to group steps together within an owning job comes the need to be able to control how the job ‘flows’ from one step to another. The failure of a Step doesn’t necessarily mean that the Job should fail. Furthermore, there may be more than one type of ‘success’ which determines which Step should be executed next. Depending upon how a group of Steps is configured, certain steps may not even be processed at all.

1. Sequential Flow-

The simplest flow scenario is a job where all of the steps execute sequentially. This can be achieved using the ‘next’ attribute of the step element:

<job id="job">
    <step id="stepA" parent="s1" next="stepB" />
    <step id="stepB" parent="s2" next="stepC"/>
    <step id="stepC" parent="s3" />
</job>

2. Conditional Flow-

In the example above, there are only two possibilities:

  1. The Step is successful and the next Step should be executed.
  2. The Step failed and thus the Job should fail.

In many cases, this may be sufficient. However, what about a scenario in which the failure of a Step should trigger a different Step, rather than causing failure?

In order to handle more complex scenarios, the Spring Batch namespace allows transition elements to be defined within the step element. One such transition is the “next” element. Like the “next” attribute, the “next” element will tell the Job which Steps to execute next. However, unlike the attribute, any number of “next” elements are allowed on a given Step, and there is no default behavior in the case of failure. This means that if transition elements are used, then all of the behaviours for the Step’s transitions must be defined explicitly. Note also that a single step cannot have both a “next” attribute and a transition element.

The next element specifies a pattern to match and the step to execute next:

<job id="job">
    <step id="stepA" parent="s1">
        <next on="*" to="stepB" />
        <next on="FAILED" to="stepC" />
    </step>
    <step id="stepB" parent="s2" next="stepC" />
    <step id="stepC" parent="s3" />
</job>

The “on” attribute of a transition element uses a simple pattern-matching scheme to match the ExitStatus that results from the execution of the Step. Only two special characters are allowed in the pattern:

  • “*” will match zero or more characters
  • “?” will match exactly one character

For example, “c*t” will match “cat” and “count”, while “c?t” will match “cat” but not “count”.
While there is no limit to the number of transition elements on a Step, if the Step’s execution results in an ExitStatus that is not covered by an element, then the framework will throw an exception and the Job will fail. The framework will automatically order transitions from most specific to least specific. This means that even if the elements were swapped for “stepA” in the example above, an ExitStatus of “FAILED” would still go to “stepC”.
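
The matching rules and the “most specific first” ordering can be sketched in plain Java. TransitionSketch is a hypothetical class, not the framework’s matcher. In particular, it approximates specificity by counting the non-wildcard characters in a pattern, which is an assumption of this sketch rather than the framework’s documented ordering.

```java
import java.util.Map;

// Hypothetical illustration; not the framework's own matcher.
class TransitionSketch {

    // "*" matches zero or more characters, "?" matches exactly one.
    static boolean matches(String pattern, String status) {
        return matches(pattern, 0, status, 0);
    }

    private static boolean matches(String p, int pi, String s, int si) {
        if (pi == p.length()) {
            return si == s.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Let "*" absorb 0, 1, 2, ... characters of the status.
            for (int k = si; k <= s.length(); k++) {
                if (matches(p, pi + 1, s, k)) {
                    return true;
                }
            }
            return false;
        }
        if (si == s.length()) {
            return false;
        }
        return (c == '?' || c == s.charAt(si)) && matches(p, pi + 1, s, si + 1);
    }

    // Returns the target step of the most specific matching pattern, where
    // specificity is the number of literal (non-wildcard) characters.
    static String next(Map<String, String> transitions, String exitStatus) {
        String best = null;
        int bestLiterals = -1;
        for (Map.Entry<String, String> e : transitions.entrySet()) {
            String pattern = e.getKey();
            if (!matches(pattern, exitStatus)) {
                continue;
            }
            int literals = pattern.replace("*", "").replace("?", "").length();
            if (literals > bestLiterals) {
                best = e.getValue();
                bestLiterals = literals;
            }
        }
        if (best == null) {
            // Mirrors the rule that an uncovered ExitStatus fails the Job.
            throw new IllegalStateException("no transition for " + exitStatus);
        }
        return best;
    }
}
```

With the two transitions of “stepA” from the example above, an exit status of “FAILED” resolves to “stepC” and “COMPLETED” resolves to “stepB” in this sketch, regardless of the order in which the transitions are listed.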
