Spring Batch Glossary


Batch
An accumulation of business transactions over time.
Batch Application Style
A term that designates batch as an application style in its own right, similar to online, Web, or SOA. It has the standard elements of input, validation, transformation of information to the business model, business processing, and output. In addition, it requires monitoring at a macro level.
Batch Processing
The handling of a batch of many business transactions that have accumulated over a period of time (e.g. an hour, day, week, month, or year). It is the application of a process, or set of processes, to many data entities or objects in a repetitive and predictable fashion with either no manual element, or a separate manual element for error processing.
Batch Window
The time frame within which a batch job must complete. This can be constrained by other systems coming online, other dependent jobs needing to execute or other factors specific to the batch environment.
Step
The main batch task, or unit-of-work controller. It initializes the business logic and controls the transaction environment based on settings such as the commit interval.
Tasklet
A component created by the application developer to process the business logic for a Step.
Batch Job Type
Job types describe the application of jobs to a particular type of processing. Common areas are interface processing (typically flat files), forms processing (either for online PDF generation or print formats), and report processing.
Job
A job represents the entire batch work, for example an End Of Day (EOD) job in a bank. It consists of multiple steps, each representing a single unit of work.
JobRepository
The repository is responsible for persisting batch metadata. SimpleJobRepository is an implementation of JobRepository that stores JobInstances, JobExecutions, and StepExecutions information using the DAOs injected via constructor arguments. Spring Batch supports two implementations of these DAOs: Map based (in-memory) and Jdbc based. In a real enterprise application the Jdbc variants are preferred, but we will use the simpler in-memory alternatives (MapJobInstanceDao, MapJobExecutionDao, MapStepExecutionDao, MapExecutionContextDao).
JobLauncher
As the name suggests, it is responsible for launching a batch job. We are using the SimpleJobLauncher implementation, which requires only one dependency, a JobRepository. The JobRepository is used to obtain a valid JobExecution. The repository must be used because the provided Job could be a restart of an existing JobInstance, and only the repository can reliably recreate it.
JobInstanceDao, JobExecutionDao, StepExecutionDao
These data access objects are used by SimpleJobRepository to store execution-related information. Two sets of implementations are provided by Spring Batch: Map based (in-memory) and Jdbc based. In a real application the Jdbc variants are more suitable, but we will use the simpler in-memory alternative in this example.
Driving Query
A driving query identifies the set of work for a job to do; the job then breaks that work into individual units of work. For instance, a job might identify all financial transactions that have a status of "pending transmission" and send them to a partner system. The driving query returns a set of record IDs to process; each record ID then becomes a unit of work. A driving query may involve a join (if the selection criteria span two or more tables) or it may work with a single table.
Item
An item represents the smallest amount of complete data for processing. In the simplest terms, this might mean a line in a file, a row in a database table, or a particular element in an XML file.
Logical Unit of Work (LUW)
A batch job iterates through a driving query (or another input source such as a file) to perform the set of work that the job must accomplish. Each iteration of work performed is a unit of work.
Commit Interval
A set of LUWs processed within a single transaction.
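As a rough illustration of a commit interval, the sketch below groups items into chunks, where each chunk stands in for the work committed in one transaction. It is plain Java, not Spring Batch code; the class name CommitIntervalSketch and the method chunk are hypothetical, and no real transaction is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of Spring Batch.
final class CommitIntervalSketch {

    /** Groups items into chunks of at most commitInterval items; each chunk stands in for one transaction. */
    static <T> List<List<T>> chunk(List<T> items, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        for (List<Integer> luws : chunk(items, 3)) {
            // A real job would write the chunk and commit the transaction here.
            System.out.println("commit " + luws);
        }
    }
}
```

With seven items and a commit interval of 3, main prints three lines: commit [1, 2, 3], commit [4, 5, 6], and commit [7].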
Partitioning
Splitting a job into multiple threads where each thread is responsible for a subset of the overall data to be processed. The threads of execution may be within the same JVM or they may span JVMs in a clustered environment that supports workload balancing.
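The idea of partitioning within a single JVM can be sketched as follows: the data is split into contiguous subsets and each subset is handed to its own thread. This is a minimal plain-Java sketch, not the Spring Batch partitioning API; the class PartitioningSketch, its methods, and the choice of summing integers as the "work" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not part of Spring Batch.
final class PartitioningSketch {

    /** Splits items into at most partitionCount contiguous, near-equal subsets. */
    static <T> List<List<T>> partition(List<T> items, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        List<List<T>> partitions = new ArrayList<>();
        int size = items.size();
        int base = size / partitionCount;
        int remainder = size % partitionCount;
        int start = 0;
        for (int i = 0; i < partitionCount && start < size; i++) {
            // The first 'remainder' partitions take one extra item.
            int length = base + (i < remainder ? 1 : 0);
            partitions.add(new ArrayList<>(items.subList(start, start + length)));
            start += length;
        }
        return partitions;
    }

    /** Sums each partition on its own thread, then combines the partial results. */
    static long sumInParallel(List<Integer> items, int partitionCount) {
        ExecutorService pool = Executors.newFixedThreadPool(partitionCount);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> part : partition(items, partitionCount)) {
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int value : part) {
                        sum += value;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Spanning JVMs in a clustered environment would need a way to distribute the subsets across processes, which this sketch does not attempt.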
Staging Table
A table that holds temporary data while it is being processed.
Restartable
A job that can be executed again and will assume the same identity as when it was first run. In other words, it has the same job instance id.
Rerunnable
A job that is restartable and manages its own state in terms of the previous run's record processing. An example of a rerunnable step is one based on a driving query. If the driving query can be formed so that it limits the processed rows when the job is restarted, then it is rerunnable. This is managed by the application logic. Often a condition is added to the where clause to limit the rows returned by the driving query, with something like "and processedFlag != true".
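The processedFlag idea can be sketched with an in-memory map standing in for the database table. Everything here is hypothetical (the class RerunnableSketch, the methods drivingQuery and run, and the maxRecords parameter used to simulate an interrupted run); a real job would issue SQL with a condition such as "and processedFlag != true" instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for a table with a processedFlag column.
final class RerunnableSketch {

    /** Record ID mapped to its processedFlag. */
    private final Map<Integer, Boolean> processedFlagById = new LinkedHashMap<>();

    RerunnableSketch(List<Integer> ids) {
        for (Integer id : ids) {
            processedFlagById.put(id, false);
        }
    }

    /** Stand-in for the driving query: returns only IDs whose processedFlag is not true. */
    List<Integer> drivingQuery() {
        List<Integer> pending = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> entry : processedFlagById.entrySet()) {
            if (!entry.getValue()) {
                pending.add(entry.getKey());
            }
        }
        return pending;
    }

    /** Processes pending records, stopping after maxRecords to simulate an interrupted run. */
    List<Integer> run(int maxRecords) {
        List<Integer> done = new ArrayList<>();
        for (Integer id : drivingQuery()) {
            if (done.size() == maxRecords) {
                break;
            }
            processedFlagById.put(id, true); // application logic records its own progress
            done.add(id);
        }
        return done;
    }
}
```

For five records with IDs 1 to 5, a first call run(2) processes 1 and 2; a second call run(10) picks up only 3, 4, and 5, because the driving query no longer returns the rows already flagged.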
Repeat
One of the most basic units of batch processing. It defines repeatedly calling a portion of code until it is finished, as long as there is no error. Typically a batch process is repeatable as long as there is input.
Retry
Simplifies the execution of operations with retry semantics, most frequently associated with handling transactional output exceptions. Retry is slightly different from repeat: rather than continually calling a block of code, retry is stateful and continually calls the same block of code with the same input until it either succeeds or some type of retry limit has been exceeded. It is generally useful only if a subsequent invocation of the operation might succeed because something in the environment has improved.
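The basic shape of a retry loop with a retry limit might look like the sketch below. It is plain Java rather than the framework's retry support; the class RetrySketch, the method retry, and the choice to retry on any RuntimeException are assumptions for illustration, and the sketch has no backoff between attempts.

```java
import java.util.function.Supplier;

// Hypothetical sketch: not the framework's retry API.
final class RetrySketch {

    /** Calls the same operation until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T retry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the same call again
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that fails twice before the "environment improves".
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("resource busy");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In main, the simulated operation fails on the first two calls and succeeds on the third, so the loop returns well within the limit of five attempts.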
Recover
Recover operations handle an exception in such a way that a repeat process is able to continue.
Skip
Skip is a recovery strategy, often used on file input sources, for ignoring bad input records that fail validation.
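A skip strategy for file-like input could be sketched as below: lines that fail validation are set aside and processing continues. The class SkipSketch and the method readValid are hypothetical, "validation" here simply means the line parses as an integer, and the skipLimit parameter is an assumption of this sketch rather than something the definition above requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of Spring Batch.
final class SkipSketch {

    /**
     * Parses each line as an integer. Invalid lines are added to 'skipped' and ignored,
     * until more than skipLimit lines have failed, at which point processing stops with an error.
     */
    static List<Integer> readValid(List<String> lines, int skipLimit, List<String> skipped) {
        List<Integer> valid = new ArrayList<>();
        for (String line : lines) {
            try {
                valid.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                if (skipped.size() >= skipLimit) {
                    throw new IllegalStateException(
                            "skip limit of " + skipLimit + " exceeded at: " + line, e);
                }
                skipped.add(line); // bad record is ignored and the loop continues
            }
        }
        return valid;
    }
}
```

Keeping the skipped lines in a separate list, rather than discarding them, leaves room for a manual error-processing step afterwards.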
