Spring Batch for Beginners Part 2 – The Domain Language of Spring Batch

Let’s start with this diagram that highlights the key concepts that make up the domain language of Spring Batch.
Let us begin to understand each of these concepts in detail.

Job

A job is a unit that encapsulates an entire batch process. It is wired together with an XML-based or Java-based configuration, called the job configuration.
The job sits at the top of the hierarchy. It is a container for step instances.

picture description

This diagram is explained in the sections that follow.
A job combines several steps and defines the order in which they execute. It also lets you configure properties that apply globally to all steps, such as restartability.

The job configuration includes:

  1. The name of the job
  2. The definition and ordering of the step instances
  3. Whether or not the job is restartable

picture description

Here, “FootballJob” is the name of the job, and playerLoad(), gameLoad() and playerSummarization() are its steps.
.start() tells the job to begin with the playerLoad() step. Once that step completes, the first .next() tells the job to run the gameLoad() step, and once that is done the playerSummarization() step is executed.
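To make the start/next ordering concrete, here is a minimal plain-Java sketch. It does not use Spring Batch; `SimpleJobSketch` and its `start`, `next` and `run` methods are hypothetical stand-ins that only mimic the shape of the configuration described above, with steps reduced to their names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job definition; NOT the Spring Batch API.
class SimpleJobSketch {
    private final String name;
    private final List<String> steps = new ArrayList<>();

    SimpleJobSketch(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Registers the first step of the job.
    SimpleJobSketch start(String step) {
        steps.clear();
        steps.add(step);
        return this;
    }

    // Registers a step that runs after the previously registered one.
    SimpleJobSketch next(String step) {
        steps.add(step);
        return this;
    }

    // "Runs" the steps strictly in registration order and returns that order.
    List<String> run() {
        return new ArrayList<>(steps);
    }

    public static void main(String[] args) {
        SimpleJobSketch job = new SimpleJobSketch("footballJob")
                .start("playerLoad")
                .next("gameLoad")
                .next("playerSummarization");
        System.out.println(job.name() + " runs " + job.run());
    }
}
```

A real job definition also carries the step logic and restart settings; the sketch only shows that the order of the calls is the order of execution.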

job instance

It refers to the concept of a logical job run. Let us understand it in simple words.
Suppose you have an EOD (end-of-day) job that copies data from one table to another based on today’s date. There is only one EOD job, but each day it has to run with its own parameters. For this job, there is one logical job instance per day: a January 1 run, a January 2 run, and so on. If the January 1 run fails and we trigger it again, it is still the January 1 run.
So a job instance can have multiple executions (job executions), but only one job instance, identified by a job and its set of parameters, can run at a given time. Starting a new job instance means starting from the beginning, while using an existing instance means picking up where it left off.

picture description

job parameter
How do we distinguish one job instance from another?
Job parameters are a set of parameters used to start a job, much like arguments passed to a function. In the earlier case, where we had one job instance for January 1 and another for January 2, it is the same job, but the instances are differentiated by their job parameters.
The January 1 run is started with the date parameter set to January 1, and the January 2 run with the date parameter set to January 2.

  • Job instance = job + identifying job parameters
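The identity rule above can be sketched in plain Java. This is a simplified model, not Spring Batch code: `JobInstanceKey`, `eodRun`, the job name "eodJob" and the year 2024 are all hypothetical and chosen only for illustration.

```java
import java.time.LocalDate;
import java.util.Map;

class JobInstanceIdentityDemo {
    // Hypothetical model: an instance is identified by the job name plus its identifying parameters.
    record JobInstanceKey(String jobName, Map<String, Object> identifyingParameters) {
        JobInstanceKey {
            identifyingParameters = Map.copyOf(identifyingParameters); // immutable copy
        }
    }

    // Builds the key for one daily run of the (hypothetical) EOD job.
    static JobInstanceKey eodRun(LocalDate date) {
        return new JobInstanceKey("eodJob", Map.of("date", date));
    }

    public static void main(String[] args) {
        JobInstanceKey jan1 = eodRun(LocalDate.of(2024, 1, 1));
        JobInstanceKey jan1Retry = eodRun(LocalDate.of(2024, 1, 1));
        JobInstanceKey jan2 = eodRun(LocalDate.of(2024, 1, 2));

        // Same job and same parameters: the retry belongs to the same instance.
        System.out.println("jan1 equals retry: " + jan1.equals(jan1Retry));
        // Same job but a different date: a different instance.
        System.out.println("jan1 equals jan2: " + jan1.equals(jan2));
    }
}
```

Because records compare by their components, two keys built from the same job name and the same parameter map are equal, which is the behaviour the formula describes.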

job execution
It refers to the concept of a single attempt to run a job. A job instance can be run multiple times, and each attempt is called an execution. An execution may end in failure or success, but the job instance it belongs to is considered complete only when an execution completes successfully. For the EOD job, it is possible that the first execution of the January 1 job instance failed. If it is run again with the same job parameters, a new job execution is created.
A job execution has properties such as Status, Start Time, End Time, Exit Status and ExecutionContext. These properties are persisted and can be used to check the status of an execution.
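The relationship between instances and executions can be modelled with a small in-memory sketch. It is an assumption-laden stand-in for the metadata tables, not the framework's implementation: `InMemoryExecutions`, its methods and the string instance keys are hypothetical. Rejecting a rerun of a completed instance loosely mirrors the idea behind `JobInstanceAlreadyCompleteException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JobExecutionDemo {
    enum Status { COMPLETED, FAILED }

    // Hypothetical in-memory stand-in for the metadata tables: instance key -> its executions.
    static class InMemoryExecutions {
        private final Map<String, List<Status>> executionsByInstance = new HashMap<>();

        // Records one attempt for the given instance and returns the attempt number.
        int recordExecution(String instanceKey, Status status) {
            if (isComplete(instanceKey)) {
                throw new IllegalStateException("Instance already complete: " + instanceKey);
            }
            List<Status> executions =
                    executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
            executions.add(status);
            return executions.size();
        }

        // An instance is complete only once one of its executions has completed successfully.
        boolean isComplete(String instanceKey) {
            return executionsByInstance.getOrDefault(instanceKey, List.of())
                    .contains(Status.COMPLETED);
        }

        int instanceCount() {
            return executionsByInstance.size();
        }
    }

    public static void main(String[] args) {
        InMemoryExecutions repo = new InMemoryExecutions();
        repo.recordExecution("eodJob|date=Jan 1", Status.FAILED);    // first attempt fails
        repo.recordExecution("eodJob|date=Jan 1", Status.COMPLETED); // rerun: new execution, same instance
        repo.recordExecution("eodJob|date=Jan 2", Status.COMPLETED); // different parameters: new instance
        System.out.println("instances: " + repo.instanceCount());
        System.out.println("Jan 1 complete: " + repo.isComplete("eodJob|date=Jan 1"));
    }
}
```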

picture description

Assume that it took the developer a whole day to figure out the problem. The next window for running the job opened at 9:00 PM, and this time the January 1 job instance ran and completed successfully at 9:30 PM. Since it was already the second day, the job had to run for January 2 as well, so that job instance was started at 9:31 PM and finished at 10:30 PM.
Job instances do not have to be started one after the other unless they access the same data, in which case running them at the same time may cause locking problems in the DB.
Since there are now two job instances, there is one more entry in the job instance table and two more entries in the job execution table.

picture description

Step

A step is a domain object that encapsulates an independent, sequential phase of a batch job. Each job can have multiple steps to carry out its batch processing. It is entirely up to the developer how complex each step is.
A step can be as simple as reading data from the DB, or as complex as performing heavy operations on the DB using business logic. As with a job, each step has its own step execution.

picture description

step execution
It represents a single attempt to execute a step. Every time a step is run, a new step execution is created. However, if a step never starts because the step before it failed, no step execution is persisted for it. Each step execution has an ExecutionContext, which holds any data the developer needs to have persisted across batch runs, for example the state information needed to restart.
Some of the step execution properties are Status, Start Time, End Time, Exit Status and ExecutionContext.

execution context
It represents a collection of key/value pairs that are persisted and controlled by the framework, giving developers a place to store persistent state that is scoped to a step execution object or a job execution object.
Example: executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());

picture description

picture description

Suppose you are reading lines from a file. Here ‘LINES_READ_COUNT’ is the key, and the value is set by the developer through code. This key/value pair is stored in a metadata table, so it can be accessed easily at different stages of a job.
In this case the step failed after processing 40322 lines. Because that count was saved, the step can resume from the line where it left off.
There is exactly one execution context per step execution, and at least one execution context per job execution.

picture description

Here, the code checks whether the ExecutionContext contains LINES_READ_COUNT. If it does, the stored value is read into lineCount, and reading continues from the next line in the file. The execution context therefore saves time and extra processing, because it remembers how many lines have already been read.
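The save-and-resume idea can be sketched without the framework. In this minimal model a plain `Map` stands in for the ExecutionContext and the "file" is an in-memory list; `LineReader` and its `read` and `update` methods are hypothetical names, not Spring Batch classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RestartableReaderDemo {
    static final String LINES_READ_COUNT = "LINES_READ_COUNT";

    // Hypothetical reader over in-memory lines; a Map stands in for the ExecutionContext.
    static class LineReader {
        private final List<String> lines;
        private int position;

        LineReader(List<String> lines, Map<String, Long> context) {
            this.lines = lines;
            // On restart, skip the lines that a previous execution already read.
            if (context.containsKey(LINES_READ_COUNT)) {
                this.position = context.get(LINES_READ_COUNT).intValue();
            }
        }

        // Returns the next line, or null when the input is exhausted.
        String read() {
            return position < lines.size() ? lines.get(position++) : null;
        }

        // Saves the current position so that a later execution can resume from it.
        void update(Map<String, Long> context) {
            context.put(LINES_READ_COUNT, (long) position);
        }
    }

    public static void main(String[] args) {
        List<String> file = List.of("line1", "line2", "line3", "line4");
        Map<String, Long> context = new HashMap<>();

        LineReader first = new LineReader(file, context);
        first.read();
        first.read();
        first.update(context); // pretend the step fails right after this point

        LineReader restarted = new LineReader(file, context);
        System.out.println(restarted.read()); // continues with the third line
    }
}
```

In the real framework the context is persisted in the metadata tables between executions; here it only survives because both readers share the same map.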

job repository

The job repository provides CRUD operations for the job launcher, job and step implementations. When a job is launched for the first time, a job execution is obtained from the repository. During the run, job execution and step execution data are also persisted in it, where the developer can make use of them. All metadata tables are part of this repository. You can also change the default names of these tables by configuring a different table prefix.
@EnableBatchProcessing configures a JobRepository automatically.

job launcher

It is a simple interface for launching a job with a set of job parameters.

public interface JobLauncher {
 public JobExecution run(Job job, JobParameters jobParameters)
 throws JobExecutionAlreadyRunningException, JobRestartException,
 JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

item reader

It is an abstraction that represents the retrieval of input for a step, one item at a time. When the ItemReader has no more items to provide, it returns null.

item writer

It is an abstraction that represents the output of a step, one batch or chunk of items at a time. It has no knowledge of the previous input or of the input it will receive next. It only knows the items that were passed to it in the current invocation.

item processor

It is an abstraction that represents the business processing of an item. It is the place to transform the input received from the item reader into the output handed to the item writer.
If the processed item is not valid, the processor returns null, which means that the item writer will not include it when producing the output.
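How the reader, processor and writer fit together can be sketched as a simple loop in plain Java. This is a simplified model under stated assumptions: `runStep` is a hypothetical helper, and `Supplier`, `Function` and `Consumer` stand in for the three abstractions. Spring Batch's real chunk processing also deals with transactions, restarts and error handling, none of which is modelled here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

class ChunkLoopDemo {
    // Reads until the reader returns null, drops items the processor maps to null,
    // and hands the writer one chunk of at most chunkSize items at a time.
    static <I, O> void runStep(Supplier<I> reader, Function<I, O> processor,
                               Consumer<List<O>> writer, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.get()) != null) {
            O processed = processor.apply(item);
            if (processed != null) {
                chunk.add(processed);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(List.copyOf(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(List.copyOf(chunk)); // flush the last, partial chunk
        }
    }

    public static void main(String[] args) {
        Iterator<String> input = List.of("1", "x", "2", "3").iterator();
        List<List<Integer>> written = new ArrayList<>();

        ChunkLoopDemo.<String, Integer>runStep(
                () -> input.hasNext() ? input.next() : null,        // reader: null signals end of input
                s -> s.matches("\\d+") ? Integer.valueOf(s) : null, // processor: null filters invalid items
                written::add,                                       // writer: sees only the current chunk
                2);

        System.out.println(written); // "x" is filtered out and never reaches the writer
    }
}
```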

If you like to watch videos, you can follow this link.


If you found this useful, you know what to do now. Hit that clap button and follow me to get more articles and tutorials on your feed.❤❤
