Spring Batch – Read from CSV and write to relational DB

In Spring Batch, we often need read data from CSV file and write it into relational database or NoSQL or convert it into another format like XML or JSON. There are few source systems like MainFrame which generates FlatFile and we must need to read data from the file, processes or transform it and save to the location where its intended.

Lets begin – In this tutorial we’ll use MySQL as a database for persistent store.

pom.xml – This file contains a list of dependencies required to run the project.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>io.spring.batch</groupId>
	<artifactId>database-output</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>databaseOutput</name>
	<description>JDBC based item writing</description>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.1.1.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<!-- Spring Batch -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>
		<!-- JDBC -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-jdbc</artifactId>
		</dependency>
		<!-- MYSQL DB -->
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
		</dependency>
		<!-- Test -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<!-- Lombok -->
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
		</dependency>
	</dependencies>
	
	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
	

</project>

Person.java – Its a domain class which holds Person details.


@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
public class Person {
	private long id;
	private String firstName;
	private String lastName;
	private Date birthdate;
}

PersonFieldSetMapper – Interface that is used to map data obtained from a FieldSet into an object.


public class PersonFieldSetMapper implements FieldSetMapper<Person> {

	@Override
	public Person mapFieldSet(FieldSet fieldSet) throws BindException {
		return new Person(fieldSet.readLong("id"),
				fieldSet.readString("firstName"),
				fieldSet.readString("lastName"),
				fieldSet.readDate("birthdate", "yyyy-MM-dd HH:mm:ss"));
	}
}

application.properties – This file holds the database related information which helps spring boot to create datasource.

spring.batch.job.enabled=false means we’re disabling batch to run on start of application context.
spring.batch.initialize-schema=always – means we’re asking Spring Batch to create the metadata tables. Metadata tables holds information about Batch Job instance, execution and parameter details.

spring.datasource.driverClassName=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/test
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.platform=mysql
spring.datasource.continueOnError=false
spring.batch.initialize-schema=always
spring.batch.job.enabled=false

JobConfiguration.java

  • FlatFileItemReader – Restartable ItemReader that reads lines from input setResource(Resource). Line is defined by the setRecordSeparatorPolicy(RecordSeparatorPolicy) and mapped to item using setLineMapper(LineMapper). If an exception is thrown during line mapping it is rethrown as FlatFileParseException adding information about the problematic line and its line number.
  • JdbcBatchItemWriter – The writer is thread-safe after its properties are set (normal singleton behavior), so it can be used to write in multiple concurrent transactions. ItemWriter that uses the batching features from NamedParameterJdbcTemplate to execute a batch of statements for all items provided. The user must provide an SQL query and a special callback for either of ItemPreparedStatementSetter or ItemSqlParameterSourceProvider. You can use either named parameters or the traditional ‘?’ placeholders.
  • Step – Batch domain interface representing the configuration of a step. As with the Job, a Step is meant to explicitly represent the configuration of a step by a developer, but also the ability to execute the step.
  • Job – Batch domain object representing a job. Job is an explicit abstraction representing the configuration of a job specified by a developer. It should be noted that restart policy is applied to the job as a whole and not to a step.
import javax.sql.DataSource;

import io.spring.batch.domain.Person;
import io.spring.batch.domain.PersonFieldSetMapper;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;


@Configuration
public class JobConfiguration {

	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;

	@Autowired
	public DataSource dataSource;

	@Bean
	public FlatFileItemReader<Person> personItemReader() {
		FlatFileItemReader<Person> reader = new FlatFileItemReader<>();
		reader.setLinesToSkip(1);
		reader.setResource(new ClassPathResource("/data/person.csv"));

		DefaultLineMapper<Person> customerLineMapper = new DefaultLineMapper<>();

		DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
		tokenizer.setNames(new String[] {"id", "firstName", "lastName", "birthdate"});

		customerLineMapper.setLineTokenizer(tokenizer);
		customerLineMapper.setFieldSetMapper(new PersonFieldSetMapper());
		customerLineMapper.afterPropertiesSet();
		reader.setLineMapper(customerLineMapper);
		return reader;
	}

	@SuppressWarnings({ "rawtypes", "unchecked" })
	@Bean
	public JdbcBatchItemWriter<Person> personItemWriter() {
		JdbcBatchItemWriter<Person> itemWriter = new JdbcBatchItemWriter<>();

		itemWriter.setDataSource(this.dataSource);
		itemWriter.setSql("INSERT INTO PERSON VALUES (:id, :firstName, :lastName, :birthdate)");
		itemWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider());
		itemWriter.afterPropertiesSet();

		return itemWriter;
	}

	@Bean
	public Step step1() {
		return stepBuilderFactory.get("step1")
				.<Person, Person>chunk(10)
				.reader(personItemReader())
				.writer(personItemWriter())
				.build();
	}

	@Bean
	public Job job() {
		return jobBuilderFactory.get("job")
				.start(step1())
				.build();
	}
}

MainApp – This is spring boot main class, just right click and run the app

  • JobLauncher – Simple interface for controlling jobs, including possible ad-hoc executions, based on different runtime identifiers. It is extremely important to note that this interface makes absolutely no guarantees about whether or not calls to it are executed synchronously or asynchronously. The javadocs for specific implementations should be checked to ensure callers fully understand how the job will be run.

import java.util.UUID;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnableBatchProcessing
public class SaveinDBApplication implements CommandLineRunner {

	@Autowired
	private JobLauncher jobLauncher;

	@Autowired
	private Job job;
	
	public static void main(String[] args) {
		SpringApplication.run(DatabaseOutputApplication.class, args);
	}

	@Override
	public void run(String... args) throws Exception {
		JobParameters jobParameters = new JobParametersBuilder()
                .addString("date", UUID.randomUUID().toString())
                .addLong("JobId",System.currentTimeMillis())
                .addLong("time",System.currentTimeMillis()).toJobParameters();
		
		JobExecution execution = jobLauncher.run(job, jobParameters);
		System.out.println("STATUS :: "+execution.getStatus());
	}
}

person.csv – This file holds sample data to be read. Create this file under data folder under src/main/resources.

id,firstName,lastName,birthdate
1,Neha,Limay,1964-10-19 14:11:03
2,Gaurav,Nene,1977-12-11 21:44:30
3,Sandeep,Joshi,1986-12-25 11:54:28
4,Parag,Rane,1959-07-24 06:00:16
5,Sachit,Patil,1956-09-14 06:49:28
6,Suchita,Vinchurkar,1984-08-30 04:18:10
7,Aditi,Nerkar,1973-02-04 05:26:05
8,Shankar,Parate,1953-04-26 11:16:26
9,Tania,Datta,1951-06-24 14:56:51
10,Sandhya,Potthuri,1953-08-27 13:15:08
11,Santosh,Pande,1957-09-05 21:36:47
12,Ajinkya,Deshpande,1979-01-21 18:31:27
13,Kiran,Giradkar,1965-07-18 15:05:22
14,Deepak,Patil,1990-09-11 15:52:54
15,Suraj,Bhamre,1979-06-01 06:58:54
16,Anup,Tarone,1990-07-02 17:36:35
17,Pradeep,Rodge,1959-12-19 20:23:12
18,Pawan,Bawankar,1984-12-27 10:36:49
19,Sheetal,Kale,1962-06-23 20:03:40
20,Reetisha,Hedau,1988-11-12 19:05:13

schema.sql – Please make sure to create the table before you run the app.

CREATE TABLE `person` (
  `id` mediumint(8) unsigned NOT NULL,
  `firstName` varchar(255) default NULL,
  `lastName` varchar(255) default NULL,
  `birthdate` varchar(255),
  PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;

Console – You should be able to see below logs. This shows batch job has executed successfully.

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.1.RELEASE)

2021-01-04 14:31:30.519  INFO 21148 --- [           main] i.s.batch.DatabaseOutputApplication      : Starting DatabaseOutputApplication on with PID 21148 ()
2021-01-04 14:31:30.522  INFO 21148 --- [           main] i.s.batch.DatabaseOutputApplication      : No active profile set, falling back to default profiles: default
2021-01-04 14:31:31.095  INFO 21148 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2021-01-04 14:31:31.595  INFO 21148 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2021-01-04 14:31:31.725  INFO 21148 --- [           main] o.s.b.c.r.s.JobRepositoryFactoryBean     : No database type set, using meta data indicating: MYSQL
2021-01-04 14:31:31.784  INFO 21148 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : No TaskExecutor has been set, defaulting to synchronous executor.
2021-01-04 14:31:31.885  INFO 21148 --- [           main] i.s.batch.DatabaseOutputApplication      : Started DatabaseOutputApplication in 1.574 seconds (JVM running for 2.627)
2021-01-04 14:31:31.998  INFO 21148 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=job]] launched with the following parameters: [{date=874ffd69-f6d4-4f9c-b27a-c7dd692e969d, JobId=1609750891887, time=1609750891887}]
2021-01-04 14:31:32.034  INFO 21148 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step1]
2021-01-04 14:31:32.129  INFO 21148 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=job]] completed with the following parameters: [{date=874ffd69-f6d4-4f9c-b27a-c7dd692e969d, JobId=1609750891887, time=1609750891887}] and the following status: [COMPLETED]
STATUS :: COMPLETED
2021-01-04 14:31:32.132  INFO 21148 --- [       Thread-3] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2021-01-04 14:31:32.135  INFO 21148 --- [       Thread-3] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

Result from MySQL DB – From the result we can conclude that Spring Batch has read the data from CSV file and loaded into MySQL DB.

mysql> use test;
Database changed

mysql> select * from person;
+----+-----------+------------+-----------------------+
| id | firstName | lastName   | birthdate             |
+----+-----------+------------+-----------------------+
|  1 | Neha      | Limay      | 1964-10-19 14:11:03.0 |
|  2 | Gaurav    | Nene       | 1977-12-11 21:44:30.0 |
|  3 | Sandeep   | Joshi      | 1986-12-25 11:54:28.0 |
|  4 | Parag     | Rane       | 1959-07-24 06:00:16.0 |
|  5 | Sachit    | Patil      | 1956-09-14 06:49:28.0 |
|  6 | Suchita   | Vinchurkar | 1984-08-30 04:18:10.0 |
|  7 | Aditi     | Nerkar     | 1973-02-04 05:26:05.0 |
|  8 | Shankar   | Parate     | 1953-04-26 11:16:26.0 |
|  9 | Tania     | Datta      | 1951-06-24 14:56:51.0 |
| 10 | Sandhya   | Potthuri   | 1953-08-27 13:15:08.0 |
| 11 | Santosh   | Pande      | 1957-09-05 21:36:47.0 |
| 12 | Ajinkya   | Deshpande  | 1979-01-21 18:31:27.0 |
| 13 | Kiran     | Giradkar   | 1965-07-18 15:05:22.0 |
| 14 | Deepak    | Patil      | 1990-09-11 15:52:54.0 |
| 15 | Suraj     | Bhamre     | 1979-06-01 06:58:54.0 |
| 16 | Anup      | Tarone     | 1990-07-02 17:36:35.0 |
| 17 | Pradeep   | Rodge      | 1959-12-19 20:23:12.0 |
| 18 | Pawan     | Bawankar   | 1984-12-27 10:36:49.0 |
| 19 | Sheetal   | Kale       | 1962-06-23 20:03:40.0 |
| 20 | Reetisha  | Hedau      | 1988-11-12 19:05:13.0 |
+----+-----------+------------+-----------------------+
20 rows in set (0.00 sec)
Previous
Next