Hibernate Batch Processing: Chapter 34

Suppose you need to insert 1,000,000 records into the database in a single run. What should you do in this situation?
The Naive Solution in Hibernate
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<1000000; i++ )
{
    Student student = new Student(.....);
    session.save(student);
}
tx.commit();
session.close();
By default, Hibernate caches all persisted objects in the session-level (first-level) cache, so your application would eventually fail with an OutOfMemoryError somewhere around the 50,000th row. You can avoid this problem by using batch processing with Hibernate.

To use the batch processing feature, first set hibernate.jdbc.batch_size to a number, typically between 20 and 50 depending on object size. This tells Hibernate to collect every X rows into a single JDBC batch. To implement this in your code, a small modification is needed:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<1000000; i++ ) 
{
    Student student = new Student(.....);
    session.save(student);
    if( i % 50 == 0 ) // Same as the JDBC batch size
    { 
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
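As a side note, the condition `i % 50 == 0` above also fires on the very first iteration (i == 0), so the first flush covers a single insert, and every subsequent flush covers 50. The following standalone sketch (plain Java, no Hibernate required; `FlushIntervalDemo` and `countFlushes` are hypothetical names used only for illustration) counts how often the session would be flushed by that condition:

```java
public class FlushIntervalDemo {

    // Counts how many times the flush/clear block in the loop above
    // would execute for the given record count and batch size.
    static int countFlushes(int records, int batchSize) {
        int flushes = 0;
        for (int i = 0; i < records; i++) {
            if (i % batchSize == 0) { // same condition as the insert loop
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // With 1,000,000 records and a batch size of 50, the session is
        // flushed 20,000 times; the first flush happens at i == 0, after
        // only one save, because 0 % 50 == 0.
        System.out.println(countFlushes(1_000_000, 50)); // prints 20000
    }
}
```

If you prefer each flush to cover a full batch, `(i + 1) % batchSize == 0` flushes after every 50th save instead.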
The code above works fine for INSERT operations. If you want to perform a bulk UPDATE operation, you can achieve it with a scrollable cursor, as in the following code:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
ScrollableResults studentCursor = session.createQuery("FROM Student").scroll();
int count = 0;
while ( studentCursor.next() )
{
   Student student = (Student) studentCursor.get(0);
   student.setStudentName("DEV");
   session.update(student);
   if ( ++count % 50 == 0 ) {
      session.flush();
      session.clear();
   }
}
tx.commit();
session.close();
The Batch Processing Solution in Hibernate
If you are undertaking batch processing you will need to enable the use of JDBC batching. This is absolutely essential if you want to achieve optimal performance. Set the JDBC batch size to a reasonable number (10-50).
hibernate.jdbc.batch_size 50
 You can also do this kind of work in a process where interaction with the second-level cache is completely disabled: 
hibernate.cache.use_second_level_cache false

hibernate.cfg.xml
<!DOCTYPE hibernate-configuration PUBLIC
 "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
 "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
 <session-factory>

   <!-- Database connection settings -->
   <property name="connection.driver_class">com.mysql.jdbc.Driver</property>
   <property name="connection.url">jdbc:mysql://localhost:3306/hibernateDB2</property>
   <property name="connection.username">root</property>
   <property name="connection.password">root</property>

   <!-- JDBC connection pool (use the built-in) -->
   <property name="connection.pool_size">1</property>

   <!-- JDBC batch size: same value used for the flush interval in code -->
   <property name="hibernate.jdbc.batch_size">50</property>

   <!-- SQL dialect -->
   <property name="dialect">org.hibernate.dialect.MySQLDialect</property>

   <!-- Enable Hibernate's automatic session context management -->
   <property name="current_session_context_class">thread</property>

   <!-- Disable the second-level cache for batch processing -->
   <property name="hibernate.cache.use_second_level_cache">false</property>
   <property name="cache.provider_class">org.hibernate.cache.EhCacheProvider</property>

   <!-- Echo all executed SQL to stdout -->
   <property name="show_sql">true</property>

   <!-- Update the database schema on startup -->
   <property name="hbm2ddl.auto">update</property>

   <mapping class="com.sdnext.hibernate.tutorial.dto.Student"/>

 </session-factory>
</hibernate-configuration>
Student.java
package com.sdnext.hibernate.tutorial.dto;

import java.io.Serializable;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name="STUDENT")
public class Student implements Serializable 
{
 /**
  * serialVersionUID
  */
 private static final long serialVersionUID = 8633415090390966715L;
 @Id
 @Column(name="ID")
 @GeneratedValue(strategy=GenerationType.AUTO)
 private int id;
 @Column(name="STUDENT_NAME")
 private String studentName;
 @Column(name="ROLL_NUMBER")
 private int rollNumber;
 @Column(name="COURSE")
 private String course;
 public int getId() {
  return id;
 }
 public void setId(int id) {
  this.id = id;
 }
 public String getStudentName() {
  return studentName;
 }
 public void setStudentName(String studentName) {
  this.studentName = studentName;
 }
 public int getRollNumber() {
  return rollNumber;
 }
 public void setRollNumber(int rollNumber) {
  this.rollNumber = rollNumber;
 }
 public String getCourse() {
  return course;
 }
 public void setCourse(String course) {
  this.course = course;
 }
 public String toString()
 {
  return "ROLL Number: "+rollNumber+"| Name: "+studentName+"| Course: "+course;
 }
}
HibernateTestDemo.java
package com.sdnext.hibernate.tutorial;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.AnnotationConfiguration;

import com.sdnext.hibernate.tutorial.dto.Student;


public class HibernateTestDemo {

 /**
  * @param args
  */
 public static void main(String[] args) 
 {
  SessionFactory sessionFactory = new AnnotationConfiguration().configure().buildSessionFactory();
  Session session = sessionFactory.openSession();
  Transaction transaction = session.beginTransaction();
  
  for ( int i=0; i<100000; i++ )
  {
      String studentName = "DINESH " + i;
      int rollNumber = 9 + i;
      String course = "MCA " + i;
      Student student = new Student();
      student.setStudentName(studentName);
      student.setRollNumber(rollNumber);
      student.setCourse(course);
      session.save(student);
      if( i % 50 == 0 ) // same as the JDBC batch size
      {
          session.flush();
          session.clear();
      }
  }
  transaction.commit();
  session.close();
 }

}
Output:
.................................
Hibernate: insert into STUDENT (COURSE, ROLL_NUMBER, STUDENT_NAME) values (?, ?, ?)
Hibernate: insert into STUDENT (COURSE, ROLL_NUMBER, STUDENT_NAME) values (?, ?, ?)
Hibernate: insert into STUDENT (COURSE, ROLL_NUMBER, STUDENT_NAME) values (?, ?, ?)
Hibernate: insert into STUDENT (COURSE, ROLL_NUMBER, STUDENT_NAME) values (?, ?, ?)
.................................


This will create 100000 records in STUDENT table.
Hibernate batch processing is powerful but it has many pitfalls that developers must be aware of in order to use it properly and efficiently. Most people who use batch probably find out about it by trying to perform a large operation and finding out the hard way why batching is needed. They run out of memory. Once this is resolved they assume that batching is working properly. The problem is that even if you are flushing your first level cache, you may not be batching your SQL statements.
Hibernate flushes the session by default at the following points:

1. Before some queries
2. When commit() is executed
3. When the application calls session.flush() explicitly

The thing to note here is that until the session is flushed, every persistent object is placed into the first level cache (your JVM's memory). So if you are iterating over a million objects you will have at least a million objects in memory.

To avoid this problem you need to call flush() and then clear() on the session at regular intervals. The Hibernate documentation recommends flushing every n records, where n is equal to the hibernate.jdbc.batch_size parameter, as the examples above demonstrate.
There are two reasons for batching your Hibernate database interactions. The first is to keep the first-level cache at a reasonable size so that you do not run out of memory. The second is to batch the inserts and updates so that they are executed efficiently by the database. The example above accomplishes the first goal, but not necessarily the second. Consider the following code:
Student student = new Student();
Address address = new Address(); // a second mapped entity (not shown above)
student.setStudentName("DINESH RAJPUT");
address.setCity("DELHI");
student.setAddress(address);
session.save(student);
The problem is that Hibernate looks at each SQL statement and checks whether it is the same statement as the previously executed one. If it is, and the batch_size has not yet been reached, Hibernate batches the two statements together using JDBC2 batching. However, if your statements look like the example above, Hibernate sees alternating insert statements (one for STUDENT, one for ADDRESS) and executes an individual insert for each record processed. So one million new students would produce two million individual, unbatched insert statements in this case, which is extremely bad for performance.
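One way to mitigate the alternating-statement problem is to let Hibernate reorder its SQL so that inserts and updates for the same entity are grouped together before batching. Hibernate provides the hibernate.order_inserts and hibernate.order_updates configuration properties for this; the fragment below is a sketch of how they would be added to the hibernate.cfg.xml shown earlier (how fully ordering restores batching depends on your mappings and Hibernate version):

```xml
<!-- Group inserts/updates by entity so JDBC batching can apply -->
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
```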