Hibernate Search = Hibernate Core + Apache Lucene search engine
Hibernate Search consists of an indexing component as well as an index search component. Both are backed by Apache Lucene.
In short, Hibernate Search = indexing component + index search component.
Each time an entity is inserted, updated or removed in/from the database, Hibernate Search keeps track of this event (through the Hibernate event system) and schedules an index update. All these updates are handled without you having to interact with the Apache Lucene APIs directly. Instead, the interaction with the underlying Lucene indexes is handled via so-called IndexManagers.
Each Lucene index is managed by one index manager, which is uniquely identified by name. In most cases there is also a one-to-one relationship between an indexed entity and a single IndexManager. The exceptions are the use cases of index sharding and index sharing. The former can be applied when the index for a single entity becomes too big and indexing operations are slowing down the application. In this case a single entity is indexed into multiple indexes, each with its own index manager. The latter, index sharing, is the reverse case: multiple entities are indexed into the same Lucene index.
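The sharding case can be pictured with a small, library-free sketch. It is not Hibernate Search's own sharding code: the class name ShardRouter, the index naming scheme "<base>.<n>" and the hash-of-id routing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: spreads the documents of ONE entity type
// over several named indexes, each of which would have its own index manager.
final class ShardRouter {
    private final List<String> indexNames = new ArrayList<>();

    ShardRouter(String baseName, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        for (int i = 0; i < shardCount; i++) {
            indexNames.add(baseName + "." + i); // assumed naming scheme
        }
    }

    // The same id always maps to the same index, so later updates and
    // deletes for that entity reach the shard that holds its document.
    String indexNameFor(String entityId) {
        int shard = Math.floorMod(entityId.hashCode(), indexNames.size());
        return indexNames.get(shard);
    }

    // Defensive copy of all index names managed by this router.
    List<String> allIndexNames() {
        return new ArrayList<>(indexNames);
    }
}
```

With new ShardRouter("Book", 3), every Book id is routed to one of three index names, and a query over all books would have to read all three.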
Once the index is created, you can search for entities and get back lists of managed entities, which saves you the tedious object-to-Lucene-Document mapping. The same persistence context is shared between Hibernate and Hibernate Search. As a matter of fact, the FullTextSession is built on top of the Hibernate Session, so application code can use the unified org.hibernate.Query or javax.persistence.Query APIs exactly as it would for an HQL, JPA-QL or native query.
Back end Process-
Hibernate Search lets the batched work be processed by different back ends. Several back ends are provided out of the box, and you have the option to plug in your own. It is important to understand that in this context "back end" encompasses more than just the configuration option hibernate.search.default.worker.backend. This property only specifies an implementation of the BackendQueueProcessor interface, which is one part of a back end configuration. In most cases, additional configuration settings are needed to successfully configure a specific back end setup, for example the JMS back end.
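To make the "more than one property" point concrete, here is a minimal sketch that assembles back end settings in a java.util.Properties object. The helper class BackendSettings is hypothetical. The values lucene and jms, the two worker.jms.* keys and the JNDI names in the usage note are recalled from the Hibernate Search 4.x documentation rather than taken from these notes, so treat them as assumptions and check them against the version you use.

```java
import java.util.Properties;

// Hypothetical helper: only builds property sets, it does not start Hibernate.
final class BackendSettings {
    static final String BACKEND_KEY = "hibernate.search.default.worker.backend";

    // Local back end: a single property is enough.
    static Properties luceneBackend() {
        Properties props = new Properties();
        props.setProperty(BACKEND_KEY, "lucene"); // assumed value name
        return props;
    }

    // JMS back end: the backend property alone is not sufficient, the node
    // also needs to know which connection factory and queue to use.
    static Properties jmsBackend(String connectionFactoryJndi, String queueJndi) {
        Properties props = new Properties();
        props.setProperty(BACKEND_KEY, "jms"); // assumed value name
        // Assumed property keys for the JMS-specific settings.
        props.setProperty("hibernate.search.default.worker.jms.connection_factory",
                connectionFactoryJndi);
        props.setProperty("hibernate.search.default.worker.jms.queue", queueJndi);
        return props;
    }
}
```

For example, BackendSettings.jmsBackend("/ConnectionFactory", "queue/hibernatesearch") yields three entries, where the local back end needs only one.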
In the default Lucene back end mode, all index update operations applied on a given node (JVM) are executed against the Lucene directories (through the directory providers) by that same node. This mode is typically used in non-clustered environments, or in clustered environments where the directory store is shared and the Directory takes care of the locking strategy.
The main advantage is simplicity and immediate visibility of the changes in Lucene queries (a requirement in some applications).
An alternative back end viable for non-clustered and non-shared index configurations is the near-real-time backend.
In the JMS back end mode, all index update operations applied on a given node are sent to a JMS queue. A single reader then processes the queue and updates the master index. The master index is replicated on a regular basis to the slave copies. This is known as the master/slaves pattern. The master is solely responsible for updating the Lucene index. The slaves can accept read as well as write operations. However, they only process the read operations on their local index copy and delegate the update operations to the master.
This mode targets clustered environments where throughput is critical, and index update delays are affordable. Reliability is ensured by the JMS provider and by having the slaves working on a local copy of the index.
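The master/slaves flow, including the index update delay it implies, can be modeled in a few lines of plain Java. This is a toy model under stated assumptions, not the JMS back end itself: an in-memory BlockingQueue stands in for the JMS queue, lists of strings stand in for Lucene indexes, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the master/slaves pattern with one slave.
final class MasterSlaveModel {
    private final BlockingQueue<String> updateQueue = new LinkedBlockingQueue<>();
    private final List<String> masterIndex = new ArrayList<>();
    private List<String> slaveCopy = new ArrayList<>();

    // Slave side: a write is not applied locally, it is delegated
    // to the master by putting it on the queue.
    void slaveWrite(String document) {
        updateQueue.add(document);
    }

    // Slave side: reads are answered from the local index copy only.
    boolean slaveContains(String document) {
        return slaveCopy.contains(document);
    }

    // Master side: the single queue consumer applies all pending updates
    // to the master index and reports how many it processed.
    int masterProcessQueue() {
        List<String> batch = new ArrayList<>();
        updateQueue.drainTo(batch);
        masterIndex.addAll(batch);
        return batch.size();
    }

    // Periodic replication: the slave receives a fresh copy of the master index.
    void replicateToSlave() {
        slaveCopy = new ArrayList<>(masterIndex);
    }
}
```

A document written on the slave becomes visible to slave queries only after both masterProcessQueue() and replicateToSlave() have run, which is the delay this mode trades for throughput.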
When executing a query, Hibernate Search interacts with the Apache Lucene indexes through a reader strategy. The choice of reader strategy depends on the profile of the application (frequent updates, read mostly, asynchronous index updates, etc.).
With this strategy, Hibernate Search shares the same IndexReader, for a given Lucene index, across multiple queries and threads, provided that the IndexReader is still up to date. If the IndexReader is not up to date, a new one is opened and provided. Each IndexReader is made of several SegmentReaders. This strategy only reopens segments that have been modified or created since the last opening and shares the already loaded segments from the previous instance. This strategy is the default.
The name of this strategy is shared.
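The reuse rule of the shared strategy can be sketched without Lucene. The model below is a simplification under stated assumptions: a version counter stands in for "the index has changed", the nested Reader class stands in for an IndexReader, and segment-level reuse is not modeled. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model: hand out the same reader until the index changes.
final class SharedReaderModel {
    // Stand-in for an IndexReader; remembers the index version it was opened on.
    static final class Reader {
        final long indexVersion;

        Reader(long indexVersion) {
            this.indexVersion = indexVersion;
        }
    }

    private final AtomicLong indexVersion = new AtomicLong();
    private Reader current;
    private int openedCount;

    // Called whenever the index is modified.
    void indexChanged() {
        indexVersion.incrementAndGet();
    }

    // Returns the shared reader if it is still up to date, otherwise opens
    // a new one. Synchronized so that concurrent queries see one instance.
    synchronized Reader openReader() {
        long version = indexVersion.get();
        if (current == null || current.indexVersion != version) {
            current = new Reader(version);
            openedCount++;
        }
        return current;
    }

    // Number of readers actually opened so far.
    synchronized int openedCount() {
        return openedCount;
    }
}
```

Two consecutive openReader() calls with no index change return the same instance; after indexChanged() the next call opens a second reader.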
Every time a query is executed, a Lucene IndexReader is opened. This strategy is not the most efficient since opening and warming up an IndexReader can be a relatively expensive operation.
The name of this strategy is not-shared.
You can write your own reader strategy that suits your application needs by implementing org.hibernate.search.reader.ReaderProvider. The implementation must be thread-safe.
Full-text search engines like Apache Lucene are very powerful technologies for adding efficient free-text search capabilities to applications. However, Lucene suffers from several mismatches when dealing with object domain models. Amongst other things, indexes have to be kept up to date, and mismatches between index structure and domain model, as well as query mismatches, have to be avoided.
Hibernate Search addresses these shortcomings. It indexes your domain model with the help of a few annotations, takes care of database/index synchronization and brings back regular managed objects from free text queries. Hence, it solves:
- The structural mismatch: Hibernate Search takes care of the object/index translation
- The duplication mismatch: Hibernate Search manages the index, keeps changes synchronized with your database, and optimizes the index access transparently
- The API mismatch: Hibernate Search lets you query the index and retrieve managed objects as any regular Hibernate query would do.
Even though Hibernate Search uses Apache Lucene™ under the hood, you can always fall back to the native Lucene APIs if the need arises.
Depending on application needs, Hibernate Search works well in both non-clustered and clustered modes and provides synchronous and asynchronous index updates, letting you make an active choice between response time, throughput and index update time.