Mapping entities to the index structure

In this chapter we will discuss about the entity mapping to the index structure. You have already learned that all the metadata information needed to index entities is described through annotations. There is no need for xml mapping files. You can still use Hibernate mapping files for the basic Hibernate configuration, but the Hibernate Search specific configuration has to be expressed via annotations.
Mapping an entity-

1. A Basic mapping: There are following most commonly used annotations for mapping an entity.

@Indexed-
Foremost we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process).
@Entity
@Indexed
public class Book{
    ...
}
You can optionally specify the index attribute of the @Indexed annotation to change the default name of the index.

@Field

For each property (or attribute) of your entity, you have the ability to describe how it will be indexed. The default (no annotation present) means that the property is ignored by the indexing process. @Field does declare a property as indexed and allows to configure several aspects of the indexing process by setting one or more of the following attributes:
  • name : describe under which name, the property should be stored in the Lucene Document. The default value is the property name (following the JavaBeans convention)
  • store : describe whether or not the property is stored in the Lucene index. You can store the value Store.YES (consuming more space in the index but allowing projection), store it in a compressed way Store.COMPRESS (this does consume more CPU), or avoid any storage Store.NO (this is the default value). When a property is stored, you can retrieve its original value from the Lucene Document. This is not related to whether the element is indexed or not.
  • index: describe whether the property is indexed or not. The different values are Index.NO (no indexing, i.e. cannot be found by a query), Index.YES (the element gets indexed and is searchable). The default value is Index.YES. Index.NO can be useful for cases where a property does is not required to be searchable, but should be available for projection.
  • analyze: determines whether the property is analyzed (Analyze.YES) or not (Analyze.NO). The default value is Analyze.YES.
  • norms: describes whether index time boosting information should be stored (Norms.YES) or not (Norms.NO). Not storing it can save a considerable amount of memory, but there won't be any index time boosting information available. The default value is Norms.YES.
  • termVector: describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO.
  • indexNullAs : Per default null values are ignored and not indexed. However, using indexNullAs you can specify a string which will be inserted as token for the null value. Per default this value is set to Field.DO_NOT_INDEX_NULL indicating that null values should not be indexed. You can set this value to Field.DEFAULT_NULL_TOKEN to indicate that a default null token should be used. This default null token can be specified in the configuration using hibernate.search.default_null_token. If this property is not set and you specify Field.DEFAULT_NULL_TOKEN the string "_null_" will be used as default.
@NumericField-
There is a companion annotation to @Field called @NumericField that can be specified in the same scope as @Field or @DocumentId. It can be specified for Integer, Long, Float and Double properties. At index time the value will be indexed using a Tree structure. When a property is indexed as numeric field, it enables efficient range query and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accept the following parameters:

  • forField (Optional) Specify the name of of the related @Field that will be indexed as numeric. It's only mandatory when the property contains more than a @Field declaration
  • precisionStep (Optional) Change the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage and faster range and sort queries. Larger values lead to less space used and range query performance more close to the range query in normal @Fields. Default value is 4.
@Id-
Finally, the id property of an entity is a special property used by Hibernate Search to ensure index unicity of a given entity. By design, an id has to be stored and must not be tokenized. To mark a property as index id, use the @DocumentId annotation. If you are using JPA and you have specified @Id you can omit @DocumentId. The chosen entity id will also be used as document id.
@Entity
@Indexed
public class Book{
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", store=Store.YES)
    public String getSummary() { return summary; }

    @Lob
    @Field
    public String getText() { return text; }

    @Field 
    @NumericField( precisionStep = 6)
    public float getGrade() { return grade; }
}
2. A Mapping properties multiple times: Sometimes one has to map a property multiple times per index, with slightly different indexing strategies. For example, sorting a query by field requires the field to be un-analyzed. If one wants to search by words in this property and still sort it, one need to index it twice - once analyzed and once un-analyzed. @Fields allows to achieve this goal.
@Entity
@Indexed(index = "Book" )
public class Book {
    @Fields( {
            @Field,
            @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
            } )
    public String getSummary() {
        return summary;
    }

    ...
}
@Field supports 2 attributes useful when @Fields is used:
  • analyzer: defines a @Analyzer annotation per field rather than per property
  • bridge: defines a @FieldBridge annotation per field rather than per property

3. A Embedded and associated objects:
Associated objects as well as embedded objects can be indexed as part of the root entity index. This is useful if you expect to search a given entity based on properties of associated objects. 
@Entity
@Indexed
public class Place {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
public class Address {
    @Id
    @GeneratedValue
    private Long id;

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}
4. A Analysis:
Analysis is the process of converting text into single terms (words) and can be considered as one of the key features of a fulltext search engine. Lucene uses the concept of Analyzers to control this process. In the following section we cover the multiple ways Hibernate Search offers to configure the analyzers.
Default analyzer and analyzer by class-
The default analyzer class used to index tokenized fields is configurable through the hibernate.search.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.
You can also define the analyzer class per entity, property and even per @Field (useful when multiple fields are indexed from a single property).
@Entity
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {
    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field
    private String name;

    @Field
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(analyzer = @Analyzer(impl = FieldAnalyzer.class)
    private String body;

    ...
}
In this example, EntityAnalyzer is used to index all tokenized properties (eg. name), except summary and body which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.
Named analyzers-
Analyzers can become quite complex to deal with. For this reason introduces Hibernate Search the notion of analyzer definitions. An analyzer definition can be reused by many @Analyzer declarations and is composed of:
  • a name: the unique string used to refer to the definition
  • a list of char filters: each char filter is responsible to pre-process input characters before the tokenization. Char filters can add, change or remove characters; one common usage is for characters normalization
  • a tokenizer: responsible for tokenizing the input stream into individual words
  • a list of filters: each filter is responsible to remove, modify or sometimes even add words into the stream provided by the tokenizer
This separation of tasks - a list of char filters, and a tokenizer followed by a list of filters - allows for easy reuse of each individual component and let you build your customized analyzer in a very flexible way (just like Lego). Generally speaking the char filters do some pre-processing in the character input, then the Tokenizer starts the tokenizing process by turning the character input into tokens which are then further processed by the TokenFilters. Hibernate Search supports this infrastructure by utilizing the Solr analyzer framework.
@Entity
@Indexed
@AnalyzerDef(name = "customanalyzer",tokenizer =@TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
 @Parameter(name = "language", value = "English")
  })
})
public class Book {
 ....
}




<<Previous <<   || Index ||   >>Next >>








No comments:

Post a Comment