    Namespace Lucene.Net.Facet.Taxonomy

    Taxonomy of Categories

    Facets are defined using a hierarchy of categories, known as a _Taxonomy_.
    For example, the taxonomy of a book store application might have the following structure:
    
    • Author
      • Mark Twain
      • J. K. Rowling
    • Date
      • 2010
        • March
        • April
      • 2009

    The taxonomy translates category paths into integer identifiers (often termed ordinals) and vice versa. The category Author/Mark Twain adds two nodes to the taxonomy: Author and Author/Mark Twain, each of which is assigned a different ordinal. The taxonomy maintains the invariant that a node's ordinal is always smaller than the ordinals of all its children.
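
    The path-to-ordinal mapping described above can be pictured with a small model. The following is only an illustrative sketch in Python, not the Lucene.NET API: the class ToyTaxonomy and its members are hypothetical names, and reserving ordinal 0 for the root follows the TaxonomyReader description further down this page.

```python
class ToyTaxonomy:
    """Illustrative model of a taxonomy; not the Lucene.NET implementation."""

    def __init__(self):
        # Ordinal 0 is reserved for the root, modelled here as the empty path.
        self.path_to_ordinal = {(): 0}
        self.ordinal_to_path = [()]
        self.parents = [-1]  # the root has no parent

    def add_category(self, *components):
        """Add a path plus any missing ancestors; return the path's ordinal."""
        path = tuple(components)
        if path in self.path_to_ordinal:
            return self.path_to_ordinal[path]
        # Register the parent first so that it receives the smaller ordinal.
        parent = self.add_category(*path[:-1])
        ordinal = len(self.ordinal_to_path)
        self.path_to_ordinal[path] = ordinal
        self.ordinal_to_path.append(path)
        self.parents.append(parent)
        return ordinal


taxonomy = ToyTaxonomy()
twain = taxonomy.add_category("Author", "Mark Twain")  # adds two nodes
author = taxonomy.add_category("Author")               # already present
march = taxonomy.add_category("Date", "2010", "March")  # adds three nodes
```

    Because a parent is always registered before its child, the invariant that a node's ordinal is smaller than those of its children holds by construction in this sketch.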

    Classes

    AssociationFacetField

    Add an instance of this to your Document to add a facet label associated with an arbitrary byte[]. This requires a custom Facets implementation at search time; to use existing Facets implementations, see Int32AssociationFacetField and SingleAssociationFacetField.

    @lucene.experimental

    CachedOrdinalsReader

    A per-segment cache of documents' facet ordinals. Every CachedOrdinalsReader.CachedOrds holds the ordinals in a raw int[], and therefore consumes as much RAM as the total number of ordinals found in the segment, but saves the CPU cost of decoding ordinals during facet counting.

    NOTE: every CachedOrdinalsReader.CachedOrds is limited to 2.1B total ordinals. If that is a limitation for you then consider limiting the segment size to fewer documents, or use an alternative cache which pages through the category ordinals.

    NOTE: when using this cache, it is advised to use a DocValuesFormat that does not cache the data in memory, at least for the category list fields; otherwise you will be caching the same data twice.

    NOTE: create one instance of this and re-use it for all facet implementations (the cache is per-instance, not static).

    CachedOrdinalsReader.CachedOrds

    Holds the cached ordinals in two parallel int[] arrays.
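
    One way to hold per-document ordinals in two flat arrays is an offsets array plus a concatenated ordinals array. The sketch below assumes that layout; the function names are hypothetical, and it is a Python model for illustration rather than the actual CachedOrds code.

```python
def build_cached_ords(per_doc_ordinals):
    """Flatten per-document ordinal lists into two flat lists.

    offsets[doc] .. offsets[doc + 1] delimits the slice of `ordinals`
    that belongs to document `doc` (layout assumed for this sketch).
    """
    offsets = [0]
    ordinals = []
    for doc_ords in per_doc_ordinals:
        ordinals.extend(doc_ords)
        offsets.append(len(ordinals))
    return offsets, ordinals


def ordinals_for_doc(offsets, ordinals, doc_id):
    """Return the ordinals of one document without any decoding step."""
    return ordinals[offsets[doc_id]:offsets[doc_id + 1]]


# Three documents in one segment; the second has no facet ordinals.
offsets, ordinals = build_cached_ords([[2, 5], [], [2, 7, 9]])
```

    The memory cost is one integer per stored ordinal plus one offset per document, which matches the trade-off described for CachedOrdinalsReader: more RAM in exchange for skipping the decoding work.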

    CategoryPath

    Holds a sequence of string components, specifying the hierarchical name of a category.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    DocValuesOrdinalsReader

    Decodes ordinals previously indexed into a BinaryDocValues field.

    FacetLabel

    Holds a sequence of string components, specifying the hierarchical name of a category.

    This is a Lucene.NET INTERNAL API, use at your own risk

    FastTaxonomyFacetCounts

    Computes facet counts, assuming the default encoding into DocValues was used.

    @lucene.experimental

    Int32AssociationFacetField

    Add an instance of this to your Document to add a facet label associated with an int value. Use TaxonomyFacetSumInt32Associations to aggregate int values per facet label at search time.

    NOTE: This was IntAssociationFacetField in Lucene

    @lucene.experimental

    Int32TaxonomyFacets

    Base class for all taxonomy-based facets that aggregate to a per-ords int[].

    NOTE: This was IntTaxonomyFacets in Lucene

    LRUHashMap<TKey, TValue>

    LRUHashMap<TKey, TValue> is similar to Java's HashMap, but has a bounded Limit. Once it reaches that Limit, each time a new element is added, the least recently used (LRU) entry is removed.

    Unlike the Java Lucene implementation, this one is thread safe because it is backed by the LurchTable<TKey, TValue>. Note that every time an element is read from LRUHashMap<TKey, TValue>, a write operation also takes place to update the element's last access time. This is because the LRU order must be tracked to determine which element to evict when the Limit is exceeded.

    @lucene.experimental
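
    The eviction behaviour described for LRUHashMap<TKey, TValue> can be sketched with Python's OrderedDict. This is a single-threaded illustration under that assumption, with the hypothetical name ToyLruMap; it does not reproduce the thread safety of the .NET class.

```python
from collections import OrderedDict


class ToyLruMap:
    """Bounded map that evicts the least recently used entry (sketch only)."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be positive")
        self.limit = limit
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        # A read is also a write: the entry moves to the most-recent end.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.limit:
            self._entries.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)


cache = ToyLruMap(limit=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # exceeds the limit, so "b" is evicted
```

    The call to get() mutating the order is the reason reads are not free in such a structure: a concurrent version has to synchronize reads as well as writes.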

    OrdinalsReader

    Provides per-document ordinals.

    OrdinalsReader.OrdinalsSegmentReader

    Returns ordinals for documents in one segment.

    ParallelTaxonomyArrays

    Returns 3 arrays for traversing the taxonomy:

    • Parents: Parents[i] denotes the parent of category ordinal i.
    • Children: Children[i] denotes a child of category ordinal i.
    • Siblings: Siblings[i] denotes the sibling of category ordinal i.

    To traverse the taxonomy tree, you typically start with Children[0] (ordinal 0 is reserved for ROOT). Then, depending on whether you want a depth-first or breadth-first traversal, you follow Children[Children[0]] or Siblings[Children[0]], respectively, and so forth.

    NOTE: do not modify the values of the arrays, since the arrays are shared with other threads.

    @lucene.experimental
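
    The traversal above can be sketched as follows. The sketch makes assumptions that this page does not state: Children[i] is taken to be the most recently added child of i, Siblings[i] the next older sibling, and -1 marks "none". All names are hypothetical and the code is a Python model, not the ParallelTaxonomyArrays implementation.

```python
INVALID = -1  # "no such ordinal" marker (an assumption of this sketch)


def compute_children_and_siblings(parents):
    """Derive children/siblings lists from parents, where parents[i] < i."""
    children = [INVALID] * len(parents)
    siblings = [INVALID] * len(parents)
    for ordinal in range(1, len(parents)):  # ordinal 0 is the root
        parent = parents[ordinal]
        # The previous youngest child becomes this ordinal's older sibling.
        siblings[ordinal] = children[parent]
        children[parent] = ordinal
    return children, siblings


def depth_first(children, siblings, ordinal=0):
    """Yield every ordinal below `ordinal` in depth-first order."""
    child = children[ordinal]
    while child != INVALID:
        yield child
        yield from depth_first(children, siblings, child)
        child = siblings[child]


# Ordinals: 0 root, 1 Author, 2 Author/Mark Twain, 3 Author/J. K. Rowling,
#           4 Date, 5 Date/2010
parents = [INVALID, 0, 1, 1, 0, 4]
children, siblings = compute_children_and_siblings(parents)
visited = list(depth_first(children, siblings))
```

    Under these assumptions, the children of a node are visited from newest to oldest, because each step along Siblings moves to an earlier-added category.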

    PrintTaxonomyStats

    Prints how many ords are under each dimension.

    SearcherTaxonomyManager

    Manages near-real-time reopen of both an IndexSearcher and a TaxonomyReader.

    NOTE: If you call ReplaceTaxonomy(Store.Directory) then you must open a new SearcherTaxonomyManager afterwards.

    SearcherTaxonomyManager.SearcherAndTaxonomy

    Holds a matched pair of IndexSearcher and TaxonomyReader.

    SingleAssociationFacetField

    Add an instance of this to your Document to add a facet label associated with a float value. Use TaxonomyFacetSumSingleAssociations to aggregate values per facet label at search time.

    NOTE: This was FloatAssociationFacetField in Lucene

    @lucene.experimental

    SingleTaxonomyFacets

    Base class for all taxonomy-based facets that aggregate to a per-ords float[].

    NOTE: This was FloatTaxonomyFacets in Lucene

    TaxonomyFacetCounts

    Reads from any OrdinalsReader; use FastTaxonomyFacetCounts if you are using the default encoding from BinaryDocValues.

    @lucene.experimental

    TaxonomyFacets

    Base class for all taxonomy-based facets implementations.

    TaxonomyFacetSumInt32Associations

    Aggregates sum of values previously indexed with Int32AssociationFacetField, assuming the default encoding.

    NOTE: This was TaxonomyFacetSumIntAssociations in Lucene

    @lucene.experimental

    TaxonomyFacetSumSingleAssociations

    Aggregates sum of values previously indexed with SingleAssociationFacetField, assuming the default encoding.

    NOTE: This was TaxonomyFacetSumFloatAssociations in Lucene

    @lucene.experimental

    TaxonomyFacetSumValueSource

    Aggregates sum of values from DoubleVal(Int32) and DoubleVal(Int32, Double[]), for each facet label.

    @lucene.experimental

    TaxonomyFacetSumValueSource.ScoreValueSource

    ValueSource that returns the score for each hit; use this to aggregate the sum of all hit scores for each facet label.

    TaxonomyReader

    TaxonomyReader is the read-only interface with which the faceted-search library uses the taxonomy during search time.

    A TaxonomyReader holds a list of categories. Each category has a serial number which we call an "ordinal", and a hierarchical "path" name:

    • The ordinal is an integer that starts at 0 for the first category (which is always the root category) and grows contiguously as more categories are added. Once a category is added, it can never be deleted.
    • The path is a CategoryPath object specifying the category's position in the hierarchy.

    Notes about concurrent access to the taxonomy:

    An implementation must allow multiple readers to be active concurrently with a single writer. Readers follow so-called "point in time" semantics, i.e., a TaxonomyReader object will only see taxonomy entries which were available at the time it was created. What the writer writes is only available to (new) readers after the writer's Commit() is called.

    In faceted search, two separate indices are used: the main Lucene index and the taxonomy. Because the main index refers to the categories listed in the taxonomy, it is important to open the taxonomy after opening the main index, and likewise to Reopen() the taxonomy after reopening the main index.

    This order is important; otherwise the main index could refer to a category which is not yet visible in the old snapshot of the taxonomy. Note that it is fine for the taxonomy to be opened after the main index, even a long time after. The reason is that once a category is added to the taxonomy, it can never be changed or deleted, so there is no danger of a "too new" taxonomy being inconsistent with an older index.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
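
    The "point in time" behaviour can be illustrated with a toy writer and reader. Everything here is a hypothetical Python model (ToyTaxonomyWriter, ToyTaxonomyReader, and the "<root>" placeholder are invented for the sketch); it only mirrors the rule that a reader sees what was committed when it was created.

```python
class ToyTaxonomyReader:
    """Sees only the categories committed before it was opened (sketch)."""

    def __init__(self, categories, size):
        self._categories = categories  # shared, append-only list
        self.size = size               # snapshot boundary

    def get_ordinal(self, path):
        """Return the ordinal, or -1 if the path is not visible here."""
        try:
            ordinal = self._categories.index(path)
        except ValueError:
            return -1
        return ordinal if ordinal < self.size else -1


class ToyTaxonomyWriter:
    def __init__(self):
        self._categories = ["<root>"]  # ordinal 0 is the root
        self._committed_size = 1

    def add_category(self, path):
        if path not in self._categories:
            self._categories.append(path)
        return self._categories.index(path)

    def commit(self):
        self._committed_size = len(self._categories)

    def open_reader(self):
        # Categories are never changed or deleted, so a snapshot is
        # fully described by a prefix length.
        return ToyTaxonomyReader(self._categories, self._committed_size)


writer = ToyTaxonomyWriter()
writer.add_category("Author")
writer.commit()
old_reader = writer.open_reader()

writer.add_category("Author/Mark Twain")  # written but not yet committed
early_reader = writer.open_reader()

writer.commit()
new_reader = writer.open_reader()
```

    Only new_reader sees Author/Mark Twain. The older readers keep their original view, which is why the taxonomy has to be reopened after the main index to pick up newly referenced categories.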

    TaxonomyReader.ChildrenIterator

    An iterator over a category's children.

    Interfaces

    ITaxonomyWriter

    ITaxonomyWriter is the interface which the faceted-search library uses to dynamically build the taxonomy at indexing time.

    Notes about concurrent access to the taxonomy:

    An implementation must allow multiple readers and a single writer to be active concurrently. Readers follow so-called "point in time" semantics, i.e., a reader object will only see taxonomy entries which were available at the time it was created. What the writer writes is only available to (new) readers after the writer's Commit() is called.

    Faceted search keeps two indices: Lucene's main index and this taxonomy index. When one or more readers are active concurrently with the writer, care must be taken to avoid an inconsistency between the state of these two indices: when writing to the indices, the taxonomy must always be committed to disk before the main index, because the main index refers to categories listed in the taxonomy. Such control is best achieved by turning off the main index's "autocommit" feature and explicitly calling Commit() on both indices (first the taxonomy, then the main index). In old versions of Lucene (2.2 or earlier), where autocommit could not be turned off, a more complicated solution is needed, e.g., some sort of (possibly inter-process) locking to ensure that a reader is opened only right after both indices have been flushed (and before anything else is written to them).

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
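
    The commit-order rule can be checked with a small model of two append-only stores. This is a hedged Python sketch with hypothetical names (ToyIndex, is_consistent); documents are modelled as lists of category ordinals, and nothing here is the ITaxonomyWriter API.

```python
class ToyIndex:
    """An append-only store with explicit commits (illustrative only)."""

    def __init__(self):
        self.items = []
        self.committed = 0  # number of items visible to new readers

    def add(self, item):
        self.items.append(item)
        return len(self.items) - 1

    def commit(self):
        self.committed = len(self.items)

    def snapshot(self):
        return self.items[:self.committed]


def is_consistent(taxonomy, main_index):
    """True if every ordinal used by a visible document is a visible category."""
    visible_categories = len(taxonomy.snapshot())
    return all(ordinal < visible_categories
               for doc in main_index.snapshot()
               for ordinal in doc)


# Recommended order: commit the taxonomy first, then the main index.
taxonomy, main_index = ToyIndex(), ToyIndex()
taxonomy.add("<root>")
author = taxonomy.add("Author")
main_index.add([author])  # a document tagged with "Author"
taxonomy.commit()
ok_between_commits = is_consistent(taxonomy, main_index)
main_index.commit()
ok_after_commits = is_consistent(taxonomy, main_index)

# Reverse order: a reader opened between the two commits sees a document
# whose category is not yet visible in the taxonomy.
bad_taxonomy, bad_main = ToyIndex(), ToyIndex()
bad_taxonomy.add("<root>")
bad_main.add([bad_taxonomy.add("Author")])
bad_main.commit()
broken_between_commits = not is_consistent(bad_taxonomy, bad_main)
```

    With the taxonomy committed first, a reader opened at any moment sees either no new documents or documents whose categories are already visible, so the check never fails.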
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)