
    Namespace Lucene.Net.Facet.Taxonomy.WriterCache

    Improves indexing time by caching a map from each CategoryPath to its ordinal.

    Classes

    Cl2oTaxonomyWriterCache

    ITaxonomyWriterCache using CompactLabelToOrdinal. Although it is called a cache, it keeps all the mappings from category to ordinal in memory, relying on CompactLabelToOrdinal being an efficient mapping for this purpose.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    CollisionMap

    HashMap to store colliding labels. See CompactLabelToOrdinal for details.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    CompactLabelToOrdinal

    This is a very efficient LabelToOrdinal implementation that uses a CharBlockArray to store all labels and a configurable number of HashArrays to reference the labels.

    Since the HashArrays don't handle collisions, a CollisionMap is used to store the colliding labels.

    This data structure grows by adding a new HashArray whenever the number of collisions in the CollisionMap exceeds loadFactor * GetMaxOrdinal(). Growing also includes reinserting all colliding labels into the HashArrays to possibly reduce the number of collisions.

    To set the loadFactor, see CompactLabelToOrdinal(Int32, Single, Int32).
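
    The hash-array-plus-collision-map layout and the growth rule can be illustrated with a simplified sketch. It is written in Java, the language of upstream Lucene, and it is not the Lucene.NET implementation. Several parts are simplified. Labels are stored directly in the slot arrays instead of in a CharBlockArray. A java.util.HashMap stands in for CollisionMap. Every HashArray ever added is kept, whereas the real class uses a configurable number. The names CompactLabelMapSketch, addLabel and getOrdinal are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical sketch of the CompactLabelToOrdinal idea. */
class CompactLabelMapSketch {
    /** A table with no probing: each slot holds at most one label. */
    private static final class HashArray {
        final String[] labels;
        final int[] ordinals;

        HashArray(int capacity) {
            labels = new String[capacity];
            ordinals = new int[capacity];
        }

        int indexFor(int hash) { return hash & (labels.length - 1); } // capacity is a power of two
    }

    private final List<HashArray> hashArrays = new ArrayList<>(); // largest first
    private Map<String, Integer> collisionMap = new HashMap<>();  // stands in for CollisionMap
    private final float loadFactor;
    private int nextOrdinal = 0;

    CompactLabelMapSketch(int initialCapacity, float loadFactor) {
        if (Integer.bitCount(initialCapacity) != 1) {
            throw new IllegalArgumentException("initialCapacity must be a power of two");
        }
        this.loadFactor = loadFactor;
        hashArrays.add(new HashArray(initialCapacity));
    }

    /** Number of ordinals assigned so far. */
    int getMaxOrdinal() { return nextOrdinal; }

    /** Returns the label's ordinal, or -1 if the label is unknown. */
    int getOrdinal(String label) {
        int hash = label.hashCode();
        for (HashArray a : hashArrays) {
            int i = a.indexFor(hash);
            if (label.equals(a.labels[i])) return a.ordinals[i];
        }
        Integer ord = collisionMap.get(label);
        return ord == null ? -1 : ord;
    }

    /** Returns the existing ordinal, or assigns the next one to a new label. */
    int addLabel(String label) {
        int existing = getOrdinal(label);
        if (existing >= 0) return existing;
        int ordinal = nextOrdinal++;
        if (!tryPlace(label, ordinal)) {
            collisionMap.put(label, ordinal);
            // Growth rule: grow once collisions exceed loadFactor * GetMaxOrdinal().
            if (collisionMap.size() > loadFactor * getMaxOrdinal()) grow();
        }
        return ordinal;
    }

    /** Puts the label into the first HashArray whose slot for it is empty. */
    private boolean tryPlace(String label, int ordinal) {
        int hash = label.hashCode();
        for (HashArray a : hashArrays) {
            int i = a.indexFor(hash);
            if (a.labels[i] == null) {
                a.labels[i] = label;
                a.ordinals[i] = ordinal;
                return true;
            }
        }
        return false;
    }

    /** Adds a HashArray twice as large as the current largest, then reinserts colliding labels. */
    private void grow() {
        hashArrays.add(0, new HashArray(hashArrays.get(0).labels.length * 2));
        Map<String, Integer> old = collisionMap;
        collisionMap = new HashMap<>();
        for (Map.Entry<String, Integer> e : old.entrySet()) {
            if (!tryPlace(e.getKey(), e.getValue())) collisionMap.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        CompactLabelMapSketch m = new CompactLabelMapSketch(4, 0.15f);
        for (int i = 0; i < 100; i++) m.addLabel("label/" + i);
        System.out.println(m.getOrdinal("label/42")); // 42
        System.out.println(m.getOrdinal("missing"));  // -1
    }
}
```

    Each slot holds at most one label. A label whose slot is already taken in every array therefore ends up in the collision map. Adding a larger array on growth gives those labels a fresh chance at an empty slot.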

    This data structure has a much lower memory footprint (~30%) than a Java HashMap<String, Integer>. It also uses only a small fraction of the objects a HashMap would use, thus limiting the GC overhead. Ingestion was also ~50% faster than with a HashMap for 3M unique labels.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    LabelToOrdinal

    Abstract class for storing Label->Ordinal mappings in a taxonomy.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    LruTaxonomyWriterCache

    LRU ITaxonomyWriterCache - good choice for huge taxonomies.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    NameHashInt32CacheLRU

    An LRU cache mapping from name to int, used to cache the ordinals of category paths. It uses the hash of the path as the key instead of the path itself. This way the cache takes less RAM, but its correctness depends on the assumption that there are no hash collisions.

    NOTE: This was NameHashIntCacheLRU in Lucene

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    NameInt32CacheLRU

    An LRU cache mapping from name to int, used to cache the ordinals of category paths.

    NOTE: This was NameIntCacheLRU in Lucene

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk
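
    The difference between the two LRU caches above can be sketched in Java using java.util.LinkedHashMap in access order. This is a hypothetical illustration, not the Lucene.NET code. NameLruSketch and its factory methods are invented names, and String.hashCode() stands in for whatever hash the real hash-keyed cache uses. The sketch shows why the hashed variant saves memory: it stores only an int key. It also shows why that variant's correctness depends on there being no collisions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU cache from name to ordinal. The key function decides what is
 * stored: the full name (as NameInt32CacheLRU does) or only its hash (as
 * NameHashInt32CacheLRU does).
 */
class NameLruSketch<K> {
    private final Function<String, K> keyOf;
    private final LinkedHashMap<K, Integer> map;

    NameLruSketch(int maxSize, Function<String, K> keyOf) {
        this.keyOf = keyOf;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Integer> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    static NameLruSketch<String> byName(int maxSize) {
        return new NameLruSketch<String>(maxSize, Function.<String>identity());
    }

    static NameLruSketch<Integer> byHash(int maxSize) {
        return new NameLruSketch<Integer>(maxSize, String::hashCode);
    }

    void put(String name, int ordinal) { map.put(keyOf.apply(name), ordinal); }

    /** Returns the cached ordinal, or -1 on a miss. */
    int get(String name) {
        Integer ord = map.get(keyOf.apply(name));
        return ord == null ? -1 : ord;
    }
}

class NameLruDemo {
    public static void main(String[] args) {
        NameLruSketch<String> exact = NameLruSketch.byName(100);
        NameLruSketch<Integer> hashed = NameLruSketch.byHash(100);
        exact.put("Aa", 1);
        hashed.put("Aa", 1);
        // "Aa" and "BB" have the same String.hashCode() (2112).
        System.out.println(exact.get("BB"));  // -1: correct miss
        System.out.println(hashed.get("BB")); // 1: wrong ordinal caused by the collision
    }
}
```

    This is the trade-off described under LruTaxonomyWriterCache.LRUType. A string-keyed cache never confuses two names, while a hash-keyed one can return another name's ordinal without any error.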

    Interfaces

    ITaxonomyWriterCache

    ITaxonomyWriterCache is a relatively simple interface for a cache of category->ordinal mappings, used in ITaxonomyWriter implementations (such as DirectoryTaxonomyWriter).

    It basically has a Put(FacetLabel, Int32) method for adding a mapping, and Get(FacetLabel) for looking up a mapping in the cache. The cache does not guarantee to hold everything that has been put into it, and might in fact selectively delete some of the mappings (e.g., the ones least recently used). This means that if Get(FacetLabel) returns a negative response, it does not necessarily mean that the category doesn't exist, just that it is not in the cache. The caller can only infer that the category doesn't exist if it knows the cache to be complete (because all the categories were loaded into the cache, and since then no Put(FacetLabel, Int32) returned true).

    However, if the cache does delete mappings, it should clear out large parts of itself at once, because the caller will typically need to work hard to recover from every cache cleanup (see the return value of Put(FacetLabel, Int32)).

    NOTE: the cache may be accessed concurrently by multiple threads, therefore cache implementations should take this into consideration.
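
    The Get/Put contract and the caller's reasoning about completeness can be sketched as follows. This is a hypothetical Java illustration, not the Lucene.NET API. WriterCacheSketch, CompleteCache, ClearingCache and TaxonomyWriterSketch are invented names. Labels are plain strings instead of FacetLabel, and a HashMap stands in for the on-disk taxonomy. Put is assumed to return true exactly when it cleared part of the cache. The cache methods are synchronized because, as noted above, the cache may be accessed by several threads.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ITaxonomyWriterCache; labels are plain strings. */
interface WriterCacheSketch {
    /** Returns the cached ordinal, or a negative value on a miss. */
    int get(String label);

    /** Adds a mapping; returns true if part of the cache was cleared to make room. */
    boolean put(String label, int ordinal);
}

/** Keeps every mapping (like a complete, Cl2o-style cache), so put never returns true. */
class CompleteCache implements WriterCacheSketch {
    private final Map<String, Integer> map = new HashMap<>();

    public synchronized int get(String label) { return map.getOrDefault(label, -1); }

    public synchronized boolean put(String label, int ordinal) {
        map.put(label, ordinal);
        return false;
    }
}

/** Bounded cache that, when full, clears all entries at once rather than one at a time. */
class ClearingCache implements WriterCacheSketch {
    private final Map<String, Integer> map = new HashMap<>();
    private final int maxSize;

    ClearingCache(int maxSize) { this.maxSize = maxSize; }

    public synchronized int get(String label) { return map.getOrDefault(label, -1); }

    public synchronized boolean put(String label, int ordinal) {
        boolean cleared = false;
        if (map.size() >= maxSize && !map.containsKey(label)) {
            map.clear(); // one large cleanup instead of many small ones
            cleared = true;
        }
        map.put(label, ordinal);
        return cleared;
    }
}

/** Hypothetical caller that trusts a cache miss only while the cache is complete. */
class TaxonomyWriterSketch {
    private final WriterCacheSketch cache;
    private final Map<String, Integer> index = new HashMap<>(); // stands in for the on-disk taxonomy
    private boolean cacheComplete = true; // taxonomy and cache both start empty
    int diskLookups = 0;

    TaxonomyWriterSketch(WriterCacheSketch cache) { this.cache = cache; }

    int addCategory(String label) {
        int ord = cache.get(label);
        if (ord >= 0) return ord;
        if (!cacheComplete) {
            // Incomplete cache: a miss does not prove the label is new.
            diskLookups++;
            Integer onDisk = index.get(label);
            if (onDisk != null) {
                putInCache(label, onDisk);
                return onDisk;
            }
        }
        int newOrd = index.size(); // ordinals are assigned densely from 0
        index.put(label, newOrd);
        putInCache(label, newOrd);
        return newOrd;
    }

    private void putInCache(String label, int ordinal) {
        if (cache.put(label, ordinal)) cacheComplete = false; // some mappings are gone
    }

    public static void main(String[] args) {
        TaxonomyWriterSketch w = new TaxonomyWriterSketch(new ClearingCache(4));
        for (int i = 0; i < 10; i++) w.addCategory("cat/" + i);
        System.out.println(w.addCategory("cat/0")); // 0, found via the fallback index
        System.out.println(w.diskLookups);
    }
}
```

    With CompleteCache, every miss can be trusted and the fallback is never consulted. With ClearingCache, the first true from put switches the caller to checking the authoritative store on every miss.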

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    Enums

    LruTaxonomyWriterCache.LRUType

    Determines the cache type. For guaranteed correctness, that is, without relying on the hash function producing no collisions, LRU_STRING should be used.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)