Namespace Lucene.Net.Facet.Taxonomy.Directory
Taxonomy index implementation built on top of a Directory.
Classes
DirectoryTaxonomyReader
A TaxonomyReader which retrieves stored taxonomy information from a Directory.
Reading from the on-disk index on every method call is too slow, so this implementation employs caching: Some methods cache recent requests and their results, while other methods prefetch all the data into memory and then provide answers directly from in-memory tables. See the documentation of individual methods for comments on their performance.
DirectoryTaxonomyWriter
ITaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the information stored permanently in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen categories, or of all categories, so that we do not need to go back to disk on every category addition to find out which ordinal the category already has, if any. An ITaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See OpenIndexWriter(Directory, IndexWriterConfig).
DirectoryTaxonomyWriter.DiskOrdinalMap
A DirectoryTaxonomyWriter.IOrdinalMap maintained on the file system.
DirectoryTaxonomyWriter.MemoryOrdinalMap
A DirectoryTaxonomyWriter.IOrdinalMap maintained in memory.
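As a rough illustration of how DirectoryTaxonomyWriter and DirectoryTaxonomyReader fit together, the sketch below adds one category and reads it back. It assumes the Lucene.NET 4.8 API (RAMDirectory, FacetLabel, AddCategory, Commit, GetOrdinal, GetPath); the class name and the category values are made up for the example, and member names may differ in other versions.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

// Hypothetical example class; assumes the Lucene.NET 4.8 facet API.
public static class TaxonomyRoundTripSketch
{
    public static void Main()
    {
        // An in-memory Directory keeps the sketch free of file-system setup.
        using (var taxoDir = new RAMDirectory())
        {
            using (var writer = new DirectoryTaxonomyWriter(taxoDir))
            {
                // Adding a category returns its ordinal; the writer's cache
                // helps avoid a disk lookup when the same category recurs.
                int added = writer.AddCategory(new FacetLabel("Author", "Mark Twain"));
                Console.WriteLine("ordinal assigned by writer: " + added);

                // Commit so that a reader opened afterwards can see the category.
                writer.Commit();
            }

            using (var reader = new DirectoryTaxonomyReader(taxoDir))
            {
                // Look the category up by label, then map the ordinal back to its label.
                int ordinal = reader.GetOrdinal(new FacetLabel("Author", "Mark Twain"));
                FacetLabel path = reader.GetPath(ordinal);
                Console.WriteLine("ordinal seen by reader: " + ordinal + ", path: " + path);
            }
        }
    }
}
```

The reader is opened only after the writer has committed, because a reader sees the taxonomy as of the last commit.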
Interfaces
DirectoryTaxonomyWriter.IOrdinalMap
Mapping from old ordinals to new ordinals, used when merging indexes with separate taxonomies.
AddTaxonomy(Directory, DirectoryTaxonomyWriter.IOrdinalMap) merges another taxonomy into the given taxonomy (this). A DirectoryTaxonomyWriter.IOrdinalMap is filled for each added taxonomy, containing the new ordinal (in the merged taxonomy) of each category in the old taxonomy; individual entries are recorded through AddMapping(Int32, Int32).
There exist two implementations of DirectoryTaxonomyWriter.IOrdinalMap: DirectoryTaxonomyWriter.MemoryOrdinalMap and DirectoryTaxonomyWriter.DiskOrdinalMap. As their names suggest, the former keeps the map in memory and the latter in a temporary disk file. Because these maps will later be needed one by one (to remap the counting lists), not all at the same time, it is recommended to put the first taxonomy's map in memory, and all the rest on disk (later to be automatically read into memory one by one, when needed).
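To make the remapping step concrete, here is a minimal, library-independent sketch of applying one old-to-new ordinal map at a time. It assumes each map is available as an int array indexed by old ordinal, and that a "counting list" is a per-ordinal count array; both are assumptions made for illustration. OrdinalRemapSketch and AddRemappedCounts are hypothetical names, not part of Lucene.NET.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET: it only illustrates why
// the ordinal maps are needed one at a time rather than all at once.
public static class OrdinalRemapSketch
{
    // oldToNew[i] is the ordinal, in the merged taxonomy, of the category
    // that had ordinal i in the source taxonomy.
    public static void AddRemappedCounts(int[] oldToNew, int[] sourceCounts, int[] mergedCounts)
    {
        if (oldToNew.Length != sourceCounts.Length)
        {
            throw new ArgumentException("The map and the counts must cover the same ordinals.");
        }

        for (int oldOrdinal = 0; oldOrdinal < sourceCounts.Length; oldOrdinal++)
        {
            // Each source count is credited to the category's new ordinal.
            mergedCounts[oldToNew[oldOrdinal]] += sourceCounts[oldOrdinal];
        }
    }

    public static void Main()
    {
        int[] merged = new int[4];

        // First taxonomy: its ordinals are unchanged in the merged taxonomy.
        AddRemappedCounts(new[] { 0, 1, 2 }, new[] { 0, 5, 3 }, merged);

        // Second taxonomy: ordinal 1 became 3 and ordinal 2 became 1.
        AddRemappedCounts(new[] { 0, 3, 1 }, new[] { 0, 2, 4 }, merged);

        Console.WriteLine(string.Join(", ", merged)); // prints "0, 9, 3, 2"
    }
}
```

Only one map is in use during each call, which is why holding just one of them in memory at a time is enough.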