    Class DirectoryTaxonomyWriter

    ITaxonomyWriter which uses a Lucene.Net.Store.Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.

    In addition to the permanently stored information in the Lucene.Net.Store.Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk on every category addition to find out which ordinal the category already has, if any. An ITaxonomyWriterCache object determines the specific caching algorithm used.

    Extending classes that need to control the Lucene.Net.Index.IndexWriter instance that is used should see the DirectoryTaxonomyIndexWriterFactory class.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    DirectoryTaxonomyWriter
    Implements
    ITaxonomyWriter
    IDisposable
    ITwoPhaseCommit
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Facet.Taxonomy.Directory
    Assembly: Lucene.Net.Facet.dll
    Syntax
    public class DirectoryTaxonomyWriter : ITaxonomyWriter, IDisposable, ITwoPhaseCommit

    Constructors

    DirectoryTaxonomyWriter(DirectoryTaxonomyIndexWriterFactory, Directory)

    Create this with Lucene.Net.Index.OpenMode.CREATE_OR_APPEND and CreateDefaultTaxonomyWriterCache().

    Declaration
    public DirectoryTaxonomyWriter(DirectoryTaxonomyIndexWriterFactory indexWriterFactory, Directory directory)
    Parameters
    Type Name Description
    DirectoryTaxonomyIndexWriterFactory indexWriterFactory

    The DirectoryTaxonomyIndexWriterFactory to use to create the Lucene.Net.Index.IndexWriter.

    Directory directory

    The Lucene.Net.Store.Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).

    DirectoryTaxonomyWriter(DirectoryTaxonomyIndexWriterFactory, Directory, OpenMode, ITaxonomyWriterCache)

    Construct a Taxonomy writer.

    Declaration
    public DirectoryTaxonomyWriter(DirectoryTaxonomyIndexWriterFactory indexWriterFactory, Directory directory, OpenMode openMode, ITaxonomyWriterCache cache)
    Parameters
    Type Name Description
    DirectoryTaxonomyIndexWriterFactory indexWriterFactory

    A DirectoryTaxonomyIndexWriterFactory implementation that can be used to customize the Lucene.Net.Index.IndexWriter configuration and writer itself that's used to store the taxonomy index.

    Directory directory

    The Lucene.Net.Store.Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).

    OpenMode openMode

    Specifies how to open a taxonomy for writing: Lucene.Net.Index.OpenMode.APPEND means open an existing index for append (failing if the index does not yet exist). Lucene.Net.Index.OpenMode.CREATE means create a new index (first deleting the old one if it already existed). Lucene.Net.Index.OpenMode.CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index.

    ITaxonomyWriterCache cache

    An ITaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and Cl2oTaxonomyWriterCache. If null or missing, CreateDefaultTaxonomyWriterCache() is used.

    Exceptions
    Type Condition
    CorruptIndexException

    if the taxonomy is corrupted.

    LockObtainFailedException

    if the taxonomy is locked by another writer. If it is known that no other concurrent writer is active, the lock might have been left around by an old dead process, and should be removed using Unlock(Directory).

    IOException

    if another error occurred.

    ArgumentNullException

    if indexWriterFactory is null.

    DirectoryTaxonomyWriter(Directory)

    Create this with Lucene.Net.Index.OpenMode.CREATE_OR_APPEND.

    Declaration
    public DirectoryTaxonomyWriter(Directory directory)
    Parameters
    Type Name Description
    Directory directory

    The Lucene.Net.Store.Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).

    DirectoryTaxonomyWriter(Directory, OpenMode)

    Creates a new instance with a default cache as defined by CreateDefaultTaxonomyWriterCache().

    Declaration
    public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode)
    Parameters
    Type Name Description
    Directory directory
    OpenMode openMode

    DirectoryTaxonomyWriter(Directory, OpenMode, ITaxonomyWriterCache)

    Construct a Taxonomy writer.

    Declaration
    public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode, ITaxonomyWriterCache cache)
    Parameters
    Type Name Description
    Directory directory

    The Lucene.Net.Store.Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).

    OpenMode openMode

    Specifies how to open a taxonomy for writing: Lucene.Net.Index.OpenMode.APPEND means open an existing index for append (failing if the index does not yet exist). Lucene.Net.Index.OpenMode.CREATE means create a new index (first deleting the old one if it already existed). Lucene.Net.Index.OpenMode.CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index.

    ITaxonomyWriterCache cache

    An ITaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and Cl2oTaxonomyWriterCache. If null or missing, CreateDefaultTaxonomyWriterCache() is used.

    Exceptions
    Type Condition
    CorruptIndexException

    if the taxonomy is corrupted.

    LockObtainFailedException

    if the taxonomy is locked by another writer. If it is known that no other concurrent writer is active, the lock might have been left around by an old dead process, and should be removed using Unlock(Directory).

    IOException

    if another error occurred.
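
The constructor overloads above differ mainly in the open mode and the cache they use. The following is a minimal sketch of two of them, assuming the Lucene.Net 4.8 packages are referenced; the RAMDirectory, the category labels, and the cache size of 4000 are illustrative choices, not recommendations.

```csharp
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Facet.Taxonomy.WriterCache;
using Lucene.Net.Index;
using Lucene.Net.Store;

// RAMDirectory keeps the sketch self-contained; a real application would
// more likely store the taxonomy in an on-disk directory.
using var taxoDir = new RAMDirectory();

// CREATE starts a fresh taxonomy, deleting any existing one in taxoDir.
// An LRU cache bounds memory use; 4000 entries is an arbitrary size.
using (var writer = new DirectoryTaxonomyWriter(
    taxoDir, OpenMode.CREATE, new LruTaxonomyWriterCache(4000)))
{
    writer.AddCategory(new FacetLabel("Author", "Bob"));
    writer.Commit();
}

// APPEND fails if the taxonomy does not exist yet. No cache is passed,
// so the one from CreateDefaultTaxonomyWriterCache() is used.
using (var writer = new DirectoryTaxonomyWriter(taxoDir, OpenMode.APPEND))
{
    writer.AddCategory(new FacetLabel("Author", "Lisa"));
} // Dispose() commits pending changes before closing the writer.
```

Passing the cache explicitly is mostly of interest for very large taxonomies, where the default of caching everything may use more memory than desired.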

    Fields

    INDEX_EPOCH

    Property name of user commit data that contains the index epoch. The epoch changes whenever the taxonomy is recreated (i.e. opened with Lucene.Net.Index.OpenMode.CREATE).

    Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.

    Declaration
    public const string INDEX_EPOCH = "index.epoch"
    Field Value
    Type Description
    string

    Properties

    CommitData

    Returns the commit user data map that was set by SetCommitData(IDictionary<string, string>).

    Declaration
    public virtual IDictionary<string, string> CommitData { get; }
    Property Value
    Type Description
    IDictionary<string, string>

    Count

    Count returns the number of categories in the taxonomy.

    Because categories are numbered consecutively starting with 0, it means the taxonomy contains ordinals 0 through Count-1.

    Note that the number returned by Count is often slightly higher than the number of categories inserted into the taxonomy. This is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).

    Declaration
    public virtual int Count { get; }
    Property Value
    Type Description
    int
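
As a small illustration of why Count can exceed the number of categories added explicitly, the sketch below adds a single two-component category to an empty taxonomy. It assumes Lucene.Net 4.8; the RAMDirectory and the "Author"/"Bob" label are illustrative.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

using var taxoDir = new RAMDirectory();
using var writer = new DirectoryTaxonomyWriter(taxoDir);

// One explicit insertion...
int bob = writer.AddCategory(new FacetLabel("Author", "Bob"));

// ...but the taxonomy also holds the root (ordinal 0) and the ancestor
// "Author", which was added automatically before "Author/Bob".
Console.WriteLine(bob);          // ordinal of "Author/Bob"
Console.WriteLine(writer.Count); // 3: root, "Author", "Author/Bob"
```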

    Directory

    Returns the Lucene.Net.Store.Directory of this taxonomy writer.

    Declaration
    public virtual Directory Directory { get; }
    Property Value
    Type Description
    Directory

    TaxonomyEpoch

    Expert: returns the current index epoch, if this is a near-real-time reader. Used by DirectoryTaxonomyReader to support NRT.

    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Declaration
    public long TaxonomyEpoch { get; }
    Property Value
    Type Description
    long

    Methods

    AddCategory(FacetLabel)

    NOTE to inheritors: This method can be called from the constructor to add the root category (e.g. if the index is empty). Therefore, if you override the AddCategory method, you should be aware that it will be called before your state is fully initialized.

    Declaration
    public virtual int AddCategory(FacetLabel categoryPath)
    Parameters
    Type Name Description
    FacetLabel categoryPath
    Returns
    Type Description
    int

    AddTaxonomy(Directory, IOrdinalMap)

    Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. Additionally, it fills the given DirectoryTaxonomyWriter.IOrdinalMap with a mapping from the original ordinal to the new ordinal.

    Declaration
    public virtual void AddTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.IOrdinalMap map)
    Parameters
    Type Name Description
    Directory taxoDir
    DirectoryTaxonomyWriter.IOrdinalMap map
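
A hedged sketch of merging one taxonomy into another follows. It assumes Lucene.Net 4.8 and uses the nested DirectoryTaxonomyWriter.MemoryOrdinalMap class with its GetMap() method as the IOrdinalMap implementation; check both against the version in use. The directories and labels are illustrative.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

using var sourceDir = new RAMDirectory();
using var targetDir = new RAMDirectory();

// Build and commit the source taxonomy so that it can be read back.
using (var source = new DirectoryTaxonomyWriter(sourceDir))
{
    source.AddCategory(new FacetLabel("Author", "Bob"));
} // Dispose() commits.

using var target = new DirectoryTaxonomyWriter(targetDir);
target.AddCategory(new FacetLabel("Author", "Lisa"));

// Assumption: MemoryOrdinalMap keeps the old-to-new mapping in an int[].
var ordinalMap = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
target.AddTaxonomy(sourceDir, ordinalMap);

// oldToNew[i] is the ordinal in the target taxonomy of the category
// that had ordinal i in the source taxonomy.
int[] oldToNew = ordinalMap.GetMap();
for (int oldOrdinal = 0; oldOrdinal < oldToNew.Length; oldOrdinal++)
{
    Console.WriteLine($"{oldOrdinal} -> {oldToNew[oldOrdinal]}");
}
```

The mapping is what lets facet ordinals stored in a search index built against the source taxonomy be rewritten for the merged one.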

    Commit()

    The second phase of a 2-phase commit. Implementations should ideally do very little work in this method (following Lucene.Net.Index.ITwoPhaseCommit.PrepareCommit()), and after it returns, the caller can assume that the changes were successfully committed to the underlying storage.

    Declaration
    public virtual void Commit()

    CreateDefaultTaxonomyWriterCache()

    Defines the default ITaxonomyWriterCache to use in constructors which do not specify one.

    The current default is Cl2oTaxonomyWriterCache constructed with the parameters (1024, 0.15f, 3), i.e., the entire taxonomy is cached in memory while building it.

    Declaration
    public static ITaxonomyWriterCache CreateDefaultTaxonomyWriterCache()
    Returns
    Type Description
    ITaxonomyWriterCache

    Dispose()

    Frees used resources and closes the underlying Lucene.Net.Index.IndexWriter, which commits whatever changes were made to it to the underlying Lucene.Net.Store.Directory.

    Declaration
    public void Dispose()

    Dispose(bool)

    A hook for extending classes to close additional resources that were used. The default implementation closes the Lucene.Net.Index.IndexReader as well as the ITaxonomyWriterCache instances that were used.

    NOTE: if you override this method, you should include a base.Dispose(disposing) call in your implementation.

    Declaration
    protected virtual void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing

    EnsureOpen()

    Verifies that this instance has not been closed, and throws ObjectDisposedException if it has.

    Declaration
    protected void EnsureOpen()

    FindCategory(FacetLabel)

    Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.

    Declaration
    protected virtual int FindCategory(FacetLabel categoryPath)
    Parameters
    Type Name Description
    FacetLabel categoryPath
    Returns
    Type Description
    int

    GetParent(int)

    GetParent(int) returns the ordinal of the parent category of the category with the given ordinal.

    When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. GetParent(int) is functionally equivalent to calling GetPath(int) on the given ordinal, dropping the last component of the path, and then calling GetOrdinal(FacetLabel) to get an ordinal back.

    If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArgumentOutOfRangeException is thrown. However, it is expected that GetParent(int) will only be called for ordinals which are already known to be in the taxonomy.

    TODO (Facet): instead of a GetParent(ordinal) method, consider having a GetCategory(categorypath, prefixlen) which is similar to AddCategory(FacetLabel) except that it doesn't add new categories. This method could be used to get the ordinals of all prefixes of the given category, and it could use exactly the same code and cache used by AddCategory(FacetLabel), which means less code.

    Declaration
    public virtual int GetParent(int ordinal)
    Parameters
    Type Name Description
    int ordinal
    Returns
    Type Description
    int
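
The three cases described above (ordinary category, top-level category, root) can be sketched as follows. This assumes Lucene.Net 4.8, where ROOT_ORDINAL and INVALID_ORDINAL are constants on the TaxonomyReader class; the label is illustrative.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

using var taxoDir = new RAMDirectory();
using var writer = new DirectoryTaxonomyWriter(taxoDir);

int bob = writer.AddCategory(new FacetLabel("Author", "Bob"));

// Parent of "Author/Bob" is the top-level category "Author".
int author = writer.GetParent(bob);

// Parent of a top-level category is the root.
int root = writer.GetParent(author);
Console.WriteLine(root == TaxonomyReader.ROOT_ORDINAL);          // True

// The root itself has no parent.
int none = writer.GetParent(root);
Console.WriteLine(none == TaxonomyReader.INVALID_ORDINAL);       // True
```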

    PrepareCommit()

    Prepares most of the work needed for a two-phase commit. See Lucene.Net.Index.IndexWriter.PrepareCommit().

    Declaration
    public virtual void PrepareCommit()
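
A minimal sketch of the two-phase commit sequence, with Rollback() as the failure path, is shown below. It assumes Lucene.Net 4.8; the comment marking where other transaction participants would be prepared is a placeholder, not part of the API.

```csharp
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

using var taxoDir = new RAMDirectory();
var writer = new DirectoryTaxonomyWriter(taxoDir);
try
{
    writer.AddCategory(new FacetLabel("Author", "Bob"));

    // Phase 1: do the expensive work, without making it visible yet.
    writer.PrepareCommit();

    // (Placeholder) prepare any other resources that take part in the
    // same transaction here, for example a search index writer.

    // Phase 2: finish the commit; this should be comparatively cheap.
    writer.Commit();
}
catch
{
    // Rollback() discards uncommitted changes and closes the writer,
    // so no Dispose() call is needed on this path.
    writer.Rollback();
    throw;
}
writer.Dispose();
```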

    ReplaceTaxonomy(Directory)

    Replaces the current taxonomy with the given one. This method should generally be called in conjunction with Lucene.Net.Index.IndexWriter.AddIndexes(params Lucene.Net.Store.Directory[]) to replace both the taxonomy as well as the search index content.

    Declaration
    public virtual void ReplaceTaxonomy(Directory taxoDir)
    Parameters
    Type Name Description
    Directory taxoDir

    Rollback()

    Rolls back changes to the taxonomy writer and closes the instance. After this method is called, the instance becomes unusable (calling any of its API methods will yield an ObjectDisposedException).

    Declaration
    public virtual void Rollback()

    SetCacheMissesUntilFill(int)

    Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.

    This taxonomy writer holds an in-memory cache of recently seen categories to speed up operation. On each cache-miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, a lot of slow disk reads like that are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.

    If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.

    NOTE: it is assumed that this method is called immediately after the taxonomy writer has been created.

    Declaration
    public virtual void SetCacheMissesUntilFill(int i)
    Parameters
    Type Name Description
    int i
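
The sketch below reopens an existing taxonomy and asks for the whole taxonomy to be read into the cache on first use. It assumes Lucene.Net 4.8; the taxonomy built in the first block stands in for one that already exists on disk.

```csharp
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using Lucene.Net.Store;

using var taxoDir = new RAMDirectory();

// Stand-in for a previously built taxonomy.
using (var builder = new DirectoryTaxonomyWriter(taxoDir))
{
    builder.AddCategory(new FacetLabel("Author", "Bob"));
} // Dispose() commits.

using var writer = new DirectoryTaxonomyWriter(taxoDir, OpenMode.APPEND);

// Call this right after construction, as the note above advises.
// 0 means: fill the cache completely on first use instead of
// fetching categories from disk one miss at a time.
writer.SetCacheMissesUntilFill(0);

// The category already exists, so its existing ordinal is returned.
int ordinal = writer.AddCategory(new FacetLabel("Author", "Bob"));
```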

    SetCommitData(IDictionary<string, string>)

    Sets the commit user data map. This method is considered a transaction and will be committed (Lucene.Net.Index.IndexWriter.Commit()) even if no other changes were made to the writer instance.

    NOTE: the map is cloned internally, therefore altering the map's contents after calling this method has no effect.

    Declaration
    public virtual void SetCommitData(IDictionary<string, string> commitUserData)
    Parameters
    Type Name Description
    IDictionary<string, string> commitUserData
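
A short sketch of setting commit data and reading it back through the CommitData property follows. It assumes Lucene.Net 4.8; the "build" key and its value are hypothetical. Based on the INDEX_EPOCH field description, the returned map is expected to also contain the "index.epoch" entry maintained by the writer.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

using var taxoDir = new RAMDirectory();
using var writer = new DirectoryTaxonomyWriter(taxoDir);

// "build" is a made-up application key used only for illustration.
var userData = new Dictionary<string, string> { ["build"] = "42" };
writer.SetCommitData(userData);

// The map was cloned by SetCommitData, so this later change is not
// seen by the writer.
userData["build"] = "changed";

writer.Commit();

foreach (KeyValuePair<string, string> entry in writer.CommitData)
{
    Console.WriteLine($"{entry.Key} = {entry.Value}");
}
```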

    Unlock(Directory)

    Forcibly unlocks the taxonomy in the named directory.

    Caution: this should only be used by failure recovery code, when it is known that no other process or thread is in fact currently accessing this taxonomy.

    This method is unnecessary if your Lucene.Net.Store.Directory uses a Lucene.Net.Store.NativeFSLockFactory instead of the default Lucene.Net.Store.SimpleFSLockFactory. When the "native" lock is used, a lock does not stay behind forever when the process using it dies.

    Declaration
    public static void Unlock(Directory directory)
    Parameters
    Type Name Description
    Directory directory

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.