Class DirectoryTaxonomyWriter
ITaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the permanently-stored information in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk on every category addition to check which ordinal the category already has, if any. An ITaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See OpenIndexWriter(Directory, IndexWriterConfig).
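As a rough illustration of the behavior described above, the following sketch opens a writer over an in-memory directory, adds the same category twice, and commits. It assumes the Lucene.Net and Lucene.Net.Facet packages are referenced; the class name TaxonomyWriterSketch and the sample labels are made up for this example.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.NET.
public static class TaxonomyWriterSketch
{
    public static void Main()
    {
        // RAMDirectory keeps the example self-contained; a real application
        // would typically use an on-disk Directory implementation.
        using (var taxoDir = new RAMDirectory())
        using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir))
        {
            // The first call adds the category (and any missing ancestors).
            int first = taxoWriter.AddCategory(new FacetLabel("Author", "Bob"));

            // The second call should find the existing ordinal, usually via
            // the in-memory cache, instead of adding the category again.
            int second = taxoWriter.AddCategory(new FacetLabel("Author", "Bob"));

            Console.WriteLine("Same ordinal on both calls: " + (first == second));

            // Make the additions durable in the Directory.
            taxoWriter.Commit();
        }
    }
}
```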
Namespace: Lucene.Net.Facet.Taxonomy.Directory
Assembly: Lucene.Net.Facet.dll
Syntax
public class DirectoryTaxonomyWriter : ITaxonomyWriter, IDisposable, ITwoPhaseCommit
Constructors
DirectoryTaxonomyWriter(Directory, OpenMode)
Creates a new instance with a default cache as defined by DefaultTaxonomyWriterCache().
Declaration
public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode = OpenMode.CREATE_OR_APPEND)
Parameters
Type | Name | Description |
---|---|---|
Directory | directory | |
OpenMode | openMode |
DirectoryTaxonomyWriter(Directory, OpenMode, ITaxonomyWriterCache)
Constructs a taxonomy writer.
Declaration
public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode, ITaxonomyWriterCache cache)
Parameters
Type | Name | Description |
---|---|---|
Directory | directory | The Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it). |
OpenMode | openMode | Specifies how to open a taxonomy for writing: APPEND means open an existing index for append (failing if the index does not yet exist). CREATE means create a new index (first deleting the old one if it already existed). CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index. |
ITaxonomyWriterCache | cache | An ITaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and Cl2oTaxonomyWriterCache. If null or missing, DefaultTaxonomyWriterCache() is used. |
Exceptions
Type | Condition |
---|---|
CorruptIndexException | if the taxonomy is corrupted. |
LockObtainFailedException | if the taxonomy is locked by another writer. If it is known that no other concurrent writer is active, the lock might have been left around by an old dead process, and should be removed using Unlock(Directory). |
System.IO.IOException | if another error occurred. |
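The sketch below shows one way the three-argument constructor might be called with an explicit caching policy. It assumes that LruTaxonomyWriterCache lives in the Lucene.Net.Facet.Taxonomy.WriterCache namespace and accepts a cache size in its constructor; the size 4096 and the class name ExplicitCacheSketch are arbitrary choices for this example.

```csharp
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Facet.Taxonomy.WriterCache;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.NET.
public static class ExplicitCacheSketch
{
    public static void Main()
    {
        using (var taxoDir = new RAMDirectory())
        {
            // Assumption: an LRU cache bounded to 4096 entries (arbitrary size).
            ITaxonomyWriterCache cache = new LruTaxonomyWriterCache(4096);

            // OpenMode.CREATE starts a new taxonomy, deleting any existing one.
            using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir, OpenMode.CREATE, cache))
            {
                taxoWriter.AddCategory(new FacetLabel("Year", "2010"));
                taxoWriter.Commit();
            }
        }
    }
}
```

A bounded cache trades memory for extra disk lookups on cache misses, so it may suit very large taxonomies better than the default, which caches everything.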
Fields
INDEX_EPOCH
Property name of user commit data that contains the index epoch. The epoch changes whenever the taxonomy is recreated (i.e. opened with CREATE).
Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.
Declaration
public const string INDEX_EPOCH = "index.epoch"
Field Value
Type | Description |
---|---|
System.String |
Properties
CommitData
Declaration
public virtual IDictionary<string, string> CommitData { get; }
Property Value
Type | Description |
---|---|
System.Collections.Generic.IDictionary<System.String, System.String> |
Count
Declaration
public virtual int Count { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Directory
Returns the Directory of this taxonomy writer.
Declaration
public virtual Directory Directory { get; }
Property Value
Type | Description |
---|---|
Directory |
TaxonomyEpoch
Expert: returns current index epoch, if this is a near-real-time reader. Used by DirectoryTaxonomyReader to support NRT.
Declaration
public long TaxonomyEpoch { get; }
Property Value
Type | Description |
---|---|
System.Int64 |
Methods
AddCategory(FacetLabel)
Declaration
public virtual int AddCategory(FacetLabel categoryPath)
Parameters
Type | Name | Description |
---|---|---|
FacetLabel | categoryPath |
Returns
Type | Description |
---|---|
System.Int32 |
AddTaxonomy(Directory, DirectoryTaxonomyWriter.IOrdinalMap)
Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. Additionally, it fills the given DirectoryTaxonomyWriter.IOrdinalMap with a mapping from the original ordinal to the new ordinal.
Declaration
public virtual void AddTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.IOrdinalMap map)
Parameters
Type | Name | Description |
---|---|---|
Directory | taxoDir | |
DirectoryTaxonomyWriter.IOrdinalMap | map |
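A minimal sketch of merging one taxonomy into another follows. It assumes that the nested class DirectoryTaxonomyWriter.MemoryOrdinalMap is available as an in-memory IOrdinalMap implementation and that its GetMap() method returns the old-to-new ordinal array; neither is documented on this page, so treat both as assumptions. The class name and sample labels are made up.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.NET.
public static class AddTaxonomySketch
{
    public static void Main()
    {
        using (var sourceDir = new RAMDirectory())
        using (var targetDir = new RAMDirectory())
        {
            // Build a small source taxonomy; disposing the writer commits it.
            using (var source = new DirectoryTaxonomyWriter(sourceDir))
            {
                source.AddCategory(new FacetLabel("Author", "Bob"));
            }

            using (var target = new DirectoryTaxonomyWriter(targetDir))
            {
                target.AddCategory(new FacetLabel("Year", "2010"));

                // Assumption: MemoryOrdinalMap keeps the mapping in an int array.
                var map = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
                target.AddTaxonomy(sourceDir, map);

                // oldToNew[sourceOrdinal] gives the ordinal in the target taxonomy.
                int[] oldToNew = map.GetMap();
                for (int i = 0; i < oldToNew.Length; i++)
                {
                    Console.WriteLine(i + " -> " + oldToNew[i]);
                }

                target.Commit();
            }
        }
    }
}
```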
CloseResources()
A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the ITaxonomyWriterCache instances that were used.
NOTE: if you override this method, you should include a base.CloseResources() call in your implementation.
Declaration
protected virtual void CloseResources()
Commit()
Declaration
public virtual void Commit()
CreateIndexWriterConfig(OpenMode)
Create the IndexWriterConfig that would be used for opening the internal index writer.
Extensions can configure the IndexWriter as they see fit, including setting a MergeScheduler, or IndexDeletionPolicy, different RAM size etc.
NOTE: internal docids of the configured index must not be altered. For that reason, categories are never deleted from the taxonomy index. In addition, the merge policy in effect must not merge non-adjacent segments.
Declaration
protected virtual IndexWriterConfig CreateIndexWriterConfig(OpenMode openMode)
Parameters
Type | Name | Description |
---|---|---|
OpenMode | openMode | see OpenMode |
Returns
Type | Description |
---|---|
IndexWriterConfig |
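One way an extending class might use this hook is sketched below. It starts from the base configuration so that the default merge policy is kept, and only changes the RAM buffer size. The class name TunedTaxonomyWriter is hypothetical, and the settable RAMBufferSizeMB property on IndexWriterConfig is an assumption not confirmed by this page.

```csharp
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using LuceneDirectory = Lucene.Net.Store.Directory;

// Hypothetical subclass; not part of Lucene.NET.
public class TunedTaxonomyWriter : DirectoryTaxonomyWriter
{
    public TunedTaxonomyWriter(LuceneDirectory directory, OpenMode openMode)
        : base(directory, openMode)
    {
    }

    protected override IndexWriterConfig CreateIndexWriterConfig(OpenMode openMode)
    {
        // Start from the default configuration so the merge policy chosen by
        // the base class (which must not merge non-adjacent segments) is kept.
        IndexWriterConfig config = base.CreateIndexWriterConfig(openMode);

        // Assumption: RAMBufferSizeMB is a settable property; 64 MB is arbitrary.
        config.RAMBufferSizeMB = 64.0;
        return config;
    }
}
```

Because the hook may be invoked while the base constructor is still running, the override above avoids touching any fields of the subclass.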
DefaultTaxonomyWriterCache()
Defines the default ITaxonomyWriterCache to use in constructors which do not specify one.
The current default is Cl2oTaxonomyWriterCache constructed with the parameters (1024, 0.15f, 3), i.e., the entire taxonomy is cached in memory while building it.
Declaration
public static ITaxonomyWriterCache DefaultTaxonomyWriterCache()
Returns
Type | Description |
---|---|
ITaxonomyWriterCache |
Dispose()
Frees the resources in use and closes the underlying IndexWriter, which commits any pending changes to the underlying Directory.
Declaration
public void Dispose()
EnsureOpen()
Verifies that this instance has not been disposed; throws System.ObjectDisposedException if it has.
Declaration
protected void EnsureOpen()
FindCategory(FacetLabel)
Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
Declaration
protected virtual int FindCategory(FacetLabel categoryPath)
Parameters
Type | Name | Description |
---|---|---|
FacetLabel | categoryPath |
Returns
Type | Description |
---|---|
System.Int32 |
GetParent(Int32)
Declaration
public virtual int GetParent(int ordinal)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | ordinal |
Returns
Type | Description |
---|---|
System.Int32 |
OpenIndexWriter(Directory, IndexWriterConfig)
Open internal index writer, which contains the taxonomy data.
Extensions may provide their own IndexWriter implementation or instance.
NOTE: the instance this method returns is disposed when Dispose() is called.
NOTE: the merge policy in effect must not merge non-adjacent segments. See the comment in CreateIndexWriterConfig(OpenMode) for the logic behind this.
Declaration
protected virtual IndexWriter OpenIndexWriter(Directory directory, IndexWriterConfig config)
Parameters
Type | Name | Description |
---|---|---|
Directory | directory | the Directory on top of which an IndexWriter should be opened. |
IndexWriterConfig | config | configuration for the internal index writer. |
Returns
Type | Description |
---|---|
IndexWriter |
PrepareCommit()
Prepares most of the work needed for a two-phase commit. See IndexWriter.PrepareCommit().
Declaration
public virtual void PrepareCommit()
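A two-phase commit might be structured as in the sketch below: prepare first, finish with Commit() only if every other participant also prepared successfully, and roll back otherwise. The class name is made up, and the "other participant" is only indicated by a comment.

```csharp
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.NET.
public static class TwoPhaseCommitSketch
{
    public static void Main()
    {
        using (var taxoDir = new RAMDirectory())
        using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir))
        {
            taxoWriter.AddCategory(new FacetLabel("Author", "Bob"));

            try
            {
                // Phase 1: do most of the work, but do not make it visible yet.
                taxoWriter.PrepareCommit();

                // ... prepare any other participant here, e.g. the search
                // index writer that references these category ordinals ...

                // Phase 2: complete the commit.
                taxoWriter.Commit();
            }
            catch
            {
                // Discard the prepared changes; the writer is unusable afterwards.
                taxoWriter.Rollback();
                throw;
            }
        }
    }
}
```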
ReplaceTaxonomy(Directory)
Replaces the current taxonomy with the given one. This method should generally be called in conjunction with AddIndexes(Directory[]) to replace both the taxonomy and the search index content.
Declaration
public virtual void ReplaceTaxonomy(Directory taxoDir)
Parameters
Type | Name | Description |
---|---|---|
Directory | taxoDir |
Rollback()
Rolls back changes to the taxonomy writer and closes the instance. After this method is called, the instance becomes unusable (calling any of its API methods will yield a System.ObjectDisposedException).
Declaration
public virtual void Rollback()
SetCacheMissesUntilFill(Int32)
Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.
This taxonomy writer holds an in-memory cache of recently seen categories to speed up operation. On each cache-miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, a lot of slow disk reads like that are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.
If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.
NOTE: it is assumed that this method is called immediately after the taxonomy writer has been created.
Declaration
public virtual void SetCacheMissesUntilFill(int i)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | i |
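The sketch below illustrates the intended call order under simple assumptions: an existing taxonomy is reopened, and SetCacheMissesUntilFill(0) is called immediately after construction so that the whole taxonomy is loaded into the cache on first use. The class name and sample labels are made up.

```csharp
using System;
using Lucene.Net.Facet.Taxonomy;
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.NET.
public static class CacheFillSketch
{
    public static void Main()
    {
        using (var taxoDir = new RAMDirectory())
        {
            // Build a small taxonomy; disposing the writer commits it.
            using (var builder = new DirectoryTaxonomyWriter(taxoDir, OpenMode.CREATE))
            {
                builder.AddCategory(new FacetLabel("Author", "Bob"));
                builder.AddCategory(new FacetLabel("Author", "Alice"));
            }

            // Reopen the existing taxonomy for appending.
            using (var taxoWriter = new DirectoryTaxonomyWriter(taxoDir, OpenMode.APPEND))
            {
                // Call this right after construction, before any lookups.
                taxoWriter.SetCacheMissesUntilFill(0);

                // The lookup of an existing category can now be served after
                // one bulk read instead of many individual disk reads.
                int ordinal = taxoWriter.AddCategory(new FacetLabel("Author", "Bob"));
                Console.WriteLine("Existing ordinal: " + ordinal);
            }
        }
    }
}
```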
SetCommitData(IDictionary<String, String>)
Declaration
public virtual void SetCommitData(IDictionary<string, string> commitUserData)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IDictionary<System.String, System.String> | commitUserData |
Unlock(Directory)
Forcibly unlocks the taxonomy in the named directory.
Caution: this should only be used by failure recovery code, when it is known that no other process or thread is currently accessing this taxonomy.
This method is unnecessary if your Directory uses a NativeFSLockFactory instead of the default SimpleFSLockFactory. When the "native" lock is used, a lock does not stay behind forever when the process using it dies.
Declaration
public static void Unlock(Directory directory)
Parameters
Type | Name | Description |
---|---|---|
Directory | directory |
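Failure recovery code might use Unlock(Directory) roughly as sketched below. The helper name OpenAfterCrash and the noOtherWriterIsActive flag are hypothetical; how an application establishes that no other writer is active is outside the scope of this example.

```csharp
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using Lucene.Net.Store;
using LuceneDirectory = Lucene.Net.Store.Directory;

// Hypothetical recovery helper; not part of Lucene.NET.
public static class TaxonomyRecoverySketch
{
    public static DirectoryTaxonomyWriter OpenAfterCrash(
        LuceneDirectory taxoDir, bool noOtherWriterIsActive)
    {
        try
        {
            return new DirectoryTaxonomyWriter(taxoDir, OpenMode.CREATE_OR_APPEND);
        }
        catch (LockObtainFailedException)
        {
            // Only break the lock when the caller is certain that it is stale,
            // for example because it was left behind by a dead process.
            if (!noOtherWriterIsActive)
            {
                throw;
            }

            DirectoryTaxonomyWriter.Unlock(taxoDir);
            return new DirectoryTaxonomyWriter(taxoDir, OpenMode.CREATE_OR_APPEND);
        }
    }
}
```

Forcing the unlock while another writer is still active can corrupt the taxonomy, which is why the sketch rethrows unless the caller explicitly vouches for exclusivity.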