Class TermVectorsWriter
Codec API for writing term vectors:
- For every document, StartDocument(int) is called, informing the Codec how many fields will be written.
- StartField(FieldInfo, int, bool, bool, bool) is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.
- Within each field, StartTerm(BytesRef, int) is called for each term.
- If offsets and/or positions are enabled, then AddPosition(int, int, int, BytesRef) will be called for each term occurrence.
- After all documents have been written, Finish(FieldInfos, int) is called for verification/sanity-checks.
- Finally, the writer is disposed (Dispose(bool)).
Note
This API is experimental and might change in incompatible ways in the next release.
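The call order described above can be illustrated with a minimal sketch. It does not use the real Lucene.NET types: `SketchVectorsWriter`, `RecordingWriter`, and `CallOrderDemo` are hypothetical stand-ins, terms are plain strings instead of BytesRef, the field is reduced to its name, and Finish takes only the document count. The FinishTerm(), FinishField(), and FinishDocument() hooks are left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors only the call shape of TermVectorsWriter.
// Terms are plain strings (not BytesRef) and the field is reduced to its name.
public abstract class SketchVectorsWriter : IDisposable
{
    public abstract void StartDocument(int numVectorFields);
    public abstract void StartField(string field, int numTerms, bool positions, bool offsets, bool payloads);
    public abstract void StartTerm(string term, int freq);
    public abstract void AddPosition(int position, int startOffset, int endOffset, byte[] payload);
    public abstract void Finish(int numDocs);
    protected abstract void Dispose(bool disposing);

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
}

// Records each call as a string so the order can be inspected afterwards.
public sealed class RecordingWriter : SketchVectorsWriter
{
    public List<string> Calls { get; } = new List<string>();

    public override void StartDocument(int numVectorFields)
        => Calls.Add($"StartDocument({numVectorFields})");
    public override void StartField(string field, int numTerms, bool positions, bool offsets, bool payloads)
        => Calls.Add($"StartField({field}, {numTerms})");
    public override void StartTerm(string term, int freq)
        => Calls.Add($"StartTerm({term}, {freq})");
    public override void AddPosition(int position, int startOffset, int endOffset, byte[] payload)
        => Calls.Add($"AddPosition({position}, {startOffset}, {endOffset})");
    public override void Finish(int numDocs)
        => Calls.Add($"Finish({numDocs})");
    protected override void Dispose(bool disposing)
        => Calls.Add("Dispose");
}

public static class CallOrderDemo
{
    // Writes one document with one field and two terms, then returns the recorded calls.
    public static List<string> Run()
    {
        var writer = new RecordingWriter();
        try
        {
            writer.StartDocument(1);                          // one vector field follows
            writer.StartField("body", 2, true, true, false);  // two terms, positions and offsets enabled
            writer.StartTerm("hello", 1);                     // freq 1: one AddPosition call
            writer.AddPosition(0, 0, 5, Array.Empty<byte>()); // empty payload in this sketch
            writer.StartTerm("world", 2);                     // freq 2: two AddPosition calls
            writer.AddPosition(1, 6, 11, Array.Empty<byte>());
            writer.AddPosition(2, 12, 17, Array.Empty<byte>());
            writer.Finish(1);                                 // one document was written
        }
        finally
        {
            writer.Dispose();                                 // routes to Dispose(bool)
        }
        return writer.Calls;
    }
}
```

Calling `CallOrderDemo.Run()` returns the recorded calls in the order the bullets describe: document, field, term, positions, then Finish and Dispose.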
Implements
IDisposable
Namespace: Lucene.Net.Codecs
Assembly: Lucene.Net.dll
Syntax
public abstract class TermVectorsWriter : IDisposable
Constructors
TermVectorsWriter()
Sole constructor. (For invocation by subclass constructors, typically implicit.)
Declaration
protected TermVectorsWriter()
Properties
Comparer
Returns the IComparer<BytesRef> used to sort terms before feeding them to this API.
Declaration
public abstract IComparer<BytesRef> Comparer { get; }
Property Value
Type | Description |
---|---|
IComparer<BytesRef> | The comparer used to sort terms before they are fed to this API. |
Methods
Abort()
Aborts writing entirely. Implementations should remove any partially written files, etc.
Declaration
public abstract void Abort()
AddAllDocVectors(Fields, MergeState)
Safe (but slowish) default method to write every vector field in the document.
Declaration
protected void AddAllDocVectors(Fields vectors, MergeState mergeState)
Parameters
Type | Name | Description |
---|---|---|
Fields | vectors | |
MergeState | mergeState |
AddPosition(int, int, int, BytesRef)
Adds a term position and offsets.
Declaration
public abstract void AddPosition(int position, int startOffset, int endOffset, BytesRef payload)
Parameters
Type | Name | Description |
---|---|---|
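int | position | Position of the term occurrence. |
int | startOffset | Start offset of the term occurrence. |
int | endOffset | End offset of the term occurrence. |
BytesRef | payload | Payload of the term occurrence. |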
AddProx(int, DataInput, DataInput)
Called by IndexWriter when writing new segments.
This is an expert API that allows the codec to consume positions and offsets directly from the indexer. The default implementation calls AddPosition(int, int, int, BytesRef), but subclasses can override this if, for example, they want to efficiently write all the positions and then all the offsets. NOTE: this API is extremely expert and subject to change or removal!
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Declaration
public virtual void AddProx(int numProx, DataInput positions, DataInput offsets)
Parameters
Type | Name | Description |
---|---|---|
int | numProx | |
DataInput | positions | |
DataInput | offsets |
Dispose()
Disposes all resources used by this object.
Declaration
public void Dispose()
Dispose(bool)
Implementations must override and should dispose all resources used by this instance.
Declaration
protected abstract void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing |
Finish(FieldInfos, int)
Called before Dispose(bool), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to StartDocument(int)), but a Codec should check that this is the case to detect the bug described in LUCENE-1282.
Declaration
public abstract void Finish(FieldInfos fis, int numDocs)
Parameters
Type | Name | Description |
---|---|---|
FieldInfos | fis | |
int | numDocs | Number of documents that were written; expected to equal the number of StartDocument(int) calls. |
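A minimal sketch of the redundancy check described for Finish(FieldInfos, int): count the StartDocument(int) calls and compare the total with numDocs. The `DocCountCheck` class and its method names are hypothetical helpers, not part of Lucene.NET, and the exception type is an assumption; a real codec would do this bookkeeping inside its own StartDocument and Finish overrides.

```csharp
using System;

// Hypothetical helper: tracks how many documents were started and
// verifies that count against the numDocs value passed to Finish.
public sealed class DocCountCheck
{
    private int startedDocs;

    // Call once from each StartDocument(int) invocation.
    public void OnStartDocument() => startedDocs++;

    // Call from Finish(...); throws if the counts disagree.
    public void OnFinish(int numDocs)
    {
        if (startedDocs != numDocs)
        {
            throw new InvalidOperationException(
                $"Wrote {startedDocs} documents but Finish reported {numDocs}.");
        }
    }
}
```

Failing loudly here means a mismatch surfaces while the segment is being written rather than later, when the term vectors are read back.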
FinishDocument()
Called after a doc and all its fields have been added.
Declaration
public virtual void FinishDocument()
FinishField()
Called after a field and all its terms have been added.
Declaration
public virtual void FinishField()
FinishTerm()
Called after a term and all its positions have been added.
Declaration
public virtual void FinishTerm()
Merge(MergeState)
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses StartDocument(int), StartField(FieldInfo, int, bool, bool, bool), StartTerm(BytesRef, int), AddPosition(int, int, int, BytesRef), and Finish(FieldInfos, int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc.).
Declaration
public virtual int Merge(MergeState mergeState)
Parameters
Type | Name | Description |
---|---|---|
MergeState | mergeState |
Returns
Type | Description |
---|---|
int | The number of documents that were written. |
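The shape of the default merge loop (skip deleted documents, write the rest, return the count) can be sketched as follows. This is an illustration only and not the library's implementation: `MergeSketch`, the `bool[]` live-docs arrays, and the `writeDocument` callback are hypothetical stand-ins for the readers in MergeState and for the per-document Start*/AddPosition calls.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSketch
{
    // liveDocsPerReader[r][d] == false marks document d of reader r as deleted
    // (a stand-in for the readers' live-docs information).
    // writeDocument(r, d) stands in for writing that document's term vectors.
    public static int MergeLiveDocs(
        IReadOnlyList<bool[]> liveDocsPerReader,
        Action<int, int> writeDocument)
    {
        int written = 0;
        for (int reader = 0; reader < liveDocsPerReader.Count; reader++)
        {
            bool[] liveDocs = liveDocsPerReader[reader];
            for (int doc = 0; doc < liveDocs.Length; doc++)
            {
                if (!liveDocs[doc])
                {
                    continue; // deleted documents are skipped
                }
                writeDocument(reader, doc);
                written++;
            }
        }
        // A caller would pass this count on to Finish before returning it.
        return written;
    }
}
```

An override that copies bytes in bulk would replace the inner per-document loop but would still need to return the same count.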
StartDocument(int)
Called before writing the term vectors of the document. StartField(FieldInfo, int, bool, bool, bool) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.
Declaration
public abstract void StartDocument(int numVectorFields)
Parameters
Type | Name | Description |
---|---|---|
int | numVectorFields | Number of vector fields that will be written for the document; zero if it has none. |
StartField(FieldInfo, int, bool, bool, bool)
Called before writing the terms of the field. StartTerm(BytesRef, int) will be called numTerms times.
Declaration
public abstract void StartField(FieldInfo info, int numTerms, bool positions, bool offsets, bool payloads)
Parameters
Type | Name | Description |
---|---|---|
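FieldInfo | info | |
int | numTerms | Number of terms that will be written for the field. |
bool | positions | Whether positions are enabled for the field. |
bool | offsets | Whether offsets are enabled for the field. |
bool | payloads | Whether payloads are enabled for the field. |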
StartTerm(BytesRef, int)
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then AddPosition(int, int, int, BytesRef) will be called freq times.
Declaration
public abstract void StartTerm(BytesRef term, int freq)
Parameters
Type | Name | Description |
---|---|---|
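BytesRef | term | The term to add. |
int | freq | The term frequency; also the number of AddPosition(int, int, int, BytesRef) calls that follow when positions and/or offsets are enabled. |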