    Class TermVectorsWriter

    Codec API for writing term vectors:

    1. For every document, StartDocument(int) is called, informing the codec how many fields will be written.
    2. StartField(FieldInfo, int, bool, bool, bool) is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.
    3. Within each field, StartTerm(BytesRef, int) is called for each term.
    4. If offsets and/or positions are enabled, then AddPosition(int, int, int, BytesRef) will be called for each term occurrence.
    5. After all documents have been written, Finish(FieldInfos, int) is called for verification/sanity-checks.
    6. Finally, the writer is disposed (Dispose(bool)).
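
The call sequence above can be sketched as follows. This is an illustrative sketch only, not how IndexWriter itself drives a codec: RecordingTermVectorsWriter and TermVectorsWriterDemo are hypothetical names, the terms and offsets are made-up sample data, and BytesRef.UTF8SortedAsUnicodeComparer is assumed to be available in the Lucene.Net version in use. The writer records each call instead of writing files, so the order of calls can be inspected.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Codecs;
using Lucene.Net.Index;
using Lucene.Net.Util;

// Hypothetical writer for illustration: records each call instead of writing files.
public sealed class RecordingTermVectorsWriter : TermVectorsWriter
{
    public List<string> Calls { get; } = new List<string>();

    // Assumption: terms are sorted in UTF-8 byte order.
    public override IComparer<BytesRef> Comparer => BytesRef.UTF8SortedAsUnicodeComparer;

    public override void StartDocument(int numVectorFields)
        => Calls.Add($"StartDocument({numVectorFields})");

    public override void StartField(FieldInfo info, int numTerms, bool positions, bool offsets, bool payloads)
        => Calls.Add($"StartField(numTerms={numTerms})");

    public override void StartTerm(BytesRef term, int freq)
        => Calls.Add($"StartTerm({term.Utf8ToString()}, {freq})");

    public override void AddPosition(int position, int startOffset, int endOffset, BytesRef payload)
        => Calls.Add($"AddPosition({position}, {startOffset}, {endOffset})");

    public override void Finish(FieldInfos fis, int numDocs)
        => Calls.Add($"Finish({numDocs})");

    public override void Abort() => Calls.Add("Abort()");

    protected override void Dispose(bool disposing) => Calls.Add("Dispose");
}

public static class TermVectorsWriterDemo
{
    // fieldInfo and fieldInfos are supplied by the caller and passed through unchanged.
    public static void WriteTwoDocuments(TermVectorsWriter writer, FieldInfo fieldInfo, FieldInfos fieldInfos)
    {
        // Sample occurrences for the made-up text "apple banana".
        var occurrences = new List<(BytesRef Term, int Position, int Start, int End)>
        {
            (new BytesRef("banana"), 1, 6, 12),
            (new BytesRef("apple"), 0, 0, 5),
        };
        // Terms are sorted with the writer's comparer before being fed to the API.
        occurrences.Sort((a, b) => writer.Comparer.Compare(a.Term, b.Term));

        // Document 0: one vector field with positions and offsets, no payloads.
        writer.StartDocument(1);
        writer.StartField(fieldInfo, occurrences.Count, positions: true, offsets: true, payloads: false);
        foreach (var occurrence in occurrences)
        {
            writer.StartTerm(occurrence.Term, 1);   // each term occurs once
            writer.AddPosition(occurrence.Position, occurrence.Start, occurrence.End, null);
            writer.FinishTerm();
        }
        writer.FinishField();
        writer.FinishDocument();

        // Document 1: no vector fields, so StartDocument receives zero.
        writer.StartDocument(0);
        writer.FinishDocument();

        // Two StartDocument calls were made, so Finish is told about two documents.
        writer.Finish(fieldInfos, 2);
        writer.Dispose();
    }
}
```

Because the recording writer never dereferences its FieldInfo or FieldInfos arguments, the demo can be exercised with null for both; a real codec would need valid instances.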

    Note

    This API is experimental and might change in incompatible ways in the next release.

    Inheritance
    object
    TermVectorsWriter
    CompressingTermVectorsWriter
    Lucene40TermVectorsWriter
    Implements
    IDisposable
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Codecs
    Assembly: Lucene.Net.dll
    Syntax
    public abstract class TermVectorsWriter : IDisposable

    Constructors

    TermVectorsWriter()

    Sole constructor. (For invocation by subclass constructors, typically implicit.)

    Declaration
    protected TermVectorsWriter()

    Properties

    Comparer

    Returns the IComparer<BytesRef> used to sort terms before feeding them to this API.

    Declaration
    public abstract IComparer<BytesRef> Comparer { get; }
    Property Value
    Type Description
    IComparer<BytesRef>

    Methods

    Abort()

    Aborts writing entirely. Implementations should remove any partially written files and release related resources.

    Declaration
    public abstract void Abort()

    AddAllDocVectors(Fields, MergeState)

    Safe (but slowish) default method that writes every vector field in the document.

    Declaration
    protected void AddAllDocVectors(Fields vectors, MergeState mergeState)
    Parameters
    Type Name Description
    Fields vectors
    MergeState mergeState

    AddPosition(int, int, int, BytesRef)

    Adds a term position and offsets.

    Declaration
    public abstract void AddPosition(int position, int startOffset, int endOffset, BytesRef payload)
    Parameters
    Type Name Description
    int position
    int startOffset
    int endOffset
    BytesRef payload

    AddProx(int, DataInput, DataInput)

    Called by IndexWriter when writing new segments.

    This is an expert API that allows the codec to consume positions and offsets directly from the indexer.

    The default implementation calls AddPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.

    NOTE: This API is extremely expert and subject to change or removal.

    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Declaration
    public virtual void AddProx(int numProx, DataInput positions, DataInput offsets)
    Parameters
    Type Name Description
    int numProx
    DataInput positions
    DataInput offsets

    Dispose()

    Disposes all resources used by this object.

    Declaration
    public void Dispose()

    Dispose(bool)

    Implementations must override and should dispose all resources used by this instance.

    Declaration
    protected abstract void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing

    Finish(FieldInfos, int)

    Called before Dispose(bool), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to StartDocument(int)), but a codec should check that this is the case to detect the bug described in LUCENE-1282.

    Declaration
    public abstract void Finish(FieldInfos fis, int numDocs)
    Parameters
    Type Name Description
    FieldInfos fis
    int numDocs
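
A minimal sketch of the consistency check described above, assuming a hypothetical base class named DocCountCheckingTermVectorsWriter. It counts calls to StartDocument(int) and compares the total with numDocs in Finish(FieldInfos, int). The choice of InvalidOperationException is an assumption of this sketch, not something the API prescribes, and the remaining abstract members are left for a concrete subclass to implement.

```csharp
using System;
using Lucene.Net.Codecs;
using Lucene.Net.Index;

// Hypothetical base class: counts StartDocument calls and verifies the total in Finish.
public abstract class DocCountCheckingTermVectorsWriter : TermVectorsWriter
{
    private int startedDocs;

    public override void StartDocument(int numVectorFields)
    {
        // A subclass that overrides this again should call base.StartDocument
        // so that the count stays accurate.
        startedDocs++;
    }

    public override void Finish(FieldInfos fis, int numDocs)
    {
        // numDocs is redundant by design; a mismatch means documents were lost or duplicated.
        if (startedDocs != numDocs)
        {
            throw new InvalidOperationException(
                $"StartDocument was called {startedDocs} times, but Finish reported {numDocs} documents.");
        }
    }
}
```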

    FinishDocument()

    Called after a doc and all its fields have been added.

    Declaration
    public virtual void FinishDocument()

    FinishField()

    Called after a field and all its terms have been added.

    Declaration
    public virtual void FinishField()

    FinishTerm()

    Called after a term and all its positions have been added.

    Declaration
    public virtual void FinishTerm()

    Merge(MergeState)

    Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses StartDocument(int), StartField(FieldInfo, int, bool, bool, bool), StartTerm(BytesRef, int), AddPosition(int, int, int, BytesRef), and Finish(FieldInfos, int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).

    Declaration
    public virtual int Merge(MergeState mergeState)
    Parameters
    Type Name Description
    MergeState mergeState
    Returns
    Type Description
    int

    StartDocument(int)

    Called before writing the term vectors of the document. StartField(FieldInfo, int, bool, bool, bool) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.

    Declaration
    public abstract void StartDocument(int numVectorFields)
    Parameters
    Type Name Description
    int numVectorFields

    StartField(FieldInfo, int, bool, bool, bool)

    Called before writing the terms of the field. StartTerm(BytesRef, int) will be called numTerms times.

    Declaration
    public abstract void StartField(FieldInfo info, int numTerms, bool positions, bool offsets, bool payloads)
    Parameters
    Type Name Description
    FieldInfo info
    int numTerms
    bool positions
    bool offsets
    bool payloads

    StartTerm(BytesRef, int)

    Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then AddPosition(int, int, int, BytesRef) will be called freq times, once per occurrence of the term.

    Declaration
    public abstract void StartTerm(BytesRef term, int freq)
    Parameters
    Type Name Description
    BytesRef term
    int freq

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.