
    Class OpenNLPTokenizer

Runs an OpenNLP SentenceDetector and Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute; subsequent filters can use this information to apply operations to tokens one sentence at a time.

    Inheritance
    object
    AttributeSource
    TokenStream
    Tokenizer
    SegmentingTokenizerBase
    OpenNLPTokenizer
    Implements
    IDisposable
    Inherited Members
    SegmentingTokenizerBase.IncrementToken()
    SegmentingTokenizerBase.End()
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.OpenNlp
    Assembly: Lucene.Net.Analysis.OpenNLP.dll
    Syntax
    public sealed class OpenNLPTokenizer : SegmentingTokenizerBase, IDisposable
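Example

A minimal consumption sketch, not taken from the API reference. It assumes that sentenceOp and tokenizerOp wrap already-loaded OpenNLP sentence-detector and tokenizer models (see the constructor parameters below) and that those wrapper types live in the Lucene.Net.Analysis.OpenNlp.Tools namespace; verify both against your Lucene.Net version. Each token's text is printed together with whether it carries the EOS_FLAG_BIT set by this tokenizer.

using System;
using System.IO;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.OpenNlp.Tools;   // assumed home of NLPSentenceDetectorOp / NLPTokenizerOp
using Lucene.Net.Analysis.TokenAttributes;

public static class OpenNLPTokenizerExample
{
    // Prints each token and whether it is the last token of its sentence.
    public static void PrintTokens(string text, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    {
        using (TextReader reader = new StringReader(text))
        using (var tokenizer = new OpenNLPTokenizer(reader, sentenceOp, tokenizerOp))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IFlagsAttribute flagsAtt = tokenizer.AddAttribute<IFlagsAttribute>();

            tokenizer.Reset();                     // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                bool endOfSentence = (flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0;
                Console.WriteLine($"{termAtt} (sentence end: {endOfSentence})");
            }
            tokenizer.End();                       // finalize state before disposal
        }
    }
}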

    Constructors

    OpenNLPTokenizer(AttributeFactory, TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

Creates a new OpenNLPTokenizer that uses the supplied AttributeSource.AttributeFactory to create its token attributes, reads input from the given TextReader, and segments it with the given sentence-detector and tokenizer operations (see the example after the parameter list).

    Declaration
    public OpenNLPTokenizer(AttributeSource.AttributeFactory factory, TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader reader
    NLPSentenceDetectorOp sentenceOp
    NLPTokenizerOp tokenizerOp
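Example

A construction sketch for this overload, for illustration only. It assumes the DEFAULT_ATTRIBUTE_FACTORY field on AttributeSource.AttributeFactory (in Lucene.Net.Util) is the usual default factory, and that the NLP*Op wrapper types live in Lucene.Net.Analysis.OpenNlp.Tools; with the default factory this overload is expected to behave like the three-argument constructor below.

using System.IO;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.OpenNlp.Tools;   // assumed home of NLPSentenceDetectorOp / NLPTokenizerOp
using Lucene.Net.Util;

public static class FactoryOverloadExample
{
    // Builds a tokenizer with an explicit attribute factory.
    public static OpenNLPTokenizer Create(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    {
        return new OpenNLPTokenizer(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            reader,
            sentenceOp,
            tokenizerOp);
    }
}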

    OpenNLPTokenizer(TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

    Creates a new OpenNLPTokenizer

    Declaration
    public OpenNLPTokenizer(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    Parameters
    Type Name Description
    TextReader reader
    NLPSentenceDetectorOp sentenceOp
    NLPTokenizerOp tokenizerOp

    Fields

    EOS_FLAG_BIT

The flag bit set in the Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute of the last token in each sentence; subsequent filters can test for this bit to apply operations to tokens one sentence at a time (see the filter sketch below).

    Declaration
    public static int EOS_FLAG_BIT
    Field Value
    Type Description
    int
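Example

A hypothetical downstream filter, sketched here only to show how the flag can be consumed; it is not part of this package and assumes the usual Lucene.Net TokenFilter conventions (the wrapped stream exposed as m_input, attributes obtained via AddAttribute<T>()).

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical filter: counts sentences by watching for tokens that
// OpenNLPTokenizer flagged with EOS_FLAG_BIT.
public sealed class SentenceCountingFilter : TokenFilter
{
    private readonly IFlagsAttribute flagsAtt;

    public int SentenceCount { get; private set; }

    public SentenceCountingFilter(TokenStream input)
        : base(input)
    {
        flagsAtt = AddAttribute<IFlagsAttribute>();
    }

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken())
        {
            return false;
        }
        if ((flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0)
        {
            SentenceCount++;   // this token is the last one in its sentence
        }
        return true;
    }

    public override void Reset()
    {
        base.Reset();          // always delegate so the wrapped stream is reset too
        SentenceCount = 0;
    }
}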

    Methods

    Dispose(bool)

    Releases resources associated with this stream.

    If you override this method, always call base.Dispose(disposing), otherwise some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on reuse).
    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing
    Overrides
    Tokenizer.Dispose(bool)
    Remarks

    NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.

    IncrementWord()

    Returns true if another word is available

    Declaration
    protected override bool IncrementWord()
    Returns
    Type Description
    bool
    Overrides
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementWord()

    Reset()

    This method is called by a consumer before it begins consumption using Lucene.Net.Analysis.TokenStream.IncrementToken().

    Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.

    If you override this method, always call base.Reset(), otherwise some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on further usage).
    Declaration
    public override void Reset()
    Overrides
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.Reset()
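Example

A reuse sketch under the Lucene 4.8 TokenStream contract (Reset, IncrementToken until it returns false, End, Dispose). It assumes a Tokenizer can be handed a new reader via SetReader(TextReader) once the previous consumption cycle has been closed with Dispose(), as the note on Dispose(bool) above implies; names here are placeholders, not part of the API.

using System.IO;
using Lucene.Net.Analysis.OpenNlp;

public static class ReuseExample
{
    // Consumes one input with an existing tokenizer instance and leaves it
    // ready for SetReader() to be called again with the next input.
    public static int CountTokens(OpenNLPTokenizer tokenizer, string text)
    {
        tokenizer.SetReader(new StringReader(text));
        tokenizer.Reset();
        int count = 0;
        while (tokenizer.IncrementToken())
        {
            count++;
        }
        tokenizer.End();
        tokenizer.Dispose();   // closes the current reader; the instance itself stays reusable
        return count;
    }
}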

    SetNextSentence(int, int)

    Provides the next input sentence for analysis

    Declaration
    protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
    Parameters
    Type Name Description
    int sentenceStart
    int sentenceEnd
    Overrides
    SegmentingTokenizerBase.SetNextSentence(int, int)

    Implements

    IDisposable