
    Class OpenNLPTokenizer

    Runs the OpenNLP sentence detector and tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in its Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute; downstream filters can use this flag to apply operations to tokens one sentence at a time.
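
    As a rough illustration of how a consumer might use the end-of-sentence flag, the sketch below groups token text into sentences by testing EOS_FLAG_BIT on each token. The class and method names (SentenceGrouping, GroupTokensBySentence) are hypothetical, and the Lucene.Net.Analysis.OpenNlp.Tools namespace for the two op types is an assumption. The sentence detector and tokenizer ops are taken as parameters because loading the OpenNLP models is outside the scope of this page.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.OpenNlp.Tools; // assumed namespace of the two op types
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper, not part of Lucene.Net.
public static class SentenceGrouping
{
    // Reads all tokens and returns them grouped by sentence.
    // sentenceOp and tokenizerOp must be created by the caller from OpenNLP models.
    public static IList<IList<string>> GroupTokensBySentence(
        TextReader reader,
        NLPSentenceDetectorOp sentenceOp,
        NLPTokenizerOp tokenizerOp)
    {
        var sentences = new List<IList<string>>();
        var current = new List<string>();

        using (var tokenizer = new OpenNLPTokenizer(reader, sentenceOp, tokenizerOp))
        {
            // Attributes are shared, reusable views onto the current token.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IFlagsAttribute flagsAtt = tokenizer.AddAttribute<IFlagsAttribute>();

            tokenizer.Reset(); // required before the first IncrementToken() call
            while (tokenizer.IncrementToken())
            {
                current.Add(termAtt.ToString());

                // The last token of a sentence carries the end-of-sentence bit.
                if ((flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0)
                {
                    sentences.Add(current);
                    current = new List<string>();
                }
            }
            tokenizer.End();
        }

        // Keep any trailing tokens that were not closed by a flagged token.
        if (current.Count > 0)
        {
            sentences.Add(current);
        }
        return sentences;
    }
}
```

    The flag is tested with a bitwise AND rather than an equality check so that the test still works if other components set additional bits in the same flags value.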

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase
    OpenNLPTokenizer
    Implements
    System.IDisposable
    Inherited Members
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.BUFFERMAX
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_buffer
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_offset
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementToken()
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.End()
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IsSafeEnd(System.Char)
    Lucene.Net.Analysis.Tokenizer.m_input
    Lucene.Net.Analysis.Tokenizer.CorrectOffset(System.Int32)
    Lucene.Net.Analysis.Tokenizer.SetReader(System.IO.TextReader)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    Lucene.Net.Util.AttributeSource.Equals(System.Object)
    Lucene.Net.Util.AttributeSource.ReflectAsString(System.Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.OpenNlp
    Assembly: Lucene.Net.Analysis.OpenNLP.dll
    Syntax
    public sealed class OpenNLPTokenizer : SegmentingTokenizerBase, IDisposable

    Constructors


    OpenNLPTokenizer(AttributeSource.AttributeFactory, TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

    Declaration
    public OpenNLPTokenizer(AttributeSource.AttributeFactory factory, TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    Parameters
    Type Name Description
    Lucene.Net.Util.AttributeSource.AttributeFactory factory
    TextReader reader
    NLPSentenceDetectorOp sentenceOp
    NLPTokenizerOp tokenizerOp

    OpenNLPTokenizer(TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

    Creates a new OpenNLPTokenizer.

    Declaration
    public OpenNLPTokenizer(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
    Parameters
    Type Name Description
    TextReader reader
    NLPSentenceDetectorOp sentenceOp
    NLPTokenizerOp tokenizerOp

    Fields


    EOS_FLAG_BIT

    Declaration
    public static int EOS_FLAG_BIT
    Field Value
    Type Description
    System.Int32

    Methods


    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Overrides
    Lucene.Net.Analysis.Tokenizer.Dispose(System.Boolean)

    IncrementWord()

    Declaration
    protected override bool IncrementWord()
    Returns
    Type Description
    System.Boolean
    Overrides
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementWord()

    Reset()

    Declaration
    public override void Reset()
    Overrides
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.Reset()

    SetNextSentence(Int32, Int32)

    Declaration
    protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
    Parameters
    Type Name Description
    System.Int32 sentenceStart
    System.Int32 sentenceEnd
    Overrides
    Lucene.Net.Analysis.Util.SegmentingTokenizerBase.SetNextSentence(System.Int32, System.Int32)

    Implements

    System.IDisposable
    Copyright © 2022 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.