Class OpenNLPTokenizer

Runs the OpenNLP sentence detector and tokenizer over the input. The last token in each sentence is marked by setting EOS_FLAG_BIT in its Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute; downstream filters can use this flag to apply operations to tokens one sentence at a time.
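As a rough illustration of how a consumer might use the end-of-sentence flag, the sketch below groups tokens into per-sentence lists. The class and method names (SentenceGrouping, ReadSentences) are hypothetical, the Lucene.Net.Analysis.OpenNlp.Tools namespace for the two op types is an assumption, and the caller is assumed to supply already-loaded NLPSentenceDetectorOp and NLPTokenizerOp instances.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.OpenNlp.Tools; // assumed namespace of the two op types
using Lucene.Net.Analysis.TokenAttributes;

public static class SentenceGrouping
{
    // Hypothetical helper: collects the tokens of each sentence into its own list.
    public static IList<IList<string>> ReadSentences(
        TextReader reader,
        NLPSentenceDetectorOp sentenceOp,
        NLPTokenizerOp tokenizerOp)
    {
        var sentences = new List<IList<string>>();
        var current = new List<string>();

        using (var tokenizer = new OpenNLPTokenizer(reader, sentenceOp, tokenizerOp))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IFlagsAttribute flagsAtt = tokenizer.AddAttribute<IFlagsAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                current.Add(termAtt.ToString());

                // The flag bit is set only on the last token of a sentence.
                if ((flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0)
                {
                    sentences.Add(current);
                    current = new List<string>();
                }
            }
            tokenizer.End();
        }

        // Keep any trailing tokens that were not closed by a flagged token.
        if (current.Count > 0)
        {
            sentences.Add(current);
        }
        return sentences;
    }
}
```

The flag is tested with a bitwise AND rather than an equality check so that the test still works if other bits in the flags value are set.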

Inheritance

object

AttributeSource

TokenStream

Tokenizer

SegmentingTokenizerBase

OpenNLPTokenizer

Implements

IDisposable

Inherited Members

SegmentingTokenizerBase.IncrementToken()

SegmentingTokenizerBase.End()

Tokenizer.SetReader(TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(object)

AttributeSource.ReflectAsString(bool)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

object.Equals(object, object)

object.GetType()

object.ReferenceEquals(object, object)

Namespace: Lucene.Net.Analysis.OpenNlp

Assembly: Lucene.Net.Analysis.OpenNLP.dll

Syntax

public sealed class OpenNLPTokenizer : SegmentingTokenizerBase, IDisposable

Constructors

OpenNLPTokenizer(AttributeFactory, TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

Declaration

public OpenNLPTokenizer(AttributeSource.AttributeFactory factory, TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)

Parameters

Type	Name	Description
AttributeSource.AttributeFactory	factory
TextReader	reader
NLPSentenceDetectorOp	sentenceOp
NLPTokenizerOp	tokenizerOp

OpenNLPTokenizer(TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)

Creates a new OpenNLPTokenizer.

Declaration

public OpenNLPTokenizer(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)

Parameters

Type	Name	Description
TextReader	reader
NLPSentenceDetectorOp	sentenceOp
NLPTokenizerOp	tokenizerOp

Fields

EOS_FLAG_BIT

Declaration

public static int EOS_FLAG_BIT

Field Value

Type	Description
int	Flag bit set in the IFlagsAttribute of the last token in each sentence.

Methods

Dispose(bool)

Releases resources associated with this stream.

If you override this method, always call base.Dispose(disposing); otherwise, some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on reuse).

Declaration

protected override void Dispose(bool disposing)

Parameters

Type	Name	Description
bool	disposing

Overrides

Tokenizer.Dispose(bool)

Remarks

NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.

IncrementWord()

Returns true if another word is available.

Declaration

protected override bool IncrementWord()

Returns

Type	Description
bool

Overrides

Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementWord()

Reset()

This method is called by a consumer before it begins consumption using Lucene.Net.Analysis.TokenStream.IncrementToken().

Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.

If you override this method, always call base.Reset(); otherwise, some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on further usage).

Declaration

public override void Reset()

Overrides

Lucene.Net.Analysis.Util.SegmentingTokenizerBase.Reset()
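The call order described for Reset() can be sketched as a small consumer loop. This is a minimal sketch that works on any TokenStream, not a prescribed pattern; the class and method names (TokenStreamConsumer, PrintTokens) are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class TokenStreamConsumer
{
    // Hypothetical helper: prints every token and returns the final offset.
    public static int PrintTokens(TokenStream stream)
    {
        ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
        IOffsetAttribute offsetAtt = stream.AddAttribute<IOffsetAttribute>();
        try
        {
            // Reset() must come before the first IncrementToken() call.
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(
                    $"{termAtt} [{offsetAtt.StartOffset}, {offsetAtt.EndOffset})");
            }

            // End() performs end-of-stream work such as setting the final offset.
            stream.End();
            return offsetAtt.EndOffset;
        }
        finally
        {
            // Dispose() releases the stream and its input reader.
            stream.Dispose();
        }
    }
}
```

Disposing in a finally block means the input reader is released even if tokenization throws partway through.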

SetNextSentence(int, int)

Provides the next input sentence for analysis.

Declaration

protected override void SetNextSentence(int sentenceStart, int sentenceEnd)

Parameters

Type	Name	Description
int	sentenceStart
int	sentenceEnd

Overrides

SegmentingTokenizerBase.SetNextSentence(int, int)
