Class OpenNLPTokenizer
Runs the OpenNLP sentence detector and tokenizer. The last token in each sentence is marked by setting EOS_FLAG_BIT in its Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute; downstream filters can use this flag to apply operations to tokens one sentence at a time.
Implements
IDisposable
Namespace: Lucene.Net.Analysis.OpenNlp
Assembly: Lucene.Net.Analysis.OpenNLP.dll
Syntax
public sealed class OpenNLPTokenizer : SegmentingTokenizerBase, IDisposable
Constructors
OpenNLPTokenizer(AttributeFactory, TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)
Creates a new OpenNLPTokenizer that uses the given attribute factory.
Declaration
public OpenNLPTokenizer(AttributeSource.AttributeFactory factory, TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
Parameters
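Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | The attribute factory to use for this tokenizer. |
TextReader | reader | The input text to tokenize. |
NLPSentenceDetectorOp | sentenceOp | The OpenNLP sentence detector operation. |
NLPTokenizerOp | tokenizerOp | The OpenNLP tokenizer operation. |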
OpenNLPTokenizer(TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)
Creates a new OpenNLPTokenizer.
Declaration
public OpenNLPTokenizer(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
Parameters
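Type | Name | Description |
---|---|---|
TextReader | reader | The input text to tokenize. |
NLPSentenceDetectorOp | sentenceOp | The OpenNLP sentence detector operation. |
NLPTokenizerOp | tokenizerOp | The OpenNLP tokenizer operation. |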
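The following is a minimal sketch of how the tokenizer might be consumed, not a definitive usage pattern. It assumes the caller has already built the NLPSentenceDetectorOp and NLPTokenizerOp instances from OpenNLP models, and that both types live in the Lucene.Net.Analysis.OpenNlp.Tools namespace. The class name OpenNlpTokenizerExample and the method PrintTokens are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.OpenNlp.Tools; // assumed namespace of the two *Op types
using Lucene.Net.Analysis.TokenAttributes;

public static class OpenNlpTokenizerExample
{
    // Hypothetical helper: prints each token and whether it ends a sentence.
    // The caller supplies sentenceOp and tokenizerOp, already loaded from models.
    public static void PrintTokens(
        string text,
        NLPSentenceDetectorOp sentenceOp,
        NLPTokenizerOp tokenizerOp)
    {
        using (var tokenizer = new OpenNLPTokenizer(new StringReader(text), sentenceOp, tokenizerOp))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IFlagsAttribute flagsAtt = tokenizer.AddAttribute<IFlagsAttribute>();

            // Reset() must be called before the first IncrementToken().
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // The last token of each sentence carries EOS_FLAG_BIT.
                bool endOfSentence = (flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0;
                Console.WriteLine(termAtt.ToString() + (endOfSentence ? "  <end of sentence>" : ""));
            }
            tokenizer.End();
        }
    }
}
```

Passing the operations in as parameters keeps the sketch independent of how the OpenNLP models are located and loaded, which varies by application.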
Fields
EOS_FLAG_BIT
Flag bit set in the Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute of the last token in each sentence.
Declaration
public static int EOS_FLAG_BIT
Field Value
Type | Description |
---|---|
int |
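As a rough illustration of how a downstream filter could act on this flag, the sketch below tracks a zero-based sentence index for each token. SentenceIndexFilter and its SentenceIndex property are hypothetical names, not part of the library; the sketch assumes the standard Lucene.NET TokenFilter base class with its protected m_input field.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.OpenNlp;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical filter: exposes the index of the sentence the current token belongs to.
public sealed class SentenceIndexFilter : TokenFilter
{
    private readonly IFlagsAttribute flagsAtt;
    private int currentSentence;
    private int nextSentence;

    public SentenceIndexFilter(TokenStream input)
        : base(input)
    {
        flagsAtt = AddAttribute<IFlagsAttribute>();
    }

    // Zero-based index of the sentence containing the most recently returned token.
    public int SentenceIndex => currentSentence;

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken())
        {
            return false;
        }

        currentSentence = nextSentence;

        // A token carrying EOS_FLAG_BIT closes its sentence,
        // so the token after it starts the next one.
        if ((flagsAtt.Flags & OpenNLPTokenizer.EOS_FLAG_BIT) != 0)
        {
            nextSentence++;
        }
        return true;
    }

    public override void Reset()
    {
        // Always call base.Reset() so upstream state is reset as well.
        base.Reset();
        currentSentence = 0;
        nextSentence = 0;
    }
}
```

The filter only reads the flag and leaves it in place, so any later filter in the chain can still detect sentence boundaries.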
Methods
Dispose(bool)
Releases resources associated with this stream.
If you override this method, always call base.Dispose(disposing); otherwise, some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on reuse).
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing | true to release both managed and unmanaged resources; false to release only unmanaged resources. |
Overrides
Remarks
NOTE: The default implementation closes the input TextReader, so be sure to call base.Dispose(disposing) when overriding this method.
IncrementWord()
Returns true if another word is available.
Declaration
protected override bool IncrementWord()
Returns
Type | Description |
---|---|
bool |
Overrides
Reset()
This method is called by a consumer before it begins consumption using Lucene.Net.Analysis.TokenStream.IncrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh. If you override this method, always call base.Reset(); otherwise, some internal state will not be correctly reset (e.g., Lucene.Net.Analysis.Tokenizer will throw InvalidOperationException on further usage).
Declaration
public override void Reset()
Overrides
SetNextSentence(int, int)
Provides the next input sentence for analysis.
Declaration
protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
Parameters
Type | Name | Description |
---|---|---|
int | sentenceStart | |
int | sentenceEnd |