Class OpenNLPTokenizer
Runs the OpenNLP SentenceDetector and Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the Lucene.Net.Analysis.TokenAttributes.IFlagsAttribute; subsequent filters can use this information to process tokens one sentence at a time.
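The end-of-sentence marking described above amounts to a bit test on each token's flags. The following is a minimal, library-free sketch of how a downstream consumer might group tokens into sentences. The `SentenceGrouping` class, the `GroupBySentence` method, and the constant value `1` are hypothetical stand-ins introduced for illustration; a real filter would read the flags from the token stream's IFlagsAttribute and test them against OpenNLPTokenizer.EOS_FLAG_BIT rather than a local constant.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceGrouping
{
    // Hypothetical stand-in for OpenNLPTokenizer.EOS_FLAG_BIT.
    // The value 1 is an assumption made for this sketch only.
    public const int EosFlagBit = 1;

    // Groups (term, flags) pairs into sentences. A sentence is closed
    // after each token whose flags have the end-of-sentence bit set.
    public static List<List<string>> GroupBySentence(
        IEnumerable<(string Term, int Flags)> tokens)
    {
        var sentences = new List<List<string>>();
        var current = new List<string>();

        foreach (var (term, flags) in tokens)
        {
            current.Add(term);

            // Bitwise test: other flag bits may be set on the same token.
            if ((flags & EosFlagBit) != 0)
            {
                sentences.Add(current);
                current = new List<string>();
            }
        }

        // Keep any trailing tokens that never received an EOS mark.
        if (current.Count > 0)
        {
            sentences.Add(current);
        }

        return sentences;
    }
}
```

Testing with a bitwise AND rather than equality keeps the check correct when other components set additional bits in the same flags value.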
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
Lucene.Net.Analysis.Util.SegmentingTokenizerBase
OpenNLPTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.BUFFERMAX
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_buffer
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.m_offset
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementToken()
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.End()
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IsSafeEnd(System.Char)
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
Namespace: Lucene.Net.Analysis.OpenNlp
Assembly: Lucene.Net.Analysis.OpenNLP.dll
Syntax
public sealed class OpenNLPTokenizer : SegmentingTokenizerBase, IDisposable
Constructors
OpenNLPTokenizer(AttributeSource.AttributeFactory, TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)
Declaration
public OpenNLPTokenizer(AttributeSource.AttributeFactory factory, TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
Parameters
| Type | Name | Description |
|---|---|---|
| Lucene.Net.Util.AttributeSource.AttributeFactory | factory | |
| TextReader | reader | |
| NLPSentenceDetectorOp | sentenceOp | |
| NLPTokenizerOp | tokenizerOp | |
OpenNLPTokenizer(TextReader, NLPSentenceDetectorOp, NLPTokenizerOp)
Creates a new OpenNLPTokenizer.
Declaration
public OpenNLPTokenizer(TextReader reader, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
Parameters
| Type | Name | Description |
|---|---|---|
| TextReader | reader | |
| NLPSentenceDetectorOp | sentenceOp | |
| NLPTokenizerOp | tokenizerOp | |
Fields
EOS_FLAG_BIT
Declaration
public static int EOS_FLAG_BIT
Field Value
| Type | Description |
|---|---|
| System.Int32 | |
Methods
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | disposing | |
Overrides
IncrementWord()
Declaration
protected override bool IncrementWord()
Returns
| Type | Description |
|---|---|
| System.Boolean | |
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.IncrementWord()
Reset()
Declaration
public override void Reset()
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.Reset()
SetNextSentence(Int32, Int32)
Declaration
protected override void SetNextSentence(int sentenceStart, int sentenceEnd)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | sentenceStart | |
| System.Int32 | sentenceEnd | |
Overrides
Lucene.Net.Analysis.Util.SegmentingTokenizerBase.SetNextSentence(System.Int32, System.Int32)