Class ICUTokenizer

Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/)

Words are first broken across script boundaries, then segmented according to the BreakIterator and token typing provided by the ICUTokenizerConfig.

This is a Lucene.NET EXPERIMENTAL API; use it at your own risk.
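The description above can be illustrated with a minimal sketch of consuming the tokenizer. It follows the usual TokenStream workflow (Reset, IncrementToken in a loop, End, Dispose) and reads each term through ICharTermAttribute. The class name IcuTokenizerExample and its Tokenize helper are hypothetical names introduced only for illustration, and the sketch assumes the Lucene.Net.ICU package is referenced.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper class, not part of Lucene.NET.
public static class IcuTokenizerExample
{
    // Collects the term text of every token produced for the given input.
    public static IList<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // Default script-specific handling and default attribute factory.
        using (var tokenizer = new ICUTokenizer(new StringReader(text)))
        {
            // Register the attribute before consuming the stream.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before IncrementToken()
            while (tokenizer.IncrementToken())  // false once the input is exhausted
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();                    // finalize end-of-stream state
        }

        return terms;
    }
}
```

Calling IcuTokenizerExample.Tokenize("Hello world") should yield the two terms "Hello" and "world". The tokenizer itself does not lowercase or otherwise normalize terms; that is left to downstream filters.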

Inheritance

System.Object

AttributeSource

TokenStream

Tokenizer

ICUTokenizer

Implements

IDisposable

Inherited Members

Tokenizer.m_input

Tokenizer.Dispose(Boolean)

Tokenizer.CorrectOffset(Int32)

Tokenizer.SetReader(TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

Namespace: Lucene.Net.Analysis.Icu.Segmentation

Assembly: Lucene.Net.ICU.dll

Syntax

public sealed class ICUTokenizer : Tokenizer, IDisposable

Constructors

ICUTokenizer(AttributeSource.AttributeFactory, TextReader, ICUTokenizerConfig)

Constructs a new ICUTokenizer that breaks text into words from the given TextReader, using a tailored configuration.

Declaration

public ICUTokenizer(AttributeSource.AttributeFactory factory, TextReader input, ICUTokenizerConfig config)

Parameters

Type	Name	Description
AttributeSource.AttributeFactory	factory	AttributeSource.AttributeFactory to use.
TextReader	input	TextReader containing text to tokenize.
ICUTokenizerConfig	config	Tailored configuration.

ICUTokenizer(TextReader)

Constructs a new ICUTokenizer that breaks text into words from the given TextReader.

Declaration

public ICUTokenizer(TextReader input)

Parameters

Type	Name	Description
TextReader	input	TextReader containing text to tokenize.

Remarks

The default script-specific handling is used.

The default attribute factory is used.

ICUTokenizer(TextReader, ICUTokenizerConfig)

Constructs a new ICUTokenizer that breaks text into words from the given TextReader, using a tailored configuration.

Declaration

public ICUTokenizer(TextReader input, ICUTokenizerConfig config)

Parameters

Type	Name	Description
TextReader	input	TextReader containing text to tokenize.
ICUTokenizerConfig	config	Tailored configuration.

Remarks

The default attribute factory is used.
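As a hedged sketch of passing a tailored configuration, the example below supplies a DefaultICUTokenizerConfig and reads each token's type alongside its term. It assumes that DefaultICUTokenizerConfig exposes a two-argument constructor (cjkAsWords, myanmarAsWords), which should be checked against the installed Lucene.Net.ICU version. The class name IcuTokenizerConfigExample and its Tokenize helper are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper class, not part of Lucene.NET.
public static class IcuTokenizerConfigExample
{
    // Returns (term, type) pairs for every token in the input.
    public static IList<KeyValuePair<string, string>> Tokenize(string text)
    {
        var result = new List<KeyValuePair<string, string>>();

        // Assumption: the constructor signature is (bool cjkAsWords, bool myanmarAsWords).
        ICUTokenizerConfig config = new DefaultICUTokenizerConfig(true, true);

        using (var tokenizer = new ICUTokenizer(new StringReader(text), config))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // The token type is assigned by the ICUTokenizerConfig.
                result.Add(new KeyValuePair<string, string>(termAtt.ToString(), typeAtt.Type));
            }
            tokenizer.End();
        }

        return result;
    }
}
```

A custom subclass of ICUTokenizerConfig could be passed in the same way when the default break rules or token types do not fit a particular script.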

Methods

End()

Declaration

public override void End()

Overrides

TokenStream.End()

IncrementToken()

Declaration

public override bool IncrementToken()

Returns

Type	Description
System.Boolean

Overrides

TokenStream.IncrementToken()

Reset()

Declaration

public override void Reset()

Overrides

Tokenizer.Reset()

Implements

IDisposable
