Class LowerCaseTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to chaining LetterTokenizer and LowerCaseFilter, doing the two tasks in a single pass is faster, hence this (redundant) implementation.

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating LowerCaseTokenizer:

As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See IsTokenChar(int) and Normalize(int) for details.
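
The equivalence described above can be illustrated with a minimal sketch. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced and uses LuceneVersion.LUCENE_48; the class name LowerCaseTokenizerDemo, the helper CollectTerms, and the sample text are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LowerCaseTokenizerDemo
{
    // Hypothetical helper (not part of Lucene.NET): drains a TokenStream
    // and returns the text of every token it produced.
    public static List<string> CollectTerms(TokenStream stream)
    {
        var terms = new List<string>();
        ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
        try
        {
            stream.Reset();                  // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();                    // signals end of stream to the token stream
        }
        finally
        {
            stream.Dispose();                // releases the underlying reader
        }
        return terms;
    }

    public static void Main()
    {
        const string text = "Hello, World! Lucene.NET 4.8";
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Single pass: split at non-letters and lower-case in one step.
        List<string> combined = CollectTerms(
            new LowerCaseTokenizer(version, new StringReader(text)));

        // Two stages: LetterTokenizer splits, LowerCaseFilter lower-cases.
        List<string> chained = CollectTerms(
            new LowerCaseFilter(version, new LetterTokenizer(version, new StringReader(text))));

        Console.WriteLine(string.Join(" | ", combined));
        Console.WriteLine(string.Join(" | ", chained));
    }
}
```

Under these assumptions both lines should print `hello | world | lucene | net`: punctuation and digits are not letters, so they only act as token boundaries.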

Inheritance

object

AttributeSource

TokenStream

Tokenizer

CharTokenizer

LetterTokenizer

LowerCaseTokenizer

Implements

IDisposable

Inherited Members

CharTokenizer.IncrementToken()

CharTokenizer.End()

CharTokenizer.Reset()

Tokenizer.SetReader(TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(object)

AttributeSource.ReflectAsString(bool)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

object.Equals(object, object)

object.GetType()

object.ReferenceEquals(object, object)

Namespace: Lucene.Net.Analysis.Core

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public sealed class LowerCaseTokenizer : LetterTokenizer, IDisposable

Constructors

LowerCaseTokenizer(LuceneVersion, AttributeFactory, TextReader)

Construct a new LowerCaseTokenizer using a given Lucene.Net.Util.AttributeSource.AttributeFactory.

Declaration

public LowerCaseTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene.Net.Util.LuceneVersion to match
AttributeSource.AttributeFactory	factory	the attribute factory to use for this Lucene.Net.Analysis.Tokenizer
TextReader	in	the input to split up into tokens

LowerCaseTokenizer(LuceneVersion, TextReader)

Construct a new LowerCaseTokenizer.

Declaration

public LowerCaseTokenizer(LuceneVersion matchVersion, TextReader @in)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene.Net.Util.LuceneVersion to match
TextReader	in	the input to split up into tokens
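
As a rough illustration of the two constructor overloads above, the following sketch builds one tokenizer with each. It assumes Lucene.NET 4.8 and that the library's default factory is exposed as AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY; the class ConstructorDemo and the helper ReadTerms are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ConstructorDemo
{
    // Hypothetical helper: consumes the tokenizer and returns its terms.
    // The caller remains responsible for disposing the tokenizer.
    public static List<string> ReadTerms(LowerCaseTokenizer tokenizer)
    {
        var terms = new List<string>();
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
        tokenizer.Reset();
        while (tokenizer.IncrementToken())
        {
            terms.Add(termAtt.ToString());
        }
        tokenizer.End();
        return terms;
    }

    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Two-argument overload: the default attribute factory is used implicitly.
        using (var simple = new LowerCaseTokenizer(version, new StringReader("First Input")))
        {
            Console.WriteLine(string.Join(", ", ReadTerms(simple)));
        }

        // Three-argument overload: the attribute factory is passed explicitly.
        // Assumption: DEFAULT_ATTRIBUTE_FACTORY is the library's default factory.
        using (var withFactory = new LowerCaseTokenizer(
            version,
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader("Second Input")))
        {
            Console.WriteLine(string.Join(", ", ReadTerms(withFactory)));
        }
    }
}
```

Passing the default factory explicitly should behave the same as the two-argument overload; the three-argument form is mainly useful when a custom AttributeFactory is needed.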

Methods

Normalize(int)

Converts the character to lower case using ToLower(int, CultureInfo) in the invariant culture.

Declaration

protected override int Normalize(int c)

Parameters

Type	Name	Description
int	c	the code point to convert to lower case

Returns

Type	Description
int	the lower-cased code point

Overrides

CharTokenizer.Normalize(int)
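
To make the int-based, invariant-culture conversion concrete, here is a minimal sketch that uses only the .NET base class library. It is not the library's implementation: NormalizeSketch and ToLowerCodePoint are hypothetical names, and the sketch merely approximates what Normalize(int) is documented to do for a single Unicode code point.

```csharp
using System;

public static class NormalizeSketch
{
    // Hypothetical stand-in for Normalize(int): lower-cases one Unicode
    // code point using invariant-culture rules. An int is used instead of
    // a char so that code points outside the Basic Multilingual Plane
    // can be passed as a single value.
    // Note: char.ConvertFromUtf32 throws for surrogate code points
    // (0xD800-0xDFFF) and for values above 0x10FFFF.
    public static int ToLowerCodePoint(int codePoint)
    {
        string original = char.ConvertFromUtf32(codePoint);
        string lowered = original.ToLowerInvariant();
        return char.ConvertToUtf32(lowered, 0);
    }

    public static void Main()
    {
        Console.WriteLine((char)ToLowerCodePoint('A'));   // a
        Console.WriteLine((char)ToLowerCodePoint('É'));   // é
        Console.WriteLine((char)ToLowerCodePoint('7'));   // 7 (non-letters are unchanged)
    }
}
```

Using the invariant culture keeps the result independent of the current thread culture. A culture-sensitive conversion can differ: under Turkish rules, for example, 'I' lower-cases to the dotless 'ı' rather than 'i'.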
