Class LowerCaseTokenizer
LowerCaseTokenizer performs the function of LetterTokenizer
and LowerCaseFilter together. It divides text at non-letters and converts
the resulting tokens to lower case. While it is functionally equivalent to the
combination of LetterTokenizer and LowerCaseFilter, doing both tasks in a single
pass is faster, hence this (redundant) implementation.
Note: this works reasonably well for most European languages, but poorly
for some Asian languages, where words are not separated by spaces.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating
a LowerCaseTokenizer.
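A minimal usage sketch, assuming the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced. The class name LowerCaseTokenizerExample and the helper Tokenize are hypothetical names chosen for illustration. It passes LuceneVersion.LUCENE_48 as the match version and reads each term through ICharTermAttribute:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.NET.
public static class LowerCaseTokenizerExample
{
    // Splits the text at non-letters and returns the lower-cased tokens.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        using (var reader = new StringReader(text))
        using (var tokenizer = new LowerCaseTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                   // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                     // signals end of stream
        }
        return tokens;
    }

    public static void Main()
    {
        // Digits and punctuation are not letters, so they only act as separators.
        Console.WriteLine(string.Join(" | ", Tokenize("Hello, World! It's 2024")));
        // prints: hello | world | it | s
    }
}
```

Note that the apostrophe in "It's" splits the word into two tokens, and "2024" produces no token at all, because only letters are kept.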
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
Lucene.Net.Analysis.Util.CharTokenizer
Lucene.Net.Analysis.Core.LetterTokenizer
LowerCaseTokenizer
Implements
System.IDisposable
Inherited Members
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class LowerCaseTokenizer : LetterTokenizer, IDisposable
Constructors
LowerCaseTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Constructs a new LowerCaseTokenizer using the given
Lucene.Net.Util.AttributeSource.AttributeFactory.
Declaration
public LowerCaseTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)
Parameters
Type | Name | Description
Lucene.Net.Util.LuceneVersion | matchVersion | Lucene.Net.Util.LuceneVersion to match
Lucene.Net.Util.AttributeSource.AttributeFactory | factory | the attribute factory to use for this Lucene.Net.Analysis.Tokenizer
System.IO.TextReader | in | the input to split up into tokens
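As a hedged sketch of this overload, the example below passes the default attribute factory explicitly and also reads token offsets. It assumes the Lucene.Net 4.8 packages are referenced. FactoryConstructorExample and TokenizeWithOffsets are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.NET.
public static class FactoryConstructorExample
{
    // Returns each lower-cased token with its [start,end) character offsets.
    public static List<string> TokenizeWithOffsets(string text)
    {
        var result = new List<string>();

        // The default factory; a custom AttributeFactory could be supplied instead.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        using (var tokenizer = new LowerCaseTokenizer(
            LuceneVersion.LUCENE_48, factory, new StringReader(text)))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                result.Add(term.ToString() + "[" + offset.StartOffset + "," + offset.EndOffset + ")");
            }
            tokenizer.End();
        }
        return result;
    }
}
```

For the input "Foo-Bar" this sketch yields "foo[0,3)" and "bar[4,7)": the hyphen separates the tokens, and the offsets still refer to positions in the original text.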
LowerCaseTokenizer(LuceneVersion, TextReader)
Declaration
public LowerCaseTokenizer(LuceneVersion matchVersion, TextReader @in)
Parameters
Type | Name | Description
Lucene.Net.Util.LuceneVersion | matchVersion | Lucene.Net.Util.LuceneVersion to match
System.IO.TextReader | in | the input to split up into tokens
Methods
Normalize(Int32)
Converts char to lower case using
J2N.Character.ToLower(System.Int32,System.Globalization.CultureInfo) in the invariant culture.
Declaration
protected override int Normalize(int c)
Parameters
Type | Name | Description
System.Int32 | c |
Returns
Type | Description
System.Int32 |
Overrides
Lucene.Net.Analysis.Util.CharTokenizer.Normalize(System.Int32)