    Class LowerCaseTokenizer

    LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. While it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.

    Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating LowerCaseTokenizer:

    • As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.
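
The equivalence described above can be sketched as follows. This is a minimal illustration, not part of the library: the class name `LowerCaseTokenizerDemo`, the helper `Collect`, and the sample text are hypothetical, and `LuceneVersion.LUCENE_48` is assumed to be the version you want to match. Both streams are expected to yield the same terms.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LowerCaseTokenizerDemo
{
    // Hypothetical helper: drains a TokenStream into a list of term strings.
    public static List<string> Collect(TokenStream stream)
    {
        var terms = new List<string>();
        ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
        try
        {
            stream.Reset();                      // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());   // copy the current term text
            }
            stream.End();                        // signal end of stream
        }
        finally
        {
            stream.Dispose();                    // releases the underlying reader
        }
        return terms;
    }

    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48; // assumed version
        const string text = "Hello, World! FOO-bar";

        // One pass: split at non-letters and lowercase at the same time.
        List<string> combined = Collect(
            new LowerCaseTokenizer(version, new StringReader(text)));

        // Two passes: LetterTokenizer followed by LowerCaseFilter.
        List<string> twoStep = Collect(
            new LowerCaseFilter(version, new LetterTokenizer(version, new StringReader(text))));

        Console.WriteLine(string.Join(" ", combined)); // hello world foo bar
        Console.WriteLine(string.Join(" ", twoStep));  // hello world foo bar
    }
}
```

Punctuation, digits, and whitespace all act as token boundaries here, which is why "FOO-bar" becomes two terms.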

    Inheritance
    System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    CharTokenizer
    LetterTokenizer
    LowerCaseTokenizer
    Implements
    System.IDisposable
    Inherited Members
    LetterTokenizer.IsTokenChar(Int32)
    CharTokenizer.IncrementToken()
    CharTokenizer.End()
    CharTokenizer.Reset()
    Lucene.Net.Analysis.Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    Lucene.Net.Analysis.TokenStream.Dispose()
    Lucene.Net.Util.AttributeSource.GetAttributeFactory()
    Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
    Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
    Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
    Lucene.Net.Util.AttributeSource.AddAttribute<T>()
    Lucene.Net.Util.AttributeSource.HasAttributes
    Lucene.Net.Util.AttributeSource.HasAttribute<T>()
    Lucene.Net.Util.AttributeSource.GetAttribute<T>()
    Lucene.Net.Util.AttributeSource.ClearAttributes()
    Lucene.Net.Util.AttributeSource.CaptureState()
    Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
    Lucene.Net.Util.AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
    Lucene.Net.Util.AttributeSource.CloneAttributes()
    Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
    Lucene.Net.Util.AttributeSource.ToString()
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: Lucene.Net.Analysis.Core
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class LowerCaseTokenizer : LetterTokenizer, IDisposable

    Constructors

    LowerCaseTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)

    Construct a new LowerCaseTokenizer using a given Lucene.Net.Util.AttributeSource.AttributeFactory.

    Declaration
    public LowerCaseTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader @in)
    Parameters
    Type                                              Name          Description
    Lucene.Net.Util.LuceneVersion                     matchVersion  Lucene.Net.Util.LuceneVersion to match
    Lucene.Net.Util.AttributeSource.AttributeFactory  factory       the attribute factory to use for this Lucene.Net.Analysis.Tokenizer
    System.IO.TextReader                              in            the input to split up into tokens
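
As a rough sketch of calling this overload, the snippet below passes an explicit attribute factory. It assumes that `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` is available in your Lucene.Net version and that `LuceneVersion.LUCENE_48` is the version to match. The class name `FactoryConstructorDemo` and the method `Tokenize` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class FactoryConstructorDemo
{
    // Hypothetical helper: tokenizes text using the factory-based constructor.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // Assumption: the default factory is exposed under this name.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        using (var tokenizer = new LowerCaseTokenizer(
            LuceneVersion.LUCENE_48, factory, new StringReader(text)))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Tokenize("Custom Factory")));
    }
}
```

Supplying the default factory should behave the same as the two-argument constructor. The overload matters mainly when you want attribute instances created by a custom factory.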

    LowerCaseTokenizer(LuceneVersion, TextReader)

    Construct a new LowerCaseTokenizer.

    Declaration
    public LowerCaseTokenizer(LuceneVersion matchVersion, TextReader @in)
    Parameters
    Type                           Name          Description
    Lucene.Net.Util.LuceneVersion  matchVersion  Lucene.Net.Util.LuceneVersion to match
    System.IO.TextReader           in            the input to split up into tokens

    Methods

    Normalize(Int32)

    Converts char to lower case using J2N.Character.ToLower(System.Int32,System.Globalization.CultureInfo) in the invariant culture.

    Declaration
    protected override int Normalize(int c)
    Parameters
    Type Name Description
    System.Int32 c
    Returns
    Type Description
    System.Int32
    Overrides
    CharTokenizer.Normalize(Int32)
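
Because Normalize(Int32) is protected and the class is sealed, it cannot be called or overridden from user code. The sketch below instead calls the J2N method named in the description directly, to show the per-code-point conversion the tokenizer is documented to apply. The class name `NormalizeSketch` and the method `ToLowerCodePoint` are hypothetical.

```csharp
using System;
using System.Globalization;
using J2N;

public static class NormalizeSketch
{
    // Mirrors the documented behavior: lowercase one code point
    // using the invariant culture.
    public static int ToLowerCodePoint(int codePoint)
    {
        return Character.ToLower(codePoint, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        int lower = ToLowerCodePoint('A');
        Console.WriteLine((char)lower); // a
    }
}
```

Using the invariant culture keeps the result independent of the current thread culture, so the same input produces the same tokens on every machine.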

    Implements

    System.IDisposable
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.