    Class CapitalizationFilter

    A filter to apply normal capitalization rules to Tokens. It will make the first letter capital and the rest lower case.

    This filter is particularly useful to build nice looking facet parameters. This filter is not appropriate if you intend to use a prefix query.
    Inheritance
    object
    AttributeSource
    TokenStream
    TokenFilter
    CapitalizationFilter
    Implements
    IDisposable
    Inherited Members
    TokenFilter.End()
    TokenFilter.Reset()
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(object)
    AttributeSource.ReflectAsString(bool)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    object.Equals(object, object)
    object.GetType()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Analysis.Miscellaneous
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public sealed class CapitalizationFilter : TokenFilter, IDisposable
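
    As a rough illustration of the class summary above, the following sketch wraps a WhitespaceTokenizer in a CapitalizationFilter with the default parameters and collects the resulting terms. The CapitalizationDemo class and its Capitalize helper are hypothetical names introduced for this example, and the sketch assumes the Lucene.Net 4.8 analysis packages are referenced.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CapitalizationDemo
{
    // Hypothetical helper (not part of Lucene.Net): splits the text on
    // whitespace and passes every token through a default CapitalizationFilter.
    public static List<string> Capitalize(string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = new CapitalizationFilter(
            new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text))))
        {
            // Retrieve the term attribute once, before consuming the stream.
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Each token gets a capital first letter and a lower-case remainder.
        Console.WriteLine(string.Join(", ", Capitalize("hello WORLD"))); // Hello, World
    }
}
```

    Because the capitalized terms no longer match lower-cased user input character for character, output like this suits display values such as facet labels better than fields queried by prefix.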

    Constructors

    CapitalizationFilter(TokenStream)

    Creates a CapitalizationFilter with the default parameters using the invariant culture.

    Calls CapitalizationFilter(in, true, null, true, null, 0, DEFAULT_MAX_WORD_COUNT, DEFAULT_MAX_TOKEN_LENGTH, null)

    Declaration
    public CapitalizationFilter(TokenStream @in)
    Parameters
    Type Name Description
    TokenStream in

    input tokenstream

    CapitalizationFilter(TokenStream, bool, CharArraySet, bool, ICollection<char[]>, int, int, int)

    Creates a CapitalizationFilter with the specified parameters using the invariant culture.

    Declaration
    public CapitalizationFilter(TokenStream @in, bool onlyFirstWord, CharArraySet keep, bool forceFirstLetter, ICollection<char[]> okPrefix, int minWordLength, int maxWordCount, int maxTokenLength)
    Parameters
    Type Name Description
    TokenStream in

    input tokenstream

    bool onlyFirstWord

    Whether to capitalize only the first word of the token (true) or every word (false).

    CharArraySet keep

    A set of words to keep: words in this set are left with their existing capitalization.

    bool forceFirstLetter

    Force the first letter to be capitalized even if it is in the keep list.

    ICollection<char[]> okPrefix

    do not change word capitalization if a word begins with something in this list.

    int minWordLength

    How long a word needs to be for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".

    int maxWordCount

    If the token contains more than maxWordCount words, the capitalization is assumed to be correct.

    int maxTokenLength

    The maximum length for an individual token. Tokens that exceed this length will not have the capitalization operation performed.
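
    The parameters above can be combined into a simple title-casing setup. The sketch below is only an illustration: TitleCaseDemo and ToTitle are hypothetical names, the keep words are arbitrary, and it assumes the Lucene.Net 4.8 CharArraySet constructor that accepts a collection of strings. A KeywordTokenizer is used so that the whole input arrives as one multi-word token.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class TitleCaseDemo
{
    // Hypothetical helper (not part of Lucene.Net): treats the input as a
    // single token and capitalizes every word except those in the keep set.
    public static string ToTitle(string text)
    {
        // Words to leave as they are; the last argument enables ignoreCase matching.
        var keep = new CharArraySet(LuceneVersion.LUCENE_48, new[] { "of", "the" }, true);

        using (TokenStream stream = new CapitalizationFilter(
            new KeywordTokenizer(new StringReader(text)),
            onlyFirstWord: false,      // capitalize every word, not just the first
            keep: keep,
            forceFirstLetter: true,    // first letter is capitalized even for a keep word
            okPrefix: null,
            minWordLength: 0,
            maxWordCount: CapitalizationFilter.DEFAULT_MAX_WORD_COUNT,
            maxTokenLength: CapitalizationFilter.DEFAULT_MAX_TOKEN_LENGTH))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            string result = stream.IncrementToken() ? termAtt.ToString() : string.Empty;
            stream.End();
            return result;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ToTitle("the lord of the rings")); // The Lord of the Rings
    }
}
```

    Here the leading "the" is capitalized only because forceFirstLetter is true; the later "the" and "of" stay lower case because they are in the keep set.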

    CapitalizationFilter(TokenStream, bool, CharArraySet, bool, ICollection<char[]>, int, int, int, CultureInfo)

    Creates a CapitalizationFilter with the specified parameters and the specified culture.

    Declaration
    public CapitalizationFilter(TokenStream @in, bool onlyFirstWord, CharArraySet keep, bool forceFirstLetter, ICollection<char[]> okPrefix, int minWordLength, int maxWordCount, int maxTokenLength, CultureInfo culture)
    Parameters
    Type Name Description
    TokenStream in

    input tokenstream

    bool onlyFirstWord

    Whether to capitalize only the first word of the token (true) or every word (false).

    CharArraySet keep

    A set of words to keep: words in this set are left with their existing capitalization.

    bool forceFirstLetter

    Force the first letter to be capitalized even if it is in the keep list.

    ICollection<char[]> okPrefix

    do not change word capitalization if a word begins with something in this list.

    int minWordLength

    How long a word needs to be for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".

    int maxWordCount

    If the token contains more than maxWordCount words, the capitalization is assumed to be correct.

    int maxTokenLength

    The maximum length for an individual token. Tokens that exceed this length will not have the capitalization operation performed.

    CultureInfo culture

    The culture to use for the casing operation. If null, InvariantCulture will be used.

    CapitalizationFilter(TokenStream, CultureInfo)

    Creates a CapitalizationFilter with the default parameters and the specified culture.

    Calls CapitalizationFilter(in, true, null, true, null, 0, DEFAULT_MAX_WORD_COUNT, DEFAULT_MAX_TOKEN_LENGTH, culture)

    Declaration
    public CapitalizationFilter(TokenStream @in, CultureInfo culture)
    Parameters
    Type Name Description
    TokenStream in

    input tokenstream

    CultureInfo culture

    The culture to use for the casing operation. If null, InvariantCulture will be used.
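
    Why the culture matters can be seen with plain .NET casing calls, independent of Lucene.Net. The sketch below is an assumption-laden illustration rather than the filter's implementation: UpperFirst is a hypothetical helper that falls back to the invariant culture when null is passed, mirroring the parameter description above. The Turkish result depends on culture data being available to the runtime; with globalization-invariant mode enabled, creating the tr-TR culture may fail or behave like the invariant culture.

```csharp
using System;
using System.Globalization;

public static class CultureCasingDemo
{
    // Hypothetical helper (not part of Lucene.Net): upper-cases the first
    // character and lower-cases the rest using the given culture's rules.
    public static string UpperFirst(string word, CultureInfo culture)
    {
        if (string.IsNullOrEmpty(word))
        {
            return word;
        }
        // A null culture falls back to the invariant culture.
        TextInfo textInfo = (culture ?? CultureInfo.InvariantCulture).TextInfo;
        return textInfo.ToUpper(word[0]) + textInfo.ToLower(word.Substring(1));
    }

    public static void Main()
    {
        // Invariant culture: 'i' upper-cases to the plain Latin 'I'.
        Console.WriteLine(UpperFirst("iSTANBUL", null)); // Istanbul

        // Turkish culture: with culture data available, 'i' upper-cases to
        // the dotted capital letter U+0130 instead of 'I'.
        Console.WriteLine(UpperFirst("iSTANBUL", new CultureInfo("tr-TR")));
    }
}
```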

    Fields

    DEFAULT_MAX_TOKEN_LENGTH

    The default value for the maxTokenLength constructor parameter, used by the constructors that do not take that parameter.
    Declaration
    public static readonly int DEFAULT_MAX_TOKEN_LENGTH
    Field Value
    Type Description
    int

    DEFAULT_MAX_WORD_COUNT

    The default value for the maxWordCount constructor parameter, used by the constructors that do not take that parameter.
    Declaration
    public static readonly int DEFAULT_MAX_WORD_COUNT
    Field Value
    Type Description
    int

    Methods

    IncrementToken()

    Consumers (e.g., Lucene.Net.Index.IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate Lucene.Net.Util.IAttributes with the attributes of the next token.

    The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change them. If the producer needs to preserve the state for subsequent calls, it can use Lucene.Net.Util.AttributeSource.CaptureState() to create a copy of the current attribute state.

    This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to Lucene.Net.Util.AttributeSource.AddAttribute<T>() and Lucene.Net.Util.AttributeSource.GetAttribute<T>() on every token, references to all Lucene.Net.Util.IAttributes that this stream uses should be retrieved during instantiation.

    To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in Lucene.Net.Analysis.TokenStream.IncrementToken().
    Declaration
    public override bool IncrementToken()
    Returns
    Type Description
    bool

    false for end of stream; true otherwise

    Overrides
    Lucene.Net.Analysis.TokenStream.IncrementToken()
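
    The contract described above (attributes added once during instantiation, then updated in place on each call) can be sketched with a minimal custom filter. UpperFirstCharFilter is a hypothetical class written for this example and is far simpler than CapitalizationFilter itself; the sketch assumes the Lucene.Net 4.8 TokenFilter base class, which exposes the wrapped stream as the protected m_input field.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical filter (not part of Lucene.Net): upper-cases only the first
// character of each token, using invariant casing rules.
public sealed class UpperFirstCharFilter : TokenFilter
{
    // Retrieved once in the constructor so IncrementToken() does no lookups.
    private readonly ICharTermAttribute termAtt;

    public UpperFirstCharFilter(TokenStream input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        // Advance the wrapped stream; false signals the end of the stream.
        if (!m_input.IncrementToken())
        {
            return false;
        }

        // Modify the shared term buffer in place for the current token.
        if (termAtt.Length > 0)
        {
            char[] buffer = termAtt.Buffer;
            buffer[0] = char.ToUpperInvariant(buffer[0]);
        }
        return true;
    }
}
```

    A consumer drives such a filter the same way as any other TokenStream: call Reset(), loop on IncrementToken() until it returns false, then call End() and Dispose().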

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.