Class CapitalizationFilter
A filter to apply normal capitalization rules to Tokens. It will make the first letter capital and the rest lower case.
This filter is particularly useful for building nice-looking facet parameters. It is not appropriate if you intend to use a prefix query.
Implements
IDisposable
Inherited Members
Namespace: Lucene.Net.Analysis.Miscellaneous
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class CapitalizationFilter : TokenFilter, IDisposable
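The following is a minimal usage sketch, not an official sample. It assumes the Lucene.Net 4.8 packages are referenced and uses KeywordTokenizer so that the whole input reaches the filter as a single token. The class name CapitalizationFilterDemo and the helper Capitalize are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;

public static class CapitalizationFilterDemo
{
    // Hypothetical helper: feeds the input through CapitalizationFilter
    // with its default parameters and collects the resulting terms.
    public static IList<string> Capitalize(string text)
    {
        var terms = new List<string>();

        // KeywordTokenizer emits the entire input as one token.
        Tokenizer source = new KeywordTokenizer(new StringReader(text));

        using (TokenStream stream = new CapitalizationFilter(source))
        {
            // Obtain the attribute reference once, before consuming the stream.
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }

        return terms;
    }

    public static void Main()
    {
        foreach (string term in Capitalize("hELLO"))
        {
            Console.WriteLine(term);
        }
    }
}
```

Swapping KeywordTokenizer for a tokenizer that splits on whitespace would make the filter see one word per token instead of one multi-word token.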
Constructors
CapitalizationFilter(TokenStream)
Creates a CapitalizationFilter with the default parameters using the invariant culture.
Declaration
public CapitalizationFilter(TokenStream @in)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in | input tokenstream |
CapitalizationFilter(TokenStream, bool, CharArraySet, bool, ICollection<char[]>, int, int, int)
Creates a CapitalizationFilter with the specified parameters using the invariant culture.
Declaration
public CapitalizationFilter(TokenStream @in, bool onlyFirstWord, CharArraySet keep, bool forceFirstLetter, ICollection<char[]> okPrefix, int minWordLength, int maxWordCount, int maxTokenLength)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in | input tokenstream |
bool | onlyFirstWord | If true, only the first word of the token is capitalized; if false, every word is capitalized. |
CharArraySet | keep | A set of words to keep unchanged. |
bool | forceFirstLetter | Force the first letter to be capitalized even if it is in the keep list. |
ICollection<char[]> | okPrefix | Do not change word capitalization if a word begins with something in this list. |
int | minWordLength | How long a word needs to be for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or". |
int | maxWordCount | If the token contains more than maxWordCount words, the capitalization is assumed to be correct. |
int | maxTokenLength | The maximum length for an individual token. Tokens that exceed this length will not have the capitalization operation performed. |
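As a hedged sketch of how these parameters might be combined, the example below builds a title-casing stream. It assumes Lucene.Net 4.8 (LuceneVersion.LUCENE_48) and the CharArraySet constructor that takes a string collection. The keep words, the "McK" prefix, the numeric limits, and the names TitleCaseDemo and TitleCase are illustrative assumptions, not recommended settings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class TitleCaseDemo
{
    // Hypothetical helper: returns the first (and only) token after filtering.
    public static string TitleCase(string text)
    {
        // Words the filter should leave as they are (case-insensitive match).
        var keep = new CharArraySet(
            LuceneVersion.LUCENE_48, new[] { "and", "of", "the" }, true);

        // Words starting with one of these prefixes keep their existing casing.
        var okPrefix = new List<char[]> { "McK".ToCharArray() };

        Tokenizer source = new KeywordTokenizer(new StringReader(text));

        using (TokenStream stream = new CapitalizationFilter(
            source,
            false,     // onlyFirstWord: capitalize every word
            keep,      // keep
            true,      // forceFirstLetter
            okPrefix,  // okPrefix
            3,         // minWordLength
            10,        // maxWordCount
            100))      // maxTokenLength
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            string result = stream.IncrementToken() ? termAtt.ToString() : null;
            stream.End();
            return result;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TitleCase("the lord of the rings"));
    }
}
```

Because forceFirstLetter is true, a leading keep word such as "the" still has its first letter capitalized, while later keep words are left untouched.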
CapitalizationFilter(TokenStream, bool, CharArraySet, bool, ICollection<char[]>, int, int, int, CultureInfo)
Creates a CapitalizationFilter with the specified parameters and the specified culture.
Declaration
public CapitalizationFilter(TokenStream @in, bool onlyFirstWord, CharArraySet keep, bool forceFirstLetter, ICollection<char[]> okPrefix, int minWordLength, int maxWordCount, int maxTokenLength, CultureInfo culture)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in | input tokenstream |
bool | onlyFirstWord | If true, only the first word of the token is capitalized; if false, every word is capitalized. |
CharArraySet | keep | A set of words to keep unchanged. |
bool | forceFirstLetter | Force the first letter to be capitalized even if it is in the keep list. |
ICollection<char[]> | okPrefix | Do not change word capitalization if a word begins with something in this list. |
int | minWordLength | How long a word needs to be for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or". |
int | maxWordCount | If the token contains more than maxWordCount words, the capitalization is assumed to be correct. |
int | maxTokenLength | The maximum length for an individual token. Tokens that exceed this length will not have the capitalization operation performed. |
CultureInfo | culture | The culture to use for the casing operation. If null, InvariantCulture will be used. |
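To illustrate why the culture parameter can matter, here is a small sketch that uses only the .NET base library. It is not the filter's actual implementation: CultureCasingDemo and CapitalizeWord are hypothetical names, the helper simply applies "first letter upper, rest lower" with a given culture, and it ignores surrogate pairs. The Turkish result also assumes that culture data for tr-TR is available on the machine.

```csharp
using System;
using System.Globalization;

public static class CultureCasingDemo
{
    // Hypothetical helper: uppercases the first character and lowercases
    // the rest, using the casing rules of the supplied culture.
    // A null culture falls back to the invariant culture.
    public static string CapitalizeWord(string word, CultureInfo culture)
    {
        if (string.IsNullOrEmpty(word))
        {
            return word;
        }

        TextInfo textInfo = (culture ?? CultureInfo.InvariantCulture).TextInfo;
        return textInfo.ToUpper(word[0]) + textInfo.ToLower(word.Substring(1));
    }

    public static void Main()
    {
        // Invariant casing maps 'i' to 'I'.
        Console.WriteLine(CapitalizeWord("istanbul", CultureInfo.InvariantCulture));

        // Turkish casing rules map 'i' to a dotted capital I instead.
        Console.WriteLine(CapitalizeWord("istanbul", new CultureInfo("tr-TR")));
    }
}
```

If indexed terms and query terms must match regardless of the machine's locale, passing the same explicit culture (or relying on the invariant default) on both sides avoids this kind of mismatch.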
CapitalizationFilter(TokenStream, CultureInfo)
Creates a CapitalizationFilter with the default parameters and the specified culture.
Declaration
public CapitalizationFilter(TokenStream @in, CultureInfo culture)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in | input tokenstream |
CultureInfo | culture | The culture to use for the casing operation. If null, InvariantCulture will be used. |
Fields
DEFAULT_MAX_TOKEN_LENGTH
Declaration
public static readonly int DEFAULT_MAX_TOKEN_LENGTH
Field Value
Type | Description |
---|---|
int |
DEFAULT_MAX_WORD_COUNT
Declaration
public static readonly int DEFAULT_MAX_WORD_COUNT
Field Value
Type | Description |
---|---|
int |
Methods
IncrementToken()
Consumers (such as Lucene.Net.Index.IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate Lucene.Net.Util.IAttributes with the attributes of the next token.
The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change them. If the producer needs to preserve the state for subsequent calls, it can use Lucene.Net.Util.AttributeSource.CaptureState() to create a copy of the current attribute state.
This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to Lucene.Net.Util.AttributeSource.AddAttribute<T>() and Lucene.Net.Util.AttributeSource.GetAttribute<T>(), references to all Lucene.Net.Util.IAttributes that this stream uses should be retrieved during instantiation.
To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in Lucene.Net.Analysis.TokenStream.IncrementToken().
Declaration
public override bool IncrementToken()
Returns
Type | Description |
---|---|
bool | false for end of stream; true otherwise |
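The contract described for IncrementToken() can be sketched with a small custom filter. This is an illustrative example assuming Lucene.Net 4.8, where TokenFilter exposes the wrapped stream as m_input. TrailingDotFilter and TrailingDotFilterDemo are hypothetical types that are not part of Lucene.Net. The attribute reference is obtained once in the constructor, and IncrementToken() returns false at the end of the stream.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical filter for illustration: strips one trailing '.' from each token.
public sealed class TrailingDotFilter : TokenFilter
{
    // Retrieved once during instantiation, as the documentation recommends.
    private readonly ICharTermAttribute termAtt;

    public TrailingDotFilter(TokenStream input) : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        // Advance the wrapped stream; false signals the end of the stream.
        if (!m_input.IncrementToken())
        {
            return false;
        }

        int length = termAtt.Length;
        if (length > 0 && termAtt.Buffer[length - 1] == '.')
        {
            termAtt.SetLength(length - 1);
        }
        return true;
    }
}

public static class TrailingDotFilterDemo
{
    // Hypothetical helper: returns the first token produced for the input.
    public static string FirstTerm(string text)
    {
        Tokenizer source = new KeywordTokenizer(new StringReader(text));
        using (TokenStream stream = new TrailingDotFilter(source))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            string result = stream.IncrementToken() ? termAtt.ToString() : null;
            stream.End();
            return result;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstTerm("example."));
    }
}
```

The consumer side follows the same pattern as the producer: it requests its attribute reference once, calls Reset(), and then loops on IncrementToken() until it returns false.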