Class EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s).
This Lucene.Net.Analysis.Tokenizer creates n-grams from the beginning edge or ending edge of an input token.
As of Lucene 4.4, this tokenizer:
- can handle maxGram larger than 1024 chars, but beware that this will result in increased memory usage,
- doesn't trim the input,
- sets position increments equal to 1 instead of 1 for the first token and 0 for all other ones,
- doesn't support backward n-grams anymore,
- supports IsTokenChar(int) pre-tokenization,
- correctly handles supplementary characters.
Although highly discouraged, it is still possible to use the old behavior through Lucene43EdgeNGramTokenizer.
Implements
IDisposable
Namespace: Lucene.Net.Analysis.NGram
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class EdgeNGramTokenizer : NGramTokenizer, IDisposable
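A minimal usage sketch, assuming Lucene.NET 4.8 and the LUCENE_48 match version, that emits the front-edge n-grams of a short input:

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Emit the front-edge n-grams of "hello" with sizes 1 through 3.
using (var tokenizer = new EdgeNGramTokenizer(
    LuceneVersion.LUCENE_48, new StringReader("hello"), minGram: 1, maxGram: 3))
{
    ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
    tokenizer.Reset();
    while (tokenizer.IncrementToken())
    {
        Console.WriteLine(termAtt.ToString()); // h, he, hel
    }
    tokenizer.End();
}
```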
Constructors
EdgeNGramTokenizer(LuceneVersion, AttributeFactory, TextReader, int, int)
Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
Declaration
public EdgeNGramTokenizer(LuceneVersion version, AttributeSource.AttributeFactory factory, TextReader input, int minGram, int maxGram)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | the Lucene match version; see Lucene.Net.Util.LuceneVersion |
AttributeSource.AttributeFactory | factory | Lucene.Net.Util.AttributeSource.AttributeFactory to use |
TextReader | input | TextReader holding the input to be tokenized |
int | minGram | the smallest n-gram to generate |
int | maxGram | the largest n-gram to generate |
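A construction sketch, assuming Lucene.NET 4.8: this passes the default attribute factory explicitly, which should behave like the simpler overload below.

```csharp
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Util;

// Explicitly pass the default attribute factory (dispose when done).
var tokenizer = new EdgeNGramTokenizer(
    LuceneVersion.LUCENE_48,
    AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
    new StringReader("search"),
    minGram: 2,
    maxGram: 4);
```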
EdgeNGramTokenizer(LuceneVersion, TextReader, int, int)
Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
Declaration
public EdgeNGramTokenizer(LuceneVersion version, TextReader input, int minGram, int maxGram)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | the Lucene match version; see Lucene.Net.Util.LuceneVersion |
TextReader | input | TextReader holding the input to be tokenized |
int | minGram | the smallest n-gram to generate |
int | maxGram | the largest n-gram to generate |
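A construction sketch (again assuming LUCENE_48) that builds prefix grams of sizes 2 through 4 for a term, a common setup for type-ahead matching:

```csharp
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Util;

// Prefix grams of "lucene" with sizes 2..4: "lu", "luc", "luce".
var tokenizer = new EdgeNGramTokenizer(
    LuceneVersion.LUCENE_48, new StringReader("lucene"), minGram: 2, maxGram: 4);
```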
Fields
DEFAULT_MAX_GRAM_SIZE
The default maximum gram size.
Declaration
public const int DEFAULT_MAX_GRAM_SIZE = 1
Field Value
Type | Description |
---|---|
int |
DEFAULT_MIN_GRAM_SIZE
The default minimum gram size.
Declaration
public const int DEFAULT_MIN_GRAM_SIZE = 1
Field Value
Type | Description |
---|---|
int |
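Since both defaults are 1, constructing a tokenizer with them yields only the first character of the input; a sketch assuming LUCENE_48:

```csharp
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Util;

// DEFAULT_MIN_GRAM_SIZE == DEFAULT_MAX_GRAM_SIZE == 1, so for "hello"
// this produces the single gram "h".
var tokenizer = new EdgeNGramTokenizer(
    LuceneVersion.LUCENE_48,
    new StringReader("hello"),
    EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE,
    EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE);
```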