Class EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s).
This Tokenizer creates n-grams from the beginning edge or ending edge of an input token.
As of Lucene 4.4, this tokenizer
- can handle maxGram larger than 1024 chars, but beware that this will result in increased memory usage,
- doesn't trim the input,
- sets position increments equal to 1 instead of 1 for the first token and 0 for all other ones
- doesn't support backward n-grams anymore,
- supports IsTokenChar(Int32) pre-tokenization,
- correctly handles supplementary characters.
Although highly discouraged, it is still possible to use the old behavior through Lucene43EdgeNGramTokenizer.
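To make the term "edge n-gram" concrete, the following is a minimal, self-contained sketch of front-edge n-gram generation. It is not the Lucene.NET implementation: `EdgeNGramSketch` and `FrontEdgeNGrams` are hypothetical names introduced only for illustration, and the sketch assumes the whole input is treated as a single token. It counts gram sizes in code points rather than UTF-16 chars, which is one plausible reading of "correctly handles supplementary characters".

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramSketch
{
    // Hypothetical helper, not part of Lucene.NET: returns the prefixes of
    // `input` whose length in code points falls within [minGram, maxGram].
    public static List<string> FrontEdgeNGrams(string input, int minGram, int maxGram)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (minGram < 1) throw new ArgumentOutOfRangeException(nameof(minGram));
        if (minGram > maxGram) throw new ArgumentException("minGram must not exceed maxGram");

        var grams = new List<string>();
        int charIndex = 0;   // UTF-16 offset just past the current prefix
        int codePoints = 0;  // number of code points in the current prefix

        while (charIndex < input.Length && codePoints < maxGram)
        {
            // Advance by one code point; a surrogate pair occupies two chars.
            charIndex += char.IsSurrogatePair(input, charIndex) ? 2 : 1;
            codePoints++;

            if (codePoints >= minGram)
            {
                grams.Add(input.Substring(0, charIndex));
            }
        }
        return grams;
    }
}
```

Under these assumptions, `EdgeNGramSketch.FrontEdgeNGrams("lucene", 1, 3)` yields `l`, `lu`, `luc`.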
Namespace: Lucene.Net.Analysis.NGram
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class EdgeNGramTokenizer : NGramTokenizer, IDisposable
Constructors
EdgeNGramTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader, Int32, Int32)
Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range.
Declaration
public EdgeNGramTokenizer(LuceneVersion version, AttributeSource.AttributeFactory factory, TextReader input, int minGram, int maxGram)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | the Lucene match version - See LuceneVersion |
AttributeSource.AttributeFactory | factory | the AttributeSource.AttributeFactory to use |
System.IO.TextReader | input | System.IO.TextReader holding the input to be tokenized |
System.Int32 | minGram | the smallest n-gram to generate |
System.Int32 | maxGram | the largest n-gram to generate |
EdgeNGramTokenizer(LuceneVersion, TextReader, Int32, Int32)
Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range.
Declaration
public EdgeNGramTokenizer(LuceneVersion version, TextReader input, int minGram, int maxGram)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | the Lucene match version - See LuceneVersion |
System.IO.TextReader | input | System.IO.TextReader holding the input to be tokenized |
System.Int32 | minGram | the smallest n-gram to generate |
System.Int32 | maxGram | the largest n-gram to generate |
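A usage sketch for this constructor follows. It assumes the Lucene.Net.Analysis.Common 4.8 package is referenced, that `LuceneVersion.LUCENE_48` is the desired match version, and that terms are read through `ICharTermAttribute`. The wrapper names `EdgeNGramUsage` and `Tokenize` are hypothetical and exist only to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class EdgeNGramUsage
{
    // Hypothetical wrapper: collects the terms emitted by EdgeNGramTokenizer
    // for `text`, using gram sizes from minGram to maxGram.
    public static List<string> Tokenize(string text, int minGram, int maxGram)
    {
        var terms = new List<string>();

        using (var reader = new StringReader(text))
        using (var tokenizer = new EdgeNGramTokenizer(LuceneVersion.LUCENE_48, reader, minGram, maxGram))
        {
            // The attribute instance is reused; its value changes on each IncrementToken().
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();                   // finalizes end-of-stream state
        }
        return terms;
    }
}
```

With the post-4.4 behavior described above, `EdgeNGramUsage.Tokenize("lucene", 1, 3)` is expected to return the front-edge grams `l`, `lu`, `luc`.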
Fields
DEFAULT_MAX_GRAM_SIZE
Declaration
public const int DEFAULT_MAX_GRAM_SIZE = 1
Field Value
Type | Description |
---|---|
System.Int32 |
DEFAULT_MIN_GRAM_SIZE
Declaration
public const int DEFAULT_MIN_GRAM_SIZE = 1
Field Value
Type | Description |
---|---|
System.Int32 |