
    Class EdgeNGramTokenizer

    Tokenizes the input from an edge into n-grams of given size(s).

    This Tokenizer creates n-grams from the beginning edge of an input token.
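    The front-edge behaviour can be illustrated without Lucene.Net. The following is a minimal sketch, not the library's implementation: the class EdgeGramSketch and its EdgeGrams method are hypothetical, and the sketch works on a single, already isolated token whose length is counted in Unicode code points, so a surrogate pair counts as one character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.Net: illustrates front-edge
// n-gramming of a single token, counting length in Unicode code points.
public static class EdgeGramSketch
{
    // Returns every prefix of `token` whose length in code points lies in
    // [minGram, maxGram], shortest first.
    public static List<string> EdgeGrams(string token, int minGram, int maxGram)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));
        if (minGram < 1) throw new ArgumentOutOfRangeException(nameof(minGram), "minGram must be at least 1");
        if (minGram > maxGram) throw new ArgumentException("minGram must not be greater than maxGram");

        var grams = new List<string>();
        int codePoints = 0; // length of the current prefix in code points
        int i = 0;          // UTF-16 index just past the current prefix
        while (i < token.Length && codePoints < maxGram)
        {
            // Step over one code point: a surrogate pair is two chars, anything else one.
            i += char.IsSurrogatePair(token, i) ? 2 : 1;
            codePoints++;
            if (codePoints >= minGram)
                grams.Add(token.Substring(0, i));
        }
        return grams;
    }
}
```

    For example, EdgeGrams("lucene", 1, 3) returns l, lu and luc, and a token shorter than minGram yields no grams at all.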

    As of Lucene 4.4, this tokenizer:

    • can handle maxGram larger than 1024 chars, but beware that this will result in increased memory usage;
    • doesn't trim the input;
    • sets position increments equal to 1 instead of 1 for the first token and 0 for all other ones;
    • doesn't support backward n-grams anymore;
    • supports IsTokenChar(Int32) pre-tokenization;
    • correctly handles supplementary characters.
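    IsTokenChar(Int32) pre-tokenization means a subclass can mark some characters as separators, so that edge n-grams start at the beginning of each run of token characters. The sketch below assumes the Lucene.Net.Analysis.Common 4.8 (beta) NuGet package, that IsTokenChar is a protected virtual method receiving a code point, and LuceneVersion.LUCENE_48; the subclass WhitespaceEdgeNGramTokenizer and its Tokenize helper are hypothetical. It also records the position increment of each gram, which should be 1 throughout.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical subclass (not part of Lucene.Net): treats whitespace as a
// non-token character, so edge n-grams are produced for every word.
public class WhitespaceEdgeNGramTokenizer : EdgeNGramTokenizer
{
    public WhitespaceEdgeNGramTokenizer(LuceneVersion version, TextReader input, int minGram, int maxGram)
        : base(version, input, minGram, maxGram)
    {
    }

    // `chr` is a Unicode code point. char.IsWhiteSpace only covers BMP chars,
    // and every whitespace code point is in the BMP, so anything above U+FFFF
    // counts as a token character. (Note: this is .NET's definition of
    // whitespace, which is not identical to Java's Character.isWhitespace.)
    protected override bool IsTokenChar(int chr)
    {
        return chr > char.MaxValue || !char.IsWhiteSpace((char)chr);
    }

    // Hypothetical helper: returns (term, position increment) pairs for `text`.
    public static List<(string Term, int PosInc)> Tokenize(string text, int minGram, int maxGram)
    {
        var result = new List<(string Term, int PosInc)>();
        using (var tokenizer = new WhitespaceEdgeNGramTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text), minGram, maxGram))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            var posIncAtt = tokenizer.AddAttribute<IPositionIncrementAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
                result.Add((termAtt.ToString(), posIncAtt.PositionIncrement));
            tokenizer.End();
        }
        return result;
    }
}
```

    With this subclass, Tokenize("big cat", 1, 2) should produce b, bi, c and ca, each with a position increment of 1. Without the override, the whole input (including the space) is treated as a single token.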

    Although highly discouraged, it is still possible to use the old behavior through Lucene43EdgeNGramTokenizer.

    Inheritance
    System.Object
    AttributeSource
    TokenStream
    Tokenizer
    NGramTokenizer
    EdgeNGramTokenizer
    Implements
    IDisposable
    Inherited Members
    NGramTokenizer.DEFAULT_MIN_NGRAM_SIZE
    NGramTokenizer.DEFAULT_MAX_NGRAM_SIZE
    NGramTokenizer.IncrementToken()
    NGramTokenizer.IsTokenChar(Int32)
    NGramTokenizer.End()
    NGramTokenizer.Reset()
    Tokenizer.m_input
    Tokenizer.Dispose(Boolean)
    Tokenizer.CorrectOffset(Int32)
    Tokenizer.SetReader(TextReader)
    TokenStream.Dispose()
    AttributeSource.GetAttributeFactory()
    AttributeSource.GetAttributeClassesEnumerator()
    AttributeSource.GetAttributeImplsEnumerator()
    AttributeSource.AddAttributeImpl(Attribute)
    AttributeSource.AddAttribute<T>()
    AttributeSource.HasAttributes
    AttributeSource.HasAttribute<T>()
    AttributeSource.GetAttribute<T>()
    AttributeSource.ClearAttributes()
    AttributeSource.CaptureState()
    AttributeSource.RestoreState(AttributeSource.State)
    AttributeSource.GetHashCode()
    AttributeSource.Equals(Object)
    AttributeSource.ReflectAsString(Boolean)
    AttributeSource.ReflectWith(IAttributeReflector)
    AttributeSource.CloneAttributes()
    AttributeSource.CopyTo(AttributeSource)
    AttributeSource.ToString()
    Namespace: Lucene.Net.Analysis.NGram
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class EdgeNGramTokenizer : NGramTokenizer, IDisposable

    Constructors


    EdgeNGramTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader, Int32, Int32)

    Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range.

    Declaration
    public EdgeNGramTokenizer(LuceneVersion version, AttributeSource.AttributeFactory factory, TextReader input, int minGram, int maxGram)
    Parameters
    Type                               Name     Description
    LuceneVersion                      version  the Lucene match version; see LuceneVersion
    AttributeSource.AttributeFactory   factory  the AttributeSource.AttributeFactory to use
    TextReader                         input    the TextReader holding the input to be tokenized
    System.Int32                       minGram  the smallest n-gram to generate
    System.Int32                       maxGram  the largest n-gram to generate


    EdgeNGramTokenizer(LuceneVersion, TextReader, Int32, Int32)

    Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range.

    Declaration
    public EdgeNGramTokenizer(LuceneVersion version, TextReader input, int minGram, int maxGram)
    Parameters
    Type                               Name     Description
    LuceneVersion                      version  the Lucene match version; see LuceneVersion
    TextReader                         input    the TextReader holding the input to be tokenized
    System.Int32                       minGram  the smallest n-gram to generate
    System.Int32                       maxGram  the largest n-gram to generate
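    A minimal consumer sketch for both constructors, assuming the Lucene.Net.Analysis.Common 4.8 (beta) NuGet package, LuceneVersion.LUCENE_48 as the match version, and AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY as the explicit factory. The class EdgeNGramDemo and its methods are hypothetical; the tokenizer calls follow the usual Reset, IncrementToken, End, Dispose workflow.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class, not part of Lucene.Net.
public static class EdgeNGramDemo
{
    // Drains a tokenizer and returns its terms, then disposes it.
    public static List<string> Collect(Tokenizer tokenizer)
    {
        var terms = new List<string>();
        var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
        using (tokenizer)
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
                terms.Add(termAtt.ToString());
            tokenizer.End();
        }
        return terms;
    }

    // Uses EdgeNGramTokenizer(LuceneVersion, TextReader, Int32, Int32).
    public static List<string> WithDefaultFactory(string text, int minGram, int maxGram) =>
        Collect(new EdgeNGramTokenizer(LuceneVersion.LUCENE_48, new StringReader(text), minGram, maxGram));

    // Uses EdgeNGramTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader, Int32, Int32).
    public static List<string> WithExplicitFactory(string text, int minGram, int maxGram) =>
        Collect(new EdgeNGramTokenizer(LuceneVersion.LUCENE_48,
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text), minGram, maxGram));
}
```

    WithDefaultFactory("lucene", 1, 3) should yield l, lu and luc, and WithExplicitFactory("lucene", 2, 4) should yield lu, luc and luce: the whole input is one token unless IsTokenChar(Int32) is overridden, so only prefixes of the input are emitted.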

    Fields


    DEFAULT_MAX_GRAM_SIZE

    Declaration
    public const int DEFAULT_MAX_GRAM_SIZE = 1
    Field Value
    Type Description
    System.Int32

    DEFAULT_MIN_GRAM_SIZE

    Declaration
    public const int DEFAULT_MIN_GRAM_SIZE = 1
    Field Value
    Type Description
    System.Int32
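    The two constants can be passed straight to a constructor. A minimal sketch, assuming the Lucene.Net.Analysis.Common 4.8 (beta) NuGet package and that both defaults are 1, as in the Java Lucene 4.x class this one is ported from; with those values only the first character of the input should be emitted. The class EdgeNGramDefaultsDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class, not part of Lucene.Net.
public static class EdgeNGramDefaultsDemo
{
    // Tokenizes `text` using the class defaults for minGram and maxGram.
    public static string[] DefaultGrams(string text)
    {
        var terms = new List<string>();
        using (var tokenizer = new EdgeNGramTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text),
            EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE,
            EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE))
        {
            var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
                terms.Add(termAtt.ToString());
            tokenizer.End();
        }
        return terms.ToArray();
    }
}
```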

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)