    Class UAX29URLEmailTokenizerImpl34

    This class implements UAX29URLEmailTokenizer, but preserves a bug (https://issues.apache.org/jira/browse/LUCENE-3880) in which a "mailto:" URI scheme prepended to an email address disrupts recognition of that address.

    Deprecated: this class exists only for exact backwards compatibility.

    Inheritance
    object
    UAX29URLEmailTokenizerImpl34
    Implements
    IStandardTokenizerInterface
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Standard.Std34
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    [Obsolete("This class is only for exact backwards compatibility")]
    public sealed class UAX29URLEmailTokenizerImpl34 : IStandardTokenizerInterface

    Constructors

    UAX29URLEmailTokenizerImpl34(TextReader)

    Creates a new scanner

    Declaration
    public UAX29URLEmailTokenizerImpl34(TextReader @in)
    Parameters
    Type Name Description
    TextReader in

    the TextReader to read input from.

    Fields

    EMAIL_TYPE

    Token type for email addresses.

    Declaration
    public static readonly int EMAIL_TYPE
    Field Value
    Type Description
    int

    HANGUL_TYPE

    Token type for Hangul text.

    Declaration
    public static readonly int HANGUL_TYPE
    Field Value
    Type Description
    int

    HIRAGANA_TYPE

    Token type for Hiragana text.

    Declaration
    public static readonly int HIRAGANA_TYPE
    Field Value
    Type Description
    int

    IDEOGRAPHIC_TYPE

    Token type for ideographic text.

    Declaration
    public static readonly int IDEOGRAPHIC_TYPE
    Field Value
    Type Description
    int

    KATAKANA_TYPE

    Token type for Katakana text.

    Declaration
    public static readonly int KATAKANA_TYPE
    Field Value
    Type Description
    int

    NUMERIC_TYPE

    Numbers

    Declaration
    public static readonly int NUMERIC_TYPE
    Field Value
    Type Description
    int

    SOUTH_EAST_ASIAN_TYPE

    Characters in the class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.

    See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
    Declaration
    public static readonly int SOUTH_EAST_ASIAN_TYPE
    Field Value
    Type Description
    int

    URL_TYPE

    Token type for URLs.

    Declaration
    public static readonly int URL_TYPE
    Field Value
    Type Description
    int

    WORD_TYPE

    Alphanumeric sequences

    Declaration
    public static readonly int WORD_TYPE
    Field Value
    Type Description
    int

    YYEOF

    Value returned by GetNextToken() to denote the end of input.

    Declaration
    public static readonly int YYEOF
    Field Value
    Type Description
    int

    YYINITIAL

    The initial lexical state.

    Declaration
    public const int YYINITIAL = 0
    Field Value
    Type Description
    int

    Properties

    YyChar

    Returns the current position: the number of characters read up to the start of the matched text.

    Declaration
    public int YyChar { get; }
    Property Value
    Type Description
    int

    YyLength

    Returns the length of the matched text region.

    Declaration
    public int YyLength { get; }
    Property Value
    Type Description
    int

    YyState

    Returns the current lexical state.

    Declaration
    public int YyState { get; }
    Property Value
    Type Description
    int

    YyText

    Returns the text matched by the current regular expression.

    Declaration
    public string YyText { get; }
    Property Value
    Type Description
    string

    Methods

    GetNextToken()

    Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.

    Declaration
    public int GetNextToken()
    Returns
    Type Description
    int

    the type of the next token (one of the *_TYPE fields), or YYEOF at the end of input

    Exceptions
    Type Condition
    IOException

    if an I/O error occurs
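
    The scanning loop described above can be sketched as follows. This is a minimal illustration, not the way Lucene.Net drives the scanner internally; it assumes a project that references the Lucene.Net.Analysis.Common assembly. The ScanLoopSketch class and its Scan and Describe helpers are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard.Std34;

#pragma warning disable 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

// Hypothetical helper class, not part of Lucene.Net.
public static class ScanLoopSketch
{
    // Collects (token type, token text) pairs until the scanner reports YYEOF.
    public static List<KeyValuePair<int, string>> Scan(string input)
    {
        var tokens = new List<KeyValuePair<int, string>>();
        using (var reader = new StringReader(input))
        {
            var scanner = new UAX29URLEmailTokenizerImpl34(reader);
            int type;
            while ((type = scanner.GetNextToken()) != UAX29URLEmailTokenizerImpl34.YYEOF)
            {
                // YyText holds the text matched by the most recent GetNextToken() call.
                tokens.Add(new KeyValuePair<int, string>(type, scanner.YyText));
            }
        }
        return tokens;
    }

    // Maps a type code to a label. The *_TYPE fields are static readonly rather
    // than const, so they cannot be used as switch case labels.
    public static string Describe(int type)
    {
        if (type == UAX29URLEmailTokenizerImpl34.WORD_TYPE) return "word";
        if (type == UAX29URLEmailTokenizerImpl34.NUMERIC_TYPE) return "number";
        if (type == UAX29URLEmailTokenizerImpl34.URL_TYPE) return "url";
        if (type == UAX29URLEmailTokenizerImpl34.EMAIL_TYPE) return "email";
        return "other";
    }
}
```

    A caller might iterate over ScanLoopSketch.Scan(text) and print ScanLoopSketch.Describe(pair.Key) next to pair.Value to see how each piece of input was classified.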

    GetText(ICharTermAttribute)

    Fills ICharTermAttribute with the current token text.

    Declaration
    public void GetText(ICharTermAttribute t)
    Parameters
    Type Name Description
    ICharTermAttribute t
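
    As a rough sketch of how GetText might be used, the example below copies the first token's text into a term attribute. It assumes that CharTermAttribute from the Lucene.Net.Analysis.TokenAttributes namespace can be constructed directly (this page does not document it), and that the Lucene.Net.Analysis.Common assembly and its dependencies are referenced. GetTextSketch and FirstTokenText are hypothetical names.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard.Std34;
using Lucene.Net.Analysis.TokenAttributes;

#pragma warning disable 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

// Hypothetical helper class, not part of Lucene.Net.
public static class GetTextSketch
{
    // Returns the text of the first token in the input, or null if there is none.
    public static string FirstTokenText(string input)
    {
        var scanner = new UAX29URLEmailTokenizerImpl34(new StringReader(input));
        if (scanner.GetNextToken() == UAX29URLEmailTokenizerImpl34.YYEOF)
        {
            return null;
        }

        // Assumption: CharTermAttribute has a public parameterless constructor.
        ICharTermAttribute term = new CharTermAttribute();
        scanner.GetText(term); // copies the current match into the attribute
        return term.ToString();
    }
}
```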

    YyBegin(int)

    Enters a new lexical state

    Declaration
    public void YyBegin(int newState)
    Parameters
    Type Name Description
    int newState

    the new lexical state

    YyCharAt(int)

    Returns the character at position pos from the matched text.

    It is equivalent to YyText[pos], but faster.
    Declaration
    public char YyCharAt(int pos)
    Parameters
    Type Name Description
    int pos

    the position of the character to fetch. A value from 0 to YyLength-1.

    Returns
    Type Description
    char

    the character at position pos
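
    The stated equivalence between YyCharAt(pos) and YyText[pos] can be checked with a short sketch like the one below, which assumes the Lucene.Net.Analysis.Common assembly is referenced. YyCharAtSketch and CharAtMatchesText are hypothetical names used only here.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard.Std34;

#pragma warning disable 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

// Hypothetical helper class, not part of Lucene.Net.
public static class YyCharAtSketch
{
    // Compares YyCharAt(pos) with YyText[pos] for every position of the first token.
    public static bool CharAtMatchesText(string input)
    {
        var scanner = new UAX29URLEmailTokenizerImpl34(new StringReader(input));
        if (scanner.GetNextToken() == UAX29URLEmailTokenizerImpl34.YYEOF)
        {
            return true; // nothing matched, so there is nothing to compare
        }

        string text = scanner.YyText; // allocates a new string for the match
        for (int pos = 0; pos < scanner.YyLength; pos++)
        {
            // Valid positions run from 0 to YyLength - 1.
            if (scanner.YyCharAt(pos) != text[pos])
            {
                return false;
            }
        }
        return true;
    }
}
```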

    YyClose()

    Disposes the input stream.

    Declaration
    public void YyClose()

    YyPushBack(int)

    Pushes the specified amount of characters back into the input stream.

    They will be read again by the next call of the scanning method.
    Declaration
    public void YyPushBack(int number)
    Parameters
    Type Name Description
    int number

    the number of characters to be read again. This number must not be greater than YyLength.
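
    The following sketch illustrates the constraint on the number argument. It relies on the assumption, based on how JFlex-generated scanners usually behave, that pushing characters back shortens the current match by the same amount; this page does not state that explicitly. It also assumes the Lucene.Net.Analysis.Common assembly is referenced. PushBackSketch and LengthAfterPushBack are hypothetical names.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard.Std34;

#pragma warning disable 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

// Hypothetical helper class, not part of Lucene.Net.
public static class PushBackSketch
{
    // Scans the first token, pushes 'number' characters back, and returns YyLength afterwards.
    public static int LengthAfterPushBack(string input, int number)
    {
        var scanner = new UAX29URLEmailTokenizerImpl34(new StringReader(input));
        if (scanner.GetNextToken() == UAX29URLEmailTokenizerImpl34.YYEOF)
        {
            throw new ArgumentException("input contains no token", nameof(input));
        }

        // Guard the documented precondition: never push back more than was matched.
        if (number < 0 || number > scanner.YyLength)
        {
            throw new ArgumentOutOfRangeException(nameof(number));
        }

        scanner.YyPushBack(number);
        return scanner.YyLength; // assumed to shrink by 'number'
    }
}
```

    The characters pushed back are read again by the next GetNextToken() call.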

    YyReset(TextReader)

    Resets the scanner to read from a new input stream. Does not close the old reader.

    All internal variables are reset; the old input stream cannot be reused (its internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

    Internal scan buffer is resized down to its initial length, if it has grown.
    Declaration
    public void YyReset(TextReader reader)
    Parameters
    Type Name Description
    TextReader reader

    the new input stream
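
    One way to reuse a single scanner across several inputs is sketched below. Because YyReset does not close the old reader, each reader here is disposed by the caller. The sketch assumes the Lucene.Net.Analysis.Common assembly is referenced; ResetSketch and ScanAll are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard.Std34;

#pragma warning disable 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

// Hypothetical helper class, not part of Lucene.Net.
public static class ResetSketch
{
    // Tokenizes each input in turn with the same scanner instance.
    public static List<string> ScanAll(params string[] inputs)
    {
        var texts = new List<string>();
        // The initial reader is a placeholder; it is replaced before any scanning happens.
        var scanner = new UAX29URLEmailTokenizerImpl34(new StringReader(string.Empty));

        foreach (string input in inputs)
        {
            using (var reader = new StringReader(input))
            {
                scanner.YyReset(reader); // resets state and switches to the new reader
                while (scanner.GetNextToken() != UAX29URLEmailTokenizerImpl34.YYEOF)
                {
                    texts.Add(scanner.YyText);
                }
            } // YyReset never closes a reader, so it is disposed here
        }
        return texts;
    }
}
```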

    Implements

    IStandardTokenizerInterface
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.