    Class UAX29URLEmailTokenizerImpl34

    This class implements UAX29URLEmailTokenizer, except with a bug (https://issues.apache.org/jira/browse/LUCENE-3880) where a "mailto:" URI scheme prepended to an email address disrupts recognition of the email address.

    Deprecated: This class is only for exact backwards compatibility.

    Inheritance
    System.Object
    UAX29URLEmailTokenizerImpl34
    Implements
    IStandardTokenizerInterface
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Standard.Std34
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    [Obsolete("This class is only for exact backwards compatibility")]
    public sealed class UAX29URLEmailTokenizerImpl34 : IStandardTokenizerInterface

    Constructors

    UAX29URLEmailTokenizerImpl34(TextReader)

    Creates a new scanner

    Declaration
    public UAX29URLEmailTokenizerImpl34(TextReader @in)
    Parameters
    Type Name Description
    System.IO.TextReader in The TextReader to read input from.

    Fields

    EMAIL_TYPE

    Declaration
    public static readonly int EMAIL_TYPE
    Field Value
    Type Description
    System.Int32

    HANGUL_TYPE

    Declaration
    public static readonly int HANGUL_TYPE
    Field Value
    Type Description
    System.Int32

    HIRAGANA_TYPE

    Declaration
    public static readonly int HIRAGANA_TYPE
    Field Value
    Type Description
    System.Int32

    IDEOGRAPHIC_TYPE

    Declaration
    public static readonly int IDEOGRAPHIC_TYPE
    Field Value
    Type Description
    System.Int32

    KATAKANA_TYPE

    Declaration
    public static readonly int KATAKANA_TYPE
    Field Value
    Type Description
    System.Int32

    NUMERIC_TYPE

    Numbers

    Declaration
    public static readonly int NUMERIC_TYPE
    Field Value
    Type Description
    System.Int32

    SOUTH_EAST_ASIAN_TYPE

    Characters in the class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.

    See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA

    Declaration
    public static readonly int SOUTH_EAST_ASIAN_TYPE
    Field Value
    Type Description
    System.Int32

    URL_TYPE

    Declaration
    public static readonly int URL_TYPE
    Field Value
    Type Description
    System.Int32

    WORD_TYPE

    Alphanumeric sequences

    Declaration
    public static readonly int WORD_TYPE
    Field Value
    Type Description
    System.Int32

    YYEOF

    This character denotes the end of file

    Declaration
    public static readonly int YYEOF
    Field Value
    Type Description
    System.Int32

    YYINITIAL

    The initial lexical state of the scanner.

    Declaration
    public const int YYINITIAL = 0
    Field Value
    Type Description
    System.Int32

    Properties

    YyChar

    Declaration
    public int YyChar { get; }
    Property Value
    Type Description
    System.Int32

    YyLength

    Returns the length of the matched text region.

    Declaration
    public int YyLength { get; }
    Property Value
    Type Description
    System.Int32

    YyState

    Returns the current lexical state.

    Declaration
    public int YyState { get; }
    Property Value
    Type Description
    System.Int32

    YyText

    Returns the text matched by the current regular expression.

    Declaration
    public string YyText { get; }
    Property Value
    Type Description
    System.String

    Methods

    GetNextToken()

    Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.

    Declaration
    public int GetNextToken()
    Returns
    Type Description
    System.Int32 The next token.

    Exceptions
    Type Condition
    System.IO.IOException If any I/O error occurs.
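
    As an illustration of how the members documented on this page fit together, the following is a minimal sketch of a scanning loop. It assumes the Lucene.Net.Analysis.Common assembly is referenced; the sample input string and the class name ScanExample are hypothetical, and the obsolete-member warnings are suppressed because the class is marked [Obsolete].

    ```csharp
    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard.Std34;

    public static class ScanExample
    {
        public static void Main()
        {
            // Hypothetical sample input, chosen only for illustration.
            using (TextReader reader = new StringReader("Contact admin@example.com or visit http://example.com/docs"))
            {
    #pragma warning disable 612, 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]
                var scanner = new UAX29URLEmailTokenizerImpl34(reader);

                int tokenType;
                // GetNextToken() returns the next token; YYEOF marks the end of input.
                while ((tokenType = scanner.GetNextToken()) != UAX29URLEmailTokenizerImpl34.YYEOF)
                {
                    // The returned value can be compared against the *_TYPE fields,
                    // for example EMAIL_TYPE or URL_TYPE.
                    bool isEmail = tokenType == UAX29URLEmailTokenizerImpl34.EMAIL_TYPE;
                    Console.WriteLine(
                        $"type={tokenType} isEmail={isEmail} YyChar={scanner.YyChar} " +
                        $"YyLength={scanner.YyLength} YyText={scanner.YyText}");
                }
    #pragma warning restore 612, 618
            }
        }
    }
    ```

    In typical use this scanner is driven by a tokenizer rather than called directly, so a loop like this is mainly useful for inspecting how a given input is split.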

    GetText(ICharTermAttribute)

    Fills ICharTermAttribute with the current token text.

    Declaration
    public void GetText(ICharTermAttribute t)
    Parameters
    Type Name Description
    Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute t

    YyBegin(Int32)

    Enters a new lexical state

    Declaration
    public void YyBegin(int newState)
    Parameters
    Type Name Description
    System.Int32 newState The new lexical state.

    YyCharAt(Int32)

    Returns the character at position pos from the matched text.

    It is equivalent to YyText[pos], but faster.

    Declaration
    public char YyCharAt(int pos)
    Parameters
    Type Name Description
    System.Int32 pos The position of the character to fetch. A value from 0 to YyLength-1.

    Returns
    Type Description
    System.Char The character at position pos.

    YyClose()

    Disposes the input stream.

    Declaration
    public void YyClose()

    YyPushBack(Int32)

    Pushes the specified amount of characters back into the input stream.

    They will be read again by the next call of the scanning method.

    Declaration
    public void YyPushBack(int number)
    Parameters
    Type Name Description
    System.Int32 number The number of characters to be read again. This number must not be greater than YyLength!

    YyReset(TextReader)

    Resets the scanner to read from a new input stream. Does not close the old reader.

    All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

    The internal scan buffer is resized down to its initial length if it has grown.

    Declaration
    public void YyReset(TextReader reader)
    Parameters
    Type Name Description
    System.IO.TextReader reader The new input stream.
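
    Because YyReset(TextReader) does not close the old reader, a caller that reuses one scanner across inputs remains responsible for disposing the previous reader. The following is a minimal sketch of that pattern, assuming the Lucene.Net.Analysis.Common assembly is referenced; the class name ResetExample, the helper CountTokens, and the input strings are hypothetical.

    ```csharp
    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard.Std34;

    public static class ResetExample
    {
    #pragma warning disable 612, 618 // UAX29URLEmailTokenizerImpl34 is marked [Obsolete]

        // Hypothetical helper: consumes the scanner's input and counts the tokens.
        private static int CountTokens(UAX29URLEmailTokenizerImpl34 scanner)
        {
            int count = 0;
            while (scanner.GetNextToken() != UAX29URLEmailTokenizerImpl34.YYEOF)
            {
                count++;
            }
            return count;
        }

        public static void Main()
        {
            var first = new StringReader("first input");
            var second = new StringReader("second input with more words");

            var scanner = new UAX29URLEmailTokenizerImpl34(first);
            Console.WriteLine($"tokens in first input: {CountTokens(scanner)}");

            // Point the same scanner at a new reader. The old reader is not
            // closed by YyReset, so it is disposed explicitly here.
            scanner.YyReset(second);
            first.Dispose();

            Console.WriteLine($"tokens in second input: {CountTokens(scanner)}");

            // YyClose() disposes the scanner's current input stream.
            scanner.YyClose();
        }

    #pragma warning restore 612, 618
    }
    ```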

    Implements

    IStandardTokenizerInterface
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.