Class UAX29URLEmailTokenizerImpl31
This class implements UAX29URLEmailTokenizer, except with a bug (https://issues.apache.org/jira/browse/LUCENE-3358) where Han and Hiragana characters would be split from combining characters.
Deprecated: this class exists only for exact backwards compatibility.
Implements
IStandardTokenizerInterface
Namespace: Lucene.Net.Analysis.Standard.Std31
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Obsolete("This class is only for exact backwards compatibility")]
public sealed class UAX29URLEmailTokenizerImpl31 : IStandardTokenizerInterface
Constructors
UAX29URLEmailTokenizerImpl31(TextReader)
Creates a new scanner
Declaration
public UAX29URLEmailTokenizerImpl31(TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
TextReader | in | the TextReader to read input from. |
Fields
EMAIL_TYPE
Declaration
public static readonly int EMAIL_TYPE
Field Value
Type | Description |
---|---|
int |
HANGUL_TYPE
Declaration
public static readonly int HANGUL_TYPE
Field Value
Type | Description |
---|---|
int |
HIRAGANA_TYPE
Declaration
public static readonly int HIRAGANA_TYPE
Field Value
Type | Description |
---|---|
int |
IDEOGRAPHIC_TYPE
Declaration
public static readonly int IDEOGRAPHIC_TYPE
Field Value
Type | Description |
---|---|
int |
KATAKANA_TYPE
Declaration
public static readonly int KATAKANA_TYPE
Field Value
Type | Description |
---|---|
int |
NUMERIC_TYPE
Numbers
Declaration
public static readonly int NUMERIC_TYPE
Field Value
Type | Description |
---|---|
int |
SOUTH_EAST_ASIAN_TYPE
Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.
See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
Declaration
public static readonly int SOUTH_EAST_ASIAN_TYPE
Field Value
Type | Description |
---|---|
int |
URL_TYPE
Declaration
public static readonly int URL_TYPE
Field Value
Type | Description |
---|---|
int |
WORD_TYPE
Alphanumeric sequences
Declaration
public static readonly int WORD_TYPE
Field Value
Type | Description |
---|---|
int |
YYEOF
This character denotes the end of file
Declaration
public static readonly int YYEOF
Field Value
Type | Description |
---|---|
int |
YYINITIAL
The initial lexical state.
Declaration
public const int YYINITIAL = 0
Field Value
Type | Description |
---|---|
int |
Properties
YyChar
Returns the current position.
Declaration
public int YyChar { get; }
Property Value
Type | Description |
---|---|
int |
YyLength
Returns the length of the matched text region.
Declaration
public int YyLength { get; }
Property Value
Type | Description |
---|---|
int |
YyState
Returns the current lexical state.
Declaration
public int YyState { get; }
Property Value
Type | Description |
---|---|
int |
YyText
Returns the text matched by the current regular expression.
Declaration
public string YyText { get; }
Property Value
Type | Description |
---|---|
string |
Methods
GetNextToken()
Resumes scanning until the next regular expression is matched, the end of input is reached, or an I/O error occurs.
Declaration
public int GetNextToken()
Returns
Type | Description |
---|---|
int | the next token |
Exceptions
Type | Condition |
---|---|
IOException | if an I/O error occurs |
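The scanning loop can be sketched as follows. This is a minimal illustration rather than a recommended usage pattern: the input string is made up, and it assumes that the int returned by GetNextToken() is one of the *_TYPE field values, with YYEOF signalling the end of input.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard.Std31;

#pragma warning disable CS0618 // UAX29URLEmailTokenizerImpl31 is marked [Obsolete]

// Hypothetical sample input, for illustration only.
var scanner = new UAX29URLEmailTokenizerImpl31(
    new StringReader("Contact admin@example.com or visit http://example.com/docs"));

int tokenType;
// Keep scanning until the scanner reports the end of input.
while ((tokenType = scanner.GetNextToken()) != UAX29URLEmailTokenizerImpl31.YYEOF)
{
    // Assumption: the return value can be compared against the *_TYPE fields.
    string kind =
        tokenType == UAX29URLEmailTokenizerImpl31.EMAIL_TYPE ? "email" :
        tokenType == UAX29URLEmailTokenizerImpl31.URL_TYPE ? "url" :
        "other";

    Console.WriteLine(
        $"{scanner.YyText} [{kind}] position={scanner.YyChar} length={scanner.YyLength}");
}

// Closes the underlying reader.
scanner.YyClose();
```

The *_TYPE fields are declared static readonly rather than const, so they cannot be used as case labels in a switch statement; the sketch compares them with == instead.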
GetText(ICharTermAttribute)
Fills Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute with the current token text.
Declaration
public void GetText(ICharTermAttribute t)
Parameters
Type | Name | Description |
---|---|---|
ICharTermAttribute | t |
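As a hedged sketch, GetText can be used to copy each matched token into a term attribute. A standalone CharTermAttribute instance is created here purely for illustration; this is an assumption about how the attribute is obtained, and the input string is made up.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard.Std31;
using Lucene.Net.Analysis.TokenAttributes;

#pragma warning disable CS0618 // UAX29URLEmailTokenizerImpl31 is marked [Obsolete]

var scanner = new UAX29URLEmailTokenizerImpl31(new StringReader("hello world"));

// Assumption: a standalone attribute instance, created only for this sketch.
ICharTermAttribute term = new CharTermAttribute();

while (scanner.GetNextToken() != UAX29URLEmailTokenizerImpl31.YYEOF)
{
    // Fill the attribute with the text of the token that was just matched.
    scanner.GetText(term);
    Console.WriteLine(term.ToString());
}
```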
YyBegin(int)
Enters a new lexical state
Declaration
public void YyBegin(int newState)
Parameters
Type | Name | Description |
---|---|---|
int | newState | the new lexical state |
YyCharAt(int)
Returns the character at position pos from the matched text.
It is equivalent to YyText[pos], but faster.
Declaration
public char YyCharAt(int pos)
Parameters
Type | Name | Description |
---|---|---|
int | pos | the position of the character to fetch. A value from 0 to YyLength-1. |
Returns
Type | Description |
---|---|
char | the character at position pos |
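A small sketch of reading the matched text one character at a time, assuming a token has just been matched by GetNextToken(). The input string is made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard.Std31;

#pragma warning disable CS0618 // UAX29URLEmailTokenizerImpl31 is marked [Obsolete]

var scanner = new UAX29URLEmailTokenizerImpl31(new StringReader("abc"));

if (scanner.GetNextToken() != UAX29URLEmailTokenizerImpl31.YYEOF)
{
    // Valid positions run from 0 to YyLength - 1.
    for (int pos = 0; pos < scanner.YyLength; pos++)
    {
        char c = scanner.YyCharAt(pos);
        Console.WriteLine($"{pos}: {c}");
    }
}
```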
YyClose()
Closes the input stream.
Declaration
public void YyClose()
YyPushBack(int)
Pushes the specified amount of characters back into the input stream.
They will be read again by the next call of the scanning method.
Declaration
public void YyPushBack(int number)
Parameters
Type | Name | Description |
---|---|---|
int | number | the number of characters to be read again. This number must not be greater than YyLength! |
YyReset(TextReader)
Resets the scanner to read from a new input stream. Does not close the old reader.
All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.
The internal scan buffer is resized down to its initial length if it has grown.
Declaration
public void YyReset(TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | the new input stream |
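The following sketch reuses one scanner for two inputs. The PrintTokens helper and both input strings are hypothetical, introduced only for this example. Because YyReset does not close the old reader, the caller disposes it here.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard.Std31;

#pragma warning disable CS0618 // UAX29URLEmailTokenizerImpl31 is marked [Obsolete]

var first = new StringReader("first input");
var scanner = new UAX29URLEmailTokenizerImpl31(first);
PrintTokens(scanner);

// Point the scanner at a new reader; the old one is left open by YyReset.
var second = new StringReader("second input");
scanner.YyReset(second);
first.Dispose(); // the caller remains responsible for the old reader

PrintTokens(scanner);

// Closes the reader the scanner is currently attached to.
scanner.YyClose();

// Hypothetical helper: prints every token until the end of input.
static void PrintTokens(UAX29URLEmailTokenizerImpl31 s)
{
    while (s.GetNextToken() != UAX29URLEmailTokenizerImpl31.YYEOF)
    {
        Console.WriteLine(s.YyText);
    }
}
```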