Class UAX29URLEmailTokenizer
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Tokens produced are of the following types:
- <ALPHANUM>: A sequence of alphabetic and numeric characters
- <NUM>: A number
- <URL>: A URL
- <EMAIL>: An email address
- <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
- <IDEOGRAPHIC>: A single CJKV ideographic character
- <HIRAGANA>: A single hiragana character
You must specify the required LuceneVersion compatibility when creating UAX29URLEmailTokenizer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, you get the old, broken behavior for backwards compatibility.
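As an illustration of the token types listed above, the following is a minimal sketch of driving the tokenizer and collecting each term with its type. It assumes the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) and `LuceneVersion.LUCENE_48`; the class name `TokenizerSketch` and the method `Tokenize` are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.NET.
public static class TokenizerSketch
{
    // Returns (term, type) pairs for the given text.
    public static List<(string Term, string Type)> Tokenize(string text)
    {
        var result = new List<(string Term, string Type)>();

        // Assumption: LUCENE_48 is the desired compatibility version.
        using (var tokenizer = new UAX29URLEmailTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Attributes expose the current token's text and type string.
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())  // false once the input is exhausted
            {
                result.Add((termAtt.ToString(), typeAtt.Type));
            }
            tokenizer.End();                    // finalize end-of-stream state
        }
        return result;
    }
}
```

Calling `TokenizerSketch.Tokenize("Contact bob@example.com or see http://example.com/docs")` should yield ordinary words alongside one token typed as an email address and one typed as a URL.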
Implements
System.IDisposable
Inherited Members
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
Namespace: Lucene.Net.Analysis.Standard
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class UAX29URLEmailTokenizer : Tokenizer, IDisposable
Constructors
UAX29URLEmailTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Creates a new UAX29URLEmailTokenizer with a given AttributeSource.AttributeFactory.
Declaration
public UAX29URLEmailTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene compatibility version |
AttributeSource.AttributeFactory | factory | The attribute factory to use |
System.IO.TextReader | input | The input reader |
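A minimal sketch of calling this overload follows. It assumes Lucene.NET 4.8 and passes `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` purely for illustration; a custom factory would be supplied the same way. The class name `FactoryConstructorSketch` and the method `Create` are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.NET.
public static class FactoryConstructorSketch
{
    // Builds a tokenizer over the given text with an explicit attribute factory.
    public static UAX29URLEmailTokenizer Create(string text)
    {
        // Assumption: the default factory is sufficient; LUCENE_48 is the
        // compatibility version wanted by the caller.
        return new UAX29URLEmailTokenizer(
            LuceneVersion.LUCENE_48,
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text));
    }
}
```

The caller owns the returned tokenizer and should dispose it, for example with a `using` statement.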
UAX29URLEmailTokenizer(LuceneVersion, TextReader)
Creates a new instance of the UAX29URLEmailTokenizer. Attaches the input to the newly created JFlex scanner.
Declaration
public UAX29URLEmailTokenizer(LuceneVersion matchVersion, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene compatibility version |
System.IO.TextReader | input | The input reader |
Fields
ALPHANUM
Declaration
public const int ALPHANUM = 0
Field Value
Type | Description |
---|---|
System.Int32 |
EMAIL
Declaration
public const int EMAIL = 8
Field Value
Type | Description |
---|---|
System.Int32 |
HANGUL
Declaration
public const int HANGUL = 6
Field Value
Type | Description |
---|---|
System.Int32 |
HIRAGANA
Declaration
public const int HIRAGANA = 4
Field Value
Type | Description |
---|---|
System.Int32 |
IDEOGRAPHIC
Declaration
public const int IDEOGRAPHIC = 3
Field Value
Type | Description |
---|---|
System.Int32 |
KATAKANA
Declaration
public const int KATAKANA = 5
Field Value
Type | Description |
---|---|
System.Int32 |
NUM
Declaration
public const int NUM = 1
Field Value
Type | Description |
---|---|
System.Int32 |
SOUTHEAST_ASIAN
Declaration
public const int SOUTHEAST_ASIAN = 2
Field Value
Type | Description |
---|---|
System.Int32 |
TOKEN_TYPES
String token types that correspond to token type int constants
Declaration
public static readonly string[] TOKEN_TYPES
Field Value
Type | Description |
---|---|
System.String[] |
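Because TOKEN_TYPES is indexed by the int constants declared on this class, a type name can be looked up directly. The sketch below assumes Lucene.NET 4.8; the class name `TokenTypeNames` and the method `NameOf` are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;

// Hypothetical helper, not part of Lucene.NET.
public static class TokenTypeNames
{
    // Maps an int token type constant (for example UAX29URLEmailTokenizer.URL)
    // to the string reported through the type attribute.
    public static string NameOf(int tokenType)
    {
        return UAX29URLEmailTokenizer.TOKEN_TYPES[tokenType];
    }
}
```

For example, `TokenTypeNames.NameOf(UAX29URLEmailTokenizer.EMAIL)` returns the string used as the type of email tokens.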
URL
Declaration
public const int URL = 7
Field Value
Type | Description |
---|---|
System.Int32 |
Properties
MaxTokenLength
Gets or sets the maximum allowed token length. Any token longer than this is skipped.
Declaration
public int MaxTokenLength { get; set; }
Property Value
Type | Description |
---|---|
System.Int32 |
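The skipping behavior can be sketched as follows, assuming Lucene.NET 4.8. The class name `MaxTokenLengthSketch`, the method `TermsUpTo`, and the chosen limit are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.NET.
public static class MaxTokenLengthSketch
{
    // Returns the terms of 'text' whose length does not exceed 'maxLength'.
    public static List<string> TermsUpTo(string text, int maxLength)
    {
        var terms = new List<string>();
        using (var tokenizer = new UAX29URLEmailTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Set the limit before consuming the stream.
            tokenizer.MaxTokenLength = maxLength;

            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

With a limit of 5, the input `"a abcdefghij bc"` produces only the short terms; the ten-character token is skipped rather than truncated.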
Methods
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing |
Overrides
End()
Declaration
public override sealed void End()
Overrides
IncrementToken()
Declaration
public override sealed bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean |
Overrides
Reset()
Declaration
public override void Reset()
Overrides