Class UAX29URLEmailTokenizer
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Tokens produced are of the following types:
- <ALPHANUM>: A sequence of alphabetic and numeric characters
- <NUM>: A number
- <URL>: A URL
- <EMAIL>: An email address
- <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
- <IDEOGRAPHIC>: A single CJKV ideographic character
- <HIRAGANA>: A single hiragana character
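The following sketch shows how these token types surface when the tokenizer is consumed: each token's type string is read through ITypeAttribute. It assumes Lucene.Net 4.8 (LuceneVersion.LUCENE_48) with the core Lucene.Net and Lucene.Net.Analysis.Common assemblies referenced; the TokenTypeDemo class and its Tokenize method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper: returns each token's text and type string, in order.
public static class TokenTypeDemo
{
    public static List<(string Term, string Type)> Tokenize(string text)
    {
        var tokens = new List<(string Term, string Type)>();

        // Assumption: the Lucene.Net 4.8 compatibility version.
        using (var tokenizer = new UAX29URLEmailTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // typeAtt.Type holds strings such as "<ALPHANUM>", "<NUM>", "<URL>" or "<EMAIL>".
                tokens.Add((termAtt.ToString(), typeAtt.Type));
            }
            tokenizer.End();
        }
        return tokens;
    }
}
```

For input such as "Mail admin@example.com about 42 files", the address should come back as a single <EMAIL> token and 42 as a <NUM> token.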
 
You must specify the required LuceneVersion compatibility when creating a UAX29URLEmailTokenizer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you pass an earlier version, you get the old, broken behavior exactly, for backwards compatibility.
 
Implements
- System.IDisposable

Inherited Members
- System.Object.Equals(System.Object, System.Object)
- System.Object.GetType()
- System.Object.MemberwiseClone()
- System.Object.ReferenceEquals(System.Object, System.Object)

Namespace: Lucene.Net.Analysis.Standard
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class UAX29URLEmailTokenizer : Tokenizer, IDisposable
Constructors
UAX29URLEmailTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)
Creates a new UAX29URLEmailTokenizer with the given AttributeSource.AttributeFactory.
Declaration
public UAX29URLEmailTokenizer(LuceneVersion matchVersion, AttributeSource.AttributeFactory factory, TextReader input)
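Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
| AttributeSource.AttributeFactory | factory | The factory used to create this tokenizer's attributes |
| System.IO.TextReader | input | The input reader |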
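A minimal sketch of this constructor, assuming Lucene.Net 4.8 and that AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY is available as the stock factory; here it only stands in for whatever custom factory an application might supply. FactoryDemo and CountTokens are hypothetical names.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class FactoryDemo
{
    // Hypothetical helper: counts the tokens produced when the attribute factory is passed explicitly.
    public static int CountTokens(string text)
    {
        // Assumption: the default factory is used as a placeholder for a custom one.
        AttributeSource.AttributeFactory factory = AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;

        using (var tokenizer = new UAX29URLEmailTokenizer(LuceneVersion.LUCENE_48, factory, new StringReader(text)))
        {
            int count = 0;
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                count++;
            }
            tokenizer.End();
            return count;
        }
    }
}
```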
UAX29URLEmailTokenizer(LuceneVersion, TextReader)
Creates a new instance of the UAX29URLEmailTokenizer. Attaches the input to the newly created JFlex scanner.
Declaration
public UAX29URLEmailTokenizer(LuceneVersion matchVersion, TextReader input)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
| System.IO.TextReader | input | The input reader |

Fields
ALPHANUM
Declaration
public const int ALPHANUM = 0
Field Value
| Type | Description |
|---|---|
| System.Int32 | Alphanumeric token type |
EMAIL
Declaration
public const int EMAIL = 8
Field Value
| Type | Description |
|---|---|
| System.Int32 | Email address token type |
HANGUL
Declaration
public const int HANGUL = 6
Field Value
| Type | Description |
|---|---|
| System.Int32 | Hangul token type |
HIRAGANA
Declaration
public const int HIRAGANA = 4
Field Value
| Type | Description |
|---|---|
| System.Int32 | Hiragana token type |
IDEOGRAPHIC
Declaration
public const int IDEOGRAPHIC = 3
Field Value
| Type | Description |
|---|---|
| System.Int32 | Ideographic token type |
KATAKANA
Declaration
public const int KATAKANA = 5
Field Value
| Type | Description |
|---|---|
| System.Int32 | Katakana token type |
NUM
Declaration
public const int NUM = 1
Field Value
| Type | Description |
|---|---|
| System.Int32 | Numeric token type |
SOUTHEAST_ASIAN
Declaration
public const int SOUTHEAST_ASIAN = 2
Field Value
| Type | Description |
|---|---|
| System.Int32 | Southeast Asian token type |
TOKEN_TYPES
String token types that correspond to the int token type constants; each constant is an index into this array.
Declaration
public static readonly string[] TOKEN_TYPES
Field Value
| Type | Description |
|---|---|
| System.String[] | Token type strings, indexed by the int token type constants |
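Because each int constant indexes into TOKEN_TYPES, the constants can be turned into the type strings the tokenizer reports instead of hard-coding "<URL>" or "<EMAIL>". The sketch below assumes Lucene.Net 4.8; LinkExtractor and ExtractLinksAndEmails are hypothetical names. It keeps only URL and email tokens.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LinkExtractor
{
    // Type strings looked up through the int constants rather than hard-coded.
    private static readonly string UrlType = UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL];
    private static readonly string EmailType = UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.EMAIL];

    // Hypothetical helper: returns only the URL and email tokens found in text.
    public static List<string> ExtractLinksAndEmails(string text)
    {
        var found = new List<string>();
        using (var tokenizer = new UAX29URLEmailTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                if (typeAtt.Type == UrlType || typeAtt.Type == EmailType)
                {
                    found.Add(termAtt.ToString());
                }
            }
            tokenizer.End();
        }
        return found;
    }
}
```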
URL
Declaration
public const int URL = 7
Field Value
| Type | Description |
|---|---|
| System.Int32 | URL token type |
Properties
MaxTokenLength
Gets or sets the maximum allowed token length. Any token longer than this is skipped, not truncated.
Declaration
public int MaxTokenLength { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Int32 | The maximum allowed token length |
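A minimal sketch of the skipping behavior described above, assuming Lucene.Net 4.8; LengthLimitDemo and TokenizeWithLimit are hypothetical names. The property is set before Reset() so the limit applies from the first token.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LengthLimitDemo
{
    // Hypothetical helper: tokenizes text, dropping any token longer than maxTokenLength.
    public static List<string> TokenizeWithLimit(string text, int maxTokenLength)
    {
        var terms = new List<string>();
        using (var tokenizer = new UAX29URLEmailTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            tokenizer.MaxTokenLength = maxTokenLength; // configure before consuming
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            tokenizer.End();
        }
        return terms;
    }
}
```

With a limit of 10, TokenizeWithLimit("short supercalifragilistic ok", 10) should yield only "short" and "ok": the 20-character word exceeds the limit and is dropped rather than cut down.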
Methods
Dispose(Boolean)
Declaration
protected override void Dispose(bool disposing)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | disposing | |
Overrides
Lucene.Net.Analysis.Tokenizer.Dispose(System.Boolean)
End()
Declaration
public override sealed void End()
Overrides
Lucene.Net.Analysis.TokenStream.End()
IncrementToken()
Declaration
public override sealed bool IncrementToken()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if a new token was produced; false at the end of the input |
Overrides
Lucene.Net.Analysis.TokenStream.IncrementToken()
Reset()
Declaration
public override void Reset()
Overrides
Lucene.Net.Analysis.Tokenizer.Reset()
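Consumers normally call the methods above in a fixed order: Reset() before the first IncrementToken(), IncrementToken() until it returns false, End(), and finally Dispose(), which a using block handles. The sketch below follows that order under the assumption of Lucene.Net 4.8; FinalOffsetDemo and FinalOffset are hypothetical names. It reads the final end offset that End() records through IOffsetAttribute.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class FinalOffsetDemo
{
    // Hypothetical helper: consumes every token, then returns the final offset set by End().
    public static int FinalOffset(string text)
    {
        // The using block calls Dispose(), which releases the tokenizer's input reader.
        using (var tokenizer = new UAX29URLEmailTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            IOffsetAttribute offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset(); // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                // Per-token attributes (term, type, offsets) would be read here.
            }
            tokenizer.End(); // end-of-stream state, including the final offset
            return offsetAtt.EndOffset;
        }
    }
}
```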