Class CJKTokenizer
CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
For CJK text, the tokens returned are overlapping bigrams. Every two adjacent characters form one token, and consecutive tokens share a character.
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", and "C3C4".
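The overlapping-bigram rule can be shown with a small, self-contained C# sketch. `CjkBigramSketch.Bigrams` is a hypothetical helper and is not part of Lucene.NET. It handles a single run of CJK characters only and omits CJKTokenizer's run detection and offset tracking. Returning a one-character run unchanged is an assumption of the sketch, not documented CJKTokenizer behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating overlapping bigrams; not a Lucene.NET API.
public static class CjkBigramSketch
{
    // Splits one contiguous run of CJK characters into overlapping
    // two-character tokens: "C1C2C3C4" -> "C1C2", "C2C3", "C3C4".
    // Characters are UTF-16 code units, so the sketch does not handle
    // characters outside the Basic Multilingual Plane (surrogate pairs).
    public static List<string> Bigrams(string run)
    {
        var tokens = new List<string>();
        if (string.IsNullOrEmpty(run))
            return tokens;

        // Assumption of this sketch: a lone character is returned as-is.
        if (run.Length == 1)
        {
            tokens.Add(run);
            return tokens;
        }

        // Each start index i yields (run[i], run[i + 1]), so neighboring
        // tokens overlap by exactly one character.
        for (int i = 0; i + 1 < run.Length; i++)
            tokens.Add(run.Substring(i, 2));

        return tokens;
    }
}
```

For example, `CjkBigramSketch.Bigrams("中文分词")` returns "中文", "文分", and "分词", following the same pattern as "C1C2", "C2C3", "C3C4".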
Additionally, the following is applied to Latin text (such as English):
 - Text is converted to lowercase.
 - Numeric digits, '+', '#', and '_' are tokenized as letters.
 - Full-width forms are converted to half-width forms.
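The Latin-text rules can be sketched the same way. `LatinRulesSketch` is a hypothetical class and is not a Lucene.NET API. The width folding relies on the Unicode layout of the full-width ASCII variants: U+FF01 through U+FF5E sit at a fixed offset of 0xFEE0 above U+0021 through U+007E. Other half-width and full-width forms, such as half-width katakana or the ideographic space, fall outside this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helpers mirroring the Latin-text rules listed above;
// not part of the Lucene.NET API.
public static class LatinRulesSketch
{
    // Maps full-width ASCII variants (U+FF01..U+FF5E) to their
    // half-width counterparts (U+0021..U+007E) by subtracting 0xFEE0.
    public static char ToHalfWidth(char c)
    {
        if (c >= '\uFF01' && c <= '\uFF5E')
            return (char)(c - 0xFEE0);
        return c;
    }

    // Digits, '+', '#', and '_' are treated like letters, so terms
    // such as "c++", "c#", and "foo_bar" are not split.
    public static bool IsTokenChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_' || c == '+' || c == '#';
    }

    // Applies width folding first, then lowercasing, to each character.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(char.ToLowerInvariant(ToHalfWidth(c)));
        return sb.ToString();
    }
}
```

For example, `LatinRulesSketch.Normalize("ＪＡＶＡ")` returns "java". `IsTokenChar('#')` is true, so a term like "c#" stays whole rather than being split at the '#'.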
 
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Cjk
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class CJKTokenizer : Tokenizer, IDisposable
  Constructors
CJKTokenizer(AttributeSource.AttributeFactory, TextReader)
Declaration
public CJKTokenizer(AttributeSource.AttributeFactory factory, TextReader in)
  Parameters
| Type | Name | Description | 
|---|---|---|
| AttributeSource.AttributeFactory | factory | Factory used to create the attribute instances for this token stream |
| TextReader | in | I/O reader |
CJKTokenizer(TextReader)
Construct a token stream processing the given input.
Declaration
public CJKTokenizer(TextReader in)
  Parameters
| Type | Name | Description | 
|---|---|---|
| TextReader | in | I/O reader  | 
      
Methods
End()
Performs end-of-stream operations, such as setting the final offset.
Declaration
public override sealed void End()
  Overrides
IncrementToken()
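Advances to the next token in the stream. Returns true if a token was produced, or false at the end of the stream (EOS). Characters are classified by Unicode block; see http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.UnicodeBlock.html for details.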
Declaration
public override bool IncrementToken()
  Returns
| Type | Description | 
|---|---|
| System.Boolean | false for end of stream, true otherwise  | 
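Consumers drive a tokenizer through a fixed sequence: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). The sketch below mimics that call order with a hypothetical `TokenStreamContractSketch` class so it runs without the Lucene.NET package. It is not CJKTokenizer, because it treats the whole input as one CJK run. Its `Term` property stands in for the term attribute a real consumer would read, and throwing when Reset() is skipped is a choice of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in that mimics the Reset / IncrementToken / End / Dispose
// call order of a Lucene.NET tokenizer. It is NOT CJKTokenizer: it treats the
// whole input as a single CJK run and emits overlapping bigrams (inputs
// shorter than two characters yield no tokens).
public sealed class TokenStreamContractSketch : IDisposable
{
    private readonly TextReader input;
    private string text;   // null until Reset() has been called
    private int position;  // start index of the next bigram

    // Stand-in for the term attribute a real consumer would read.
    public string Term { get; private set; }

    public TokenStreamContractSketch(TextReader input)
    {
        this.input = input ?? throw new ArgumentNullException(nameof(input));
    }

    // Must be called before the first IncrementToken(); reads the input.
    public void Reset()
    {
        text = input.ReadToEnd();
        position = 0;
        Term = null;
    }

    // Advances to the next token: true if one was produced, false at end of stream.
    public bool IncrementToken()
    {
        if (text == null)
            throw new InvalidOperationException("Reset() must be called before IncrementToken().");
        if (position + 1 >= text.Length)
            return false;
        Term = text.Substring(position, 2);
        position++;
        return true;
    }

    // Called once after IncrementToken() has returned false.
    public void End()
    {
        Term = null;
    }

    public void Dispose()
    {
        input.Dispose();
    }

    // Typical consumer loop: Reset, IncrementToken until false, End, Dispose.
    public static List<string> Collect(string source)
    {
        var tokens = new List<string>();
        using (var stream = new TokenStreamContractSketch(new StringReader(source)))
        {
            stream.Reset();
            while (stream.IncrementToken())
                tokens.Add(stream.Term);
            stream.End();
        }
        return tokens;
    }
}
```

`TokenStreamContractSketch.Collect("中文分词")` returns "中文", "文分", and "分词". The `using` block guarantees that Dispose() runs even if the loop throws.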
      
Overrides
Reset()
Declaration
public override void Reset()
  Overrides
Implements
      IDisposable