Class CJKTokenizer
CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
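The tokens returned are overlapping bigrams: every pair of adjacent CJK characters becomes one token, so each character (except the first and last of a run) appears in two tokens.
Example: "java C1C2C3C4", where C1 through C4 stand for four consecutive CJK characters, is segmented into "java", "C1C2", "C2C3", and "C3C4".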
Additionally, the following is applied to Latin text (such as English):
- Text is converted to lowercase.
- Numeric digits, '+', '#', and '_' are tokenized as letters.
- Full-width forms are converted to half-width forms.
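The segmentation rules above can be illustrated with a small, self-contained sketch. This is not the library's implementation: `BigramSketch`, `Tokenize`, and the two character-classification helpers are hypothetical names, and the rule "any non-ASCII letter counts as CJK" is a deliberate simplification of the Unicode block checks a real tokenizer would perform.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BigramSketch
{
    // Assumption: ASCII letters and digits plus '+', '#', '_' form Latin tokens.
    private static bool IsLatinTokenChar(char c) =>
        (c < 128 && char.IsLetterOrDigit(c)) || c == '+' || c == '#' || c == '_';

    // Simplification: treat every non-ASCII letter as a CJK character.
    private static bool IsCjkChar(char c) => c >= 128 && char.IsLetter(c);

    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var run = new StringBuilder();
        bool runIsCjk = false;

        // Emits the current run: whole for Latin, overlapping bigrams for CJK.
        void Flush()
        {
            if (run.Length == 0) return;
            string s = run.ToString();
            if (!runIsCjk || s.Length == 1)
            {
                tokens.Add(s);
            }
            else
            {
                for (int i = 0; i + 1 < s.Length; i++)
                    tokens.Add(s.Substring(i, 2));
            }
            run.Clear();
        }

        foreach (char raw in text)
        {
            char c = raw;
            // Fold full-width ASCII variants (U+FF01..U+FF5E) to half-width.
            if (c >= '\uFF01' && c <= '\uFF5E') c = (char)(c - 0xFEE0);

            bool latin = IsLatinTokenChar(c);
            bool cjk = !latin && IsCjkChar(c);

            if (!latin && !cjk) { Flush(); continue; }      // separator ends the run
            if (run.Length > 0 && runIsCjk != cjk) Flush(); // script change ends the run

            runIsCjk = cjk;
            run.Append(latin ? char.ToLowerInvariant(c) : c);
        }
        Flush();
        return tokens;
    }
}
```

Calling `BigramSketch.Tokenize("Java 中文分词")` yields the four tokens `java`, `中文`, `文分`, and `分词`, mirroring the "java C1C2C3C4" example.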
Inheritance
System.Object
    Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
    Lucene.Net.Analysis.Tokenizer
    CJKTokenizer
  Implements
System.IDisposable
  Inherited Members
Lucene.Net.Analysis.Tokenizer.m_input
Lucene.Net.Analysis.TokenStream.Dispose()
Lucene.Net.Util.AttributeSource.GetAttributeFactory()
Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()
Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()
Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)
Lucene.Net.Util.AttributeSource.AddAttribute<T>()
Lucene.Net.Util.AttributeSource.HasAttributes
Lucene.Net.Util.AttributeSource.HasAttribute<T>()
Lucene.Net.Util.AttributeSource.GetAttribute<T>()
Lucene.Net.Util.AttributeSource.ClearAttributes()
Lucene.Net.Util.AttributeSource.CaptureState()
Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)
Lucene.Net.Util.AttributeSource.GetHashCode()
Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)
Lucene.Net.Util.AttributeSource.CloneAttributes()
Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)
Lucene.Net.Util.AttributeSource.ToString()
System.Object.Equals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
  Namespace: Lucene.Net.Analysis.Cjk
Assembly: Lucene.Net.Analysis.Common.dll
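This class is marked obsolete, and its notice recommends combining StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. A minimal sketch of such a chain follows, assuming the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced. The analyzer name `CjkBigramAnalyzerSketch` is hypothetical, and the resulting tokens are not guaranteed to match CJKTokenizer exactly (for example, characters such as '+' and '#' may be treated differently by StandardTokenizer).

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Hypothetical analyzer that wires up the replacement chain named in the obsolete notice.
public sealed class CjkBigramAnalyzerSketch : Analyzer
{
    // Assumption: the application targets the 4.8 compatibility version.
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // Split the input into words and individual CJK characters.
        Tokenizer source = new StandardTokenizer(MatchVersion, reader);

        // Fold full-width forms to half-width forms.
        TokenStream result = new CJKWidthFilter(source);

        // Lowercase Latin text.
        result = new LowerCaseFilter(MatchVersion, result);

        // Combine adjacent CJK characters into overlapping bigrams.
        result = new CJKBigramFilter(result);

        return new TokenStreamComponents(source, result);
    }
}
```

The filter order here follows the same sequence the description gives for CJKTokenizer: width folding and lowercasing happen before bigrams are formed, so bigrams are built from already-normalized characters.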
Syntax
[Obsolete("Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.")]
public sealed class CJKTokenizer : Tokenizer, IDisposable
Constructors
CJKTokenizer(AttributeSource.AttributeFactory, TextReader)
Declaration
public CJKTokenizer(AttributeSource.AttributeFactory factory, TextReader in)
Parameters
| Type | Name | Description | 
|---|---|---|
| Lucene.Net.Util.AttributeSource.AttributeFactory | factory | |
| System.IO.TextReader | in | |
CJKTokenizer(TextReader)
Construct a token stream processing the given input.
Declaration
public CJKTokenizer(TextReader in)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.IO.TextReader | in | I/O reader | 
Methods
End()
Declaration
public override sealed void End()
Overrides
Lucene.Net.Analysis.TokenStream.End()
  
  IncrementToken()
Advances the stream to the next token. Returns true if a token was produced, or false at end of stream (EOS). See http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.UnicodeBlock.html for details on Unicode blocks.
Declaration
public override bool IncrementToken()
Returns
| Type | Description | 
|---|---|
| System.Boolean | false for end of stream, true otherwise | 
Overrides
Lucene.Net.Analysis.TokenStream.IncrementToken()
  Exceptions
| Type | Condition | 
|---|---|
| System.IO.IOException | If a read error occurs on the underlying input reader | 
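The methods on this page are normally called in a fixed order: Reset(), then IncrementToken() until it returns false, then End(), and finally Dispose(). The sketch below shows that workflow, assuming the Lucene.NET 4.8 packages are referenced. `CjkTokenizerUsageSketch` and `PrintTokens` are hypothetical names, and the term text is read through `ICharTermAttribute` from Lucene.Net.Analysis.TokenAttributes.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.TokenAttributes;

public static class CjkTokenizerUsageSketch
{
    // Hypothetical helper: prints each token produced for the given text.
    public static void PrintTokens(string text)
    {
#pragma warning disable 618 // CJKTokenizer is marked [Obsolete]
        using (var tokenizer = new CJKTokenizer(new StringReader(text)))
#pragma warning restore 618
        {
            // Register the attribute that exposes the current token's text.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // false signals end of stream
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // finalizes end-of-stream state
        }
    }
}
```

The `using` block ensures Dispose() runs even if IncrementToken() throws an IOException.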
Reset()
Declaration
public override void Reset()
Overrides
Lucene.Net.Analysis.Tokenizer.Reset()