Class ChineseTokenizer
Tokenizes Chinese text into individual Chinese characters.
ChineseTokenizer and CJKTokenizer differ in how they split text into tokens: ChineseTokenizer emits one token per character, whereas CJKTokenizer emits overlapping two-character tokens.
For example, if the Chinese text "C1C2C3C4" is to be indexed:
- The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
- The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.
Therefore the index created by CJKTokenizer is much larger.
The drawback of CJKTokenizer is that searches for C1, C1C2, C1C3, C4C2, C1C2C3 ... work with ChineseTokenizer but will not work with CJKTokenizer.
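The two token shapes described above can be sketched without Lucene.NET at all. The following is a minimal illustration, not the library's implementation: the class `TokenShapeSketch` and its methods `Unigrams` and `Bigrams` are hypothetical names, and the input is modelled as a list of placeholder strings "C1" to "C4" standing in for four Chinese characters.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: mimics the two token shapes described above.
// This is NOT how Lucene.NET implements either tokenizer.
public static class TokenShapeSketch
{
    // One token per character, the shape attributed to ChineseTokenizer.
    public static List<string> Unigrams(IReadOnlyList<string> chars)
    {
        return new List<string>(chars);
    }

    // Overlapping two-character tokens, the shape attributed to CJKTokenizer.
    public static List<string> Bigrams(IReadOnlyList<string> chars)
    {
        var tokens = new List<string>();
        for (int i = 0; i + 1 < chars.Count; i++)
        {
            tokens.Add(chars[i] + chars[i + 1]);
        }
        return tokens;
    }

    public static void Main()
    {
        // "C1".."C4" are placeholders for four Chinese characters.
        var text = new[] { "C1", "C2", "C3", "C4" };
        var unigrams = Unigrams(text);
        var bigrams = Bigrams(text);

        Console.WriteLine(string.Join(", ", unigrams)); // C1, C2, C3, C4
        Console.WriteLine(string.Join(", ", bigrams));  // C1C2, C2C3, C3C4

        // A single-character query term exists only in the unigram token set.
        Console.WriteLine(unigrams.Contains("C1")); // True
        Console.WriteLine(bigrams.Contains("C1"));  // False
    }
}
```

The last two lines show the simplest case of the search difference: a single-character term such as C1 is present among the per-character tokens but is never emitted as a token of its own in the two-character scheme.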
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Cn
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class ChineseTokenizer : Tokenizer, IDisposable
Constructors
ChineseTokenizer(AttributeSource.AttributeFactory, TextReader)
Declaration
public ChineseTokenizer(AttributeSource.AttributeFactory factory, TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | |
TextReader | in | |
ChineseTokenizer(TextReader)
Declaration
public ChineseTokenizer(TextReader @in)
Parameters
Type | Name | Description |
---|---|---|
TextReader | in | |
Methods
End()
Declaration
public override sealed void End()
IncrementToken()
Declaration
public override bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean |
Reset()
Declaration
public override void Reset()
Implements
IDisposable
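Example

The members listed above are normally called in a fixed order: construct the tokenizer, call Reset(), loop on IncrementToken(), call End(), then dispose. The sketch below assumes the Lucene.Net.Analysis.Common package is referenced and that the term text is read through ICharTermAttribute from Lucene.Net.Analysis.TokenAttributes, which is not documented on this page. The class name `ChineseTokenizerUsage` and the sample text are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn;
using Lucene.Net.Analysis.TokenAttributes;

// Minimal usage sketch; requires the Lucene.Net.Analysis.Common assembly.
public static class ChineseTokenizerUsage
{
    public static void Main()
    {
        // Sample input chosen for illustration.
        using (var reader = new StringReader("中文分词"))
        using (var tokenizer = new ChineseTokenizer(reader))
        {
            // Assumption: term text is exposed through ICharTermAttribute.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken()) // returns false once the input is exhausted
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                   // signals end of stream before disposal
        }
    }
}
```

The using statements rely on the class implementing IDisposable, so the tokenizer and its reader are released even if tokenization throws.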