Namespace Lucene.Net.Analysis.Cn
Classes
ChineseAnalyzer
An Analyzer that tokenizes text with ChineseTokenizer and filters with ChineseFilter.
ChineseFilter
A TokenFilter with a stop word table.
- Numeric tokens are removed.
- English tokens must be larger than 1 character.
- One Chinese character as one Chinese word.
TO DO:
- Add Chinese stop words, such as \ue400
- Dictionary-based Chinese word extraction
- Intelligent Chinese word extraction
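The filtering rules listed for ChineseFilter (numeric tokens removed, English tokens longer than one character, each Chinese character kept as its own word) can be illustrated with a minimal sketch. This is plain Python, not the Lucene.NET API: the function names, the small `STOP_WORDS` table, and the CJK range check are all assumptions made for illustration.

```python
# Hypothetical stop word table; the filter's real table is not listed on this page.
STOP_WORDS = {"and", "are", "the", "of", "to"}


def is_cjk(ch):
    # Assumption: only the CJK Unified Ideographs block counts as "Chinese" here.
    return "\u4e00" <= ch <= "\u9fff"


def chinese_filter(tokens, stop_words=STOP_WORDS):
    """Apply the three listed rules plus a stop word lookup to a token list."""
    kept = []
    for tok in tokens:
        if not tok or tok.lower() in stop_words:
            continue  # empty token or stop word
        first = tok[0]
        if first.isdigit():
            continue  # numeric tokens are removed
        if first.isascii() and first.isalpha():
            if len(tok) > 1:  # English tokens must be longer than 1 character
                kept.append(tok)
            continue
        if is_cjk(first):
            kept.append(tok)  # one Chinese character counts as one word
        # anything else (punctuation, symbols) is dropped in this sketch
    return kept


print(chinese_filter(["the", "a", "2024", "Lucene", "中", "x", "国"]))
```

In this sketch the decision is made from the first character of each token, which is a simplification chosen to keep the rules easy to follow.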
ChineseFilterFactory
Factory for ChineseFilter.
ChineseTokenizer
Tokenize Chinese text as individual Chinese characters.
The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic.
For example, if the Chinese text "C1C2C3C4" is to be indexed:
- The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
- The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.
Therefore the index created by CJKTokenizer is much larger.
The problem is that when searching for C1, C1C2, C1C3,
C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not work.
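The C1C2C3C4 example can be made concrete with a small sketch of the two parsing strategies. This is a simplified model in plain Python, not the Lucene.NET tokenizers themselves: the function names are hypothetical, a four-character string stands in for C1C2C3C4, and a query "matches" here only if every query token is a term in the document's token set.

```python
def unigram_tokens(text):
    """One token per character, as described for ChineseTokenizer."""
    return [ch for ch in text if not ch.isspace()]


def bigram_tokens(text):
    """Overlapping two-character tokens, as in the CJKTokenizer example."""
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return chars  # assumption: a lone character is emitted unchanged
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def all_terms_indexed(query, document, tokenize):
    """True if every token of the query is also a term of the document."""
    index_terms = set(tokenize(document))
    return all(term in index_terms for term in tokenize(query))


document = "我是中国"  # stands in for C1C2C3C4

print(unigram_tokens(document))  # C1, C2, C3, C4
print(bigram_tokens(document))   # C1C2, C2C3, C3C4

for query in ["我", "我中"]:  # C1 and C1C3
    print(query,
          all_terms_indexed(query, document, unigram_tokens),
          all_terms_indexed(query, document, bigram_tokens))
```

Under these assumptions, the single-character query C1 and the non-adjacent pair C1C3 both find their terms in the per-character index, while the bigram index contains neither "C1" nor "C1C3" as a term.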
ChineseTokenizerFactory
Factory for ChineseTokenizer.