The Lucene.Net.Analysis.Cn namespace provides an analyzer for Chinese text that indexes individual Chinese characters as tokens: a ChineseAnalyzer built from a ChineseTokenizer and a ChineseFilter.

Classes

Public class ChineseAnalyzer
Subclass of Analyzer (org.apache.lucene.analysis.Analyzer in the original Java implementation), built from a ChineseTokenizer and filtered with a ChineseFilter. Author: Yiyi Sun. A usage sketch follows this table.
Public class ChineseFilter
A token filter that removes tokens using a stop word table. Rules: digits are not allowed; English words/tokens must be longer than one character; each Chinese character counts as one Chinese word. To do: (1) add Chinese stop words, such as \ue400; (2) dictionary-based Chinese word extraction; (3) intelligent Chinese word extraction. Author: Yiyi Sun.
Public class ChineseTokenizer
A tokenizer that extracts tokens from the input stream using Character.getType(). Rule: each Chinese character is emitted as a single token. The difference between the ChineseTokenizer and the CJKTokenizer (id=23545) is their token parsing logic. For example, given the Chinese text "C1C2C3C4" (four Chinese characters) to be indexed, the tokens returned by the ChineseTokenizer are C1, C2, C3, C4, while the tokens returned by the CJKTokenizer are C1C2, C2C3, C3C4, so the index the CJKTokenizer creates is much larger. The consequence is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3, and so on, the ChineseTokenizer works, but the CJKTokenizer does not. Author: Yiyi Sun. A standalone sketch contrasting the two schemes follows this table.
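
Since this page shows no usage, here is a minimal sketch of running ChineseAnalyzer over a short mixed Chinese/English string. It assumes the Lucene.Net 3.0.3 attribute-based TokenStream API (Analyzer.TokenStream, AddAttribute<ITermAttribute>, IncrementToken); member names differ across Lucene.Net versions, and newer versions also require calling Reset() before iterating and disposing the stream, so treat the exact calls as assumptions rather than a verbatim reference.

using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn;
using Lucene.Net.Analysis.Tokenattributes;

class ChineseAnalyzerDemo
{
    static void Main()
    {
        // ChineseAnalyzer = ChineseTokenizer + ChineseFilter.
        Analyzer analyzer = new ChineseAnalyzer();

        // Mixed input: four Chinese characters, an English word, and digits.
        TokenStream stream = analyzer.TokenStream("content", new StringReader("一二三四 hello 42"));
        ITermAttribute term = stream.AddAttribute<ITermAttribute>();

        // Expected tokens: 一, 二, 三, 四, hello
        // (one token per Chinese character; the digit token is dropped by ChineseFilter).
        while (stream.IncrementToken())
        {
            Console.WriteLine(term.Term);
        }
    }
}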
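
To make the unigram-versus-bigram contrast in the ChineseTokenizer description concrete without depending on any particular Lucene.Net version, the following self-contained sketch (not using the Lucene.Net API; the class and method names are illustrative only) produces the two token sets for the same four-character input.

using System;
using System.Collections.Generic;
using System.Linq;

static class UnigramVsBigramDemo
{
    // ChineseTokenizer-style: one token per Chinese character (C1, C2, C3, C4).
    static IEnumerable<string> Unigrams(string text)
    {
        return text.Select(c => c.ToString());
    }

    // CJKTokenizer-style: overlapping pairs of adjacent characters (C1C2, C2C3, C3C4).
    static IEnumerable<string> Bigrams(string text)
    {
        for (int i = 0; i + 1 < text.Length; i++)
            yield return text.Substring(i, 2);
    }

    static void Main()
    {
        string text = "一二三四"; // four Chinese characters, standing in for C1C2C3C4

        Console.WriteLine("unigrams: " + string.Join(", ", Unigrams(text))); // 一, 二, 三, 四
        Console.WriteLine("bigrams:  " + string.Join(", ", Bigrams(text)));  // 一二, 二三, 三四
    }
}

The bigram scheme emits more and longer terms for the same text, which is why the description above notes that an index built with the CJKTokenizer is much larger, while single-character queries such as C1 only match the unigram-built index.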