Class HMMChineseTokenizerFactory
Factory for HMMChineseTokenizer
Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Lucene.Net.Analysis.Cn.Smart
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public sealed class HMMChineseTokenizerFactory : TokenizerFactory
Constructors
HMMChineseTokenizerFactory(IDictionary<String, String>)
Creates a new HMMChineseTokenizerFactory
Declaration
public HMMChineseTokenizerFactory(IDictionary<string, string> args)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IDictionary<System.String, System.String> | args | |
Methods
Create(AttributeSource.AttributeFactory, TextReader)
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| AttributeSource.AttributeFactory | factory | |
| System.IO.TextReader | reader | |
Returns
| Type | Description |
|---|---|
| Tokenizer | |
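Example

The following is a minimal sketch of how the constructor and Create might be used together, not an official sample. It assumes the Lucene.Net.Analysis.SmartCn package (Lucene.NET 4.8) is referenced, that an empty argument dictionary is acceptable to the factory, and that the default attribute factory is suitable. The class name HmmTokenizerDemo and the sample sentence are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical demo class; not part of Lucene.NET.
public static class HmmTokenizerDemo
{
    public static void Main()
    {
        // Assumption: no factory-specific arguments are needed,
        // so an empty (mutable) dictionary is passed.
        IDictionary<string, string> args = new Dictionary<string, string>();
        var factory = new HMMChineseTokenizerFactory(args);

        // Hypothetical sample text; it ends with punctuation on purpose.
        using (TextReader reader = new StringReader("我是中国人。"))
        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, reader))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                   // signals end of stream
        }
    }
}
```

Per the note at the top of this page, the printed tokens can be expected to include punctuation unless a filter is applied after the tokenizer.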