Class HMMChineseTokenizerFactory
Factory for HMMChineseTokenizer
Note: this class currently emits tokens for punctuation. To remove them, either add a Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory configured with words="org/apache/lucene/analysis/cn/smart/stopwords.txt".
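The stoplist route described in the note can be sketched programmatically. This is a minimal sketch, not the documented configuration: instead of a StopFilterFactory pointed at the resource path above, it assumes that SmartChineseAnalyzer.GetDefaultStopSet() exposes the same SmartChinese stoplist and wraps the tokenizer in a Lucene.Net.Analysis.Core.StopFilter directly. The class PunctuationFreeChineseTokens and its Tokenize method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class. Assumes SmartChineseAnalyzer.GetDefaultStopSet()
// returns the same SmartChinese stoplist as the stopwords.txt resource path.
public static class PunctuationFreeChineseTokens
{
    // Tokenizes text with HMMChineseTokenizerFactory and drops every token
    // that appears in the SmartChinese default stop set.
    public static List<string> Tokenize(string text)
    {
        var factory = new HMMChineseTokenizerFactory(new Dictionary<string, string>());
        Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text));

        var terms = new List<string>();

        // StopFilter removes any term contained in the stop set; disposing the
        // filter also disposes the wrapped tokenizer.
        using (TokenStream stream = new StopFilter(
            LuceneVersion.LUCENE_48, tokenizer, SmartChineseAnalyzer.GetDefaultStopSet()))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

Calling PunctuationFreeChineseTokens.Tokenize("我购买了道具和服装。") returns the remaining segmented terms as a List<string>; the exact segmentation depends on the dictionaries bundled with Lucene.Net.Analysis.SmartCn.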
Note
This API is experimental and might change in incompatible ways in the next release.
Inherited Members
TokenizerFactory.AvailableTokenizers
TokenizerFactory.ReloadTokenizers()
AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM
AbstractAnalysisFactory.OriginalArgs
AbstractAnalysisFactory.LuceneMatchVersion
AbstractAnalysisFactory.GetClassArg()
AbstractAnalysisFactory.IsExplicitLuceneMatchVersion
Namespace: Lucene.Net.Analysis.Cn.Smart
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public sealed class HMMChineseTokenizerFactory : TokenizerFactory
Constructors
HMMChineseTokenizerFactory(IDictionary<string, string>)
Creates a new HMMChineseTokenizerFactory
Declaration
public HMMChineseTokenizerFactory(IDictionary<string, string> args)
Parameters
Type | Name | Description |
---|---|---|
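IDictionary<string, string> | args | Factory arguments. The only recognized key is the inherited luceneMatchVersion parameter; any other key causes an ArgumentException. |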
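A minimal sketch of calling the constructor, assuming (as in the Java original) that the only recognized key is the inherited luceneMatchVersion parameter and that the base AbstractAnalysisFactory constructor removes the keys it consumes, so any leftover key triggers the ArgumentException. FactoryConstruction, its methods, and the key someUnknownOption are hypothetical. If the consumption assumption holds, it is safest to pass a fresh, mutable Dictionary to each factory rather than a shared or read-only one.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.Util;

// Hypothetical helper class illustrating the constructor's argument handling.
public static class FactoryConstruction
{
    // Builds a factory with an explicit match version. LUCENE_MATCH_VERSION_PARAM
    // is the inherited "luceneMatchVersion" key read by AbstractAnalysisFactory.
    public static HMMChineseTokenizerFactory CreateFactory()
    {
        // A fresh, mutable dictionary, because the base constructor is assumed
        // to remove the keys it consumes.
        var args = new Dictionary<string, string>
        {
            [AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM] = "LUCENE_48"
        };
        return new HMMChineseTokenizerFactory(args);
    }

    // Returns true if the constructor rejects a key it does not recognize.
    // "someUnknownOption" is a made-up key used only for this check.
    public static bool RejectsUnknownArgs()
    {
        var args = new Dictionary<string, string> { ["someUnknownOption"] = "true" };
        try
        {
            _ = new HMMChineseTokenizerFactory(args);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```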
Methods
Create(AttributeFactory, TextReader)
Creates a Lucene.Net.Analysis.TokenStream of the specified input using the given Lucene.Net.Util.AttributeSource.AttributeFactory
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
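AttributeSource.AttributeFactory | factory | The attribute factory used to create the tokenizer's attribute instances. |
TextReader | reader | The input text to tokenize. |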
Returns
Type | Description |
---|---|
Tokenizer | A new HMMChineseTokenizer that reads from reader. |
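The following sketch calls Create(AttributeFactory, TextReader) and consumes the returned tokenizer with the usual Reset / IncrementToken / End sequence, collecting each term together with its character offsets. CreateExample is a hypothetical name, and passing AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY is an assumption made for the example; a custom factory could be supplied instead.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class showing the Create(AttributeFactory, TextReader) call.
public static class CreateExample
{
    // Returns each token's text with its start and end character offsets.
    public static List<(string Term, int Start, int End)> Tokenize(string text)
    {
        var factory = new HMMChineseTokenizerFactory(new Dictionary<string, string>());
        var tokens = new List<(string Term, int Start, int End)>();

        // Using the default attribute factory is an assumption for this sketch.
        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader(text)))
        {
            ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offsetAtt = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();   // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                tokens.Add((termAtt.ToString(), offsetAtt.StartOffset, offsetAtt.EndOffset));
            }
            tokenizer.End();     // records the final offset state
        }
        return tokens;
    }
}
```

Because this calls the tokenizer without any filter, punctuation tokens will still appear in the result, as described in the note at the top of this page.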