Class ICUTokenizerFactory
Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the ICU4N.Text.BreakIterator and typing provided by the DefaultICUTokenizerConfig.
Implements
IResourceLoaderAware
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public class ICUTokenizerFactory : TokenizerFactory, IResourceLoaderAware
Remarks
To use the default set of per-script rules:
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
  </analyzer>
</fieldType>
You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by ICU4N.Text.RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.
To add per-script rules, add a "rulefiles" argument containing a comma-separated list of code:rulefile pairs. Each pair consists of a four-letter ISO 15924 script code, followed by a colon, then a resource path. For example, to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"):
<fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" cjkAsWords="true"
               rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
  </analyzer>
</fieldType>
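The "rulefiles" format described above can be illustrated with a small, self-contained sketch. This is not the factory's own parsing code: `RuleFilesSketch` and `ParseRuleFiles` are hypothetical names introduced here only to show how a value splits into script-code and resource-path pairs. The sketch assumes each pair is split at its first colon, so resource paths may contain dots.

```csharp
using System;
using System.Collections.Generic;

public static class RuleFilesSketch
{
    // Hypothetical helper (not part of Lucene.NET): splits a "rulefiles" value
    // into script code -> resource path entries.
    public static IDictionary<string, string> ParseRuleFiles(string ruleFiles)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);

        // Pairs are separated by commas.
        foreach (string pair in ruleFiles.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Assumption: the first colon separates the script code from the path.
            int colon = pair.IndexOf(':');
            if (colon < 0)
            {
                throw new ArgumentException("Expected code:rulefile, got '" + pair + "'.", nameof(ruleFiles));
            }

            string scriptCode = pair.Substring(0, colon).Trim();
            string resourcePath = pair.Substring(colon + 1).Trim();

            // ISO 15924 script codes are four letters long, e.g. "Latn" or "Cyrl".
            if (scriptCode.Length != 4)
            {
                throw new ArgumentException("Script code must have four letters: '" + scriptCode + "'.", nameof(ruleFiles));
            }

            result[scriptCode] = resourcePath;
        }

        return result;
    }

    public static void Main()
    {
        var parsed = ParseRuleFiles("Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi");
        foreach (var entry in parsed)
        {
            Console.WriteLine(entry.Key + " -> " + entry.Value);
        }
    }
}
```

The sketch only checks the length of the script code. Whether the code names a real script, and whether the resource path resolves, is left to the factory and its resource loader.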
Constructors
ICUTokenizerFactory(IDictionary<String, String>)
Creates a new ICUTokenizerFactory.
Declaration
public ICUTokenizerFactory(IDictionary<string, string> args)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IDictionary<System.String, System.String> | args | |
Methods
Create(AttributeSource.AttributeFactory, TextReader)
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
Parameters
Type | Name | Description |
---|---|---|
AttributeSource.AttributeFactory | factory | |
System.IO.TextReader | input | |
Returns
Type | Description |
---|---|
Tokenizer | |
Overrides
Inform(IResourceLoader)
Declaration
public virtual void Inform(IResourceLoader loader)
Parameters
Type | Name | Description |
---|---|---|
IResourceLoader | loader | |
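
Outside of a Solr-style schema, the members documented above can be used directly from code. The following is a minimal sketch, not an authoritative usage pattern. It assumes the Lucene.Net.ICU and Lucene.Net.Analysis.Common packages are referenced, that `Inform` should be called before `Create` so the factory can build its tokenizer configuration, and that `ClasspathResourceLoader` is a suitable loader. The class name `IcuTokenizerFactoryUsage` and the sample text are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class IcuTokenizerFactoryUsage
{
    public static void Main()
    {
        // No "rulefiles" entry, so the default per-script rules are used.
        var args = new Dictionary<string, string>();
        var factory = new ICUTokenizerFactory(args);

        // Assumption: Inform must run before Create. The loader is what
        // resolves rule-file resource paths when "rulefiles" is specified.
        factory.Inform(new ClasspathResourceLoader(typeof(IcuTokenizerFactoryUsage)));

        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            new StringReader("Hello мир")))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            // Standard TokenStream consumption: Reset, IncrementToken loop, End.
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();
        }
    }
}
```

To try custom rules, add a "rulefiles" entry to `args` in the format shown in the Remarks section, and make sure the loader passed to `Inform` can resolve each resource path.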