Class ICUTokenizerFactory
Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the ICU4N.Text.BreakIterator and typing provided by the DefaultICUTokenizerConfig.
Implements
Inherited Members
Namespace: Lucene.Net.Analysis.Icu.Segmentation
Assembly: Lucene.Net.ICU.dll
Syntax
public class ICUTokenizerFactory : TokenizerFactory, IResourceLoaderAware
Remarks
To use the default set of per-script rules:
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
  </analyzer>
</fieldType>
You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by ICU4N.Text.RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.
To add per-script rules, add a "rulefiles" argument containing a
comma-separated list of code:rulefile pairs. Each pair consists of a
four-letter ISO 15924 script code, followed by a colon, then a resource
path. For example, to specify rules for Latin (script code "Latn") and
Cyrillic (script code "Cyrl"):
<fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" cjkAsWords="true"
               rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
  </analyzer>
</fieldType>
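The "rulefiles" format described above can be illustrated with a small standalone parser. This is only a sketch of the documented format (comma-separated code:rulefile pairs with a four-letter script code), not the factory's own parsing code. The class name `RuleFilesArgument` and its `Parse` method are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.NET: it only illustrates the
// "code:rulefile,code:rulefile" format of the "rulefiles" argument.
public static class RuleFilesArgument
{
    // Returns a map from script code (e.g. "Latn") to resource path.
    public static IDictionary<string, string> Parse(string ruleFiles)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        if (string.IsNullOrWhiteSpace(ruleFiles))
        {
            return result; // no per-script rules configured
        }

        foreach (string pair in ruleFiles.Split(','))
        {
            // The first colon separates the script code from the resource path.
            int colon = pair.IndexOf(':');
            if (colon < 0)
            {
                throw new FormatException(
                    "Expected 'code:rulefile' but found '" + pair + "'.");
            }

            string scriptCode = pair.Substring(0, colon).Trim();
            string resourcePath = pair.Substring(colon + 1).Trim();

            // ISO 15924 script codes are four letters long.
            if (scriptCode.Length != 4)
            {
                throw new FormatException(
                    "Script code '" + scriptCode + "' is not four letters long.");
            }

            result[scriptCode] = resourcePath;
        }

        return result;
    }
}
```

Calling `RuleFilesArgument.Parse("Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi")` yields two entries, keyed by "Latn" and "Cyrl". The sketch checks only the length of the script code; resolving the code against ICU's script property is left to the real factory.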
Constructors
ICUTokenizerFactory(IDictionary<String, String>)
Creates a new ICUTokenizerFactory.
Declaration
public ICUTokenizerFactory(IDictionary<string, string> args)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IDictionary<System.String, System.String> | args | |
Methods
Create(AttributeSource.AttributeFactory, TextReader)
Declaration
public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
Parameters
| Type | Name | Description |
|---|---|---|
| AttributeSource.AttributeFactory | factory | |
| System.IO.TextReader | input | |
Returns
| Type | Description |
|---|---|
| Tokenizer | |
Overrides
Inform(IResourceLoader)
Declaration
public virtual void Inform(IResourceLoader loader)
Parameters
| Type | Name | Description |
|---|---|---|
| IResourceLoader | loader | |
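
Outside of Solr-style configuration, the constructor, Inform and Create can be combined directly from C#. The following is a minimal sketch, assuming the Lucene.Net.ICU and Lucene.Net.Analysis.Common packages are referenced, that `ClasspathResourceLoader` from Lucene.Net.Analysis.Util is available as the resource loader, and that Inform should be called before Create (the ordering used by upstream Lucene). The class `IcuTokenizerFactoryExample` and its `Tokenize` method are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical example class; only the factory members come from this page.
public static class IcuTokenizerFactoryExample
{
    public static IList<string> Tokenize(string text)
    {
        // Arguments mirror the XML attributes shown in the Remarks.
        // A mutable dictionary is used on the assumption that the factory
        // may remove the keys it recognizes.
        var args = new Dictionary<string, string>
        {
            { "cjkAsWords", "true" }
        };
        var factory = new ICUTokenizerFactory(args);

        // Assumption: with no "rulefiles" argument the loader is not asked
        // for any resource, so any IResourceLoader will do here.
        factory.Inform(new ClasspathResourceLoader(typeof(IcuTokenizerFactoryExample)));

        var tokens = new List<string>();
        using (var reader = new StringReader(text))
        using (Tokenizer tokenizer = factory.Create(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, reader))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();
        }

        return tokens;
    }
}
```

A call such as `IcuTokenizerFactoryExample.Tokenize("hello world")` collects the term text of each token the tokenizer emits. To use custom rules, add a "rulefiles" entry to `args` and pass a loader that can resolve the listed resource paths.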