
    Class ICUTokenizerFactory

    Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the DefaultICUTokenizerConfig.

    Inheritance
    System.Object
    AbstractAnalysisFactory
    TokenizerFactory
    ICUTokenizerFactory
    Implements
    IResourceLoaderAware
    Inherited Members
    TokenizerFactory.ForName(String, IDictionary<String, String>)
    TokenizerFactory.LookupClass(String)
    TokenizerFactory.AvailableTokenizers
    TokenizerFactory.ReloadTokenizers()
    TokenizerFactory.Create(TextReader)
    AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM
    AbstractAnalysisFactory.m_luceneMatchVersion
    AbstractAnalysisFactory.OriginalArgs
    AbstractAnalysisFactory.AssureMatchVersion()
    AbstractAnalysisFactory.LuceneMatchVersion
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String)
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String, ICollection<String>)
    AbstractAnalysisFactory.Require(IDictionary<String, String>, String, ICollection<String>, Boolean)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, String)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>, String)
    AbstractAnalysisFactory.Get(IDictionary<String, String>, String, ICollection<String>, String, Boolean)
    AbstractAnalysisFactory.RequireInt32(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetInt32(IDictionary<String, String>, String, Int32)
    AbstractAnalysisFactory.RequireBoolean(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetBoolean(IDictionary<String, String>, String, Boolean)
    AbstractAnalysisFactory.RequireSingle(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetSingle(IDictionary<String, String>, String, Single)
    AbstractAnalysisFactory.RequireChar(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetChar(IDictionary<String, String>, String, Char)
    AbstractAnalysisFactory.GetSet(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetPattern(IDictionary<String, String>, String)
    AbstractAnalysisFactory.GetCulture(IDictionary<String, String>, String, CultureInfo)
    AbstractAnalysisFactory.GetWordSet(IResourceLoader, String, Boolean)
    AbstractAnalysisFactory.GetLines(IResourceLoader, String)
    AbstractAnalysisFactory.GetSnowballWordSet(IResourceLoader, String, Boolean)
    AbstractAnalysisFactory.SplitFileNames(String)
    AbstractAnalysisFactory.GetClassArg()
    AbstractAnalysisFactory.IsExplicitLuceneMatchVersion
    Namespace: Lucene.Net.Analysis.Icu.Segmentation
    Assembly: Lucene.Net.ICU.dll
    Syntax
    public class ICUTokenizerFactory : TokenizerFactory, IResourceLoaderAware
    Remarks

    To use the default set of per-script rules:

    <fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
      </analyzer>
    </fieldType>

    You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by the ICU RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.

    To add per-script rules, add a "rulefiles" argument containing a comma-separated list of code:rulefile pairs. Each pair consists of a four-letter ISO 15924 script code, a colon, and a resource path. For example, to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"):

    <fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory" cjkAsWords="true"
                   rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
      </analyzer>
    </fieldType>
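The Solr configurations above map onto the constructor, Inform, and Create members documented on this page. The following is a minimal sketch of driving the factory directly from C#, not a definitive recipe. It assumes that the "cjkAsWords" attribute from the XML is passed as a dictionary entry, that Inform should be called before Create so the tokenizer configuration is built, and that ClasspathResourceLoader from Lucene.Net.Analysis.Util is an acceptable loader when no rule files are supplied. The IcuFactoryExample class and its Tokenize helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Icu.Segmentation;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;

// Hypothetical helper class, for illustration only.
public static class IcuFactoryExample
{
    public static IList<string> Tokenize(string text)
    {
        // Pass a mutable dictionary: the factory is expected to remove the
        // arguments it recognizes (assumption based on the Lucene factory design).
        var args = new Dictionary<string, string>
        {
            { "cjkAsWords", "true" }
        };
        var factory = new ICUTokenizerFactory(args);

        // Assumption: Inform must run before Create. No "rulefiles" argument is
        // given here, so the loader should not need to resolve any resources.
        factory.Inform(new ClasspathResourceLoader(typeof(ICUTokenizerFactory)));

        var terms = new List<string>();
        using (Tokenizer tokenizer = factory.Create(new StringReader(text)))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            tokenizer.Reset();                 // required before the first IncrementToken
            while (tokenizer.IncrementToken())
            {
                terms.Add(term.ToString());    // copy the current token text
            }
            tokenizer.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Mixed Latin and Cyrillic input, echoing the scripts named above.
        foreach (string t in Tokenize("Hello world. Привет мир."))
        {
            Console.WriteLine(t);
        }
    }
}
```

To use custom rules instead, a "rulefiles" entry in the same code:rulefile format would be added to the dictionary, and the loader passed to Inform would then need to be able to resolve each resource path.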

    Constructors


    ICUTokenizerFactory(IDictionary<String, String>)

    Creates a new ICUTokenizerFactory.

    Declaration
    public ICUTokenizerFactory(IDictionary<string, string> args)
    Parameters
    Type Name Description
    IDictionary<System.String, System.String> args

    Methods


    Create(AttributeSource.AttributeFactory, TextReader)

    Declaration
    public override Tokenizer Create(AttributeSource.AttributeFactory factory, TextReader input)
    Parameters
    Type Name Description
    AttributeSource.AttributeFactory factory
    TextReader input
    Returns
    Type Description
    Tokenizer

    Inform(IResourceLoader)

    Declaration
    public virtual void Inform(IResourceLoader loader)
    Parameters
    Type Name Description
    IResourceLoader loader

    Implements

    IResourceLoaderAware
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)