
    Namespace Lucene.Net.Analysis.Th

    Analyzer for Thai.

    Classes

    ThaiAnalyzer

    Analyzer for the Thai language. It uses ICU4N.Text.BreakIterator to break text into words.

    You must specify the required LuceneVersion compatibility when creating ThaiAnalyzer:

    • As of 3.6, a set of Thai stopwords is used by default.

    ThaiTokenizer

    Tokenizer that uses ICU4N.Text.BreakIterator to tokenize Thai text.

    ThaiTokenizerFactory

    Factory for ThaiTokenizer.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ThaiTokenizerFactory"/>
      </analyzer>
    </fieldType>
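
    A fuller chain can be sketched by adding filters after the tokenizer. The field type below is an illustrative assumption, not part of this API: the name text_thai_stop and the stopword file path lang/stopwords_th.txt are hypothetical, and the chain only approximates a typical Thai analysis setup (lowercasing, then stopword removal).

    <fieldType name="text_thai_stop" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ThaiTokenizerFactory"/>
        <!-- Lowercase any non-Thai (e.g. Latin) tokens -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Hypothetical stopword file; point this at your own list -->
        <filter class="solr.StopFilterFactory" words="lang/stopwords_th.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>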

    ThaiWordFilter

    TokenFilter that uses ICU4N.Text.BreakIterator to break each Thai Token into separate Token(s), one for each Thai word.

    Please note: As of matchVersion 3.1, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a LowerCaseFilter before this filter, so the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles position increments correctly.

    ThaiWordFilterFactory

    Factory for ThaiWordFilter.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>
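
    Because ThaiWordFilter no longer lowercases non-Thai text as of matchVersion 3.1, a chain that needs case-insensitive matching has to add a lowercase filter itself, placed before the word filter as ThaiAnalyzer does. A minimal sketch, assuming a hypothetical field type name text_thai_lower:

    <fieldType name="text_thai_lower" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Lowercase first; ThaiWordFilter leaves non-Thai text as-is -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>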
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)