
    Namespace Lucene.Net.Analysis.Th

    Analyzer for Thai.

    Classes

    ThaiAnalyzer

    Analyzer for the Thai language. It uses a BreakIterator to break text into words.

    You must specify the required LuceneVersion compatibility when creating ThaiAnalyzer:

    • As of 3.6, a set of Thai stopwords is used by default

    ThaiTokenizer

    Tokenizer that uses a BreakIterator to tokenize Thai text.

    ThaiTokenizerFactory

    Factory for ThaiTokenizer.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ThaiTokenizerFactory"/>
      </analyzer>
    </fieldType>

    ThaiWordFilter

    TokenFilter that uses a BreakIterator to break each Token that is Thai into separate Token(s), one for each Thai word.

    Please note: From matchVersion 3.1 onward, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a LowerCaseFilter before this filter, so the behaviour of the Analyzer does not change. From matchVersion 3.1 onward, the filter also handles position increments correctly.
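
    When the filter is assembled into a chain by hand rather than through ThaiAnalyzer, the note above suggests adding a lowercase step before it. The following is a minimal sketch of such a chain, not a definitive configuration: the field type name text_thai_lowercase is hypothetical, and the use of solr.LowerCaseFilterFactory here is an assumption made to mirror the ordering that ThaiAnalyzer is described as using.

    <fieldType name="text_thai_lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Assumed step: lowercase non-Thai text before Thai word breaking -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>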

    ThaiWordFilterFactory

    Factory for ThaiWordFilter.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)