Namespace Lucene.Net.Analysis.Th

Analyzer for Thai.

Classes

ThaiAnalyzer

Analyzer for the Thai language. It uses ICU4N.Text.BreakIterator to break text into words.

You must specify the required LuceneVersion compatibility when creating ThaiAnalyzer:

As of 3.6, a set of Thai stopwords is used by default.

ThaiTokenizer

Tokenizer that uses ICU4N.Text.BreakIterator to tokenize Thai text.

ThaiTokenizerFactory

Factory for ThaiTokenizer.

<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ThaiTokenizerFactory"/>
  </analyzer>
</fieldType>
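
As a rough sketch of a fuller chain, the field type below pairs ThaiTokenizerFactory with lowercasing and stopword removal, in the spirit of what ThaiAnalyzer is described as doing by default. The field type name text_thai_stop and the stopword file path lang/stopwords_th.txt are assumptions for illustration; point words at whatever stopword file your configuration actually provides.

```xml
<fieldType name="text_thai_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split Thai text into words -->
    <tokenizer class="solr.ThaiTokenizerFactory"/>
    <!-- Lowercase any non-Thai (e.g. Latin) tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumed stopword file; adjust the path to your setup -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
  </analyzer>
</fieldType>
```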

ThaiWordFilter

TokenFilter that uses ICU4N.Text.BreakIterator to break each Token that is Thai into separate Token(s) for each Thai word.

Please note: As of matchVersion 3.1, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a LowerCaseFilter before this filter, so the behaviour of the Analyzer does not change. Also as of version 3.1, the filter handles position increments correctly.

ThaiWordFilterFactory

Factory for ThaiWordFilter.

<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory"/>
  </analyzer>
</fieldType>
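
Because ThaiWordFilter no longer lowercases non-Thai text as of matchVersion 3.1, a factory-based chain that wants the older behaviour would need an explicit lowercasing step ahead of it, mirroring what ThaiAnalyzer does. A minimal sketch, assuming the standard solr.LowerCaseFilterFactory is available and using the hypothetical field type name text_thai_lower:

```xml
<fieldType name="text_thai_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, since ThaiWordFilter leaves non-Thai text untouched -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Then split Thai tokens into individual words -->
    <filter class="solr.ThaiWordFilterFactory"/>
  </analyzer>
</fieldType>
```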