Namespace Lucene.Net.Analysis.Th
Classes
ThaiAnalyzer
Lucene.Net.Analysis.Analyzer for the Thai language. It uses ICU4N.Text.BreakIterator to break words.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating ThaiAnalyzer:
- As of 3.6, a set of Thai stopwords is used by default
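A rough Solr-style sketch of an analysis chain with Thai stopword removal, for illustration only. It is not necessarily the exact chain that ThaiAnalyzer builds internally. The field type name `text_thai_stop` and the stopword file path `lang/stopwords_th.txt` are assumptions and must match what exists in your own configuration.

```xml
<!-- Hypothetical field type name; pick any name that suits your schema. -->
<fieldType name="text_thai_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Break Thai text into words. -->
    <tokenizer class="solr.ThaiTokenizerFactory"/>
    <!-- Lowercase any non-Thai (e.g. Latin) tokens. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove stopwords; the file path below is an assumption. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
  </analyzer>
</fieldType>
```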
ThaiTokenizer
Tokenizer that uses ICU4N.Text.BreakIterator to tokenize Thai text.
ThaiTokenizerFactory
Factory for ThaiTokenizer.
<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ThaiTokenizerFactory"/>
  </analyzer>
</fieldType>
ThaiWordFilter
Lucene.Net.Analysis.TokenFilter that uses ICU4N.Text.BreakIterator to break each Thai Token into separate Tokens, one per Thai word.
Please note: As of matchVersion 3.1, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a Lucene.Net.Analysis.Core.LowerCaseFilter before this filter, so the behaviour of the analyzer does not change. Also as of version 3.1, the filter handles position increments correctly.
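Because the filter itself no longer lowercases non-Thai text, a hand-built chain needs its own lowercasing step to get the same effect. A minimal sketch in Solr configuration, assuming the standard `solr.LowerCaseFilterFactory` is available; the field type name `text_thai_lowercase` is hypothetical:

```xml
<!-- Hypothetical field type name, used here for illustration only. -->
<fieldType name="text_thai_lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, mirroring the order described for ThaiAnalyzer. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Then split Thai tokens into individual words. -->
    <filter class="solr.ThaiWordFilterFactory"/>
  </analyzer>
</fieldType>
```

Placing the lowercase filter before the word filter follows the ordering stated in the note above.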
ThaiWordFilterFactory
Factory for ThaiWordFilter.
<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory"/>
  </analyzer>
</fieldType>