    Namespace Lucene.Net.Analysis.Th

    Classes

    ThaiAnalyzer

    Lucene.Net.Analysis.Analyzer for the Thai language. It uses ICU4N.Text.BreakIterator to break text into words.

    You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating ThaiAnalyzer:

    • As of 3.6, a set of Thai stopwords is used by default

    ThaiTokenizer

    Tokenizer that uses ICU4N.Text.BreakIterator to tokenize Thai text.

    ThaiTokenizerFactory

    Factory for ThaiTokenizer.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ThaiTokenizerFactory"/>
      </analyzer>
    </fieldType>

    ThaiWordFilter

    Lucene.Net.Analysis.TokenFilter that uses ICU4N.Text.BreakIterator to break each Thai Token into separate Token(s), one for each Thai word.

    Please note: From matchVersion 3.1 onward, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a Lucene.Net.Analysis.Core.LowerCaseFilter before this filter so that the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles position increments correctly.
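
    As a rough sketch of the ordering the note describes, a Solr field type could place a lowercase filter ahead of ThaiWordFilterFactory. The field type name text_thai_lower is hypothetical, and this chain only approximates the note; it is not a reproduction of ThaiAnalyzer's exact filter chain:

    <fieldType name="text_thai_lower" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Assumption: lowercase non-Thai text first, since the Thai filter no longer does so -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>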

    ThaiWordFilterFactory

    Factory for ThaiWordFilter.

    <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
      </analyzer>
    </fieldType>
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.