Class HindiAnalyzer
Analyzer for Hindi.
You must specify the required LuceneVersion compatibility when creating HindiAnalyzer:
- As of 3.6, StandardTokenizer is used for tokenization
Namespace: Lucene.Net.Analysis.Hi
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class HindiAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
HindiAnalyzer(LuceneVersion)
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
Declaration
public HindiAnalyzer(LuceneVersion version)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | lucene compatibility version |
HindiAnalyzer(LuceneVersion, CharArraySet)
Builds an analyzer with the given stop words
Declaration
public HindiAnalyzer(LuceneVersion version, CharArraySet stopwords)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | lucene compatibility version |
CharArraySet | stopwords | a stopword set |
HindiAnalyzer(LuceneVersion, CharArraySet, CharArraySet)
Builds an analyzer with the given stop words and stemming exclusion set.
Declaration
public HindiAnalyzer(LuceneVersion version, CharArraySet stopwords, CharArraySet stemExclusionSet)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | version | lucene compatibility version |
CharArraySet | stopwords | a stopword set |
CharArraySet | stemExclusionSet | a stemming exclusion set |
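The three constructors above might be used as in the following minimal sketch. It assumes the Lucene.Net 4.8 packages are referenced and that LuceneVersion.LUCENE_48 is the compatibility version you want; the stop words and the stem exclusion entry are illustrative values only, not taken from the default list.

```csharp
using Lucene.Net.Analysis.Hi;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class HindiAnalyzerConstruction
{
    public static void Main()
    {
        // Assumption: LUCENE_48 is the compatibility version in use.
        var version = LuceneVersion.LUCENE_48;

        // 1. Default stop words.
        using (var defaultAnalyzer = new HindiAnalyzer(version))
        {
            // Use defaultAnalyzer for indexing or searching here.
        }

        // 2. Caller-supplied stop words (illustrative entries only).
        var stopwords = new CharArraySet(version, new[] { "और", "का", "की" }, false);
        using (var customStops = new HindiAnalyzer(version, stopwords))
        {
            // Use customStops here.
        }

        // 3. Stop words plus terms that should be excluded from stemming
        //    (illustrative entry only).
        var stemExclusions = new CharArraySet(version, new[] { "दिल्ली" }, false);
        using (var withExclusions = new HindiAnalyzer(version, stopwords, stemExclusions))
        {
            // Use withExclusions here.
        }
    }
}
```

Because HindiAnalyzer implements IDisposable, each instance in this sketch is wrapped in a using block.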
Fields
DEFAULT_STOPWORD_FILE
File containing default Hindi stopwords.
Default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html The stopword list is BSD-Licensed.
Declaration
public const string DEFAULT_STOPWORD_FILE = "stopwords.txt"
Field Value
Type | Description |
---|---|
System.String |
Properties
DefaultStopSet
Returns an unmodifiable instance of the default stop-words set.
Declaration
public static CharArraySet DefaultStopSet { get; }
Property Value
Type | Description |
---|---|
CharArraySet | an unmodifiable instance of the default stop-words set. |
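Since the returned set is unmodifiable, one way to extend the defaults is to copy them into a new CharArraySet first. The sketch below assumes Lucene.Net 4.8 and uses a placeholder extra stop word; the class and method names are hypothetical.

```csharp
using Lucene.Net.Analysis.Hi;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class HindiStopwordExtension
{
    // Hypothetical helper: builds an analyzer whose stop words are the
    // defaults plus one extra term supplied by the caller.
    public static HindiAnalyzer CreateWithExtraStopword(string extraStopword)
    {
        // Assumption: LUCENE_48 is the compatibility version in use.
        var version = LuceneVersion.LUCENE_48;

        // Copy the unmodifiable default set into a new, modifiable set.
        var stopwords = new CharArraySet(version, HindiAnalyzer.DefaultStopSet, false);
        stopwords.Add(extraStopword);

        return new HindiAnalyzer(version, stopwords);
    }
}
```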
Methods
CreateComponents(String, TextReader)
Creates TokenStreamComponents used to tokenize all the text in the provided System.IO.TextReader.
Declaration
protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | |
System.IO.TextReader | reader |
Returns
Type | Description |
---|---|
TokenStreamComponents | TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, IndicNormalizationFilter, HindiNormalizationFilter, SetKeywordMarkerFilter if a stem exclusion set is provided, HindiStemFilter, and Hindi Stop words |
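CreateComponents is protected, so callers normally reach this filter chain through the analyzer's public GetTokenStream method. The following is a minimal sketch of consuming the resulting tokens, assuming Lucene.Net 4.8; the field name "body" and the sample sentence are placeholders.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Hi;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class HindiTokenDump
{
    public static void Main()
    {
        // Assumption: LUCENE_48 is the compatibility version in use.
        using (var analyzer = new HindiAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", "यह एक उदाहरण वाक्य है"))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                  // required before the first IncrementToken()
            while (stream.IncrementToken())  // advances through the analyzed tokens
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();                    // signals that the stream is fully consumed
        }
    }
}
```

The printed terms are whatever survives normalization, stop word removal, and stemming, so they may differ from the surface forms in the input.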