Class PersianAnalyzer
Lucene.Net.Analysis.Analyzer for Persian.
This analyzer uses PersianCharFilter, which implies tokenizing around the zero-width non-joiner (ZWNJ, U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stop words.
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Fa
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class PersianAnalyzer : StopwordAnalyzerBase, IDisposable
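A minimal usage sketch, assuming the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced and that `LuceneVersion.LUCENE_48` is the desired compatibility version. The helper `PersianAnalyzerDemo.Tokenize`, the field name `"body"`, and the sample text are illustrative and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class PersianAnalyzerDemo
{
    // Hypothetical helper: runs text through PersianAnalyzer and collects the emitted terms.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();
        using (var analyzer = new PersianAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute termAttr = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAttr.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // \u200C is the zero-width non-joiner inside the first word.
        foreach (string term in Tokenize("کتاب\u200Cها و مدرسه"))
        {
            Console.WriteLine(term);
        }
    }
}
```

The printed terms depend on the default stop word list and on the normalization filters, so no expected output is shown here.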
Constructors
PersianAnalyzer(LuceneVersion)
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
Declaration
public PersianAnalyzer(LuceneVersion matchVersion)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
PersianAnalyzer(LuceneVersion, CharArraySet)
Builds an analyzer with the given stop words.
Declaration
public PersianAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
| CharArraySet | stopwords | A stopword set |
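As a sketch of how this constructor might be called, the following builds a small case-sensitive CharArraySet and passes it to the analyzer. It assumes Lucene.Net 4.8; the class name `CustomStopwordsDemo` and the three stop words are illustrative only.

```csharp
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CustomStopwordsDemo
{
    // Hypothetical helper: a tiny hand-picked stop word set (illustrative words only).
    public static CharArraySet BuildStopwords()
    {
        // Third argument is ignoreCase; Persian has no letter case, so false is used here.
        return new CharArraySet(LuceneVersion.LUCENE_48, new[] { "و", "در", "به" }, false);
    }

    // Creates an analyzer that removes only the words in the custom set.
    public static PersianAnalyzer CreateAnalyzer()
    {
        return new PersianAnalyzer(LuceneVersion.LUCENE_48, BuildStopwords());
    }
}
```

The caller owns the returned analyzer and should dispose it when finished.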
      
Fields
DEFAULT_STOPWORD_FILE
File containing default Persian stopwords.
The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html. The stopword list is BSD-licensed.
Declaration
public const string DEFAULT_STOPWORD_FILE = "stopwords.txt"
Field Value
| Type | Description |
|---|---|
| string | |
STOPWORDS_COMMENT
The comment character in the stopwords file. All lines prefixed with this character are ignored.
Declaration
public const string STOPWORDS_COMMENT = "#"
Field Value
| Type | Description |
|---|---|
| string | |
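The comment convention can be illustrated with a plain C# sketch that does not use the library's own loader. `StopwordFileSketch` is a hypothetical name, and skipping blank lines is an assumption made for this example rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StopwordFileSketch
{
    // Same value as the documented STOPWORDS_COMMENT constant.
    private const string Comment = "#";

    // Reads one word per line, ignoring lines that start with the comment prefix.
    public static HashSet<string> Parse(TextReader reader)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(Comment, StringComparison.Ordinal))
            {
                continue; // comment line
            }
            string word = line.Trim();
            if (word.Length == 0)
            {
                continue; // assumption: blank lines carry no word
            }
            words.Add(word);
        }
        return words;
    }
}
```

For example, `StopwordFileSketch.Parse(new StringReader("# header\nو\nدر\n"))` yields a set with the two words and drops the header line.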
Properties
DefaultStopSet
Returns an unmodifiable instance of the default stop-words set.
Declaration
public static CharArraySet DefaultStopSet { get; }
Property Value
| Type | Description |
|---|---|
| CharArraySet | An unmodifiable instance of the default stop-words set. |
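Because the returned set is unmodifiable, adding project-specific words requires a copy. The sketch below assumes Lucene.Net 4.8 and the `CharArraySet` constructor that accepts an existing collection; `ExtendedStopwordsDemo` and its parameter are hypothetical names.

```csharp
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class ExtendedStopwordsDemo
{
    // Copies the default Persian stop words into a new, modifiable set and adds extras.
    public static CharArraySet Build(params string[] extraWords)
    {
        var combined = new CharArraySet(
            LuceneVersion.LUCENE_48,
            PersianAnalyzer.DefaultStopSet,
            false); // ignoreCase
        foreach (string word in extraWords)
        {
            combined.Add(word);
        }
        return combined;
    }
}
```

The resulting set can then be passed to the `PersianAnalyzer(LuceneVersion, CharArraySet)` constructor.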
      
Methods
CreateComponents(string, TextReader)
Creates Lucene.Net.Analysis.TokenStreamComponents used to tokenize all the text in the provided TextReader.
Declaration
protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| string | fieldName | |
| TextReader | reader | |
Returns
| Type | Description |
|---|---|
| TokenStreamComponents | Lucene.Net.Analysis.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, ArabicNormalizationFilter, PersianNormalizationFilter and Persian stop words |
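The chain described in the Returns table can be spelled out as a custom analyzer. This is a sketch under the assumption of Lucene.Net 4.8 constructor signatures, not the library's actual source; `PersianChainSketchAnalyzer` is a hypothetical class name.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical analyzer that mirrors the documented tokenizer and filter chain.
public sealed class PersianChainSketchAnalyzer : Analyzer
{
    private readonly LuceneVersion matchVersion;
    private readonly CharArraySet stopwords;

    public PersianChainSketchAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords)
    {
        this.matchVersion = matchVersion;
        this.stopwords = stopwords;
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new LowerCaseFilter(matchVersion, source);
        result = new ArabicNormalizationFilter(result);
        result = new PersianNormalizationFilter(result);
        // Stop words are removed last, so they are matched against normalized terms.
        result = new StopFilter(matchVersion, result, stopwords);
        return new TokenStreamComponents(source, result);
    }

    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        // Same wrapping that the documented InitReader override performs.
        return new PersianCharFilter(reader);
    }
}
```

Since the stop filter runs after both normalization filters in this ordering, a custom stop word list would presumably need to contain the normalized forms of its words.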
      
InitReader(string, TextReader)
Wraps the TextReader with PersianCharFilter.
Declaration
protected override TextReader InitReader(string fieldName, TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| string | fieldName | |
| TextReader | reader | |
Returns
| Type | Description |
|---|---|
| TextReader | |
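To illustrate what tokenizing around the zero-width non-joiner means in practice, here is a plain C# sketch that does not use Lucene.Net. It assumes the effect of the char filter is equivalent to replacing each ZWNJ (U+200C) with a space before whitespace splitting; `ZwnjSketch` is a hypothetical name, and a real tokenizer applies more rules than a simple split.

```csharp
using System;

public static class ZwnjSketch
{
    private const char Zwnj = '\u200C'; // zero-width non-joiner

    // Treats ZWNJ as a word boundary, then splits on ordinary whitespace.
    public static string[] SplitAroundZwnj(string text)
    {
        string replaced = text.Replace(Zwnj, ' ');
        return replaced.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the input `"می\u200Cروم خانه"`, the sketch returns three pieces: the prefix before the ZWNJ, the remainder of that word, and the second word.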