Class PersianAnalyzer

Lucene.Net.Analysis.Analyzer for Persian.

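This Analyzer uses PersianCharFilter, which replaces each zero-width non-joiner (U+200C) with a space, so text is tokenized around zero-width non-joiners as well as whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords: affixes separated from their stem by a zero-width non-joiner become tokens of their own, which the stop filter can then remove.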
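A minimal usage sketch, assuming the Lucene.Net.Analysis.Common package and LuceneVersion.LUCENE_48 as the target version. The class PersianTokenDemo, its methods, and the field name "content" are hypothetical; the calls on the token stream (AddAttribute, Reset, IncrementToken, End) follow the usual Lucene.Net consumption pattern.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class PersianTokenDemo
{
    // Runs text through any analyzer and collects the emitted terms.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "content" is an arbitrary field name chosen for this example.
        using (TokenStream stream = analyzer.GetTokenStream("content", text))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                  // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();                    // signal end of stream before disposal
        }
        return terms;
    }

    // Assumes LUCENE_48 is the compatibility version you target.
    public static List<string> TokenizePersian(string text)
    {
        using (var analyzer = new PersianAnalyzer(LuceneVersion.LUCENE_48))
        {
            return Tokenize(analyzer, text);
        }
    }
}
```

For example, TokenizePersian("\u06A9\u062A\u0627\u0628\u200C\u0647\u0627") (کتاب‌ها, written with keheh and a zero-width non-joiner) splits the word at the joiner and normalizes keheh to Arabic kaf (U+0643); whether the suffix ها survives as a token depends on the stop list. In real code, create the analyzer once and reuse it rather than disposing it on every call.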

Inheritance

object

Analyzer

StopwordAnalyzerBase

PersianAnalyzer

Implements

IDisposable

Inherited Members

StopwordAnalyzerBase.StopwordSet

Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>)

Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, ReuseStrategy)

Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, Func<string, TextReader, TextReader>)

Analyzer.NewAnonymous(Func<string, TextReader, TokenStreamComponents>, Func<string, TextReader, TextReader>, ReuseStrategy)

Analyzer.GetTokenStream(string, TextReader)

Analyzer.GetTokenStream(string, string)

Analyzer.GetPositionIncrementGap(string)

Analyzer.GetOffsetGap(string)

Analyzer.Strategy

Analyzer.Dispose()

Analyzer.GLOBAL_REUSE_STRATEGY

Analyzer.PER_FIELD_REUSE_STRATEGY

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Fa

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public sealed class PersianAnalyzer : StopwordAnalyzerBase, IDisposable

Constructors

PersianAnalyzer(LuceneVersion)

Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

Declaration

public PersianAnalyzer(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version

PersianAnalyzer(LuceneVersion, CharArraySet)

Builds an analyzer with the given stop words.

Declaration

public PersianAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version
CharArraySet	stopwords	the stop words to remove (used instead of the default set)
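A sketch of this constructor, assuming LuceneVersion.LUCENE_48; CustomStopwordDemo and AnalyzeWithStopwords are hypothetical names. Because the stop filter runs after ArabicNormalizationFilter and PersianNormalizationFilter, entries in a custom set need to be written in their normalized form (for example Arabic kaf U+0643 rather than keheh U+06A9) in order to match.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class CustomStopwordDemo
{
    public static List<string> AnalyzeWithStopwords(string text, params string[] stopwords)
    {
        // The custom set replaces the default Persian list; ignoreCase = false.
        var stopSet = new CharArraySet(LuceneVersion.LUCENE_48, stopwords, false);
        var terms = new List<string>();
        using (var analyzer = new PersianAnalyzer(LuceneVersion.LUCENE_48, stopSet))
        using (TokenStream stream = analyzer.GetTokenStream("content", text))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

For example, AnalyzeWithStopwords("\u06A9\u062A\u0627\u0628 \u062E\u0648\u0628", "\u0643\u062A\u0627\u0628") (the input "کتاب خوب" with the stop word spelled with Arabic kaf) returns only the second word, while passing the keheh spelling "\u06A9\u062A\u0627\u0628" as the stop word leaves both words in place.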

Fields

DEFAULT_STOPWORD_FILE

File containing default Persian stopwords.

The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed.

Declaration

public const string DEFAULT_STOPWORD_FILE = "stopwords.txt"

Field Value

Type	Description
string

STOPWORDS_COMMENT

The comment prefix used in the stopwords file; lines starting with it are ignored.

Declaration

public const string STOPWORDS_COMMENT = "#"

Field Value

Type	Description
string
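To supply a custom list in the same format as DEFAULT_STOPWORD_FILE (one word per line, with lines starting with STOPWORDS_COMMENT skipped), one option is WordlistLoader from Lucene.Net.Analysis.Util. This sketch assumes its GetWordSet overload that takes a TextReader, a comment prefix, and a LuceneVersion; StopwordFileDemo and its methods are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class StopwordFileDemo
{
    // Reads one word per line, skipping lines that start with "#".
    public static CharArraySet LoadStopwords(TextReader reader)
    {
        return WordlistLoader.GetWordSet(reader, PersianAnalyzer.STOPWORDS_COMMENT, LuceneVersion.LUCENE_48);
    }

    // Builds an analyzer from stopword-file contents held in memory.
    public static PersianAnalyzer CreateAnalyzer(string stopwordFileContents)
    {
        using (var reader = new StringReader(stopwordFileContents))
        {
            // GetWordSet reads the whole reader, so disposing it afterwards is safe.
            return new PersianAnalyzer(LuceneVersion.LUCENE_48, LoadStopwords(reader));
        }
    }
}
```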

Properties

DefaultStopSet

Returns an unmodifiable instance of the default stop-words set.

Declaration

public static CharArraySet DefaultStopSet { get; }

Property Value

Type	Description
CharArraySet	an unmodifiable instance of the default stop-words set.
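Because DefaultStopSet is unmodifiable, adding words means copying its entries into a new set. A sketch, assuming LuceneVersion.LUCENE_48 and that CharArraySet enumerates its entries as strings; ExtendedStopSetDemo and WithExtraStopwords are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class for illustration only.
public static class ExtendedStopSetDemo
{
    public static CharArraySet WithExtraStopwords(params string[] extraWords)
    {
        // Copy the read-only default entries, then append the extra words.
        var words = new List<string>(PersianAnalyzer.DefaultStopSet);
        words.AddRange(extraWords);
        // ignoreCase = false, matching how the default set is compared.
        return new CharArraySet(LuceneVersion.LUCENE_48, words, false);
    }
}
```

Pass the result to PersianAnalyzer(LuceneVersion, CharArraySet). As with any custom entries, extra words should be given in their normalized form so that they match tokens after normalization.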

Methods

CreateComponents(string, TextReader)

Creates Lucene.Net.Analysis.TokenStreamComponents used to tokenize all the text in the provided TextReader.

Declaration

protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)

Parameters

Type	Name	Description
string	fieldName	the name of the field whose content is being analyzed
TextReader	reader	the reader supplying the text to tokenize

Returns

Type	Description
TokenStreamComponents	Lucene.Net.Analysis.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, ArabicNormalizationFilter, PersianNormalizationFilter, and a StopFilter using the Persian stop words

Overrides

Analyzer.CreateComponents(string, TextReader)
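The chain in the Returns description, together with InitReader, can be approximated with the inherited Analyzer.NewAnonymous overload that takes both a createComponents and an initReader delegate. This is a sketch reconstructed from this page, not PersianAnalyzer's actual source; it assumes LuceneVersion.LUCENE_48 and the usual constructors of StandardTokenizer, LowerCaseFilter, ArabicNormalizationFilter, PersianNormalizationFilter, and StopFilter. PersianChainSketch is a hypothetical name.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper: rebuilds the described Persian pipeline from public parts.
public static class PersianChainSketch
{
    public static Analyzer Create(LuceneVersion version, CharArraySet stopwords)
    {
        return Analyzer.NewAnonymous(
            // createComponents: tokenizer followed by the filter chain
            (fieldName, reader) =>
            {
                Tokenizer source = new StandardTokenizer(version, reader);
                TokenStream result = new LowerCaseFilter(version, source);
                result = new ArabicNormalizationFilter(result);
                result = new PersianNormalizationFilter(result);
                result = new StopFilter(version, result, stopwords);
                return new TokenStreamComponents(source, result);
            },
            // initReader: replace zero-width non-joiners with spaces before tokenizing
            (fieldName, reader) => new PersianCharFilter(reader));
    }
}
```

This is mainly useful as a starting point when one stage needs to change, for example to insert an extra filter, while keeping the rest of the Persian pipeline intact.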

InitReader(string, TextReader)

Wraps the TextReader in a PersianCharFilter, which replaces each zero-width non-joiner with a space before tokenization.

Declaration

protected override TextReader InitReader(string fieldName, TextReader reader)

Parameters

Type	Name	Description
string	fieldName	the name of the field being analyzed
TextReader	reader	the original reader

Returns

Type	Description
TextReader	the wrapped reader

Overrides

Analyzer.InitReader(string, TextReader)
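To see what the tokenizer receives after InitReader, the char filter can be read directly. A minimal sketch; PersianCharFilterDemo is a hypothetical name. Only the zero-width non-joiner changes at this stage: Farsi yeh and keheh are normalized later by PersianNormalizationFilter, not by the char filter.

```csharp
using System.IO;
using Lucene.Net.Analysis.Fa;

// Hypothetical helper class for illustration only.
public static class PersianCharFilterDemo
{
    // Returns the character stream as the tokenizer would see it.
    public static string Filter(string text)
    {
        using (TextReader filtered = new PersianCharFilter(new StringReader(text)))
        {
            return filtered.ReadToEnd();
        }
    }
}
```

For "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645" (می‌خواهم), the result is the same text with the joiner replaced by an ASCII space; the Farsi yeh (U+06CC) is left untouched.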

Implements

IDisposable