Class ArabicAnalyzer

Analyzer for Arabic.

This analyzer implements light stemming as specified in "Light Stemming for Arabic Information Retrieval":
http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf

The analysis package contains three primary components:

ArabicNormalizationFilter: Arabic orthographic normalization.
ArabicStemFilter: Arabic light stemming.
Arabic stop words file: a set of default Arabic stop words.

Inheritance

System.Object

Analyzer

StopwordAnalyzerBase

ArabicAnalyzer

Inherited Members

StopwordAnalyzerBase.m_stopwords

StopwordAnalyzerBase.m_matchVersion

StopwordAnalyzerBase.StopwordSet

StopwordAnalyzerBase.LoadStopwordSet(Boolean, Type, String, String)

StopwordAnalyzerBase.LoadStopwordSet(FileInfo, LuceneVersion)

StopwordAnalyzerBase.LoadStopwordSet(TextReader, LuceneVersion)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)

Analyzer.GetTokenStream(String, TextReader)

Analyzer.GetTokenStream(String, String)

Analyzer.InitReader(String, TextReader)

Analyzer.GetPositionIncrementGap(String)

Analyzer.GetOffsetGap(String)

Analyzer.Strategy

Analyzer.Dispose()

Analyzer.Dispose(Boolean)

Analyzer.GLOBAL_REUSE_STRATEGY

Analyzer.PER_FIELD_REUSE_STRATEGY

Namespace: Lucene.Net.Analysis.Ar

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public sealed class ArabicAnalyzer : StopwordAnalyzerBase

Constructors

ArabicAnalyzer(LuceneVersion)

Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

Declaration

public ArabicAnalyzer(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version
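
The following is a minimal sketch of using this constructor, not a definitive recipe. It assumes `LuceneVersion.LUCENE_48` as the compatibility version, and the field name `"body"` and the sample text are hypothetical. It builds the analyzer with the default stop words and prints each term it produces.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class DefaultArabicAnalyzerExample
{
    public static void Main()
    {
        // Assumption: LUCENE_48; use the version that matches your index.
        using (Analyzer analyzer = new ArabicAnalyzer(LuceneVersion.LUCENE_48))
        // "body" is a hypothetical field name used only for illustration.
        using (TokenStream stream = analyzer.GetTokenStream("body", "الكتب في المكتبة"))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            // The token stream contract requires Reset() before the first IncrementToken().
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            // End() finalizes the stream state after the last token.
            stream.End();
        }
    }
}
```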

ArabicAnalyzer(LuceneVersion, CharArraySet)

Builds an analyzer with the given stop words.

Declaration

public ArabicAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version
CharArraySet	stopwords	a stopword set

ArabicAnalyzer(LuceneVersion, CharArraySet, CharArraySet)

Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer adds a SetKeywordMarkerFilter before ArabicStemFilter.

Declaration

public ArabicAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords, CharArraySet stemExclusionSet)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version
CharArraySet	stopwords	a stopword set
CharArraySet	stemExclusionSet	a set of terms not to be stemmed
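
As an illustration of the three-argument constructor, the sketch below reuses the default stop words and protects a single term from stemming. The class name, the chosen term, and `LuceneVersion.LUCENE_48` are assumptions made for the example.

```csharp
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CustomArabicAnalyzerExample
{
    public static ArabicAnalyzer Create()
    {
        // Assumption: LUCENE_48; use the version that matches your index.
        const LuceneVersion matchVersion = LuceneVersion.LUCENE_48;

        // Reuse the built-in (unmodifiable) default stop words.
        CharArraySet stopwords = ArabicAnalyzer.DefaultStopSet;

        // Hypothetical term that should reach the index unstemmed.
        CharArraySet stemExclusionSet =
            new CharArraySet(matchVersion, new[] { "الكتاب" }, false);

        return new ArabicAnalyzer(matchVersion, stopwords, stemExclusionSet);
    }
}
```

Because SetKeywordMarkerFilter runs after ArabicNormalizationFilter in the chain, entries in the exclusion set should probably be given in their normalized form.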

Fields

DEFAULT_STOPWORD_FILE

File containing default Arabic stopwords.

The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html. The stopword list is BSD-licensed.

Declaration

public const string DEFAULT_STOPWORD_FILE = null

Field Value

Type	Description
System.String

Properties

DefaultStopSet

Returns an unmodifiable instance of the default stop-words set.

Declaration

public static CharArraySet DefaultStopSet { get; }

Property Value

Type	Description
CharArraySet	an unmodifiable instance of the default stop-words set.

Methods

CreateComponents(String, TextReader)

Creates TokenStreamComponents used to tokenize all the text in the provided TextReader.

Declaration

protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)

Parameters

Type	Name	Description
System.String	fieldName
TextReader	reader

Returns

Type	Description
TokenStreamComponents	TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StopFilter, ArabicNormalizationFilter, SetKeywordMarkerFilter (if a stem exclusion set is provided), and ArabicStemFilter.
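
The chain described above can be approximated with Analyzer.NewAnonymous. This is a sketch of the documented order of components, not the class's actual source; the `ArabicChainSketch` and `Build` names are hypothetical, and any version-dependent behavior in the real analyzer is not reproduced here.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class ArabicChainSketch
{
    public static Analyzer Build(
        LuceneVersion matchVersion,
        CharArraySet stopwords,
        CharArraySet stemExclusionSet)
    {
        return Analyzer.NewAnonymous((fieldName, reader) =>
        {
            // Tokenize, lowercase, and drop stop words.
            Tokenizer source = new StandardTokenizer(matchVersion, reader);
            TokenStream result = new LowerCaseFilter(matchVersion, source);
            result = new StopFilter(matchVersion, result, stopwords);

            // Normalize Arabic orthography before stemming.
            result = new ArabicNormalizationFilter(result);

            // Mark excluded terms as keywords so the stemmer skips them.
            if (stemExclusionSet.Count > 0)
            {
                result = new SetKeywordMarkerFilter(result, stemExclusionSet);
            }

            result = new ArabicStemFilter(result);
            return new TokenStreamComponents(source, result);
        });
    }
}
```

Writing the chain out this way can be useful when a variant is needed, for example inserting an extra filter, since ArabicAnalyzer itself is sealed.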