Class ArabicAnalyzer
Analyzer for Arabic.
This analyzer implements light stemming as specified in:
Light Stemming for Arabic Information Retrieval
http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf
The analysis package contains three primary components:
- ArabicNormalizationFilter: Arabic orthographic normalization.
- ArabicStemFilter: Arabic light stemming.
- Arabic stop words file: a set of default Arabic stop words.
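To make the normalization and light-stemming steps concrete, here is a minimal, dependency-free C# sketch. The class `ArabicLightStemSketch` and its methods are hypothetical, and the character mappings, prefix/suffix lists and length checks are assumptions modeled on the light-stemming approach of the paper cited above; they are not guaranteed to match what ArabicNormalizationFilter and ArabicStemFilter do internally.

```csharp
using System;
using System.Text;

// Hypothetical sketch: orthographic normalization followed by light
// (affix-stripping) stemming. Mappings and affix lists are illustrative
// assumptions, not the exact rules of the Lucene.NET filters.
public static class ArabicLightStemSketch
{
    // Article/conjunction prefixes; only the first matching one is removed.
    private static readonly string[] Prefixes =
        { "ال", "وال", "بال", "كال", "فال", "لل", "و" };

    // Common suffixes; each one that matches (in this order) is removed.
    private static readonly string[] Suffixes =
        { "ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي" };

    public static string Normalize(string word)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            switch (c)
            {
                case '\u0622': // alef with madda above
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    sb.Append('\u0627'); // -> bare alef
                    break;
                case '\u0649': // alef maksura
                    sb.Append('\u064A'); // -> yeh
                    break;
                case '\u0629': // teh marbuta
                    sb.Append('\u0647'); // -> heh
                    break;
                case '\u0640': // tatweel (kashida): dropped
                    break;
                default:
                    // Drop diacritics in the range fathatan (U+064B) .. sukun (U+0652).
                    if (c < '\u064B' || c > '\u0652')
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    public static string Stem(string word)
    {
        foreach (string prefix in Prefixes)
        {
            // The single-letter prefix needs a word of at least 4 letters;
            // longer prefixes must leave at least 2 letters behind.
            int minLength = prefix.Length == 1 ? 4 : prefix.Length + 2;
            if (word.Length >= minLength && word.StartsWith(prefix, StringComparison.Ordinal))
            {
                word = word.Substring(prefix.Length);
                break;
            }
        }
        foreach (string suffix in Suffixes)
        {
            // Every suffix removal must leave at least 2 letters behind.
            if (word.Length >= suffix.Length + 2 && word.EndsWith(suffix, StringComparison.Ordinal))
                word = word.Substring(0, word.Length - suffix.Length);
        }
        return word;
    }
}
```

Under these assumed rules, `Stem(Normalize("والكتابان"))` removes the prefix وال and then the suffix ان, leaving كتاب, while the three-letter ولد is left alone because stripping و would leave too short a stem.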
 
Namespace: Lucene.Net.Analysis.Ar
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class ArabicAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
ArabicAnalyzer(LuceneVersion)
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
Declaration
public ArabicAnalyzer(LuceneVersion matchVersion)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
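A minimal usage sketch for this constructor, assuming the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced and `LuceneVersion.LUCENE_48` is the desired compatibility version. The class `DefaultArabicAnalysisDemo`, its helper methods, and the field name `"body"` are hypothetical; the loop follows the usual Reset / IncrementToken / End token-stream contract.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class; assumes Lucene.Net 4.8.
public static class DefaultArabicAnalysisDemo
{
    // Collects the terms an analyzer emits for the given text.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset(); // must be called before the first IncrementToken()
            while (ts.IncrementToken())
                terms.Add(term.ToString());
            ts.End();
        }
        return terms;
    }

    // Default stop words (DEFAULT_STOPWORD_FILE) and no stem exclusions.
    public static List<string> AnalyzeWithDefaults(string text)
    {
        using (var analyzer = new ArabicAnalyzer(LuceneVersion.LUCENE_48))
        {
            return Analyze(analyzer, text);
        }
    }
}
```

With this setup, `AnalyzeWithDefaults("الكتاب")` should yield the single term كتاب, since ArabicStemFilter strips the definite article ال.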
ArabicAnalyzer(LuceneVersion, CharArraySet)
Builds an analyzer with the given stop words.
Declaration
public ArabicAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
| CharArraySet | stopwords | a stopword set |
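A sketch of passing a custom stop word set, again assuming Lucene.Net 4.8; `CustomStopwordsDemo` and the field name `"body"` are hypothetical. Only the supplied words are treated as stop words here; the default list is not merged in.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class; assumes Lucene.Net 4.8.
public static class CustomStopwordsDemo
{
    public static List<string> Analyze(string text, params string[] stopWords)
    {
        // Build a case-sensitive stop set (ignoreCase = false) from the given words.
        var stopSet = new CharArraySet(LuceneVersion.LUCENE_48, stopWords, false);
        var terms = new List<string>();
        using (var analyzer = new ArabicAnalyzer(LuceneVersion.LUCENE_48, stopSet))
        using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
                terms.Add(term.ToString());
            ts.End();
        }
        return terms;
    }
}
```

For example, `Analyze("في الكتاب", "في")` should drop في and stem the remaining token to كتاب. Because StopFilter runs before ArabicNormalizationFilter in the chain listed under CreateComponents, stop words are presumably matched in their lowercased surface form rather than their normalized form.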
      
ArabicAnalyzer(LuceneVersion, CharArraySet, CharArraySet)
Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer adds a SetKeywordMarkerFilter before ArabicStemFilter.
Declaration
public ArabicAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords, CharArraySet stemExclusionSet)
Parameters
| Type | Name | Description |
|---|---|---|
| LuceneVersion | matchVersion | Lucene compatibility version |
| CharArraySet | stopwords | a stopword set |
| CharArraySet | stemExclusionSet | a set of terms not to be stemmed |
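A sketch of protecting specific terms from stemming, assuming Lucene.Net 4.8; `StemExclusionDemo` and the field name `"body"` are hypothetical. It keeps the default stop words and passes a small exclusion set.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class; assumes Lucene.Net 4.8.
public static class StemExclusionDemo
{
    public static List<string> Analyze(string text, params string[] protectedTerms)
    {
        // Terms in this set are marked as keywords and skipped by ArabicStemFilter.
        var exclusions = new CharArraySet(LuceneVersion.LUCENE_48, protectedTerms, false);
        var terms = new List<string>();
        using (var analyzer = new ArabicAnalyzer(
            LuceneVersion.LUCENE_48, ArabicAnalyzer.DefaultStopSet, exclusions))
        using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
                terms.Add(term.ToString());
            ts.End();
        }
        return terms;
    }
}
```

Here `Analyze("الكتاب القلم", "الكتاب")` should keep الكتاب intact while القلم is still stemmed to قلم. Since SetKeywordMarkerFilter sits after ArabicNormalizationFilter in the documented chain, exclusion entries presumably need to be written in normalized form (for example, with a bare alef instead of a hamzated one).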
      
Fields
DEFAULT_STOPWORD_FILE
File containing default Arabic stopwords.
The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed.
Declaration
public const string DEFAULT_STOPWORD_FILE = "stopwords.txt"
Field Value
| Type | Description |
|---|---|
| System.String | |
Properties
DefaultStopSet
Returns an unmodifiable instance of the default stop-words set.
Declaration
public static CharArraySet DefaultStopSet { get; }
  Property Value
| Type | Description | 
|---|---|
| CharArraySet | an unmodifiable instance of the default stop-words set.  | 
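Because the returned set is unmodifiable, one way to add extra stop words is to copy it first. A sketch assuming Lucene.Net 4.8; `StopSetExtensionDemo` and its methods are hypothetical.

```csharp
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper class; assumes Lucene.Net 4.8.
public static class StopSetExtensionDemo
{
    // Copies DefaultStopSet into a new, modifiable set and adds the extra words.
    public static CharArraySet WithExtraStopWords(params string[] extra)
    {
        var set = new CharArraySet(LuceneVersion.LUCENE_48, ArabicAnalyzer.DefaultStopSet, false);
        foreach (string word in extra)
            set.Add(word);
        return set;
    }

    // Builds an analyzer that uses the extended stop set.
    public static ArabicAnalyzer CreateAnalyzer(params string[] extra)
    {
        return new ArabicAnalyzer(LuceneVersion.LUCENE_48, WithExtraStopWords(extra));
    }
}
```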
      
Methods
CreateComponents(String, TextReader)
Creates TokenStreamComponents used to tokenize all the text in the provided System.IO.TextReader.
Declaration
protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | fieldName | |
| System.IO.TextReader | reader | |
Returns
| Type | Description | 
|---|---|
| TokenStreamComponents | TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StopFilter, ArabicNormalizationFilter, SetKeywordMarkerFilter (if a stem exclusion set is provided), and ArabicStemFilter. |
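If one stage of this chain needs to change, one option is a custom Analyzer that rebuilds it. The sketch below, a hypothetical `MyArabicAnalyzer` with a hypothetical `MyArabicAnalyzerDemo` helper, follows the filter order given in the Returns description above. It assumes Lucene.Net 4.8, always uses StandardTokenizer, and does not attempt any version-dependent behavior, so it is not a drop-in replacement for ArabicAnalyzer.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical analyzer rebuilding the documented chain; assumes Lucene.Net 4.8.
public sealed class MyArabicAnalyzer : Analyzer
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;
    private readonly CharArraySet stopwords;
    private readonly CharArraySet stemExclusionSet;

    public MyArabicAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
    {
        this.stopwords = stopwords;
        this.stemExclusionSet = stemExclusionSet;
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new StandardTokenizer(MatchVersion, reader);
        TokenStream result = new LowerCaseFilter(MatchVersion, source);
        // Stop words are matched here, before normalization.
        result = new StopFilter(MatchVersion, result, stopwords);
        result = new ArabicNormalizationFilter(result);
        if (stemExclusionSet.Count > 0)
        {
            // Marks excluded terms as keywords so ArabicStemFilter leaves them alone.
            result = new SetKeywordMarkerFilter(result, stemExclusionSet);
        }
        return new TokenStreamComponents(source, new ArabicStemFilter(result));
    }
}

public static class MyArabicAnalyzerDemo
{
    // Runs text through MyArabicAnalyzer and collects the emitted terms.
    public static List<string> Analyze(CharArraySet stopwords, CharArraySet stemExclusions, string text)
    {
        var terms = new List<string>();
        using (var analyzer = new MyArabicAnalyzer(stopwords, stemExclusions))
        using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
                terms.Add(term.ToString());
            ts.End();
        }
        return terms;
    }
}
```

With the default stop set and an empty exclusion set (`CharArraySet.EMPTY_SET`), analyzing الكتاب القلم should produce كتاب and قلم, matching what ArabicAnalyzer itself is expected to emit for that input.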