Class StandardAnalyzer
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required Lucene.Net.Util.LuceneVersion compatibility when creating StandardAnalyzer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, Lucene.Net.Analysis.Token instances incorrectly identified as acronyms are corrected (see LUCENE-1068).
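For example, the following sketch (assuming the Lucene.Net 4.8.0 package and the LUCENE_48 compatibility version) builds a StandardAnalyzer and prints the tokens it produces; the stop word "the" is dropped and the remaining terms are lowercased:

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Minimal sketch: tokenize a string with StandardAnalyzer (LUCENE_48 assumed).
using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
using (TokenStream stream = analyzer.GetTokenStream("body", "The Quick Brown Fox"))
{
    var termAtt = stream.AddAttribute<ICharTermAttribute>();
    stream.Reset();
    while (stream.IncrementToken())
    {
        Console.WriteLine(termAtt.ToString()); // quick, brown, fox ("the" is a stop word)
    }
    stream.End();
}
```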
Implements
IDisposable
Namespace: Lucene.Net.Analysis.Standard
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class StandardAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
StandardAnalyzer(LuceneVersion)
Builds an analyzer with the default stop words (STOP_WORDS_SET).
Declaration
public StandardAnalyzer(LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene compatibility version - See StandardAnalyzer |
StandardAnalyzer(LuceneVersion, CharArraySet)
Builds an analyzer with the given stop words.
Declaration
public StandardAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene compatibility version - See StandardAnalyzer |
CharArraySet | stopWords | stop words |
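As a sketch (the words are illustrative and the LUCENE_48 compatibility version is assumed), a custom stop word set can be built with CharArraySet and passed to this constructor:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Illustrative stop words; the final argument enables case-insensitive matching.
var stopWords = new CharArraySet(LuceneVersion.LUCENE_48, new[] { "lorem", "ipsum" }, true);
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, stopWords);
```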
StandardAnalyzer(LuceneVersion, TextReader)
Builds an analyzer with the stop words from the given reader.
Declaration
public StandardAnalyzer(LuceneVersion matchVersion, TextReader stopwords)
Parameters
Type | Name | Description |
---|---|---|
LuceneVersion | matchVersion | Lucene compatibility version - See StandardAnalyzer |
TextReader | stopwords | TextReader to read stop words from |
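As a sketch (assuming the usual word-list format of one stop word per line, as read by the underlying word list loader), stop words can also come from any TextReader, such as a StringReader or a file:

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// A StringReader stands in for a stop word file here; one word per line.
using (TextReader stopwords = new StringReader("the\nand\nof"))
{
    var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, stopwords);
    // use the analyzer for indexing or querying...
}
```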
Fields
DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
Declaration
public const int DEFAULT_MAX_TOKEN_LENGTH = 255
Field Value
Type | Description |
---|---|
int |
STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
Declaration
public static readonly CharArraySet STOP_WORDS_SET
Field Value
Type | Description |
---|---|
CharArraySet |
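For illustration, the set can be inspected or passed explicitly to the stop-word constructor (a minimal sketch, assuming the LUCENE_48 compatibility version):

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

bool isStopWord = StandardAnalyzer.STOP_WORDS_SET.Contains("the"); // true for common English stop words
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, StandardAnalyzer.STOP_WORDS_SET);
```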
Properties
MaxTokenLength
Gets or sets the maximum allowed token length. If a token is seen that exceeds this length, it is discarded. This setting only takes effect the next time GetTokenStream(string, TextReader) or GetTokenStream(string, string) is called.
Declaration
public int MaxTokenLength { get; set; }
Property Value
Type | Description |
---|---|
int |
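For example (a sketch assuming the LUCENE_48 compatibility version), the limit can be lowered via the property after construction:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48)
{
    MaxTokenLength = 100 // tokens longer than 100 characters are discarded
};
```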
Methods
CreateComponents(string, TextReader)
Creates a new Lucene.Net.Analysis.TokenStreamComponents instance for this analyzer.
Declaration
protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
string | fieldName | the name of the field whose content is passed to the Lucene.Net.Analysis.TokenStreamComponents sink as a reader |
TextReader | reader | the reader passed to the Lucene.Net.Analysis.Tokenizer constructor |
Returns
Type | Description |
---|---|
TokenStreamComponents | the Lucene.Net.Analysis.TokenStreamComponents for this analyzer. |
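StandardAnalyzer is sealed, so this method cannot be overridden further, but the chain it wires up is approximately the one below. This is a simplified sketch using Analyzer.NewAnonymous (it ignores MaxTokenLength handling and other internal details), not the actual implementation:

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Approximate equivalent of StandardAnalyzer's component chain (sketch only).
Analyzer standardLike = Analyzer.NewAnonymous((fieldName, reader) =>
{
    var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
    TokenStream stream = new StandardFilter(LuceneVersion.LUCENE_48, tokenizer);
    stream = new LowerCaseFilter(LuceneVersion.LUCENE_48, stream);
    stream = new StopFilter(LuceneVersion.LUCENE_48, stream, StandardAnalyzer.STOP_WORDS_SET);
    return new TokenStreamComponents(tokenizer, stream);
});
```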