Class StandardAnalyzer

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required LuceneVersion compatibility when creating StandardAnalyzer:

As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, you get the old (broken) behavior, which is preserved for backwards compatibility.
As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
As of 2.9, StopFilter preserves position increments.
As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).

Inheritance

System.Object

Analyzer

StopwordAnalyzerBase

StandardAnalyzer

Implements

System.IDisposable

Inherited Members

StopwordAnalyzerBase.m_stopwords

StopwordAnalyzerBase.m_matchVersion

StopwordAnalyzerBase.StopwordSet

StopwordAnalyzerBase.LoadStopwordSet(Boolean, Type, String, String)

StopwordAnalyzerBase.LoadStopwordSet(FileInfo, LuceneVersion)

StopwordAnalyzerBase.LoadStopwordSet(TextReader, LuceneVersion)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)

Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)

Analyzer.GetTokenStream(String, TextReader)

Analyzer.GetTokenStream(String, String)

Analyzer.InitReader(String, TextReader)

Analyzer.GetPositionIncrementGap(String)

Analyzer.GetOffsetGap(String)

Analyzer.Strategy

Analyzer.Dispose()

Analyzer.Dispose(Boolean)

Analyzer.GLOBAL_REUSE_STRATEGY

Analyzer.PER_FIELD_REUSE_STRATEGY

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: Lucene.Net.Analysis.Standard

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public sealed class StandardAnalyzer : StopwordAnalyzerBase, IDisposable

Constructors

StandardAnalyzer(LuceneVersion)

Builds an analyzer with the default stop words (STOP_WORDS_SET).

Declaration

public StandardAnalyzer(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version - See StandardAnalyzer
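
The constructor above can be exercised with a short console program. This is a minimal sketch rather than official sample code: it assumes the Lucene.NET 4.8 packages are referenced, and the field name "body" and the sample text are purely illustrative.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class DefaultStopWordsDemo
{
    public static void Main()
    {
        // LUCENE_48 is assumed here; pass the version your index was built with.
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", "The Quick Brown Fox"))
        {
            // The term attribute exposes the text of the current token.
            var term = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                  // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();                    // finalizes end-of-stream state
        }
    }
}
```

With the default stop word set, tokens come back lowercased and common English words such as "the" are filtered out.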

StandardAnalyzer(LuceneVersion, CharArraySet)

Builds an analyzer with the given stop words.

Declaration

public StandardAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version - See StandardAnalyzer
CharArraySet	stopWords	Stop words to remove from the token stream
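
A custom stop word set might be supplied as follows. This is a sketch under assumptions: the words "foo" and "bar" are hypothetical, and the CharArraySet constructor is assumed to take a version, a collection of strings, and an ignore-case flag, as in Lucene.NET 4.8.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CustomStopWordsDemo
{
    public static void Main()
    {
        // Hypothetical stop words; the final argument asks for case-insensitive matching.
        var stopWords = new CharArraySet(LuceneVersion.LUCENE_48, new[] { "foo", "bar" }, true);

        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, stopWords))
        using (TokenStream stream = analyzer.GetTokenStream("body", "foo meets bar at the cafe"))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

Because the given set replaces the default one, English words such as "the" are no longer filtered unless they are added to the custom set.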

StandardAnalyzer(LuceneVersion, TextReader)

Builds an analyzer with the stop words from the given reader.

Declaration

public StandardAnalyzer(LuceneVersion matchVersion, TextReader stopwords)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	Lucene compatibility version - See StandardAnalyzer
System.IO.TextReader	stopwords	System.IO.TextReader to read stop words from
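
Stop words can also be read from any TextReader. The sketch below uses an in-memory StringReader in place of a file; the word list is hypothetical, and the one-word-per-line layout is an assumption about the expected input format.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ReaderStopWordsDemo
{
    public static void Main()
    {
        // Hypothetical stop word list, assumed to hold one word per line.
        using (TextReader stopWordReader = new StringReader("alpha\nbeta\n"))
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, stopWordReader))
        using (TokenStream stream = analyzer.GetTokenStream("body", "alpha beta gamma"))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

For a file on disk, a StreamReader opened on the file could be passed in the same way.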

Fields

DEFAULT_MAX_TOKEN_LENGTH

Default maximum allowed token length.

Declaration

public const int DEFAULT_MAX_TOKEN_LENGTH = 255

Field Value

Type	Description
System.Int32

STOP_WORDS_SET

An unmodifiable set containing some common English words that are usually not useful for searching.

Declaration

public static readonly CharArraySet STOP_WORDS_SET

Field Value

Type	Description
CharArraySet
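
Since the default set is unmodifiable, one way to extend it is to copy it into a new CharArraySet first. This is a sketch, not the library's prescribed approach; the extra word "lucene" is hypothetical, and the CharArraySet constructor signature is assumed from Lucene.NET 4.8.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StopWordsSetDemo
{
    public static void Main()
    {
        // Inspect the built-in English stop word set.
        Console.WriteLine(StandardAnalyzer.STOP_WORDS_SET.Contains("the"));
        Console.WriteLine(StandardAnalyzer.STOP_WORDS_SET.Count);

        // Copy the defaults into a mutable set, then add a hypothetical extra word.
        var extended = new CharArraySet(
            LuceneVersion.LUCENE_48, StandardAnalyzer.STOP_WORDS_SET, true);
        extended.Add("lucene");

        // The extended set can be passed to StandardAnalyzer(LuceneVersion, CharArraySet).
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48, extended))
        {
            Console.WriteLine(extended.Contains("lucene"));
        }
    }
}
```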

Properties

MaxTokenLength

Gets or sets the maximum allowed token length. A token that exceeds this length is discarded. A new value only takes effect the next time GetTokenStream(String, TextReader) or GetTokenStream(String, String) is called.

Declaration

public int MaxTokenLength { get; set; }

Property Value

Type	Description
System.Int32
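
The property would typically be set before the first token stream is requested. The following is an illustrative sketch; the limit of 10 and the sample text are arbitrary choices, not recommended values.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class MaxTokenLengthDemo
{
    public static void Main()
    {
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Set the limit before asking for a token stream so that it applies.
            analyzer.MaxTokenLength = 10;

            using (TokenStream stream = analyzer.GetTokenStream(
                "body", "short supercalifragilisticexpialidocious words"))
            {
                var term = stream.AddAttribute<ICharTermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    Console.WriteLine(term.ToString());
                }
                stream.End();
            }
        }
    }
}
```

Per the description above, the token longer than ten characters should be dropped, while the shorter ones pass through.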

Methods

CreateComponents(String, TextReader)

Declaration

protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)

Parameters

Type	Name	Description
System.String	fieldName
System.IO.TextReader	reader

Returns

Type	Description
TokenStreamComponents

Overrides

Analyzer.CreateComponents(String, TextReader)

Implements

System.IDisposable
