Class Analyzer
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
In order to define what analysis is done, subclasses must define their TokenStreamComponents in CreateComponents(String, TextReader). The components are then reused in each call to GetTokenStream(String, TextReader).
Simple example:
Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
});
For more examples, see the Lucene.Net.Analysis namespace documentation.
For some concrete implementations bundled with Lucene, look in the analysis modules:
- Common: Analyzers for indexing content in different languages and domains.
- ICU: Exposes functionality from ICU to Apache Lucene.
- Kuromoji: Morphological analyzer for Japanese text.
- Morfologik: Dictionary-driven lemmatization for the Polish language.
- OpenNLP: Analysis integration with Apache OpenNLP.
- Phonetic: Analysis for indexing phonetic signatures (for sounds-alike search).
- Smart Chinese: Analyzer for Simplified Chinese, which indexes words.
- Stempel: Algorithmic Stemmer for the Polish Language.
Implements
System.IDisposable
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public abstract class Analyzer : IDisposable
Constructors
Analyzer()
Create a new Analyzer, reusing the same set of components per-thread across calls to GetTokenStream(String, TextReader).
Declaration
protected Analyzer()
Analyzer(ReuseStrategy)
Expert: create a new Analyzer with a custom ReuseStrategy.
NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as Lucene.Net.Analysis.Common.Miscellaneous.PerFieldAnalyzerWrapper instead.
Declaration
protected Analyzer(ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
ReuseStrategy | reuseStrategy | the ReuseStrategy to use for this analyzer |
Fields
GLOBAL_REUSE_STRATEGY
A predefined ReuseStrategy that reuses the same components for every field.
Declaration
public static readonly ReuseStrategy GLOBAL_REUSE_STRATEGY
Field Value
Type | Description |
---|---|
ReuseStrategy |
PER_FIELD_REUSE_STRATEGY
A predefined ReuseStrategy that reuses components per-field by maintaining a Map of TokenStreamComponents per field name.
Declaration
public static readonly ReuseStrategy PER_FIELD_REUSE_STRATEGY
Field Value
Type | Description |
---|---|
ReuseStrategy |
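For example, this predefined strategy can be passed to NewAnonymous(Func&lt;String, TextReader, TokenStreamComponents&gt;, ReuseStrategy). The following is a minimal sketch; StandardTokenizer and LowerCaseFilter are assumed to be available from the Lucene.Net.Analysis.Common package and are used only for illustration.
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;     // LowerCaseFilter (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Analysis.Standard; // StandardTokenizer (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Util;

// Reuse TokenStreamComponents per field name instead of globally.
Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
    TokenStream filter = new LowerCaseFilter(LuceneVersion.LUCENE_48, source);
    return new TokenStreamComponents(source, filter);
}, Analyzer.PER_FIELD_REUSE_STRATEGY);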
Properties
Strategy
Returns the used ReuseStrategy.
Declaration
public ReuseStrategy Strategy { get; }
Property Value
Type | Description |
---|---|
ReuseStrategy |
Methods
CreateComponents(String, TextReader)
Creates a new TokenStreamComponents instance for this analyzer.
Declaration
protected abstract TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field whose content is passed to the TokenStreamComponents sink as a reader |
System.IO.TextReader | reader | the reader passed to the Tokenizer constructor |
Returns
Type | Description |
---|---|
TokenStreamComponents | the TokenStreamComponents for this analyzer. |
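As an illustration only, a concrete subclass might implement CreateComponents(String, TextReader) as in the sketch below. The class name is hypothetical, and WhitespaceTokenizer and LowerCaseFilter are assumed to be available from the Lucene.Net.Analysis.Common package.
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core; // WhitespaceTokenizer, LowerCaseFilter (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Util;

// Hypothetical analyzer: split on whitespace, then lowercase each token.
public sealed class LowercaseWhitespaceAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream filter = new LowerCaseFilter(LuceneVersion.LUCENE_48, source);
        return new TokenStreamComponents(source, filter);
    }
}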
Dispose()
Frees persistent resources used by this Analyzer
Declaration
public void Dispose()
Dispose(Boolean)
Frees persistent resources used by this Analyzer
Declaration
protected virtual void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing | true to release both managed and unmanaged resources; false to release only unmanaged resources |
GetOffsetGap(String)
Just like GetPositionIncrementGap(String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
Declaration
public virtual int GetOffsetGap(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the field just indexed |
Returns
Type | Description |
---|---|
System.Int32 | offset gap, added to the next token emitted from GetTokenStream(String, TextReader). This value must be >= 0. |
GetPositionIncrementGap(String)
Invoked before indexing an IIndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IIndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including terms across IIndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IIndexableField instance boundaries.
Declaration
public virtual int GetPositionIncrementGap(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | IIndexableField name being indexed. |
Returns
Type | Description |
---|---|
System.Int32 | position increment gap, added to the next token emitted from GetTokenStream(String, TextReader). This value must be >= 0. |
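As a sketch of how this hook can be used (the subclass and gap size below are hypothetical, not part of this API), returning a non-zero gap keeps phrase queries from matching across the boundary between two values of the same field.
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core; // WhitespaceTokenizer (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Util;

public sealed class GapAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source);
    }

    // Leave 10 unused positions between values of a multi-valued field.
    public override int GetPositionIncrementGap(string fieldName)
    {
        return 10; // the default is 0, which makes values position-adjacent
    }
}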
GetTokenStream(String, TextReader)
Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis namespace documentation for some examples demonstrating this.
Declaration
public TokenStream GetTokenStream(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field the created TokenStream is used for |
System.IO.TextReader | reader | the reader the streams source reads from |
Returns
Type | Description |
---|---|
TokenStream | TokenStream for iterating the analyzed content of System.IO.TextReader |
Exceptions
Type | Condition |
---|---|
System.ObjectDisposedException | if the Analyzer is disposed. |
System.IO.IOException | if an i/o error occurs (may rarely happen for strings). |
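A minimal consumption sketch following that workflow is shown below. StandardAnalyzer is assumed to be available from the Lucene.Net.Analysis.Common package; any other Analyzer is consumed the same way.
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard; // StandardAnalyzer (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
TokenStream stream = analyzer.GetTokenStream("body", new StringReader("Some text to analyze."));
ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
try
{
    stream.Reset();                 // required before the first IncrementToken()
    while (stream.IncrementToken())
    {
        Console.WriteLine(termAtt.ToString());
    }
    stream.End();                   // report the final offset state
}
finally
{
    stream.Dispose();               // always release the stream when done
}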
GetTokenStream(String, String)
Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis namespace documentation for some examples demonstrating this.
Declaration
public TokenStream GetTokenStream(string fieldName, string text)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field the created TokenStream is used for |
System.String | text | the System.String the streams source reads from |
Returns
Type | Description |
---|---|
TokenStream | TokenStream for iterating the analyzed content of text |
Exceptions
Type | Condition |
---|---|
System.ObjectDisposedException | if the Analyzer is disposed. |
System.IO.IOException | if an i/o error occurs (may rarely happen for strings). |
InitReader(String, TextReader)
Override this if you want to add a CharFilter chain.
The default implementation returns reader unchanged.
Declaration
protected virtual TextReader InitReader(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | IIndexableField name being indexed |
System.IO.TextReader | reader | original System.IO.TextReader |
Returns
Type | Description |
---|---|
System.IO.TextReader | reader, optionally decorated with CharFilter(s) |
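As an illustration, a hypothetical subclass could wrap the incoming reader in HTMLStripCharFilter (assumed to be available from the Lucene.Net.Analysis.Common package) so that markup is stripped before tokenization.
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters; // HTMLStripCharFilter (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Analysis.Core;        // WhitespaceTokenizer (assumed from Lucene.Net.Analysis.Common)
using Lucene.Net.Util;

public sealed class HtmlAwareAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source);
    }

    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        // The reader returned here is what the Tokenizer created in
        // CreateComponents ultimately reads from.
        return new HTMLStripCharFilter(reader);
    }
}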
NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents
parameter.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
});
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents)
Parameters
Type | Name | Description |
---|---|---|
System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. |
Returns
Type | Description |
---|---|
Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. |
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents
parameter and allows the use of a ReuseStrategy.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
}, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. |
ReuseStrategy | reuseStrategy | A custom ReuseStrategy instance. |
Returns
Type | Description |
---|---|
Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. |
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents
parameter and the body of the InitReader(String, TextReader)
method through the initReader
parameter.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
}, initReader: (fieldName, reader) =>
{
return new HTMLStripCharFilter(reader);
});
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader)
Parameters
Type | Name | Description |
---|---|---|
System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. |
System.Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the System.IO.TextReader that can be modified or wrapped before it is passed to the createComponents delegate. |
Returns
Type | Description |
---|---|
Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. |
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents
parameter, the body of the InitReader(String, TextReader)
method through the initReader
parameter, and allows the use of a ReuseStrategy.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
}, initReader: (fieldName, reader) =>
{
return new HTMLStripCharFilter(reader);
}, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader, ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. |
System.Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the System.IO.TextReader that can be modified or wrapped before it is passed to the createComponents delegate. |
ReuseStrategy | reuseStrategy | A custom ReuseStrategy instance. |
Returns
Type | Description |
---|---|
Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. |