Class Analyzer
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
In order to define what analysis is done, subclasses must define their TokenStreamComponents in CreateComponents(String, TextReader). The components are then reused in each call to GetTokenStream(String, TextReader).
Simple example:
Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => 
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
});
For more examples, see the Lucene.Net.Analysis namespace documentation.
For some concrete implementations bundled with Lucene, look in the analysis modules:
- Common: Analyzers for indexing content in different languages and domains.
- ICU: Exposes functionality from ICU to Apache Lucene.
- Kuromoji: Morphological analyzer for Japanese text.
- Morfologik: Dictionary-driven lemmatization for the Polish language.
- Phonetic: Analysis for indexing phonetic signatures (for sounds-alike search).
- Smart Chinese: Analyzer for Simplified Chinese, which indexes words.
- Stempel: Algorithmic Stemmer for the Polish Language.
- UIMA: Analysis integration with Apache UIMA.
Implements
Inherited Members
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public abstract class Analyzer : IDisposable
Constructors
Analyzer()
Create a new Analyzer, reusing the same set of components per-thread across calls to GetTokenStream(String, TextReader).
Declaration
public Analyzer()
Analyzer(ReuseStrategy)
Expert: create a new Analyzer with a custom ReuseStrategy.
NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as Lucene.Net.Analysis.Common.Miscellaneous.PerFieldAnalyzerWrapper instead.
Declaration
public Analyzer(ReuseStrategy reuseStrategy)
Parameters
| Type | Name | Description | 
|---|---|---|
| ReuseStrategy | reuseStrategy | 
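A subclass can pass one of the predefined strategies to this constructor. The sketch below is illustrative only: the class name `PerFieldReuseAnalyzer` and the field name `"id"` are hypothetical, and it assumes the Lucene.Net.Analysis.Common package (for `KeywordTokenizer` and `WhitespaceTokenizer`) is referenced.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer whose components differ by field, so they are
// cached per field name rather than shared across all fields.
public sealed class PerFieldReuseAnalyzer : Analyzer
{
    public PerFieldReuseAnalyzer()
        : base(Analyzer.PER_FIELD_REUSE_STRATEGY)
    {
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // "id" is an assumed field name: keep its whole value as one token.
        Tokenizer source = fieldName == "id"
            ? (Tokenizer)new KeywordTokenizer(reader)
            : new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source);
    }
}
```

Because the tokenizer depends on `fieldName`, reusing a single set of components for every field would hand the wrong tokenizer to some fields, which is why a per-field strategy is chosen here.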
Fields
GLOBAL_REUSE_STRATEGY
A predefined ReuseStrategy that reuses the same components for every field.
Declaration
public static readonly ReuseStrategy GLOBAL_REUSE_STRATEGY
Field Value
| Type | Description | 
|---|---|
| ReuseStrategy | 
PER_FIELD_REUSE_STRATEGY
A predefined ReuseStrategy that reuses components per-field by maintaining a Map of TokenStreamComponents per field name.
Declaration
public static readonly ReuseStrategy PER_FIELD_REUSE_STRATEGY
Field Value
| Type | Description | 
|---|---|
| ReuseStrategy | 
Properties
Strategy
Returns the ReuseStrategy in use.
Declaration
public ReuseStrategy Strategy { get; }
Property Value
| Type | Description | 
|---|---|
| ReuseStrategy | 
Methods
CreateComponents(String, TextReader)
Creates a new TokenStreamComponents instance for this analyzer.
Declaration
protected abstract TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | the name of the field whose content is passed to the TokenStreamComponents sink as a reader | 
| System.IO.TextReader | reader | the reader passed to the Tokenizer constructor | 
Returns
| Type | Description | 
|---|---|
| TokenStreamComponents | the TokenStreamComponents for this analyzer. | 
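For illustration, a subclass might override this method as in the following minimal sketch. The class name `LowercaseWhitespaceAnalyzer` is hypothetical, and the example assumes the Lucene.Net.Analysis.Common package, which provides `WhitespaceTokenizer` and `LowerCaseFilter`.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer: splits on whitespace, then lowercases each token.
public sealed class LowercaseWhitespaceAnalyzer : Analyzer
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The tokenizer is the source: it reads characters from the reader.
        Tokenizer source = new WhitespaceTokenizer(MatchVersion, reader);
        // The last filter in the chain is the sink that consumers iterate.
        TokenStream sink = new LowerCaseFilter(MatchVersion, source);
        return new TokenStreamComponents(source, sink);
    }
}
```

The source and the sink are passed separately so that the analyzer can point the tokenizer at a new reader on later calls while handing the end of the filter chain to the consumer.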
Dispose()
Frees persistent resources used by this Analyzer
Declaration
public void Dispose()
Dispose(Boolean)
Frees persistent resources used by this Analyzer
Declaration
protected virtual void Dispose(bool disposing)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Boolean | disposing | 
GetOffsetGap(String)
Just like GetPositionIncrementGap(String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
Declaration
public virtual int GetOffsetGap(string fieldName)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | the field just indexed | 
Returns
| Type | Description | 
|---|---|
| System.Int32 | offset gap, added to the next token emitted from GetTokenStream(String, TextReader). This value must be >= 0. | 
GetPositionIncrementGap(String)
Invoked before indexing an IIndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IIndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across IIndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IIndexableField instance boundaries.
Declaration
public virtual int GetPositionIncrementGap(string fieldName)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | IIndexableField name being indexed. | 
Returns
| Type | Description | 
|---|---|
| System.Int32 | position increment gap, added to the next token emitted from GetTokenStream(String, TextReader). This value must be >= 0. | 
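To keep phrase matches from spanning two values of the same field, an analyzer can return a gap larger than any slop used at query time. The following is a minimal sketch under that assumption: the class name `GappedAnalyzer` and its constructor parameter are hypothetical, and `WhitespaceTokenizer` comes from the Lucene.Net.Analysis.Common package.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer that inserts a configurable position gap between
// successive values of the same field.
public sealed class GappedAnalyzer : Analyzer
{
    private readonly int gap;

    public GappedAnalyzer(int gap)
    {
        this.gap = gap;
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source);
    }

    // Applied only when terms were already added to the field, that is,
    // between one value and the next value with the same field name.
    public override int GetPositionIncrementGap(string fieldName)
    {
        return gap;
    }
}
```

With `new GappedAnalyzer(100)`, the first token of a second value is placed 100 positions further along than it would be with the default gap of 0, so an exact PhraseQuery no longer matches across the boundary between the two values.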
GetTokenStream(String, TextReader)
Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis namespace documentation for some examples demonstrating this.
Declaration
public TokenStream GetTokenStream(string fieldName, TextReader reader)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | the name of the field the created TokenStream is used for | 
| System.IO.TextReader | reader | the reader the stream's source reads from | 
Returns
| Type | Description | 
|---|---|
| TokenStream | TokenStream for iterating the analyzed content of reader | 
Exceptions
| Type | Condition | 
|---|---|
| System.ObjectDisposedException | if the Analyzer is disposed. | 
| System.IO.IOException | if an i/o error occurs (may rarely happen for strings). | 
See Also
GetTokenStream(String, String)
Returns a TokenStream suitable for fieldName, tokenizing
the contents of text.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis namespace documentation for some examples demonstrating this.
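The consumer workflow mentioned in the note can be sketched roughly as follows. The helper class `TokenDump` and its method name are hypothetical; `ICharTermAttribute` lives in the Lucene.Net.Analysis.TokenAttributes namespace, and the usage lines assume `WhitespaceAnalyzer` from the Lucene.Net.Analysis.Common package.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper that collects the terms an analyzer produces.
public static class TokenDump
{
    public static IList<string> Tokenize(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        // Disposing the stream releases it so the components can be reused.
        using (TokenStream stream = analyzer.GetTokenStream(fieldName, text))
        {
            // Request the attribute before iterating; it is updated in place.
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                      // required before IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());   // copy: the attribute is reused
            }
            stream.End();                        // end-of-stream bookkeeping
        }
        return terms;
    }
}

public static class TokenDumpExample
{
    public static IList<string> Run()
    {
        using (var analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
        {
            // Yields "quick", "brown", "fox".
            return TokenDump.Tokenize(analyzer, "body", "quick brown fox");
        }
    }
}
```

The order matters: `Reset()` comes before the first `IncrementToken()`, `End()` after the last, and the stream is disposed before the same analyzer is asked for another stream on the same thread.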
Declaration
public TokenStream GetTokenStream(string fieldName, string text)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | the name of the field the created TokenStream is used for | 
| System.String | text | the System.String the stream's source reads from | 
Returns
| Type | Description | 
|---|---|
| TokenStream | TokenStream for iterating the analyzed content of text | 
Exceptions
| Type | Condition | 
|---|---|
| System.ObjectDisposedException | if the Analyzer is disposed. | 
| System.IO.IOException | if an i/o error occurs (may rarely happen for strings). | 
See Also
InitReader(String, TextReader)
Override this if you want to add a CharFilter chain.
The default implementation returns reader unchanged.
Declaration
protected virtual TextReader InitReader(string fieldName, TextReader reader)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.String | fieldName | IIndexableField name being indexed | 
| System.IO.TextReader | reader | original System.IO.TextReader | 
Returns
| Type | Description | 
|---|---|
| System.IO.TextReader | reader, optionally decorated with CharFilter(s) | 
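As a rough sketch, a subclass could override this method to strip HTML markup before tokenization. The class name `HtmlTextAnalyzer` is hypothetical; `HTMLStripCharFilter` and `WhitespaceTokenizer` are assumed to come from the Lucene.Net.Analysis.Common package.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer that removes HTML tags before splitting on whitespace.
public sealed class HtmlTextAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The reader received here has already been wrapped by InitReader.
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source);
    }

    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        // Wrap the original reader in a CharFilter; further filters could be
        // chained by wrapping the result again.
        return new HTMLStripCharFilter(reader);
    }
}
```

Putting the CharFilter in `InitReader` rather than in `CreateComponents` means it is applied again each time the cached components are given a new reader.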
NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents parameter.
Simple example: 
    var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => 
    {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    });
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. | 
Returns
| Type | Description | 
|---|---|
| Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. | 
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents parameter and allows the use of a ReuseStrategy.
Simple example: 
    var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => 
    {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, ReuseStrategy reuseStrategy)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. | 
| ReuseStrategy | reuseStrategy | A custom ReuseStrategy instance. | 
Returns
| Type | Description | 
|---|---|
| Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. | 
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents parameter and the body of the InitReader(String, TextReader)
method through the initReader parameter.
Simple example: 
    var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => 
    {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }, initReader: (fieldName, reader) => 
    {
        return new HTMLStripCharFilter(reader);
    });
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. | 
| System.Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the System.IO.TextReader that can be modified or wrapped by the initReader method. | 
Returns
| Type | Description | 
|---|---|
| Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. | 
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader)
method through the createComponents parameter, the body of the InitReader(String, TextReader)
method through the initReader parameter, and allows the use of a ReuseStrategy.
Simple example: 
    var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => 
    {
        Tokenizer source = new FooTokenizer(reader);
        TokenStream filter = new FooFilter(source);
        filter = new BarFilter(filter);
        return new TokenStreamComponents(source, filter);
    }, initReader: (fieldName, reader) => 
    {
        return new HTMLStripCharFilter(reader);
    }, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader, ReuseStrategy reuseStrategy)
Parameters
| Type | Name | Description | 
|---|---|---|
| System.Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the TokenStreamComponents for this analyzer. | 
| System.Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. It accepts a System.String fieldName and a System.IO.TextReader reader and returns the System.IO.TextReader that can be modified or wrapped by the initReader method. | 
| ReuseStrategy | reuseStrategy | A custom ReuseStrategy instance. | 
Returns
| Type | Description | 
|---|---|
| Analyzer | A new Lucene.Net.Analysis.Analyzer.AnonymousAnalyzer instance. |