Class Analyzer
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
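In order to define what analysis is done, subclasses must define their TokenStreamComponents in CreateComponents(String, TextReader). The components are then reused in each call to GetTokenStream(String, TextReader).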
Simple example:
Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
});
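FooTokenizer, FooFilter and BarFilter above are placeholders. As a minimal concrete sketch, assuming the Lucene.Net.Analysis.Common package is referenced (for WhitespaceTokenizer and LowerCaseFilter) and that LuceneVersion.LUCENE_48 suits your build, the same pattern can split text on whitespace and lowercase each token. The `WhitespaceLowerCaseExample` class and its `Create`/`Analyze` methods are hypothetical helpers, not part of Lucene.NET.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class; only the Lucene.NET types it uses are real.
public static class WhitespaceLowerCaseExample
{
    // Builds an analyzer whose chain is WhitespaceTokenizer -> LowerCaseFilter.
    public static Analyzer Create() =>
        Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
            TokenStream filter = new LowerCaseFilter(LuceneVersion.LUCENE_48, source);
            return new TokenStreamComponents(source, filter);
        });

    // Runs text through the analyzer and collects the emitted terms.
    public static List<string> Analyze(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();                  // required before the first IncrementToken()
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();                    // finish the stream before it is disposed
        }
        return terms;
    }
}
```

With this sketch, analyzing "Hello Lucene World" yields the terms hello, lucene and world.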
For more examples, see the Lucene.Net.Analysis namespace documentation.
For some concrete implementations bundled with Lucene, look in the analysis modules:
- Common: Analyzers for indexing content in different languages and domains.
- ICU: Exposes functionality from ICU to Apache Lucene.
- Kuromoji: Morphological analyzer for Japanese text.
- Morfologik: Dictionary-driven lemmatization for the Polish language.
- Phonetic: Analysis for indexing phonetic signatures (for sounds-alike search).
- Smart Chinese: Analyzer for Simplified Chinese, which indexes words.
- Stempel: Algorithmic Stemmer for the Polish Language.
- UIMA: Analysis integration with Apache UIMA.
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public abstract class Analyzer : IDisposable
Constructors
Analyzer()
Create a new Analyzer, reusing the same set of components per-thread
across calls to GetTokenStream(String, TextReader).
Declaration
public Analyzer()
Analyzer(ReuseStrategy)
Expert: create a new Analyzer with a custom ReuseStrategy.
NOTE: if you just want to reuse on a per-field basis, it's easier to
use a subclass of AnalyzerWrapper such as Lucene.Net.Analysis.Common.Miscellaneous.PerFieldAnalyzerWrapper
instead.
Declaration
public Analyzer(ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
ReuseStrategy | reuseStrategy | the strategy that decides how TokenStreamComponents are reused across calls |
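A sketch of when this constructor is useful, assuming a hypothetical `IdAwareAnalyzer` whose token chain depends on the field name. Because its components differ per field, it passes PER_FIELD_REUSE_STRATEGY to the base constructor so each field keeps its own cached components. KeywordTokenizer, WhitespaceTokenizer and LowerCaseFilter are assumed to come from Lucene.Net.Analysis.Common; the field name "id" and the `Terms` helper are illustrative only.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical analyzer: "id" is indexed as one case-preserved token,
// every other field is split on whitespace and lowercased.
public sealed class IdAwareAnalyzer : Analyzer
{
    // Components depend on fieldName, so cache them per field rather than globally.
    public IdAwareAnalyzer() : base(PER_FIELD_REUSE_STRATEGY) { }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        if (fieldName == "id")
        {
            // KeywordTokenizer emits the entire input as a single token.
            return new TokenStreamComponents(new KeywordTokenizer(reader));
        }
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(source, new LowerCaseFilter(LuceneVersion.LUCENE_48, source));
    }

    // Illustrative helper: consumes the stream for one field and returns its terms.
    public static List<string> Terms(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }
}
```

With the default GLOBAL_REUSE_STRATEGY, the components created for whichever field is analyzed first would be reused for every field on that thread, which is why this sketch needs the per-field strategy. If each field simply maps to an existing analyzer, PerFieldAnalyzerWrapper remains the simpler route, as the NOTE above says.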
Fields
GLOBAL_REUSE_STRATEGY
A predefined ReuseStrategy that reuses the same components for every field.
Declaration
public static readonly ReuseStrategy GLOBAL_REUSE_STRATEGY
Field Value
Type | Description |
---|---|
ReuseStrategy | |
PER_FIELD_REUSE_STRATEGY
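A predefined ReuseStrategy that reuses components per field by maintaining a map of TokenStreamComponents per field name.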
Declaration
public static readonly ReuseStrategy PER_FIELD_REUSE_STRATEGY
Field Value
Type | Description |
---|---|
ReuseStrategy | |
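To see the difference between the two predefined strategies, the following sketch counts how often the createComponents delegate runs while the fields "title", "body" and "title" are analyzed on one thread. It assumes WhitespaceTokenizer from Lucene.Net.Analysis.Common; `ReuseStrategyDemo` and its methods are hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical demo class comparing GLOBAL_REUSE_STRATEGY and PER_FIELD_REUSE_STRATEGY.
public static class ReuseStrategyDemo
{
    // Returns how many times CreateComponents ran for three calls spread over two fields.
    public static int CountComponentCreations(ReuseStrategy strategy)
    {
        int created = 0;
        Analyzer analyzer = Analyzer.NewAnonymous(
            createComponents: (fieldName, reader) =>
            {
                created++; // runs only when no reusable components are cached
                return new TokenStreamComponents(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader));
            },
            reuseStrategy: strategy);

        using (analyzer)
        {
            foreach (string field in new[] { "title", "body", "title" })
            {
                Consume(analyzer.GetTokenStream(field, "some text"));
            }
        }
        return created;
    }

    // A stream must be fully consumed and disposed before its components can be reused.
    private static void Consume(TokenStream ts)
    {
        using (ts)
        {
            ts.Reset();
            while (ts.IncrementToken()) { }
            ts.End();
        }
    }
}
```

GLOBAL_REUSE_STRATEGY keeps one set of components for all fields, so the count is 1; PER_FIELD_REUSE_STRATEGY caches components per field name, so the count is 2 (one for "title", one for "body").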
Properties
Strategy
Returns the ReuseStrategy used by this analyzer.
Declaration
public ReuseStrategy Strategy { get; }
Property Value
Type | Description |
---|---|
ReuseStrategy | |
Methods
CreateComponents(String, TextReader)
Creates a new TokenStreamComponents instance for this analyzer.
Declaration
protected abstract TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field's content passed to the TokenStreamComponents sink as a reader |
System.IO.TextReader | reader | the reader passed to the Tokenizer constructor |
Returns
Type | Description |
---|---|
TokenStreamComponents | the TokenStreamComponents for this analyzer |
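A minimal subclass sketch, assuming the Lucene.Net.Analysis.Common package (StandardTokenizer, LowerCaseFilter, StopFilter and StopAnalyzer.ENGLISH_STOP_WORDS_SET). The Tokenizer receives the reader; each filter wraps the previous stage, and the outermost filter is passed as the sink. `EnglishStopAnalyzer` and its `Terms` helper are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical analyzer: StandardTokenizer -> LowerCaseFilter -> StopFilter (English stop words).
public sealed class EnglishStopAnalyzer : Analyzer
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The Tokenizer is the source: it is the only stage that reads from the reader.
        Tokenizer source = new StandardTokenizer(MatchVersion, reader);
        // Each filter wraps the previous stage; the outermost one is the sink.
        TokenStream sink = new LowerCaseFilter(MatchVersion, source);
        sink = new StopFilter(MatchVersion, sink, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new TokenStreamComponents(source, sink);
    }

    // Illustrative helper: consumes the stream and returns its terms.
    public static List<string> Terms(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }
}
```

Lowercasing runs before the stop filter here so that "The" is normalized to "the" and then removed; analyzing "The Quick Brown Fox" yields quick, brown and fox.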
Dispose()
Frees persistent resources used by this Analyzer
Declaration
public void Dispose()
Dispose(Boolean)
Frees persistent resources used by this Analyzer
Declaration
protected virtual void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
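System.Boolean | disposing | true to release both managed and unmanaged resources; false to release only unmanaged resources |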
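Because Analyzer implements IDisposable, a using statement is a convenient way to make sure Dispose() is called. A sketch assuming WhitespaceAnalyzer from Lucene.Net.Analysis.Common; `DisposeExample.CountTokens` is a hypothetical helper.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical helper showing deterministic disposal of an analyzer and its stream.
public static class DisposeExample
{
    // Counts whitespace-separated tokens in text.
    public static int CountTokens(string text)
    {
        using (Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream ts = analyzer.GetTokenStream("body", text))
        {
            int count = 0;
            ts.Reset();
            while (ts.IncrementToken())
            {
                count++;
            }
            ts.End();
            return count;
        } // ts is disposed first, then analyzer.Dispose() frees its persistent resources
    }
}
```

Creating and disposing an analyzer per call keeps the sketch short but gives up component reuse; a long-lived analyzer that is disposed once at shutdown is usually the better fit.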
GetOffsetGap(String)
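Just like GetPositionIncrementGap(String), except for token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.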
Declaration
public virtual int GetOffsetGap(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the field just indexed |
Returns
Type | Description |
---|---|
System.Int32 | offset gap, added to the next token emitted from GetTokenStream(String, TextReader) |
GetPositionIncrementGap(String)
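Invoked before indexing an IIndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IIndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across IIndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IIndexableField instance boundaries.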
Declaration
public virtual int GetPositionIncrementGap(string fieldName)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | IIndexableField name being indexed |
Returns
Type | Description |
---|---|
System.Int32 | position increment gap, added to the next token emitted from GetTokenStream(String, TextReader). This value must be greater than or equal to 0. |
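Both gaps matter for multi-valued fields, where the same field name is added more than once to a document. A sketch of a hypothetical `MultiValueAnalyzer` that overrides both methods; the values 100 and 10 are arbitrary illustrations, and WhitespaceTokenizer is assumed to come from Lucene.Net.Analysis.Common.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer for multi-valued fields (e.g., several "tags" values per document).
public sealed class MultiValueAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader) =>
        new TokenStreamComponents(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader));

    // Skip 100 positions between values of the same field, so a phrase query
    // whose slop is below 100 cannot match across two separate values.
    public override int GetPositionIncrementGap(string fieldName) => 100;

    // Likewise leave a gap of 10 characters between the offsets of consecutive values.
    public override int GetOffsetGap(string fieldName) => 10;
}
```

Both methods receive the field name, so a real analyzer could return a gap only for the fields that are actually multi-valued.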
GetTokenStream(String, String)
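Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method reuse the previously stored components after resetting them through TokenStreamComponents.SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the
workflow described in TokenStream to properly consume its contents.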
Declaration
public TokenStream GetTokenStream(string fieldName, string text)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field the created TokenStream is used for |
System.String | text | the string the stream's source reads from |
Returns
Type | Description |
---|---|
TokenStream | TokenStream for iterating the analyzed content of text |
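The consumer workflow referenced in the NOTE is: add the attributes you need, call Reset(), call IncrementToken() until it returns false, call End(), then Dispose(). A sketch assuming ICharTermAttribute and IOffsetAttribute from Lucene.Net.Analysis.TokenAttributes; `TokenStreamWorkflow.Describe` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper that walks a TokenStream through the full consumer workflow.
public static class TokenStreamWorkflow
{
    // Returns "term[start-end]" for every token produced for text.
    public static List<string> Describe(Analyzer analyzer, string fieldName, string text)
    {
        var result = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            // 1. Register the attributes to read before consuming.
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = ts.AddAttribute<IOffsetAttribute>();
            // 2. Reset, 3. iterate, 4. End; Dispose happens when the using block exits.
            ts.Reset();
            while (ts.IncrementToken())
            {
                result.Add($"{term.ToString()}[{offset.StartOffset}-{offset.EndOffset}]");
            }
            ts.End();
        }
        return result;
    }
}
```

With a WhitespaceAnalyzer, "Hello Lucene" is described as Hello[0-5] and Lucene[6-12]; the end offsets are exclusive.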
GetTokenStream(String, TextReader)
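Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
This method uses CreateComponents(String, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method reuse the previously stored components after resetting them through TokenStreamComponents.SetReader(TextReader).
NOTE: After calling this method, the consumer must follow the
workflow described in TokenStream to properly consume its contents.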
Declaration
public TokenStream GetTokenStream(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | the name of the field the created TokenStream is used for |
System.IO.TextReader | reader | the reader the stream's source reads from |
Returns
Type | Description |
---|---|
TokenStream | TokenStream for iterating the analyzed content of reader |
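The TextReader overload lets the text come from any reader, for example a StreamReader over a file. The sketch below reuses one analyzer for several documents and disposes each stream before requesting the next, since the TokenStream contract requires the previous stream to be disposed before the analyzer resets its reused components with a new reader. `ReaderOverloadExample.AnalyzeAll` is a hypothetical helper, and StringReader stands in for any TextReader.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical helper: analyzes each document from its own TextReader with one shared analyzer.
public static class ReaderOverloadExample
{
    public static List<List<string>> AnalyzeAll(Analyzer analyzer, IEnumerable<string> documents)
    {
        var all = new List<List<string>>();
        foreach (string doc in documents)
        {
            var terms = new List<string>();
            // StringReader stands in for any TextReader, such as a StreamReader over a file.
            using (TextReader reader = new StringReader(doc))
            using (TokenStream ts = analyzer.GetTokenStream("body", reader))
            {
                ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
                ts.Reset();
                while (ts.IncrementToken())
                {
                    terms.Add(termAtt.ToString());
                }
                ts.End();
            } // disposing the stream lets the analyzer reuse its components for the next document
            all.Add(terms);
        }
        return all;
    }
}
```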
InitReader(String, TextReader)
Override this if you want to add a CharFilter chain.
The default implementation returns reader
unchanged.
Declaration
protected virtual TextReader InitReader(string fieldName, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.String | fieldName | IIndexableField name being indexed |
System.IO.TextReader | reader | original TextReader |
Returns
Type | Description |
---|---|
System.IO.TextReader | reader, optionally decorated with CharFilter(s) |
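A sketch of overriding InitReader to add a CharFilter, assuming MappingCharFilter and NormalizeCharMap from the Lucene.Net.Analysis.CharFilters namespace (Lucene.Net.Analysis.Common package). The hypothetical `AmpersandAnalyzer` rewrites "&" to " and " before the tokenizer sees the text.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical analyzer: maps "&" to " and " before whitespace tokenization.
public sealed class AmpersandAnalyzer : Analyzer
{
    private static readonly NormalizeCharMap AmpersandMap = BuildMap();

    private static NormalizeCharMap BuildMap()
    {
        var builder = new NormalizeCharMap.Builder();
        builder.Add("&", " and ");
        return builder.Build();
    }

    // Runs first: wraps the original reader in a CharFilter.
    protected override TextReader InitReader(string fieldName, TextReader reader) =>
        new MappingCharFilter(AmpersandMap, reader);

    // Receives the reader returned by InitReader.
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader) =>
        new TokenStreamComponents(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader));

    // Illustrative helper: consumes the stream and returns its terms.
    public static List<string> Terms(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }
}
```

With this sketch, "R&D" is tokenized as R, and, D.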
NewAnonymous(Func<String, TextReader, TokenStreamComponents>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader) method through the createComponents
parameter.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
});
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents)
Parameters
Type | Name | Description |
---|---|---|
Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. |
Returns
Type | Description |
---|---|
Analyzer | A new Analyzer instance. |
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader) method through the createComponents
parameter and the body of the InitReader(String, TextReader) method through the initReader
parameter.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
}, initReader: (fieldName, reader) =>
{
    return new HTMLStripCharFilter(reader);
});
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader)
Parameters
Type | Name | Description |
---|---|---|
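Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. |
Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. |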
Returns
Type | Description |
---|---|
Analyzer | A new Analyzer instance. |
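A concrete version of the example above, assuming HTMLStripCharFilter (Lucene.Net.Analysis.CharFilters), WhitespaceTokenizer and LowerCaseFilter from the Lucene.Net.Analysis.Common package; `HtmlAnalyzerExample` and its helpers are hypothetical. The initReader delegate strips markup before the tokenizer runs.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper class; only the Lucene.NET types it uses are real.
public static class HtmlAnalyzerExample
{
    // Strip HTML (initReader), then split on whitespace and lowercase (createComponents).
    public static Analyzer Create() =>
        Analyzer.NewAnonymous(
            createComponents: (fieldName, reader) =>
            {
                Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
                return new TokenStreamComponents(source, new LowerCaseFilter(LuceneVersion.LUCENE_48, source));
            },
            initReader: (fieldName, reader) => new HTMLStripCharFilter(reader));

    // Illustrative helper: consumes the stream and returns its terms.
    public static List<string> Terms(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, text))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }
}
```

Analyzing "<b>Hello</b> <i>World</i>" with this sketch yields hello and world, with no tag text in the terms.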
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy)
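Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader) method through the createComponents
parameter and the body of the InitReader(String, TextReader) method through the initReader
parameter, using a custom ReuseStrategy.
Simple example: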
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
}, initReader: (fieldName, reader) =>
{
    return new HTMLStripCharFilter(reader);
}, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, Func<string, TextReader, TextReader> initReader, ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
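Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. |
Func<System.String, System.IO.TextReader, System.IO.TextReader> | initReader | A delegate method that represents (is called by) the InitReader(String, TextReader) method. |
ReuseStrategy | reuseStrategy | A custom ReuseStrategy. |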
Returns
Type | Description |
---|---|
Analyzer | A new Analyzer instance. |
NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy)
Creates a new instance with the ability to specify the body of the CreateComponents(String, TextReader) method through the createComponents
parameter, using a custom ReuseStrategy.
Simple example:
var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
}, reuseStrategy);
LUCENENET specific
Declaration
public static Analyzer NewAnonymous(Func<string, TextReader, TokenStreamComponents> createComponents, ReuseStrategy reuseStrategy)
Parameters
Type | Name | Description |
---|---|---|
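Func<System.String, System.IO.TextReader, TokenStreamComponents> | createComponents | A delegate method that represents (is called by) the CreateComponents(String, TextReader) method. |
ReuseStrategy | reuseStrategy | A custom ReuseStrategy. |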
Returns
Type | Description |
---|---|
Analyzer | A new Analyzer instance. |