Namespace Lucene.Net.Analysis.Util
Utility functions for text analysis.
Classes
AbstractAnalysisFactory
Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.
The typical lifecycle for a factory consumer is:
- Create factory via its constructor (or via XXXFactory.ForName)
- (Optional) If the factory uses resources such as files, Inform(IResourceLoader) is called to initialize those resources.
- Consumer calls Create() to obtain instances.
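The lifecycle above can be sketched without any Lucene.NET dependency. Everything in this block (ISketchResourceLoader, InMemoryResourceLoader, SketchStopFilterFactory, the "words" argument) is a hypothetical stand-in chosen for illustration, not the library's API; it only shows the order of the three steps.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for a resource loader; not the Lucene.NET IResourceLoader.
public interface ISketchResourceLoader
{
    TextReader OpenResource(string name);
}

// Serves "files" from an in-memory dictionary so the sketch needs no disk access.
public sealed class InMemoryResourceLoader : ISketchResourceLoader
{
    private readonly IDictionary<string, string> files;

    public InMemoryResourceLoader(IDictionary<string, string> files)
    {
        this.files = files;
    }

    public TextReader OpenResource(string name)
    {
        return new StringReader(files[name]);
    }
}

// Hypothetical factory that follows the three lifecycle steps.
public sealed class SketchStopFilterFactory
{
    private readonly string wordsResource;
    private readonly HashSet<string> stopwords = new HashSet<string>(StringComparer.Ordinal);

    // Step 1: the constructor only records configuration arguments.
    public SketchStopFilterFactory(IDictionary<string, string> args)
    {
        wordsResource = args["words"];
    }

    // Step 2 (optional): load external resources through the loader.
    public void Inform(ISketchResourceLoader loader)
    {
        using (TextReader reader = loader.OpenResource(wordsResource))
        {
            foreach (string line in reader.ReadToEnd().Split('\n'))
            {
                string word = line.Trim();
                if (word.Length > 0) stopwords.Add(word);
            }
        }
    }

    // Step 3: create an instance; here, a lazily filtered token sequence.
    public IEnumerable<string> Create(IEnumerable<string> tokens)
    {
        return tokens.Where(t => !stopwords.Contains(t));
    }
}
```

Keeping resource loading out of the constructor means a factory can be constructed from configuration alone and only touches files once a loader is available.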
BufferedCharFilter
LUCENENET specific class to mimic Java's BufferedReader (that is, a reader that is seekable) so it supports Mark() and Reset() (which are part of the Java Reader class), but also provides the Correct() method of BaseCharFilter.
CharacterUtils
Character
CharacterUtils.CharacterBuffer
A simple IO buffer to use with Fill(CharacterBuffer, TextReader).
CharArrayMap
CharArrayMap<TValue>
A simple class that stores key
You must specify the required LuceneVersion compatibility when creating CharArrayMap:
- As of 3.1, supplementary characters are properly lowercased.
CharArrayMap<TValue>.EntryIterator
Public iterator class so that efficient methods are exposed to users.
CharArrayMap<TValue>.EntrySet_
Public EntrySet_ class so that efficient methods are exposed to users.
NOTE: In .NET this was renamed to EntrySet_ because it conflicted with the
method EntrySet(). Since there is also an extension method named IDictionary{K,V}.
Another difference between this set and the Java counterpart is that it implements
CharArrayMapExtensions
LUCENENET specific extension methods for CharArrayMap
CharArraySet
A simple class that stores
You must specify the required LuceneVersion compatibility when creating CharArraySet:
- As of 3.1, supplementary characters are properly lowercased.
Please note: This class implements
CharArraySetExtensions
LUCENENET specific extension methods for CharArraySet
CharFilterFactory
Abstract parent class for analysis factories that create CharFilter instances.
CharTokenizer
An abstract base class for simple, character-oriented tokenizers.
You must specify the required LuceneVersion compatibility when creating CharTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token codepoints. See IsTokenChar(Int32) and Normalize(Int32) for details.
A new Char
As of Lucene 3.1 each Char
Note: If you use a subclass of Char
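The int based approach described for CharTokenizer can be illustrated with a standalone sketch. CodePointTokenizerSketch and its delegate parameters are hypothetical names, not the Lucene.NET types; the two delegates only mirror the roles of IsTokenChar(Int32) and Normalize(Int32). The point is that the text is walked by code point rather than by UTF-16 code unit, so a supplementary character reaches the predicate as a single int.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; not the Lucene.NET CharTokenizer.
public static class CodePointTokenizerSketch
{
    public static List<string> Tokenize(
        string text, Func<int, bool> isTokenChar, Func<int, int> normalize)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // A valid surrogate pair is decoded into one code point.
                codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
                i += 2;
            }
            else
            {
                // BMP character (or an unpaired surrogate, passed through as-is).
                codePoint = text[i];
                i += 1;
            }

            if (isTokenChar(codePoint))
            {
                AppendCodePoint(current, normalize(codePoint));
            }
            else if (current.Length > 0)
            {
                // A non-token character ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    private static void AppendCodePoint(StringBuilder sb, int codePoint)
    {
        if (codePoint <= char.MaxValue) sb.Append((char)codePoint);
        else sb.Append(char.ConvertFromUtf32(codePoint));
    }
}
```

A caller might pass `cp => cp != ' '` as the token predicate and an ASCII-only lowercasing function as the normalizer; both are illustrative choices.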
ClasspathResourceLoader
Simple IResource
ElisionFilter
Removes elisions from a Token
ElisionFilterFactory
Factory for ElisionFilter.
<fieldType name="text_elsn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ElisionFilterFactory"
            articles="stopwordarticles.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
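As a rough illustration of what removing an elision means, the sketch below strips a leading article and its apostrophe from a single token. ElisionSketch.RemoveElision is a hypothetical helper, not the ElisionFilter implementation, and the article set is an assumed example; the case-insensitive comparer stands in for the ignoreCase="true" setting shown in the configuration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the Lucene.NET ElisionFilter.
public static class ElisionSketch
{
    // Both the ASCII apostrophe and the typographic one are treated as elision marks.
    private static readonly char[] Apostrophes = { '\'', '\u2019' };

    public static string RemoveElision(string token, ISet<string> articles)
    {
        int index = token.IndexOfAny(Apostrophes);
        // Strip only when the text before the apostrophe is a known article.
        if (index > 0 && articles.Contains(token.Substring(0, index)))
        {
            return token.Substring(index + 1);
        }
        return token;
    }
}
```

With an article set such as `new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "l", "d", "qu" }`, a token whose prefix is not in the set is returned unchanged.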
FilesystemResourceLoader
Simple IResource
This loader wraps a delegate IResource
You can chain several Filesystem
FilteringTokenFilter
Abstract base class for TokenFilters that may remove tokens.
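You have to implement Accept() and return a boolean if the current token should be preserved. IncrementToken() uses this method to decide if a token should be passed to the caller.
As of Lucene 4.4, an ArgumentException is thrown when trying to disable position increments when filtering terms.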
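The Accept() pattern can be sketched independently of the library. FilteringSketch and MinLengthSketch are hypothetical classes that operate on plain strings rather than on a TokenStream; the position increment bookkeeping is an assumption about how skipped tokens might be accounted for, included only to show why removing tokens affects positions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the Lucene.NET FilteringTokenFilter.
public abstract class FilteringSketch
{
    private readonly IEnumerator<string> input;

    protected FilteringSketch(IEnumerable<string> tokens)
    {
        input = tokens.GetEnumerator();
    }

    // The current token and its distance from the previously emitted token.
    public string Term { get; private set; } = "";
    public int PositionIncrement { get; private set; }

    // Subclasses decide whether the current token is preserved.
    protected abstract bool Accept();

    // Advances to the next accepted token; returns false when input is exhausted.
    public bool IncrementToken()
    {
        int skipped = 0;
        while (input.MoveNext())
        {
            Term = input.Current;
            if (Accept())
            {
                // Each removed token leaves a gap of one position.
                PositionIncrement = 1 + skipped;
                return true;
            }
            skipped++;
        }
        return false;
    }
}

// Example subclass: keeps only tokens of at least minLength characters.
public sealed class MinLengthSketch : FilteringSketch
{
    private readonly int minLength;

    public MinLengthSketch(IEnumerable<string> tokens, int minLength) : base(tokens)
    {
        this.minLength = minLength;
    }

    protected override bool Accept()
    {
        return Term.Length >= minLength;
    }
}
```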
OpenStringBuilder
A StringBuilder that allows one to access the array.
RollingCharBuffer
Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet. This is like a PushbackReader, except you don't have to specify up-front the max size of the buffer, but you do have to periodically call FreeBefore(Int32).
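The circular buffer idea can be sketched as follows. RollingBufferSketch is a hypothetical class written for illustration, not the RollingCharBuffer implementation; its initial capacity of 4 and its method names are assumptions. Characters are addressed by absolute position, stored at position modulo capacity, and the array only grows when more characters are retained than it can hold.

```csharp
using System;
using System.IO;

// Hypothetical sketch; not the Lucene.NET RollingCharBuffer.
public sealed class RollingBufferSketch
{
    private readonly TextReader reader;
    private char[] buffer = new char[4];
    private int start;      // absolute position of the oldest retained char
    private int end;        // absolute position one past the last char read
    private bool exhausted; // true once the reader has reported end of input

    public RollingBufferSketch(TextReader reader)
    {
        this.reader = reader;
    }

    public int Capacity { get { return buffer.Length; } }

    // Returns the char at absolute position pos, or -1 past the end of input.
    public int Get(int pos)
    {
        if (pos < start)
            throw new ArgumentOutOfRangeException(nameof(pos), "position was already freed");
        while (pos >= end)
        {
            if (exhausted) return -1;
            int ch = reader.Read();
            if (ch < 0)
            {
                exhausted = true;
                return -1;
            }
            if (end - start == buffer.Length) Grow();
            buffer[end % buffer.Length] = (char)ch;
            end++;
        }
        return buffer[pos % buffer.Length];
    }

    // Declares that positions before pos are no longer needed, so their slots can be reused.
    public void FreeBefore(int pos)
    {
        if (pos < start || pos > end)
            throw new ArgumentOutOfRangeException(nameof(pos));
        start = pos;
    }

    // Doubles the array and re-places retained chars at their new modulo slots.
    private void Grow()
    {
        var bigger = new char[buffer.Length * 2];
        for (int p = start; p < end; p++)
        {
            bigger[p % bigger.Length] = buffer[p % buffer.Length];
        }
        buffer = bigger;
    }
}
```

If the caller frees positions as it goes, the array stays at its initial size however long the input is; if it never frees, the array grows to hold everything read so far.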
StemmerUtil
Some commonly-used stemming functions
StopwordAnalyzerBase
Base class for Analyzers that need to make use of stopword sets.
TokenFilterFactory
Abstract parent class for analysis factories that create TokenFilter instances.
TokenizerFactory
Abstract parent class for analysis factories that create Tokenizer instances.
WordlistLoader
Loader for text files that represent a list of stopwords.
IOUtils to obtain
Interfaces
IMultiTermAwareComponent
Add to any analysis factory component to allow returning an analysis component factory for use with partial terms in prefix queries, wildcard queries, range query endpoints, regex queries, etc.
IResourceLoader
Abstraction for loading resources (streams, files, and classes).
IResourceLoaderAware
Interface for a component that needs to be initialized by an implementation of IResourceLoader.