
    Namespace Lucene.Net.Analysis

    Classes

    BaseTokenStreamTestCase

    Base class for all Lucene unit tests that use Lucene.Net.Analysis.TokenStreams.

    When writing unit tests for analysis components, it's highly recommended to use the helper methods here (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
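
    For illustration, a minimal NUnit-style sketch of such a test; it assumes the AssertAnalyzesTo helper and the MockAnalyzer(Random) constructor from this test framework, with Random inherited from LuceneTestCase (check exact signatures against the API below):

        using Lucene.Net.Analysis;
        using NUnit.Framework;

        public class MyAnalysisTest : BaseTokenStreamTestCase
        {
            [Test]
            public void TestWhitespaceTokens()
            {
                // MockAnalyzer splits on whitespace and lowercases by default, and
                // performs extra consumer-correctness checks while the helper
                // consumes the stream.
                Analyzer a = new MockAnalyzer(Random);

                // Asserts the produced terms; Reset/IncrementToken/End/Dispose
                // ordering is verified by the helper and the mock components.
                AssertAnalyzesTo(a, "Foo BAR baz", new[] { "foo", "bar", "baz" });
            }
        }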

    BinaryTermAttribute

    Implementation for IBinaryTermAttribute.

    BinaryToken

    Represents a binary token.

    CannedBinaryTokenStream

    Lucene.Net.Analysis.TokenStream from a canned list of binary (Lucene.Net.Util.BytesRef-based) tokens.

    CannedTokenStream

    Lucene.Net.Analysis.TokenStream from a canned list of Lucene.Net.Analysis.Tokens.
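
    A short construction sketch (assuming the Token(text, startOffset, endOffset) constructor and its PositionIncrement / PositionLength properties, plus a params-style CannedTokenStream constructor):

        using Lucene.Net.Analysis;

        // Pre-built tokens with explicit offsets; "wi fi" overlaps "wi" and spans
        // two positions (PositionLength = 2), simulating a small token graph.
        var t1 = new Token("wi", 0, 2);
        var t2 = new Token("wi fi", 0, 5) { PositionIncrement = 0, PositionLength = 2 };
        var t3 = new Token("fi", 3, 5);

        TokenStream ts = new CannedTokenStream(t1, t2, t3);
        // Feed ts to the TokenFilter under test, or verify it with
        // BaseTokenStreamTestCase.AssertTokenStreamContents(...).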

    CheckClearAttributesAttribute

    Attribute that records whether or not it was cleared. This is used for testing that Lucene.Net.Util.AttributeSource.ClearAttributes() was called correctly.

    CollationTestBase

    Base test class for testing Unicode collation.

    LookaheadTokenFilter

    LUCENENET specific abstraction so we can reference LookaheadTokenFilter.Position without specifying a generic closing type.

    LookaheadTokenFilter.Position

    Holds all state for a single position; subclass this to record other state at each position.

    LookaheadTokenFilter<T>

    An abstract Lucene.Net.Analysis.TokenFilter to make it easier to build graph token filters requiring some lookahead. This class handles the details of buffering up tokens, recording them by position, restoring them, providing access to them, etc.

    MockAnalyzer

    Analyzer for testing.

    This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead (a construction sketch follows the list below). MockAnalyzer has the following behavior:

    • By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with EnableChecks.
    • Payload data is randomly injected into the stream for more thorough testing of payloads.
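
    A construction sketch, assuming the (Random) and (Random, CharacterRunAutomaton, bool lowerCase) constructor overloads and the MockTokenizer.WHITESPACE / MockTokenizer.KEYWORD automaton constants (written as if inside a test class deriving from BaseTokenStreamTestCase, which supplies Random):

        using Lucene.Net.Analysis;

        // Default behavior: whitespace tokenization, lowercasing, random payload injection.
        Analyzer whitespace = new MockAnalyzer(Random);

        // Keyword-style: the entire input becomes a single token; lowerCase = false.
        Analyzer keyword = new MockAnalyzer(Random, MockTokenizer.KEYWORD, false);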

    MockBytesAnalyzer

    Lucene.Net.Analysis.Analyzer for testing that encodes terms as UTF-16 bytes.

    MockBytesAttributeFactory

    Lucene.Net.Util.AttributeSource.AttributeFactory that implements Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute with MockUTF16TermAttributeImpl.

    MockCharFilter

    The purpose of this charfilter is to send offsets out of bounds if the analyzer doesn't use Lucene.Net.Analysis.CharFilter.CorrectOffset(System.Int32) or does incorrect offset math.

    MockFixedLengthPayloadFilter

    Lucene.Net.Analysis.TokenFilter that adds random fixed-length payloads.

    MockGraphTokenFilter

    Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. The chain must have an Lucene.Net.Analysis.TokenAttributes.IOffsetAttribute.

    MockHoleInjectingTokenFilter

    Randomly injects holes into the token stream (similar to what a stop filter would do).

    MockPayloadAnalyzer

    Wraps a whitespace tokenizer with a filter that sets the position increment of the first token and of every odd token to 1, and of all others to 0, encoding the position as pos: XXX in the payload.

    MockRandomLookaheadTokenFilter

    Uses LookaheadTokenFilter to randomly peek at future tokens.

    MockReaderWrapper

    Wraps a System.IO.TextReader, and can throw random or fixed exceptions and spoon-feed characters as they are read.

    MockTokenFilter

    A Lucene.Net.Analysis.TokenFilter for testing that removes terms accepted by a DFA.

    • Union a list of singletons to act like a Lucene.Net.Analysis.Core.StopFilter.
    • Use the complement to act like a Lucene.Net.Analysis.Miscellaneous.KeepWordFilter.
    • Use a regex like .{12,} to act like a Lucene.Net.Analysis.Miscellaneous.LengthFilter.
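
    For example, the regex variant in the last bullet above could be wired up roughly as follows (a sketch; it assumes the RegExp and CharacterRunAutomaton types in Lucene.Net.Util.Automaton and a (TokenStream, CharacterRunAutomaton) constructor):

        using System.IO;
        using Lucene.Net.Analysis;
        using Lucene.Net.Util.Automaton;

        // Terms matching ".{12,}" (12 or more characters) are accepted by the
        // automaton and therefore removed by MockTokenFilter, so the chain keeps
        // only shorter terms, much like a LengthFilter would.
        var longTerms = new CharacterRunAutomaton(new RegExp(".{12,}").ToAutomaton());

        TokenStream BuildChain(TextReader reader)
        {
            var tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
            return new MockTokenFilter(tokenizer, longTerms);
        }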

    MockTokenizer

    Tokenizer for testing.

    This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a Lucene.Net.Analysis.TokenFilter, it's a great idea to test it wrapped around this tokenizer instead, for the extra checks (a construction sketch follows the list below). This tokenizer has the following behavior:

    • An internal state-machine is used for checking consumer consistency. These checks can be disabled with EnableChecks.
    • For convenience, optionally lowercases terms that it outputs.
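
    A construction sketch, assuming the (TextReader, CharacterRunAutomaton, bool lowerCase) constructor, the WHITESPACE automaton constant, and Lucene.NET's Analyzer.NewAnonymous factory; MyTokenFilter is a placeholder for the component under test:

        using System.IO;
        using Lucene.Net.Analysis;

        // Wrap the filter under test around a MockTokenizer so the tokenizer's
        // consumer-correctness state machine guards every test that uses it.
        Analyzer a = Analyzer.NewAnonymous((fieldName, reader) =>
        {
            var tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
            return new TokenStreamComponents(tokenizer, new MyTokenFilter(tokenizer));
        });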

    MockUTF16TermAttributeImpl

    Extension of Lucene.Net.Analysis.TokenAttributes.CharTermAttribute that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.

    MockVariableLengthPayloadFilter

    Lucene.Net.Analysis.TokenFilter that adds random variable-length payloads.

    TokenStreamToDot

    Consumes a Lucene.Net.Analysis.TokenStream and outputs the token graph as a Graphviz dot string.
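
    A usage sketch; the (inputText, TokenStream, TextWriter) constructor and the ToDot() method are assumed to mirror the upstream Java version, including that ToDot() resets and fully consumes the stream:

        using System;
        using System.IO;
        using Lucene.Net.Analysis;

        string text = "fast wi fi network";
        Analyzer analyzer = new MockAnalyzer(new Random(42));
        using (TokenStream ts = analyzer.GetTokenStream("field", text))
        {
            var writer = new StringWriter();
            new TokenStreamToDot(text, ts, writer).ToDot();
            Console.WriteLine(writer.ToString()); // paste the output into Graphviz to view the graph
        }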

    ValidatingTokenFilter

    A Lucene.Net.Analysis.TokenFilter that checks the consistency of the tokens (e.g., that offsets are consistent with one another).

    VocabularyAssert

    Utility class for doing vocabulary-based stemming tests.

    Interfaces

    IBinaryTermAttribute

    An attribute extending Lucene.Net.Analysis.TokenAttributes.ITermToBytesRefAttribute but exposing a BytesRef property.

    ICheckClearAttributesAttribute

    Attribute that records whether or not it was cleared. This is used for testing that Lucene.Net.Util.AttributeSource.ClearAttributes() was called correctly.
