Namespace Lucene.Net.Analysis
Support for testing analysis components.
The main classes of interest are:
- BaseTokenStreamTestCase: Using its helper methods is highly recommended (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
- MockTokenizer: Tokenizer for testing that serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it wrapping this tokenizer instead for extra checks.
- MockAnalyzer: Analyzer for testing that uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead.
Classes
BaseTokenStreamTestCase
Base class for all Lucene unit tests that use TokenStreams.
When writing unit tests for analysis components, it's highly recommended to use the helper methods here (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
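As a rough illustration of the helper methods mentioned above, the following sketch asserts the terms produced by a MockAnalyzer. It assumes NUnit as the test runner and the Lucene.NET 4.8 test framework API; the fixture and method names are hypothetical, and signatures should be checked against the installed version.

```csharp
using Lucene.Net.Analysis;
using NUnit.Framework;

// Hypothetical test fixture; the class and method names are illustrative only.
public class TestMockAnalyzerSketch : BaseTokenStreamTestCase
{
    [Test]
    public void TestWhitespaceLowercase()
    {
        // Random is the seeded random source inherited from LuceneTestCase.
        // WHITESPACE splits on whitespace; "true" asks for lowercased terms.
        Analyzer analyzer = new MockAnalyzer(Random, MockTokenizer.WHITESPACE, true);

        // Compares the emitted terms with the expected ones and runs the
        // base class's additional consistency checks on the stream.
        AssertAnalyzesTo(analyzer, "Hello World", new string[] { "hello", "world" });
    }
}
```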
BinaryTermAttribute
Implementation for IBinaryTermAttribute.
BinaryToken
Represents a binary token.
CannedBinaryTokenStream
TokenStream from a canned list of binary (BytesRef-based) tokens.
CannedTokenStream
TokenStream from a canned list of Tokens.
CheckClearAttributesAttribute
Attribute that records whether it was cleared. This is used for testing that ClearAttributes() was called correctly.
CollationTestBase
Base test class for testing Unicode collation.
LookaheadTokenFilter
LUCENENET specific abstraction so we can reference LookaheadTokenFilter.Position without specifying a generic closing type.
LookaheadTokenFilter.Position
Holds all state for a single position; subclass this to record other state at each position.
LookaheadTokenFilter<T>
An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. This class handles the details of buffering up tokens, recording them by position, restoring them, providing access to them, etc.
MockAnalyzer
Analyzer for testing.
This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead. MockAnalyzer has the following behavior:
- By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with EnableChecks.
- Payload data is randomly injected into the stream for more thorough testing of payloads.
MockBytesAnalyzer
Analyzer for testing that encodes terms as UTF-16 bytes.
MockBytesAttributeFactory
AttributeSource.AttributeFactory that implements ICharTermAttribute with MockUTF16TermAttributeImpl.
MockCharFilter
The purpose of this charfilter is to send offsets out of bounds if the analyzer doesn't use CorrectOffset(Int32) or does incorrect offset math.
MockFixedLengthPayloadFilter
TokenFilter that adds random fixed-length payloads.
MockGraphTokenFilter
Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. The chain must have an IOffsetAttribute.
MockHoleInjectingTokenFilter
Randomly injects holes (similar to what a StopFilter would do).
MockPayloadAnalyzer
Wraps a whitespace tokenizer with a filter that sets the position increment of the first token and of odd tokens to 1, and of all other tokens to 0, encoding the position as pos: XXX in the payload.
MockRandomLookaheadTokenFilter
Uses LookaheadTokenFilter to randomly peek at future tokens.
MockReaderWrapper
Wraps a System.IO.TextReader, and can throw random or fixed exceptions and spoon-feed the characters it reads.
MockTokenFilter
A TokenFilter for testing that removes terms accepted by a DFA.
- Union a list of singletons to act like a StopFilter.
- Use the complement to act like a KeepWordFilter.
- Use a regex like .{12,} to act like a LengthFilter.
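The regex case in the list above might look like the following sketch, which drops every term of 12 or more characters. It assumes NUnit and the Lucene.NET 4.8 API (RegExp and CharacterRunAutomaton from Lucene.Net.Util.Automaton); the fixture name and input text are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Util.Automaton;
using NUnit.Framework;

// Hypothetical test fixture; the class and method names are illustrative only.
public class TestMockTokenFilterSketch : BaseTokenStreamTestCase
{
    [Test]
    public void TestDropsLongTerms()
    {
        // Automaton that accepts any term of 12 or more characters.
        var longTerms = new CharacterRunAutomaton(new RegExp(".{12,}").ToAutomaton());

        // MockTokenFilter removes every term the automaton accepts.
        TokenStream stream = new MockTokenFilter(
            new MockTokenizer(new StringReader("short incomprehensible word"),
                              MockTokenizer.WHITESPACE, false),
            longTerms);

        // "incomprehensible" has 16 characters, so only the short terms remain.
        AssertTokenStreamContents(stream, new string[] { "short", "word" });
    }
}
```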
MockTokenizer
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it wrapping this tokenizer instead for extra checks. This tokenizer has the following behavior:
- An internal state-machine is used for checking consumer consistency. These checks can be disabled with EnableChecks.
- For convenience, optionally lowercases terms that it outputs.
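To illustrate testing a TokenFilter by wrapping this tokenizer, here is a minimal sketch. ReverseTermFilter is a hypothetical filter invented for this example and is not part of Lucene.NET; the sketch assumes NUnit and the Lucene.NET 4.8 API (including the protected m_input field on TokenFilter).

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;
using NUnit.Framework;

// Hypothetical filter under test: reverses the characters of each term.
// Surrogate pairs are not handled, to keep the sketch short.
public sealed class ReverseTermFilter : TokenFilter
{
    private readonly ICharTermAttribute termAtt;

    public ReverseTermFilter(TokenStream input) : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken()) return false;
        // Reverse the term text in place within the attribute's buffer.
        Array.Reverse(termAtt.Buffer, 0, termAtt.Length);
        return true;
    }
}

// Hypothetical test fixture; the class and method names are illustrative only.
public class TestReverseTermFilterSketch : BaseTokenStreamTestCase
{
    [Test]
    public void TestReversesTerms()
    {
        // lowerCase is false here, so terms pass through the tokenizer unchanged.
        TokenStream stream = new ReverseTermFilter(
            new MockTokenizer(new StringReader("abc de"), MockTokenizer.WHITESPACE, false));

        AssertTokenStreamContents(stream, new string[] { "cba", "ed" });
    }
}
```

Because the tokenizer's consumer checks are on by default, a filter that mishandles the stream workflow should fail the test even if its terms are correct.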
MockUTF16TermAttributeImpl
Extension of CharTermAttribute that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.
MockVariableLengthPayloadFilter
TokenFilter that adds random variable-length payloads.
TokenStreamToDot
Consumes a TokenStream and outputs the dot (graphviz) string (graph).
ValidatingTokenFilter
A TokenFilter that checks consistency of the tokens (e.g., offsets are consistent with one another).
VocabularyAssert
Utility class for doing vocabulary-based stemming tests.
Interfaces
IBinaryTermAttribute
An attribute extending ITermToBytesRefAttribute but exposing a BytesRef property.
ICheckClearAttributesAttribute
Attribute that records whether it was cleared. This is used for testing that ClearAttributes() was called correctly.