Namespace Lucene.Net.Analysis
Support for testing analysis components.
The main classes of interest are:
- BaseTokenStreamTestCase: Using its helper methods is highly recommended (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
- MockTokenizer: Tokenizer for testing. It serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead, for the extra checks.
- MockAnalyzer: Analyzer for testing. It uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead.
Base class for all Lucene unit tests that use TokenStreams.
When writing unit tests for analysis components, it's highly recommended to use the helper methods here (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
Implementation for IBinaryTermAttribute.
Represents a binary token.
TokenStream from a canned list of binary (BytesRef-based) tokens.
TokenStream from a canned list of Tokens.
Attribute that records whether it was cleared. This is used for testing that ClearAttributes() was called correctly.
Base test class for testing Unicode collation.
LUCENENET specific abstraction so we can reference LookaheadTokenFilter.Position without specifying a generic closing type.
Holds all state for a single position; subclass this to record other state at each position.
An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. This class handles the details of buffering up tokens, recording them by position, restoring them, providing access to them, etc.
Analyzer for testing.
This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead. MockAnalyzer has the following behavior:
- By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with EnableChecks.
- Payload data is randomly injected into the stream for more thorough testing of payloads.
Analyzer for testing that encodes terms as UTF-16 bytes.
AttributeSource.AttributeFactory that implements ICharTermAttribute with MockUTF16TermAttributeImpl.
The purpose of this charfilter is to send offsets out of bounds if the analyzer doesn't use CorrectOffset(Int32) or does incorrect offset math.
TokenFilter that adds random fixed-length payloads.
Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. The chain must have an IOffsetAttribute.
Randomly injects holes (similar to what a stop filter would do).
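To make the notion of a "hole" concrete, here is a minimal, language-neutral sketch in Python. It is not the Lucene.NET implementation: the function name, the probability, and the hole size are all hypothetical, and it assumes a hole is modeled by enlarging a token's position increment, which is the same trace a stop filter leaves when it removes the token in front.

```python
import random


def inject_holes(tokens, rng, hole_probability=0.3, max_hole=3):
    """Return tokens with some position increments randomly enlarged.

    tokens is a list of (term, position_increment) pairs. Enlarging an
    increment leaves a gap (a "hole") in front of that token.
    All parameters are illustrative assumptions.
    """
    out = []
    for term, pos_inc in tokens:
        if rng.random() < hole_probability:
            pos_inc += rng.randint(1, max_hole)  # skip 1..max_hole positions
        out.append((term, pos_inc))
    return out


original = [("the", 1), ("quick", 1), ("brown", 1), ("fox", 1)]
with_holes = inject_holes(original, random.Random(42))
```

A consumer that ignores position increments will behave identically on both streams, which is exactly the kind of bug such a filter is meant to expose.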
Wraps a whitespace tokenizer with a filter that sets the position increment of the first token and of every odd token to 1, and of all other tokens to 0, encoding the position as pos: XXX in the payload.
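The increment rule above can be sketched as follows. This is a hedged Python illustration, not the Lucene.NET code: the function name is hypothetical, tokens are counted from zero, and it assumes the payload records the running position before the current token's increment is added.

```python
def position_payload_tokens(text):
    """Split on whitespace and return (term, position_increment, payload) triples.

    Assumption: the payload holds the running position as it stood
    before this token's increment was applied.
    """
    out = []
    pos = 0
    for i, term in enumerate(text.split()):
        payload = f"pos: {pos}"
        # First token and odd-indexed tokens advance; all others stack.
        pos_inc = 1 if (i == 0 or i % 2 == 1) else 0
        out.append((term, pos_inc, payload))
        pos += pos_inc
    return out


tokens = position_payload_tokens("a b c d e")
```

Tokens with an increment of 0 share a position with the token before them, so the stream exercises both stacked tokens and payload handling at once.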
Uses LookaheadTokenFilter to randomly peek at future tokens.
Wraps a System.IO.TextReader and can throw random or fixed exceptions, and spoon-feed read chars.
A TokenFilter for testing that removes terms accepted by a DFA.
- Union a list of singletons to act like a StopFilter.
- Use the complement to act like a KeepWordFilter.
- Use a regex that matches terms by length to act like a LengthFilter.
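The three recipes above can be sketched in Python. This is a conceptual illustration only: plain predicates stand in for the DFA, every name is hypothetical, and the regex `.{5,}` is an example chosen here, not one taken from the library.

```python
import re


def remove_accepted(tokens, accepts):
    """Drop every token that the acceptor (the stand-in for a DFA) accepts."""
    return [t for t in tokens if not accepts(t)]


def union_of(words):
    """Acceptor for a union of singleton strings."""
    word_set = frozenset(words)
    return lambda term: term in word_set


def complement(accepts):
    """Acceptor for everything the given acceptor rejects."""
    return lambda term: not accepts(term)


def regex(pattern):
    """Acceptor for terms that match the pattern in full."""
    compiled = re.compile(pattern)
    return lambda term: compiled.fullmatch(term) is not None


tokens = ["the", "quick", "brown", "fox", "jumps"]

# Union of singletons: behaves like a stop filter.
stopped = remove_accepted(tokens, union_of(["the", "a"]))

# Complement of a word set: behaves like a keep-word filter.
kept = remove_accepted(tokens, complement(union_of(["quick", "fox"])))

# Length-based regex: removes terms of five or more characters.
short_only = remove_accepted(tokens, regex(r".{5,}"))
```

The sketch only models which terms survive; it does not model how position increments are adjusted when a term is removed.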
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead, for the extra checks. This tokenizer has the following behavior:
- An internal state-machine is used for checking consumer consistency. These checks can be disabled with EnableChecks.
- For convenience, optionally lowercases terms that it outputs.
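The consumer-consistency check can be pictured as a small state machine. The Python sketch below is an assumption-laden illustration, not the MockTokenizer source: the class, state names, and methods are hypothetical, and it assumes the usual consumer workflow of reset, repeated increments until exhaustion, end, then close.

```python
class CheckedTokenSource:
    """Token source that asserts its consumer calls methods in order."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._index = 0
        self._state = "CLOSED"  # a consumer must reset before reading

    def _require(self, method, *allowed):
        if self._state not in allowed:
            raise AssertionError(
                f"{method}() called in wrong state: {self._state}")

    def reset(self):
        self._require("reset", "CLOSED")
        self._index = 0
        self._state = "RESET"

    def increment_token(self):
        self._require("increment_token", "RESET", "INCREMENT")
        if self._index < len(self._tokens):
            self._index += 1
            self._state = "INCREMENT"
            return True
        self._state = "EXHAUSTED"
        return False

    def term(self):
        return self._tokens[self._index - 1]

    def end(self):
        self._require("end", "EXHAUSTED")
        self._state = "END"

    def close(self):
        self._require("close", "END", "CLOSED")
        self._state = "CLOSED"


def consume(source):
    """A well-behaved consumer following the assumed workflow."""
    terms = []
    source.reset()
    while source.increment_token():
        terms.append(source.term())
    source.end()
    source.close()
    return terms


terms = consume(CheckedTokenSource(["hello", "world"]))
```

A consumer that skips `reset()` or calls `end()` early fails immediately with an assertion, rather than silently producing wrong tokens.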
Extension of CharTermAttribute that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.
TokenFilter that adds random variable-length payloads.
Consumes a TokenStream and outputs the dot (graphviz) string (graph).
A TokenFilter that checks consistency of the tokens (e.g., offsets are consistent with one another).
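As an illustration of what "consistent" might mean, the Python sketch below validates a token list. The specific rules are assumptions chosen for the example (the summary above only says that offsets must be consistent with one another), and the function name is hypothetical.

```python
def validate_tokens(tokens):
    """Check (term, pos_inc, start_offset, end_offset) tuples for consistency.

    Assumed rules, for illustration only:
      * the first token must advance the position by at least 1
      * position increments are never negative
      * 0 <= start_offset <= end_offset
      * start offsets never go backwards
      * tokens at the same position share the same start offset
    Returns the number of tokens checked; raises ValueError on a violation.
    """
    position = -1
    last_start = 0
    start_at_position = {}
    count = 0
    for term, pos_inc, start, end in tokens:
        if pos_inc < 0 or (count == 0 and pos_inc < 1):
            raise ValueError(f"bad position increment {pos_inc} for {term!r}")
        if start < 0 or end < start:
            raise ValueError(f"bad offsets {start}..{end} for {term!r}")
        if start < last_start:
            raise ValueError(f"offsets go backwards at {term!r}")
        position += pos_inc
        expected = start_at_position.setdefault(position, start)
        if expected != start:
            raise ValueError(f"start offset mismatch at position {position}")
        last_start = start
        count += 1
    return count


checked = validate_tokens([
    ("wifi", 1, 0, 5),
    ("wi", 0, 0, 5),      # stacked synonym shares position and offsets
    ("network", 1, 6, 13),
])
```

Placing such a check at the end of a filter chain means any component that produces inconsistent offsets is caught at the point where it happens.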
Utility class for doing vocabulary-based stemming tests.
An attribute extending ITermToBytesRefAttribute but exposing a BytesRef property.
Attribute that records whether it was cleared. This is used for testing that ClearAttributes() was called correctly.