
    Namespace Lucene.Net.Analysis

    Classes

    BaseTokenStreamTestCase

    Base class for all Lucene unit tests that use Lucene.Net.Analysis.TokenStreams.

    When writing unit tests for analysis components, it's highly recommended to use the helper methods here (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
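
    For illustration, a minimal NUnit-style sketch of such a test; it assumes the AssertAnalyzesTo helper and the MockAnalyzer(Random) constructor from this test framework, with Random inherited from LuceneTestCase (check exact signatures against the API below):

        using Lucene.Net.Analysis;
        using NUnit.Framework;

        public class MyAnalysisTest : BaseTokenStreamTestCase
        {
            [Test]
            public void TestWhitespaceTokens()
            {
                // MockAnalyzer splits on whitespace and lowercases by default, and
                // performs extra consumer-correctness checks while the helper
                // consumes the stream.
                Analyzer a = new MockAnalyzer(Random);

                // Asserts the produced terms; Reset/IncrementToken/End/Dispose
                // ordering is verified by the helper and the mock components.
                AssertAnalyzesTo(a, "Foo BAR baz", new[] { "foo", "bar", "baz" });
            }
        }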

    BinaryTermAttribute

    Implementation for IBinaryTermAttribute.

    BinaryToken

    Represents a binary token.

    CannedBinaryTokenStream

    Lucene.Net.Analysis.TokenStream from a canned list of binary (Lucene.Net.Util.BytesRef-based) tokens.

    CannedTokenStream

    Lucene.Net.Analysis.TokenStream from a canned list of Lucene.Net.Analysis.Tokens.
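
    A short construction sketch (assuming the Token(text, startOffset, endOffset) constructor and its PositionIncrement / PositionLength properties, plus a params-style CannedTokenStream constructor):

        using Lucene.Net.Analysis;

        // Pre-built tokens with explicit offsets; "wi fi" overlaps "wi" and spans
        // two positions (PositionLength = 2), simulating a small token graph.
        var t1 = new Token("wi", 0, 2);
        var t2 = new Token("wi fi", 0, 5) { PositionIncrement = 0, PositionLength = 2 };
        var t3 = new Token("fi", 3, 5);

        TokenStream ts = new CannedTokenStream(t1, t2, t3);
        // Feed ts to the TokenFilter under test, or verify it with
        // BaseTokenStreamTestCase.AssertTokenStreamContents(...).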

    CheckClearAttributesAttribute

    Attribute that records whether or not it was cleared. This is used for testing that Lucene.Net.Util.AttributeSource.ClearAttributes() was called correctly.

    CollationTestBase

    Base test class for testing Unicode collation.

    LookaheadTokenFilter

    LUCENENET specific abstraction so we can reference LookaheadTokenFilter.Position without specifying a generic closing type.

    LookaheadTokenFilter.Position

    Holds all state for a single position; subclass this to record other state at each position.

    LookaheadTokenFilter<T>

    An abstract Lucene.Net.Analysis.TokenFilter to make it easier to build graph token filters requiring some lookahead. This class handles the details of buffering up tokens, recording them by position, restoring them, providing access to them, etc.

    MockAnalyzer

    Analyzer for testing.

    This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead (a construction sketch follows the list below). MockAnalyzer has the following behavior:

    • By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with EnableChecks.
    • Payload data is randomly injected into the stream for more thorough testing of payloads.
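
    A construction sketch, assuming the (Random) and (Random, CharacterRunAutomaton, bool lowerCase) constructor overloads and the MockTokenizer.WHITESPACE / MockTokenizer.KEYWORD automaton constants (written as if inside a test class deriving from BaseTokenStreamTestCase, which supplies Random):

        using Lucene.Net.Analysis;

        // Default behavior: whitespace tokenization, lowercasing, random payload injection.
        Analyzer whitespace = new MockAnalyzer(Random);

        // Keyword-style: the entire input becomes a single token; lowerCase = false.
        Analyzer keyword = new MockAnalyzer(Random, MockTokenizer.KEYWORD, false);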

    MockBytesAnalyzer

    Lucene.Net.Analysis.Analyzer for testing that encodes terms as UTF-16 bytes.

    MockBytesAttributeFactory

    Lucene.Net.Util.AttributeSource.AttributeFactory that implements Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute with MockUTF16TermAttributeImpl.

    MockCharFilter

    The purpose of this charfilter is to send offsets out of bounds if the analyzer doesn't use Lucene.Net.Analysis.CharFilter.CorrectOffset(System.Int32) or does incorrect offset math.

    MockFixedLengthPayloadFilter

    Lucene.Net.Analysis.TokenFilter that adds random fixed-length payloads.

    MockGraphTokenFilter

    Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. The chain must have an Lucene.Net.Analysis.TokenAttributes.IOffsetAttribute.

    MockHoleInjectingTokenFilter

    Randomly injects holes into the token stream (similar to what a stop filter would do).

    MockPayloadAnalyzer

    Wraps a whitespace tokenizer with a filter that sets the position increment of the first token and of every odd token to 1, and of all others to 0, encoding the position as pos: XXX in the payload.

    MockRandomLookaheadTokenFilter

    Uses LookaheadTokenFilter to randomly peek at future tokens.

    MockReaderWrapper

    Wraps a System.IO.TextReader, and can throw random or fixed exceptions and spoon-feed characters as they are read.

    MockTokenFilter

    A Lucene.Net.Analysis.TokenFilter for testing that removes terms accepted by a DFA.

    • Union a list of singletons to act like a Lucene.Net.Analysis.Core.StopFilter.
    • Use the complement to act like a Lucene.Net.Analysis.Miscellaneous.KeepWordFilter.
    • Use a regex like .{12,} to act like a Lucene.Net.Analysis.Miscellaneous.LengthFilter.
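
    For example, the regex variant in the last bullet above could be wired up roughly as follows (a sketch; it assumes the RegExp and CharacterRunAutomaton types in Lucene.Net.Util.Automaton and a (TokenStream, CharacterRunAutomaton) constructor):

        using System.IO;
        using Lucene.Net.Analysis;
        using Lucene.Net.Util.Automaton;

        // Terms matching ".{12,}" (12 or more characters) are accepted by the
        // automaton and therefore removed by MockTokenFilter, so the chain keeps
        // only shorter terms, much like a LengthFilter would.
        var longTerms = new CharacterRunAutomaton(new RegExp(".{12,}").ToAutomaton());

        TokenStream BuildChain(TextReader reader)
        {
            var tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
            return new MockTokenFilter(tokenizer, longTerms);
        }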

    MockTokenizer

    Tokenizer for testing.

    This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a Lucene.Net.Analysis.TokenFilter, it's a great idea to test it wrapped around this tokenizer instead, for the extra checks (a construction sketch follows the list below). This tokenizer has the following behavior:

    • An internal state-machine is used for checking consumer consistency. These checks can be disabled with EnableChecks.
    • For convenience, optionally lowercases terms that it outputs.
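
    A construction sketch, assuming the (TextReader, CharacterRunAutomaton, bool lowerCase) constructor, the WHITESPACE automaton constant, and Lucene.NET's Analyzer.NewAnonymous factory; MyTokenFilter is a placeholder for the component under test:

        using System.IO;
        using Lucene.Net.Analysis;

        // Wrap the filter under test around a MockTokenizer so the tokenizer's
        // consumer-correctness state machine guards every test that uses it.
        Analyzer a = Analyzer.NewAnonymous((fieldName, reader) =>
        {
            var tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
            return new TokenStreamComponents(tokenizer, new MyTokenFilter(tokenizer));
        });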

    MockUTF16TermAttributeImpl

    Extension of Lucene.Net.Analysis.TokenAttributes.CharTermAttribute that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.

    MockVariableLengthPayloadFilter

    Lucene.Net.Analysis.TokenFilter that adds random variable-length payloads.

    TokenStreamToDot

    Consumes a Lucene.Net.Analysis.TokenStream and outputs the token graph as a Graphviz dot string.
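
    A usage sketch; the (inputText, TokenStream, TextWriter) constructor and the ToDot() method are assumed to mirror the upstream Java version, including that ToDot() resets and fully consumes the stream:

        using System;
        using System.IO;
        using Lucene.Net.Analysis;

        string text = "fast wi fi network";
        Analyzer analyzer = new MockAnalyzer(new Random(42));
        using (TokenStream ts = analyzer.GetTokenStream("field", text))
        {
            var writer = new StringWriter();
            new TokenStreamToDot(text, ts, writer).ToDot();
            Console.WriteLine(writer.ToString()); // paste the output into Graphviz to view the graph
        }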

    ValidatingTokenFilter

    A Lucene.Net.Analysis.TokenFilter that checks the consistency of the tokens (e.g., that offsets are consistent with one another).

    VocabularyAssert

    Utility class for doing vocabulary-based stemming tests.

    Interfaces

    IBinaryTermAttribute

    An attribute extending Lucene.Net.Analysis.TokenAttributes.ITermToBytesRefAttribute but exposing a BytesRef property.

    ICheckClearAttributesAttribute

    Attribute that records whether or not it was cleared. This is used for testing that Lucene.Net.Util.AttributeSource.ClearAttributes() was called correctly.
