Fork me on GitHub
  • API

    Show / Hide Table of Contents

    Namespace Lucene.Net.Util.Automaton

    Finite-state automaton for regular expressions.

    This package contains a full DFA/NFA implementation with Unicode alphabet and support for all standard (and a number of non-standard) regular expression operations.

    The most commonly used functionality is located in the classes Automaton and RegExp.

    For more information, go to the package home page at http://www.brics.dk/automaton/.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    Classes

    Automaton

    Finite-state automaton with regular expression operations.

    Class invariants:

    • An automaton is either represented explicitly (with State and Transition objects) or with a singleton string (see Singleton and ExpandSingleton()) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
    • Automata are always reduced (see Reduce()) and have no transitions to dead states (see RemoveDeadTransitions()).
    • If an automaton is nondeterministic, then IsDeterministic returns false (but the converse is not required).
    • Automata provided as input to operations are generally assumed to be disjoint.

    If the states or transitions are manipulated manually, the RestoreInvariant() method and IsDeterministic setter should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.

    Note: this class has internal mutable state and is not thread safe. It is the caller's responsibility to ensure any necessary synchronization if you wish to use the same Automaton from multiple threads. In general it is instead recommended to use a RunAutomaton for multithreaded matching: it is immutable, thread safe, and much faster.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    BasicAutomata

    Construction of basic automata.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    BasicOperations

    Basic automata operations.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    ByteRunAutomaton

    Automaton representation for matching UTF-8 byte[].

    CharacterRunAutomaton

    Automaton representation for matching char[].

    CompiledAutomaton

    Immutable class holding compiled details for a given Automaton. The Automaton is deterministic, must not have dead states but is not necessarily minimal.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    LevenshteinAutomata

    Class to construct DFAs that match a word within some edit distance.

    Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    MinimizationOperations

    Operations for minimizing automata.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    RegExp

    Regular Expression extension to Automaton.

    Regular expressions are built from the following abstract syntax:

    regexp::=unionexp
    |
    unionexp::=interexp | unionexp(union)
    |interexp
    interexp::=concatexp & interexp(intersection)[OPTIONAL]
    |concatexp
    concatexp::=repeatexp concatexp(concatenation)
    |repeatexp
    repeatexp::=repeatexp ?(zero or one occurrence)
    |repeatexp *(zero or more occurrences)
    |repeatexp +(one or more occurrences)
    |repeatexp {n}(n occurrences)
    |repeatexp {n,}(n or more occurrences)
    |repeatexp {n,m}(n to m occurrences, including both)
    |complexp
    complexp::=~ complexp(complement)[OPTIONAL]
    |charclassexp
    charclassexp::=[ charclasses ](character class)
    |[^ charclasses ](negated character class)
    |simpleexp
    charclasses::=charclass charclasses
    |charclass
    charclass::=charexp - charexp(character range, including end-points)
    |charexp
    simpleexp::=charexp
    |.(any single character)
    |#(the empty language)[OPTIONAL]
    |@(any string)[OPTIONAL]
    |" <Unicode string without double-quotes> "(a string)
    |( )(the empty string)
    |( unionexp )(precedence override)
    |< <identifier> >(named automaton)[OPTIONAL]
    |<n-m>(numerical interval)[OPTIONAL]
    charexp::=<Unicode character>(a single non-reserved character)
    |</strong> <Unicode character> (a single character)

    The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (</code>) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's).

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    RunAutomaton

    Finite-state automaton with fast run operation.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    SpecialOperations

    Special automata operations.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    State

    Automaton state.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    StatePair

    Pair of states.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    Transition

    Automaton transition.

    A transition, which belongs to a source state, consists of a Unicode codepoint interval and a destination state.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    UTF32ToUTF8

    Converts UTF-32 automata to the equivalent UTF-8 representation.

    This is a Lucene.NET INTERNAL API, use at your own risk

    Interfaces

    IAutomatonProvider

    Automaton provider for RegExp.

    This is a Lucene.NET EXPERIMENTAL API, use at your own risk

    Enums

    CompiledAutomaton.AUTOMATON_TYPE

    Automata are compiled into different internal forms for the most efficient execution depending upon the language they accept.

    RegExpSyntax

    • Improve this Doc
    Back to top Copyright © 2021 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.