Namespace Lucene.Net.Util.Fst

Classes

Builder

LUCENENET specific type used to access nested types of Builder<T> without referring to its generic closing type.

Builder.Arc<S>

Expert: holds a pending (seen but not yet serialized) arc.

Builder.FreezeTail<S>

Expert: this is invoked by Builder whenever a suffix is serialized.

Builder.UnCompiledNode<S>

Expert: holds a pending (seen but not yet serialized) Node.

Builder<T>

Builds a minimal FST (mapping an Int32sRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on the fly into a compact serialized byte array, which can be saved to or loaded from a Directory, or used directly for traversal. The FST is always finite (no cycles).

NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

The parameterized type is the output type. See the subclasses of Outputs<T>.

FSTs larger than 2.1GB are possible as of Lucene 4.2. FSTs containing more than 2.1B nodes are also possible; however, they cannot be packed.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
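
To make the "pre-sorted terms with outputs" contract concrete, here is a minimal Python sketch of a toy builder. It is not the Lucene.NET API or its algorithm: ToyBuilder, Node, add and get are hypothetical names, the structure is a plain prefix tree (no suffix sharing, so it is not minimal), and outputs are assumed to be non-negative integers. It only illustrates how outputs can sit on arcs and be pushed down when a later term shares a prefix.

```python
class Node:
    """One state; arcs map a label to a mutable [output, target] pair."""

    def __init__(self):
        self.arcs = {}
        self.is_final = False
        self.final_output = 0


class ToyBuilder:
    """Toy prefix-tree transducer (hypothetical, not the Lucene.NET Builder)."""

    def __init__(self):
        self.root = Node()
        self.last_term = None

    def add(self, term, output):
        # Mirror the pre-sorted requirement; this toy also rejects duplicates.
        if self.last_term is not None and term <= self.last_term:
            raise ValueError("terms must be added in strictly increasing order")
        self.last_term = term
        node = self.root
        for label in term:
            arc = node.arcs.get(label)
            if arc is None:
                # New arc: it carries whatever output is still unassigned.
                arc = [output, Node()]
                node.arcs[label] = arc
                output = 0
            else:
                # Shared arc: keep only the part both terms agree on and
                # push the excess down to everything below the target.
                common = min(arc[0], output)
                excess = arc[0] - common
                if excess:
                    target = arc[1]
                    for child_arc in target.arcs.values():
                        child_arc[0] += excess
                    if target.is_final:
                        target.final_output += excess
                    arc[0] = common
                output -= common
            node = arc[1]
        node.is_final = True
        node.final_output = output

    def get(self, term):
        """Sum the outputs along the path, or return None if not accepted."""
        node, total = self.root, 0
        for label in term:
            arc = node.arcs.get(label)
            if arc is None:
                return None
            total += arc[0]
            node = arc[1]
        return total + node.final_output if node.is_final else None


builder = ToyBuilder()
for term, output in [("a", 5), ("ab", 3), ("car", 9),
                     ("cat", 5), ("dog", 7), ("dogs", 13)]:
    builder.add(term, output)
```

In this sketch the arc for "c" ends up carrying only the part of the output that "car" and "cat" share; the remainder for "car" moves further down the path.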

ByteSequenceOutputs

An FST Outputs<BytesRef> implementation where each output is a sequence of bytes.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
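
For byte-sequence outputs, the part two outputs share is their longest common prefix. The following Python sketch shows that idea on plain bytes values. The function names common, subtract and add are used here only for illustration and the code is a toy model, not the Lucene.NET implementation.

```python
def common(a: bytes, b: bytes) -> bytes:
    """Longest common prefix of two byte sequences."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return a[:n]


def subtract(output: bytes, prefix: bytes) -> bytes:
    """Remove a leading prefix from an output."""
    if not output.startswith(prefix):
        raise ValueError("prefix is not a prefix of output")
    return output[len(prefix):]


def add(prefix: bytes, output: bytes) -> bytes:
    """Concatenate a prefix and the remaining output."""
    return prefix + output


shared = common(b"foobar", b"food")
rest = subtract(b"foobar", shared)
```

Splitting an output into a shared prefix and a remainder, then concatenating them again, gives back the original sequence.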

BytesRefFSTEnum

LUCENENET specific. This class is to mimic Java's ability to specify nested classes of Generics without having to specify the generic type (i.e. BytesRefFSTEnum.InputOutput<T> rather than BytesRefFSTEnum<T>.InputOutput<T>).

BytesRefFSTEnum.InputOutput<T>

Holds a single input (BytesRef) + output pair.

BytesRefFSTEnum<T>

Enumerates all input (BytesRef) + output pairs in an FST.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
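
Enumerating all input + output pairs amounts to a depth-first walk that visits arcs in label order and accumulates arc outputs along the way. The Python sketch below shows this on a hand-built toy transducer. The dictionary layout and the names state and enumerate_pairs are assumptions made for illustration; they are not the library's data structures or API.

```python
def state(arcs=None, final=False, final_output=0):
    """Build a toy state; arcs maps a byte label to (output, target state)."""
    return {"arcs": arcs or {}, "final": final, "final_output": final_output}


def enumerate_pairs(node, prefix=b"", total=0):
    """Yield (input bytes, summed output) pairs in sorted input order."""
    if node["final"]:
        yield prefix, total + node["final_output"]
    for label in sorted(node["arcs"]):
        output, target = node["arcs"][label]
        yield from enumerate_pairs(target, prefix + bytes([label]), total + output)


# Hand-built toy transducer for "cat" -> 5 and "dog" -> 7.
# Both words end in the same shared final state.
end = state(final=True)
root = state({
    ord("c"): (5, state({ord("a"): (0, state({ord("t"): (0, end)}))})),
    ord("d"): (7, state({ord("o"): (0, state({ord("g"): (0, end)}))})),
})

pairs = list(enumerate_pairs(root))
```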

CharSequenceOutputs

An FST Outputs<T> implementation where each output is a sequence of characters.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

FST

LUCENENET specific: This base class mimics Java's ability to use nested types without specifying a type parameter, e.g. FST.BytesReader instead of FST<BytesRef>.BytesReader.

FST.Arc<T>

Represents a single arc.

FST.BytesReader

Reads bytes stored in an FST.

FST<T>

Represents a finite state transducer (FST), using a compact byte[] format.

The format is similar to what's used by Morfologik (http://sourceforge.net/projects/morfologik).

See the FST package documentation for some simple examples.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

FSTEnum<T>

Steps through the terms in an FST via Next() and Advance().

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

Int32SequenceOutputs

An FST Outputs<T> implementation where each output is a sequence of System.Int32s.

NOTE: This was IntSequenceOutputs in Lucene

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

Int32sRefFSTEnum

LUCENENET specific. This class is to mimic Java's ability to specify nested classes of Generics without having to specify the generic type (i.e. Int32sRefFSTEnum.InputOutput<T> rather than Int32sRefFSTEnum<T>.InputOutput<T>).

NOTE: This was IntsRefFSTEnum<T> in Lucene

Int32sRefFSTEnum.InputOutput<T>

Holds a single input (Int32sRef) + output pair.

Int32sRefFSTEnum<T>

Enumerates all input (Int32sRef) + output pairs in an FST.

NOTE: This was IntsRefFSTEnum<T> in Lucene

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

ListOfOutputs<T>

Wraps another Outputs implementation and encodes one or more of its output values. You can use this when a single input may need to map to more than one output, maintaining order: pass the same input with a different output by calling Add(Int32sRef, T) multiple times. The builder will then combine the outputs using the Merge(T, T) method.

The resulting FST may not be minimal when an input has more than one output, as this requires pushing all multi-output values to a final state.

NOTE: the only way to create multiple outputs is to add the same input to the FST multiple times in a row. This is how the FST maps a single input to multiple outputs (e.g. you cannot pass a List<Object> to Add(Int32sRef, T)). If your outputs are longs, and you need at most 2, then use UpToTwoPositiveInt64Outputs instead since it stores the outputs more compactly (by stealing a bit from each long value).

NOTE: this cannot wrap itself (i.e. you cannot make an FST with List<List<Object>> outputs using this).

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
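
The "same input added several times in a row" rule can be pictured with a small Python sketch. It is a toy model, not the Lucene.NET API: merge and collect are hypothetical helpers, and a plain dictionary stands in for the FST. A single output stays a bare value, and it becomes an ordered list only once the same input is seen again.

```python
def merge(first, second):
    """Combine two outputs, each either a bare value or a list of values."""
    merged = list(first) if isinstance(first, list) else [first]
    if isinstance(second, list):
        merged.extend(second)
    else:
        merged.append(second)
    return merged


def collect(pairs):
    """Map each input to its output(s); repeats must be consecutive."""
    result = {}
    previous = None
    for term, output in pairs:
        if previous is not None and term < previous:
            raise ValueError("inputs must arrive in sorted order")
        if term == previous:
            result[term] = merge(result[term], output)
        else:
            result[term] = output
        previous = term
    return result


outputs = collect([("a", 1), ("a", 3), ("a", 2), ("b", 7)])
```

Because the sketch tells "one value" from "many values" by checking for a list, the wrapped outputs cannot themselves be lists, which is the same restriction as the no-self-wrapping note above.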

NoOutputs

A null FST Outputs<T> implementation; use this if you just want to build an FSA.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

Outputs<T>

Represents the outputs for an FST, providing the basic algebra required for building and traversing the FST.

Note that any operation that returns NO_OUTPUT must return the same singleton object from NoOutput.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk
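
As a rough illustration of this algebra and of the singleton rule, the Python sketch below models positive integer outputs. It assumes the shared part of two outputs is their minimum, uses a sentinel object in place of the real NoOutput value, and is a toy model rather than the library's code. Every path that produces "no output" returns the same NO_OUTPUT object, so callers can compare by identity.

```python
NO_OUTPUT = object()  # sentinel standing in for the NoOutput singleton


def common(a, b):
    """Largest output that both a and b contain."""
    if a is NO_OUTPUT or b is NO_OUTPUT:
        return NO_OUTPUT
    return min(a, b)


def subtract(output, inc):
    """Remove inc from output; return the singleton when nothing is left."""
    if inc is NO_OUTPUT:
        return output
    if output is NO_OUTPUT or inc > output:
        raise ValueError("cannot subtract more than the output holds")
    if output == inc:
        return NO_OUTPUT
    return output - inc


def add(prefix, output):
    """Combine a prefix output with a remaining output."""
    if prefix is NO_OUTPUT:
        return output
    if output is NO_OUTPUT:
        return prefix
    return prefix + output


shared = common(9, 5)
rest = subtract(9, shared)
```

The property a builder relies on is that add(common(x, y), subtract(x, common(x, y))) gives back x.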

PairOutputs<A, B>

An FST Outputs<T> implementation, holding two other outputs.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

PairOutputs<A, B>.Pair

Holds a single pair of two outputs.

PositiveInt32Outputs

An FST Outputs<T> implementation where each output is a non-negative value.

NOTE: This was PositiveIntOutputs in Lucene

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

UpToTwoPositiveInt64Outputs

An FST Outputs<T> implementation where each output is one or two non-negative long values. If it's a single output, System.Nullable<T> is returned; else, TwoInt64s. Order is preserved in the TwoInt64s case, i.e. .first is the first input/output added to Builder<T>, and .second is the second. You cannot store a 0 output with this (that's reserved to mean "no output")!

NOTE: the only way to create a TwoInt64s output is to add the same input to the FST twice in a row. This is how the FST maps a single input to two outputs (e.g. you cannot pass an UpToTwoPositiveInt64Outputs.TwoInt64s to Add(Int32sRef, T)). If you need more than two, use ListOfOutputs<T>; but if you have at most 2, this implementation will require fewer bytes as it steals one bit from each long value.

NOTE: the resulting FST is not guaranteed to be minimal! See Builder<T>.

NOTE: This was UpToTwoPositiveIntOutputs in Lucene - the data type (int) was wrong there - it should have been long
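
The bit-stealing idea can be sketched in Python as follows. This is one possible layout chosen for illustration, not necessarily the library's serialized format: the lowest bit of the first stored word flags whether a second value follows. The names encode, decode and MAX_VALUE are hypothetical, and the size limit models a signed 64-bit word that has given up one bit.

```python
MAX_VALUE = (1 << 62) - 1  # assumed limit: the shifted value must fit a signed 64-bit word


def encode(first, second=None):
    """Pack one or two positive values into a list of integer words."""
    for value in (first, second):
        if value is not None and not (0 < value <= MAX_VALUE):
            raise ValueError("values must be positive and fit in 62 bits")
    if second is None:
        return [first << 1]          # low bit 0: single value
    return [(first << 1) | 1, second]  # low bit 1: a second value follows


def decode(words):
    """Return a bare value for one output, or a (first, second) tuple for two."""
    head = words[0]
    first = head >> 1
    if head & 1:
        return (first, words[1])
    return first


single = encode(5)
double = encode(5, 9)
```

Zero is rejected here, matching the rule that a 0 output is reserved to mean "no output".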

Namespace Lucene.Net.Util.Fst

Classes

Builder

Builder.Arc<S>

Builder.CompiledNode

Builder.FreezeTail<S>

Builder.UnCompiledNode<S>

Builder<T>

ByteSequenceOutputs

BytesRefFSTEnum

BytesRefFSTEnum.InputOutput<T>

BytesRefFSTEnum<T>

CharSequenceOutputs

FST

FST.Arc<T>

FST.BytesReader

FST<T>

FSTEnum<T>

Int32SequenceOutputs

Int32sRefFSTEnum

Int32sRefFSTEnum.InputOutput<T>

Int32sRefFSTEnum<T>

ListOfOutputs<T>

NoOutputs

Outputs<T>

PairOutputs<A, B>

PairOutputs<A, B>.Pair

PositiveInt32Outputs

UpToTwoPositiveInt64Outputs

UpToTwoPositiveInt64Outputs.TwoInt64s

Util

Util.FSTPath<T>

Util.Result<T>

Util.TopNSearcher<T>

Util.TopResults<T>

Interfaces

Builder.INode

Enums

FST.INPUT_TYPE