Namespace Lucene.Net.Util

Some utility classes.

Classes

AlreadySetException

Thrown when Set(T) is called more than once.

ArrayUtil

Methods for manipulating arrays.

This is a Lucene.NET INTERNAL API, use at your own risk

Attribute

Base class for Attributes that can be added to an AttributeSource.

Attributes are used to add data in a dynamic, yet type-safe way to a source of usually streamed objects, e.g. a TokenStream.

An AttributeSource contains a list of different Attributes, and methods to add and get them. There can only be a single instance of an attribute in the same AttributeSource instance. This is ensured by passing the actual type of the IAttribute to AddAttribute<T>(), which then checks whether an instance of that type is already present. If so, it returns that instance; otherwise it creates a new instance and returns it.

AttributeSource.AttributeFactory

An AttributeSource.AttributeFactory creates instances of Attributes.

AttributeSource.State

This class holds the state of an AttributeSource.

Bits

Bits.MatchAllBits

Bits impl of the specified length with all bits set.

Bits.MatchNoBits

Bits impl of the specified length with no bits set.

BitUtil

A variety of high efficiency bit twiddling routines.

This is a Lucene.NET INTERNAL API, use at your own risk

BroadWord

Methods and constants inspired by the article "Broadword Implementation of Rank/Select Queries" by Sebastiano Vigna, January 30, 2012:

algorithm 1: BitCount(Int64), count of set bits in a System.Int64,
algorithm 2: Select(Int64, Int32), selection of a set bit in a System.Int64,
bytewise signed smaller <₈ operator: SmallerUpTo7_8(Int64, Int64),
shortwise signed smaller <₁₆ operator: SmallerUpto15_16(Int64, Int64),
some of the Lk and Hk constants that are used by the above: L8 L8_L, H8 H8_L, L9 L9_L, L16 L16_L and H16 H16_L.
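
As an illustration of the first item, the following is a minimal sketch of a sideways-addition bit count on a 64-bit word. It is the classic technique and not necessarily the exact code in BroadWord; the class name BroadWordSketch is hypothetical, and the sketch takes a ulong rather than a System.Int64 to keep the shifts unsigned.

```csharp
using System;

// Hypothetical illustration class; not part of Lucene.NET.
public static class BroadWordSketch
{
    // Counts set bits by summing adjacent fields of growing width:
    // 2-bit sums, then 4-bit sums, then one count per byte.
    public static int BitCount(ulong x)
    {
        x = x - ((x >> 1) & 0x5555555555555555UL);
        x = (x & 0x3333333333333333UL) + ((x >> 2) & 0x3333333333333333UL);
        x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
        // The multiplication adds all eight byte counts into the top byte.
        // Overflow of the low bytes is intended, hence 'unchecked'.
        return (int)unchecked((x * 0x0101010101010101UL) >> 56);
    }
}
```

On runtimes that provide it, System.Numerics.BitOperations.PopCount is usually the better choice; the sketch only shows the word-parallel idea.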

This is a Lucene.NET INTERNAL API, use at your own risk

BundleResourceManagerFactory

This implementation of IResourceManagerFactory uses a convention to retrieve resources. In Java NLS, the convention is to use the same name for the resource key properties and for the resource file names. This presents a problem for .NET because the resource generator already creates an internal class with the same name as the .resx file.

To work around this, we use the convention of appending the suffix "Bundle" to the end of the type the resource key properties are stored in. For example, if our constants are stored in a class named ErrorMessages, the type that will be looked up by this factory will be ErrorMessagesBundle (which is the name of the .resx file that should be added to your project).

This implementation can be inherited to use a different convention or can be replaced to get the resources from an external source.

ByteBlockPool

Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[] arrays. The idea is to allocate slices of increasing lengths. For example, the first slice is 5 bytes, the next slice is 14, etc. We start by writing our bytes into the first 5 bytes. When we hit the end of the slice, we allocate the next slice and then write the address of the new slice into the last 4 bytes of the previous slice (the "forwarding address").

Each slice is filled with 0's initially, and we mark the end with a non-zero byte. This way the methods that are writing into the slice don't need to record its length and instead allocate a new slice once they hit a non-zero byte.

This is a Lucene.NET INTERNAL API, use at your own risk

ByteBlockPool.Allocator

Abstract class for allocating and freeing byte blocks.

ByteBlockPool.DirectAllocator

A simple ByteBlockPool.Allocator that never recycles.

ByteBlockPool.DirectTrackingAllocator

A simple ByteBlockPool.Allocator that never recycles, but tracks how much total RAM is in use.

BytesRef

Represents byte[], as a slice (offset + length) into an existing byte[]. The Bytes property should never be null; use EMPTY_BYTES if necessary.

Important note: Unless otherwise noted, Lucene uses this class to represent terms that are encoded as UTF8 bytes in the index. To convert them to a .NET System.String (which is UTF16), use Utf8ToString(). Using code like new String(bytes, offset, length) to do this is wrong, as it does not respect the correct character set and may return wrong results (depending on the platform's defaults)!
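
The conversion described above can be sketched with the base class library alone. This is not the implementation of Utf8ToString(); it only shows what decoding an (offset, length) slice with an explicit UTF-8 encoding looks like. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical illustration class; not part of Lucene.NET.
public static class Utf8SliceSketch
{
    // Decodes bytes[offset .. offset + length) as UTF-8 into a UTF-16 string.
    // Naming the encoding explicitly avoids any platform default.
    public static string Utf8SliceToString(byte[] bytes, int offset, int length)
    {
        return Encoding.UTF8.GetString(bytes, offset, length);
    }
}
```

For example, if a buffer holds the UTF-8 bytes of "xxcaf\u00e9", the slice at offset 2 with length 5 decodes to "caf\u00e9", because the final character occupies two bytes.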

BytesRefArray

A simple append only random-access BytesRef array that stores full copies of the appended bytes in a ByteBlockPool.

Note: this class is not Thread-Safe!

This is a Lucene.NET INTERNAL API, use at your own risk

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

BytesRefHash

BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase with each added BytesRef.

Note: A BytesRef instance passed to Add(BytesRef) must not be longer than BYTE_BLOCK_SIZE-2. The internal storage is limited to 2GB total byte storage.

This is a Lucene.NET INTERNAL API, use at your own risk

BytesRefHash.BytesStartArray

Manages allocation of the per-term addresses.

BytesRefHash.DirectBytesStartArray

A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.

BytesRefHash.MaxBytesLengthExceededException

Thrown if a BytesRef exceeds the BytesRefHash limit of BYTE_BLOCK_SIZE-2.

BytesRefIterator

LUCENENET specific class to make the syntax of creating an empty IBytesRefIterator the same as it was in Lucene. Example:

var iter = BytesRefIterator.Empty;

CharsRef

Represents char[], as a slice (offset + Length) into an existing char[]. The Chars property should never be null; use EMPTY_CHARS if necessary.

This is a Lucene.NET INTERNAL API, use at your own risk

CollectionUtil

Methods for manipulating (sorting) collections. Sort methods work directly on the supplied lists and don't copy to/from arrays before/after. For medium size collections as used in the Lucene indexer that is much more efficient.

This is a Lucene.NET INTERNAL API, use at your own risk

CommandLineUtil

Class containing some useful methods used by command line tools

Constants

Some useful constants.

Counter

Simple counter class

This is a Lucene.NET INTERNAL API, use at your own risk

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

DisposableThreadLocal<T>

Java's built-in ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is a single master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries.

While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit System.OutOfMemoryException because from the GC's standpoint the stale entries are not reclaimable.

This class works around that by only enrolling WeakReference values into the ThreadLocal and separately holding a hard reference to each stored value. When you call Dispose(), these hard references are cleared, and the GC is then free to reclaim the space used by the objects stored in it.

You should not call Dispose() until all threads are done using the instance.

This is a Lucene.NET INTERNAL API, use at your own risk

DocIdBitSet

Simple DocIdSet and DocIdSetIterator backed by a BitSet

DoubleBarrelLRUCache

LUCENENET specific class to nest the DoubleBarrelLRUCache.CloneableKey so it can be accessed without referencing the generic closing types of DoubleBarrelLRUCache<TKey, TValue>.

DoubleBarrelLRUCache.CloneableKey

Object providing clone(); the key class must subclass this.

DoubleBarrelLRUCache<TKey, TValue>

Simple concurrent LRU cache, using a "double barrel" approach where two ConcurrentHashMaps record entries.

At any given time, one hash is primary and the other is secondary. Get(TKey) first checks primary, and if that's a miss, checks secondary. If secondary has the entry, it's promoted to primary (NOTE: the key is cloned at this point). Once primary is full, the secondary is cleared and the two are swapped.

This is not as space efficient as other possible concurrent approaches (see LUCENE-2075): to achieve perfect LRU(N) it requires 2*N storage. But, this approach is relatively simple and seems in practice to not grow unbounded in size when under hideously high load.
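
The primary/secondary scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Lucene.NET implementation: the class name DoubleBarrelCacheSketch and the TryGet/Put signatures are hypothetical, and the key cloning step on promotion is omitted.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical illustration class; not part of Lucene.NET.
public sealed class DoubleBarrelCacheSketch<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> cache1 = new ConcurrentDictionary<TKey, TValue>();
    private readonly ConcurrentDictionary<TKey, TValue> cache2 = new ConcurrentDictionary<TKey, TValue>();
    private readonly int maxSize;
    private int countdown;          // writes left before the barrels swap
    private volatile bool swapped;  // false: cache1 is primary; true: cache2 is primary

    public DoubleBarrelCacheSketch(int maxSize)
    {
        this.maxSize = maxSize;
        countdown = maxSize;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        bool isSwapped = swapped;
        var primary = isSwapped ? cache2 : cache1;
        var secondary = isSwapped ? cache1 : cache2;
        if (primary.TryGetValue(key, out value)) return true;
        if (secondary.TryGetValue(key, out value))
        {
            Put(key, value); // promote the entry to primary
            return true;
        }
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        bool isSwapped = swapped;
        var primary = isSwapped ? cache2 : cache1;
        var secondary = isSwapped ? cache1 : cache2;
        primary[key] = value;
        if (Interlocked.Decrement(ref countdown) == 0)
        {
            // Primary is "full": drop the old secondary and swap roles.
            secondary.Clear();
            swapped = !isSwapped;
            Interlocked.Exchange(ref countdown, maxSize);
        }
    }
}
```

With maxSize = 2, putting "a" and "b" triggers a swap; reading "a" promotes it, and a later Put of "c" triggers a second swap that discards "b", which was never touched again.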

This is a Lucene.NET INTERNAL API, use at your own risk

ExceptionExtensions

Extensions to the System.Exception class to allow for adding and retrieving suppressed exceptions, like you can do in Java.

ExcludeServiceAttribute

Base class for Attribute types that exclude services from Reflection scanning.

FieldCacheSanityChecker

Provides methods for sanity checking that entries in the FieldCache are not wasteful or inconsistent.

Lucene 2.9 introduced numerous enhancements to how the FieldCache is used by the low levels of Lucene searching (for sorting and ValueSourceQueries) to improve both the speed of sorting and the reopening of IndexReaders. But these changes have shifted the usage of FieldCache from "top level" IndexReaders (frequently a MultiReader or DirectoryReader) down to the leaf level SegmentReaders. As a result, existing applications that directly access the FieldCache may find that RAM usage increases significantly when upgrading to 2.9 or later. This class provides an API for these applications (or their unit tests) to check at run time whether the FieldCache contains "insane" usages of the FieldCache.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

FieldCacheSanityChecker.Insanity

Simple container for a collection of related FieldCache.CacheEntry objects that in conjunction with each other represent some "insane" usage of the IFieldCache.

FieldCacheSanityChecker.InsanityType

An enumeration of the different types of "insane" behavior that may be detected in an IFieldCache.

FilterIterator<T>

A System.Collections.Generic.IEnumerator<T> implementation that filters elements with a boolean predicate.

FixedBitSet

BitSet of fixed length (numBits), backed by accessible (GetBits()) long[], accessed with an int index, implementing IBits and DocIdSet. If you need to manage more than 2.1B bits, use Int64BitSet.
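
A minimal sketch of how a bit set backed by a long[] can map an int index to a word and a bit within that word. The class name LongBackedBitsSketch is hypothetical and the code is an illustration of the layout, not the FixedBitSet source.

```csharp
// Hypothetical illustration class; not part of Lucene.NET.
public sealed class LongBackedBitsSketch
{
    private readonly long[] bits;

    public int Length { get; }

    public LongBackedBitsSketch(int numBits)
    {
        Length = numBits;
        bits = new long[(numBits + 63) >> 6]; // 64 bits per word, rounded up
    }

    // index >> 6 selects the word, index & 63 selects the bit inside it.
    public bool Get(int index) => (bits[index >> 6] & (1L << (index & 63))) != 0;

    public void Set(int index) => bits[index >> 6] |= 1L << (index & 63);

    public void Clear(int index) => bits[index >> 6] &= ~(1L << (index & 63));

    // Exposes the backing words, mirroring the "accessible long[]" idea.
    public long[] GetBits() => bits;
}
```

Exposing the backing array lets callers run word-at-a-time operations (for example, intersecting two sets with a single loop over the words) at the cost of weaker encapsulation.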

This is a Lucene.NET INTERNAL API, use at your own risk

FixedBitSet.FixedBitSetIterator

A DocIdSetIterator which iterates over set bits in a FixedBitSet.

GrowableByteArrayDataOutput

A DataOutput that can be used to build a byte[].

This is a Lucene.NET INTERNAL API, use at your own risk

IndexableBinaryStringTools

Provides support for converting byte sequences to System.Strings and back again. The resulting System.Strings preserve the original byte sequences' sort order.

The System.Strings are constructed using a Base 8000h encoding of the original binary data: each char of an encoded System.String represents a 15-bit chunk from the byte sequence. Base 8000h was chosen because it allows for all lower 15 bits of char to be used without restriction; the surrogate range [U+D800-U+DFFF] does not represent valid chars, and would require complicated handling to avoid them and allow use of char's high bit.

Although unset bits are used as padding in the final char, the original byte sequence could contain trailing bytes with no set bits (null bytes): padding is indistinguishable from valid information. To overcome this problem, a char is appended, indicating the number of encoded bytes in the final content char.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

InfoStream

Debugging API for Lucene classes such as IndexWriter and SegmentInfos.

NOTE: Enabling infostreams may cause performance degradation in some components.

This is a Lucene.NET INTERNAL API, use at your own risk

InPlaceMergeSorter

Sorter implementation based on the merge-sort algorithm that merges in place (no extra memory will be allocated). Small arrays are sorted with insertion sort.

This is a Lucene.NET INTERNAL API, use at your own risk

Int32BlockPool

A pool for System.Int32 blocks similar to ByteBlockPool.

NOTE: This was IntBlockPool in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Int32BlockPool.Allocator

Abstract class for allocating and freeing System.Int32 blocks.

Int32BlockPool.DirectAllocator

A simple Int32BlockPool.Allocator that never recycles.

Int32BlockPool.SliceReader

An Int32BlockPool.SliceReader that can read System.Int32 slices written by an Int32BlockPool.SliceWriter.

This is a Lucene.NET INTERNAL API, use at your own risk

Int32BlockPool.SliceWriter

An Int32BlockPool.SliceWriter that allows writing multiple integer slices into a given Int32BlockPool.

This is a Lucene.NET INTERNAL API, use at your own risk

Int32sRef

Represents int[], as a slice (offset + length) into an existing int[]. The Int32s member should never be null; use EMPTY_INT32S if necessary.

NOTE: This was IntsRef in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Int64BitSet

BitSet of fixed length (numBits), backed by accessible (GetBits()) long[], accessed with a System.Int64 index. Use it only if you intend to store more than 2.1B bits, otherwise you should use FixedBitSet.

NOTE: This was LongBitSet in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Int64sRef

Represents long[], as a slice (offset + length) into an existing long[]. The Int64s member should never be null; use EMPTY_INT64S if necessary.

NOTE: This was LongsRef in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Int64Values

Abstraction over an array of System.Int64s. This class extends NumericDocValues so that we don't need to add another level of abstraction every time we want, e.g., to use the PackedInt32s utility classes to represent a NumericDocValues instance.

NOTE: This was LongValues in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

IntroSorter

Sorter implementation based on a variant of the quicksort algorithm called introsort: when the recursion level exceeds the log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort.

This is a Lucene.NET INTERNAL API, use at your own risk

IOUtils

This class emulates the new Java 7 "Try-With-Resources" statement. Remove once Lucene is on Java 7.

This is a Lucene.NET INTERNAL API, use at your own risk

ListExtensions

Extensions to System.Collections.Generic.IList<T>.

LuceneVersionExtensions

Extension methods to the LuceneVersion enumeration to provide version comparison and parsing functionality.

MapOfSets<TKey, TValue>

Helper class for keeping Lists of Objects associated with keys. WARNING: THIS CLASS IS NOT THREAD SAFE

This is a Lucene.NET INTERNAL API, use at your own risk

MathUtil

Math static utility methods.

MergedIterator<T>

Provides a merged sorted view from several sorted iterators.

If built with removeDuplicates set to true and an element appears in multiple iterators then it is deduplicated, that is, this iterator returns the sorted union of elements.

If built with removeDuplicates set to false then all elements in all iterators are returned.

Caveats:

The behavior is undefined if the iterators are not actually sorted.
Null elements are unsupported.
If removeDuplicates is set to true and if a single iterator contains duplicates then they will not be deduplicated.
When elements are deduplicated it is not defined which one is returned.
If removeDuplicates is set to false then the order in which duplicates are returned isn't defined.

The caller is responsible for disposing the System.Collections.Generic.IEnumerator<T> instances that are passed into the constructor; MergedIterator<T> doesn't do it automatically.
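
The merge semantics above can be sketched as a small k-way merge. This is an illustration only, not the MergedIterator<T> implementation: the MergeSketch class is hypothetical, it takes IEnumerable<T> inputs rather than enumerators, it owns and disposes the enumerators it creates, and it picks the smallest head with a linear scan instead of a priority queue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration class; not part of Lucene.NET.
public static class MergeSketch
{
    public static IEnumerable<T> Merge<T>(bool removeDuplicates, params IEnumerable<T>[] sortedInputs)
        where T : IComparable<T>
    {
        var heads = new List<IEnumerator<T>>();
        try
        {
            // Keep one enumerator per non-empty input, positioned on its first element.
            foreach (var input in sortedInputs)
            {
                var e = input.GetEnumerator();
                if (e.MoveNext()) heads.Add(e); else e.Dispose();
            }
            while (heads.Count > 0)
            {
                // Find the input whose current element is smallest.
                int min = 0;
                for (int i = 1; i < heads.Count; i++)
                    if (heads[i].Current.CompareTo(heads[min].Current) < 0) min = i;
                T smallest = heads[min].Current;
                yield return smallest;

                // Advance either every input positioned on an equal element
                // (deduplicating across inputs) or only the chosen one.
                for (int i = heads.Count - 1; i >= 0; i--)
                {
                    bool advance = removeDuplicates
                        ? heads[i].Current.CompareTo(smallest) == 0
                        : i == min;
                    if (advance && !heads[i].MoveNext())
                    {
                        heads[i].Dispose();
                        heads.RemoveAt(i);
                    }
                }
            }
        }
        finally
        {
            foreach (var e in heads) e.Dispose();
        }
    }
}
```

Because each input advances at most once per returned element, duplicates inside a single input are not removed, which matches the third caveat above.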

This is a Lucene.NET INTERNAL API, use at your own risk

NamedServiceFactory<TService>

LUCENENET specific abstract class containing common functionality for named service factories.

NumberFormat

A LUCENENET specific class that represents a numeric format. This class mimics the design of Java's NumberFormat class, which, unlike the System.Globalization.NumberFormatInfo class in .NET, can be subclassed.

NumericUtils

This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs.

To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.

This class generates terms to achieve this: first, the numerical integer values need to be converted to bytes. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits each. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.

To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: DoubleToSortableInt64(Double) and SingleToSortableInt32(Single). Converting floating point numbers to integers and back loses no precision (the integer form is just not meaningful as a numeric value). Other data types like dates can easily be converted to System.Int64s or System.Int32s (e.g. date to long: System.DateTime.Ticks).
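
One common way to change the bit layout of a double so that signed integer comparison matches numeric order is sketched below. It is offered as an illustration of the idea, assuming IEEE 754 doubles; it is not claimed to be identical to DoubleToSortableInt64(Double), and the names are hypothetical.

```csharp
using System;

// Hypothetical illustration class; not part of Lucene.NET.
public static class SortableBitsSketch
{
    // Positive doubles already sort correctly by their raw bits.
    // For negative doubles, flipping the 63 non-sign bits reverses their order,
    // so that more negative values map to smaller longs.
    public static long DoubleToSortableLong(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        if (bits < 0) bits ^= 0x7FFFFFFFFFFFFFFFL;
        return bits;
    }

    // The transformation leaves the sign bit alone, so it is its own inverse.
    public static double SortableLongToDouble(long sortable)
    {
        if (sortable < 0) sortable ^= 0x7FFFFFFFFFFFFFFFL;
        return BitConverter.Int64BitsToDouble(sortable);
    }
}
```

The round trip is lossless because only bits are rearranged; no arithmetic is performed on the value.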

For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index System.Int32, System.Int64, System.Single, and System.Double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.

This class can also be used to generate lexicographically sortable (according to UTF8SortedAsUTF16Comparer) representations of numeric data types for other usages (e.g. sorting).

This is a Lucene.NET INTERNAL API, use at your own risk

Since 2.9; the API changed in a non-backwards-compatible way in 4.0.

NumericUtils.Int32RangeBuilder

Callback for SplitInt32Range(NumericUtils.Int32RangeBuilder, Int32, Int32, Int32). You need to override only one of the methods.

NOTE: This was IntRangeBuilder in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Since 2.9; the API changed in a non-backwards-compatible way in 4.0.

NumericUtils.Int64RangeBuilder

Callback for SplitInt64Range(NumericUtils.Int64RangeBuilder, Int32, Int64, Int64). You need to override only one of the methods.

NOTE: This was LongRangeBuilder in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Since 2.9; the API changed in a non-backwards-compatible way in 4.0.

OfflineSorter

On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:

(two bytes) length of the following byte array,
exactly the above count of bytes for the sequence to be sorted.
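
The entry layout above can be sketched with plain streams. This is an illustration under an assumption the text does not state: the two-byte length is written high byte first. The LengthPrefixedSketch class is hypothetical and unrelated to the actual ByteSequencesWriter and ByteSequencesReader APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration class; not part of Lucene.NET.
public static class LengthPrefixedSketch
{
    // Writes a two-byte length (assumed big-endian) followed by the entry bytes.
    public static void WriteEntry(Stream output, byte[] entry)
    {
        if (entry.Length > ushort.MaxValue)
            throw new ArgumentException("Entry does not fit a two-byte length.");
        output.WriteByte((byte)(entry.Length >> 8));
        output.WriteByte((byte)(entry.Length & 0xFF));
        output.Write(entry, 0, entry.Length);
    }

    // Reads entries until the stream ends cleanly at an entry boundary.
    public static List<byte[]> ReadAll(Stream input)
    {
        var entries = new List<byte[]>();
        int high;
        while ((high = input.ReadByte()) >= 0)
        {
            int low = input.ReadByte();
            if (low < 0) throw new EndOfStreamException("Truncated length prefix.");
            var entry = new byte[(high << 8) | low];
            int read = 0;
            while (read < entry.Length)
            {
                int n = input.Read(entry, read, entry.Length - read);
                if (n <= 0) throw new EndOfStreamException("Truncated entry.");
                read += n;
            }
            entries.Add(entry);
        }
        return entries;
    }
}
```

A length prefix lets a reader skip or load whole entries without scanning for a delimiter, which matters when entries may contain arbitrary bytes.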

OfflineSorter.BufferSize

A bit more descriptive unit for constructors.

OfflineSorter.ByteSequencesReader

Utility class to read length-prefixed byte[] entries from an input. Complementary to OfflineSorter.ByteSequencesWriter.

OfflineSorter.ByteSequencesWriter

Utility class to emit length-prefixed byte[] entries to an output stream for sorting. Complementary to OfflineSorter.ByteSequencesReader.

OfflineSorter.SortInfo

Sort info (debugging mostly).

OpenBitSet

An "open" BitSet implementation that allows direct access to the array of words storing the bits.

NOTE: This can be used in .NET any place where a java.util.BitSet is used in Java.

Unlike java.util.BitSet, the fact that bits are packed into an array of longs is part of the interface. This allows efficient implementation of other algorithms by someone other than the author. It also allows one to efficiently implement alternate serialization or interchange formats.

OpenBitSet is faster than java.util.BitSet in most operations and much faster at calculating cardinality of sets and results of set operations. It can also handle sets of larger cardinality (up to 64 * 2**32-1).

The goals of OpenBitSet are the fastest implementation possible, and maximum code reuse. Extra safety and encapsulation may always be built on top, but if that's built in, the cost can never be removed (and hence people re-implement their own version in order to get better performance).

Performance Results

Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M

BitSet size = 1,000,000

Results are java.util.BitSet time divided by OpenBitSet time.

            cardinality  IntersectionCount  Union  NextSetBit  Get   GetIterator
50% full    3.36         3.96               1.44   1.46        1.99  1.58
1% full     3.31         3.90               -      1.04        -     0.99

Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M

BitSet size = 1,000,000

Results are java.util.BitSet time divided by OpenBitSet time.

            cardinality  IntersectionCount  Union  NextSetBit  Get   GetIterator
50% full    2.50         3.50               1.00   1.03        1.12  1.25
1% full     2.51         3.49               -      1.00        -     1.02

OpenBitSetDISI

OpenBitSet with added methods to bulk-update the bits from a DocIdSetIterator. (DISI stands for DocIdSetIterator).

OpenBitSetIterator

An iterator to iterate over set bits in an OpenBitSet. This is faster than NextSetBit(Int64) for iterating over the complete set of bits, especially when the density of the bits set is high.

PagedBytes

Represents a logical byte[] as a series of pages. You can write-once into the logical byte[] (append only), using copy, and then retrieve slices (BytesRef) into it using fill.

This is a Lucene.NET INTERNAL API, use at your own risk

PagedBytes.PagedBytesDataInput

PagedBytes.PagedBytesDataOutput

PagedBytes.Reader

Provides methods to read BytesRefs from a frozen PagedBytes.

PForDeltaDocIdSet

DocIdSet implementation based on pfor-delta encoding.

This implementation is inspired by LinkedIn's Kamikaze (http://data.linkedin.com/opensource/kamikaze) and Daniel Lemire's JavaFastPFOR (https://github.com/lemire/JavaFastPFOR).

Contrary to the original PFOR paper, exceptions are encoded with FOR instead of Simple16.

PForDeltaDocIdSet.Builder

A builder for PForDeltaDocIdSet.

PrintStreamInfoStream

LUCENENET specific stub to assist with migration to TextWriterInfoStream.

PriorityQueue<T>

A PriorityQueue<T> maintains a partial ordering of its elements such that the element with least priority can always be found in constant time. Put() and Pop() calls require log(size) time.

NOTE: this class will pre-allocate a full array of length maxSize+1 if instantiated via the PriorityQueue(Int32, Boolean) constructor with prepopulate set to true. That maximum size can grow as we insert elements over time.

This is a Lucene.NET INTERNAL API, use at your own risk

QueryBuilder

Creates queries from the Analyzer chain.

Example usage:

    QueryBuilder builder = new QueryBuilder(analyzer);
    Query a = builder.CreateBooleanQuery("body", "just a test");
    Query b = builder.CreatePhraseQuery("body", "another test");
    Query c = builder.CreateMinShouldMatchQuery("body", "another test", 0.5f);

This can also be used as a subclass for query parsers to make it easier to interact with the analysis chain. Factory methods such as NewTermQuery(Term) are provided so that the generated queries can be customized.

RamUsageEstimator

Estimates the size (memory representation) of .NET objects.

This is a Lucene.NET INTERNAL API, use at your own risk

RecyclingByteBlockAllocator

A ByteBlockPool.Allocator implementation that recycles unused byte blocks in a buffer and reuses them in subsequent calls to GetByteBlock().

Note: this class is not thread-safe.

This is a Lucene.NET INTERNAL API, use at your own risk

RecyclingInt32BlockAllocator

An Int32BlockPool.Allocator implementation that recycles unused System.Int32 blocks in a buffer and reuses them in subsequent calls to GetInt32Block().

Note: this class is not thread-safe.

NOTE: This was RecyclingIntBlockAllocator in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

RefCount<T>

Manages reference counting for a given object. Extensions can override Release() to do custom logic when reference counting hits 0.
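
A minimal sketch of the pattern described above: an atomic counter plus an overridable hook that runs when the count reaches 0. The type names, the IncRef/DecRef method names, and the initial count of 1 are assumptions for illustration, not the RefCount<T> API.

```csharp
using System;
using System.Threading;

// Hypothetical illustration class; not part of Lucene.NET.
public class RefCountSketch<T>
{
    private int refCount = 1; // assumption: the creator holds the first reference
    protected readonly T Target;

    public RefCountSketch(T target) { Target = target; }

    // Hook for subclasses: called when the count reaches 0.
    protected virtual void Release() { }

    public void IncRef() { Interlocked.Increment(ref refCount); }

    public void DecRef()
    {
        int remaining = Interlocked.Decrement(ref refCount);
        if (remaining == 0) Release();
        else if (remaining < 0)
            throw new InvalidOperationException("DecRef called more often than IncRef.");
    }

    public int Count { get { return Volatile.Read(ref refCount); } }
}

// Example subclass: disposes the wrapped object on the final release.
public sealed class DisposingRefCount<T> : RefCountSketch<T> where T : IDisposable
{
    public DisposingRefCount(T target) : base(target) { }

    protected override void Release() { Target.Dispose(); }
}
```

Keeping the cleanup in an overridable method separates the counting logic from what "released" means for a particular resource.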

RollingBuffer

LUCENENET specific class to allow referencing static members of RollingBuffer<T> without referencing its generic closing type.

RollingBuffer<T>

Acts like a forever growing T[], but internally uses a circular buffer to reuse instances of T.

This is a Lucene.NET INTERNAL API, use at your own risk

SentinelInt32Set

A native System.Int32 hash-based set where one value is reserved to mean "EMPTY" internally. The space overhead is fairly low as there is only one power-of-two sized int[] to hold the values. The set is re-hashed when adding a value that would make it >= 75% full. Consider extending and over-riding Hash(Int32) if the values might be poor hash keys; Lucene docids should be fine. The internal fields are exposed publicly to enable more efficient use at the expense of better O-O principles.

To iterate over the integers held in this set, simply use code like this:

SentinelInt32Set set = ...
foreach (int v in set.Keys)
{
    if (v == set.EmptyVal)
        continue;
    //use v...
}

NOTE: This was SentinelIntSet in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

ServiceNameAttribute

LUCENENET specific abstract class for System.Attributes that can be used to override the default convention-based names of services. For example, "Lucene40Codec" will by convention be named "Lucene40". Using the CodecNameAttribute, the name can be overridden with a custom value.

SetOnce<T>

A convenient class which offers a semi-immutable object wrapper implementation which allows one to set the value of an object exactly once, and retrieve it many times. If Set(T) is called more than once, AlreadySetException is thrown and the operation will fail.
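
The set-exactly-once behavior can be sketched with an atomic flag. This is an illustration only: the SetOnceSketch type is hypothetical, it is restricted to reference types for simplicity, and it throws InvalidOperationException as a stand-in for AlreadySetException.

```csharp
using System;
using System.Threading;

// Hypothetical illustration class; not part of Lucene.NET.
public sealed class SetOnceSketch<T> where T : class
{
    private T value;
    private int isSet; // 0 = not set yet, 1 = set

    public void Set(T newValue)
    {
        // Only the first caller flips the flag from 0 to 1; everyone else fails.
        if (Interlocked.CompareExchange(ref isSet, 1, 0) != 0)
            throw new InvalidOperationException("The value was already set.");
        Volatile.Write(ref value, newValue);
    }

    // Returns null until Set has completed.
    public T Get() { return Volatile.Read(ref value); }
}
```

Using a compare-and-exchange on the flag means two racing callers cannot both succeed, which a plain "if not set, then set" check would allow.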

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

SloppyMath

Math functions that trade off accuracy for speed.

SmallSingle

Floating point numbers smaller than 32 bits.

NOTE: This was SmallFloat in Lucene

This is a Lucene.NET INTERNAL API, use at your own risk

Sorter

Base class for sorting algorithms implementations.

This is a Lucene.NET INTERNAL API, use at your own risk

SPIClassIterator<S>

Helper class for loading SPI classes from classpath (META-INF files). This is a light impl of java.util.ServiceLoader but is guaranteed to be bug-free regarding classpath order and does not instantiate or initialize the classes found.

This is a Lucene.NET INTERNAL API, use at your own risk

StringHelper

Methods for manipulating strings.

This is a Lucene.NET INTERNAL API, use at your own risk

SystemConsole

Mimics System.Console, but allows for swapping the System.IO.TextWriter of Out and Error, or the System.IO.TextReader of In with user-defined implementations.

TextWriterInfoStream

InfoStream implementation over a System.IO.TextWriter such as System.Console.Out.

NOTE: This is analogous to PrintStreamInfoStream in Lucene.

This is a Lucene.NET INTERNAL API, use at your own risk

TimSorter

Sorter implementation based on the TimSort algorithm.

This implementation is especially good at sorting partially-sorted arrays and sorts small arrays with binary sort.

NOTE: There are a few differences with the original implementation:

The extra amount of memory to perform merges is configurable. This allows small merges to be very fast while large merges will be performed in-place (slightly slower). You can make sure that the fast merge routine will always be used by having maxTempSlots equal to half of the length of the slice of data to sort.
Only the fast merge routine can gallop (the one that doesn't run in-place) and it only gallops on the longest slice.

This is a Lucene.NET INTERNAL API, use at your own risk

ToStringUtils

Helper methods to ease implementing System.Object.ToString().

UnicodeUtil

Class to encode .NET's UTF16 char[] into UTF8 byte[] without always allocating a new byte[] as System.Text.Encoding.GetBytes(System.String) of System.Text.Encoding.UTF8 does.

This is a Lucene.NET INTERNAL API, use at your own risk

VirtualMethod

A utility for keeping backwards compatibility on previously abstract methods (or similar replacements).

Before the replacement method can be made abstract, the old method must be kept deprecated. If somebody still overrides the deprecated method in a non-sealed class, you must keep track of this and maybe delegate to the old method in the subclass. The cost of reflection is minimized by the following usage of this class:

Define static readonly fields in the base class (BaseClass), where the old and new method are declared:

internal static readonly VirtualMethod newMethod =
    new VirtualMethod(typeof(BaseClass), "newName", parameters...);
internal static readonly VirtualMethod oldMethod =
    new VirtualMethod(typeof(BaseClass), "oldName", parameters...);

This enforces the singleton status of these objects, as maintaining the cache would otherwise be too costly. If you try to create a second instance for the same method/baseClass combination, an exception is thrown.

To detect whether, for example, the old method was overridden by a subclass further along the inheritance path to the current instance's class, use a non-static field:

bool isDeprecatedMethodOverridden =
    oldMethod.GetImplementationDistance(this.GetType()) > newMethod.GetImplementationDistance(this.GetType());

// alternatively (more readable):
bool isDeprecatedMethodOverridden =
    VirtualMethod.CompareImplementationDistance(this.GetType(), oldMethod, newMethod) > 0;

GetImplementationDistance(Type) returns the distance of the subclass that overrides this method. The one with the larger distance should be preferred. This way, more complicated method rename scenarios can also be handled (think of the 2.9 TokenStream deprecations).

This is a Lucene.NET INTERNAL API, use at your own risk

WAH8DocIdSet

DocIdSet implementation based on word-aligned hybrid encoding on words of 8 bits.

This implementation doesn't support random-access but has a fast DocIdSetIterator which can advance in logarithmic time thanks to an index.

The compression scheme is simplistic and should work well with sparse and very dense doc id sets while being only slightly larger than a FixedBitSet for incompressible sets (overhead<2% in the worst case) in spite of the index.

Format: The format is byte-aligned. An 8-bit word is either clean, meaning composed only of zeros or ones, or dirty, meaning that it contains between 1 and 7 bits set. The idea is to encode sequences of clean words using run-length encoding and to leave sequences of dirty words as-is.

Token     Clean length+   Dirty length+   Dirty words
1 byte    0-n bytes       0-n bytes       0-n bytes

Token encodes whether clean means full of zeros or ones in the first bit, the number of clean words minus 2 on the next 3 bits and the number of dirty words on the last 4 bits. The higher-order bit is a continuation bit, meaning that the number is incomplete and needs additional bytes to be read.
Clean length+: If clean length has its higher-order bit set, you need to read a vint (ReadVInt32()), shift it by 3 bits on the left side and add it to the 3 bits which have been read in the token.
Dirty length+ works the same way as Clean length+ but on 4 bits and for the length of dirty words.
Dirty words are the dirty words; there are Dirty length of them.

This format cannot encode sequences of fewer than 2 clean words and 0 dirty words. The reason is that if you find a single clean word, you should rather encode it as a dirty word. This takes the same space as starting a new sequence (since you need one byte for the token) but will be lighter to decode. There is however an exception for the first sequence. Since the first sequence may start directly with a dirty word, the clean length is encoded directly, without subtracting 2.

There is an additional restriction on the format: the sequence of dirty words is not allowed to contain two consecutive clean words. This restriction exists to make sure no space is wasted and to make sure iterators can read the next doc ID by reading at most 2 dirty words.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

WAH8DocIdSet.Builder

A builder for WAH8DocIdSets.

WAH8DocIdSet.WordBuilder

Word-based builder.

Interfaces

IAccountable

An object whose RAM usage can be computed.

This is a Lucene.NET INTERNAL API, use at your own risk

IAttribute

Base interface for attributes.

IAttributeReflector

This interface is used to reflect contents of AttributeSource or Attribute.

IBits

Interface for Bitset-like structures.

This is a Lucene.NET EXPERIMENTAL API, use at your own risk

IBytesRefIterator

A simple iterator interface for BytesRef iteration.

IMutableBits

Extension of IBits for live documents.

IResourceManagerFactory

LUCENENET specific interface used to inject instances of System.Resources.ResourceManager. This extension point can be used to override the default behavior to, for example, retrieve resources from a persistent data store, rather than getting them from resource files.

IServiceListable

LUCENENET specific contract that provides support for AvailableCodecs, AvailableDocValuesFormats, and AvailablePostingsFormats. Implement this interface in addition to ICodecFactory, IDocValuesFormatFactory, or IPostingsFormatFactory to provide optional support for the above methods when providing a custom implementation. If this interface is not supported by the corresponding factory, a System.NotSupportedException will be thrown from the above methods.

RollingBuffer.IResettable

Implement to reset an instance

Enums

LuceneVersion

Used by certain classes to match version compatibility across releases of Lucene.

WARNING: When changing the version parameter that you supply to components in Lucene, do not simply change the version at search-time, but instead also adjust your indexing code to match, and re-index.