
Classes

  Class    Description
Public classArrayUtil
Methods for manipulating arrays.
Public classAttributeImpl
Base class for Attributes that can be added to a {@link Lucene.Net.Util.AttributeSource}.

Attributes are used to add data in a dynamic, yet type-safe way to a source of usually streamed objects, e.g. a {@link Lucene.Net.Analysis.TokenStream}.

Public classAttributeSource
An AttributeSource contains a list of different {@link AttributeImpl}s, and methods to add and get them. There can only be a single instance of an attribute in the same AttributeSource instance. This is ensured by passing the actual type of the Attribute (Class<Attribute>) to {@link #AddAttribute(Class)}, which then checks whether an instance of that type is already present. If so, it returns that instance; otherwise it creates a new instance and returns it.
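
The single-instance rule described above can be pictured as a map keyed by attribute type. The following Java sketch is illustrative only: `MiniAttributeSource` and its reflection-based construction are assumptions made for this example, not the library's implementation (the real class delegates creation to an AttributeFactory).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AttributeSource; all names are illustrative.
final class MiniAttributeSource {
    // At most one instance is stored per attribute type.
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // Returns the existing instance for this type, or creates and registers one.
    <T> T addAttribute(Class<T> type) {
        Object existing = attributes.get(type);
        if (existing != null) {
            return type.cast(existing);
        }
        try {
            // Assumption: the type has an accessible no-argument constructor.
            T created = type.getDeclaredConstructor().newInstance();
            attributes.put(type, created);
            return created;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + type.getName(), e);
        }
    }

    boolean hasAttribute(Class<?> type) {
        return attributes.containsKey(type);
    }
}
```

Calling `addAttribute` twice with the same type returns the same object, which is the property the description relies on.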
Public classAttributeSource..::..AttributeFactory
An AttributeFactory creates instances of {@link AttributeImpl}s.
Public classAttributeSource..::..State
This class holds the state of an AttributeSource.
Public classAverageGuessMemoryModel
An average, best guess, MemoryModel that should work okay on most systems.
Public classBitUtil
A variety of highly efficient bit twiddling routines.
Public classBitVector
Optimized implementation of a vector of bits. This is more-or-less like java.util.BitSet, but also includes the following:
  • a count() method, which efficiently computes the number of one bits;
  • optimized read from and write to disk;
  • inlinable get() method;
  • store and load, as bit set or d-gaps, depending on sparseness;
Public classCloseableThreadLocal
Java's built-in ThreadLocal has a serious flaw: it can take an arbitrarily long time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is a single master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries. While not technically a memory leak, because the memory will eventually be reclaimed, it can take a long time and you can easily hit OutOfMemoryError because, from the GC's standpoint, the stale entries are not reclaimable. This class works around that by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call {@link #close}, these hard references are cleared and the GC is then free to reclaim the space used by the objects stored in it.
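
The workaround described above (weak references inside the ThreadLocal, hard references held separately until close) can be sketched in Java as follows. `MiniCloseableThreadLocal` is a hypothetical name, and the sketch omits details such as purging entries for dead threads; it is not the library's code.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified illustration of the technique.
final class MiniCloseableThreadLocal<T> {
    // The ThreadLocal only ever sees a weak reference to the value.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // Hard references keep the values alive until close() is called.
    private final Map<Thread, T> hardRefs = new HashMap<>();

    void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (hardRefs) {
            hardRefs.put(Thread.currentThread(), value);
        }
    }

    T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    // Drops the hard references so the values become collectable,
    // and removes the calling thread's entry from the ThreadLocal.
    void close() {
        synchronized (hardRefs) {
            hardRefs.clear();
        }
        local.remove();
    }
}
```

After `close()`, only weak references remain in other threads' maps, so a stale map entry no longer pins the stored object.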
Public classConstants
Some useful constants.
Public classDocIdBitSet
Simple DocIdSet and DocIdSetIterator backed by a BitSet
Public classFieldCacheSanityChecker
Provides methods for sanity checking that entries in the FieldCache are not wasteful or inconsistent.

Lucene 2.9 introduced numerous enhancements to how the FieldCache is used by the low levels of Lucene searching (for Sorting and ValueSourceQueries), improving both the speed of Sorting and the reopening of IndexReaders. But these changes have shifted the usage of FieldCache from "top level" IndexReaders (frequently a MultiReader or DirectoryReader) down to the leaf level SegmentReaders. As a result, existing applications that directly access the FieldCache may find that RAM usage increases significantly when upgrading to 2.9 or later. This class provides an API for these applications (or their unit tests) to check at run time whether the FieldCache contains "insane" usages of the FieldCache.

EXPERIMENTAL API: This API is considered extremely advanced and experimental. It may be removed or altered w/o warning in future releases of Lucene.

Public classFieldCacheSanityChecker..::..Insanity
Simple container for a collection of related CacheEntry objects that, in conjunction with each other, represent some "insane" usage of the FieldCache.
Public classFieldCacheSanityChecker..::..InsanityType
An enumeration of the different types of "insane" behavior that may be detected in a FieldCache.
Public classIndexableBinaryStringTools
Provides support for converting byte sequences to Strings and back again. The resulting Strings preserve the original byte sequences' sort order.

The Strings are constructed using a Base 8000h encoding of the original binary data: each char of an encoded String represents a 15-bit chunk from the byte sequence. Base 8000h was chosen because it allows all lower 15 bits of a char to be used without restriction; the surrogate range [U+D800-U+DFFF] does not represent valid chars, and avoiding it while also using char's high bit would require complicated handling.

Although unset bits are used as padding in the final char, the original byte sequence could contain trailing bytes with no set bits (null bytes), so padding is indistinguishable from valid information. To overcome this problem, a char is appended, indicating the number of encoded bytes in the final content char.

This class's operations are defined over CharBuffers and ByteBuffers, to allow for wrapped arrays to be reused, reducing memory allocation costs for repeated operations. Note that this class calls array() and arrayOffset() on the CharBuffers and ByteBuffers it uses, so only wrapped arrays may be used. This class interprets the arrayOffset() and limit() values returned by its input buffers as beginning and end+1 positions on the wrapped array, respectively; similarly, on the output buffer, arrayOffset() is the first position written to, and limit() is set to one past the final output array position.
Public classMapOfSets<T, V>
Helper class for keeping Lists of Objects associated with keys. WARNING: THIS CLASS IS NOT THREAD SAFE
Public classMemoryModel
Returns primitive memory sizes for estimating RAM usage.
Public classNumericUtils
This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs.

To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.

This class generates terms to achieve this: First the numerical integer values need to be converted to strings. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits per char. The resulting string is sortable like the original integer value. Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.
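
The encoding described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the library's code: the class name `PrefixCodingSketch` and the `SHIFT_START` offset of 0x20 are chosen for this example.

```java
// Hypothetical sketch of prefix coding for 64-bit values.
final class PrefixCodingSketch {
    // Assumed base for the leading "shift" char; illustrative value.
    static final char SHIFT_START = 0x20;

    static String longToPrefixCoded(long value, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be in 0..63");
        }
        // Number of 7-bit chars needed for the remaining (64 - shift) bits.
        int nChars = (63 - shift) / 7 + 1;
        char[] buffer = new char[nChars + 1];
        // First char records how many low bits were removed.
        buffer[0] = (char) (SHIFT_START + shift);
        // Flipping the sign bit makes signed order equal unsigned order.
        long sortableBits = (value ^ 0x8000000000000000L) >>> shift;
        // Fill from the end so the most significant bits come first.
        for (int i = nChars; i >= 1; i--) {
            buffer[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buffer);
    }
}
```

With the same shift, two encoded values compare under `String.compareTo` in the same order as the original longs, and values that differ only in the removed low bits map to the same term.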

To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: {@link #doubleToSortableLong}, {@link #floatToSortableInt}. You will have no precision loss by converting floating point numbers to integers and back (only that the integer form is not usable). Other data types like dates can easily be converted to longs or ints (e.g. date to long: {@link java.util.Date#getTime}).
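
One common way to change the bit layout so that doubles sort as longs is to flip the lower 63 bits of negative values. The Java sketch below shows that technique; the method names mirror those mentioned above, but the bodies and the class name `SortableDoubles` are assumptions made for illustration, not code taken from this page.

```java
// Hypothetical sketch: lossless, order-preserving double <-> long mapping.
final class SortableDoubles {
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles sort in reverse by their raw bits,
        // so invert everything except the sign bit.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    static double sortableLongToDouble(long sortable) {
        // The same XOR undoes the transformation.
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }
}
```

The mapping is a pure bit rearrangement, which is why the round trip loses no precision.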

For easy usage, the trie algorithm is implemented for indexing inside {@link NumericTokenStream} that can index int, long, float, and double. For querying, {@link NumericRangeQuery} and {@link NumericRangeFilter} implement the query part for the same data types.

This class can also be used to generate lexicographically sortable (according to {@link String#compareTo(String)}) representations of numeric data types for other usages (e.g. sorting).

NOTE: This API is experimental and might change in incompatible ways in the next release.

Public classNumericUtils..::..IntRangeBuilder
Expert: Callback for {@link #splitIntRange}. You need to override only one of the methods.

NOTE: This is a very low-level interface, the method signatures may change in later versions.

Public classNumericUtils..::..LongRangeBuilder
Expert: Callback for {@link #splitLongRange}. You need to override only one of the methods.

NOTE: This is a very low-level interface, the method signatures may change in later versions.

Public classOpenBitSet
An "open" BitSet implementation that allows direct access to the array of words storing the bits.

Unlike java.util.BitSet, the fact that bits are packed into an array of longs is part of the interface. This allows efficient implementation of other algorithms by someone other than the author. It also allows one to efficiently implement alternate serialization or interchange formats.

OpenBitSet is faster than java.util.BitSet in most operations and *much* faster at calculating cardinality of sets and results of set operations. It can also handle sets of larger cardinality (up to 64 * 2**32-1).
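
To illustrate why exposing the word array helps, the Java sketch below packs bits into a `long[]` and computes cardinality and intersection counts one 64-bit word at a time. `WordBitSet` and its members are hypothetical names for this example and do not reproduce the OpenBitSet API.

```java
// Hypothetical sketch of a bit set whose backing words are public.
final class WordBitSet {
    // Exposed on purpose: callers may read or serialize the words directly.
    final long[] words;

    WordBitSet(long numBits) {
        words = new long[(int) ((numBits + 63) >>> 6)];
    }

    void set(long index) {
        words[(int) (index >>> 6)] |= 1L << (int) (index & 63);
    }

    boolean get(long index) {
        return (words[(int) (index >>> 6)] & (1L << (int) (index & 63))) != 0;
    }

    // Counts set bits one 64-bit word at a time.
    long cardinality() {
        long count = 0;
        for (long word : words) {
            count += Long.bitCount(word);
        }
        return count;
    }

    // Size of the intersection, computed without building a new set.
    static long intersectionCount(WordBitSet a, WordBitSet b) {
        int n = Math.min(a.words.length, b.words.length);
        long count = 0;
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a.words[i] & b.words[i]);
        }
        return count;
    }
}
```

Because `intersectionCount` works on whole words and allocates nothing, this style of operation is where a word-oriented design tends to gain the most.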

The goals of OpenBitSet are the fastest implementation possible, and maximum code reuse. Extra safety and encapsulation may always be built on top, but if that's built in, the cost can never be removed (and hence people re-implement their own version in order to get better performance). If you want a "safe", totally encapsulated (and slower and limited) BitSet class, use java.util.BitSet.

Performance Results

Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
            cardinality  intersect_count  union  nextSetBit  get   iterator
50% full    3.36         3.96             1.44   1.46        1.99  1.58
1% full     3.31         3.90             -      1.04        -     0.99

Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
            cardinality  intersect_count  union  nextSetBit  get   iterator
50% full    2.50         3.50             1.00   1.03        1.12  1.25
1% full     2.51         3.49             -      1.00        -     1.02
Public classOpenBitSetDISI
Public classOpenBitSetIterator
An iterator to iterate over set bits in an OpenBitSet. This is faster than nextSetBit() for iterating over the complete set of bits, especially when the density of the bits set is high.
Public classParameter
A serializable Enum class.
Public classPriorityQueue
A PriorityQueue maintains a partial ordering of its elements such that the least element can always be found in constant time. Put() and pop() operations require log(size) time.

NOTE: This class pre-allocates a full array of length maxSize+1, in {@link #initialize}.
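
The costs stated above are those of a binary heap stored in an array whose slot 0 is unused, which would explain the maxSize+1 length. The Java sketch below shows such a heap for ints; `MiniIntQueue` is a hypothetical class written for this example, not the library's PriorityQueue.

```java
// Hypothetical array-backed min-heap; slot 0 is left unused.
final class MiniIntQueue {
    private final int[] heap;
    private int size;

    MiniIntQueue(int maxSize) {
        heap = new int[maxSize + 1]; // pre-allocated once
    }

    // O(log size): append, then sift up.
    void put(int value) {
        if (size == heap.length - 1) throw new IllegalStateException("queue is full");
        heap[++size] = value;
        for (int i = size; i > 1 && heap[i] < heap[i >>> 1]; i >>>= 1) {
            swap(i, i >>> 1);
        }
    }

    // O(1): the least element is always at index 1.
    int top() {
        if (size == 0) throw new IllegalStateException("queue is empty");
        return heap[1];
    }

    // O(log size): move the last element to the root, then sift down.
    int pop() {
        int least = top();
        heap[1] = heap[size--];
        int i = 1;
        while (true) {
            int child = i << 1;
            if (child > size) break;
            if (child + 1 <= size && heap[child + 1] < heap[child]) child++;
            if (heap[i] <= heap[child]) break;
            swap(i, child);
            i = child;
        }
        return least;
    }

    private void swap(int a, int b) {
        int tmp = heap[a];
        heap[a] = heap[b];
        heap[b] = tmp;
    }
}
```

Using 1-based indexing keeps the parent and child arithmetic to single shifts (`i >>> 1` and `i << 1`).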
Public classRamUsageEstimator
Estimates the size of a given Object using a given MemoryModel for primitive size information. Resource Usage: Internally uses a Map to temporarily hold a reference to every object seen. If checkInterned, all Strings checked will be interned, but those that were not already interned will be released for GC when the estimate is complete.
Public classReaderUtil
Common util methods for dealing with {@link IndexReader}s.
Public classScorerDocQueue
A ScorerDocQueue maintains a partial ordering of its Scorers such that the least Scorer can always be found in constant time. Put() and pop() operations require log(size) time. The ordering is by Scorer.doc().
Public classSimpleStringInterner
Simple lockless and memory barrier free String intern cache that is guaranteed to return the same String instance as String.intern() does.
Public classSmallFloat
Floating point numbers smaller than 32 bits.
Public classSortedVIntList
Stores and iterates over sorted integers in compressed form in RAM.
The code for compressing the differences between ascending integers was borrowed from {@link Lucene.Net.Store.IndexInput} and {@link Lucene.Net.Store.IndexOutput}.
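
The idea of compressing the differences between ascending integers can be sketched with a variable-length byte encoding: each delta is written 7 bits at a time, with the high bit of a byte marking that more bytes follow. The Java class below is a hypothetical illustration (`VIntDeltaSketch` is not part of the library), assuming non-negative input sorted in ascending order.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of delta + variable-length int compression.
final class VIntDeltaSketch {
    static byte[] encode(int[] sortedAscending) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int value : sortedAscending) {
            int delta = value - previous;
            if (delta < 0) {
                throw new IllegalArgumentException("input must be non-negative and ascending");
            }
            // Emit 7 bits per byte; the high bit means "more bytes follow".
            while ((delta & ~0x7f) != 0) {
                out.write((delta & 0x7f) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = value;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] bytes, int count) {
        int[] result = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int delta = b & 0x7f;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                delta |= (b & 0x7f) << shift;
            }
            previous += delta; // undo the delta step
            result[i] = previous;
        }
        return result;
    }
}
```

Small gaps between consecutive values fit in a single byte, which is why dense sorted lists compress well under this scheme.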

NOTE: this class assumes the stored integers are doc Ids (which is why it extends {@link DocIdSet}). Therefore its {@link #Iterator()} assumes {@link DocIdSetIterator#NO_MORE_DOCS} can be used as a sentinel. If you intend to use this value, then make sure it's not used during search flow.

Public classSorterTemplate
Borrowed from Cglib. Allows custom swap so that two arrays can be sorted at the same time.
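
The benefit of a custom swap is that the sort only needs to know how to compare and exchange positions, so the exchange can move entries in several arrays at once. The Java sketch below illustrates this with a hypothetical `MiniSorter` base class; insertion sort is used purely for brevity and is an assumption, not the algorithm the library uses.

```java
// Hypothetical sketch: a sorter driven only by compare(i, j) and swap(i, j).
abstract class MiniSorter {
    protected abstract int compare(int i, int j);
    protected abstract void swap(int i, int j);

    // Insertion sort over positions [lo, hi); chosen for brevity.
    final void sort(int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            for (int j = i; j > lo && compare(j - 1, j) > 0; j--) {
                swap(j - 1, j);
            }
        }
    }

    // Sorts keys ascending and applies the same moves to values.
    static void sortParallel(final int[] keys, final String[] values) {
        new MiniSorter() {
            @Override
            protected int compare(int i, int j) {
                return Integer.compare(keys[i], keys[j]);
            }

            @Override
            protected void swap(int i, int j) {
                int k = keys[i]; keys[i] = keys[j]; keys[j] = k;
                String v = values[i]; values[i] = values[j]; values[j] = v;
            }
        }.sort(0, keys.length);
    }
}
```

After `sortParallel`, each value still sits at the same index as the key it started with.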
Public classStringHelper
Methods for manipulating strings.
Public classStringInterner
Subclasses of StringInterner are required to return the same single String object for all equal strings. Depending on the implementation, this may not be the same object returned as String.intern(). This StringInterner base class simply delegates to String.intern().
Public classToStringUtils
Helper methods to ease implementing {@link Object#toString()}.
Public classUnicodeUtil
Class to encode Java's UTF-16 char[] into UTF-8 byte[] without always allocating a new byte[] as String.getBytes("UTF-8") does.

WARNING: This API is new and experimental and may suddenly change.
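
The allocation-avoiding idea can be sketched in Java as an encoder that writes into a reusable result buffer and only grows it when needed. `Utf8Scratch` and its fields are hypothetical names for this example (loosely inspired by the UTF8Result class listed below), and replacing unpaired surrogates with U+FFFD is an assumption of the sketch.

```java
// Hypothetical reusable UTF-8 output buffer.
final class Utf8Scratch {
    byte[] bytes = new byte[16]; // reused across calls, grown only when needed
    int length;                  // number of valid bytes after encode()

    void encode(char[] source, int offset, int count) {
        int worstCase = count * 3; // a UTF-16 unit never needs more than 3 bytes
        if (bytes.length < worstCase) {
            bytes = new byte[worstCase];
        }
        int out = 0;
        int end = offset + count;
        for (int i = offset; i < end; i++) {
            int cp = source[i];
            if (Character.isHighSurrogate(source[i]) && i + 1 < end
                    && Character.isLowSurrogate(source[i + 1])) {
                cp = Character.toCodePoint(source[i], source[++i]);
            } else if (Character.isSurrogate(source[i])) {
                cp = 0xFFFD; // assumption: replace unpaired surrogates
            }
            if (cp < 0x80) {
                bytes[out++] = (byte) cp;
            } else if (cp < 0x800) {
                bytes[out++] = (byte) (0xC0 | (cp >> 6));
                bytes[out++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                bytes[out++] = (byte) (0xE0 | (cp >> 12));
                bytes[out++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                bytes[out++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                bytes[out++] = (byte) (0xF0 | (cp >> 18));
                bytes[out++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                bytes[out++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                bytes[out++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        length = out;
    }
}
```

Callers read `bytes[0..length)` after each call; the array itself is reused, so repeated encoding of short strings allocates nothing.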

Public classUnicodeUtil..::..UTF16Result
Public classUnicodeUtil..::..UTF8Result
Public classVersion
Used by certain classes to match version compatibility across releases of Lucene.

WARNING: When changing the version parameter that you supply to components in Lucene, do not simply change the version at search-time, but instead also adjust your indexing code to match, and re-index.

Interfaces

  Interface    Description
Public interfaceAttribute
Base interface for attributes.