Namespace Lucene.Net.Codecs.Bloom
Classes
BloomFilterFactory
Class used to create index-time FuzzySet
BloomFilteringPostingsFormat
A Postings
A choice of Bloom
The format of the blm file is as follows:
- BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer
- Filter --> FieldNumber, FuzzySet
- FuzzySet --> See Serialize(DataOutput)
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- DelegatePostingsFormatName --> String (WriteString(String)). The name of a ServiceProvider registered PostingsFormat
- NumFilteredFields --> Uint32 (WriteInt32(Int32))
- FieldNumber --> Uint32 (WriteInt32(Int32)). The number of the field in this segment
- Footer --> CodecFooter (WriteFooter(IndexOutput))
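The ordering of the records listed above can be illustrated with a small round-trip sketch. It is written in Java against `java.io` only and is not the library's serializer: the class `BlmLayoutSketch`, the `HEADER_MARK` and `FOOTER_MARK` constants, the length prefix in front of each filter payload, and the use of `writeUTF` for the name are all assumptions made for illustration. The real CodecHeader, CodecFooter, string, and FuzzySet encodings differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class BlmLayoutSketch {
    // Hypothetical stand-ins for CodecHeader and CodecFooter.
    static final int HEADER_MARK = 0x48445230;
    static final int FOOTER_MARK = 0x46545230;

    /** Writes Header, DelegatePostingsFormatName, NumFilteredFields, the filters, Footer. */
    static byte[] write(String delegateName, Map<Integer, byte[]> filtersByField) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(HEADER_MARK);
            out.writeUTF(delegateName);
            out.writeInt(filtersByField.size());      // NumFilteredFields
            for (Map.Entry<Integer, byte[]> e : filtersByField.entrySet()) {
                out.writeInt(e.getKey());             // FieldNumber
                out.writeInt(e.getValue().length);    // length prefix (sketch only)
                out.write(e.getValue());              // opaque FuzzySet payload
            }
            out.writeInt(FOOTER_MARK);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the records back in the same order; the name is returned via nameOut[0]. */
    static Map<Integer, byte[]> read(byte[] file, String[] nameOut) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            if (in.readInt() != HEADER_MARK) throw new IOException("bad header");
            nameOut[0] = in.readUTF();
            int numFilteredFields = in.readInt();
            Map<Integer, byte[]> filters = new LinkedHashMap<>();
            for (int i = 0; i < numFilteredFields; i++) {
                int fieldNumber = in.readInt();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                filters.put(fieldNumber, payload);
            }
            if (in.readInt() != FOOTER_MARK) throw new IOException("bad footer");
            return filters;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is only that the field count precedes the per-field filters, so a reader knows how many (FieldNumber, FuzzySet) pairs to consume before expecting the footer.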
DefaultBloomFilterFactory
Default policy is to allocate a bitset with 10% saturation given a unique term per document.
Bits are set via MurmurHash2.
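To make the 10% saturation policy concrete, the following Java sketch estimates how large a bitset must be so that one unique term per document leaves at most 10% of the bits set. It assumes a single hash per value, uniformly distributed hashes, and power-of-two bitset sizes; those assumptions, and the names `SaturationSizingSketch`, `expectedSaturation`, and `smallestPowerOfTwoSize`, are illustrative and not taken from the library.

```java
final class SaturationSizingSketch {
    /** Expected fraction of bits set after hashing numValues distinct values into numBits bits. */
    static double expectedSaturation(long numBits, long numValues) {
        return 1.0 - Math.exp(-(double) numValues / numBits);
    }

    /** Smallest power-of-two bit count whose expected saturation does not exceed maxSaturation. */
    static long smallestPowerOfTwoSize(long numValues, double maxSaturation) {
        if (maxSaturation <= 0.0 || maxSaturation >= 1.0) {
            throw new IllegalArgumentException("maxSaturation must be in (0, 1)");
        }
        long numBits = 1;
        while (expectedSaturation(numBits, numValues) > maxSaturation) {
            numBits <<= 1; // double the set until it is sparse enough
        }
        return numBits;
    }
}
```

Under these assumptions, a segment with 1,000,000 documents needs 2^24 bits (2 MiB): 2^23 bits would end up roughly 11% saturated, just over the target.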
FuzzySet
A class used to represent a set of many, potentially large, values (e.g. many long strings such as URLs), using a significantly smaller amount of memory.
The set is "lossy" in that it cannot definitively state that it does contain a value, but it can definitively say if a value is not in the set. It can therefore be used as a Bloom Filter.
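The one-sided guarantee described above can be sketched in a few lines of Java. This is a minimal illustration, not the FuzzySet implementation: the class `LossySetSketch`, its method names, and the use of `Arrays.hashCode` as the hash are assumptions chosen to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

final class LossySetSketch {
    private final BitSet bits;
    private final int mask;

    /** Creates a set backed by 2^log2Size bits. */
    LossySetSketch(int log2Size) {
        if (log2Size < 1 || log2Size > 30) {
            throw new IllegalArgumentException("log2Size must be in [1, 30]");
        }
        int size = 1 << log2Size;
        this.bits = new BitSet(size);
        this.mask = size - 1;
    }

    // Stand-in hash; any well-mixed hash of the value's bytes would do here.
    private int slot(String value) {
        int h = Arrays.hashCode(value.getBytes(StandardCharsets.UTF_8));
        h ^= h >>> 16;
        return h & mask;
    }

    /** Records the value by setting a single bit; the value itself is not stored. */
    void add(String value) {
        bits.set(slot(value));
    }

    /** false means "definitely absent"; true means only "possibly present". */
    boolean mightContain(String value) {
        return bits.get(slot(value));
    }
}
```

Because only one bit per value is kept, two different values can land on the same bit, which is where false positives come from. A clear bit, however, proves that no recorded value hashed there.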
Another application of the set is that it can be used to perform fuzzy counting because it can estimate reasonably accurately how many unique values are contained in the set.
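The fuzzy-counting idea can be sketched as follows. Assuming one uniformly distributed hash per value, a set of m bits with X bits set has seen roughly -m * ln(1 - X/m) distinct values, which corrects the raw bit count for collisions. The class and method names below are hypothetical, and the library's own estimator may differ in rounding and edge cases.

```java
final class UniqueCountEstimateSketch {
    /** Estimates how many distinct values produced numSetBits set bits in a set of numBits bits. */
    static long estimateUniqueValues(long numBits, long numSetBits) {
        if (numBits <= 0 || numSetBits < 0 || numSetBits >= numBits) {
            throw new IllegalArgumentException("need 0 <= numSetBits < numBits");
        }
        double saturation = (double) numSetBits / numBits;
        // Invert the expected-saturation formula: saturation = 1 - exp(-n / m).
        return Math.round(-numBits * Math.log(1.0 - saturation));
    }
}
```

For a sparse set the estimate stays close to the number of set bits; for a half-full set of 1000 bits it rises to about 693, since many values must have collided to reach that saturation.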
This class is NOT threadsafe.
Internally a Bitset is used to record values. Once a client has finished recording a stream of values, the Downsize(Single) method can be used to create a smaller set that is sized appropriately for the number of values recorded and the desired saturation level.
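One way such a downsizing step can work is by folding: if both the original and the target sizes are powers of two, each set bit at index i is copied to index i & (targetSize - 1). The Java sketch below illustrates that idea under those assumptions. `DownsizeSketch`, `fold`, and `chooseSize` are hypothetical names, and the size-selection rule shown (smallest size whose folded saturation stays within the limit) is an illustration rather than the behavior of Downsize(Single).

```java
import java.util.BitSet;

final class DownsizeSketch {
    /** Folds every set bit of source into a set of targetSize bits (a power of two). */
    static BitSet fold(BitSet source, int targetSize) {
        if (targetSize <= 0 || Integer.bitCount(targetSize) != 1) {
            throw new IllegalArgumentException("targetSize must be a power of two");
        }
        int mask = targetSize - 1;
        BitSet target = new BitSet(targetSize);
        for (int i = source.nextSetBit(0); i >= 0; i = source.nextSetBit(i + 1)) {
            target.set(i & mask); // a set bit stays set, so no false negatives appear
        }
        return target;
    }

    /** Smallest power-of-two size below sourceSize whose folded saturation is within the limit. */
    static int chooseSize(BitSet source, int sourceSize, double maxSaturation) {
        for (int size = 1; size < sourceSize; size <<= 1) {
            double saturation = (double) fold(source, size).cardinality() / size;
            if (saturation <= maxSaturation) {
                return size;
            }
        }
        return sourceSize; // nothing smaller is sparse enough
    }
}
```

Lookups against the folded set must apply the same mask to the hash. Folding can only merge bits, so the smaller set trades a higher false-positive rate for less memory while still never reporting a recorded value as absent.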
HashFunction
Base class for hashing functions that can be referred to by name.
Subclasses are expected to provide threadsafe implementations of the hash function
on the range of bytes referenced in the provided BytesRef.
MurmurHash2
This is a very fast, non-cryptographic hash suitable for general hash-based lookup. See http://murmurhash.googlepages.com/ for more details.
The C version of MurmurHash 2.0 found at that site was ported to Java by Andrzej Bialecki (ab at getopt org).
Mark Harwood adapted the code from getopt.org into the form here, as one of a pluggable choice of
hashing functions, because the core function had to be adapted to work with BytesRef
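For orientation, here is a Java sketch of the 32-bit MurmurHash2 algorithm applied to a range of a byte array (array, offset, length), which is the shape of input the adaptation described above is concerned with. It follows the widely published reference algorithm and treats bytes as unsigned; the class name `Murmur2Sketch` and the parameter order are assumptions, and the output is not guaranteed to match the library's MurmurHash2 class bit for bit.

```java
final class Murmur2Sketch {
    /** Hashes data[offset .. offset + length) with the given seed. */
    static int hash(byte[] data, int offset, int length, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int i = offset;
        int remaining = length;

        // Mix four bytes at a time, read as a little-endian int.
        while (remaining >= 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            remaining -= 4;
        }

        // Mix in the last one to three bytes, if any.
        switch (remaining) {
            case 3:
                h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2:
                h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1:
                h ^= (data[i] & 0xff);
                h *= m;
                break;
            default:
                break;
        }

        // Final avalanche so the trailing bytes affect all output bits.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }
}
```

Taking an offset and a length means a value embedded in a larger shared buffer can be hashed without first copying it into its own array.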