Class FuzzySet
  
  A class used to represent a set of many, potentially large, values (e.g. many
long strings such as URLs), using a significantly smaller amount of memory.
The set is "lossy" in that it cannot definitively state that it does contain
a value, but it can definitively say that a value is not in
the set. It can therefore be used as a Bloom filter.
The set can also be used to perform fuzzy counting, because
it can estimate reasonably accurately how many unique values it contains.
This class is NOT thread-safe.
Internally, a bitset is used to record values. Once a client has finished recording
a stream of values, the Downsize(Single) method can be used to create a smaller set that
is sized appropriately for the number of values recorded and the desired saturation level.
This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
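
The record-then-downsize idea described above can be illustrated with a small, self-contained sketch. It is not the Lucene.NET implementation: the class name DownsizeSketch, the use of a bool[] in place of the internal bitset, and the choice of candidate sizes are all assumptions made for illustration. It picks the smallest all-ones mask that keeps saturation at or below the target, then projects every set bit into the smaller array.

```csharp
using System;

// Hypothetical illustration only; not the Lucene.NET implementation.
public static class DownsizeSketch
{
    // Projects a large bitset onto the smallest mask-sized bitset whose
    // saturation stays at or below targetMaxSaturation.
    // 'bits' must have a power-of-two length of at least 2.
    public static bool[] Downsize(bool[] bits, float targetMaxSaturation)
    {
        if (bits.Length < 2 || (bits.Length & (bits.Length - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.", nameof(bits));

        int numBitsSet = 0;
        foreach (bool b in bits) if (b) numBitsSet++;

        // Try candidate masks 1, 3, 7, ... (all ones in binary), smallest first.
        int mask = bits.Length - 1;
        for (int candidate = 1; candidate < bits.Length - 1; candidate = (candidate << 1) | 1)
        {
            if ((float)numBitsSet / candidate <= targetMaxSaturation)
            {
                mask = candidate;
                break;
            }
        }

        // Re-project: masking the old index acts as a cheap modulo.
        var smaller = new bool[mask + 1];
        for (int i = 0; i < bits.Length; i++)
        {
            if (bits[i]) smaller[i & mask] = true;
        }
        return smaller;
    }
}
```

Bits that were distinct in the large array can collide after projection, so the smaller set answers MAYBE more often than the original would; that is the accuracy cost of the memory saved.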
 
  
  
    Inheritance
    System.Object
    FuzzySet
   
  
    Inherited Members
    
      System.Object.Equals(System.Object)
    
    
      System.Object.Equals(System.Object, System.Object)
    
    
      System.Object.GetHashCode()
    
    
      System.Object.GetType()
    
    
      System.Object.MemberwiseClone()
    
    
      System.Object.ReferenceEquals(System.Object, System.Object)
    
    
      System.Object.ToString()
    
   
  
  Assembly: Lucene.Net.Codecs.dll
  Syntax
  
  Fields
  
  
  
  VERSION_CURRENT
  
  
  Declaration
  
    public static readonly int VERSION_CURRENT
   
  Field Value
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  
  
  VERSION_SPI
  
  
  Declaration
  
    public static readonly int VERSION_SPI
   
  Field Value
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  
  
  VERSION_START
  
  
  Declaration
  
    public static readonly int VERSION_START
   
  Field Value
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  Methods
  
  
  
  
  AddValue(BytesRef)
  Records a value in the set. The referenced bytes are hashed, and the hash is then reduced
modulo n, where n is the chosen size of the internal bitset.
 
  
  Declaration
  
    public virtual void AddValue(BytesRef value)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | BytesRef | value | The key value to be hashed. |
      
    
  
  Exceptions
  
    
      
    | Type | Condition |
    | System.IO.IOException | If there is a low-level I/O error. |
      
    
  
  
  
  
  Contains(BytesRef)
  The main method required for a Bloom filter, which, given a value, determines set membership.
Unlike a conventional set, the fuzzy set returns NO or
MAYBE rather than true or false.
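
To make the NO / MAYBE semantics concrete, here is a minimal single-hash sketch. It is not the FuzzySet implementation: the names TinyFuzzySet and Membership are hypothetical, a bool[] stands in for the internal bitset, and FNV-1a is used only as a simple stand-in hash function (the library selects its own hash function). It does follow the description of AddValue: hash the bytes, reduce modulo the bitset size, and set that bit.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; not the Lucene.NET implementation.
public enum Membership { No, Maybe }

public sealed class TinyFuzzySet
{
    private readonly bool[] bits; // stand-in for the internal bitset

    public TinyFuzzySet(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        bits = new bool[size];
    }

    // FNV-1a, used here only as a simple stand-in hash function.
    private static uint Hash(byte[] data)
    {
        uint h = 2166136261u;
        foreach (byte b in data)
        {
            unchecked { h = (h ^ b) * 16777619u; }
        }
        return h;
    }

    // Hash the bytes, then reduce modulo the bitset size.
    private int IndexOf(byte[] value) => (int)(Hash(value) % (uint)bits.Length);

    public void AddValue(byte[] value) => bits[IndexOf(value)] = true;

    // A clear bit proves the value was never added (No). A set bit may
    // belong to this value or to a colliding one, hence only Maybe.
    public Membership Contains(byte[] value) =>
        bits[IndexOf(value)] ? Membership.Maybe : Membership.No;
}
```

A caller would typically treat Maybe as "go and check the authoritative source" and No as "skip the lookup entirely".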
 
  
  Declaration
  
    public virtual FuzzySet.ContainsResult Contains(BytesRef value)
   
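  Parameters
  
    | Type | Name | Description |
    | BytesRef | value | |
  
  Returns
  
    | Type | Description |
    | FuzzySet.ContainsResult | |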
  
  
  
  
  CreateSetBasedOnMaxMemory(Int32)
  
  
  Declaration
  
    public static FuzzySet CreateSetBasedOnMaxMemory(int maxNumBytes)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | maxNumBytes | |
      
    
  
  Returns
  
    | Type | Description |
    | FuzzySet | |
  
  
  
  
  CreateSetBasedOnQuality(Int32, Single)
  
  
  Declaration
  
    public static FuzzySet CreateSetBasedOnQuality(int maxNumUniqueValues, float desiredMaxSaturation)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | maxNumUniqueValues | |
    | System.Single | desiredMaxSaturation | |
      
    
  
  Returns
  
    | Type | Description |
    | FuzzySet | |
  
  
  
  Deserialize(DataInput)
  
  
  Declaration
  
    public static FuzzySet Deserialize(DataInput input)
   
  Parameters
  
    | Type | Name | Description |
    | DataInput | input | |
  
  Returns
  
    | Type | Description |
    | FuzzySet | |
  
  
  
  
  Downsize(Single)
  
  
  Declaration
  
    public virtual FuzzySet Downsize(float targetMaxSaturation)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Single | targetMaxSaturation | A number between 0 and 1 describing the fraction of bits that would ideally be set in the result. Lower values have better accuracy but require more space. |
      
    
  
  Returns
  
    | Type | Description |
    | FuzzySet | |
  
  
  
  
  GetEstimatedNumberUniqueValuesAllowingForCollisions(Int32, Int32)
  Given a setSize and the number of set bits, produces an estimate of the number of unique values recorded.
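
One standard way to produce such an estimate is sketched below. Assuming each value sets exactly one bit chosen uniformly at random, the expected saturation after n values in a set of m bits is about 1 - e^(-n/m), which inverts to n ~= -m * ln(1 - saturation). The helper name EstimateUniqueValues is hypothetical, and whether FuzzySet uses exactly this formula is an assumption.

```csharp
using System;

// Hypothetical illustration only; not the Lucene.NET implementation.
public static class FuzzyCounting
{
    // Estimates how many distinct values were recorded, given the bitset
    // size and the number of bits currently set.
    public static int EstimateUniqueValues(int setSize, int numRecordedBits)
    {
        if (setSize <= 0) throw new ArgumentOutOfRangeException(nameof(setSize));
        // A fully saturated set carries no information about the count.
        if (numRecordedBits >= setSize) return int.MaxValue;

        double saturation = (double)numRecordedBits / setSize;
        // Invert saturation ~= 1 - exp(-n / m) to get n ~= -m * ln(1 - saturation).
        return (int)(setSize * -Math.Log(1.0 - saturation));
    }
}
```

The estimate is never lower than the number of set bits, because collisions can only hide values; the gap widens as saturation approaches 1.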
 
  
  Declaration
  
    public static int GetEstimatedNumberUniqueValuesAllowingForCollisions(int setSize, int numRecordedBits)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | setSize | |
    | System.Int32 | numRecordedBits | |
      
    
  
  Returns
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  
  
  
  GetEstimatedUniqueValues()
  
  
  Declaration
  
    public virtual int GetEstimatedUniqueValues()
   
  Returns
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  
  
  
  GetNearestSetSize(Int32)
  Rounds the required maxNumberOfBits down to the nearest number whose binary
representation is all ones.
Use this method where controlling memory use is paramount.
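
The rounding rule can be sketched as follows. The helper name RoundDownToAllOnes is hypothetical, and the actual method may restrict its results to a predefined range of sizes, which this sketch does not attempt to reproduce.

```csharp
using System;

// Hypothetical illustration only; not the Lucene.NET implementation.
public static class SetSizes
{
    // Returns the largest value of the form 2^k - 1 (all ones in binary)
    // that is less than or equal to maxNumberOfBits.
    public static int RoundDownToAllOnes(int maxNumberOfBits)
    {
        if (maxNumberOfBits < 1) throw new ArgumentOutOfRangeException(nameof(maxNumberOfBits));

        int result = 1;
        // Append one more 1 bit while the next candidate still fits.
        while (result < int.MaxValue && ((result << 1) | 1) <= maxNumberOfBits)
        {
            result = (result << 1) | 1;
        }
        return result;
    }
}
```

For example, a budget of 1000 bits rounds down to 511 (binary 111111111), while 1023 and 1024 both yield 1023.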
 
  
  Declaration
  
    public static int GetNearestSetSize(int maxNumberOfBits)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | maxNumberOfBits | |
      
    
  
  Returns
  
    
      
    | Type | Description |
    | System.Int32 | |
      
    
  
  
  
  
  GetNearestSetSize(Int32, Single)
  Use this method to choose a set size where accuracy (low content saturation) is more important
than deciding how much memory to throw at the problem.
 
  
  Declaration
  
    public static int GetNearestSetSize(int maxNumberOfValuesExpected, float desiredSaturation)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | maxNumberOfValuesExpected | |
    | System.Single | desiredSaturation | A number between 0 and 1 expressing the fraction of bits set once all values have been recorded. |
      
    
  
  Returns
  
    
      
    | Type | Description |
    | System.Int32 | The size of the set nearest to the required size. |
      
    
  
  
  
  
  GetSaturation()
  
  
  Declaration
  
    public virtual float GetSaturation()
   
  Returns
  
    
      
    | Type | Description |
    | System.Single | |
      
    
  
  
  
  
  HashFunctionForVersion(Int32)
  
  
  Declaration
  
    public static HashFunction HashFunctionForVersion(int version)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | System.Int32 | version | |
      
    
  
  Returns
  
    | Type | Description |
    | HashFunction | |
  
  
  
  
  RamBytesUsed()
  
  
  Declaration
  
    public virtual long RamBytesUsed()
   
  Returns
  
    
      
    | Type | Description |
    | System.Int64 | |
      
    
  
  
  
  
  Serialize(DataOutput)
  Serializes the data set to file using the following format:
 
  
  Declaration
  
    public virtual void Serialize(DataOutput output)
   
  Parameters
  
    
      
    | Type | Name | Description |
    | DataOutput | output | Data output stream. |
      
    
  
  Exceptions
  
    
      
    | Type | Condition |
    | System.IO.IOException | If there is a low-level I/O error. |