lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

Data.BloomFilter

Description

A fast, space efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.

  • Queries do not give false negatives. When an element is added to a filter, a subsequent membership test will definitely return True.
  • False positives are possible. If an element has not been added to a filter, a membership test may nevertheless indicate that the element is present.

This module provides low-level control. For an easier to use interface, see the Data.BloomFilter.Easy module.

Synopsis

Overview

Each of the functions for creating Bloom filters accepts two parameters:

  • The number of bits that should be used for the filter. Note that a filter is fixed in size; it cannot be resized after creation.
  • A number of hash functions, k, to be used for the filter.

By choosing these parameters with care, it is possible to tune for a particular false positive rate. The suggestSizing function in the Data.BloomFilter.Easy module calculates useful estimates for these parameters.

Ease of use

This module provides immutable interfaces for working with a query-only Bloom filter, and for converting to and from mutable Bloom filters.

For a higher-level interface that is easy to use, see the Data.BloomFilter.Easy module.

Performance

The implementation has been carefully tuned for high performance and low space consumption.

Differences from bloomfilter package

This package is (almost entirely rewritten) fork of bloomfilter package.

The main differences are

  • This packages support bloomfilters of arbitrary sizes (not limited to powers of two). Also sizes over 2^32 are supported.
  • The Bloom and MBloom types are parametrised over Hashes variable, instead of having a a -> [Hash] typed field. This separation allows clean de/serialization of Bloom filters in this package, as the hashing scheme is a static.
  • XXH3 hash is used instead of Jenkins' lookup3.

Types

type Hash = Word64 Source #

A hash value is 64 bits wide.

type Bloom = Bloom' CheapHashes Source #

Bloom filter using CheapHashes hashing scheme.

type MBloom s = MBloom' s CheapHashes Source #

Mutable Bloom filter using CheapHashes hashing scheme.

data Bloom' h a Source #

Instances

Instances details
Show (Bloom' h a) Source # 
Instance details

Defined in Data.BloomFilter.Internal

Methods

showsPrec :: Int -> Bloom' h a -> ShowS #

show :: Bloom' h a -> String #

showList :: [Bloom' h a] -> ShowS #

NFData (Bloom' h a) Source # 
Instance details

Defined in Data.BloomFilter.Internal

Methods

rnf :: Bloom' h a -> () #

Eq (Bloom' h a) Source # 
Instance details

Defined in Data.BloomFilter.Internal

Methods

(==) :: Bloom' h a -> Bloom' h a -> Bool #

(/=) :: Bloom' h a -> Bloom' h a -> Bool #

data MBloom' s h a Source #

A mutable Bloom filter, for use within the ST monad.

Instances

Instances details
Show (MBloom' s h a) Source # 
Instance details

Defined in Data.BloomFilter.Mutable.Internal

Methods

showsPrec :: Int -> MBloom' s h a -> ShowS #

show :: MBloom' s h a -> String #

showList :: [MBloom' s h a] -> ShowS #

data RealHashes a Source #

A closure of real hashing function.

Instances

Instances details
Hashes (RealHashes :: Type -> Type) Source # 
Instance details

Defined in Data.BloomFilter.Hash

Immutable Bloom filters

Conversion

freeze :: MBloom' s h a -> ST s (Bloom' h a) Source #

Create an immutable Bloom filter from a mutable one. The mutable filter may be modified afterwards.

thaw :: Bloom' h a -> ST s (MBloom' s h a) Source #

Copy an immutable Bloom filter to create a mutable one. There is no non-copying equivalent.

unsafeFreeze :: MBloom' s h a -> ST s (Bloom' h a) Source #

Create an immutable Bloom filter from a mutable one. The mutable filter must not be modified afterwards, or a runtime crash may occur. For a safer creation interface, use freeze or create.

Creation

unfold Source #

Arguments

:: forall a b h. (Hashes h, Hashable a) 
=> Int

number of hash functions to use

-> Word64

number of bits in filter

-> (b -> Maybe (a, b))

seeding function

-> b

initial seed

-> Bloom' h a 

Build an immutable Bloom filter from a seed value. The seeding function populates the filter as follows.

  • If it returns Nothing, it is finished producing values to insert into the filter.
  • If it returns Just (a,b), a is added to the filter and b is used as a new seed.

fromList Source #

Arguments

:: (Hashes h, Hashable a) 
=> Int

number of hash functions to use

-> Word64

number of bits in filter

-> [a]

values to populate with

-> Bloom' h a 

Create an immutable Bloom filter, populating it from a list of values.

Here is an example that uses the cheapHashes function from the Data.BloomFilter.Hash module to create a hash function that returns three hashes.

filt = fromList 3 1024 ["foo", "bar", "quux"]

empty Source #

Arguments

:: Int

number of hash functions to use

-> Word64

number of bits in filter

-> Bloom' h a 

Create an empty Bloom filter.

singleton Source #

Arguments

:: (Hashes h, Hashable a) 
=> Int

number of hash functions to use

-> Word64

number of bits in filter

-> a

element to insert

-> Bloom' h a 

Create a Bloom filter with a single element.

Accessors

length :: Bloom' h a -> Word64 Source #

Return the size of an immutable Bloom filter, in bits.

elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #

Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

elemHashes :: Hashes h => h a -> Bloom' h a -> Bool Source #

Query an immutable Bloom filter for membership using already constructed Hashes value.

notElem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #

Query an immutable Bloom filter for non-membership. If the value is present, return False. If the value is not present, there is still some possibility that False will be returned.