Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Data.BloomFilter
Description
A fast, space efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.
- Queries do not give false negatives. When an element is added to
a filter, a subsequent membership test will definitely return
True
. - False positives are possible. If an element has not been added to a filter, a membership test may nevertheless indicate that the element is present.
This module provides low-level control. For an easier to use interface, see the Data.BloomFilter.Easy module.
Synopsis
- type Hash = Word64
- type Bloom = Bloom' CheapHashes
- type MBloom s = MBloom' s CheapHashes
- data Bloom' h a
- data MBloom' s h a
- data CheapHashes a
- data RealHashes a
- freeze :: MBloom' s h a -> ST s (Bloom' h a)
- thaw :: Bloom' h a -> ST s (MBloom' s h a)
- unsafeFreeze :: MBloom' s h a -> ST s (Bloom' h a)
- unfold :: forall a b h. (Hashes h, Hashable a) => Int -> Word64 -> (b -> Maybe (a, b)) -> b -> Bloom' h a
- fromList :: (Hashes h, Hashable a) => Int -> Word64 -> [a] -> Bloom' h a
- empty :: Int -> Word64 -> Bloom' h a
- singleton :: (Hashes h, Hashable a) => Int -> Word64 -> a -> Bloom' h a
- length :: Bloom' h a -> Word64
- elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool
- elemHashes :: Hashes h => h a -> Bloom' h a -> Bool
- notElem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool
Overview
Each of the functions for creating Bloom filters accepts two parameters:
- The number of bits that should be used for the filter. Note that a filter is fixed in size; it cannot be resized after creation.
- A number of hash functions, k, to be used for the filter.
By choosing these parameters with care, it is possible to tune for
a particular false positive rate.
The suggestSizing
function in
the Data.BloomFilter.Easy module calculates useful estimates for
these parameters.
Ease of use
This module provides immutable interfaces for working with a query-only Bloom filter, and for converting to and from mutable Bloom filters.
For a higher-level interface that is easy to use, see the Data.BloomFilter.Easy module.
Performance
The implementation has been carefully tuned for high performance and low space consumption.
Differences from bloomfilter package
This package is (almost entirely rewritten) fork of bloomfilter package.
The main differences are
- This packages support bloomfilters of arbitrary sizes (not limited to powers of two). Also sizes over 2^32 are supported.
- The
Bloom
andMBloom
types are parametrised overHashes
variable, instead of having aa -> [
typed field. This separation allows clean de/serialization of Bloom filters in this package, as the hashing scheme is a static.Hash
] - XXH3 hash is used instead of Jenkins' lookup3.
Types
type Bloom = Bloom' CheapHashes Source #
Bloom filter using CheapHashes
hashing scheme.
type MBloom s = MBloom' s CheapHashes Source #
Mutable Bloom filter using CheapHashes hashing scheme.
A mutable Bloom filter, for use within the ST
monad.
data CheapHashes a Source #
A pair of hashes used for a double hashing scheme.
See evalCheapHashes
.
Instances
data RealHashes a Source #
A closure of real hashing function.
Instances
Hashes (RealHashes :: Type -> Type) Source # | |
Defined in Data.BloomFilter.Hash Methods makeHashes :: Hashable a => a -> RealHashes a Source # evalHashes :: RealHashes a -> Int -> Hash Source # |
Immutable Bloom filters
Conversion
freeze :: MBloom' s h a -> ST s (Bloom' h a) Source #
Create an immutable Bloom filter from a mutable one. The mutable filter may be modified afterwards.
thaw :: Bloom' h a -> ST s (MBloom' s h a) Source #
Copy an immutable Bloom filter to create a mutable one. There is no non-copying equivalent.
unsafeFreeze :: MBloom' s h a -> ST s (Bloom' h a) Source #
Create an immutable Bloom filter from a mutable one. The mutable
filter must not be modified afterwards, or a runtime crash may
occur. For a safer creation interface, use freeze
or create
.
Creation
Arguments
:: (Hashes h, Hashable a) | |
=> Int | number of hash functions to use |
-> Word64 | number of bits in filter |
-> [a] | values to populate with |
-> Bloom' h a |
Create an immutable Bloom filter, populating it from a list of values.
Here is an example that uses the cheapHashes
function from the
Data.BloomFilter.Hash module to create a hash function that
returns three hashes.
filt = fromList 3 1024 ["foo", "bar", "quux"]
Create an empty Bloom filter.
Arguments
:: (Hashes h, Hashable a) | |
=> Int | number of hash functions to use |
-> Word64 | number of bits in filter |
-> a | element to insert |
-> Bloom' h a |
Create a Bloom filter with a single element.
Accessors
elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #
Query an immutable Bloom filter for membership. If the value is
present, return True
. If the value is not present, there is
still some possibility that True
will be returned.