Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Data.BloomFilter.Blocked
Description
A fast, space efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.
- Queries do not give false negatives. When an element is added to
a filter, a subsequent membership test will definitely return
True
. - False positives are possible. If an element has not been added to a filter, a membership test may nevertheless indicate that the element is present.
Synopsis
- type Hash = Word64
- class Hashable a
- data Bloom a
- create :: BloomSize -> (forall s. MBloom s a -> ST s ()) -> Bloom a
- unfold :: forall a b. Hashable a => BloomSize -> (b -> Maybe (a, b)) -> b -> Bloom a
- fromList :: (Foldable t, Hashable a) => BloomPolicy -> t a -> Bloom a
- formatVersion :: Int
- serialise :: Bloom a -> (BloomSize, ByteArray, Int, Int)
- deserialise :: PrimMonad m => BloomSize -> (MutableByteArray (PrimState m) -> Int -> Int -> m ()) -> m (Bloom a)
- type NumEntries = Int
- data BloomSize = BloomSize {
- sizeBits :: !Int
- sizeHashes :: !Int
- type FPR = Double
- sizeForFPR :: FPR -> NumEntries -> BloomSize
- type BitsPerEntry = Double
- sizeForBits :: BitsPerEntry -> NumEntries -> BloomSize
- sizeForPolicy :: BloomPolicy -> NumEntries -> BloomSize
- data BloomPolicy = BloomPolicy {
- policyBits :: !Double
- policyHashes :: !Int
- policyFPR :: BloomPolicy -> FPR
- policyForFPR :: FPR -> BloomPolicy
- policyForBits :: BitsPerEntry -> BloomPolicy
- size :: Bloom a -> BloomSize
- elem :: Hashable a => a -> Bloom a -> Bool
- notElem :: Hashable a => a -> Bloom a -> Bool
- (?) :: Hashable a => Bloom a -> a -> Bool
- data MBloom s a
- new :: BloomSize -> ST s (MBloom s a)
- maxSizeBits :: Int
- insert :: Hashable a => MBloom s a -> a -> ST s ()
- insertMany :: forall a s. Hashable a => MBloom s a -> (Int -> ST s a) -> Int -> ST s ()
- freeze :: MBloom s a -> ST s (Bloom a)
- thaw :: Bloom a -> ST s (MBloom s a)
- unsafeFreeze :: MBloom s a -> ST s (Bloom a)
- data Hashes a
- hashes :: Hashable a => a -> Hashes a
- insertHashes :: forall s a. MBloom s a -> Hashes a -> ST s ()
- elemHashes :: Bloom a -> Hashes a -> Bool
- prefetchInsert :: MBloom s a -> Hashes a -> ST s ()
- prefetchElem :: Bloom a -> Hashes a -> ST s ()
Types
The class of types that can be converted to a hash value.
The instances are meant to be stable, the hash values can be persisted.
Minimal complete definition
Instances
Hashable ByteArray Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable Word32 Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable Word64 Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable ByteString Source # | |
Defined in Data.BloomFilter.Hash Methods hashSalt64 :: Word64 -> ByteString -> Word64 Source # | |
Hashable ByteString Source # | |
Defined in Data.BloomFilter.Hash Methods hashSalt64 :: Word64 -> ByteString -> Word64 Source # | |
Hashable () Source # | |
Defined in Data.BloomFilter.Hash Methods hashSalt64 :: Word64 -> () -> Word64 Source # | |
Hashable Char Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable Int Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable Word Source # | |
Defined in Data.BloomFilter.Hash | |
Hashable a => Hashable [a] Source # | |
Defined in Data.BloomFilter.Hash Methods hashSalt64 :: Word64 -> [a] -> Word64 Source # | |
(Hashable a, Hashable b) => Hashable (a, b) Source # | |
Defined in Data.BloomFilter.Hash Methods hashSalt64 :: Word64 -> (a, b) -> Word64 Source # |
Immutable Bloom filters
An immutable Bloom filter.
Creation
Create an immutable Bloom filter, using the given setup function
which executes in the ST
monad.
Example:
filter = create (sizeForBits 16 2) $ mf -> do insert mf "foo" insert mf "bar"
Note that the result of the setup function is not used.
Arguments
:: (Foldable t, Hashable a) | |
=> BloomPolicy | |
-> t a | values to populate with |
-> Bloom a |
Create a Bloom filter, populating it from a sequence of values.
For example
filt = fromList (policyForBits 10) ["foo", "bar", "quux"]
(De)Serialisation
formatVersion :: Int Source #
The version of the format used by serialise
and deserialise
. The
format number will change when there is an incompatible change in the
library, such that deserialising and using the filter will not work.
This can include more than just changes to the serialised format, for
example changes to hash functions or how the hash is mapped to bits.
Note that the format produced does not include this version. Version checking is the responsibility of the user of the library.
The library guarantes that the format version value for the classic (Data.BloomFilter.Classic) and blocked (Data.BloomFilter.Blocked) implementation will not overlap with each other or any previous value used by either implementation. So switching between the two implementations will always be detectable and unambigious.
History:
- Version 1000: original blocked implementation
serialise :: Bloom a -> (BloomSize, ByteArray, Int, Int) Source #
Serialise the bloom filter to a BloomSize
(which is needed to
deserialise) and a ByteArray
along with the offset and length containing
the filter's bit data.
See also formatVersion
for compatibility advice.
deserialise :: PrimMonad m => BloomSize -> (MutableByteArray (PrimState m) -> Int -> Int -> m ()) -> m (Bloom a) Source #
Sizes
type NumEntries = Int Source #
Parameters for constructing a Bloom filter.
Constructors
BloomSize | |
Fields
|
sizeForFPR :: FPR -> NumEntries -> BloomSize Source #
type BitsPerEntry = Double Source #
sizeForBits :: BitsPerEntry -> NumEntries -> BloomSize Source #
sizeForPolicy :: BloomPolicy -> NumEntries -> BloomSize Source #
data BloomPolicy Source #
A policy on intended bloom filter size -- independent of the number of elements.
We can decide a policy based on:
- a target false positive rate (FPR) using
policyForFPR
- a number of bits per entry using
policyForBits
A policy can be turned into a BloomSize
given a target NumEntries
using
sizeForPolicy
.
Either way we define the policy, we can inspect the result to see:
- The bits per entry
policyBits
. This will determine the size of the bloom filter in bits. In general the bits per entry can be fractional. The final bloom filter size in will be rounded to a whole number of bits. - The number of hashes
policyHashes
. - The expected FPR for the policy using
policyFPR
.
Constructors
BloomPolicy | |
Fields
|
Instances
Show BloomPolicy Source # | |
Defined in Data.BloomFilter.Classic.Calc Methods showsPrec :: Int -> BloomPolicy -> ShowS # show :: BloomPolicy -> String # showList :: [BloomPolicy] -> ShowS # |
policyFPR :: BloomPolicy -> FPR Source #
policyForFPR :: FPR -> BloomPolicy Source #
Accessors
elem :: Hashable a => a -> Bloom a -> Bool Source #
Query an immutable Bloom filter for membership. If the value is
present, return True
. If the value is not present, there is
still some possibility that True
will be returned.
notElem :: Hashable a => a -> Bloom a -> Bool Source #
Query an immutable Bloom filter for non-membership. If the value
is present, return False
. If the value is not present, there
is still some possibility that False
will be returned.
Mutable Bloom filters
A mutable Bloom filter, for use within the ST
monad.
Instances
new :: BloomSize -> ST s (MBloom s a) Source #
Create a new mutable Bloom filter.
The filter size is capped at maxSizeBits
.
maxSizeBits :: Int Source #
insert :: Hashable a => MBloom s a -> a -> ST s () Source #
Insert a value into a mutable Bloom filter. Afterwards, a
membership query for the same value is guaranteed to return True
.
Arguments
:: forall a s. Hashable a | |
=> MBloom s a | |
-> (Int -> ST s a) | Action to lookup elements, indexed |
-> Int |
|
-> ST s () |
A bulk insert of many elements.
This is somewhat faster than repeated insertion using insert
. It uses
memory prefetching to improve the utilisation of memory bandwidth. This has
greatest benefit for large filters (that do not fit in L3 cache) and for
inserting many elements, e.g. > 10.
To get best performance, you probably want to specialise this function to
the Hashable
instance and to the lookup action. It is marked INLINABLE
to help with this.
Conversion
freeze :: MBloom s a -> ST s (Bloom a) Source #
Create an immutable Bloom filter from a mutable one. The mutable filter may be modified afterwards.
thaw :: Bloom a -> ST s (MBloom s a) Source #
Copy an immutable Bloom filter to create a mutable one. There is no non-copying equivalent.
unsafeFreeze :: MBloom s a -> ST s (Bloom a) Source #
Create an immutable Bloom filter from a mutable one without copying. The
mutable filter must not be modified afterwards. For a safer creation
interface, use freeze
or create
.
Low level variants
A small family of hashes, for probing bits in a (blocked) bloom filter.
Instances
Show (Hashes a) Source # | |
Prim (Hashes a) Source # | |
Defined in Data.BloomFilter.Blocked.Internal Methods sizeOfType# :: Proxy (Hashes a) -> Int# Source # sizeOf# :: Hashes a -> Int# Source # alignmentOfType# :: Proxy (Hashes a) -> Int# Source # alignment# :: Hashes a -> Int# Source # indexByteArray# :: ByteArray# -> Int# -> Hashes a Source # readByteArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Hashes a #) Source # writeByteArray# :: MutableByteArray# s -> Int# -> Hashes a -> State# s -> State# s Source # setByteArray# :: MutableByteArray# s -> Int# -> Int# -> Hashes a -> State# s -> State# s Source # indexOffAddr# :: Addr# -> Int# -> Hashes a Source # readOffAddr# :: Addr# -> Int# -> State# s -> (# State# s, Hashes a #) Source # writeOffAddr# :: Addr# -> Int# -> Hashes a -> State# s -> State# s Source # setOffAddr# :: Addr# -> Int# -> Int# -> Hashes a -> State# s -> State# s Source # |
elemHashes :: Bloom a -> Hashes a -> Bool Source #
Query an immutable Bloom filter for membership using already constructed
Hash
value.