Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Data.BloomFilter.Easy
Description
An easy-to-use Bloom filter interface.
Synopsis
- type Bloom = Bloom' CheapHashes
- easyList :: Hashable a => Double -> [a] -> Bloom a
- elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool
- notElem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool
- length :: Bloom' h a -> Word64
- type MBloom s = MBloom' s CheapHashes
- easyNew :: Double -> Int -> ST s (MBloom s a)
- new :: Int -> Word64 -> ST s (MBloom' s h a)
- insert :: (Hashes h, Hashable a) => MBloom' s h a -> a -> ST s ()
- freeze :: MBloom' s h a -> ST s (Bloom' h a)
- safeSuggestSizing :: Int -> Double -> Either String (Word64, Int)
- suggestSizing :: Int -> Double -> (Word64, Int)
Easy creation and querying
type Bloom = Bloom' CheapHashes Source #
Bloom filter using CheapHashes
hashing scheme.
Arguments
:: Hashable a | |
=> Double | desired false positive rate (0 < ε < 1) |
-> [a] | values to populate with |
-> Bloom a |
Create a Bloom filter with the desired false positive rate and
members. The hash functions used are computed by the cheapHashes
function from the Hash
module.
elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #
Query an immutable Bloom filter for membership. If the value is
present, return True
. If the value is not present, there is
still some possibility that True
will be returned.
notElem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #
Query an immutable Bloom filter for non-membership. If the value
is present, return False
. If the value is not present, there
is still some possibility that False
will be returned.
Mutable bloom filter
type MBloom s = MBloom' s CheapHashes Source #
Mutable Bloom filter using CheapHashes hashing scheme.
Arguments
:: Double | desired false positive rate (0 < ε < 1) |
-> Int | expected maximum size, n |
-> ST s (MBloom s a) |
Create a Bloom filter with the desired false positive rate, ε and expected maximum size, n.
Arguments
:: Int | number of hash functions to use |
-> Word64 | number of bits in filter |
-> ST s (MBloom' s h a) |
Create a new mutable Bloom filter.
The size is ceiled at $2^48$. Tell us if you need bigger bloom filters.
insert :: (Hashes h, Hashable a) => MBloom' s h a -> a -> ST s () Source #
Insert a value into a mutable Bloom filter. Afterwards, a
membership query for the same value is guaranteed to return True
.
freeze :: MBloom' s h a -> ST s (Bloom' h a) Source #
Create an immutable Bloom filter from a mutable one. The mutable filter may be modified afterwards.
Example: a spell checker
This example reads a dictionary file containing one word per line,
constructs a Bloom filter with a 1% false positive rate, and
spellchecks its standard input. Like the Unix spell
command, it
prints each word that it does not recognize.
import Data.Maybe (mapMaybe) import qualified Data.BloomFilter.Easy as B main = do filt <- B.easyList
0.01 . words <$> readFile "/usr/share/dict/words" let check word | B.elem
word filt = Nothing | otherwise = Just word interact (unlines . mapMaybe check . lines)
Useful defaults for creation
Arguments
:: Int | expected maximum capacity |
-> Double | desired false positive rate (0 < e < 1) |
-> Either String (Word64, Int) |
Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.
The false positive rate is the rate at which queries against the
filter should return True
when an element is not actually
present. It should be a fraction between 0 and 1, so a 1% false
positive rate is represented by 0.01.
This function will suggest to use a bloom filter of prime size. These theoretically behave the best. Also it won't suggest to use over 63 hash functions, because CheapHashes work only up to 63 functions.
Note that while creating bloom filters with extremely small (or even negative) capacity is allowed for convenience, it is often not very useful. This function will always suggest to use at least 61 bits.
>>>
safeSuggestSizing 10000 0.01
Right (99317,7)