Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Data.BloomFilter
Description
By default, this module re-exports the classic bloom filter implementation from Data.BloomFilter.Classic. If you want to use the blocked bloom filter implementation, import Data.BloomFilter.Blocked.
Synopsis
- module Data.BloomFilter.Classic
Documentation
module Data.BloomFilter.Classic
Example: a spelling checker
This example reads a dictionary file containing one word per line,
constructs a Bloom filter with a 1% false positive rate, and
spellchecks its standard input. Like the Unix spell
command, it
prints each word that it does not recognize.
>>>
import Control.Monad (forM_)
>>>
import System.Environment (getArgs)
>>>
import qualified Data.BloomFilter as B
>>>
:{
main :: IO () main = do files <- getArgs dictionary <- readFile "/usr/share/dict/words" let !bloom = B.fromList (B.policyForFPR 0.01) 4 (words dictionary) forM_ files $ \file -> putStrLn . unlines . filter (`B.notElem` bloom) . words =<< readFile file :}
Differences with the bloomfilter
package
This package is an entirely rewritten fork of the bloomfilter package.
The main differences are
- Support for both classic and "blocked" Bloom filters. Blocked-structured
Bloom filters arrange all the bits for each insert or lookup into a single
cache line, which greatly reduces the number of slow uncached memory reads.
The trade-off for this performance optimisation is a slightly worse
trade-off between bits per element and the FPR. In practice for typical
FPRs of
1-e3
up to1e-4
, this requires a couple extra bits per element. - This package support Bloom filters of arbitrary sizes (not limited to powers of two).
- Sizes over
2^32
are supported up to2^48
for classic Bloom filters and2^41
for blocked Bloom filters. - The
Bloom
andMBloom
types are parametrised over aHashable
type class, instead of having aa -> [
typed field. This separation allows clean (de-)serialisation of Bloom filters in this package, as the hashing scheme is static.Hash
] XXH3
hash is used instead of Jenkins'lookup3
.