lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

Data.BloomFilter.Easy

Description

An easy-to-use Bloom filter interface.

Synopsis

Easy creation and querying

type Bloom = Bloom' CheapHashes Source #

Bloom filter using CheapHashes hashing scheme.

easyList Source #

Arguments

:: Hashable a 
=> Double

desired false positive rate (0 < ε < 1)

-> [a]

values to populate with

-> Bloom a 

Create a Bloom filter with the desired false positive rate and members. The hash functions used are computed by the cheapHashes function from the Hash module.

elem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #

Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

notElem :: (Hashes h, Hashable a) => a -> Bloom' h a -> Bool Source #

Query an immutable Bloom filter for non-membership. If the value is present, return False. If the value is not present, there is still some possibility that False will be returned.

length :: Bloom' h a -> Word64 Source #

Return the size of an immutable Bloom filter, in bits.

Mutable bloom filter

type MBloom s = MBloom' s CheapHashes Source #

Mutable Bloom filter using CheapHashes hashing scheme.

easyNew Source #

Arguments

:: Double

desired false positive rate (0 < ε < 1)

-> Int

expected maximum size, n

-> ST s (MBloom s a) 

Create a Bloom filter with the desired false positive rate, ε and expected maximum size, n.

new Source #

Arguments

:: Int

number of hash functions to use

-> Word64

number of bits in filter

-> ST s (MBloom' s h a) 

Create a new mutable Bloom filter.

The size is ceiled at $2^48$. Tell us if you need bigger bloom filters.

insert :: (Hashes h, Hashable a) => MBloom' s h a -> a -> ST s () Source #

Insert a value into a mutable Bloom filter. Afterwards, a membership query for the same value is guaranteed to return True.

freeze :: MBloom' s h a -> ST s (Bloom' h a) Source #

Create an immutable Bloom filter from a mutable one. The mutable filter may be modified afterwards.

Example: a spell checker

This example reads a dictionary file containing one word per line, constructs a Bloom filter with a 1% false positive rate, and spellchecks its standard input. Like the Unix spell command, it prints each word that it does not recognize.

import Data.Maybe (mapMaybe)
import qualified Data.BloomFilter.Easy as B

main = do
  filt <- B.easyList 0.01 . words <$> readFile "/usr/share/dict/words"
  let check word | B.elem word filt  = Nothing
                 | otherwise         = Just word
  interact (unlines . mapMaybe check . lines)

Useful defaults for creation

safeSuggestSizing Source #

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> Either String (Word64, Int) 

Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.

The false positive rate is the rate at which queries against the filter should return True when an element is not actually present. It should be a fraction between 0 and 1, so a 1% false positive rate is represented by 0.01.

This function will suggest to use a bloom filter of prime size. These theoretically behave the best. Also it won't suggest to use over 63 hash functions, because CheapHashes work only up to 63 functions.

Note that while creating bloom filters with extremely small (or even negative) capacity is allowed for convenience, it is often not very useful. This function will always suggest to use at least 61 bits.

>>> safeSuggestSizing 10000 0.01
Right (99317,7)

suggestSizing Source #

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> (Word64, Int) 

Behaves as safeSuggestSizing, but calls error if given invalid or out-of-range inputs.