Safe Haskell	Safe-Inferred
Language	GHC2021

Database.LSMTree.Internal.BloomFilterQuery2

Description

An implementation of batched bloom filter query, optimised for memory prefetch.

Synopsis

bloomQueries :: Vector (Bloom SerialisedKey) -> Vector SerialisedKey -> Vector RunIxKeyIx
data RunIxKeyIx where
- pattern RunIxKeyIx :: RunIx -> KeyIx -> RunIxKeyIx
type RunIx = Int
type KeyIx = Int
data CandidateProbe = MkCandidateProbe !Word64 !Word64

Documentation

bloomQueries :: Vector (Bloom SerialisedKey) -> Vector SerialisedKey -> Vector RunIxKeyIx Source #

data RunIxKeyIx where Source #

A RunIxKeyIx is a (compact) pair of a RunIx and a KeyIx.

We represent it as a 32bit word, using:

16 bits for the run/filter index (MSB)
16 bits for the key index (LSB)

Bundled Patterns

pattern RunIxKeyIx :: RunIx -> KeyIx -> RunIxKeyIx

Instances

Instances details

Show RunIxKeyIx Source #
Instance details Defined in Database.LSMTree.Internal.BloomFilterQuery1 Methods showsPrec :: Int -> RunIxKeyIx -> ShowS # show :: RunIxKeyIx -> String # showList :: [RunIxKeyIx] -> ShowS #
Eq RunIxKeyIx Source #
Instance details Defined in Database.LSMTree.Internal.BloomFilterQuery1 Methods (==) :: RunIxKeyIx -> RunIxKeyIx -> Bool # (/=) :: RunIxKeyIx -> RunIxKeyIx -> Bool #
Prim RunIxKeyIx Source #
Instance details Defined in Database.LSMTree.Internal.BloomFilterQuery1 Methods sizeOfType# :: Proxy RunIxKeyIx -> Int# Source # sizeOf# :: RunIxKeyIx -> Int# Source # alignmentOfType# :: Proxy RunIxKeyIx -> Int# Source # alignment# :: RunIxKeyIx -> Int# Source # indexByteArray# :: ByteArray# -> Int# -> RunIxKeyIx Source # readByteArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, RunIxKeyIx #) Source # writeByteArray# :: MutableByteArray# s -> Int# -> RunIxKeyIx -> State# s -> State# s Source # setByteArray# :: MutableByteArray# s -> Int# -> Int# -> RunIxKeyIx -> State# s -> State# s Source # indexOffAddr# :: Addr# -> Int# -> RunIxKeyIx Source # readOffAddr# :: Addr# -> Int# -> State# s -> (# State# s, RunIxKeyIx #) Source # writeOffAddr# :: Addr# -> Int# -> RunIxKeyIx -> State# s -> State# s Source # setOffAddr# :: Addr# -> Int# -> Int# -> RunIxKeyIx -> State# s -> State# s Source #

type RunIx = Int Source #

type KeyIx = Int Source #

Key algorithm concepts

There is almost no opportunity for memory prefetching when looking up a single key in a single Bloom filter. So this is a bulk algorithm, to do a lot of work all in one go, which does create the opportunity to prefetch memory. It also provides an opportunity to share key hashes across filters.

We have as inputs N bloom filters and M keys to look up in them. So overall there are N * M independent bloom filter tests. The result is expected to be sparse, with roughly M*(1+FPR) positive results. So we use a sparse representation for the result: pairs of indexes identifying input Bloom filters and input keys with a positive test result.

We aim for the number of memory operations in-flight simultaneously to be on the order of 32. This trades-off memory parallelism with cache use. In particular this means the algorithm must be able to use a fixed prefetch "distance" rather than this being dependent on the input sizes. To achieve this, we use a fixed size circular buffer (queue). The buffer size can be tuned to optimise the prefetch behaviour: indeed we pick exactly 32.

Micro optimisation

We use primitive arrays and arrays -- rather than vectors -- so as to minimise the number of function variables, to be able to keep most things in registers. Normal vectors use two additional variables, which triples register pressure.

We use an Array of Strict BloomFilter. This avoids the test for WHNF in the inner loop, which causes all registered to be spilled to and restored from the stack.

data CandidateProbe Source #

A candidate probe point is the combination of a filter/run index, key index and hash number. This combination determines the probe location in the bloom filter, which is also cached.

We store these in PrimArrays as a pair of 64bit words:

Low 64bit word:
16 bits padding (always 0s) (MSB)
16 bits for the hash number
16 bits for the filter index
16 bits for the key index (LSB)
High 64bit word: FilterBitIx

Constructors

MkCandidateProbe !Word64 !Word64

Instances

Instances details

Show CandidateProbe Source #
Instance details Defined in Database.LSMTree.Internal.BloomFilterQuery2 Methods showsPrec :: Int -> CandidateProbe -> ShowS # show :: CandidateProbe -> String # showList :: [CandidateProbe] -> ShowS #
Prim CandidateProbe Source #
Instance details Defined in Database.LSMTree.Internal.BloomFilterQuery2 Methods sizeOfType# :: Proxy CandidateProbe -> Int# Source # sizeOf# :: CandidateProbe -> Int# Source # alignmentOfType# :: Proxy CandidateProbe -> Int# Source # alignment# :: CandidateProbe -> Int# Source # indexByteArray# :: ByteArray# -> Int# -> CandidateProbe Source # readByteArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, CandidateProbe #) Source # writeByteArray# :: MutableByteArray# s -> Int# -> CandidateProbe -> State# s -> State# s Source # setByteArray# :: MutableByteArray# s -> Int# -> Int# -> CandidateProbe -> State# s -> State# s Source # indexOffAddr# :: Addr# -> Int# -> CandidateProbe Source # readOffAddr# :: Addr# -> Int# -> State# s -> (# State# s, CandidateProbe #) Source # writeOffAddr# :: Addr# -> Int# -> CandidateProbe -> State# s -> State# s Source # setOffAddr# :: Addr# -> Int# -> Int# -> CandidateProbe -> State# s -> State# s Source #