Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Database.LSMTree.Internal.RunAcc
Description
Incremental (in-memory portion of) run consruction
Synopsis
- data RunAcc s = RunAcc {
- mbloom :: !(MBloom s SerialisedKey)
- mindex :: !(IndexAcc s)
- mpageacc :: !(PageAcc s)
- entryCount :: !(PrimVar s Int)
- new :: NumEntries -> RunBloomFilterAlloc -> IndexType -> ST s (RunAcc s)
- unsafeFinalise :: RunAcc s -> ST s (Maybe RawPage, Maybe Chunk, Bloom SerialisedKey, Index, NumEntries)
- addKeyOp :: RunAcc s -> SerialisedKey -> Entry SerialisedValue BlobSpan -> ST s ([RawPage], [RawOverflowPage], [Chunk])
- addSmallKeyOp :: RunAcc s -> SerialisedKey -> Entry SerialisedValue BlobSpan -> ST s (Maybe (RawPage, Maybe Chunk))
- addLargeKeyOp :: RunAcc s -> SerialisedKey -> Entry SerialisedValue BlobSpan -> ST s ([RawPage], [RawOverflowPage], [Chunk])
- addLargeSerialisedKeyOp :: RunAcc s -> SerialisedKey -> RawPage -> [RawOverflowPage] -> ST s ([RawPage], [RawOverflowPage], [Chunk])
- entryWouldFitInPage :: SerialisedKey -> Entry SerialisedValue b -> Bool
- data RunBloomFilterAlloc
- newMBloom :: NumEntries -> RunBloomFilterAlloc -> ST s (MBloom s a)
- numHashFunctions :: Integer -> Integer -> Integer
- falsePositiveRate :: Floating a => a -> a -> a
Documentation
The run accumulator is a mutable structure that accumulates key/op pairs. It yields pages and chunks of the index incrementally, and returns the Bloom filter and complete index at the end.
Use new
to start run construction, add new key/operation pairs to the run
by using addKeyOp
and co, and complete run construction using
unsafeFinalise
.
new :: NumEntries -> RunBloomFilterAlloc -> IndexType -> ST s (RunAcc s) Source #
starts an incremental run construction.new
nentries
nentries
should be an upper bound on the expected number of entries in the
output run.
unsafeFinalise :: RunAcc s -> ST s (Maybe RawPage, Maybe Chunk, Bloom SerialisedKey, Index, NumEntries) Source #
Finalise an incremental run construction. Do not use a RunAcc
after
finalising it.
The frozen bloom filter and compact index will be returned, along with the final page of the run (if necessary), and the remaining chunks of the incrementally constructed compact index.
Adding key/op pairs
There are a few variants of actions to add key/op pairs to the run accumulator. Which one to use depends on a couple questions:
- Is it fully in memory or is it pre-serialised and only partly in memory?
- Is the key/op pair known to be "small" or "large"?
If it's in memory but it's not known whether it's small or large then
use addKeyOp
. One can use entryWouldFitInPage
to find out if it's
small or large. If it's in memory and known to be small or large then
use addSmallKeyOp
or addLargeKeyOp
as appropriate. If it's large
and pre-serialised, use addLargeSerialisedKeyOp
but note its
constraints carefully.
Arguments
:: RunAcc s | |
-> SerialisedKey | |
-> Entry SerialisedValue BlobSpan | the full value, not just a prefix |
-> ST s ([RawPage], [RawOverflowPage], [Chunk]) |
Add a key/op pair with an optional blob span to the run accumulator.
Note that this version expects the full value to be in the givenEntry
, not
just a prefix of the value that fits into a single page.
If the key/op pair is known to be "small" or "large" then you can use
the special versions addSmallKeyOp
or addLargeKeyOp
. If it is
pre-serialised, use addLargeSerialisedKeyOp
.
addSmallKeyOp :: RunAcc s -> SerialisedKey -> Entry SerialisedValue BlobSpan -> ST s (Maybe (RawPage, Maybe Chunk)) Source #
Add a "small" key/op pair with an optional blob span to the run accumulator.
This version is only for small entries that can fit within a single page.
Use addLargeKeyOp
if the entry is bigger than a page. If this distinction
is not known at the use site, use entryWouldFitInPage
to determine
which case applies, or use addKeyOp
.
This is guaranteed to add the key/op, and it may yield (at most one) page.
Arguments
:: RunAcc s | |
-> SerialisedKey | |
-> Entry SerialisedValue BlobSpan | the full value, not just a prefix |
-> ST s ([RawPage], [RawOverflowPage], [Chunk]) |
Add a "large" key/op pair with an optional blob span to the run accumulator.
This version is only for large entries that span multiple pages. Use
addSmallKeyOp
if the entry is smaller than a page. If this distinction
is not known at the use site, use entryWouldFitInPage
to determine
which case applies.
Note that this version expects the full large value to be in the given
Entry
, not just the prefix of the value that fits into a single page.
For large multi-page values that are represented by a pre-serialised
RawPage
(as occurs when merging runs), use addLargeSerialisedKeyOp
.
This is guaranteed to add the key/op. It will yield one or two RawPage
s,
and one or more RawOverflowPage
s. These pages should be written out to
the run's page file in that order, the RawPage
s followed by the
RawOverflowPage
s.
addLargeSerialisedKeyOp Source #
Arguments
:: RunAcc s | |
-> SerialisedKey | The key |
-> RawPage | The page that this key/op is in, which must be the
first page of a multi-page representation of a single
key/op without a |
-> [RawOverflowPage] | The overflow pages for this key/op |
-> ST s ([RawPage], [RawOverflowPage], [Chunk]) |
Add a "large" pre-serialised key/op entry to the run accumulator.
This version is for large entries that span multiple pages and are
represented by already serialised RawPage
and one or more
RawOverflowPage
s.
For this case, the caller provides the key, the raw page it is from and the
overflow pages. The raw page and overflow pages are returned along with any
other pages that need to be yielded (in order). The caller should write out
the pages to the run's page file in order: the returned RawPage
s followed
by the RawOverflowPage
s (the same as for addLargeKeyOp
).
Note that this action is not appropriate for key/op entries that would fit
within a page (entryWouldFitInPage
) but just happen to have
ended up in a page on their own in an input to a merge. A page can end up
with a single entry because a page boundary was needed rather than because
the entry itself was too big. Furthermore, pre-serialised pages can only be
used unaltered if the entry does not use a BlobSpan
, since the BlobSpan
typically needs to be modified. Thus the caller should use the following
tests to decide if addLargeSerialisedKeyOp
should be used:
- The entry does not use a
BlobSpan
. - The entry definitely overflows onto one or more overflow pages.
Otherwise, use addLargeKeyOp
or addSmallKeyOp
as appropriate.
entryWouldFitInPage :: SerialisedKey -> Entry SerialisedValue b -> Bool Source #
Determine if the key, value and optional blobspan would fit within a
single page. If it does, then it's appropriate to use pageAccAddElem
, but
otherwise use singletonPage
.
If entryWouldFitInPage
is True
and the PageAcc
is empty (i.e. using
resetPageAcc
) then pageAccAddElem
is guaranteed to succeed.
Bloom filter allocation
data RunBloomFilterAlloc Source #
See BloomFilterAlloc
Constructors
RunAllocFixed !Word64 | Bits per element in a filter |
RunAllocRequestFPR !Double |
Instances
Show RunBloomFilterAlloc Source # | |
Defined in Database.LSMTree.Internal.RunAcc Methods showsPrec :: Int -> RunBloomFilterAlloc -> ShowS # show :: RunBloomFilterAlloc -> String # showList :: [RunBloomFilterAlloc] -> ShowS # | |
NFData RunBloomFilterAlloc Source # | |
Defined in Database.LSMTree.Internal.RunAcc Methods rnf :: RunBloomFilterAlloc -> () # | |
Eq RunBloomFilterAlloc Source # | |
Defined in Database.LSMTree.Internal.RunAcc Methods (==) :: RunBloomFilterAlloc -> RunBloomFilterAlloc -> Bool # (/=) :: RunBloomFilterAlloc -> RunBloomFilterAlloc -> Bool # | |
DecodeVersioned RunBloomFilterAlloc Source # | |
Defined in Database.LSMTree.Internal.Snapshot.Codec Methods decodeVersioned :: SnapshotVersion -> Decoder s RunBloomFilterAlloc Source # | |
Encode RunBloomFilterAlloc Source # | |
Defined in Database.LSMTree.Internal.Snapshot.Codec Methods encode :: RunBloomFilterAlloc -> Encoding Source # |
Exposed for testing
newMBloom :: NumEntries -> RunBloomFilterAlloc -> ST s (MBloom s a) Source #
Arguments
:: Integer | Number of bits assigned to the bloom filter. |
-> Integer | Number of entries inserted into the bloom filter. |
-> Integer |
Computes the optimal number of hash functions that minimises the false positive rate for a bloom filter.
See Niv Dayan, Manos Athanassoulis, Stratos Idreos, Optimal Bloom Filters and Adaptive Merging for LSM-Trees, Footnote 2, page 6.
Arguments
:: Floating a | |
=> a | entries |
-> a | bits |
-> a |
False positive rate
Assumes that the bloom filter uses numHashFunctions
hash functions.
See Niv Dayan, Manos Athanassoulis, Stratos Idreos, Optimal Bloom Filters and Adaptive Merging for LSM-Trees, Equation 2.