lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

Database.LSMTree.Internal.MergingTree

Description

An incremental merge of multiple runs, preserving a bracketing structure.

Synopsis

Documentation

Semantically, tables are key-value stores like Haskell's Map. Table unions then behave like Map.unionWith (<>). If one of the input tables contains a value at a particular key, the result will also contain it. If multiple tables share that key, the values will be combined monoidally.

Looking at the implementation, tables are not just key-value pairs, but consist of runs. If each table was just a single run, unioning would involve a run merge similar to the one used for compaction (when a level is full), but with a different merge type MergeUnion that differs semantically: Here, runs don't represent updates (overwriting each other), but they each represent the full state of a table. There is no distinction between no entry and a Delete, between an Insert and a Mupsert.

To union two tables, we can therefore first merge down each table into a single run (using regular level merges) and then union merge these.

However, we want to spread out the work required and perform these merges incrementally. At first, we only create a new table that is empty except for a data structure MergingTree, representing the merges that need to be done. The usual operations can then be performed on the table while the merge is in progress: Inserts go into the table as usual, not affecting its last level (UnionLevel), lookups need to consider the tree (requiring some complexity and runtime overhead), further unions incorporate the in-progress tree into the resulting one, which also shares future merging work.

It seems necessary to represent the suspended merges using a tree. Other approaches don't allow for full sharing of the incremental work (e.g. because they effectively "re-bracket" nested unions). It also seems necessary to first merge each input table into a single run, as there is no practical distributive property between level and union merges.

data MergingTree m h Source #

A "merging tree" is a mutable representation of an incremental tree-shaped nested merge. This allows to represent union merges of entire tables, each of which itself first need to be merged to become a single run.

Trees have to support arbitrarily deep nesting, since each input to union might already contain an in-progress merging tree (which then becomes shared between multiple tables).

Constructors

MergingTree 

Instances

Instances details
RefCounted m (MergingTree m h) Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingTree

newOngoingMerge :: (MonadMVar m, PrimMonad m, MonadMask m) => Ref (MergingRun TreeMergeType m h) -> m (Ref (MergingTree m h)) Source #

Create a new MergingTree representing the merge of an ongoing run. The usage of this function is primarily to facilitate the reloading of an ongoing merge from a persistent snapshot.

newPendingLevelMerge :: forall m h. (MonadMVar m, MonadMask m, PrimMonad m) => [PreExistingRun m h] -> Maybe (Ref (MergingTree m h)) -> m (Ref (MergingTree m h)) Source #

Create a new MergingTree representing the merge of a sequence of pre-existing runs (completed or ongoing, plus a optional final tree). This is for merging the entire contents of a table down to a single run (while sharing existing ongoing merges).

Shape: if the list of runs is empty and the optional input tree is structurally empty, the result will also be structurally empty. See isStructurallyEmpty.

Resource tracking: * This allocates a new Ref which the caller is responsible for releasing eventually. * The ownership of all input Refs remains with the caller. This action will create duplicate references, not adopt the given ones.

ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.

newPendingUnionMerge :: (MonadMVar m, MonadMask m, PrimMonad m) => [Ref (MergingTree m h)] -> m (Ref (MergingTree m h)) Source #

Create a new MergingTree representing the union of one or more merging trees. This is for unioning the content of multiple tables (represented themselves as merging trees).

Shape: if all of the input trees are structurally empty, the result will also be structurally empty. See isStructurallyEmpty.

Resource tracking: * This allocates a new Ref which the caller is responsible for releasing eventually. * The ownership of all input Refs remains with the caller. This action will create duplicate references, not adopt the given ones.

ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.

isStructurallyEmpty :: MonadMVar m => Ref (MergingTree m h) -> m Bool Source #

Test if a MergingTree is "obviously" empty by virtue of its structure. This is not the same as being empty due to a pending or ongoing merge happening to produce an empty run.

remainingMergeDebt :: (MonadMVar m, PrimMonad m) => Ref (MergingTree m h) -> m (MergeDebt, NumEntries) Source #

Calculate an upper bound on the merge credits required to complete the merge, i.e. turn the tree into a CompletedTreeMerge. For the recursive calculation, we also return an upper bound on the size of the resulting run.

Internal state

data MergingTreeState m h Source #

Constructors

CompletedTreeMerge !(Ref (Run m h))

Output run

OngoingTreeMerge !(Ref (MergingRun TreeMergeType m h))

Reuses MergingRun to allow sharing existing merges.

PendingTreeMerge !(PendingMerge m h) 

data PendingMerge m h Source #

A merge that is waiting for its inputs to complete.

Constructors

PendingLevelMerge !(Vector (PreExistingRun m h)) !(Maybe (Ref (MergingTree m h)))

The collection of inputs is the entire contents of a table, i.e. its (merging) runs and finally a union merge (if that table already contained a union).

PendingUnionMerge !(Vector (Ref (MergingTree m h)))

Each input is the entire content of a table (as a merging tree).