lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

Database.LSMTree.Internal.MergingRun

Description

An incremental merge of multiple runs.

Synopsis

Merging run

data MergingRun t m h Source #

Instances

Instances details
RefCounted m (MergingRun t m h) Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

new :: (IsMergeType t, MonadMVar m, MonadMask m, MonadSTM m, MonadST m) => HasFS m h -> HasBlockIO m h -> ResolveSerialisedValue -> RunParams -> t -> RunFsPaths -> Vector (Ref (Run m h)) -> m (Ref (MergingRun t m h)) Source #

Create a new merging run, returning a reference to it that must ultimately be released via releaseRef.

Duplicates the supplied references to the runs.

This function should be run with asynchronous exceptions masked to prevent failing after internal resources have already been created.

newCompleted Source #

Arguments

:: (MonadMVar m, MonadMask m, MonadSTM m, MonadST m) 
=> MergeDebt

Since there are no longer any input runs, we need to be told what the merge debt was.

-> Ref (Run m h) 
-> m (Ref (MergingRun t m h)) 

Create a merging run that is already in the completed state, returning a reference that must ultimately be released via releaseRef.

Duplicates the supplied reference to the run.

This function should be run with asynchronous exceptions masked to prevent failing after internal resources have already been created.

duplicateRuns :: (PrimMonad m, MonadMVar m, MonadMask m) => Ref (MergingRun t m h) -> m (Vector (Ref (Run m h))) Source #

Create references to the runs that should be queried for lookups. In particular, if the merge is not complete, these are the input runs.

TODO: This interface doesn't work well with the action registry. Just doing withRollback reg (duplicateRuns mr) (mapM_ releaseRef) isn't exception-safe since if one of the releaseRef calls fails, the following ones aren't run.

remainingMergeDebt :: (MonadMVar m, PrimMonad m) => Ref (MergingRun t m h) -> m (MergeDebt, NumEntries) Source #

Calculate the merge credits required to complete the merge, as well as an upper bound on the size of the resulting run.

supplyChecked :: forall m r s. (HasCallStack, Monad m) => (r -> m (MergeDebt, s)) -> (r -> MergeCredits -> m MergeCredits) -> r -> MergeCredits -> m MergeCredits Source #

Helper function to assert common invariants for functions that supply credits.

supplyCreditsRelative :: forall t m h. (MonadSTM m, MonadST m, MonadMVar m, MonadMask m) => Ref (MergingRun t m h) -> CreditThreshold -> MergeCredits -> m MergeCredits Source #

Supply the given amount of credits to a merging run. This may cause an ongoing merge to progress.

The credits are given in relative terms: as an addition to the current supplied credits. See supplyCreditsAbsolute to set the supplied credits to an absolute value.

The result is the number of credits left over. This will be non-zero if the credits supplied would take the total supplied credits over the total merge debt.

supplyCreditsAbsolute Source #

Arguments

:: forall t m h. (MonadSTM m, MonadST m, MonadMVar m, MonadMask m) 
=> Ref (MergingRun t m h) 
-> CreditThreshold 
-> MergeCredits 
-> m (MergeCredits, MergeCredits)

(suppliedCredits, suppliedCredits')

Set the supplied credits to the given value, unless the current value is already greater. This may cause an ongoing merge to progress.

The credits are given in absolute terms: as the new value for the current supplied credits. See supplyCreditsRelative to set the supplied credits as a relative addition to the current value.

The given credit value must be no greater than the totalMergeDebt.

The result is the new value of the total supplied credits, which may be more than the specified value if the current value was already greater than the specified value.

The result is:

  1. The (absolute value of the) supplied credits beforehand.
  2. The (absolute value of the) supplied credits afterwards. This will be equal to the given value or to the supplied credits beforehand, whichever is the greater.

expectCompleted :: (MonadMVar m, MonadSTM m, MonadST m, MonadMask m) => Ref (MergingRun t m h) -> m (Ref (Run m h)) Source #

This does not release the reference, but allocates a new reference for the returned run, which must be released at some point.

snapshot :: (PrimMonad m, MonadMVar m) => Ref (MergingRun t m h) -> m (MergeDebt, MergeCredits, MergingRunState t m h) Source #

Take a snapshot of the state of a merging run.

TODO: this is not concurrency safe! The inputs runs to the merging run could be released concurrently by another thread that completes the merge, while the snapshot is taking place. The solution is for snapshot here to duplicate the runs it returns _while_ holding the mergeState MVar (to exclude threads that might concurrently complete the merge). And then the caller of course must be updated to release the extra references.

mergeType :: MonadMVar m => Ref (MergingRun t m h) -> m (Maybe t) Source #

Merge types

class IsMergeType t where Source #

Merges can either exist on a level of the LSM, or be a union merge of two tables.

Methods

isLastLevel :: t -> Bool Source #

A last level merge behaves differently from a mid-level merge: last level merges can actually remove delete operations, whereas mid-level merges must preserve them.

isUnion :: t -> Bool Source #

Union merges follow the semantics of Data.Map.unionWith (<>). Since the input runs are semantically treated like Data.Maps, deletes are ignored and inserts act like mupserts, so they need to be merged monoidally using resolveValue.

data LevelMergeType Source #

Different types of merges created as part of a regular (non-union) level.

A last level merge behaves differently from a mid-level merge: last level merges can actually remove delete operations, whereas mid-level merges must preserve them. This is orthogonal to the MergePolicy.

data TreeMergeType Source #

Different types of merges created as part of the merging tree.

Constructors

MergeLevel 
MergeUnion 

Instances

Instances details
Show TreeMergeType Source # 
Instance details

Defined in Database.LSMTree.Internal.Merge

NFData TreeMergeType Source # 
Instance details

Defined in Database.LSMTree.Internal.Merge

Methods

rnf :: TreeMergeType -> () #

Eq TreeMergeType Source # 
Instance details

Defined in Database.LSMTree.Internal.Merge

IsMergeType TreeMergeType Source # 
Instance details

Defined in Database.LSMTree.Internal.Merge

DecodeVersioned TreeMergeType Source # 
Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

Encode TreeMergeType Source #

We start the tags for these merge types at an offset. This way, if we serialise MR.MergeMidLevel :: MR.LevelMergeType as 0 and then accidentally try deserialising it as a MR.TreeMergeType, that will fail.

However, LevelMergeType and TreeMergeType are only different (overlapping) subsets of MergeType. In particular, MergeLastLevel and MergeLevel are semantically the same. Encoding them as the same number leaves the door open to relaxing the restrictions on which merge types can occur where, e.g. decoding them as a general MergeType, without having to change the file format.

Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

Credit tracking

The credits and debt concept we use here comes from amortised analysis of data structures (see the Bankers Method from Okasaki). Though here we use it not as an analysis method but within the code itself for tracking the state of the scheduled (i.e. incremental) merge.

There are two notions of credits (and corresponding debt) in this LSM implementation: nominal credits and merge credits. The merging run deals exclusively with merge credits. See IncomingRun for nominal credits.

A single merge credit corresponds with a merge step performed. Merge steps are measured by the number of entries in the input runs that are consumed. We measure the merge in terms of inputs, not outputs, because the number of inputs is known beforehand, whereas the number of outputs is not. The total merge debt is therefore defined to be the sum of the number of entries across the input runs to the merge. Once the merge credits spent equals the merge debt then the merge is (or rather must be) complete.

In both the prototype and implementation we accumulate unspent credits until they reach a threshold at which point we do a batch of merging work. We track both credits spent and credits as yet unspent.

In the prototype, the credits spent equals the merge steps performed. The same holds in the real implementation, but making it so is more complicated. When we spend credits on merging work, the number of steps we perform is not guaranteed to be the same as the credits supplied. For example we may ask to do 100 credits of merging work, but the merge code (for perfectly sensible efficiency reasons) will decide to do 102 units of merging work. The rule is that we may do (slightly) more work than the credits supplied but not less. To account for this we spend more credits, corresponding to the excess merging work performed. We spend them by borrowing them from the unspent credits, which may leave the unspent credits with a negative balance.

Furthermore, the real implementation has to cope with concurrency: multiple threads sharing the same MergingRun and calling supplyCredits concurrently. The credit accounting thus needs to define the state of the credits while merging work is in progress by some thread. The approach we take is to define spent credit to include credits that are in the process of being spent, leaving unspent credit as credits that are available for a thread to spend on merging work.

Thus we track three things:

  • spent credits (SpentCredits): credits supplied that have been or are in the process of being spent on performing merging steps;
  • unspent credits (UnspentCredits): credits supplied that are not yet spent and are thus available to spend; and
  • merge debt (MergeDebt): the sum of the sizes of the input runs, and thus the total merge credits that have to be spent for the merge to be complete.

And define a derived measure:

  • supplied credits: the sum of the spent and unspent credits. This is therefore also the sum of all the credits that have been (successfully) supplied to a merging run via supplyCredits.

The supplied credits increases monotonically, even in the presence of (a)synchronous exceptions.

We guarantee that the supplied credits never exceeds the total debt.

When the supplied credits equals the merge debt then we may not have actually completed the merge (since that requires spending the credits) but we have the potential to complete the merge whenever needed without supplying any more credits.

The credits spent and the steps performed (or in the process of being performed) will typically be equal. They are not guaranteed to be equal in the presence of exceptions (synchronous or asynchronous). In this case we offer a weaker guarantee: : a merge may progress more steps than the number of credits that were spent. If an exception happens at some point during merging work, we will "unspend" all the credits we intended to spend, but we will not revert all merging steps that we already successfully performed before the exception. Thus we may do more merging steps than the credits we accounted as spent. This makes the implementation simple, and merges will still finish in time. It would be bad if we did not put back credits, because then a merge might not finish in time, which will mess up the shape of the levels tree.

numEntriesToMergeDebt :: NumEntries -> MergeDebt Source #

The total debt of the merging run is exactly the sum total number of entries across all the input runs to be merged.

newtype MergeCredits Source #

Constructors

MergeCredits Int 

Instances

Instances details
Enum MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Num MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Integral MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Real MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Show MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

NFData MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Methods

rnf :: MergeCredits -> () #

Eq MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

Ord MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

DecodeVersioned MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

Encode MergeCredits Source # 
Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

newtype CreditThreshold Source #

Unspent credits are accumulated until they go over the CreditThreshold, after which a batch of merge work will be performed. Configuring this threshold should allow to achieve a nice balance between spreading out I/O and achieving good (concurrent) performance.

Note that ideally the batch size for different LSM levels should be co-prime so that merge work at different levels is not synchronised.

Instances

Instances details
Show CreditThreshold Source # 
Instance details

Defined in Database.LSMTree.Internal.MergingRun

newtype SpentCredits Source #

The spent credits are supplied credits that have been spent on performing merging steps plus the supplied credits that are in the process of being spent (by some thread calling supplyCredits).

newtype UnspentCredits Source #

The unspent credits are supplied credits that have not yet been spent on performing merging steps and are available to spend.

Note: unspent credits may be negative! This can occur when more merge steps were performed than there were credits to cover. In this case the credits are borrowed from the unspent credits, which may result in the current unspent credits being negative for a time.

Concurrency

Merging runs can be shared across tables, which means that multiple threads can contribute to the same merge concurrently. The design to contribute credits to the same merging run is largely lock-free. It ensures consistency of the unspent credits and the merge state, while allowing threads to progress without waiting on other threads.

The entry point for merging is supplyCredits. This may be called by concurrent threads that share the same merging run. No locks are held initially.

The credits to supply can be specified as either an absolute or relative value. That is, we can ask that the number of supplied credits be set to a value N, or we can specify an additional N credits. increasing so in the absolute case, there is no change if the requested new supplied credit value is less than the current value. Supplying credits from the levels (via incoming runs) uses absolute credits, while supplying credits from merging trees using relative credits.

The main lock we will discuss is the mergeState StrictMVar, and we will refer to it as the merge lock.

We get the easy things out of the way first: the mergeKnownCompleted variable is purely an optimisation. It starts out as MergeMaybeCompleted and is only ever modified once to MergeKnownCompleted. It is modified with the merge lock held, but read without the lock. It does not matter if a thread reads a stale value of MergeMaybeCompleted. We can analyse the remainder of the algorithm as if we were always in the MergeMaybeCompleted state.

Variable access and locks:

  • CreditsVar contains the pair of the current SpentCredits and UnspentCredits. Is only operated upon using transactions (atomic CAS), and most of these transactions are done without the merge lock held. The two constituent components can increase and decrease, but the total supplied credits (sum of spent and unspent) can only increase.
  • MergeState contains the state of the merge itself. It is protected by the merge lock.

First, we do a moderately complex transaction atomicDepositAndSpendCredits, which does the following:

  • Deposit credits to the unspent pot, while guaranteeing that the total supplied credits does not exceed the total debt for the merging run.
  • Compute any leftover credits (that would have exceeded the total debt).
  • Compute the credits to spend on performing merge steps, depending on which of three cases we are in:
  1. we have supplied enough credits to complete the merge;
  2. not case 1, but enough unspent credits have accumulated to do a batch of merge work;
  3. not case 1 or 2, not enough credits to do any merge work.
  • Update the spent and unspent pots
  • Return the credits to spend now and any leftover credits.

If there are now credits to spend, then we attempt to perform that number of merging steps. While doing the merging work, the (more expensive) merge lock is taken to ensure that the merging work itself is performed only sequentially.

Note that it is not guaranteed that the merge gets completed, even if the credits supplied has reached the total debt. It may be interrupted during the merge (by an async exception). This does not matter because the merge will be completed in expectCompleted. Completing early is an optimisation.

If an exception occurs during the merge then the credits that were in the process of being spent are transferred back from the spent to the unspent pot using atomicSpendCredits (with a negative amount). It is this case that implies that the spent credits may not increase monotonically, even though the supplied credits do increase monotonically.

Once performing merge steps is done, if it turns out that excess merge steps were performed then we must do a further accounting transaction: atomicSpendCredits to spend the excess credits. This is done without respect to the balance of the unspent credits, which may result in the unspent credit balance becoming negative. This is ok, and will result in more credits having to be supplied next time before reaching the credit batch threshold. The unspent credits can not be negative by the time the merge is complete because the performing of merge steps cannot do excess steps when it reaches the end of the merge.

Internal state

data MergingRunState t m h Source #

Constructors

CompletedMerge !(Ref (Run m h))

Output run

OngoingMerge 

Fields

newtype CreditsVar s Source #

This holds the pair of the SpentCredits and the UnspentCredits. All operations on this pair are atomic.

The model to think about is a TVar (SpentCredits, UnspentCredits) but the physical representation is a single mutable unboxed 64bit signed Int, using 40 bits for the spent credits and 24 for the unspent credits. The spent credits are unsigned, while the unspent credits are signed, so 40 bits and 23+1 bits respectively. This imposes a limit of just over 1 trillion for the spent credits and thus run size, and 8.3 million for the unspent credits (23 + sign bit).

If these limits ever become restrictive, then the implementation could be changed to use a TVar or a double-word CAS (DWCAS, i.e. 128bit).

Constructors

CreditsVar (PrimVar s Int) 

Errors