lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

ScheduledMerges

Description

A prototype of an LSM with explicitly scheduled incremental merges.

The scheduled incremental merges is about ensuring that the merging work (CPU and I/O) can be spread out over time evenly. This also means the LSM update operations have worst case complexity rather than amortised complexity, because they do a fixed amount of merging work each.

Another thing this prototype demonstrates is a design for duplicating tables and sharing ongoing incremental merges.

Finally, it demonstrates a design for table unions, including a representation for in-progress merging trees.

The merging policy that this prototype uses is power 4 "lazy levelling". Power 4 means each level is 4 times bigger than the previous level. Lazy levelling means we use tiering for every level except the last level which uses levelling. Though note that the first level always uses tiering, even if the first level is also the last level. This is to simplify flushing the write buffer: if we used levelling on the first level we would need a code path for merging the write buffer into the first level.

Synopsis

Main API

data LSM s Source #

newtype Key Source #

Constructors

K Int 

Instances

Instances details
Arbitrary Key Source # 
Instance details

Defined in ScheduledMerges

Enum Key Source # 
Instance details

Defined in ScheduledMerges

Methods

succ :: Key -> Key #

pred :: Key -> Key #

toEnum :: Int -> Key #

fromEnum :: Key -> Int #

enumFrom :: Key -> [Key] #

enumFromThen :: Key -> Key -> [Key] #

enumFromTo :: Key -> Key -> [Key] #

enumFromThenTo :: Key -> Key -> Key -> [Key] #

Show Key Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> Key -> ShowS #

show :: Key -> String #

showList :: [Key] -> ShowS #

Eq Key Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: Key -> Key -> Bool #

(/=) :: Key -> Key -> Bool #

Ord Key Source # 
Instance details

Defined in ScheduledMerges

Methods

compare :: Key -> Key -> Ordering #

(<) :: Key -> Key -> Bool #

(<=) :: Key -> Key -> Bool #

(>) :: Key -> Key -> Bool #

(>=) :: Key -> Key -> Bool #

max :: Key -> Key -> Key #

min :: Key -> Key -> Key #

newtype Value Source #

Constructors

V Int 

Instances

Instances details
Arbitrary Value Source # 
Instance details

Defined in ScheduledMerges

Show Value Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> Value -> ShowS #

show :: Value -> String #

showList :: [Value] -> ShowS #

Eq Value Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: Value -> Value -> Bool #

(/=) :: Value -> Value -> Bool #

newtype Blob Source #

Constructors

B Int 

Instances

Instances details
Arbitrary Blob Source # 
Instance details

Defined in ScheduledMerges

Show Blob Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> Blob -> ShowS #

show :: Blob -> String #

showList :: [Blob] -> ShowS #

Eq Blob Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: Blob -> Blob -> Bool #

(/=) :: Blob -> Blob -> Bool #

new :: ST s (LSM s) Source #

data LookupResult v b Source #

Constructors

NotFound 
Found !v !(Maybe b) 

Instances

Instances details
(Show v, Show b) => Show (LookupResult v b) Source # 
Instance details

Defined in ScheduledMerges

(Eq v, Eq b) => Eq (LookupResult v b) Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: LookupResult v b -> LookupResult v b -> Bool #

(/=) :: LookupResult v b -> LookupResult v b -> Bool #

data Update v b Source #

Constructors

Insert !v !(Maybe b) 
Mupsert !v 
Delete 

Instances

Instances details
(Arbitrary v, Arbitrary b) => Arbitrary (Update v b) Source # 
Instance details

Defined in ScheduledMerges

Methods

arbitrary :: Gen (Update v b) Source #

shrink :: Update v b -> [Update v b] Source #

(Show b, Show v) => Show (Update v b) Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> Update v b -> ShowS #

show :: Update v b -> String #

showList :: [Update v b] -> ShowS #

(Eq b, Eq v) => Eq (Update v b) Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: Update v b -> Update v b -> Bool #

(/=) :: Update v b -> Update v b -> Bool #

update :: Tracer (ST s) Event -> LSM s -> Key -> Op -> ST s () Source #

updates :: Tracer (ST s) Event -> LSM s -> [(Key, Op)] -> ST s () Source #

insert :: Tracer (ST s) Event -> LSM s -> Key -> Value -> Maybe Blob -> ST s () Source #

inserts :: Tracer (ST s) Event -> LSM s -> [(Key, Value, Maybe Blob)] -> ST s () Source #

delete :: Tracer (ST s) Event -> LSM s -> Key -> ST s () Source #

deletes :: Tracer (ST s) Event -> LSM s -> [Key] -> ST s () Source #

mupsert :: Tracer (ST s) Event -> LSM s -> Key -> Value -> ST s () Source #

mupserts :: Tracer (ST s) Event -> LSM s -> [(Key, Value)] -> ST s () Source #

duplicate :: LSM s -> ST s (LSM s) Source #

unions :: [LSM s] -> ST s (LSM s) Source #

Similar to Data.Map.unionWith.

A call to union itself is not expensive, as the input tables are not immediately merged. Instead, it creates a representation of an in-progress merge that can be performed incrementally (somewhat similar to a thunk).

The more merge work remains, the more expensive are lookups on the table.

type Credit = Int Source #

Credits for keeping track of merge progress. These credits correspond directly to merge steps performed.

We also call these "physical" credits (since they correspond to steps done), and as opposed to "nominal" credits in NominalCredit and NominalDebt.

type Debt = Int Source #

Debt for keeping track of the total merge work to do.

remainingUnionDebt :: LSM s -> ST s UnionDebt Source #

Return the current union debt. This debt can be reduced until it is paid off using supplyUnionCredits.

supplyUnionCredits :: LSM s -> UnionCredits -> ST s UnionCredits Source #

Supply union credits to reduce union debt.

Supplying union credits leads to union merging work being performed in batches. This reduces the union debt returned by remainingUnionDebt. Union debt will be reduced by at least the number of supplied union credits. It is therefore advisable to query remainingUnionDebt every once in a while to see what the current debt is.

This function returns any surplus of union credits as leftover credits when a union has finished. In particular, if the returned number of credits is non-negative, then the union is finished.

Test and trace

data MTree r Source #

Constructors

MLeaf r 
MNode TreeMergeType [MTree r] 

Instances

Instances details
Foldable MTree Source # 
Instance details

Defined in ScheduledMerges

Methods

fold :: Monoid m => MTree m -> m #

foldMap :: Monoid m => (a -> m) -> MTree a -> m #

foldMap' :: Monoid m => (a -> m) -> MTree a -> m #

foldr :: (a -> b -> b) -> b -> MTree a -> b #

foldr' :: (a -> b -> b) -> b -> MTree a -> b #

foldl :: (b -> a -> b) -> b -> MTree a -> b #

foldl' :: (b -> a -> b) -> b -> MTree a -> b #

foldr1 :: (a -> a -> a) -> MTree a -> a #

foldl1 :: (a -> a -> a) -> MTree a -> a #

toList :: MTree a -> [a] #

null :: MTree a -> Bool #

length :: MTree a -> Int #

elem :: Eq a => a -> MTree a -> Bool #

maximum :: Ord a => MTree a -> a #

minimum :: Ord a => MTree a -> a #

sum :: Num a => MTree a -> a #

product :: Num a => MTree a -> a #

Functor MTree Source # 
Instance details

Defined in ScheduledMerges

Methods

fmap :: (a -> b) -> MTree a -> MTree b #

(<$) :: a -> MTree b -> MTree a #

Show r => Show (MTree r) Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> MTree r -> ShowS #

show :: MTree r -> String #

showList :: [MTree r] -> ShowS #

Eq r => Eq (MTree r) Source # 
Instance details

Defined in ScheduledMerges

Methods

(==) :: MTree r -> MTree r -> Bool #

(/=) :: MTree r -> MTree r -> Bool #

type Representation = (Run, [LevelRepresentation], Maybe (MTree Run)) Source #

data EventAt e Source #

Constructors

EventAt 

Fields

Instances

Instances details
Show e => Show (EventAt e) Source # 
Instance details

Defined in ScheduledMerges

Methods

showsPrec :: Int -> EventAt e -> ShowS #

show :: EventAt e -> String #

showList :: [EventAt e] -> ShowS #

newtype MergingTree s Source #

A "merging tree" is a mutable representation of an incremental tree-shaped nested merge. This allows to represent union merges of entire tables, each of which itself first need to be merged to become a single run.

Trees have to support arbitrarily deep nesting, since each input to union might already contain an in-progress merging tree (which then becomes shared between multiple tables).

See Note [Table Unions].

Constructors

MergingTree (STRef s (MergingTreeState s)) 

data MergingTreeState s Source #

Constructors

CompletedTreeMerge !Run 
OngoingTreeMerge !(MergingRun TreeMergeType s)

Reuses MergingRun (with its STRef) to allow sharing existing merges.

PendingTreeMerge !(PendingMerge s) 

data PendingMerge s Source #

A merge that is waiting for its inputs to complete.

The inputs can themselves be MergingTrees (with its STRef) to allow sharing existing unions.

Constructors

PendingLevelMerge ![PreExistingRun s] !(Maybe (MergingTree s))

The inputs are entire content of a table, i.e. its (merging) runs and finally a union merge (if that table already contained a union).

PendingUnionMerge ![MergingTree s]

Each input is a level merge of the entire content of a table.

data PreExistingRun s Source #

This is much like an IncomingRun, and are created from them, but contain only the essential information needed in a PendingLevelMerge.

data MergingRun t s Source #

A "merging run" is a mutable representation of an incremental merge. It is also a unit of sharing between duplicated tables.

Constructors

MergingRun !t !MergeDebt !(STRef s MergingRunState) 

data MergingRunState Source #

Constructors

CompletedMerge !Run 
OngoingMerge 

Fields

data MergePolicy Source #

The merge policy for a LSM level can be either tiering or levelling. In this design we use levelling for the last level, and tiering for all other levels. The first level always uses tiering however, even if it's also the last level. So MergePolicy and LevelMergeType are orthogonal, all combinations are possible.

Instances

Instances details
Show MergePolicy Source # 
Instance details

Defined in ScheduledMerges

Eq MergePolicy Source # 
Instance details

Defined in ScheduledMerges

class Show t => IsMergeType t where Source #

Merges can exist in different parts of the LSM, each with different options for the exact merge operation performed.

Methods

isLastLevel :: t -> Bool Source #

isUnion :: t -> Bool Source #

data TreeMergeType Source #

Different types of merges created as part of the merging tree.

Union merges follow the semantics of Data.Map.unionWith (<>). Since the input runs are semantically treated like Data.Maps, deletes are ignored and inserts act like mupserts, so they need to be merged monoidally using resolveValue.

Trees can only exist on the union level, which is the last. Therefore, node merges can always drop deletes.

Constructors

MergeLevel 
MergeUnion 

data LevelMergeType Source #

Different types of merges created as part of a regular (non-union) level.

A last level merge behaves differently from a mid-level merge: last level merges can actually remove delete operations, whereas mid-level merges must preserve them. This is orthogonal to the MergePolicy.

data MergeCredit Source #

Constructors

MergeCredit 

Instances

Instances details
Show MergeCredit Source # 
Instance details

Defined in ScheduledMerges

newtype MergeDebt Source #

Constructors

MergeDebt 

Fields

Instances

Instances details
Show MergeDebt Source # 
Instance details

Defined in ScheduledMerges

newtype NominalCredit Source #

Nominal credit is the credit supplied to each level as we insert update operations, one credit per update operation inserted.

Nominal credit must be supplied up to the NominalDebt to ensure the merge is complete.

Nominal credits are a similar order of magnitude to physical credits (see Credit) but not the same, and we have to scale linearly to convert between them. Physical credits are the actual number of inputs to the merge, which may be somewhat more or somewhat less than the number of update operations we will insert before we need the merge to be complete.

Constructors

NominalCredit Credit 

Instances

Instances details
Show NominalCredit Source # 
Instance details

Defined in ScheduledMerges

newtype NominalDebt Source #

The nominal debt for a merging run is the worst case (minimum) number of update operations we expect to insert before we expect the merge to be complete.

We require that an equal amount of nominal credit is supplied before we can expect a merge to be complete.

We scale linearly to convert nominal credits to physical credits, such that the nominal debt and physical debt are both considered "100%", and so that both debts are paid off at exactly the same time.

Constructors

NominalDebt Credit 

Instances

Instances details
Show NominalDebt Source # 
Instance details

Defined in ScheduledMerges

type Run = Map Key Op Source #

newtype UnionCredits Source #

Credits are used to pay off UnionDebt, completing a union in the process.

A union credit corresponds to a single merging step being performed.

Constructors

UnionCredits Credit 

newtype UnionDebt Source #

The current upper bound on the number of UnionCredits that have to be supplied before a union is completed.

The union debt is the number of merging steps that need to be performed /at most/ until the delayed work of performing a union is completed. This includes the cost of completing merges that were part of the union's input tables.

Constructors

UnionDebt Debt 

mergek :: IsMergeType t => t -> [Run] -> Run Source #

Invariants