lsm-tree-0.1.0.0: Log-structured merge-trees
Safe HaskellSafe-Inferred
LanguageGHC2021

FormatPage

Description

This accompanies the format-page.md documentation as a sanity check and a precise reference. It is intended to demonstrate that the page format works. It is also used as a reference implementation for tests of the real implementation.

Logically, a page is a sequence of key,operation pairs (with optional blobrefs), sorted by key, and its serialised form fits within a disk page.

This reference implementation covers serialisation and deserialisation (not lookups) which do not rely on (or enforce) the keys being sorted.

Synopsis

Page types

newtype Key Source #

Constructors

Key 

Fields

Instances

Instances details
Arbitrary Key Source # 
Instance details

Defined in FormatPage

Show Key Source # 
Instance details

Defined in FormatPage

Methods

showsPrec :: Int -> Key -> ShowS #

show :: Key -> String #

showList :: [Key] -> ShowS #

Eq Key Source # 
Instance details

Defined in FormatPage

Methods

(==) :: Key -> Key -> Bool #

(/=) :: Key -> Key -> Bool #

Ord Key Source # 
Instance details

Defined in FormatPage

Methods

compare :: Key -> Key -> Ordering #

(<) :: Key -> Key -> Bool #

(<=) :: Key -> Key -> Bool #

(>) :: Key -> Key -> Bool #

(>=) :: Key -> Key -> Bool #

max :: Key -> Key -> Key #

min :: Key -> Key -> Key #

data Operation Source #

Instances

Instances details
Arbitrary Operation Source # 
Instance details

Defined in FormatPage

Show Operation Source # 
Instance details

Defined in FormatPage

Eq Operation Source # 
Instance details

Defined in FormatPage

newtype Value Source #

Constructors

Value 

Fields

Instances

Instances details
Arbitrary Value Source # 
Instance details

Defined in FormatPage

Show Value Source # 
Instance details

Defined in FormatPage

Methods

showsPrec :: Int -> Value -> ShowS #

show :: Value -> String #

showList :: [Value] -> ShowS #

Eq Value Source # 
Instance details

Defined in FormatPage

Methods

(==) :: Value -> Value -> Bool #

(/=) :: Value -> Value -> Bool #

data BlobRef Source #

Constructors

BlobRef Word64 Word32 

Instances

Instances details
Arbitrary BlobRef Source # 
Instance details

Defined in FormatPage

Show BlobRef Source # 
Instance details

Defined in FormatPage

Eq BlobRef Source # 
Instance details

Defined in FormatPage

Methods

(==) :: BlobRef -> BlobRef -> Bool #

(/=) :: BlobRef -> BlobRef -> Bool #

type PageSerialised = ByteString Source #

A serialised page consists of either a single disk page or several disk pages. The latter is a primary page followed by one or more overflow pages. Each disk page (single or multi) uses the same DiskPageSize, which should be known from context (e.g. configuration).

Page size

data PageSize Source #

Instances

Instances details
Show PageSize Source # 
Instance details

Defined in FormatPage

Eq PageSize Source # 
Instance details

Defined in FormatPage

Encoding and decoding

Overflow pages

pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int) Source #

If a page uses overflow pages, return the:

  1. value prefix length (within the first page)
  2. value suffix length (within the overflow pages)

pageDiskPages :: PageIntermediate -> Int Source #

The total number of disk pages, including any overflow pages.

Tests and generators

Generators and shrinkers

genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #

Generate a test case consisting of a key/operation sequence that is guaranteed to fit into a given disk page size.

The distribution is designed to cover:

  • small pages
  • medium sized pages
  • nearly full pages
  • plus single key pages (possibly using one or more overflow pages)
  • a corner case of a single large key/operation pair followed by some small key op pairs.

The keys are not ordered: use orderdKeyOps to sort and de-duplicate them if that is needed (but note this will change the order of key sizes).

genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #

Generate a test case consisting of a key/operation sequence that is not guaranteed to fit into a given disk page size.

These test cases are useful for checking the boundary conditions for what can fit into a disk page. This covers a similar distribution to genPageContentFits but also includes about 20% of pages that are over full, including the corner case of a large key ops pair followed by smaller key op pairs (again possibly over full).

The keys are not ordered: use orderdKeyOps to sort and de-duplicate them if that is needed.

genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation) Source #

Generate a test case consisting of a single key/operation pair.

genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #

Generate only pages that are nearly full. This isn't the maximum possible size, but where adding one more randomly-chosen key/op pair would not fit (but perhaps a smaller pair would still fit).

Consider if you really need this: the genPageContentMedium also includes these cases naturally as part of its distribution. On the other hand, this can be good for generating benchmark data.

genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #

This generates a reasonable "middle" distribution of page sizes (relative to the given disk page size). In particular it covers:

  • small pages (~45% for 4k pages, ~15% for 64k pages)
  • near-maximum pages (~20% for 4k pages, ~20% for 64k pages)
  • some in between (~35% for 4k pages, ~60% for 64k pages)

The numbers above are when used with the normal arbitrary Key and Value generators. And with these generators, it tends to use lots of small-to-medium size keys and values, rather than a few huge ones.

newtype MinKeySize Source #

In some use cases it is necessary to generate Keys that are at least of some minimum length. Use noMinKeySize if no such constraint is need.

Constructors

MinKeySize Int 

Instances

Instances details
Show MinKeySize Source # 
Instance details

Defined in FormatPage

noMinKeySize :: MinKeySize Source #

No minimum key size: MinKeySize 0.

orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)] Source #

Sort and de-duplicate a key/operation sequence to ensure the sequence is strictly ascending by key.

If you need this in a QC generator, you will need shrinkOrderedKeyOps in the corresponding shrinker.

shrinkKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]] Source #

Shrink a key/operation sequence (without regard to key order).

shrinkOrderedKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]] Source #

Shrink a key/operation sequence, preserving key order.