| Safe Haskell | Safe-Inferred | 
|---|---|
| Language | GHC2021 | 
FormatPage
Description
This accompanies the format-page.md documentation as a sanity check and a precise reference. It is intended to demonstrate that the page format works. It is also used as a reference implementation for tests of the real implementation.
Logically, a page is a sequence of key,operation pairs (with optional blobrefs), sorted by key, and its serialised form fits within a disk page.
This reference implementation covers serialisation and deserialisation (not lookups) which do not rely on (or enforce) the keys being sorted.
Synopsis
- newtype Key = Key {
- unKey :: ByteString
 
 - data Operation
 - opHasBlobRef :: Operation -> Bool
 - newtype Value = Value {}
 - data BlobRef = BlobRef Word64 Word32
 - type PageSerialised = ByteString
 - data PageIntermediate = PageIntermediate {
- pageNumKeys :: !Word16
 - pageNumBlobs :: !Word16
 - pageSizesOffsets :: !PageSizesOffsets
 - pageBlobRefBitmap :: [Bool]
 - pageOperations :: [OperationEnum]
 - pageBlobRefs :: [BlobRef]
 - pageKeyOffsets :: [Word16]
 - pageValueOffsets :: Either [Word16] (Word16, Word32)
 - pageKeys :: !ByteString
 - pageValues :: !ByteString
 - pagePadding :: !ByteString
 - pageDiskPageSize :: !DiskPageSize
 
 - data PageSizesOffsets = PageSizesOffsets {
- sizeDirectory :: !Word16
 - sizeBlobRefBitmap :: !Word16
 - sizeOperations :: !Word16
 - sizeBlobRefs :: !Word16
 - sizeKeyOffsets :: !Word16
 - sizeValueOffsets :: !Word16
 - sizeKeys :: !Word16
 - sizeValues :: !Word32
 - offBlobRefBitmap :: !Word16
 - offOperations :: !Word16
 - offBlobRefs :: !Word16
 - offKeyOffsets :: !Word16
 - offValueOffsets :: !Word16
 - offKeys :: !Word16
 - offValues :: !Word16
 - sizePageUsed :: !Word32
 - sizePagePadding :: !Word32
 - sizePageDiskPage :: !Word32
 
 - data PageSize = PageSize {
- pageSizeElems :: !Int
 - pageSizeBlobs :: !Int
 - pageSizeBytes :: !Int
 - pageSizeDisk :: !DiskPageSize
 
 - pageSizeEmpty :: DiskPageSize -> PageSize
 - pageSizeAddElem :: (Key, Operation) -> PageSize -> Maybe PageSize
 - calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize
 - data DiskPageSize
 - diskPageSizeBytes :: DiskPageSize -> Int
 - encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate
 - decodePage :: PageIntermediate -> [(Key, Operation)]
 - serialisePage :: PageIntermediate -> PageSerialised
 - deserialisePage :: DiskPageSize -> PageSerialised -> PageIntermediate
 - fromBitmap :: [Word64] -> [Bool]
 - toBitmap :: [Bool] -> [Word64]
 - pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int)
 - pageDiskPages :: PageIntermediate -> Int
 - pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString]
 - data PageContentFits = PageContentFits DiskPageSize [(Key, Operation)]
 - genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
 - data PageContentMaybeOverfull = PageContentMaybeOverfull DiskPageSize [(Key, Operation)]
 - genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
 - data PageContentSingle = PageContentSingle DiskPageSize Key Operation
 - genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation)
 - genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
 - genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
 - newtype MinKeySize = MinKeySize Int
 - noMinKeySize :: MinKeySize
 - maxKeySize :: DiskPageSize -> Int
 - orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)]
 - shrinkKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
 - shrinkOrderedKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
 
Page types
Constructors
| Key | |
Fields 
  | |
opHasBlobRef :: Operation -> Bool Source #
type PageSerialised = ByteString Source #
A serialised page consists of either a single disk page or several
 disk pages. The latter is a primary page followed by one or more overflow
 pages. Each disk page (single or multi) uses the same DiskPageSize, which
 should be known from context (e.g. configuration).
data PageIntermediate Source #
Constructors
| PageIntermediate | |
Fields 
  | |
Instances
| Show PageIntermediate Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageIntermediate -> ShowS # show :: PageIntermediate -> String # showList :: [PageIntermediate] -> ShowS #  | |
| Eq PageIntermediate Source # | |
Defined in FormatPage Methods (==) :: PageIntermediate -> PageIntermediate -> Bool # (/=) :: PageIntermediate -> PageIntermediate -> Bool #  | |
data PageSizesOffsets Source #
Constructors
| PageSizesOffsets | |
Fields 
  | |
Instances
| Show PageSizesOffsets Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageSizesOffsets -> ShowS # show :: PageSizesOffsets -> String # showList :: [PageSizesOffsets] -> ShowS #  | |
| Eq PageSizesOffsets Source # | |
Defined in FormatPage Methods (==) :: PageSizesOffsets -> PageSizesOffsets -> Bool # (/=) :: PageSizesOffsets -> PageSizesOffsets -> Bool #  | |
Page size
Constructors
| PageSize | |
Fields 
  | |
pageSizeEmpty :: DiskPageSize -> PageSize Source #
calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize Source #
Encoding and decoding
data DiskPageSize Source #
A serialised page fits within chunks of memory of 4k, 8k, 16k, 32k or 64k.
Constructors
| DiskPage4k | |
| DiskPage8k | |
| DiskPage16k | |
| DiskPage32k | |
| DiskPage64k | 
Instances
| Arbitrary DiskPageSize Source # | |
Defined in FormatPage  | |
| Bounded DiskPageSize Source # | |
Defined in FormatPage  | |
| Enum DiskPageSize Source # | |
Defined in FormatPage Methods succ :: DiskPageSize -> DiskPageSize # pred :: DiskPageSize -> DiskPageSize # toEnum :: Int -> DiskPageSize # fromEnum :: DiskPageSize -> Int # enumFrom :: DiskPageSize -> [DiskPageSize] # enumFromThen :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromTo :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromThenTo :: DiskPageSize -> DiskPageSize -> DiskPageSize -> [DiskPageSize] #  | |
| Show DiskPageSize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> DiskPageSize -> ShowS # show :: DiskPageSize -> String # showList :: [DiskPageSize] -> ShowS #  | |
| Eq DiskPageSize Source # | |
Defined in FormatPage  | |
diskPageSizeBytes :: DiskPageSize -> Int Source #
encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate Source #
decodePage :: PageIntermediate -> [(Key, Operation)] Source #
fromBitmap :: [Word64] -> [Bool] Source #
Overflow pages
pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int) Source #
If a page uses overflow pages, return the:
- value prefix length (within the first page)
 - value suffix length (within the overflow pages)
 
pageDiskPages :: PageIntermediate -> Int Source #
The total number of disk pages, including any overflow pages.
pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString] Source #
Generators and shrinkers
data PageContentFits Source #
Constructors
| PageContentFits DiskPageSize [(Key, Operation)] | 
Instances
| Arbitrary PageContentFits Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentFits Source # shrink :: PageContentFits -> [PageContentFits] Source #  | |
| Show PageContentFits Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentFits -> ShowS # show :: PageContentFits -> String # showList :: [PageContentFits] -> ShowS #  | |
genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is guaranteed to fit into a given disk page size.
The distribution is designed to cover:
- small pages
 - medium sized pages
 - nearly full pages
 - plus single key pages (possibly using one or more overflow pages)
 - a corner case of a single large key/operation pair followed by some small key op pairs.
 
The keys are not ordered: use orderdKeyOps to sort and de-duplicate them
 if that is needed (but note this will change the order of key sizes).
data PageContentMaybeOverfull Source #
Constructors
| PageContentMaybeOverfull DiskPageSize [(Key, Operation)] | 
Instances
| Arbitrary PageContentMaybeOverfull Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentMaybeOverfull Source # shrink :: PageContentMaybeOverfull -> [PageContentMaybeOverfull] Source #  | |
| Show PageContentMaybeOverfull Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentMaybeOverfull -> ShowS # show :: PageContentMaybeOverfull -> String # showList :: [PageContentMaybeOverfull] -> ShowS #  | |
genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is not guaranteed to fit into a given disk page size.
These test cases are useful for checking the boundary conditions for what
 can fit into a disk page. This covers a similar distribution to
 genPageContentFits but also includes about 20% of pages that are over full,
 including the corner case of a large key ops pair followed by smaller key op
 pairs (again possibly over full).
The keys are not ordered: use orderdKeyOps to sort and de-duplicate them
 if that is needed.
data PageContentSingle Source #
Constructors
| PageContentSingle DiskPageSize Key Operation | 
Instances
| Arbitrary PageContentSingle Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentSingle Source # shrink :: PageContentSingle -> [PageContentSingle] Source #  | |
| Show PageContentSingle Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentSingle -> ShowS # show :: PageContentSingle -> String # showList :: [PageContentSingle] -> ShowS #  | |
genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation) Source #
Generate a test case consisting of a single key/operation pair.
genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
Generate only pages that are nearly full. This isn't the maximum possible size, but where adding one more randomly-chosen key/op pair would not fit (but perhaps a smaller pair would still fit).
Consider if you really need this: the genPageContentMedium also includes
 these cases naturally as part of its distribution. On the other hand, this
 can be good for generating benchmark data.
genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
This generates a reasonable "middle" distribution of page sizes (relative to the given disk page size). In particular it covers:
- small pages (~45% for 4k pages, ~15% for 64k pages)
 - near-maximum pages (~20% for 4k pages, ~20% for 64k pages)
 - some in between (~35% for 4k pages, ~60% for 64k pages)
 
The numbers above are when used with the normal arbitrary Key and
 Value generators. And with these generators, it tends to use lots of
 small-to-medium size keys and values, rather than a few huge ones.
newtype MinKeySize Source #
In some use cases it is necessary to generate Keys that are at least of
 some minimum length. Use noMinKeySize if no such constraint is need.
Constructors
| MinKeySize Int | 
Instances
| Show MinKeySize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> MinKeySize -> ShowS # show :: MinKeySize -> String # showList :: [MinKeySize] -> ShowS #  | |
noMinKeySize :: MinKeySize Source #
No minimum key size: MinKeySize 0.
maxKeySize :: DiskPageSize -> Int Source #
The maximum size of key that is guaranteed to always fit in an empty 4k page. So this is a worst case maximum size: this size key will fit irrespective of the corresponding operation, including the possibility that the key/op pair has a blob reference.
orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)] Source #
Sort and de-duplicate a key/operation sequence to ensure the sequence is strictly ascending by key.
If you need this in a QC generator, you will need shrinkOrderedKeyOps in
 the corresponding shrinker.