Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
FormatPage
Description
This accompanies the format-page.md documentation as a sanity check and a precise reference. It is intended to demonstrate that the page format works. It is also used as a reference implementation for tests of the real implementation.
Logically, a page is a sequence of key,operation pairs (with optional blobrefs), sorted by key, and its serialised form fits within a disk page.
This reference implementation covers serialisation and deserialisation (not lookups) which do not rely on (or enforce) the keys being sorted.
Synopsis
- newtype Key = Key {
- unKey :: ByteString
- data Operation
- opHasBlobRef :: Operation -> Bool
- newtype Value = Value {}
- data BlobRef = BlobRef Word64 Word32
- type PageSerialised = ByteString
- data PageIntermediate = PageIntermediate {
- pageNumKeys :: !Word16
- pageNumBlobs :: !Word16
- pageSizesOffsets :: !PageSizesOffsets
- pageBlobRefBitmap :: [Bool]
- pageOperations :: [OperationEnum]
- pageBlobRefs :: [BlobRef]
- pageKeyOffsets :: [Word16]
- pageValueOffsets :: Either [Word16] (Word16, Word32)
- pageKeys :: !ByteString
- pageValues :: !ByteString
- pagePadding :: !ByteString
- pageDiskPageSize :: !DiskPageSize
- data PageSizesOffsets = PageSizesOffsets {
- sizeDirectory :: !Word16
- sizeBlobRefBitmap :: !Word16
- sizeOperations :: !Word16
- sizeBlobRefs :: !Word16
- sizeKeyOffsets :: !Word16
- sizeValueOffsets :: !Word16
- sizeKeys :: !Word16
- sizeValues :: !Word32
- offBlobRefBitmap :: !Word16
- offOperations :: !Word16
- offBlobRefs :: !Word16
- offKeyOffsets :: !Word16
- offValueOffsets :: !Word16
- offKeys :: !Word16
- offValues :: !Word16
- sizePageUsed :: !Word32
- sizePagePadding :: !Word32
- sizePageDiskPage :: !Word32
- data PageSize = PageSize {
- pageSizeElems :: !Int
- pageSizeBlobs :: !Int
- pageSizeBytes :: !Int
- pageSizeDisk :: !DiskPageSize
- pageSizeEmpty :: DiskPageSize -> PageSize
- pageSizeAddElem :: (Key, Operation) -> PageSize -> Maybe PageSize
- calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize
- data DiskPageSize
- diskPageSizeBytes :: DiskPageSize -> Int
- encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate
- decodePage :: PageIntermediate -> [(Key, Operation)]
- serialisePage :: PageIntermediate -> PageSerialised
- deserialisePage :: DiskPageSize -> PageSerialised -> PageIntermediate
- fromBitmap :: [Word64] -> [Bool]
- toBitmap :: [Bool] -> [Word64]
- pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int)
- pageDiskPages :: PageIntermediate -> Int
- pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString]
- data PageContentFits = PageContentFits DiskPageSize [(Key, Operation)]
- genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
- data PageContentMaybeOverfull = PageContentMaybeOverfull DiskPageSize [(Key, Operation)]
- genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
- data PageContentSingle = PageContentSingle DiskPageSize Key Operation
- genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation)
- genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
- genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
- newtype MinKeySize = MinKeySize Int
- noMinKeySize :: MinKeySize
- maxKeySize :: DiskPageSize -> Int
- orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)]
- shrinkKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
- shrinkOrderedKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
Page types
Constructors
Key | |
Fields
|
opHasBlobRef :: Operation -> Bool Source #
type PageSerialised = ByteString Source #
A serialised page consists of either a single disk page or several
disk pages. The latter is a primary page followed by one or more overflow
pages. Each disk page (single or multi) uses the same DiskPageSize
, which
should be known from context (e.g. configuration).
data PageIntermediate Source #
Constructors
PageIntermediate | |
Fields
|
Instances
Show PageIntermediate Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageIntermediate -> ShowS # show :: PageIntermediate -> String # showList :: [PageIntermediate] -> ShowS # | |
Eq PageIntermediate Source # | |
Defined in FormatPage Methods (==) :: PageIntermediate -> PageIntermediate -> Bool # (/=) :: PageIntermediate -> PageIntermediate -> Bool # |
data PageSizesOffsets Source #
Constructors
PageSizesOffsets | |
Fields
|
Instances
Show PageSizesOffsets Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageSizesOffsets -> ShowS # show :: PageSizesOffsets -> String # showList :: [PageSizesOffsets] -> ShowS # | |
Eq PageSizesOffsets Source # | |
Defined in FormatPage Methods (==) :: PageSizesOffsets -> PageSizesOffsets -> Bool # (/=) :: PageSizesOffsets -> PageSizesOffsets -> Bool # |
Page size
Constructors
PageSize | |
Fields
|
pageSizeEmpty :: DiskPageSize -> PageSize Source #
calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize Source #
Encoding and decoding
data DiskPageSize Source #
A serialised page fits within chunks of memory of 4k, 8k, 16k, 32k or 64k.
Constructors
DiskPage4k | |
DiskPage8k | |
DiskPage16k | |
DiskPage32k | |
DiskPage64k |
Instances
Arbitrary DiskPageSize Source # | |
Defined in FormatPage | |
Bounded DiskPageSize Source # | |
Defined in FormatPage | |
Enum DiskPageSize Source # | |
Defined in FormatPage Methods succ :: DiskPageSize -> DiskPageSize # pred :: DiskPageSize -> DiskPageSize # toEnum :: Int -> DiskPageSize # fromEnum :: DiskPageSize -> Int # enumFrom :: DiskPageSize -> [DiskPageSize] # enumFromThen :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromTo :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromThenTo :: DiskPageSize -> DiskPageSize -> DiskPageSize -> [DiskPageSize] # | |
Show DiskPageSize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> DiskPageSize -> ShowS # show :: DiskPageSize -> String # showList :: [DiskPageSize] -> ShowS # | |
Eq DiskPageSize Source # | |
Defined in FormatPage |
diskPageSizeBytes :: DiskPageSize -> Int Source #
encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate Source #
decodePage :: PageIntermediate -> [(Key, Operation)] Source #
fromBitmap :: [Word64] -> [Bool] Source #
Overflow pages
pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int) Source #
If a page uses overflow pages, return the:
- value prefix length (within the first page)
- value suffix length (within the overflow pages)
pageDiskPages :: PageIntermediate -> Int Source #
The total number of disk pages, including any overflow pages.
pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString] Source #
Generators and shrinkers
data PageContentFits Source #
Constructors
PageContentFits DiskPageSize [(Key, Operation)] |
Instances
Arbitrary PageContentFits Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentFits Source # shrink :: PageContentFits -> [PageContentFits] Source # | |
Show PageContentFits Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentFits -> ShowS # show :: PageContentFits -> String # showList :: [PageContentFits] -> ShowS # |
genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is guaranteed to fit into a given disk page size.
The distribution is designed to cover:
- small pages
- medium sized pages
- nearly full pages
- plus single key pages (possibly using one or more overflow pages)
- a corner case of a single large key/operation pair followed by some small key op pairs.
The keys are not ordered: use orderdKeyOps
to sort and de-duplicate them
if that is needed (but note this will change the order of key sizes).
data PageContentMaybeOverfull Source #
Constructors
PageContentMaybeOverfull DiskPageSize [(Key, Operation)] |
Instances
Arbitrary PageContentMaybeOverfull Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentMaybeOverfull Source # shrink :: PageContentMaybeOverfull -> [PageContentMaybeOverfull] Source # | |
Show PageContentMaybeOverfull Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentMaybeOverfull -> ShowS # show :: PageContentMaybeOverfull -> String # showList :: [PageContentMaybeOverfull] -> ShowS # |
genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is not guaranteed to fit into a given disk page size.
These test cases are useful for checking the boundary conditions for what
can fit into a disk page. This covers a similar distribution to
genPageContentFits
but also includes about 20% of pages that are over full,
including the corner case of a large key ops pair followed by smaller key op
pairs (again possibly over full).
The keys are not ordered: use orderdKeyOps
to sort and de-duplicate them
if that is needed.
data PageContentSingle Source #
Constructors
PageContentSingle DiskPageSize Key Operation |
Instances
Arbitrary PageContentSingle Source # | |
Defined in FormatPage Methods arbitrary :: Gen PageContentSingle Source # shrink :: PageContentSingle -> [PageContentSingle] Source # | |
Show PageContentSingle Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageContentSingle -> ShowS # show :: PageContentSingle -> String # showList :: [PageContentSingle] -> ShowS # |
genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation) Source #
Generate a test case consisting of a single key/operation pair.
genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
Generate only pages that are nearly full. This isn't the maximum possible size, but where adding one more randomly-chosen key/op pair would not fit (but perhaps a smaller pair would still fit).
Consider if you really need this: the genPageContentMedium
also includes
these cases naturally as part of its distribution. On the other hand, this
can be good for generating benchmark data.
genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
This generates a reasonable "middle" distribution of page sizes (relative to the given disk page size). In particular it covers:
- small pages (~45% for 4k pages, ~15% for 64k pages)
- near-maximum pages (~20% for 4k pages, ~20% for 64k pages)
- some in between (~35% for 4k pages, ~60% for 64k pages)
The numbers above are when used with the normal arbitrary
Key
and
Value
generators. And with these generators, it tends to use lots of
small-to-medium size keys and values, rather than a few huge ones.
newtype MinKeySize Source #
In some use cases it is necessary to generate Keys
that are at least of
some minimum length. Use noMinKeySize
if no such constraint is need.
Constructors
MinKeySize Int |
Instances
Show MinKeySize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> MinKeySize -> ShowS # show :: MinKeySize -> String # showList :: [MinKeySize] -> ShowS # |
noMinKeySize :: MinKeySize Source #
No minimum key size: MinKeySize 0
.
maxKeySize :: DiskPageSize -> Int Source #
The maximum size of key that is guaranteed to always fit in an empty 4k page. So this is a worst case maximum size: this size key will fit irrespective of the corresponding operation, including the possibility that the key/op pair has a blob reference.
orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)] Source #
Sort and de-duplicate a key/operation sequence to ensure the sequence is strictly ascending by key.
If you need this in a QC generator, you will need shrinkOrderedKeyOps
in
the corresponding shrinker.