Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
FormatPage
Description
This accompanies the format-page.md documentation as a sanity check and a precise reference. It is intended to demonstrate that the page format works. It is also used as a reference implementation for tests of the real implementation.
Logically, a page is a sequence of key,operation pairs (with optional blobrefs), sorted by key, and its serialised form fits within a disk page.
This reference implementation covers serialisation and deserialisation (not lookups) which do not rely on (or enforce) the keys being sorted.
Synopsis
- newtype Key = Key {
- unKey :: ByteString
- data Operation
- newtype Value = Value {}
- data BlobRef = BlobRef Word64 Word32
- type PageSerialised = ByteString
- data PageIntermediate
- data PageSize = PageSize {
- pageSizeElems :: !Int
- pageSizeBlobs :: !Int
- pageSizeBytes :: !Int
- pageSizeDisk :: !DiskPageSize
- pageSizeEmpty :: DiskPageSize -> PageSize
- pageSizeAddElem :: (Key, Operation) -> PageSize -> Maybe PageSize
- calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize
- data DiskPageSize
- encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate
- decodePage :: PageIntermediate -> [(Key, Operation)]
- serialisePage :: PageIntermediate -> PageSerialised
- deserialisePage :: DiskPageSize -> PageSerialised -> PageIntermediate
- pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int)
- pageDiskPages :: PageIntermediate -> Int
- pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString]
- tests :: TestTree
- genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
- genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)]
- genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation)
- genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
- genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)]
- newtype MinKeySize = MinKeySize Int
- noMinKeySize :: MinKeySize
- orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)]
- shrinkKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
- shrinkOrderedKeyOps :: [(Key, Operation)] -> [[(Key, Operation)]]
Page types
Constructors
Key | |
Fields
|
type PageSerialised = ByteString Source #
A serialised page consists of either a single disk page or several
disk pages. The latter is a primary page followed by one or more overflow
pages. Each disk page (single or multi) uses the same DiskPageSize
, which
should be known from context (e.g. configuration).
data PageIntermediate Source #
Instances
Show PageIntermediate Source # | |
Defined in FormatPage Methods showsPrec :: Int -> PageIntermediate -> ShowS # show :: PageIntermediate -> String # showList :: [PageIntermediate] -> ShowS # | |
Eq PageIntermediate Source # | |
Defined in FormatPage Methods (==) :: PageIntermediate -> PageIntermediate -> Bool # (/=) :: PageIntermediate -> PageIntermediate -> Bool # |
Page size
Constructors
PageSize | |
Fields
|
pageSizeEmpty :: DiskPageSize -> PageSize Source #
calcPageSize :: DiskPageSize -> [(Key, Operation)] -> Maybe PageSize Source #
Encoding and decoding
data DiskPageSize Source #
A serialised page fits within chunks of memory of 4k, 8k, 16k, 32k or 64k.
Constructors
DiskPage4k | |
DiskPage8k | |
DiskPage16k | |
DiskPage32k | |
DiskPage64k |
Instances
Arbitrary DiskPageSize Source # | |
Defined in FormatPage | |
Bounded DiskPageSize Source # | |
Defined in FormatPage | |
Enum DiskPageSize Source # | |
Defined in FormatPage Methods succ :: DiskPageSize -> DiskPageSize # pred :: DiskPageSize -> DiskPageSize # toEnum :: Int -> DiskPageSize # fromEnum :: DiskPageSize -> Int # enumFrom :: DiskPageSize -> [DiskPageSize] # enumFromThen :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromTo :: DiskPageSize -> DiskPageSize -> [DiskPageSize] # enumFromThenTo :: DiskPageSize -> DiskPageSize -> DiskPageSize -> [DiskPageSize] # | |
Show DiskPageSize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> DiskPageSize -> ShowS # show :: DiskPageSize -> String # showList :: [DiskPageSize] -> ShowS # | |
Eq DiskPageSize Source # | |
Defined in FormatPage |
encodePage :: DiskPageSize -> [(Key, Operation)] -> Maybe PageIntermediate Source #
decodePage :: PageIntermediate -> [(Key, Operation)] Source #
Overflow pages
pageOverflowPrefixSuffixLen :: PageIntermediate -> Maybe (Int, Int) Source #
If a page uses overflow pages, return the:
- value prefix length (within the first page)
- value suffix length (within the overflow pages)
pageDiskPages :: PageIntermediate -> Int Source #
The total number of disk pages, including any overflow pages.
pageSerialisedChunks :: DiskPageSize -> PageSerialised -> [ByteString] Source #
Tests and generators
Generators and shrinkers
genPageContentFits :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is guaranteed to fit into a given disk page size.
The distribution is designed to cover:
- small pages
- medium sized pages
- nearly full pages
- plus single key pages (possibly using one or more overflow pages)
- a corner case of a single large key/operation pair followed by some small key op pairs.
The keys are not ordered: use orderdKeyOps
to sort and de-duplicate them
if that is needed (but note this will change the order of key sizes).
genPageContentMaybeOverfull :: DiskPageSize -> MinKeySize -> Gen [(Key, Operation)] Source #
Generate a test case consisting of a key/operation sequence that is not guaranteed to fit into a given disk page size.
These test cases are useful for checking the boundary conditions for what
can fit into a disk page. This covers a similar distribution to
genPageContentFits
but also includes about 20% of pages that are over full,
including the corner case of a large key ops pair followed by smaller key op
pairs (again possibly over full).
The keys are not ordered: use orderdKeyOps
to sort and de-duplicate them
if that is needed.
genPageContentSingle :: DiskPageSize -> MinKeySize -> Gen (Key, Operation) Source #
Generate a test case consisting of a single key/operation pair.
genPageContentNearFull :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
Generate only pages that are nearly full. This isn't the maximum possible size, but where adding one more randomly-chosen key/op pair would not fit (but perhaps a smaller pair would still fit).
Consider if you really need this: the genPageContentMedium
also includes
these cases naturally as part of its distribution. On the other hand, this
can be good for generating benchmark data.
genPageContentMedium :: DiskPageSize -> Gen Key -> Gen Value -> Gen [(Key, Operation)] Source #
This generates a reasonable "middle" distribution of page sizes (relative to the given disk page size). In particular it covers:
- small pages (~45% for 4k pages, ~15% for 64k pages)
- near-maximum pages (~20% for 4k pages, ~20% for 64k pages)
- some in between (~35% for 4k pages, ~60% for 64k pages)
The numbers above are when used with the normal arbitrary
Key
and
Value
generators. And with these generators, it tends to use lots of
small-to-medium size keys and values, rather than a few huge ones.
newtype MinKeySize Source #
In some use cases it is necessary to generate Keys
that are at least of
some minimum length. Use noMinKeySize
if no such constraint is need.
Constructors
MinKeySize Int |
Instances
Show MinKeySize Source # | |
Defined in FormatPage Methods showsPrec :: Int -> MinKeySize -> ShowS # show :: MinKeySize -> String # showList :: [MinKeySize] -> ShowS # |
noMinKeySize :: MinKeySize Source #
No minimum key size: MinKeySize 0
.
orderdKeyOps :: [(Key, Operation)] -> [(Key, Operation)] Source #
Sort and de-duplicate a key/operation sequence to ensure the sequence is strictly ascending by key.
If you need this in a QC generator, you will need shrinkOrderedKeyOps
in
the corresponding shrinker.