lsm-tree-0.1.0.0: Log-structured merge-trees
Safe Haskell: Safe-Inferred
Language: GHC2021

Database.LSMTree

Description

This module is experimental. It is mainly used for testing purposes.

See the Normal and Monoidal modules for documentation.

TODO: we have to decide whether we want to duplicate haddocks across public API modules, or if we want to pick a single source of reference for haddocks. Until then, the documentation on definitions in this module is omitted.

Synopsis

Exceptions

data TableUnionNotCompatibleError Source #

A table union was constructed with two tables that are not compatible.

Constructors

ErrTableUnionHandleTypeMismatch 

Fields

  • !Int

    The index of the first table.

  • !TypeRep

    The type of the filesystem handle of the first table.

  • !Int

    The index of the second table.

  • !TypeRep

    The type of the filesystem handle of the second table.

ErrTableUnionSessionMismatch 

Fields

  • !Int

    The index of the first table.

  • !FsErrorPath

    The session directory of the first table.

  • !Int

    The index of the second table.

  • !FsErrorPath

    The session directory of the second table.

data SnapshotNotCompatibleError Source #

The named snapshot is not compatible with the expected type.

Constructors

ErrSnapshotWrongTableType

The named snapshot is not compatible with the current public API module. For instance, the snapshot was created using the simple API, but was opened using the full API.

Fields

ErrSnapshotWrongLabel

The named snapshot is not compatible with the given label.

Fields

data BlobRefInvalidError Source #

A BlobRef used with retrieveBlobs was invalid.

BlobRefs are obtained from lookups in a Table, but they may be invalidated by subsequent changes in that Table. In general, the reliable way to retrieve blobs is to not change the Table before retrieving them. To allow later retrievals, duplicate the table before making modifications and keep the duplicate open until all blob retrievals are complete; a sketch of this pattern follows the constructor listing below.

Constructors

ErrBlobRefInvalid !Int

The Int index indicates the first invalid BlobRef.
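
The following is an illustrative sketch of the reliable retrieval pattern described above, not an authoritative recipe. It assumes retrieveBlobs :: (IOLike m, SerialiseValue b) => Session m -> Vector (BlobRef m b) -> m (Vector b), which is not documented in this section:

import           Control.Exception (bracket)
import qualified Data.Vector as V
import qualified Database.LSMTree as LSM

-- Look up keys on a duplicate of the table, so that later updates to the
-- original table cannot invalidate the returned BlobRefs before retrieval.
fetchBlobs ::
     ( LSM.SerialiseKey k, LSM.SerialiseValue v
     , LSM.ResolveValue v, LSM.SerialiseValue b )
  => LSM.Session IO -> LSM.Table IO k v b -> V.Vector k -> IO (V.Vector b)
fetchBlobs session table keys =
    bracket (LSM.duplicate table) LSM.close $ \dup -> do
      results <- LSM.lookups dup keys
      let refs = V.fromList [ r | LSM.FoundWithBlob _ r <- V.toList results ]
      LSM.retrieveBlobs session refs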

data FileCorruptedError Source #

The file is corrupted.

Constructors

ErrFileFormatInvalid

The file fails to parse.

Fields

ErrFileChecksumMismatch

The file CRC32 checksum is invalid.

Fields

Tracing

data MergeTrace Source #

Constructors

TraceFlushWriteBuffer 

Fields

TraceAddLevel 
TraceAddRun 

Fields

TraceNewMerge 
TraceNewMergeSingleRun 

Fields

TraceCompletedMerge 

Fields

TraceExpectCompletedMerge RunNumber

This is traced at the latest point the merge could complete.

Instances

Instances details
Show MergeTrace Source # 
Instance details

Defined in Database.LSMTree.Internal.MergeSchedule

Table sessions

type Session = Session' Source #

A session provides context that is shared across multiple tables.

Sessions are needed to support sharing between multiple table instances. Sharing occurs when tables are duplicated using duplicate, or when tables are combined using union. Sharing is not fully preserved by snapshots: existing runs are shared, but ongoing merges are not. As such, restoring a snapshot (using open) is expensive, but taking a snapshot (using snapshot) is relatively cheap.

The "monoidal" table types support a union operation, which has the constraint that the two input tables must be from the same Session.

Each session places files for table data under a given directory. It is not permitted to open multiple sessions for the same directory at once. Instead a session should be opened once and shared for all uses of tables. This restriction implies that tables cannot be shared between OS processes. The restriction is enforced using file locks.

Sessions support both related and unrelated tables. Related tables are created using duplicate, while unrelated tables can be created using new. It is possible to have multiple unrelated tables with different configuration and key and value types in the same session. Similarly, a session can have both "normal" and "monoidal" tables. For unrelated tables (that are not involved in a union) one has a choice between using multiple sessions or a shared session. Using multiple sessions requires using separate directories, while a shared session will place all files under one directory.

withSession :: (IOLike m, Typeable h) => Tracer m LSMTreeTrace -> HasFS m h -> HasBlockIO m h -> FsPath -> (Session m -> m a) -> m a Source #

(Asynchronous) exception-safe, bracketed opening and closing of a session.

If possible, it is recommended to use this function instead of openSession and closeSession.
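
As an illustration, here is a minimal sketch of opening a session in IO. It assumes the ioHasFS, ioHasBlockIO, and defaultIOCtxParams helpers from the fs-api and blockio-api packages, and nullTracer from contra-tracer; none of these are part of this module:

import           Control.Tracer (nullTracer)
import qualified Database.LSMTree as LSM
import           System.FS.API (MountPoint (..), mkFsPath)
import           System.FS.BlockIO.API (defaultIOCtxParams)
import           System.FS.BlockIO.IO (ioHasBlockIO)
import           System.FS.IO (ioHasFS)

main :: IO ()
main = do
    -- All session files live under this directory, which must already exist.
    let hfs = ioHasFS (MountPoint "/tmp/lsm-demo")
    hbio <- ioHasBlockIO hfs defaultIOCtxParams
    LSM.withSession nullTracer hfs hbio (mkFsPath []) $ \_session ->
      -- Use the session to create tables, open snapshots, and so on.
      pure ()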

openSession Source #

Arguments

:: forall m h. (IOLike m, Typeable h) 
=> Tracer m LSMTreeTrace 
-> HasFS m h 
-> HasBlockIO m h 
-> FsPath

Path to the session directory

-> m (Session m) 

Create either a new empty table session or open an existing table session, given the path to the session directory.

A new empty table session is created if the given directory is entirely empty. Otherwise, the directory is assumed to contain an existing table session, which is opened.

Sessions should be closed using closeSession when no longer needed. Consider using withSession instead.

Exceptions:

  • Throws an exception if the directory does not exist.
  • Throws an exception if the directory does not have the expected file layout for a table session.
  • Throws an exception if the session is already open (in the current process or another OS process).

closeSession :: IOLike m => Session m -> m () Source #

Close the table session. closeSession is idempotent. All subsequent operations on the session or the tables within it will throw an exception.

This also closes any open tables and cursors in the session. However, it is typically good practice to close all tables first rather than relying on this for cleanup.

Closing a table session allows the session to be opened again elsewhere, for example in a different process. Note that the session will be closed automatically if the process is terminated (in particular the session file lock will be released).

Table

data TableConfig Source #

Table configuration parameters, including LSM tree tuning parameters.

Some config options are fixed (for now):

  • Merge policy: Tiering
  • Size ratio: 4

Constructors

TableConfig 

Fields

defaultTableConfig :: TableConfig Source #

A reasonable default TableConfig.

This uses a write buffer with up to 20,000 elements and a generous amount of memory for Bloom filters (FPR of 2%).

data MergePolicy Source #

Constructors

MergePolicyLazyLevelling

Use tiering on intermediate levels, and levelling on the last level. This makes it easier for delete operations to disappear on the last level.

data WriteBufferAlloc Source #

Allocation method for the write buffer.

Constructors

AllocNumEntries !NumEntries

Total number of key/value pairs that can be present in the write buffer before flushing the write buffer to disk.

NOTE: if the sizes of values vary greatly, this can lead to wonky runs on disk, and therefore unpredictable performance.

newtype NumEntries Source #

A count of entries, for example the number of entries in a run.

This number is limited by the machine's word size. On 32-bit systems, the maximum number we can represent is 2^31, which is roughly 2 billion. This limit should be large enough that we never reach it in practice. By extension, the limit on 64-bit (and wider) systems is also sufficiently large.

Constructors

NumEntries Int 

Instances

Instances details
Monoid NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

Semigroup NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

Show NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

NFData NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

Methods

rnf :: NumEntries -> () #

Eq NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

Ord NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Entry

DecodeVersioned NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

Encode NumEntries Source # 
Instance details

Defined in Database.LSMTree.Internal.Snapshot.Codec

data BloomFilterAlloc Source #

Allocation method for bloom filters.

NOTE: a physical database entry is a key/operation pair that exists in a file, i.e., a run. Multiple physical entries that have the same key constitute a logical database entry.

Constructors

AllocFixed

Allocate a fixed number of bits per physical entry in each bloom filter.

Fields

  • !Word64

    Bits per physical entry.

AllocRequestFPR

Allocate as many bits as required per physical entry to get the requested false-positive rate. Do this for each bloom filter.

Fields

data FencePointerIndexType Source #

Configure the type of fence pointer index.

Constructors

CompactIndex

Use a compact fence pointer index.

Compact indexes are designed to work with keys that are large cryptographic hashes (for example, 32 bytes long).

When using a compact index, it is vital that the serialiseKey function satisfies the following law:

Minimal size
size (serialiseKey x) >= 8

Use serialiseKeyMinimalSize to test this law.

OrdinaryIndex

Use an ordinary fence pointer index.

Ordinary indexes do not have any constraints on keys other than that their serialised forms must be less than 64 KiB in size.

data DiskCachePolicy Source #

The policy for caching data from disk in memory (using the OS page cache).

Caching data in memory can improve performance if the access pattern has good locality or if the overall data size fits within memory. On the other hand, caching is detrimental to performance and wastes memory if the access pattern has poor spatial or temporal locality.

This implementation is designed to have good performance using a cacheless policy, where main memory is used only to cache Bloom filters and indexes, but none of the key/value data itself. Nevertheless, some use cases will be faster if some or all of the key/value data is also cached in memory. This implementation does not do any custom caching of key/value data, relying simply on the OS page cache. Thus caching is done in units of 4 KiB disk pages (as opposed to, for example, individual key/value pairs).

Constructors

DiskCacheAll

Use the OS page cache to cache any/all key/value data in the table.

Use this policy if the expected access pattern for the table has good spatial or temporal locality.

DiskCacheLevelsAtOrBelow !Int

Use the OS page cache to cache data in all LSMT levels at or below a given level number. For example, use 1 to cache the first level. (The write buffer is considered to be level 0.)

Use this policy if the expected access pattern for the table has good temporal locality for recently inserted keys.

DiskCacheNone

Do not cache any key/value data in any level (except the write buffer).

Use this policy if the expected access pattern for the table has poor spatial or temporal locality, such as uniform random access.

data MergeSchedule Source #

A configuration option that determines how merges are stepped to completion. This does not affect the amount of work that is done by merges, only how the work is spread out over time.

Constructors

OneShot

Complete merges immediately when started.

The OneShot option will make the merging algorithm perform big batches of work in one go, so intermittent slow-downs can be expected. For use cases where unresponsiveness is unacceptable, e.g. in real-time systems, use Incremental instead.

Incremental

Schedule merges for incremental construction, and step the merge when updates are performed on a table.

The Incremental option spreads out merging work over time. More specifically, updates to a table can cause a small batch of merge work to be performed. The scheduling of these batches is designed such that merges are fully completed in time for when new merges are started on the same level.

defaultMergeSchedule :: MergeSchedule Source #

The default MergeSchedule.

>>> defaultMergeSchedule
Incremental

withTable :: IOLike m => Session m -> TableConfig -> (Table m k v b -> m a) -> m a Source #

new :: IOLike m => Session m -> TableConfig -> m (Table m k v b) Source #

close :: IOLike m => Table m k v b -> m () Source #
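
As an illustration, a hedged sketch of creating a table within a session and performing a few of the operations documented below. ResolveAsFirst (see the end of this module) makes Mupserts behave like Inserts, and Void means the table stores no blobs; the SerialiseKey instance for Word64 is an assumption:

import           Data.Void (Void)
import           Data.Word (Word64)
import qualified Data.Vector as V
import qualified Database.LSMTree as LSM

demo :: LSM.Session IO -> IO ()
demo session =
    LSM.withTable session LSM.defaultTableConfig $
      \(table :: LSM.Table IO Word64 (LSM.ResolveAsFirst Word64) Void) -> do
        LSM.inserts table (V.fromList [(1, LSM.ResolveAsFirst 10, Nothing)])
        results <- LSM.lookups table (V.singleton 1)
        case V.head results of
          LSM.Found (LSM.ResolveAsFirst v) -> print v
          _                                -> putStrLn "key not found"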

Table queries and updates

Queries

lookups :: forall m k v b. (IOLike m, SerialiseKey k, SerialiseValue v, ResolveValue v) => Table m k v b -> Vector k -> m (Vector (LookupResult v (BlobRef m b))) Source #

data LookupResult v b Source #

Result of a single point lookup.

Constructors

NotFound 
Found !v 
FoundWithBlob !v !b 

Instances

Instances details
Bifunctor LookupResult Source # 
Instance details

Defined in Database.LSMTree

Methods

bimap :: (a -> b) -> (c -> d) -> LookupResult a c -> LookupResult b d #

first :: (a -> b) -> LookupResult a c -> LookupResult b c #

second :: (b -> c) -> LookupResult a b -> LookupResult a c #

Foldable (LookupResult v) Source # 
Instance details

Defined in Database.LSMTree

Methods

fold :: Monoid m => LookupResult v m -> m #

foldMap :: Monoid m => (a -> m) -> LookupResult v a -> m #

foldMap' :: Monoid m => (a -> m) -> LookupResult v a -> m #

foldr :: (a -> b -> b) -> b -> LookupResult v a -> b #

foldr' :: (a -> b -> b) -> b -> LookupResult v a -> b #

foldl :: (b -> a -> b) -> b -> LookupResult v a -> b #

foldl' :: (b -> a -> b) -> b -> LookupResult v a -> b #

foldr1 :: (a -> a -> a) -> LookupResult v a -> a #

foldl1 :: (a -> a -> a) -> LookupResult v a -> a #

toList :: LookupResult v a -> [a] #

null :: LookupResult v a -> Bool #

length :: LookupResult v a -> Int #

elem :: Eq a => a -> LookupResult v a -> Bool #

maximum :: Ord a => LookupResult v a -> a #

minimum :: Ord a => LookupResult v a -> a #

sum :: Num a => LookupResult v a -> a #

product :: Num a => LookupResult v a -> a #

Traversable (LookupResult v) Source # 
Instance details

Defined in Database.LSMTree

Methods

traverse :: Applicative f => (a -> f b) -> LookupResult v a -> f (LookupResult v b) #

sequenceA :: Applicative f => LookupResult v (f a) -> f (LookupResult v a) #

mapM :: Monad m => (a -> m b) -> LookupResult v a -> m (LookupResult v b) #

sequence :: Monad m => LookupResult v (m a) -> m (LookupResult v a) #

Functor (LookupResult v) Source # 
Instance details

Defined in Database.LSMTree

Methods

fmap :: (a -> b) -> LookupResult v a -> LookupResult v b #

(<$) :: a -> LookupResult v b -> LookupResult v a #

(Show v, Show b) => Show (LookupResult v b) Source # 
Instance details

Defined in Database.LSMTree

(Eq v, Eq b) => Eq (LookupResult v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

(==) :: LookupResult v b -> LookupResult v b -> Bool #

(/=) :: LookupResult v b -> LookupResult v b -> Bool #

rangeLookup :: forall m k v b. (IOLike m, SerialiseKey k, SerialiseValue v, ResolveValue v) => Table m k v b -> Range k -> m (Vector (QueryResult k v (BlobRef m b))) Source #
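
As an illustration, a hedged sketch that collects all keys in a half-open interval using rangeLookup together with the Range and QueryResult types documented below:

import qualified Data.Vector as V
import qualified Database.LSMTree as LSM

-- Collect all keys in the half-open interval [lo, hi).
keysInRange ::
     (LSM.SerialiseKey k, LSM.SerialiseValue v, LSM.ResolveValue v)
  => LSM.Table IO k v b -> k -> k -> IO (V.Vector k)
keysInRange table lo hi = do
    results <- LSM.rangeLookup table (LSM.FromToExcluding lo hi)
    pure $ flip V.map results $ \qr -> case qr of
      LSM.FoundInQuery         k _   -> k
      LSM.FoundInQueryWithBlob k _ _ -> k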

data Range k Source #

A range of keys.

Constructors

FromToExcluding k k

Inclusive lower bound, exclusive upper bound

FromToIncluding k k

Inclusive lower bound, inclusive upper bound

Instances

Instances details
Functor Range Source # 
Instance details

Defined in Database.LSMTree.Internal.Range

Methods

fmap :: (a -> b) -> Range a -> Range b #

(<$) :: a -> Range b -> Range a #

Show k => Show (Range k) Source # 
Instance details

Defined in Database.LSMTree.Internal.Range

Methods

showsPrec :: Int -> Range k -> ShowS #

show :: Range k -> String #

showList :: [Range k] -> ShowS #

NFData k => NFData (Range k) Source # 
Instance details

Defined in Database.LSMTree.Internal.Range

Methods

rnf :: Range k -> () #

Eq k => Eq (Range k) Source # 
Instance details

Defined in Database.LSMTree.Internal.Range

Methods

(==) :: Range k -> Range k -> Bool #

(/=) :: Range k -> Range k -> Bool #

data QueryResult k v b Source #

Constructors

FoundInQuery !k !v 
FoundInQueryWithBlob !k !v !b 

Instances

Instances details
Bifunctor (QueryResult k) Source # 
Instance details

Defined in Database.LSMTree

Methods

bimap :: (a -> b) -> (c -> d) -> QueryResult k a c -> QueryResult k b d #

first :: (a -> b) -> QueryResult k a c -> QueryResult k b c #

second :: (b -> c) -> QueryResult k a b -> QueryResult k a c #

Foldable (QueryResult k v) Source # 
Instance details

Defined in Database.LSMTree

Methods

fold :: Monoid m => QueryResult k v m -> m #

foldMap :: Monoid m => (a -> m) -> QueryResult k v a -> m #

foldMap' :: Monoid m => (a -> m) -> QueryResult k v a -> m #

foldr :: (a -> b -> b) -> b -> QueryResult k v a -> b #

foldr' :: (a -> b -> b) -> b -> QueryResult k v a -> b #

foldl :: (b -> a -> b) -> b -> QueryResult k v a -> b #

foldl' :: (b -> a -> b) -> b -> QueryResult k v a -> b #

foldr1 :: (a -> a -> a) -> QueryResult k v a -> a #

foldl1 :: (a -> a -> a) -> QueryResult k v a -> a #

toList :: QueryResult k v a -> [a] #

null :: QueryResult k v a -> Bool #

length :: QueryResult k v a -> Int #

elem :: Eq a => a -> QueryResult k v a -> Bool #

maximum :: Ord a => QueryResult k v a -> a #

minimum :: Ord a => QueryResult k v a -> a #

sum :: Num a => QueryResult k v a -> a #

product :: Num a => QueryResult k v a -> a #

Traversable (QueryResult k v) Source # 
Instance details

Defined in Database.LSMTree

Methods

traverse :: Applicative f => (a -> f b) -> QueryResult k v a -> f (QueryResult k v b) #

sequenceA :: Applicative f => QueryResult k v (f a) -> f (QueryResult k v a) #

mapM :: Monad m => (a -> m b) -> QueryResult k v a -> m (QueryResult k v b) #

sequence :: Monad m => QueryResult k v (m a) -> m (QueryResult k v a) #

Functor (QueryResult k v) Source # 
Instance details

Defined in Database.LSMTree

Methods

fmap :: (a -> b) -> QueryResult k v a -> QueryResult k v b #

(<$) :: a -> QueryResult k v b -> QueryResult k v a #

(Show k, Show v, Show b) => Show (QueryResult k v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

showsPrec :: Int -> QueryResult k v b -> ShowS #

show :: QueryResult k v b -> String #

showList :: [QueryResult k v b] -> ShowS #

(Eq k, Eq v, Eq b) => Eq (QueryResult k v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

(==) :: QueryResult k v b -> QueryResult k v b -> Bool #

(/=) :: QueryResult k v b -> QueryResult k v b -> Bool #

Cursor

withCursor :: IOLike m => Table m k v b -> (Cursor m k v b -> m a) -> m a Source #

withCursorAtOffset :: (IOLike m, SerialiseKey k) => k -> Table m k v b -> (Cursor m k v b -> m a) -> m a Source #

newCursor :: IOLike m => Table m k v b -> m (Cursor m k v b) Source #

newCursorAtOffset :: (IOLike m, SerialiseKey k) => k -> Table m k v b -> m (Cursor m k v b) Source #

closeCursor :: IOLike m => Cursor m k v b -> m () Source #

readCursor :: forall m k v b. (IOLike m, SerialiseKey k, SerialiseValue v, ResolveValue v) => Int -> Cursor m k v b -> m (Vector (QueryResult k v (BlobRef m b))) Source #
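
As an illustration, a hedged sketch that folds over a table's contents in fixed-size batches using a cursor. It assumes that an empty batch from readCursor signals the end of the table:

import qualified Data.Vector as V
import qualified Database.LSMTree as LSM

-- Fold over all entries of a table, 256 entries at a time.
foldTable ::
     (LSM.SerialiseKey k, LSM.SerialiseValue v, LSM.ResolveValue v)
  => (a -> LSM.QueryResult k v (LSM.BlobRef IO b) -> a) -> a
  -> LSM.Table IO k v b -> IO a
foldTable f z table =
    LSM.withCursor table $ \cursor ->
      let go acc = do
            batch <- LSM.readCursor 256 cursor
            if V.null batch
              then pure acc                    -- assumed: empty batch = done
              else go (V.foldl' f acc batch)
      in go z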

Updates

inserts :: (IOLike m, SerialiseKey k, SerialiseValue v, SerialiseValue b, ResolveValue v) => Table m k v b -> Vector (k, v, Maybe b) -> m () Source #

mupserts :: (IOLike m, SerialiseKey k, SerialiseValue v, SerialiseValue b, ResolveValue v) => Table m k v b -> Vector (k, v) -> m () Source #

updates :: forall m k v b. (IOLike m, SerialiseKey k, SerialiseValue v, SerialiseValue b, ResolveValue v) => Table m k v b -> Vector (k, Update v b) -> m () Source #

data Update v b Source #

Constructors

Insert !v !(Maybe b) 
Delete 
Mupsert !v 

Instances

Instances details
(Show b, Show v) => Show (Update v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

showsPrec :: Int -> Update v b -> ShowS #

show :: Update v b -> String #

showList :: [Update v b] -> ShowS #

(NFData v, NFData b) => NFData (Update v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

rnf :: Update v b -> () #

(Eq b, Eq v) => Eq (Update v b) Source # 
Instance details

Defined in Database.LSMTree

Methods

(==) :: Update v b -> Update v b -> Bool #

(/=) :: Update v b -> Update v b -> Bool #
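
As an illustration, a hedged sketch of a mixed batch using updates and the Update constructors above. The Sum Word64 value type resolves Mupserts by addition via the ResolveValue (Sum a) instance documented below; the SerialiseKey instance for Word64 is an assumption:

import           Data.Monoid (Sum (..))
import           Data.Void (Void)
import           Data.Word (Word64)
import qualified Data.Vector as V
import qualified Database.LSMTree as LSM

mixedBatch :: LSM.Table IO Word64 (Sum Word64) Void -> IO ()
mixedBatch table = LSM.updates table $ V.fromList
    [ (1, LSM.Insert (Sum 10) Nothing)  -- insert or overwrite key 1
    , (2, LSM.Delete)                   -- delete key 2 if present
    , (3, LSM.Mupsert (Sum 5))          -- add 5 to key 3's current sum
    ]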

Blobs

data BlobRef m b Source #

A handle-like reference to an on-disk blob. The blob can be retrieved based on the reference.

The name blob comes from the acronym Binary Large OBject (BLOB); in many database implementations it refers to binary data that is larger than usual values and is handled specially. In our context, each value in the table may optionally have an associated blob.

Though blob references are handle-like, they do not keep files open. As such, when a blob reference is returned by a lookup, modifying the corresponding table, cursor, or session may cause the blob reference to be invalidated (i.e., the blob has gone missing because the blob file was removed). These operations include:

  • Updates (e.g., inserts, deletes, mupserts)
  • Closing tables
  • Closing cursors
  • Closing sessions

An invalidated blob reference will throw an exception when used to look up a blob. Note that operations such as snapshotting, duplication and cursor reads do not invalidate blob references. These operations do not modify the logical contents or state of a table.

Blob reference validity
As long as the table or cursor that the blob reference originated from is not updated or closed, the blob reference will be valid.

Exception: currently the snapshot operation also invalidates BlobRefs, but it should not. See https://github.com/IntersectMBO/lsm-tree/issues/392

TODO: get rid of the m parameter?

Instances

Instances details
Show (BlobRef m b) Source # 
Instance details

Defined in Database.LSMTree.Common

Methods

showsPrec :: Int -> BlobRef m b -> ShowS #

show :: BlobRef m b -> String #

showList :: [BlobRef m b] -> ShowS #

Durability (snapshots)

isValidSnapshotName :: String -> Bool Source #

Check if a String would be a valid snapshot name.

Snapshot names consist of lowercase characters, digits, dashes -, and underscores _, and must be between 1 and 64 characters long.

>>> isValidSnapshotName "main"
True

>>> isValidSnapshotName "temporary-123-test_"
True
>>> isValidSnapshotName "UPPER"
False
>>> isValidSnapshotName "dir/dot.exe"
False
>>> isValidSnapshotName ".."
False
>>> isValidSnapshotName "\\"
False
>>> isValidSnapshotName ""
False
>>> isValidSnapshotName (replicate 100 'a')
False

Snapshot names must be valid directory names on both POSIX and Windows. This rules out the following reserved file and directory names on Windows:

>>> isValidSnapshotName "con"
False
>>> isValidSnapshotName "prn"
False
>>> isValidSnapshotName "aux"
False
>>> isValidSnapshotName "nul"
False
>>> isValidSnapshotName "com1" -- "com2", "com3", etc.
False
>>> isValidSnapshotName "lpt1" -- "lpt2", "lpt3", etc.
False

See, e.g., the VBA docs for the "Bad file name or number" error.

toSnapshotName :: String -> SnapshotName Source #

Create snapshot name.

The given string must satisfy isValidSnapshotName.

Throws the following exceptions:

InvalidSnapshotNameError
If the given string is not a valid snapshot name.

newtype SnapshotLabel Source #

Custom, user-supplied text that is included in the metadata.

The main use case for a SnapshotLabel is for the user to supply textual information about the key/value/blob type for the table that corresponds to the snapshot. This information is used to dynamically check that a snapshot is opened at the correct key/value/blob type.

Constructors

SnapshotLabel Text 

createSnapshot :: forall m k v b. IOLike m => SnapshotLabel -> SnapshotName -> Table m k v b -> m () Source #

openSnapshot Source #

Arguments

:: forall m k v b. (IOLike m, ResolveValue v) 
=> Session m 
-> TableConfigOverride

Optional config override

-> SnapshotLabel 
-> SnapshotName 
-> m (Table m k v b) 
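
As an illustration, a hedged sketch of a snapshot round trip. The mempty argument relies on the TableConfigOverride Monoid instance described below and means "no overrides"; the label text and snapshot name are arbitrary:

{-# LANGUAGE OverloadedStrings #-}
import           Data.Void (Void)
import           Data.Word (Word64)
import qualified Database.LSMTree as LSM

snapshotRoundTrip
  :: LSM.Session IO
  -> LSM.Table IO Word64 (LSM.ResolveAsFirst Word64) Void
  -> IO (LSM.Table IO Word64 (LSM.ResolveAsFirst Word64) Void)
snapshotRoundTrip session table = do
    let label = LSM.SnapshotLabel "Word64 (ResolveAsFirst Word64) Void"
        name  = LSM.toSnapshotName "checkpoint-1"
    LSM.createSnapshot label name table
    LSM.openSnapshot session mempty label name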

data TableConfigOverride Source #

Override configuration options in TableConfig that can be changed dynamically.

Some parts of the TableConfig are considered fixed after a table is created. That is, these options (i) should stay the same over the lifetime of a table, and (ii) should not be changed when a snapshot is created or loaded. Other options can be changed dynamically without sacrificing correctness.

This type has Semigroup and Monoid instances for composing override options.

deleteSnapshot :: IOLike m => Session m -> SnapshotName -> m () Source #

Delete a named snapshot.

NOTE: has similar behaviour to removeDirectory.

Exceptions:

  • Deleting a snapshot that doesn't exist is an error.

listSnapshots :: IOLike m => Session m -> m [SnapshotName] Source #

List snapshots by name.

Persistence

duplicate :: IOLike m => Table m k v b -> m (Table m k v b) Source #

Table union

union :: forall m k v b. IOLike m => Table m k v b -> Table m k v b -> m (Table m k v b) Source #

unions :: forall m k v b. IOLike m => NonEmpty (Table m k v b) -> m (Table m k v b) Source #

newtype UnionDebt Source #

The current upper bound on the number of UnionCredits that have to be supplied before a union is completed.

The union debt is the maximum number of merging steps that still need to be performed before the delayed work of performing a union is completed. This includes the cost of completing merges that were part of the union's input tables.

Constructors

UnionDebt Int 

newtype UnionCredits Source #

Credits are used to pay off UnionDebt, completing a union in the process.

A union credit corresponds to a single merging step being performed.

Constructors

UnionCredits Int 

supplyUnionCredits :: forall m k v b. (IOLike m, ResolveValue v) => Table m k v b -> UnionCredits -> m UnionCredits Source #
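
As an illustration, a hedged sketch that drives a union to completion in fixed-size batches. It assumes that a positive number of leftover credits returned by supplyUnionCredits indicates that the union debt has been fully paid:

import qualified Database.LSMTree as LSM

-- Keep supplying credits until some are returned as leftovers.
finishUnion :: LSM.ResolveValue v => LSM.Table IO k v b -> IO ()
finishUnion table = go
  where
    batch = LSM.UnionCredits 128
    go = do
      LSM.UnionCredits leftover <- LSM.supplyUnionCredits table batch
      if leftover > 0 then pure () else go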

Serialisation

class SerialiseKey k Source #

Serialisation of keys.

Instances should satisfy the following:

Identity
deserialiseKey (serialiseKey x) == x
Identity up to slicing
deserialiseKey (packSlice prefix (serialiseKey x) suffix) == x

Instances may satisfy the following:

Ordering-preserving
x `compare` y == serialiseKey x `compare` serialiseKey y

Raw bytes are lexicographically ordered, so in particular this means that values should be serialised into big-endian formats. This constraint mainly exists for range queries, where the range is specified in terms of unserialised values, but the internal implementation works on the serialised representation.

Minimal complete definition

serialiseKey, deserialiseKey
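
As an illustration, a hedged QuickCheck-style property for the ordering law. It assumes the class methods serialiseKey/deserialiseKey are exported, that RawBytes has an Ord instance, and that Word64 has a SerialiseKey instance:

import           Data.Word (Word64)
import qualified Database.LSMTree as LSM

prop_orderPreserving :: Word64 -> Word64 -> Bool
prop_orderPreserving x y =
    compare x y == compare (LSM.serialiseKey x) (LSM.serialiseKey y)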

class SerialiseValue v Source #

Serialisation of values and blobs.

Instances should satisfy the following:

Identity
deserialiseValue (serialiseValue x) == x
Identity up to slicing
deserialiseValue (packSlice prefix (serialiseValue x) suffix) == x

Minimal complete definition

serialiseValue, deserialiseValue

Instances

Instances details
SerialiseValue ByteArray Source #

The Ord instance of ByteArray is not lexicographic, so there cannot be an order-preserving instance of SerialiseKey. Use ShortByteString instead.

Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue Void Source #

The deserialiseValue of this instance throws an exception (as does, e.g., the Word64 instance on invalid input).

This instance is useful for tables without blobs.

Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue Word64 Source # 
Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue ByteString Source #

Placeholder instance, not optimised

Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue ByteString Source #

Placeholder instance, not optimised

Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue ShortByteString Source # 
Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue a => SerialiseValue (Sum a) Source #

An instance for Sum which is transparent to the serialisation of a.

Note: If you want to serialise Sum a differently than a, then you should create another newtype over Sum and define your alternative serialisation.

Instance details

Defined in Database.LSMTree.Internal.Serialise.Class

SerialiseValue v => SerialiseValue (ResolveAsFirst v) Source # 
Instance details

Defined in Database.LSMTree

Monoidal value resolution

class ResolveValue v where Source #

A class to specify how to resolve/merge values when using monoidal updates (Mupsert). This is required for merging entries during compaction and also for doing lookups, resolving multiple entries of the same key on the fly. The class has some laws, which should be tested (e.g. with QuickCheck).

It is okay to assume that the input bytes can be deserialised using deserialiseValue, as they are produced by either serialiseValue or resolveValue itself, which are required to produce deserialisable output.

Prerequisites:

  • [Valid Output] The resolved value must itself be deserialisable; see resolveValueValidOutput below.
  • [Associativity] Resolving values must be associative; see resolveValueAssociativity below.

Future opportunities for optimisations:

  • Include a function that determines whether it is safe to remove an Update from the last level of an LSM tree.
  • Include a function v -> RawBytes -> RawBytes, which can then be used when inserting into the write buffer. Currently, using resolveDeserialised means that the inserted value is serialised and (if there is another value with the same key in the write buffer) immediately deserialised again.

TODO: The laws depend on SerialiseValue; should we make it a superclass?

TODO: Should we additionally require Totality (for any input RawBytes, resolution should successfully provide a result)? This could reduce the risk of encountering errors during a run merge.

Instances

Instances details
(Num a, SerialiseValue a) => ResolveValue (Sum a :: Type) Source #

Mostly to give an example instance (plus the property tests for it). Additionally, this instance for Sum provides a nice monoidal, numerical aggregator.

Instance details

Defined in Database.LSMTree.Monoidal

ResolveValue (ResolveAsFirst v :: Type) Source # 
Instance details

Defined in Database.LSMTree

resolveDeserialised :: forall v. SerialiseValue v => (v -> v -> v) -> Proxy v -> RawBytes -> RawBytes -> RawBytes Source #

A helper function to implement resolveValue by operating on the deserialised representation. Note that especially for simple types it should be possible to provide a more efficient implementation by directly operating on the RawBytes.

This function could potentially be used to provide a default implementation for resolveValue, but it's probably best to be explicit about instances.

To satisfy the prerequisites of ResolveValue, the function provided to resolveDeserialised should itself satisfy some properties:

  • [Associativity] The provided function should be associative.
  • [Totality] The provided function should be total.
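
As an illustration, a hedged sketch of an instance defined with resolveDeserialised. It assumes the class method has the shape resolveValue :: Proxy v -> RawBytes -> RawBytes -> RawBytes and that SerialiseValue can be newtype-derived; Count is a hypothetical type:

{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
import           Data.Word (Word64)
import qualified Database.LSMTree as LSM

newtype Count = Count Word64
  deriving newtype (Eq, Show, Num, LSM.SerialiseValue)

instance LSM.ResolveValue Count where
  -- (+) on Count is associative and total, as required.
  resolveValue = LSM.resolveDeserialised ((+) :: Count -> Count -> Count)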

Properties

resolveValueValidOutput :: forall v. (SerialiseValue v, ResolveValue v, NFData v) => v -> v -> Bool Source #

Test the Valid Output law for the ResolveValue class

resolveValueAssociativity :: forall v. (SerialiseValue v, ResolveValue v) => v -> v -> v -> Bool Source #

Test the Associativity law for the ResolveValue class

DerivingVia wrappers

newtype ResolveAsFirst v Source #

Newtype wrapper for values, so that Mupserts behave like Inserts.

If there is no intent to use Mupserts, then the user still has to define a ResolveValue instance for their values, unless they use this newtype, which provides a sensible default instance.

Constructors

ResolveAsFirst v 
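
As an illustration, a hedged sketch of deriving instances via ResolveAsFirst, using the SerialiseValue ShortByteString instance documented above; UserData is a hypothetical type:

{-# LANGUAGE DerivingVia #-}
import           Data.ByteString.Short (ShortByteString)
import qualified Database.LSMTree as LSM

newtype UserData = UserData ShortByteString
  deriving LSM.SerialiseValue via ShortByteString
  deriving LSM.ResolveValue via LSM.ResolveAsFirst ShortByteString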

Utility types

type IOLike m = (MonadAsync m, MonadMVar m, MonadThrow (STM m), MonadThrow m, MonadCatch m, MonadMask m, PrimMonad m, MonadST m) Source #

Utility class for grouping io-classes constraints.