Safe Haskell	Safe-Inferred
Language	GHC2021

Database.LSMTree.Internal.WriteBufferBlobs

Contents

For tests

Description

An on-disk store for blobs for the write buffer.

For table inserts with blobs, the blob is written out to a file immediately, while the rest of the Entry goes into the WriteBuffer. WriteBufferBlobs manages the storage of these blobs.

A single write buffer blob file can be shared between multiple tables. As a consequence, the lifetime of a WriteBufferBlobs must be managed using new together with the Ref API (releaseRef and dupRef). When a table is duplicated, the new table needs its own reference, so use dupRef upon duplication.
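
To make the dupRef/releaseRef discipline concrete, here is a minimal, base-only sketch of reference counting. The names Counted, newCounted, dupCounted and releaseCounted are hypothetical and are not the library's Ref API. The sketch only assumes the general idea: the creator holds one reference, each duplicate takes another, and the resource is finalised when the last reference is released.

```haskell
import Data.IORef

-- | Hypothetical, simplified stand-in for a reference-counted resource such
-- as a shared blob file handle. This is NOT the library's Ref API.
data Counted = Counted
  { countRef  :: IORef Int  -- ^ number of live references
  , finaliser :: IO ()      -- ^ run once, when the count reaches zero
  }

-- | Create the resource holding one reference (the creator's), like 'new'.
newCounted :: IO () -> IO Counted
newCounted fin = do
  r <- newIORef 1
  pure (Counted r fin)

-- | Take an extra reference, e.g. for a duplicated table (cf. dupRef).
dupCounted :: Counted -> IO ()
dupCounted c = atomicModifyIORef' (countRef c) (\n -> (n + 1, ()))

-- | Drop one reference (cf. releaseRef); the last release runs the finaliser.
releaseCounted :: Counted -> IO ()
releaseCounted c = do
  remaining <- atomicModifyIORef' (countRef c) (\n -> (n - 1, n - 1))
  if remaining == 0 then finaliser c else pure ()
```

With one duplicate, the finaliser runs only after both the original and the duplicate have been released.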

Blobs are copied from the write buffer blob file when the write buffer is flushed to make a run. This is needed because the blob file is shared, so its contents are not stable at the point when one table wants to flush its write buffer.

Not all tables need a blob file, so we defer opening the file until it is needed.
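
One generic way to defer opening a resource until first use is to cache the open action and its result behind an MVar. This is a hypothetical sketch (Lazy, newLazy and getLazy are illustrative names), not a description of how WriteBufferBlobs or its callers actually implement the deferral.

```haskell
import Control.Concurrent.MVar

-- | Hypothetical lazily opened resource: the stored action runs on first use
-- only, and its result is cached for every later use.
newtype Lazy a = Lazy (MVar (Either (IO a) a))

-- | Register how to open the resource without opening it yet.
newLazy :: IO a -> IO (Lazy a)
newLazy open = Lazy <$> newMVar (Left open)

-- | Return the resource, opening it on first use. 'modifyMVar' serialises
-- concurrent callers, and if the open action throws, the MVar keeps the
-- unopened state so that a later call can retry.
getLazy :: Lazy a -> IO a
getLazy (Lazy var) = modifyMVar var $ \st -> case st of
  Right a   -> pure (st, a)
  Left open -> do
    a <- open
    pure (Right a, a)
```

A table that never stores a blob never calls getLazy, so its open action never runs.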

Synopsis

data WriteBufferBlobs m h = WriteBufferBlobs {
    blobFile :: !(Ref (BlobFile m h)),
    blobFilePointer :: !(FilePointer m),
    writeBufRefCounter :: !(RefCounter m)
  }
new :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> m (Ref (WriteBufferBlobs m h))
open :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> AllowExisting -> m (Ref (WriteBufferBlobs m h))
addBlob :: (PrimMonad m, MonadThrow m) => HasFS m h -> Ref (WriteBufferBlobs m h) -> SerialisedBlob -> m BlobSpan
mkRawBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> RawBlobRef m h
mkWeakBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> WeakBlobRef m h
newtype FilePointer m = FilePointer (PrimVar (PrimState m) Int)

Documentation

data WriteBufferBlobs m h

A single WriteBufferBlobs may be shared between multiple tables. Because it is shared, its state has to be managed with particular care.

In particular, there is the blob file itself. We may have to write to this blob file from multiple threads on behalf of independent tables, so the offset at which we write is shared mutable state. Our strategy is to track the write offset explicitly (we need to know the offset to return correct BlobSpans) and not to use the file's own file pointer: we always write at explicit file offsets rather than at the open file's normal file pointer. The offset itself is kept in a PrimVar and updated with atomic operations.
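
The offset strategy can be sketched as follows. Assumption: to stay dependency-free, the sketch uses an IORef with atomicModifyIORef' in place of the PrimVar the module actually uses. The hypothetical reserve function atomically advances the shared offset and returns the old value, so concurrent writers always receive disjoint byte ranges and never need the handle's own file pointer.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM)
import Data.IORef

-- | Hypothetical stand-in for the shared, manually tracked file offset.
newtype Offset = Offset (IORef Int)

newOffset :: Int -> IO Offset
newOffset start = Offset <$> newIORef start

-- | Atomically reserve @len@ bytes and return the offset where they start.
reserve :: Offset -> Int -> IO Int
reserve (Offset ref) len = atomicModifyIORef' ref (\off -> (off + len, off))

-- | Read the next free offset (the end of all reserved ranges).
currentOffset :: Offset -> IO Int
currentOffset (Offset ref) = readIORef ref

-- | Reserve one range per blob size, each from its own thread, and collect
-- the resulting (offset, length) pairs in input order.
reserveConcurrently :: Offset -> [Int] -> IO [(Int, Int)]
reserveConcurrently o lens = do
  vars <- forM lens $ \len -> do
    v <- newEmptyMVar
    _ <- forkIO (reserve o len >>= \off -> putMVar v (off, len))
    pure v
  mapM takeMVar vars
```

Whatever order the threads run in, the reserved ranges never overlap and together end exactly at the final offset.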

A consequence of the blob file being shared between the write buffers of many tables is that the blobs in the file will not all belong to one table. The write buffer blob file is therefore unsuitable for use as-is as the blob file of a run when the write buffer is flushed. A run's blob file must be immutable and have a known CRC, whereas the shared write buffer blob file can still be appended to by inserts into one table while another table is flushing its write buffer. So there is no stable CRC for the whole file (as required by the snapshot format). Furthermore, we cannot even calculate the blob file CRC incrementally without additional expensive serialisation.

To solve this, we follow a design in which the open file handle for the blob file is shared only between multiple write buffers, and is not shared with the runs once flushed. This separates the lifetimes of the files. Correspondingly, the reference counter only tracks the lifetime of the read/write mode file handle.
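
The copy-on-flush idea can be illustrated in memory, modelling a file's contents as a String. Span, readSpan and copyBlobsForFlush are hypothetical names. A real flush would move bytes between files and compute the run blob file's CRC, which this sketch omits.

```haskell
-- | Hypothetical stand-in for BlobSpan: a byte offset and a length.
data Span = Span { spanOffset :: Int, spanSize :: Int }
  deriving (Eq, Show)

-- | Read the bytes of one blob out of a file's contents.
readSpan :: String -> Span -> String
readSpan file (Span off len) = take len (drop off file)

-- | Copy only this table's blobs out of the shared write buffer blob file
-- into a fresh run blob file, returning the run file's contents and the
-- spans rewritten to point into it. Other tables' blobs (and any garbage
-- bytes) are not copied.
copyBlobsForFlush :: String -> [Span] -> (String, [Span])
copyBlobsForFlush shared spans = (concat blobs, newSpans)
  where
    blobs    = map (readSpan shared) spans
    sizes    = map length blobs
    offsets  = scanl (+) 0 sizes
    newSpans = zipWith Span offsets sizes
```

For example, if the shared file holds "AAAbbbbCC" and this table owns the spans Span 0 3 and Span 7 2, the run file becomes "AAACC" with spans Span 0 3 and Span 3 2, independent of later appends to the shared file.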

One concern with sharing blob files and the open blob file handle between multiple write buffers is: can we guarantee that the blob file is eventually closed?

A problematic example would be, starting from a root handle, repeatedly: duplicating it; inserting (with blobs) into the duplicate; and then closing the duplicate. This would use only a fixed number of tables at once, but would keep inserting into the same write buffer blob file. This could be done indefinitely.

On the other hand, provided that there's a bound on the number of duplicates that are created from any point, and each table is eventually closed, then each write buffer blob file will eventually be closed.

The latter seems like the more realistic use case, and so the design here is probably reasonable.

If not, an entirely different approach would be to manage blobs across all runs (and the write buffer) differently: avoiding copying when blobs are merged and using some kind of GC algorithm to recover space for blobs that are no longer needed. There are LSM algorithms that do this for values (i.e. copying keys only during merge and referring to values managed in a separate disk heap), so the same could be applied to blobs.

Constructors

WriteBufferBlobs

Fields

blobFile :: !(Ref (BlobFile m h))
The blob file.
INVARIANT: the file may contain garbage bytes, but no blob reference (RawBlobRef, WeakBlobRef, or StrongBlobRef) will reference these bytes.

blobFilePointer :: !(FilePointer m)
The manually tracked file pointer.
INVARIANT: the file pointer points to a file offset at or beyond the file size.

writeBufRefCounter :: !(RefCounter m)

Instances

RefCounted m (WriteBufferBlobs m h)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods: getRefCounter :: WriteBufferBlobs m h -> RefCounter m

NFData h => NFData (WriteBufferBlobs m h)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods: rnf :: WriteBufferBlobs m h -> ()

new :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> m (Ref (WriteBufferBlobs m h))

Create a new WriteBufferBlobs with a new file.

REF: the resulting reference must be released once it is no longer used.

ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.
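
From client code, one common way to satisfy both the REF and ASYNC notes is bracket, which masks asynchronous exceptions while acquiring and guarantees the release step runs. The sketch below uses a hypothetical resource in plain IO that only records events in a log. In the library's own monad, the corresponding pattern would presumably be something like bracket (new hfs path) releaseRef, assuming releaseRef takes the reference and returns m ().

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef

-- | Acquire a resource, use it, and always release it. 'bracket' runs the
-- acquire step with asynchronous exceptions masked, and runs the release
-- step even if the body throws.
withAcquired :: IO r -> (r -> IO ()) -> (r -> IO a) -> IO a
withAcquired = bracket

-- | Demonstration with a hypothetical resource that records events in a log;
-- the body fails, but the release step still runs.
demo :: IORef [String] -> IO (Either ErrorCall ())
demo logRef = try $
  withAcquired
    (modifyIORef logRef ("acquire" :))
    (\_ -> modifyIORef logRef ("release" :))
    (\_ -> do
        modifyIORef logRef ("use" :)
        throwIO (ErrorCall "boom"))
```

After demo runs, the log (oldest first) reads acquire, use, release, and the error is returned as a Left.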

open :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> AllowExisting -> m (Ref (WriteBufferBlobs m h))

Open a WriteBufferBlobs file and set the file pointer to the end of the file.

REF: the resulting reference must be released once it is no longer used.

ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.

addBlob :: (PrimMonad m, MonadThrow m) => HasFS m h -> Ref (WriteBufferBlobs m h) -> SerialisedBlob -> m BlobSpan

Append a blob.

If no exception is thrown, the file pointer is left at exactly the file size.

If an exception is thrown, the file pointer points to a file offset at or beyond the file size. The bytes between the old and new offset may be garbage or missing.
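
This failure behaviour can be modelled in memory. In this hypothetical model (Model, writeAt, addBlobModel and invariantHolds are illustrative names, and the file is a String), the byte range is reserved before the bytes are written, so a failed write leaves the pointer beyond the file size. A later write then fills the hole with '\0' bytes, standing in for garbage that no blob reference points at.

```haskell
import Control.Exception (ErrorCall (..), throwIO, try)
import Data.IORef

-- | Hypothetical stand-in for BlobSpan: offset and length.
data Span = Span Int Int deriving (Eq, Show)

-- | In-memory model of the blob file: its contents plus a separately
-- tracked file pointer (the real module uses a file and a PrimVar).
data Model = Model { contents :: IORef String, pointer :: IORef Int }

newModel :: IO Model
newModel = Model <$> newIORef "" <*> newIORef 0

-- | Write bytes at an explicit offset, padding any hole with '\0'.
writeAt :: Int -> String -> String -> String
writeAt off bytes file =
  let padded = file ++ replicate (off - length file) '\0'
  in  take off padded ++ bytes ++ drop (off + length bytes) padded

-- | Append a blob: reserve the range first, then write. If the (simulated)
-- write fails, the pointer has already advanced past the file's end.
addBlobModel :: Model -> Bool -> String -> IO Span
addBlobModel m failWrite blob = do
  let len = length blob
  off <- atomicModifyIORef' (pointer m) (\p -> (p + len, p))
  if failWrite
    then throwIO (ErrorCall "simulated write failure")
    else modifyIORef' (contents m) (writeAt off blob)
  pure (Span off len)

-- | The record invariant: the file pointer is at or beyond the file size.
invariantHolds :: Model -> IO Bool
invariantHolds m =
  (>=) <$> readIORef (pointer m) <*> (length <$> readIORef (contents m))
```

Appending "abc", then failing to append "xyz", then appending "de" leaves the spans Span 0 3 and Span 6 2, with three filler bytes in between that no span covers.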

mkRawBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> RawBlobRef m h

Helper function to make a RawBlobRef that points into a WriteBufferBlobs.

This function should only be used on the result of addBlob on the same WriteBufferBlobs. For example:

 addBlob hfs wbb blob >>= \span -> pure (mkRawBlobRef wbb span)

mkWeakBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> WeakBlobRef m h

Helper function to make a WeakBlobRef that points into a WriteBufferBlobs.

This function should only be used on the result of addBlob on the same WriteBufferBlobs. For example:

 addBlob hfs wbb blob >>= \span -> pure (mkWeakBlobRef wbb span)
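
Why the span must come from addBlob on the same WriteBufferBlobs can be seen in a small model where a reference pairs a shared store with a span. Store, BlobRef, addBlobTo, mkRef and derefBlob are hypothetical stand-ins for WriteBufferBlobs, the blob reference types, addBlob and the mk*BlobRef helpers; a store's contents are kept in an IORef String.

```haskell
import Data.IORef

-- | Hypothetical shared blob store, standing in for a WriteBufferBlobs.
newtype Store = Store (IORef String)

-- | Hypothetical blob reference: the store it points into plus a span.
data BlobRef = BlobRef Store Int Int

newStore :: IO Store
newStore = Store <$> newIORef ""

-- | Append a blob and return its (offset, size) span within this store.
addBlobTo :: Store -> String -> IO (Int, Int)
addBlobTo (Store ref) blob =
  atomicModifyIORef' ref (\file -> (file ++ blob, (length file, length blob)))

-- | Analogue of the mk*BlobRef helpers: pair a span with a store.
mkRef :: Store -> (Int, Int) -> BlobRef
mkRef store (off, len) = BlobRef store off len

-- | Read the referenced bytes from the store's current contents.
derefBlob :: BlobRef -> IO String
derefBlob (BlobRef (Store ref) off len) = take len . drop off <$> readIORef ref
```

Pairing a span with the store it came from, as in mkRef a <$> addBlobTo a blob, keeps resolving to the right bytes even after later appends; pairing it with a different store silently reads the wrong bytes.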

For tests

newtype FilePointer m

A mutable file offset, suitable to share between threads.

This pointer is limited to 31-bit file offsets on 32-bit systems, because the offset is stored in a signed Int. This limit should be large enough that we never reach it in practice.
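
The limit follows from storing the offset in a signed Int: one bit of the machine word is the sign bit, leaving 31 bits on a 32-bit system, so the largest offset is 2^31 - 1 = 2147483647 bytes (just under 2 GiB). A small check of the same arithmetic on the current platform (offsetBits and maxOffset are illustrative names):

```haskell
import Data.Bits (finiteBitSize)

-- | Bits available for a non-negative offset stored in a signed 'Int':
-- the word size minus the sign bit (31 on 32-bit, 63 on 64-bit GHC).
offsetBits :: Int
offsetBits = finiteBitSize (0 :: Int) - 1

-- | Largest representable offset, 2^offsetBits - 1, which equals
-- maxBound :: Int.
maxOffset :: Integer
maxOffset = 2 ^ offsetBits - 1
```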

Constructors

FilePointer (PrimVar (PrimState m) Int)

Instances

NFData (FilePointer m)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods: rnf :: FilePointer m -> ()