Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Database.LSMTree.Internal.WriteBufferBlobs
Description
An on-disk store for blobs for the write buffer.
For table inserts with blobs, the blob gets written out immediately to a file, while the rest of the Entry goes into the WriteBuffer. The WriteBufferBlobs manages the storage of the blobs.
A single write buffer blob file can be shared between multiple tables. As a consequence, the lifetime of the WriteBufferBlobs must be managed using new, and with the Ref API: releaseRef and dupRef. When a table is duplicated, the new table needs its own reference, so use dupRef upon duplication.
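The lifetime rule above can be illustrated with a minimal sketch. It uses a toy reference count built on IORef from base; ToyRef, newToyRef, dupToyRef and releaseToyRef are hypothetical names that only mimic the roles of new, dupRef and releaseRef, and are not the library's Ref API.

```haskell
import Control.Monad (when)
import Data.IORef

-- | Hypothetical stand-in for a reference-counted resource (not the library's Ref type).
data ToyRef = ToyRef
  { toyCount    :: IORef Int  -- number of live references
  , toyFinalise :: IO ()      -- runs when the count reaches zero
  }

-- | Create the resource with one reference (plays the role of 'new').
newToyRef :: IO () -> IO ToyRef
newToyRef finaliser = do
  c <- newIORef 1
  pure (ToyRef c finaliser)

-- | Take an extra reference (plays the role of 'dupRef').
dupToyRef :: ToyRef -> IO ToyRef
dupToyRef r = do
  atomicModifyIORef' (toyCount r) (\n -> (n + 1, ()))
  pure r

-- | Drop one reference (plays the role of 'releaseRef').
releaseToyRef :: ToyRef -> IO ()
releaseToyRef r = do
  remaining <- atomicModifyIORef' (toyCount r) (\n -> (n - 1, n - 1))
  when (remaining == 0) (toyFinalise r)

-- | Two tables share one blob file; it is closed only after both release it.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say s = modifyIORef' logRef (s :)
  table1 <- newToyRef (say "blob file closed")
  table2 <- dupToyRef table1   -- table duplication takes its own reference
  releaseToyRef table1
  say "table 1 closed"
  releaseToyRef table2
  say "table 2 closed"
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= mapM_ putStrLn
```

In this sketch the finaliser runs on the second release, not the first, which is the behaviour the duplicated table relies on.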
Blobs are copied from the write buffer blob file when the write buffer is flushed to make a run. This is needed since the blob file is shared and so not stable by the time one table wants to flush it.
Not all tables need a blob file so we defer opening the file until it is needed.
Synopsis
- data WriteBufferBlobs m h = WriteBufferBlobs {
    - blobFile :: !(Ref (BlobFile m h))
    - blobFilePointer :: !(FilePointer m)
    - writeBufRefCounter :: !(RefCounter m)
  }
- new :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> m (Ref (WriteBufferBlobs m h))
- open :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> AllowExisting -> m (Ref (WriteBufferBlobs m h))
- addBlob :: (PrimMonad m, MonadThrow m) => HasFS m h -> Ref (WriteBufferBlobs m h) -> SerialisedBlob -> m BlobSpan
- mkRawBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> RawBlobRef m h
- mkWeakBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> WeakBlobRef m h
- newtype FilePointer m = FilePointer (PrimVar (PrimState m) Int)
Documentation
data WriteBufferBlobs m h
A single WriteBufferBlobs may be shared between multiple tables.
As a consequence of being shared, the management of the shared state has to
be quite careful.
In particular there is the blob file itself. We may have to write to this
blob file from multiple threads on behalf of independent tables.
The offset at which we write is thus shared mutable state. Our strategy for the write offset is to explicitly track it (since we need to know the offset to return correct BlobSpans) and then to not use the file's own file pointer. We do this by always writing at specific file offsets rather than writing at the open file's normal file pointer. We use a PrimVar with atomic operations to manage the file offset.
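The offset-tracking strategy can be sketched as follows. This is a simplified model, assuming an IORef with atomicModifyIORef' in place of the PrimVar; ToyFilePointer and reserve are hypothetical names, and no file I/O is performed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef

-- | Hypothetical stand-in for the shared write offset (the real type wraps a PrimVar).
newtype ToyFilePointer = ToyFilePointer (IORef Int)

newToyFilePointer :: IO ToyFilePointer
newToyFilePointer = ToyFilePointer <$> newIORef 0

-- | Atomically reserve @len@ bytes and return the offset where they start.
reserve :: ToyFilePointer -> Int -> IO Int
reserve (ToyFilePointer ref) len =
  atomicModifyIORef' ref (\off -> (off + len, off))

-- | Sequential use: each blob gets its own (offset, length) span.
demo :: IO [(Int, Int)]
demo = do
  fp <- newToyFilePointer
  mapM (\len -> do off <- reserve fp len; pure (off, len)) [10, 4, 7]

-- | Four threads reserve 1000 spans of 3 bytes each; no reservation is lost.
concurrentDemo :: IO Int
concurrentDemo = do
  fp@(ToyFilePointer ref) <- newToyFilePointer
  done <- newEmptyMVar
  forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ do
    replicateM_ 1000 (reserve fp 3)
    putMVar done ()
  replicateM_ 4 (takeMVar done)
  readIORef ref

main :: IO ()
main = do
  demo >>= print
  concurrentDemo >>= print
```

Because every writer obtains a disjoint span before writing, writes at explicit offsets never overlap, whichever thread performs them first.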
A consequence of the blob file being shared between the write buffers of many tables is that the blobs in the file will not all belong to one table. The write buffer blob file is therefore unsuitable to use as-is as the blob file for a run when the write buffer is flushed. The run blob file must be immutable and have a known CRC, whereas the shared write buffer blob file can still be appended to via inserts in one table while another table is trying to flush its write buffer. So there is no stable CRC for the whole file (as required by the snapshot format). Furthermore, we cannot even incrementally calculate the blob file CRC without additional expensive serialisation.
To solve this, we follow the design that the open file handle for the blob file is only shared between multiple write buffers, and is not shared with the runs once flushed. This separates the lifetimes of the files. Correspondingly, the reference counter only tracks the lifetime of the read/write mode file handle.
One concern with sharing blob files and the open blob file handle between multiple write buffers is: can we guarantee that the blob file is eventually closed?
A problematic example would be, starting from a root handle and then repeatedly: duplicating; inserting (with blobs) into the duplicate; and then closing it. This would use only a fixed number of tables at once, but would keep inserting into the same write buffer blob file. This could be done indefinitely.
On the other hand, provided that there's a bound on the number of duplicates that are created from any point, and each table is eventually closed, then each write buffer blob file will eventually be closed.
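The two scenarios can be compared with a small pure model of the reference count. Event, refCounts, churn and bounded are hypothetical names for this sketch; it only counts references and does not model the inserts themselves.

```haskell
-- | Hypothetical events on tables that share one write buffer blob file.
data Event = Duplicate | Close

-- | Number of live references after each event, starting from one root table.
refCounts :: [Event] -> [Int]
refCounts = drop 1 . scanl step 1
  where
    step n Duplicate = n + 1
    step n Close     = n - 1

-- | Duplicate, then close one table, repeated: some table always holds the file.
churn :: Int -> [Event]
churn k = concat (replicate k [Duplicate, Close])

-- | A bounded number of duplicates, then every table (root included) is closed.
bounded :: Int -> [Event]
bounded k = replicate k Duplicate ++ replicate (k + 1) Close

main :: IO ()
main = do
  print (refCounts (churn 3))
  print (refCounts (bounded 2))
```

In the churn pattern the count never reaches zero however long the sequence runs, while in the bounded pattern it does, at which point the file handle can be closed.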
The latter seems like the more realistic use case, and so the design here is probably reasonable.
If not, an entirely different approach would be to manage blobs across all runs (and the write buffer) differently: avoiding copying when blobs are merged and using some kind of GC algorithm to recover space for blobs that are no longer needed. There are LSM algorithms that do this for values (i.e. copying keys only during merge and referring to values managed in a separate disk heap), so the same could be applied to blobs.
Constructors
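WriteBufferBlobs
Fields
- blobFile :: !(Ref (BlobFile m h))
- blobFilePointer :: !(FilePointer m)
- writeBufRefCounter :: !(RefCounter m)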
Instances
RefCounted m (WriteBufferBlobs m h)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods
getRefCounter :: WriteBufferBlobs m h -> RefCounter m
NFData h => NFData (WriteBufferBlobs m h)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods
rnf :: WriteBufferBlobs m h -> ()
new :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> m (Ref (WriteBufferBlobs m h))
Create a new WriteBufferBlobs with a new file.
REF: the resulting reference must be released once it is no longer used.
ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.
open :: (PrimMonad m, MonadMask m) => HasFS m h -> FsPath -> AllowExisting -> m (Ref (WriteBufferBlobs m h))
Open a WriteBufferBlobs file and set the file pointer to the end of the file.
REF: the resulting reference must be released once it is no longer used.
ASYNC: this should be called with asynchronous exceptions masked because it allocates/creates resources.
addBlob :: (PrimMonad m, MonadThrow m) => HasFS m h -> Ref (WriteBufferBlobs m h) -> SerialisedBlob -> m BlobSpan
Append a blob.
If no exception is thrown, then the file pointer will be set to exactly the file size.
If an exception is thrown, the file pointer points to a file offset at or beyond the file size. The bytes between the old and new offset might be garbage or missing.
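One way to see how the file pointer can end up beyond the file size is the following in-memory sketch. It assumes, without confirmation from the source, that space is reserved before the write is attempted; ToyBlobs and toyAddBlob are hypothetical names, and the "file" is just a tracked size.

```haskell
import Control.Exception (ErrorCall (..), throwIO, try)
import Data.IORef

-- | Hypothetical in-memory model: the tracked write offset and the actual file size.
data ToyBlobs = ToyBlobs
  { pointer  :: IORef Int
  , fileSize :: IORef Int
  }

-- | Reserve space first, then "write". A failed write leaves the reservation in place.
-- Returns the (offset, length) span on success.
toyAddBlob :: ToyBlobs -> Bool -> Int -> IO (Int, Int)
toyAddBlob b failWrite len = do
  off <- atomicModifyIORef' (pointer b) (\o -> (o + len, o))
  if failWrite
    then throwIO (ErrorCall "simulated disk error")
    else modifyIORef' (fileSize b) (max (off + len))
  pure (off, len)

-- | One successful append of 10 bytes, then one failed append of 5 bytes.
-- Returns the first span, the final pointer and the final file size.
demo :: IO ((Int, Int), Int, Int)
demo = do
  b <- ToyBlobs <$> newIORef 0 <*> newIORef 0
  okSpan <- toyAddBlob b False 10
  _ <- try (toyAddBlob b True 5) :: IO (Either ErrorCall (Int, Int))
  p <- readIORef (pointer b)
  s <- readIORef (fileSize b)
  pure (okSpan, p, s)

main :: IO ()
main = demo >>= print
```

After the failed append the pointer (15) is ahead of the file size (10), so the 5 reserved bytes are missing from the file, which matches the "garbage or missing" case described above.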
mkRawBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> RawBlobRef m h
Helper function to make a RawBlobRef that points into a WriteBufferBlobs.
This function should only be used on the result of addBlob on the same WriteBufferBlobs. For example:
    addBlob hfs wbb blob >>= \span -> pure (mkRawBlobRef wbb span)
mkWeakBlobRef :: Ref (WriteBufferBlobs m h) -> BlobSpan -> WeakBlobRef m h
Helper function to make a WeakBlobRef that points into a WriteBufferBlobs.
This function should only be used on the result of addBlob on the same WriteBufferBlobs. For example:
    addBlob hfs wbb blob >>= \span -> pure (mkWeakBlobRef wbb span)
For tests
newtype FilePointer m
A mutable file offset, suitable to share between threads.
This pointer is limited to 31-bit file offsets on 32-bit systems. This should be a sufficiently large limit that we never reach it in practice.
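As a rough check of that limit, the sketch below computes the largest non-negative offset that fits in a 32-bit signed integer. It assumes Int is a 32-bit signed word on such systems, as with GHC; maxOffset32 is a hypothetical name.

```haskell
import Data.Int (Int32)

-- | Largest offset representable in a 32-bit signed Int: 2^31 - 1 bytes.
maxOffset32 :: Integer
maxOffset32 = fromIntegral (maxBound :: Int32)

main :: IO ()
main = print maxOffset32
```

That is just under 2 GiB of blob data per write buffer blob file.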
Constructors
FilePointer (PrimVar (PrimState m) Int)
Instances
NFData (FilePointer m)
Defined in Database.LSMTree.Internal.WriteBufferBlobs
Methods
rnf :: FilePointer m -> ()