lsm-tree-0.1.0.0: Log-structured merge-trees
Safe Haskell: Safe-Inferred
Language: GHC2021

Database.LSMTree.Internal.Config

Documentation

Table configuration

data TableConfig Source #

A collection of configuration parameters for tables, which can be used to tune the performance of the table. To construct a TableConfig, modify the defaultTableConfig, which defines reasonable defaults for all parameters.

For a detailed discussion of fine-tuning the table configuration, see Fine-tuning Table Configuration.

confMergePolicy :: MergePolicy
The merge policy balances the performance of lookups against the performance of updates. Levelling favours lookups. Tiering favours updates. Lazy levelling strikes a middle ground between levelling and tiering, and moderately favours updates. This parameter is explicitly referenced in the documentation of those operations it affects.
confSizeRatio :: SizeRatio
The size ratio pushes the effects of the merge policy to the extreme. If the size ratio is higher, levelling favours lookups more, and tiering and lazy levelling favour updates more. This parameter is referred to as \(T\) in the disk I/O cost of operations.
confWriteBufferAlloc :: WriteBufferAlloc
The write buffer capacity balances the performance of lookups and updates against the in-memory size of the database. If the write buffer is larger, it takes up more memory, but lookups and updates are more efficient. This parameter is referred to as \(B\) in the disk I/O cost of operations. Irrespective of this parameter, the write buffer size cannot exceed 4GiB.
confMergeSchedule :: MergeSchedule
The merge schedule balances the performance of lookups and updates against the consistency of updates. With the one-shot merge schedule, lookups and updates are more efficient overall, but some updates may take much longer than others. With the incremental merge schedule, lookups and updates are less efficient overall, but each update does a similar amount of work. This parameter is explicitly referenced in the documentation of those operations it affects. The merge schedule does not affect the way that table unions are computed. However, any table union must complete all outstanding incremental updates.
confBloomFilterAlloc :: BloomFilterAlloc
The Bloom filter size balances the performance of lookups against the in-memory size of the database. If the Bloom filters are larger, they take up more memory, but lookup operations are more efficient.
confFencePointerIndex :: FencePointerIndexType
The fence-pointer index type supports two types of indexes. The ordinary indexes are designed to work with any key. The compact indexes are optimised for the case where the keys in the database are uniformly distributed, e.g., when the keys are hashes.
confDiskCachePolicy :: DiskCachePolicy
The disk cache policy supports caching lookup operations using the OS page cache. Caching may improve the performance of lookups and updates if database access follows certain patterns.
confMergeBatchSize :: MergeBatchSize
The merge batch size balances the maximum latency of individual update operations against the latency of a sequence of update operations. Bigger batches improve overall performance, but some updates will take much longer than others. The default is to use a large batch size.
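To tune individual parameters, modify defaultTableConfig using record-update syntax. The sketch below is self-contained: the stand-in types here only mirror a few of the fields documented above, while the real definitions live in this module.

```haskell
-- Stand-in definitions mirroring a subset of the documented API.
data MergePolicy      = LazyLevelling          deriving Show
data MergeSchedule    = OneShot | Incremental  deriving Show
data WriteBufferAlloc = AllocNumEntries Int    deriving Show

data TableConfig = TableConfig
  { confMergePolicy      :: MergePolicy
  , confMergeSchedule    :: MergeSchedule
  , confWriteBufferAlloc :: WriteBufferAlloc
  } deriving Show

defaultTableConfig :: TableConfig
defaultTableConfig = TableConfig
  { confMergePolicy      = LazyLevelling
  , confMergeSchedule    = Incremental
  , confWriteBufferAlloc = AllocNumEntries 20000
  }

-- Override a single parameter with record-update syntax,
-- keeping the defaults for everything else:
myConfig :: TableConfig
myConfig = defaultTableConfig { confWriteBufferAlloc = AllocNumEntries 50000 }
```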

defaultTableConfig :: TableConfig Source #

The defaultTableConfig defines reasonable defaults for all TableConfig parameters.

>>> confMergePolicy defaultTableConfig
LazyLevelling
>>> confMergeSchedule defaultTableConfig
Incremental
>>> confSizeRatio defaultTableConfig
Four
>>> confWriteBufferAlloc defaultTableConfig
AllocNumEntries 20000
>>> confBloomFilterAlloc defaultTableConfig
AllocRequestFPR 1.0e-3
>>> confFencePointerIndex defaultTableConfig
OrdinaryIndex
>>> confDiskCachePolicy defaultTableConfig
DiskCacheAll
>>> confMergeBatchSize defaultTableConfig
MergeBatchSize 20000

Merge policy

data MergePolicy Source #

The merge policy balances the performance of lookups against the performance of updates. Levelling favours lookups. Tiering favours updates. Lazy levelling strikes a middle ground between levelling and tiering, and moderately favours updates. This parameter is explicitly referenced in the documentation of those operations it affects.

NOTE: This package only supports lazy levelling.

For a detailed discussion of the merge policy, see Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size.

Constructors

LazyLevelling 

Size ratio

data SizeRatio Source #

The size ratio pushes the effects of the merge policy to the extreme. If the size ratio is higher, levelling favours lookups more, and tiering and lazy levelling favour updates more. This parameter is referred to as \(T\) in the disk I/O cost of operations.

NOTE: This package only supports a size ratio of four.

For a detailed discussion of the size ratio, see Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size.

Constructors

Four 

Write buffer allocation

data WriteBufferAlloc Source #

The write buffer capacity balances the performance of lookups and updates against the in-memory size of the table. If the write buffer is larger, it takes up more memory, but lookups and updates are more efficient. Irrespective of this parameter, the write buffer size cannot exceed 4GiB.

For a detailed discussion of the write buffer capacity, see Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size.

Constructors

AllocNumEntries !Int

Allocate space for the in-memory write buffer to fit the requested number of entries. This parameter is referred to as \(B\) in the disk I/O cost of operations.

Bloom filter allocation

data BloomFilterAlloc Source #

The Bloom filter size balances the performance of lookups against the in-memory size of the table. If the Bloom filters are larger, they take up more memory, but lookup operations are more efficient.

For a detailed discussion of the Bloom filter size, see Fine-tuning: Bloom Filter Size.

Constructors

AllocFixed !Double

Allocate the requested number of bits per entry in the table.

The value must be strictly positive, but fractional values are permitted. The recommended range is \([2, 24]\).

AllocRequestFPR !Double

Allocate the required number of bits per entry to get the requested false-positive rate.

The value must be in the range \((0, 1)\). The recommended range is \([1\mathrm{e}{ -5 },1\mathrm{e}{ -2 }]\).
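The relationship between bits per entry and false-positive rate can be estimated with the standard Bloom filter sizing approximation \(\text{bits} \approx -\ln p / (\ln 2)^2\). This is a rough guide only; it is not necessarily the exact formula the library uses internally.

```haskell
-- Standard Bloom filter sizing approximation: bits per entry needed
-- for a target false-positive rate p. A sketch, not the library's
-- actual allocation logic.
bitsPerEntryForFPR :: Double -> Double
bitsPerEntryForFPR p = negate (log p) / (log 2 ^ (2 :: Int))
```

For example, the default false-positive rate of 1.0e-3 corresponds to roughly 14.4 bits per entry, which sits comfortably inside the recommended AllocFixed range of \([2, 24]\).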

Fence pointer index

data FencePointerIndexType Source #

The fence-pointer index type supports two types of indexes. The ordinary indexes are designed to work with any key. The compact indexes are optimised for the case where the keys in the database are uniformly distributed, e.g., when the keys are hashes.

For a detailed discussion of the fence-pointer index types, see Fine-tuning: Fence-Pointer Index Type.

Constructors

OrdinaryIndex

Ordinary indexes are designed to work with any key.

When using an ordinary index, the serialiseKey function cannot produce output larger than 64KiB.

CompactIndex

Compact indexes are designed for the case where the keys in the database are uniformly distributed, e.g., when the keys are hashes.

When using a compact index, the serialiseKey function must satisfy the following additional law:

Minimal size
size (serialiseKey x) >= 8

Use serialiseKeyMinimalSize to test this law.

serialiseKeyMinimalSize :: SerialiseKey k => k -> Bool Source #

Test the Minimal size law for the CompactIndex option.
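The Minimal size law can be illustrated with a stand-in serialiser. The sketch below serialises a Word64 key to eight big-endian bytes; it uses a plain byte list rather than the library's actual SerialiseKey class, and the names are illustrative only.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)

-- Stand-in key serialiser: a Word64 always serialises to exactly
-- 8 big-endian bytes, so it satisfies the Minimal size law required
-- by CompactIndex.
serialiseKeyW64 :: Word64 -> [Word8]
serialiseKeyW64 w = [ fromIntegral (w `shiftR` (8 * i)) | i <- [7, 6 .. 0] ]

-- The Minimal size law, written out for this stand-in serialiser:
minimalSizeLaw :: Word64 -> Bool
minimalSizeLaw x = length (serialiseKeyW64 x) >= 8
```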

Disk cache policy

data DiskCachePolicy Source #

The disk cache policy determines if lookup operations use the OS page cache. Caching may improve the performance of lookups if database access follows certain patterns.

For a detailed discussion of the disk cache policy, see Fine-tuning: Disk Cache Policy.

Constructors

DiskCacheAll

Cache all data in the table.

Use this policy if the database's access pattern has either good spatial locality or both good spatial and temporal locality.

DiskCacheLevelOneTo !Int

Cache the data in the freshest l levels.

Use this policy if the database's access pattern only has good temporal locality.

The variable l determines the number of levels that are cached. For a description of levels, see Merge Policy, Size Ratio, and Write Buffer Size. With this setting, the database can be expected to cache up to \(\frac{k}{P}\) pages of memory, where \(k\) refers to the number of entries that fit in levels \([1,l]\) and is defined as \(\sum_{i=1}^{l}BT^{i}\).
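The quantity \(k = \sum_{i=1}^{l} B T^{i}\) can be computed directly. The sketch below evaluates it for the default write buffer capacity \(B = 20000\) and size ratio \(T = 4\); the function name is illustrative, not part of the library.

```haskell
-- Number of entries k that fit in levels [1, l]:
-- k = sum_{i=1..l} B * T^i, where B is the write buffer capacity
-- and T is the size ratio.
cachedEntries :: Int -> Int -> Int -> Int
cachedEntries b t l = sum [ b * t ^ i | i <- [1 .. l] ]
```

With the defaults, caching the freshest level holds up to 80000 entries, and the freshest two levels hold up to 400000 entries.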

DiskCacheNone

Do not cache any table data.

Use this policy if the database's access pattern does not have good spatial or temporal locality, for instance when the access pattern is uniformly random.

diskCachePolicyForLevel :: DiskCachePolicy -> RunLevelNo -> RunDataCaching Source #

Interpret the DiskCachePolicy for a level: should we cache data in runs at this level.

Merge schedule

data MergeSchedule Source #

The merge schedule balances the performance of lookups and updates against the consistency of updates. The merge schedule does not affect the performance of table unions. With the one-shot merge schedule, lookups and updates are more efficient overall, but some updates may take much longer than others. With the incremental merge schedule, lookups and updates are less efficient overall, but each update does a similar amount of work. This parameter is explicitly referenced in the documentation of those operations it affects.

For a detailed discussion of the effect of the merge schedule, see Fine-tuning: Merge Schedule.

Constructors

OneShot

The OneShot merge schedule causes the merging algorithm to complete merges immediately. This is more efficient than the Incremental merge schedule, but has an inconsistent workload. Using the OneShot merge schedule, the worst-case disk I/O complexity of the update operations is linear in the size of the table. For real-time systems and other use cases where unresponsiveness is unacceptable, use the Incremental merge schedule.

Incremental

The Incremental merge schedule spreads out the merging work over time. This is less efficient than the OneShot merge schedule, but has a consistent workload. Using the Incremental merge schedule, the worst-case disk I/O complexity of the update operations is logarithmic in the size of the table. This Incremental merge schedule still uses batching to improve performance. The batch size can be controlled using the MergeBatchSize.

Merge batch size

newtype MergeBatchSize Source #

The merge batch size is a micro-tuning parameter; in most cases you do not need to think about it and can leave it at its default.

When using the Incremental merge schedule, merging is done in batches. This is a trade-off: larger batches tend to mean better overall performance, but the downside is that while most updates (inserts, deletes, upserts) are fast, some are slower (when a batch of merging work has to be done).

If you care most about the maximum latency of updates, use a small batch size. If you do not care about the latency of individual operations, only the latency of the overall sequence of operations, use a large batch size. The default is a large batch size: the same size as the write buffer itself. The minimum batch size is 1. The maximum batch size is the size of the write buffer, confWriteBufferAlloc.

Note that the actual batch size is the minimum of this configuration parameter and the size of the batch of operations performed (e.g. inserts). So if you consistently use large batches, you can use a batch size of 1 and the merge batch size will always be determined by the operation batch size.
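The relationship between the configured merge batch size and the operation batch size can be stated as a one-liner. This is a sketch of the behaviour described above, not the library's internal code.

```haskell
-- The effective merge batch size is the minimum of the configured
-- MergeBatchSize and the number of operations in the batch actually
-- submitted (e.g. an insert batch).
effectiveBatchSize :: Int -> Int -> Int
effectiveBatchSize mergeBatchSize opBatchSize = min mergeBatchSize opBatchSize
```

For example, with the default merge batch size of 20000 and insert batches of 256 operations, merging work proceeds in batches of 256.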

A further reason to prefer a small batch size is that it gives good parallel work balance when using parallelism.

Constructors

MergeBatchSize Int