kevo/docs/CONFIG_GUIDE.md
Jeremy Tregunna 6fc3be617d
feat: Initial release of kevo storage engine.
Adds a complete LSM-based storage engine with these features:
- Single-writer based architecture for the storage engine
- WAL for durability, and hey it's configurable
- MemTable with skip list implementation for fast read/writes
- SSTable with block-based structure for on-disk level-based storage
- Background compaction with tiered strategy
- ACID transactions
- Good documentation (I hope)
2025-04-20 14:06:50 -06:00

Kevo Engine Configuration Guide

This guide provides recommendations for configuring the Kevo Engine for various workloads and environments.

Configuration Parameters

The Kevo Engine can be configured through the config.Config struct. Here are the most important parameters:

WAL Configuration

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| WALDir | Directory for Write-Ahead Log files | <dbPath>/wal | Any valid directory path |
| WALSyncMode | Synchronization mode for WAL writes | SyncBatch | SyncNone, SyncBatch, SyncImmediate |
| WALSyncBytes | Bytes written before sync in batch mode | 1MB | 64KB-16MB |

MemTable Configuration

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| MemTableSize | Maximum size of a MemTable before flush | 32MB | 4MB-128MB |
| MaxMemTables | Maximum number of MemTables in memory | 4 | 2-8 |
| MaxMemTableAge | Maximum age of a MemTable before flush (seconds) | 600 | 60-3600 |

SSTable Configuration

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| SSTDir | Directory for SSTable files | <dbPath>/sst | Any valid directory path |
| SSTableBlockSize | Size of data blocks in SSTable | 16KB | 4KB-64KB |
| SSTableIndexSize | Approximate size between index entries | 64KB | 16KB-256KB |
| SSTableMaxSize | Maximum size of an SSTable file | 64MB | 16MB-256MB |
| SSTableRestartSize | Number of keys between restart points | 16 | 8-64 |

Compaction Configuration

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| CompactionLevels | Number of compaction levels | 7 | 3-10 |
| CompactionRatio | Size ratio between adjacent levels | 10 | 5-20 |
| CompactionThreads | Number of compaction worker threads | 2 | 1-8 |
| CompactionInterval | Time between compaction checks (seconds) | 30 | 5-300 |
| MaxLevelWithTombstones | Maximum level to keep tombstones | 1 | 0-3 |
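
When values are set by hand, it can help to check them against the ranges listed in the tables above before opening a database. The following is a minimal sketch under stated assumptions: `Config` here is a local, hypothetical mirror of a few `config.Config` fields (the field types are guesses), `CheckRanges` is not part of Kevo, and the guide does not say whether the engine itself enforces these ranges.

```go
package main

import "fmt"

const (
	KB = 1024
	MB = 1024 * KB
)

// Config is a hypothetical local stand-in for a few config.Config fields.
// Field names follow the tables above; the types are assumptions.
type Config struct {
	MemTableSize     int64
	MaxMemTables     int
	SSTableBlockSize int64
	CompactionLevels int
	CompactionRatio  int
}

// CheckRanges returns one message per field that falls outside the
// range documented in this guide.
func CheckRanges(c Config) []string {
	var problems []string
	check := func(name string, v, lo, hi int64) {
		if v < lo || v > hi {
			problems = append(problems,
				fmt.Sprintf("%s=%d outside [%d, %d]", name, v, lo, hi))
		}
	}
	check("MemTableSize", c.MemTableSize, 4*MB, 128*MB)
	check("MaxMemTables", int64(c.MaxMemTables), 2, 8)
	check("SSTableBlockSize", c.SSTableBlockSize, 4*KB, 64*KB)
	check("CompactionLevels", int64(c.CompactionLevels), 3, 10)
	check("CompactionRatio", int64(c.CompactionRatio), 5, 20)
	return problems
}

func main() {
	defaults := Config{
		MemTableSize:     32 * MB,
		MaxMemTables:     4,
		SSTableBlockSize: 16 * KB,
		CompactionLevels: 7,
		CompactionRatio:  10,
	}
	fmt.Println("problems with defaults:", len(CheckRanges(defaults)))

	bad := defaults
	bad.MaxMemTables = 1 // below the documented minimum of 2
	for _, p := range CheckRanges(bad) {
		fmt.Println(p)
	}
}
```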

Workload-Based Recommendations

Balanced Workload (Default)

For a balanced mix of reads and writes:

cfg := config.NewDefaultConfig(dbPath)

The default configuration is optimized for a good balance between read and write performance, with reasonable durability guarantees.

Write-Intensive Workload

For workloads with many writes (e.g., logging, event streaming):

// Name the variable cfg so it does not shadow the config package,
// which is still needed for config.SyncBatch below.
cfg := config.NewDefaultConfig(dbPath)
cfg.MemTableSize = 64 * 1024 * 1024     // 64MB
cfg.WALSyncMode = config.SyncBatch      // Batch mode for better write throughput
cfg.WALSyncBytes = 4 * 1024 * 1024      // 4MB between syncs
cfg.SSTableBlockSize = 32 * 1024        // 32KB
cfg.CompactionRatio = 5                 // More frequent compactions

Read-Intensive Workload

For workloads with many reads (e.g., content serving, lookups):

cfg := config.NewDefaultConfig(dbPath)
cfg.MemTableSize = 16 * 1024 * 1024     // 16MB
cfg.SSTableBlockSize = 8 * 1024         // 8KB for better read performance
cfg.SSTableIndexSize = 32 * 1024        // 32KB for more index points
cfg.CompactionRatio = 20                // Less frequent compactions

Low-Latency Workload

For workloads requiring minimal latency spikes:

cfg := config.NewDefaultConfig(dbPath)
cfg.MemTableSize = 8 * 1024 * 1024      // 8MB for quicker flushes
cfg.CompactionInterval = 5              // More frequent compaction checks
cfg.CompactionThreads = 1               // Reduce contention

High-Durability Workload

For workloads where data durability is critical:

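// Name the variable cfg so it does not shadow the config package,
// which is still needed for config.SyncImmediate below.
cfg := config.NewDefaultConfig(dbPath)
cfg.WALSyncMode = config.SyncImmediate  // Immediate sync after each write
cfg.MaxMemTableAge = 60                 // Flush MemTables more frequently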

Memory-Constrained Environment

For environments with limited memory:

cfg := config.NewDefaultConfig(dbPath)
cfg.MemTableSize = 4 * 1024 * 1024      // 4MB
cfg.MaxMemTables = 2                    // Only keep 2 MemTables in memory
cfg.SSTableBlockSize = 4 * 1024         // 4KB blocks
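
To see what these settings buy, a rough upper bound on MemTable memory is MemTableSize multiplied by MaxMemTables. The sketch below does only that arithmetic; `MemTableBudget` is a hypothetical helper, and the estimate assumes every MemTable can fill to its size limit while ignoring skip list overhead and any other engine memory.

```go
package main

import "fmt"

const MB = 1024 * 1024

// MemTableBudget returns an approximate upper bound, in bytes, on memory
// held by MemTables: each of maxMemTables tables filled to memTableSize.
func MemTableBudget(memTableSize int64, maxMemTables int) int64 {
	return memTableSize * int64(maxMemTables)
}

func main() {
	// Documented defaults: 32MB x 4.
	fmt.Println("default:           ", MemTableBudget(32*MB, 4)/MB, "MB")
	// Write-intensive profile: 64MB with the default of 4 MemTables.
	fmt.Println("write-intensive:   ", MemTableBudget(64*MB, 4)/MB, "MB")
	// Memory-constrained profile: 4MB x 2.
	fmt.Println("memory-constrained:", MemTableBudget(4*MB, 2)/MB, "MB")
}
```

By this estimate the memory-constrained profile caps MemTable memory at 8MB, compared with 128MB for the defaults.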

Environmental Considerations

SSD vs HDD Storage

For SSD storage:

  • Consider block sizes at or above the 16KB default (16KB-32KB)
  • Batch WAL syncs are generally sufficient

For HDD storage:

  • Use larger block sizes (32KB-64KB) to reduce seeks
  • Consider more aggressive compaction to reduce fragmentation

Client-Side vs Server-Side

For client-side applications:

  • Reduce memory usage with smaller MemTable sizes
  • Consider using SyncNone or SyncBatch modes for better performance

For server-side applications:

  • Configure based on workload characteristics
  • Allocate more memory for MemTables in high-throughput scenarios

Performance Impact of Key Parameters

WALSyncMode

  • SyncNone: Highest write throughput, but risk of data loss on crash
  • SyncBatch: Good balance of throughput and durability
  • SyncImmediate: Highest durability, but lowest write throughput
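
The throughput difference between the three modes comes down to how often the log is synced. The following sketch models that with a simple counter. It is an illustration of the behaviour described in this guide, not Kevo's WAL code: the `SyncMode` constants are redeclared locally, `CountSyncs` is a hypothetical helper, and the batch policy (sync once the unsynced bytes reach `WALSyncBytes`) is an assumption drawn from the parameter description.

```go
package main

import "fmt"

const (
	KB = 1024
	MB = 1024 * KB
)

// SyncMode mirrors the three mode names in this guide; these are local
// declarations, not the constants from the kevo config package.
type SyncMode int

const (
	SyncNone SyncMode = iota
	SyncBatch
	SyncImmediate
)

// CountSyncs simulates n writes of recordSize bytes each and returns how
// many syncs the assumed policy would issue.
func CountSyncs(mode SyncMode, syncBytes, recordSize, n int) int {
	syncs, pending := 0, 0
	for i := 0; i < n; i++ {
		pending += recordSize
		switch mode {
		case SyncImmediate:
			syncs++ // one sync per write
			pending = 0
		case SyncBatch:
			if pending >= syncBytes { // sync once enough bytes accumulate
				syncs++
				pending = 0
			}
		}
		// SyncNone never syncs explicitly.
	}
	return syncs
}

func main() {
	const writes, record = 10000, 1 * KB
	fmt.Println("SyncNone:       ", CountSyncs(SyncNone, 1*MB, record, writes))
	fmt.Println("SyncBatch (1MB):", CountSyncs(SyncBatch, 1*MB, record, writes))
	fmt.Println("SyncBatch (4MB):", CountSyncs(SyncBatch, 4*MB, record, writes))
	fmt.Println("SyncImmediate:  ", CountSyncs(SyncImmediate, 1*MB, record, writes))
}
```

In this model, any bytes still pending when the process stops are the data at risk, which is why a larger `WALSyncBytes` trades durability for fewer syncs.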

MemTableSize

  • Larger: Better write throughput, higher memory usage, potentially longer pauses
  • Smaller: Lower memory usage, more frequent flushes (and therefore more compaction work), potentially lower throughput

SSTableBlockSize

  • Larger: Better scan performance, slightly higher space usage
  • Smaller: Better point lookup performance, potentially higher index overhead
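
A quick way to get a feel for this trade-off is to count how many data blocks a full SSTable contains at each block size. The sketch below is plain arithmetic on the documented defaults. `BlocksPerSSTable` is a hypothetical helper, and the estimate ignores index, metadata, and any compression.

```go
package main

import "fmt"

const (
	KB = 1024
	MB = 1024 * KB
)

// BlocksPerSSTable estimates how many data blocks fit in an SSTable of
// sstableMaxSize bytes when each block holds blockSize bytes.
func BlocksPerSSTable(sstableMaxSize, blockSize int64) int64 {
	return sstableMaxSize / blockSize
}

func main() {
	// 64MB is the documented SSTableMaxSize default; the block sizes span
	// the documented 4KB-64KB range.
	for _, bs := range []int64{4 * KB, 16 * KB, 64 * KB} {
		fmt.Printf("block=%dKB blocks per 64MB file=%d\n",
			bs/KB, BlocksPerSSTable(64*MB, bs))
	}
}
```

Assuming a point lookup reads one whole block, a 4KB block touches far less data per lookup than a 64KB block, but the same file then has 16 times as many blocks to locate.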

CompactionRatio

  • Larger: Less frequent compaction, higher read amplification
  • Smaller: More frequent compaction, lower read amplification

Tuning Process

To find the optimal configuration for your specific workload:

  1. Run the benchmarking tool with your expected workload:

    go run ./cmd/storage-bench/... -tune
    
  2. The tool will generate a recommendations report based on the benchmark results

  3. Adjust the configuration based on the recommendations and your specific requirements

  4. Validate with your application workload

Example Custom Configuration

// Example custom configuration for a write-heavy time-series database
func CustomTimeSeriesConfig(dbPath string) *config.Config {
    cfg := config.NewDefaultConfig(dbPath)
    
    // Optimize for write throughput
    cfg.MemTableSize = 64 * 1024 * 1024
    cfg.WALSyncMode = config.SyncBatch
    cfg.WALSyncBytes = 4 * 1024 * 1024
    
    // Optimize for sequential scans
    cfg.SSTableBlockSize = 32 * 1024
    
    // Optimize for compaction
    cfg.CompactionRatio = 5
    
    return cfg
}

Conclusion

The Kevo Engine provides a flexible configuration system that can be tailored to various workloads and environments. By understanding the impact of each configuration parameter, you can optimize the engine for your specific needs.

For most applications, the default configuration provides a good starting point, but tuning can significantly improve performance for specific workloads.