Configuration Package Documentation
The config package implements the configuration management system for the Kevo engine. It provides a structured way to define, validate, persist, and load configuration parameters, ensuring consistent behavior across storage engine instances and restarts.
Overview
Configuration in the Kevo engine is handled through a versioned manifest system. This approach allows for tracking configuration changes over time and ensures that all components operate with consistent settings.
Key responsibilities of the config package include:
- Defining and validating configuration parameters
- Persisting configuration to disk in a manifest file
- Loading configuration during engine startup
- Tracking engine state across restarts
- Providing versioning and backward compatibility
Configuration Parameters
WAL Configuration
Parameter | Type | Default | Description |
---|---|---|---|
WALDir | string | <dbPath>/wal | Directory for Write-Ahead Log files |
WALSyncMode | SyncMode | SyncBatch | Synchronization mode (None, Batch, Immediate) |
WALSyncBytes | int64 | 1MB | Bytes written before sync in batch mode |
WALMaxSize | int64 | 0 (dynamic) | Maximum size of a WAL file before rotation |
MemTable Configuration
Parameter | Type | Default | Description |
---|---|---|---|
MemTableSize | int64 | 32MB | Maximum size of a MemTable before flush |
MaxMemTables | int | 4 | Maximum number of MemTables in memory |
MaxMemTableAge | int64 | 600 (seconds) | Maximum age of a MemTable before flush |
MemTablePoolCap | int | 4 | Capacity of the MemTable pool |
SSTable Configuration
Parameter | Type | Default | Description |
---|---|---|---|
SSTDir | string | <dbPath>/sst | Directory for SSTable files |
SSTableBlockSize | int | 16KB | Size of data blocks in SSTable |
SSTableIndexSize | int | 64KB | Approximate size between index entries |
SSTableMaxSize | int64 | 64MB | Maximum size of an SSTable file |
SSTableRestartSize | int | 16 | Number of keys between restart points |
Compaction Configuration
Parameter | Type | Default | Description |
---|---|---|---|
CompactionLevels | int | 7 | Number of compaction levels |
CompactionRatio | float64 | 10.0 | Size ratio between adjacent levels |
CompactionThreads | int | 2 | Number of compaction worker threads |
CompactionInterval | int64 | 30 (seconds) | Time between compaction checks |
MaxLevelWithTombstones | int | 1 | Maximum level to keep tombstones |
Manifest Format
The manifest is a JSON file that stores configuration and state information for the engine.
Structure
The manifest contains an array of entries, each representing a point-in-time snapshot of the engine configuration:
[
  {
    "timestamp": 1619123456,
    "version": 1,
    "config": {
      "version": 1,
      "wal_dir": "/path/to/data/wal",
      "wal_sync_mode": 1,
      "wal_sync_bytes": 1048576,
      ...
    },
    "filesystem": {
      "/path/to/data/sst/0_000001_00000123456789.sst": 1,
      "/path/to/data/sst/1_000002_00000123456790.sst": 2
    }
  },
  {
    "timestamp": 1619123789,
    "version": 1,
    "config": {
      ...updated configuration...
    },
    "filesystem": {
      ...updated file list...
    }
  }
]
Components
- Timestamp: When the entry was created
- Version: The format version of the manifest
- Config: The complete configuration at that point in time
- FileSystem: A map of file paths to sequence numbers
The last entry in the array represents the current state of the engine.
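The rule that the last entry is the current state can be sketched as follows. Only the JSON keys come from the example above. The manifestEntry type, its field types (including uint64 for sequence numbers), and the currentEntry helper are assumptions made for illustration and are not the package's actual API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// manifestEntry mirrors one element of the manifest array.
// The Go names and field types are assumptions; only the JSON keys
// are taken from the documented example.
type manifestEntry struct {
	Timestamp  int64                      `json:"timestamp"`
	Version    int                        `json:"version"`
	Config     map[string]json.RawMessage `json:"config"`
	FileSystem map[string]uint64          `json:"filesystem"`
}

// currentEntry decodes the manifest and returns its last entry,
// which represents the current engine state.
func currentEntry(data []byte) (*manifestEntry, error) {
	var entries []manifestEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode manifest: %w", err)
	}
	if len(entries) == 0 {
		return nil, errors.New("manifest has no entries")
	}
	return &entries[len(entries)-1], nil
}

func main() {
	data := []byte(`[
	  {"timestamp": 1619123456, "version": 1, "config": {"wal_sync_mode": 1}, "filesystem": {"a.sst": 1}},
	  {"timestamp": 1619123789, "version": 1, "config": {"wal_sync_mode": 2}, "filesystem": {"a.sst": 1, "b.sst": 2}}
	]`)
	entry, err := currentEntry(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.Timestamp, len(entry.FileSystem))
}
```

Keeping Config as raw JSON in this sketch avoids committing to a particular set of config fields.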
Implementation Details
Configuration Structure
The Config struct contains all tunable parameters for the storage engine:
- Core Fields:
  - Version: The configuration format version
  - Various parameter fields organized by component
- Synchronization:
  - Mutex to protect concurrent access
  - Thread-safe update methods
- Validation:
  - Comprehensive validation of all parameters
  - Prevents invalid configurations from being used
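The combination of a mutex, a thread-safe update method, and validation can be sketched roughly like this. The miniConfig type is a hypothetical, trimmed-down stand-in with three fields borrowed from the parameter tables. The real Config struct has many more fields and its validation rules may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// miniConfig is a hypothetical stand-in for the Config struct.
type miniConfig struct {
	mu           sync.Mutex
	Version      int
	MemTableSize int64
	MaxMemTables int
}

// Validate rejects values that could never describe a working engine.
// The specific rules here are assumptions for illustration.
func (c *miniConfig) Validate() error {
	if c.Version <= 0 {
		return errors.New("version must be positive")
	}
	if c.MemTableSize <= 0 {
		return errors.New("MemTableSize must be positive")
	}
	if c.MaxMemTables <= 0 {
		return errors.New("MaxMemTables must be positive")
	}
	return nil
}

// Update applies fn while holding the mutex, so concurrent updates
// are applied one at a time rather than interleaved.
func (c *miniConfig) Update(fn func(*miniConfig)) {
	c.mu.Lock()
	defer c.mu.Unlock()
	fn(c)
}

func main() {
	cfg := &miniConfig{Version: 1, MemTableSize: 32 << 20, MaxMemTables: 4}
	cfg.Update(func(c *miniConfig) { c.MemTableSize = 64 << 20 })
	fmt.Println(cfg.Validate() == nil, cfg.MemTableSize)
}
```

Passing a closure to Update keeps the locking in one place, so callers cannot forget to take the mutex.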
Manifest Management
The Manifest struct manages configuration persistence and tracking:
- Entry Tracking:
  - List of historical configuration entries
  - Current entry pointer for easy access
- File System State:
  - Tracks SSTable files and their sequence numbers
  - Enables recovery after restart
- Persistence:
  - Atomic updates via temporary files
  - Concurrent access protection
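The "atomic updates via temporary files" pattern is commonly implemented as write, sync, then rename. The sketch below shows that general technique using only the Go standard library. It is not taken from the Kevo source, and the writeFileAtomic and roundTrip helpers and the MANIFEST file name are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file in the same directory
// and then renames it over path. On POSIX file systems the rename is
// atomic, so a reader sees either the old file or the new one.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // cleans up on failure; a no-op after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush contents to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip overwrites a file twice in a scratch directory and returns
// what a reader sees afterwards.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "MANIFEST") // file name is an assumption
	for _, content := range []string{"v1", "v2"} {
		if err := writeFileAtomic(path, []byte(content)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Creating the temporary file in the destination directory matters: a rename across file systems is not atomic and may fail outright.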
SyncMode Enum
The SyncMode enum defines the WAL synchronization behavior:
- SyncNone (0):
  - No explicit synchronization
  - Fastest performance, lowest durability
- SyncBatch (1):
  - Synchronize after a certain amount of data
  - Good balance of performance and durability
- SyncImmediate (2):
  - Synchronize after every write
  - Highest durability, lowest performance
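As a rough illustration of how the three modes differ in practice, the sketch below declares the enum with the numeric values listed above and adds a hypothetical shouldSync helper that decides whether to sync after a write. The helper and its parameters are assumptions. The actual WAL code may track unsynced bytes differently.

```go
package main

import "fmt"

// SyncMode uses the numeric values documented above.
type SyncMode int

const (
	SyncNone      SyncMode = iota // 0: never sync explicitly
	SyncBatch                     // 1: sync once enough bytes have accumulated
	SyncImmediate                 // 2: sync after every write
)

// shouldSync reports whether the WAL should be synced after a write,
// given the bytes written since the last sync. Hypothetical helper.
func shouldSync(mode SyncMode, unsyncedBytes, syncBytes int64) bool {
	switch mode {
	case SyncImmediate:
		return true
	case SyncBatch:
		return unsyncedBytes >= syncBytes
	default:
		return false
	}
}

func main() {
	const syncBytes = 1 << 20 // matches the WALSyncBytes default of 1MB
	fmt.Println(shouldSync(SyncNone, 4<<20, syncBytes))
	fmt.Println(shouldSync(SyncBatch, 512<<10, syncBytes))
	fmt.Println(shouldSync(SyncBatch, 1<<20, syncBytes))
	fmt.Println(shouldSync(SyncImmediate, 1, syncBytes))
}
```

Under this reading, WALSyncBytes bounds how much written data can be lost in batch mode if the process crashes between syncs.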
Versioning and Compatibility
Current Version
The current manifest format version is 1, defined by CurrentManifestVersion.
Versioning Strategy
The configuration system supports forward and backward compatibility:
- Version Field:
  - Each config and manifest has a version field
  - Used to detect format changes
- Backward Compatibility:
  - New versions can read old formats
  - Default values apply for missing parameters
- Forward Compatibility:
  - Unknown fields are preserved during updates
  - Allows safe rollback to older versions
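One way to get both properties, defaults for missing parameters and preservation of unknown fields, is to treat the config as a map of raw JSON values. The sketch below shows that general technique. The upgradeConfig helper and the future_option key are hypothetical, and the package may implement compatibility differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// upgradeConfig fills in defaults for keys the stored config lacks and
// leaves every other key, including ones it does not recognize, untouched.
func upgradeConfig(raw []byte, defaults map[string]any) ([]byte, error) {
	fields := map[string]json.RawMessage{}
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	for key, value := range defaults {
		if _, ok := fields[key]; ok {
			continue // a stored value always wins over the default
		}
		encoded, err := json.Marshal(value)
		if err != nil {
			return nil, err
		}
		fields[key] = encoded
	}
	return json.Marshal(fields)
}

func main() {
	// "future_option" stands in for a field written by a newer version.
	stored := []byte(`{"version": 1, "future_option": true}`)
	defaults := map[string]any{"version": 1, "wal_sync_bytes": 1048576}

	out, err := upgradeConfig(stored, defaults)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because unknown keys survive the round trip, an older binary can rewrite the manifest without discarding settings that a newer binary added.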
Common Usage Patterns
Creating Default Configuration
// Create a default configuration for a specific database path.
// The variable is named cfg so it does not shadow the config package.
cfg := config.NewDefaultConfig("/path/to/data")
// Validate the configuration
if err := cfg.Validate(); err != nil {
	log.Fatal(err)
}
Loading Configuration from Manifest
// Load configuration from an existing manifest.
// The variable is named cfg so the config package stays accessible below.
cfg, err := config.LoadConfigFromManifest("/path/to/data")
if err != nil {
	if errors.Is(err, config.ErrManifestNotFound) {
		// Create a new configuration if manifest doesn't exist
		cfg = config.NewDefaultConfig("/path/to/data")
	} else {
		log.Fatal(err)
	}
}
Modifying Configuration
// Update configuration parameters.
// cfg is a previously created or loaded *config.Config.
cfg.Update(func(c *config.Config) {
	// Modify parameters
	c.MemTableSize = 64 * 1024 * 1024 // 64MB
	c.WALSyncMode = config.SyncBatch
	c.CompactionInterval = 60 // 60 seconds
})
// Save the updated configuration
if err := cfg.SaveManifest("/path/to/data"); err != nil {
	log.Fatal(err)
}
Working with Full Manifest
// Load or create a manifest
manifest, err := config.LoadManifest("/path/to/data")
if err != nil {
	if errors.Is(err, config.ErrManifestNotFound) {
		// Create a new manifest
		manifest, err = config.NewManifest("/path/to/data", nil)
		if err != nil {
			log.Fatal(err)
		}
	} else {
		log.Fatal(err)
	}
}
// Update configuration
manifest.UpdateConfig(func(cfg *config.Config) {
	cfg.CompactionRatio = 8.0
})
// Track files
manifest.AddFile("/path/to/data/sst/0_000001_00000123456789.sst", 1)
// Save changes
if err := manifest.Save(); err != nil {
	log.Fatal(err)
}
Performance Considerations
Memory Impact
The configuration system has minimal memory footprint:
- Static Structure:
  - Fixed size in memory
  - No dynamic growth during operation
- Sharing:
  - Single configuration instance shared among components
  - No duplication of configuration data
I/O Patterns
Configuration I/O is infrequent and optimized:
- Read Once:
  - Configuration is read once at startup
  - Kept in memory during operation
- Write Rarely:
  - Written only when configuration changes
  - No impact on normal operation
- Atomic Updates:
  - Uses atomic file operations
  - Prevents corruption during crashes
Configuration Recommendations
Production Environment
For production use:
- WAL Settings:
  - WALSyncMode: SyncBatch for most workloads
  - WALSyncBytes: 1-4MB for good throughput with reasonable durability
- Memory Management:
  - MemTableSize: 64-128MB for high-throughput systems
  - MaxMemTables: 4-8 based on available memory
- Compaction:
  - CompactionRatio: 8-12 (higher means less frequent but larger compactions)
  - CompactionThreads: 2-4 for multi-core systems
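When choosing MemTableSize and MaxMemTables "based on available memory", a quick estimate of the nominal MemTable footprint is the product of the two. The sketch below is simple arithmetic under that assumption. The memTableBudget helper is hypothetical, and it ignores skip list and allocator overhead, so real usage will be somewhat higher.

```go
package main

import "fmt"

const mib = int64(1) << 20

// memTableBudget returns the nominal memory, in bytes, held when every
// allowed MemTable is full. Per-entry overhead is ignored.
func memTableBudget(memTableSize int64, maxMemTables int) int64 {
	return memTableSize * int64(maxMemTables)
}

func main() {
	// Defaults from the parameter tables: 32MB x 4.
	fmt.Println(memTableBudget(32*mib, 4)/mib, "MiB")
	// Top of the suggested production range: 128MB x 8.
	fmt.Println(memTableBudget(128*mib, 8)/mib, "MiB")
}
```

The defaults come to 128 MiB, while the top of the production range comes to 1024 MiB, so the two settings are best sized together rather than independently.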
Development/Testing
For development and testing:
- WAL Settings:
  - WALSyncMode: SyncNone for maximum performance
  - Small database directory for easier management
- Memory Settings:
  - Smaller MemTableSize (4-8MB) for more frequent flushes
  - Reduced MaxMemTables to limit memory usage
- Compaction:
  - More frequent compaction for testing (CompactionInterval: 5-10 seconds)
  - Fewer CompactionLevels (3-5) for simpler behavior
Limitations and Future Enhancements
Current Limitations
- Limited Runtime Changes:
  - Some parameters can't be changed while the engine is running
  - May require restart for some configuration changes
- No Hot Reload:
  - No automatic detection of configuration changes
  - Changes require explicit engine reload
- Simple Versioning:
  - Basic version number without semantic versioning
  - No complex migration paths between versions
Potential Enhancements
- Hot Configuration Updates:
  - Ability to update more parameters at runtime
  - Notification system for configuration changes
- Configuration Profiles:
  - Predefined configurations for common use cases
  - Easy switching between profiles
- Enhanced Validation:
  - Interdependent parameter validation
  - Workload-specific recommendations