kevo/docs/CONFIG_GUIDE.md
Jeremy Tregunna 6fc3be617d
Some checks failed
Go Tests / Run Tests (1.24.2) (push) Has been cancelled
feat: Initial release of kevo storage engine.
Adds a complete LSM-based storage engine with these features:
- Single-writer based architecture for the storage engine
- WAL for durability, and hey it's configurable
- MemTable with skip list implementation for fast read/writes
- SSTable with block-based structure for on-disk level-based storage
- Background compaction with tiered strategy
- ACID transactions
- Good documentation (I hope)
2025-04-20 14:06:50 -06:00

200 lines
6.8 KiB
Markdown

# Kevo Engine Configuration Guide
This guide provides recommendations for configuring the Kevo Engine for various workloads and environments.
## Configuration Parameters
The Kevo Engine can be configured through the `config.Config` struct. Here are the most important parameters:
### WAL Configuration
| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `WALDir` | Directory for Write-Ahead Log files | `<dbPath>/wal` | Any valid directory path |
| `WALSyncMode` | Synchronization mode for WAL writes | `SyncBatch` | `SyncNone`, `SyncBatch`, `SyncImmediate` |
| `WALSyncBytes` | Bytes written before sync in batch mode | 1MB | 64KB-16MB |
### MemTable Configuration
| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `MemTableSize` | Maximum size of a MemTable before flush | 32MB | 4MB-128MB |
| `MaxMemTables` | Maximum number of MemTables in memory | 4 | 2-8 |
| `MaxMemTableAge` | Maximum age of a MemTable before flush (seconds) | 600 | 60-3600 |
### SSTable Configuration
| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `SSTDir` | Directory for SSTable files | `<dbPath>/sst` | Any valid directory path |
| `SSTableBlockSize` | Size of data blocks in SSTable | 16KB | 4KB-64KB |
| `SSTableIndexSize` | Approximate size between index entries | 64KB | 16KB-256KB |
| `SSTableMaxSize` | Maximum size of an SSTable file | 64MB | 16MB-256MB |
| `SSTableRestartSize` | Number of keys between restart points | 16 | 8-64 |
### Compaction Configuration
| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `CompactionLevels` | Number of compaction levels | 7 | 3-10 |
| `CompactionRatio` | Size ratio between adjacent levels | 10 | 5-20 |
| `CompactionThreads` | Number of compaction worker threads | 2 | 1-8 |
| `CompactionInterval` | Time between compaction checks (seconds) | 30 | 5-300 |
| `MaxLevelWithTombstones` | Maximum level to keep tombstones | 1 | 0-3 |
## Workload-Based Recommendations
### Balanced Workload (Default)
For a balanced mix of reads and writes:
```go
config := config.NewDefaultConfig(dbPath)
```
The default configuration is optimized for a good balance between read and write performance, with reasonable durability guarantees.
### Write-Intensive Workload
For workloads with many writes (e.g., logging, event streaming):
```go
config := config.NewDefaultConfig(dbPath)
config.MemTableSize = 64 * 1024 * 1024 // 64MB
config.WALSyncMode = config.SyncBatch // Batch mode for better write throughput
config.WALSyncBytes = 4 * 1024 * 1024 // 4MB between syncs
config.SSTableBlockSize = 32 * 1024 // 32KB
config.CompactionRatio = 5 // More frequent compactions
```
### Read-Intensive Workload
For workloads with many reads (e.g., content serving, lookups):
```go
config := config.NewDefaultConfig(dbPath)
config.MemTableSize = 16 * 1024 * 1024 // 16MB
config.SSTableBlockSize = 8 * 1024 // 8KB for better read performance
config.SSTableIndexSize = 32 * 1024 // 32KB for more index points
config.CompactionRatio = 20 // Less frequent compactions
```
### Low-Latency Workload
For workloads requiring minimal latency spikes:
```go
config := config.NewDefaultConfig(dbPath)
config.MemTableSize = 8 * 1024 * 1024 // 8MB for quicker flushes
config.CompactionInterval = 5 // More frequent compaction checks
config.CompactionThreads = 1 // Reduce contention
```
### High-Durability Workload
For workloads where data durability is critical:
```go
config := config.NewDefaultConfig(dbPath)
config.WALSyncMode = config.SyncImmediate // Immediate sync after each write
config.MaxMemTableAge = 60 // Flush MemTables more frequently
```
### Memory-Constrained Environment
For environments with limited memory:
```go
config := config.NewDefaultConfig(dbPath)
config.MemTableSize = 4 * 1024 * 1024 // 4MB
config.MaxMemTables = 2 // Only keep 2 MemTables in memory
config.SSTableBlockSize = 4 * 1024 // 4KB blocks
```
## Environmental Considerations
### SSD vs HDD Storage
For SSD storage:
- Consider using larger block sizes (16KB-32KB)
- Batch WAL syncs are generally sufficient
For HDD storage:
- Use larger block sizes (32KB-64KB) to reduce seeks
- Consider more aggressive compaction to reduce fragmentation
### Client-Side vs Server-Side
For client-side applications:
- Reduce memory usage with smaller MemTable sizes
- Consider using SyncNone or SyncBatch modes for better performance
For server-side applications:
- Configure based on workload characteristics
- Allocate more memory for MemTables in high-throughput scenarios
## Performance Impact of Key Parameters
### WALSyncMode
- **SyncNone**: Highest write throughput, but risk of data loss on crash
- **SyncBatch**: Good balance of throughput and durability
- **SyncImmediate**: Highest durability, but lowest write throughput
### MemTableSize
- **Larger**: Better write throughput, higher memory usage, potentially longer pauses
- **Smaller**: Lower memory usage, more frequent compaction, potentially lower throughput
### SSTableBlockSize
- **Larger**: Better scan performance, slightly higher space usage
- **Smaller**: Better point lookup performance, potentially higher index overhead
### CompactionRatio
- **Larger**: Less frequent compaction, higher read amplification
- **Smaller**: More frequent compaction, lower read amplification
## Tuning Process
To find the optimal configuration for your specific workload:
1. Run the benchmarking tool with your expected workload:
```
go run ./cmd/storage-bench/... -tune
```
2. The tool will generate a recommendations report based on the benchmark results
3. Adjust the configuration based on the recommendations and your specific requirements
4. Validate with your application workload
## Example Custom Configuration
```go
// Example custom configuration for a write-heavy time-series database
func CustomTimeSeriesConfig(dbPath string) *config.Config {
cfg := config.NewDefaultConfig(dbPath)
// Optimize for write throughput
cfg.MemTableSize = 64 * 1024 * 1024
cfg.WALSyncMode = config.SyncBatch
cfg.WALSyncBytes = 4 * 1024 * 1024
// Optimize for sequential scans
cfg.SSTableBlockSize = 32 * 1024
// Optimize for compaction
cfg.CompactionRatio = 5
return cfg
}
```
## Conclusion
The Kevo Engine provides a flexible configuration system that can be tailored to various workloads and environments. By understanding the impact of each configuration parameter, you can optimize the engine for your specific needs.
For most applications, the default configuration provides a good starting point, but tuning can significantly improve performance for specific workloads.