Go Storage Engine Todo List
This document outlines the implementation tasks for the Go Storage Engine, organized by development phases. Follow these guidelines:
- Work on tasks in the order they appear
- Complete and check off (✓) one item at a time before moving to the next unchecked item
- Each phase must be completed before starting the next phase
- Test thoroughly before marking an item complete
Phase A: Foundation
-
Set up project structure and Go module
- Create directory structure following the package layout in PLAN.md
- Initialize Go module and dependencies
- Set up testing framework
-
Implement config package
- Define configuration struct with serialization/deserialization
- Include configurable parameters for durability, compaction, memory usage
- Create manifest loading/saving functionality
- Add versioning support for config changes
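
As a rough illustration of the config items above, a versioned struct with JSON serialization might look like the sketch below. Every field name, the JSON encoding, and the `CurrentConfigVersion` constant are assumptions made for this example; PLAN.md may specify something different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CurrentConfigVersion is a hypothetical schema version used to reject
// config files written by a newer build.
const CurrentConfigVersion = 1

// Config is an illustrative configuration struct. The field names and
// defaults are assumptions, not values taken from the project plan.
type Config struct {
	Version           int    `json:"version"`
	SyncMode          string `json:"sync_mode"`           // "sync" or "batched"
	MemTableSizeBytes int64  `json:"memtable_size_bytes"` // flush threshold
	MaxLevel0Files    int    `json:"max_level0_files"`    // compaction trigger
}

// DefaultConfig returns placeholder defaults for the sketch.
func DefaultConfig() Config {
	return Config{
		Version:           CurrentConfigVersion,
		SyncMode:          "sync",
		MemTableSizeBytes: 32 << 20,
		MaxLevel0Files:    4,
	}
}

// Encode serializes the config as indented JSON.
func (c Config) Encode() ([]byte, error) {
	return json.MarshalIndent(c, "", "  ")
}

// DecodeConfig parses JSON and rejects versions newer than this build knows.
func DecodeConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if c.Version > CurrentConfigVersion {
		return Config{}, fmt.Errorf("config version %d is newer than supported version %d",
			c.Version, CurrentConfigVersion)
	}
	return c, nil
}
```

Rejecting unknown future versions on load is one simple way to make later config changes safe; migrating older versions forward would be a separate step.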
-
Build Write-Ahead Log (WAL)
- Implement append-only file with atomic operations
- Add Put/Delete operation encoding
- Create replay functionality with error recovery
- Implement both synchronous (default) and batched fsync modes
- Add checksumming for entries
-
Write WAL tests
- Test durability with simulated crashes
- Verify replay correctness
- Benchmark write performance with different sync options
- Test error handling and recovery
Phase B: In-Memory Layer
-
Implement MemTable
- Create skip list data structure aligned to 64-byte cache lines
- Add key/value insertion and lookup operations
- Implement sorted key iteration
- Add size tracking for flush threshold detection
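
A minimal single-threaded skip list covering insertion, lookup, sorted iteration, and size tracking is sketched below. It deliberately ignores the 64-byte cache-line alignment and any concurrency control, and the type and method names (`MemTable`, `Put`, `Get`, `Ascend`, `Size`) are assumptions for this example.

```go
package main

import (
	"bytes"
	"math/rand"
)

const maxHeight = 12

type node struct {
	key, value []byte
	next       []*node // next[i] is the successor at level i
}

// MemTable is an illustrative, non-concurrent skip list.
type MemTable struct {
	head   *node
	height int
	size   int // bytes of keys and values currently stored
	rnd    *rand.Rand
}

func NewMemTable() *MemTable {
	return &MemTable{
		head:   &node{next: make([]*node, maxHeight)},
		height: 1,
		rnd:    rand.New(rand.NewSource(1)),
	}
}

// randomHeight grows a tower by one level with probability 1/4.
func (m *MemTable) randomHeight() int {
	h := 1
	for h < maxHeight && m.rnd.Intn(4) == 0 {
		h++
	}
	return h
}

// findPrev records the last node before key at every level in use and
// returns the first node whose key is >= key (or nil).
func (m *MemTable) findPrev(key []byte, prev *[maxHeight]*node) *node {
	x := m.head
	for lvl := m.height - 1; lvl >= 0; lvl-- {
		for x.next[lvl] != nil && bytes.Compare(x.next[lvl].key, key) < 0 {
			x = x.next[lvl]
		}
		prev[lvl] = x
	}
	return x.next[0]
}

// Put inserts key or overwrites its value. The slices are stored as given.
func (m *MemTable) Put(key, value []byte) {
	var prev [maxHeight]*node
	if n := m.findPrev(key, &prev); n != nil && bytes.Equal(n.key, key) {
		m.size += len(value) - len(n.value)
		n.value = value
		return
	}
	h := m.randomHeight()
	for lvl := m.height; lvl < h; lvl++ {
		prev[lvl] = m.head // new levels start at the head
	}
	if h > m.height {
		m.height = h
	}
	n := &node{key: key, value: value, next: make([]*node, h)}
	for lvl := 0; lvl < h; lvl++ {
		n.next[lvl] = prev[lvl].next[lvl]
		prev[lvl].next[lvl] = n
	}
	m.size += len(key) + len(value)
}

// Get returns the value stored for key, if any.
func (m *MemTable) Get(key []byte) ([]byte, bool) {
	var prev [maxHeight]*node
	if n := m.findPrev(key, &prev); n != nil && bytes.Equal(n.key, key) {
		return n.value, true
	}
	return nil, false
}

// Size reports the tracked byte count, for flush threshold checks.
func (m *MemTable) Size() int { return m.size }

// Ascend visits entries in ascending key order until fn returns false.
func (m *MemTable) Ascend(fn func(key, value []byte) bool) {
	for n := m.head.next[0]; n != nil; n = n.next[0] {
		if !fn(n.key, n.value) {
			return
		}
	}
}
```

A flush trigger could then compare `Size()` against a configured threshold after each write.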
-
Connect WAL replay to MemTable
- Create recovery logic to rebuild MemTable from WAL
- Implement consistent snapshot reads during recovery
- Handle errors during replay with appropriate fallbacks
-
Test concurrent read/write scenarios
- Verify reader isolation during writes
- Test snapshot consistency guarantees
- Benchmark read/write performance under load
Phase C: Persistent Storage
-
Design SSTable format
- Define 16KB block structure with restart points
- Create checksumming for blocks (xxHash64)
- Define index structure with entries every ~64KB
- Design file footer with metadata (version, timestamp, key count, etc.)
-
Implement SSTable writer
- Add functionality to convert MemTable to blocks
- Create sparse index generator
- Implement footer writing with checksums
- Add atomic file creation for crash safety
-
Build SSTable reader
- Implement block loading with validation
- Create binary search through index
- Develop iterator interface for scanning
- Add error handling for corrupted files
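
The binary search through a sparse index can be sketched with `sort.Search`. The `IndexEntry` layout (first key of a block plus its file offset) and the `FindBlock` helper are assumptions for illustration, not the format this project defines.

```go
package main

import (
	"bytes"
	"sort"
)

// IndexEntry is a hypothetical sparse index entry: the first key stored in
// a block and the block's byte offset in the SSTable file.
type IndexEntry struct {
	FirstKey []byte
	Offset   int64
}

// FindBlock returns the position of the only block that could contain key,
// or -1 if key sorts before the first key in the table. The index must be
// sorted by FirstKey.
func FindBlock(index []IndexEntry, key []byte) int {
	// Find the first entry whose FirstKey is strictly greater than key;
	// the candidate block is the one just before it.
	i := sort.Search(len(index), func(i int) bool {
		return bytes.Compare(index[i].FirstKey, key) > 0
	})
	return i - 1
}
```

A reader would load the block at `index[pos].Offset`, validate its checksum, and then search within the block; a result of -1 means the key cannot be in this file.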
Phase D: Basic Engine Integration
-
Implement Level 0 flush mechanism
- Create MemTable to SSTable conversion process
- Implement file management and naming scheme
- Add background flush triggering based on size
-
Create read path that merges data sources
- Implement read from current MemTable
- Add reads from immutable MemTables awaiting flush
- Create mechanism to read from Level 0 SSTable files
- Build priority-based lookup across all sources
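
One way to picture the priority-based lookup: sources are consulted from newest to oldest, and the first source that knows the key decides the result, including when it holds a deletion marker. The `Source` interface, `Entry` type, and map-backed source below are hypothetical stand-ins for the MemTable and SSTable readers.

```go
package main

// Entry is what a source returns for a key. Tombstone marks a deletion.
type Entry struct {
	Value     []byte
	Tombstone bool
}

// Source is a hypothetical common interface over the active MemTable,
// immutable MemTables, and Level 0 SSTables.
type Source interface {
	Lookup(key []byte) (Entry, bool)
}

// mapSource is a toy in-memory source used only for this sketch.
type mapSource map[string]Entry

func (m mapSource) Lookup(key []byte) (Entry, bool) {
	e, ok := m[string(key)]
	return e, ok
}

// Get walks sources in priority order (index 0 is newest). The first source
// that has an entry for key wins; a tombstone there hides older values.
func Get(sources []Source, key []byte) ([]byte, bool) {
	for _, s := range sources {
		if e, ok := s.Lookup(key); ok {
			if e.Tombstone {
				return nil, false
			}
			return e.Value, true
		}
	}
	return nil, false
}
```

Ordering the slice as active MemTable, then immutable MemTables (newest first), then Level 0 files (newest first) gives the newest write precedence without comparing sequence numbers in this simplified model.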
Phase E: Compaction
-
Implement tiered compaction strategy
- Create file selection algorithm based on overlap/size
- Implement merge-sorted reading from input files
- Add atomic output file generation
- Add compaction triggering based on size ratio and file count
-
Handle tombstones and key deletion
- Implement tombstone markers
- Create logic for tombstone garbage collection
- Test deletion correctness across compactions
-
Manage file obsolescence and cleanup
- Implement safe file deletion after compaction
- Create consistent file tracking
- Add error handling for cleanup failures
-
Build background compaction
- Implement worker pool for compaction tasks
- Add rate limiting to prevent I/O saturation
- Create metrics for monitoring compaction progress
- Implement priority scheduling for urgent compactions
Phase F: Basic Atomicity and Features
-
Implement merged iterator across all levels
- Create priority merging iterator
- Add efficient seeking capabilities
- Implement proper cleanup for resources
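
A priority merging iterator is commonly built on a min-heap of per-source cursors. The sketch below uses `container/heap` and merges fully into a slice for brevity; a real iterator would yield lazily and support seeking. The `KV` type, string keys, and the convention that earlier arguments are newer are all assumptions for this example, and tombstones are not handled.

```go
package main

import "container/heap"

// KV is a simplified key/value pair for the sketch.
type KV struct{ Key, Value string }

// cursor walks one sorted source. A lower prio means a newer source.
type cursor struct {
	items []KV
	pos   int
	prio  int
}

// cursorHeap orders cursors by current key, then by source priority.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	a, b := h[i].items[h[i].pos], h[j].items[h[j].pos]
	if a.Key != b.Key {
		return a.Key < b.Key
	}
	return h[i].prio < h[j].prio
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// Merge combines sorted sources into one sorted stream. When several
// sources hold the same key, only the entry from the newest source
// (the earliest argument) is kept.
func Merge(sources ...[]KV) []KV {
	h := &cursorHeap{}
	for i, s := range sources {
		if len(s) > 0 {
			*h = append(*h, &cursor{items: s, prio: i})
		}
	}
	heap.Init(h)
	var out []KV
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current key
		kv := c.items[c.pos]
		if len(out) == 0 || out[len(out)-1].Key != kv.Key {
			out = append(out, kv) // first occurrence is the newest version
		}
		c.pos++
		if c.pos < len(c.items) {
			heap.Fix(h, 0) // cursor advanced; restore heap order
		} else {
			heap.Pop(h) // cursor exhausted
		}
	}
	return out
}
```

The same structure serves compaction, where the merged stream is written to new output files instead of being returned to a caller.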
-
Add snapshot capability
- Create point-in-time view mechanism
- Implement consistent reads across all data sources
- Add resource tracking and cleanup
- Test isolation guarantees
-
Implement atomic batch operations
- Create batch data structure for multiple operations
- Implement atomic batch commit to WAL
- Add crash recovery for batches
- Design extensible interfaces for future transaction support
-
Add basic statistics and metrics
- Implement counters for operations
- Add timing measurements for critical paths
- Create exportable metrics interface
- Test accuracy of metrics
Phase G: Optimization and Benchmarking
-
Develop benchmark suite
- Create random/sequential write benchmarks
- Implement point read and range scan benchmarks
- Add compaction overhead measurements
- Build reproducible benchmark harness
-
Optimize critical paths
- Profile and identify bottlenecks
- Optimize memory usage patterns
- Improve cache efficiency in hot paths
- Reduce GC pressure for large operations
-
Tune default configuration
- Benchmark with different parameters
- Determine optimal defaults for general use cases
- Document configuration recommendations
Phase H: Optional Enhancements
-
Add Bloom filters
- Implement configurable Bloom filter
- Add to SSTable format
- Create adaptive sizing based on false positive rates
- Benchmark improvement in read performance
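
For orientation, a configurable Bloom filter can be sketched with two base hashes combined by double hashing. The choice of FNV-1a from the standard library, the way the second hash is derived, and the names `Bloom`, `NewBloom`, `Add`, and `MayContain` are assumptions for this example; adaptive sizing is not shown.

```go
package main

import "hash/fnv"

// Bloom is an illustrative Bloom filter with m bits and k probes per key.
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits (rounded up to a multiple of 64)
	k    uint64 // number of probes per key
}

// NewBloom creates a filter with at least mBits bits and k probes.
func NewBloom(mBits, k uint64) *Bloom {
	if mBits == 0 {
		mBits = 64
	}
	words := (mBits + 63) / 64
	return &Bloom{bits: make([]uint64, words), m: words * 64, k: k}
}

// baseHashes derives two 64-bit hashes from one FNV-1a stream.
func baseHashes(key []byte) (uint64, uint64) {
	h := fnv.New64a()
	h.Write(key)
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second value
	h2 := h.Sum64() | 1   // force odd so the probe step is never zero
	return h1, h2
}

// Add sets the k probe bits for key.
func (b *Bloom) Add(key []byte) {
	h1, h2 := baseHashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false only if key was definitely never added.
func (b *Bloom) MayContain(key []byte) bool {
	h1, h2 := baseHashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}
```

On the read path, a `false` from `MayContain` lets the reader skip an SSTable entirely, which is where the read improvement would be measured.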
-
Create monitoring hooks
- Add detailed internal event tracking
- Implement exportable metrics
- Create health check mechanisms
- Add performance alerts
-
Add crash recovery testing
- Build fault injection framework
- Create randomized crash scenarios
- Implement validation for post-recovery state
- Test edge cases in recovery
API Implementation
-
Implement Engine interface
Put(ctx context.Context, key, value []byte, opts ...WriteOption) error
Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)
Delete(ctx context.Context, key []byte, opts ...WriteOption) error
Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error
NewIterator(opts IteratorOptions) Iterator
Snapshot() Snapshot
Close() error
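
Collected into a Go declaration, the methods above might read as follows. The method signatures are taken from the list; the supporting types (`WriteOption`, `ReadOption`, `Operation`, `IteratorOptions`, `Iterator`, `Snapshot`) and the `WithSync` option are placeholder shapes assumed here so the sketch compiles.

```go
package main

import "context"

// Placeholder option types using the functional options pattern.
// Their fields are assumptions for illustration.
type writeConfig struct{ sync bool }
type readConfig struct{ verifyChecksums bool }

type WriteOption func(*writeConfig)
type ReadOption func(*readConfig)

// WithSync is a hypothetical option requesting an fsync per write.
func WithSync(sync bool) WriteOption {
	return func(c *writeConfig) { c.sync = sync }
}

// Operation is a hypothetical batch element.
type Operation struct {
	Delete bool
	Key    []byte
	Value  []byte
}

// IteratorOptions, Iterator, and Snapshot are assumed minimal shapes.
type IteratorOptions struct {
	LowerBound, UpperBound []byte
}

type Iterator interface {
	Seek(key []byte) bool
	Next() bool
	Key() []byte
	Value() []byte
	Close() error
}

type Snapshot interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
	Release()
}

// Engine mirrors the method list in this section.
type Engine interface {
	Put(ctx context.Context, key, value []byte, opts ...WriteOption) error
	Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)
	Delete(ctx context.Context, key []byte, opts ...WriteOption) error
	Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error
	NewIterator(opts IteratorOptions) Iterator
	Snapshot() Snapshot
	Close() error
}
```

With this shape, a call such as `eng.Put(ctx, key, value, WithSync(true))` would opt a single write into synchronous durability.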
-
Implement error types
- ErrIO - I/O errors with recovery procedures
- ErrCorruption - Data integrity issues
- ErrConfig - Configuration errors
- ErrResource - Resource exhaustion
- ErrConcurrency - Race conditions
- ErrNotFound - Key not found
-
Create comprehensive documentation
- API usage examples
- Configuration guidelines
- Performance characteristics
- Error handling recommendations