# Go Storage Engine Todo List
This document outlines the implementation tasks for the Go Storage Engine, organized by development phases. Follow these guidelines:
- Work on tasks in the order they appear
- Check off exactly one item (✓) before moving to the next unchecked item
- Each phase must be completed before starting the next phase
- Test thoroughly before marking an item complete
## Phase A: Foundation
- [✓] Setup project structure and Go module
  - [✓] Create directory structure following the package layout in PLAN.md
  - [✓] Initialize Go module and dependencies
  - [✓] Set up testing framework
- [✓] Implement config package
  - [✓] Define configuration struct with serialization/deserialization
  - [✓] Include configurable parameters for durability, compaction, memory usage
  - [✓] Create manifest loading/saving functionality
  - [✓] Add versioning support for config changes
- [✓] Build Write-Ahead Log (WAL)
  - [✓] Implement append-only file with atomic operations
  - [✓] Add Put/Delete operation encoding
  - [✓] Create replay functionality with error recovery
  - [✓] Implement both synchronous (default) and batched fsync modes
  - [✓] Add checksumming for entries
- [✓] Write WAL tests
  - [✓] Test durability with simulated crashes
  - [✓] Verify replay correctness
  - [✓] Benchmark write performance with different sync options
  - [✓] Test error handling and recovery
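
To make the record format concrete, here is a minimal sketch of one possible WAL entry encoding. The field layout, fixed-width lengths, and CRC32 checksum are illustrative assumptions, not necessarily the engine's on-disk format:

```go
package wal

import (
	"encoding/binary"
	"hash/crc32"
)

// Operation types carried in each record.
const (
	OpPut    = byte(1)
	OpDelete = byte(2)
)

// EncodeEntry appends one record to buf:
//
//	checksum (4B) | op (1B) | keyLen (4B) | key | [valLen (4B) | value]
//
// The checksum covers every byte after the checksum field, so replay
// can detect a torn or corrupted tail record and stop cleanly.
func EncodeEntry(buf []byte, op byte, key, value []byte) []byte {
	start := len(buf)
	buf = append(buf, 0, 0, 0, 0) // checksum placeholder, patched below
	buf = append(buf, op)
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(key)))
	buf = append(buf, key...)
	if op == OpPut {
		buf = binary.LittleEndian.AppendUint32(buf, uint32(len(value)))
		buf = append(buf, value...)
	}
	binary.LittleEndian.PutUint32(buf[start:], crc32.ChecksumIEEE(buf[start+4:]))
	return buf
}
```

In synchronous mode each appended record is followed by an fsync; batched mode amortizes the fsync across several records at the cost of a bounded window of recent writes on crash.
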
## Phase B: In-Memory Layer
- [✓] Implement MemTable
  - [✓] Create skip list data structure aligned to 64-byte cache lines
  - [✓] Add key/value insertion and lookup operations
  - [✓] Implement sorted key iteration
  - [✓] Add size tracking for flush threshold detection
- [✓] Connect WAL replay to MemTable
  - [✓] Create recovery logic to rebuild MemTable from WAL
  - [✓] Implement consistent snapshot reads during recovery
  - [✓] Handle errors during replay with appropriate fallbacks
- [✓] Test concurrent read/write scenarios
  - [✓] Verify reader isolation during writes
  - [✓] Test snapshot consistency guarantees
  - [✓] Benchmark read/write performance under load
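
A compact, single-writer sketch of the skip list behind the MemTable. The tower height, promotion probability, and lack of locking are assumptions; the real structure must also keep concurrent readers safe and honor the 64-byte cache-line alignment noted above:

```go
package memtable

import (
	"bytes"
	"math/rand"
)

const maxHeight = 12

// node is one tower in the list. Keeping the header and the hottest
// forward pointers within a 64-byte cache line is the goal of the
// alignment task; this sketch does not enforce it.
type node struct {
	key, value []byte
	next       [maxHeight]*node
}

// SkipList here is single-writer only.
type SkipList struct {
	head   node
	height int
}

func randomHeight() int {
	h := 1
	for h < maxHeight && rand.Intn(4) == 0 { // p = 1/4 per extra level
		h++
	}
	return h
}

// Get returns the value stored for key, or nil if the key is absent.
func (s *SkipList) Get(key []byte) []byte {
	n := &s.head
	for level := s.height - 1; level >= 0; level-- {
		for n.next[level] != nil && bytes.Compare(n.next[level].key, key) < 0 {
			n = n.next[level]
		}
	}
	if c := n.next[0]; c != nil && bytes.Equal(c.key, key) {
		return c.value
	}
	return nil
}

// Put inserts key or overwrites its value in place.
func (s *SkipList) Put(key, value []byte) {
	var update [maxHeight]*node
	n := &s.head
	for level := maxHeight - 1; level >= 0; level-- {
		for n.next[level] != nil && bytes.Compare(n.next[level].key, key) < 0 {
			n = n.next[level]
		}
		update[level] = n
	}
	if c := n.next[0]; c != nil && bytes.Equal(c.key, key) {
		c.value = value
		return
	}
	h := randomHeight()
	if h > s.height {
		s.height = h
	}
	nn := &node{key: key, value: value}
	for level := 0; level < h; level++ {
		nn.next[level] = update[level].next[level]
		update[level].next[level] = nn
	}
}
```

Sorted iteration falls out for free: walking the level-0 chain from `head` visits keys in ascending order.
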
## Phase C: Persistent Storage
- [✓] Design SSTable format
  - [✓] Define 16KB block structure with restart points
  - [✓] Create checksumming for blocks (xxHash64)
  - [✓] Define index structure with entries every ~64KB
  - [✓] Design file footer with metadata (version, timestamp, key count, etc.)
- [✓] Implement SSTable writer
  - [✓] Add functionality to convert MemTable to blocks
  - [✓] Create sparse index generator
  - [✓] Implement footer writing with checksums
  - [✓] Add atomic file creation for crash safety
- [✓] Build SSTable reader
  - [✓] Implement block loading with validation
  - [✓] Create binary search through index
  - [✓] Develop iterator interface for scanning
  - [✓] Add error handling for corrupted files
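
As one way to picture the footer task, here is a sketch of a fixed-size trailer validated with xxHash64. The field set and widths are assumptions; only the checksum algorithm comes from the format notes above:

```go
package sstable

import (
	"encoding/binary"

	"github.com/cespare/xxhash/v2"
)

// footerSize is the fixed trailer length a reader seeks back from EOF.
const footerSize = 8 + 8 + 8 + 8 + 4 + 8

// Footer sketches the file trailer metadata.
type Footer struct {
	IndexOffset uint64 // where the sparse index block starts
	IndexSize   uint64 // index block length in bytes
	KeyCount    uint64 // total keys in the table
	Timestamp   uint64 // creation time, unix nanoseconds
	Version     uint32 // format version for compatibility checks
}

// Encode appends the fields followed by an xxHash64 over them, so a
// reader can validate the footer before trusting any offset in it.
func (f *Footer) Encode(buf []byte) []byte {
	start := len(buf)
	buf = binary.LittleEndian.AppendUint64(buf, f.IndexOffset)
	buf = binary.LittleEndian.AppendUint64(buf, f.IndexSize)
	buf = binary.LittleEndian.AppendUint64(buf, f.KeyCount)
	buf = binary.LittleEndian.AppendUint64(buf, f.Timestamp)
	buf = binary.LittleEndian.AppendUint32(buf, f.Version)
	return binary.LittleEndian.AppendUint64(buf, xxhash.Sum64(buf[start:]))
}
```
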
## Phase D: Basic Engine Integration
- [✓] Implement Level 0 flush mechanism
  - [✓] Create MemTable to SSTable conversion process
  - [✓] Implement file management and naming scheme
  - [✓] Add background flush triggering based on size
- [✓] Create read path that merges data sources
  - [✓] Implement read from current MemTable
  - [✓] Add reads from immutable MemTables awaiting flush
  - [✓] Create mechanism to read from Level 0 SSTable files
  - [✓] Build priority-based lookup across all sources
  - [✓] Implement unified iterator interface for all data sources
- [✓] Refactoring (done after completing Phase D)
  - [✓] Create a common iterator interface in the iterator package
  - [✓] Rename component-specific iterators (BlockIterator, MemTableIterator, etc.)
  - [✓] Update all iterators to implement the common interface directly
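
The priority-based lookup reduces to checking sources from newest to oldest and returning on the first hit. A sketch, with illustrative names and tombstone handling elided until Phase E:

```go
package engine

// Source is any component that answers a point lookup: the active
// MemTable, an immutable MemTable, or a Level 0 SSTable reader.
type Source interface {
	Get(key []byte) (value []byte, ok bool)
}

type Engine struct {
	active    Source   // current MemTable, most recent writes
	immutable []Source // awaiting flush, oldest first
	level0    []Source // L0 readers, oldest first (files may overlap)
}

// get checks sources from newest to oldest and returns the first hit,
// which is what gives recent writes priority over older SSTables.
func (e *Engine) get(key []byte) ([]byte, bool) {
	if v, ok := e.active.Get(key); ok {
		return v, true
	}
	for i := len(e.immutable) - 1; i >= 0; i-- {
		if v, ok := e.immutable[i].Get(key); ok {
			return v, true
		}
	}
	for i := len(e.level0) - 1; i >= 0; i-- {
		if v, ok := e.level0[i].Get(key); ok {
			return v, true
		}
	}
	return nil, false
}
```
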
## Phase E: Compaction
- [ ] Implement tiered compaction strategy
  - [ ] Create file selection algorithm based on overlap/size
  - [ ] Implement merge-sorted reading from input files
  - [ ] Add atomic output file generation
  - [ ] Create size ratio and file count based triggering
- [ ] Handle tombstones and key deletion
  - [ ] Implement tombstone markers
  - [ ] Create logic for tombstone garbage collection
  - [ ] Test deletion correctness across compactions
- [ ] Manage file obsolescence and cleanup
  - [ ] Implement safe file deletion after compaction
  - [ ] Create consistent file tracking
  - [ ] Add error handling for cleanup failures
- [ ] Build background compaction
  - [ ] Implement worker pool for compaction tasks
  - [ ] Add rate limiting to prevent I/O saturation
  - [ ] Create metrics for monitoring compaction progress
  - [ ] Implement priority scheduling for urgent compactions
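
The tombstone rules can be sketched as a filter over the merge-sorted stream: keep only the newest version of each key, and drop tombstones once the compaction writes to the bottom level, where nothing older could resurrect the key. A sketch under those assumptions:

```go
package compaction

import "bytes"

// entry is one key/value drawn from the merge-sorted input stream;
// tombstone marks a deletion. Names here are illustrative.
type entry struct {
	key, value []byte
	tombstone  bool
}

// mergeEntries assumes in is sorted by key with the newest version of
// each key first (the merge iterator's job). It emits only the newest
// version, and when the output lands on the bottom level it drops
// tombstones too, since no older data can resurrect those keys.
func mergeEntries(in []entry, bottomLevel bool) []entry {
	out := make([]entry, 0, len(in))
	for i, e := range in {
		if i > 0 && bytes.Equal(e.key, in[i-1].key) {
			continue // shadowed older version
		}
		if e.tombstone && bottomLevel {
			continue // tombstone garbage collection
		}
		out = append(out, e)
	}
	return out
}
```

At non-bottom levels the tombstone must be rewritten into the output file; dropping it early would let older versions in lower levels reappear.
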
## Phase F: Basic Atomicity and Features
- [ ] Implement merged iterator across all levels
  - [ ] Create priority merging iterator
  - [ ] Add efficient seeking capabilities
  - [ ] Implement proper cleanup for resources
- [ ] Add snapshot capability
  - [ ] Create point-in-time view mechanism
  - [ ] Implement consistent reads across all data sources
  - [ ] Add resource tracking and cleanup
  - [ ] Test isolation guarantees
- [ ] Implement atomic batch operations
  - [ ] Create batch data structure for multiple operations
  - [ ] Implement atomic batch commit to WAL
  - [ ] Add crash recovery for batches
  - [ ] Design extensible interfaces for future transaction support
- [ ] Add basic statistics and metrics
  - [ ] Implement counters for operations
  - [ ] Add timing measurements for critical paths
  - [ ] Create exportable metrics interface
  - [ ] Test accuracy of metrics
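
For the atomic batch items, the usual approach is to stage operations in memory and commit them as a single WAL record, so replay applies all of them or none. A sketch with an assumed record layout:

```go
package engine

import "encoding/binary"

// Batch stages operations until commit.
type Batch struct {
	ops []op
}

type op struct {
	del        bool
	key, value []byte
}

func (b *Batch) Put(key, value []byte) { b.ops = append(b.ops, op{key: key, value: value}) }
func (b *Batch) Delete(key []byte)     { b.ops = append(b.ops, op{del: true, key: key}) }

// encode produces one record: count (4B), then each op. Because the
// batch is a single WAL record, a replay that finds it truncated or
// failing its checksum discards the whole batch, never half of it.
func (b *Batch) encode() []byte {
	buf := binary.LittleEndian.AppendUint32(nil, uint32(len(b.ops)))
	for _, o := range b.ops {
		if o.del {
			buf = append(buf, 0)
		} else {
			buf = append(buf, 1)
		}
		buf = binary.LittleEndian.AppendUint32(buf, uint32(len(o.key)))
		buf = append(buf, o.key...)
		if !o.del {
			buf = binary.LittleEndian.AppendUint32(buf, uint32(len(o.value)))
			buf = append(buf, o.value...)
		}
	}
	return buf
}
```
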
## Phase G: Optimization and Benchmarking
- [ ] Develop benchmark suite
  - [ ] Create random/sequential write benchmarks
  - [ ] Implement point read and range scan benchmarks
  - [ ] Add compaction overhead measurements
  - [ ] Build reproducible benchmark harness
- [ ] Optimize critical paths
  - [ ] Profile and identify bottlenecks
  - [ ] Optimize memory usage patterns
  - [ ] Improve cache efficiency in hot paths
  - [ ] Reduce GC pressure for large operations
- [ ] Tune default configuration
  - [ ] Benchmark with different parameters
  - [ ] Determine optimal defaults for general use cases
  - [ ] Document configuration recommendations
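
A random-write benchmark entry point might look like the following. The `newTestEngine` helper is hypothetical, and `Put` mirrors the signature in the API section below with options omitted:

```go
package engine_test

import (
	"context"
	"encoding/binary"
	"math/rand"
	"testing"
)

// BenchmarkRandomPut sketches the Phase G random-write benchmark.
func BenchmarkRandomPut(b *testing.B) {
	e := newTestEngine(b) // hypothetical: open an engine in b.TempDir()
	ctx := context.Background()
	key := make([]byte, 16)
	value := make([]byte, 100)
	b.SetBytes(int64(len(key) + len(value)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		binary.LittleEndian.PutUint64(key, rand.Uint64())  // random prefix
		binary.LittleEndian.PutUint64(key[8:], uint64(i))  // unique suffix
		if err := e.Put(ctx, key, value); err != nil {
			b.Fatal(err)
		}
	}
}
```

A sequential variant uses `uint64(i)` as the whole key; comparing the two isolates the cost of out-of-order skip-list inserts and overlapping L0 files.
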
## Phase H: Optional Enhancements
- [ ] Add Bloom filters
  - [ ] Implement configurable Bloom filter
  - [ ] Add to SSTable format
  - [ ] Create adaptive sizing based on false positive rates
  - [ ] Benchmark improvement in read performance
- [ ] Create monitoring hooks
  - [ ] Add detailed internal event tracking
  - [ ] Implement exportable metrics
  - [ ] Create health check mechanisms
  - [ ] Add performance alerts
- [ ] Add crash recovery testing
  - [ ] Build fault injection framework
  - [ ] Create randomized crash scenarios
  - [ ] Implement validation for post-recovery state
  - [ ] Test edge cases in recovery
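
Adaptive Bloom sizing follows from the standard formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln 2. The sketch below is textbook math, not this repo's implementation:

```go
package bloom

import "math"

// Sizing computes classic Bloom filter parameters for numKeys keys and
// a target false positive rate fpRate.
func Sizing(numKeys int, fpRate float64) (bits uint64, hashes int) {
	m := -float64(numKeys) * math.Log(fpRate) / (math.Ln2 * math.Ln2)
	bits = uint64(math.Ceil(m))
	hashes = int(math.Round(m / float64(numKeys) * math.Ln2))
	if hashes < 1 {
		hashes = 1
	}
	return bits, hashes
}
```

At a 1% false positive target this works out to roughly 9.6 bits and 7 hash functions per key.
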
## API Implementation
- [ ] Implement Engine interface
  - [ ] `Put(ctx context.Context, key, value []byte, opts ...WriteOption) error`
  - [ ] `Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)`
  - [ ] `Delete(ctx context.Context, key []byte, opts ...WriteOption) error`
  - [ ] `Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error`
  - [ ] `NewIterator(opts IteratorOptions) Iterator`
  - [ ] `Snapshot() Snapshot`
  - [ ] `Close() error`
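
Assembled into Go, the method set above reads as follows. The supporting types are placeholders here, since defining them properly is part of this work:

```go
package api

import "context"

// Placeholder supporting types so the interface compiles; the real
// definitions are separate tasks in this section.
type (
	WriteOption     func(any)
	ReadOption      func(any)
	Operation       struct{ Key, Value []byte; Delete bool }
	IteratorOptions struct{ Start, End []byte }
)

// Iterator and Snapshot are reduced to minimal method sets here.
type Iterator interface {
	Next() bool
	Key() []byte
	Value() []byte
	Close() error
}

type Snapshot interface {
	Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)
	Close() error
}

// Engine collects the signatures listed above.
type Engine interface {
	Put(ctx context.Context, key, value []byte, opts ...WriteOption) error
	Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)
	Delete(ctx context.Context, key []byte, opts ...WriteOption) error
	Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error
	NewIterator(opts IteratorOptions) Iterator
	Snapshot() Snapshot
	Close() error
}
```
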
- [ ] Implement error types
  - [ ] `ErrIO` - I/O errors with recovery procedures
  - [ ] `ErrCorruption` - Data integrity issues
  - [ ] `ErrConfig` - Configuration errors
  - [ ] `ErrResource` - Resource exhaustion
  - [ ] `ErrConcurrency` - Race conditions
  - [ ] `ErrNotFound` - Key not found
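
These would naturally be sentinel values compatible with `errors.Is`; the message text and any wrapping strategy below are assumptions:

```go
package engine

import "errors"

// Sentinel errors matching the list above.
var (
	ErrIO          = errors.New("storage: I/O error")
	ErrCorruption  = errors.New("storage: data corruption detected")
	ErrConfig      = errors.New("storage: invalid configuration")
	ErrResource    = errors.New("storage: resource exhausted")
	ErrConcurrency = errors.New("storage: concurrent access conflict")
	ErrNotFound    = errors.New("storage: key not found")
)
```

Callers can then branch with `errors.Is(err, ErrNotFound)` regardless of how much context intermediate layers wrap around the error.
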
- [ ] Create comprehensive documentation
  - [ ] API usage examples
  - [ ] Configuration guidelines
  - [ ] Performance characteristics
  - [ ] Error handling recommendations