# Go Storage Engine Todo List

This document outlines the implementation tasks for the Go Storage Engine, organized by development phases. Follow these guidelines:

- Work on tasks in the order they appear
- Check off exactly one item (✓) before moving to the next unchecked item
- Each phase must be completed before starting the next phase
- Test thoroughly before marking an item complete

## Phase A: Foundation

- [✓] Setup project structure and Go module
  - [✓] Create directory structure following the package layout in PLAN.md
  - [✓] Initialize Go module and dependencies
  - [✓] Set up testing framework

- [✓] Implement config package
  - [✓] Define configuration struct with serialization/deserialization
  - [✓] Include configurable parameters for durability, compaction, and memory usage
  - [✓] Create manifest loading/saving functionality
  - [✓] Add versioning support for config changes

- [✓] Build Write-Ahead Log (WAL)
  - [✓] Implement append-only file with atomic operations
  - [✓] Add Put/Delete operation encoding
  - [✓] Create replay functionality with error recovery
  - [✓] Implement both synchronous (default) and batched fsync modes
  - [✓] Add checksumming for entries

- [✓] Write WAL tests
  - [✓] Test durability with simulated crashes
  - [✓] Verify replay correctness
  - [✓] Benchmark write performance with different sync options
  - [✓] Test error handling and recovery

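As a companion to the WAL items above, here is a minimal sketch of a self-describing, checksummed record format. The exact field layout and the use of stdlib CRC-32 are illustrative assumptions, not the engine's actual encoding:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	opPut    byte = 1
	opDelete byte = 2
)

// encodeRecord frames one WAL entry as:
// checksum(4) | op(1) | keyLen(4) | key | valLen(4) | val
func encodeRecord(op byte, key, val []byte) []byte {
	var buf bytes.Buffer
	buf.WriteByte(op)
	binary.Write(&buf, binary.LittleEndian, uint32(len(key)))
	buf.Write(key)
	binary.Write(&buf, binary.LittleEndian, uint32(len(val)))
	buf.Write(val)
	payload := buf.Bytes()

	out := make([]byte, 4+len(payload))
	binary.LittleEndian.PutUint32(out, crc32.ChecksumIEEE(payload))
	copy(out[4:], payload)
	return out
}

// decodeRecord verifies the checksum, then unpacks one well-formed entry.
func decodeRecord(rec []byte) (op byte, key, val []byte, err error) {
	if len(rec) < 5 {
		return 0, nil, nil, errors.New("record too short")
	}
	want := binary.LittleEndian.Uint32(rec)
	payload := rec[4:]
	if crc32.ChecksumIEEE(payload) != want {
		return 0, nil, nil, errors.New("checksum mismatch")
	}
	op = payload[0]
	klen := binary.LittleEndian.Uint32(payload[1:])
	key = payload[5 : 5+klen]
	rest := payload[5+klen:]
	vlen := binary.LittleEndian.Uint32(rest)
	val = rest[4 : 4+vlen]
	return op, key, val, nil
}

func main() {
	rec := encodeRecord(opPut, []byte("k1"), []byte("v1"))
	op, k, v, err := decodeRecord(rec)
	fmt.Println(op, string(k), string(v), err)

	rec[6] ^= 0xFF // flip a payload bit: replay must reject this record
	_, _, _, err = decodeRecord(rec)
	fmt.Println(err)
}
```

Putting the checksum over the whole payload is what lets replay stop cleanly at a torn tail write instead of applying garbage.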
## Phase B: In-Memory Layer

- [✓] Implement MemTable
  - [✓] Create skip list data structure aligned to 64-byte cache lines
  - [✓] Add key/value insertion and lookup operations
  - [✓] Implement sorted key iteration
  - [✓] Add size tracking for flush threshold detection

- [✓] Connect WAL replay to MemTable
  - [✓] Create recovery logic to rebuild MemTable from WAL
  - [✓] Implement consistent snapshot reads during recovery
  - [✓] Handle errors during replay with appropriate fallbacks

- [✓] Test concurrent read/write scenarios
  - [✓] Verify reader isolation during writes
  - [✓] Test snapshot consistency guarantees
  - [✓] Benchmark read/write performance under load

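The MemTable operations above can be illustrated with a deliberately simplified sketch: a map with on-demand sorting stands in for the real cache-line-aligned skip list, and only the checklist's operations (insert, lookup, sorted iteration, size-based flush detection) are modeled:

```go
package main

import (
	"fmt"
	"sort"
)

// memTable is a simplified stand-in for the engine's skip list.
// It illustrates the operations the checklist names, not the data structure.
type memTable struct {
	data      map[string][]byte
	sizeBytes int
}

func newMemTable() *memTable { return &memTable{data: map[string][]byte{}} }

func (m *memTable) Put(key, val []byte) {
	k := string(key)
	if old, ok := m.data[k]; ok {
		m.sizeBytes -= len(k) + len(old) // overwrite: retire the old value's size
	}
	m.data[k] = append([]byte(nil), val...)
	m.sizeBytes += len(k) + len(val)
}

func (m *memTable) Get(key []byte) ([]byte, bool) {
	v, ok := m.data[string(key)]
	return v, ok
}

// ShouldFlush reports whether the table has reached the flush threshold.
func (m *memTable) ShouldFlush(threshold int) bool { return m.sizeBytes >= threshold }

// SortedKeys supports ordered iteration, which an SSTable flush requires.
func (m *memTable) SortedKeys() []string {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	mt := newMemTable()
	mt.Put([]byte("b"), []byte("2"))
	mt.Put([]byte("a"), []byte("1"))
	fmt.Println(mt.SortedKeys(), mt.ShouldFlush(4))
}
```

A real skip list gives the same sorted iteration without the O(n log n) sort at flush time, plus lock-free-friendly concurrent reads.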
## Phase C: Persistent Storage

- [ ] Design SSTable format
  - [ ] Define 16KB block structure with restart points
  - [ ] Create checksumming for blocks (xxHash64)
  - [ ] Define index structure with entries every ~64KB
  - [ ] Design file footer with metadata (version, timestamp, key count, etc.)

- [ ] Implement SSTable writer
  - [ ] Add functionality to convert MemTable to blocks
  - [ ] Create sparse index generator
  - [ ] Implement footer writing with checksums
  - [ ] Add atomic file creation for crash safety

- [ ] Build SSTable reader
  - [ ] Implement block loading with validation
  - [ ] Create binary search through index
  - [ ] Develop iterator interface for scanning
  - [ ] Add error handling for corrupted files

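To make the footer task concrete, here is a sketch of a fixed-size footer codec. The field set and widths are assumptions, and stdlib CRC-32 stands in for the xxHash64 the format calls for (xxHash is a third-party dependency):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const footerSize = 4 + 8 + 8 + 8 + 4 + 4 // magic, indexOff, keyCount, timestamp, version, checksum

const magic = 0x55AA1234

// footer is a sketch of the fixed-size trailer at the end of an SSTable.
// A reader seeks to EOF-footerSize, validates it, then follows IndexOffset.
type footer struct {
	IndexOffset uint64
	KeyCount    uint64
	Timestamp   uint64
	Version     uint32
}

func (f footer) encode() []byte {
	b := make([]byte, footerSize)
	binary.LittleEndian.PutUint32(b[0:], magic)
	binary.LittleEndian.PutUint64(b[4:], f.IndexOffset)
	binary.LittleEndian.PutUint64(b[12:], f.KeyCount)
	binary.LittleEndian.PutUint64(b[20:], f.Timestamp)
	binary.LittleEndian.PutUint32(b[28:], f.Version)
	binary.LittleEndian.PutUint32(b[32:], crc32.ChecksumIEEE(b[:32]))
	return b
}

func decodeFooter(b []byte) (footer, error) {
	var f footer
	if len(b) != footerSize {
		return f, errors.New("bad footer size")
	}
	if crc32.ChecksumIEEE(b[:32]) != binary.LittleEndian.Uint32(b[32:]) {
		return f, errors.New("footer checksum mismatch")
	}
	if binary.LittleEndian.Uint32(b[0:]) != magic {
		return f, errors.New("bad magic")
	}
	f.IndexOffset = binary.LittleEndian.Uint64(b[4:])
	f.KeyCount = binary.LittleEndian.Uint64(b[12:])
	f.Timestamp = binary.LittleEndian.Uint64(b[20:])
	f.Version = binary.LittleEndian.Uint32(b[28:])
	return f, nil
}

func main() {
	f := footer{IndexOffset: 4096, KeyCount: 128, Timestamp: 1700000000, Version: 1}
	got, err := decodeFooter(f.encode())
	fmt.Println(got, err)
}
```

Keeping the footer fixed-size is the design choice that matters: a reader can locate it without scanning, and the checksum plus magic number cleanly distinguish truncation from corruption.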
## Phase D: Basic Engine Integration

- [ ] Implement Level 0 flush mechanism
  - [ ] Create MemTable to SSTable conversion process
  - [ ] Implement file management and naming scheme
  - [ ] Add background flush triggering based on size

- [ ] Create read path that merges data sources
  - [ ] Implement read from current MemTable
  - [ ] Add reads from immutable MemTables awaiting flush
  - [ ] Create mechanism to read from Level 0 SSTable files
  - [ ] Build priority-based lookup across all sources

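The priority-based lookup above can be sketched as a newest-first scan over read sources, where the first source that knows the key wins. The `mapSource` type is purely illustrative, and a nil value models a tombstone:

```go
package main

import "fmt"

// source is any read-only view a Get can consult: the active MemTable,
// an immutable MemTable awaiting flush, or a Level 0 SSTable.
type source interface {
	Get(key string) (val []byte, found bool)
}

type mapSource map[string][]byte

func (m mapSource) Get(key string) ([]byte, bool) { v, ok := m[key]; return v, ok }

// get consults sources newest-first. The first hit wins, so newer writes
// (including tombstones, modeled here as nil values) shadow older ones.
func get(key string, newestFirst []source) ([]byte, bool) {
	for _, s := range newestFirst {
		if v, ok := s.Get(key); ok {
			if v == nil { // tombstone: the key was deleted after the older write
				return nil, false
			}
			return v, true
		}
	}
	return nil, false
}

func main() {
	active := mapSource{"a": []byte("new"), "d": nil} // "d" deleted in the newest source
	immutable := mapSource{"b": []byte("mid")}
	level0 := mapSource{"a": []byte("old"), "d": []byte("x")}

	v, _ := get("a", []source{active, immutable, level0})
	_, found := get("d", []source{active, immutable, level0})
	fmt.Println(string(v), found)
}
```

The subtlety worth testing is the tombstone case: stopping at the first hit is only correct if a deletion marker counts as a hit, otherwise the lookup would fall through to the stale Level 0 value.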
## Phase E: Compaction

- [ ] Implement tiered compaction strategy
  - [ ] Create file selection algorithm based on overlap/size
  - [ ] Implement merge-sorted reading from input files
  - [ ] Add atomic output file generation
  - [ ] Create size ratio and file count based triggering

- [ ] Handle tombstones and key deletion
  - [ ] Implement tombstone markers
  - [ ] Create logic for tombstone garbage collection
  - [ ] Test deletion correctness across compactions

- [ ] Manage file obsolescence and cleanup
  - [ ] Implement safe file deletion after compaction
  - [ ] Create consistent file tracking
  - [ ] Add error handling for cleanup failures

- [ ] Build background compaction
  - [ ] Implement worker pool for compaction tasks
  - [ ] Add rate limiting to prevent I/O saturation
  - [ ] Create metrics for monitoring compaction progress
  - [ ] Implement priority scheduling for urgent compactions

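The merge and tombstone-GC rules above can be sketched as follows. A real compaction streams blocks through iterators; this map-based version only demonstrates the precedence rule (newest version wins) and when a tombstone may be dropped:

```go
package main

import (
	"fmt"
	"sort"
)

type kv struct {
	key string
	val string
	del bool // tombstone marker
}

// mergeRuns merges sorted input runs (index 0 = newest) into one sorted
// output, keeping only the newest version of each key. When isBottomLevel
// is true, tombstones are garbage-collected, because no older file can
// still hold a shadowed value that the tombstone must keep suppressing.
func mergeRuns(runs [][]kv, isBottomLevel bool) []kv {
	newest := map[string]kv{}
	for i := len(runs) - 1; i >= 0; i-- { // apply oldest first; newer overwrites
		for _, e := range runs[i] {
			newest[e.key] = e
		}
	}
	keys := make([]string, 0, len(newest))
	for k := range newest {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []kv
	for _, k := range keys {
		e := newest[k]
		if e.del && isBottomLevel {
			continue // safe to drop: nothing older remains to resurrect the key
		}
		out = append(out, e)
	}
	return out
}

func main() {
	newer := []kv{{key: "a", del: true}, {key: "c", val: "3"}}
	older := []kv{{key: "a", val: "1"}, {key: "b", val: "2"}}
	fmt.Println(mergeRuns([][]kv{newer, older}, true))
}
```

The deletion-correctness tests in this phase hinge on exactly that `isBottomLevel` condition: dropping a tombstone too early lets an older, uncompacted file resurrect the deleted key.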
## Phase F: Basic Atomicity and Features

- [ ] Implement merged iterator across all levels
  - [ ] Create priority merging iterator
  - [ ] Add efficient seeking capabilities
  - [ ] Implement proper cleanup for resources

- [ ] Add snapshot capability
  - [ ] Create point-in-time view mechanism
  - [ ] Implement consistent reads across all data sources
  - [ ] Add resource tracking and cleanup
  - [ ] Test isolation guarantees

- [ ] Implement atomic batch operations
  - [ ] Create batch data structure for multiple operations
  - [ ] Implement atomic batch commit to WAL
  - [ ] Add crash recovery for batches
  - [ ] Design extensible interfaces for future transaction support

- [ ] Add basic statistics and metrics
  - [ ] Implement counters for operations
  - [ ] Add timing measurements for critical paths
  - [ ] Create exportable metrics interface
  - [ ] Test accuracy of metrics

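A sketch of the batch data structure: the key idea is that all operations are committed as one WAL record, so replay either sees the whole batch or (on a checksum failure from a torn write) none of it. WAL encoding is elided here; an in-memory map stands in for the store:

```go
package main

import "fmt"

// Operation is one entry in an atomic batch; Value == nil means delete.
type Operation struct {
	Key, Value []byte
}

// batch accumulates operations so they can be committed together.
// Serializing all ops into a single checksummed WAL record is what makes
// recovery all-or-nothing; that step is elided in this sketch.
type batch struct {
	ops []Operation
}

func (b *batch) Put(k, v []byte) { b.ops = append(b.ops, Operation{k, v}) }
func (b *batch) Delete(k []byte) { b.ops = append(b.ops, Operation{k, nil}) }

// applyTo replays the batch against a store in order, so later ops in the
// same batch win over earlier ones, matching WAL replay semantics.
func (b *batch) applyTo(store map[string][]byte) {
	for _, op := range b.ops {
		if op.Value == nil {
			delete(store, string(op.Key))
		} else {
			store[string(op.Key)] = op.Value
		}
	}
}

func main() {
	store := map[string][]byte{"old": []byte("x")}
	var b batch
	b.Put([]byte("a"), []byte("1"))
	b.Delete([]byte("old"))
	b.applyTo(store)
	fmt.Println(string(store["a"]), len(store))
}
```

Keeping `Operation` as a plain slice element also leaves room for the "future transaction support" item: a transaction can build on the same structure with a read-set added.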
## Phase G: Optimization and Benchmarking

- [ ] Develop benchmark suite
  - [ ] Create random/sequential write benchmarks
  - [ ] Implement point read and range scan benchmarks
  - [ ] Add compaction overhead measurements
  - [ ] Build reproducible benchmark harness

- [ ] Optimize critical paths
  - [ ] Profile and identify bottlenecks
  - [ ] Optimize memory usage patterns
  - [ ] Improve cache efficiency in hot paths
  - [ ] Reduce GC pressure for large operations

- [ ] Tune default configuration
  - [ ] Benchmark with different parameters
  - [ ] Determine optimal defaults for general use cases
  - [ ] Document configuration recommendations

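In practice these benchmarks would be written as `testing.B` functions run via `go test -bench`. As a standalone illustration of the shape, here is a minimal harness reporting ns/op for sequential vs. random writes (against a plain map, not the engine, and with a fixed seed for reproducibility):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// measure runs fn n times and reports ns/op, the same headline number
// `go test -bench` prints. The real suite should prefer testing.B, which
// also handles warm-up and iteration-count calibration.
func measure(name string, n int, fn func(i int)) {
	start := time.Now()
	for i := 0; i < n; i++ {
		fn(i)
	}
	fmt.Printf("%s\t%d ops\t%d ns/op\n", name, n, time.Since(start).Nanoseconds()/int64(n))
}

func main() {
	const n = 100000
	val := make([]byte, 100)

	store := make(map[int][]byte, n)
	measure("sequential-write", n, func(i int) { store[i] = val })

	r := rand.New(rand.NewSource(1)) // fixed seed: reproducible key order
	keys := r.Perm(n)
	store = make(map[int][]byte, n)
	measure("random-write", n, func(i int) { store[keys[i]] = val })
}
```

The fixed seed is the part worth copying into the real harness: without it, run-to-run key orders differ and the "reproducible benchmark harness" item is not met.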
## Phase H: Optional Enhancements

- [ ] Add Bloom filters
  - [ ] Implement configurable Bloom filter
  - [ ] Add to SSTable format
  - [ ] Create adaptive sizing based on false positive rates
  - [ ] Benchmark improvement in read performance

- [ ] Create monitoring hooks
  - [ ] Add detailed internal event tracking
  - [ ] Implement exportable metrics
  - [ ] Create health check mechanisms
  - [ ] Add performance alerts

- [ ] Add crash recovery testing
  - [ ] Build fault injection framework
  - [ ] Create randomized crash scenarios
  - [ ] Implement validation for post-recovery state
  - [ ] Test edge cases in recovery

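A minimal Bloom filter sketch using k seeded FNV hashes over a bit array. Real adaptive sizing would derive m and k from the target false-positive rate (m ≈ -n·ln p / (ln 2)², k ≈ (m/n)·ln 2); here both are fixed:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: m bits, k FNV-1a-based hash positions.
type bloom struct {
	bits []uint64
	m, k uint64
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hash derives the i-th position by prefixing the key with the seed byte,
// a cheap way to get k distinct hash functions from one base hash.
func (b *bloom) hash(key []byte, i uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(i)})
	h.Write(key)
	return h.Sum64() % b.m
}

func (b *bloom) Add(key []byte) {
	for i := uint64(0); i < b.k; i++ {
		p := b.hash(key, i)
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain can return false positives but never false negatives, which
// is what lets a reader safely skip SSTables that cannot hold a key.
func (b *bloom) MayContain(key []byte) bool {
	for i := uint64(0); i < b.k; i++ {
		p := b.hash(key, i)
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 4)
	f.Add([]byte("present"))
	fmt.Println(f.MayContain([]byte("present")))
}
```

When this is added to the SSTable format, the filter bits would be serialized as one more checksummed region referenced from the footer, so a reader can consult it before touching any data block.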
## API Implementation

- [ ] Implement Engine interface
  - [ ] `Put(ctx context.Context, key, value []byte, opts ...WriteOption) error`
  - [ ] `Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)`
  - [ ] `Delete(ctx context.Context, key []byte, opts ...WriteOption) error`
  - [ ] `Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error`
  - [ ] `NewIterator(opts IteratorOptions) Iterator`
  - [ ] `Snapshot() Snapshot`
  - [ ] `Close() error`

- [ ] Implement error types
  - [ ] `ErrIO` - I/O errors with recovery procedures
  - [ ] `ErrCorruption` - Data integrity issues
  - [ ] `ErrConfig` - Configuration errors
  - [ ] `ErrResource` - Resource exhaustion
  - [ ] `ErrConcurrency` - Race conditions
  - [ ] `ErrNotFound` - Key not found

- [ ] Create comprehensive documentation
  - [ ] API usage examples
  - [ ] Configuration guidelines
  - [ ] Performance characteristics
  - [ ] Error handling recommendations