# Go Storage Engine Todo List

This document outlines the implementation tasks for the Go Storage Engine, organized by development phases. Follow these guidelines:

- Work on tasks in the order they appear
- Check off exactly one item (mark it `[x]`) before moving to the next unchecked item
- Each phase must be completed before starting the next phase
- Test thoroughly before marking an item complete
## Phase A: Foundation

- [ ] **Set up project structure and Go module**
  - Create directory structure following the package layout in PLAN.md
  - Initialize Go module and dependencies
  - Set up testing framework
- [ ] **Implement config package**
  - Define configuration struct with serialization/deserialization
  - Include configurable parameters for durability, compaction, memory usage
  - Create manifest loading/saving functionality
  - Add versioning support for config changes
- [ ] **Build Write-Ahead Log (WAL)**
  - Implement append-only file with atomic operations
  - Add Put/Delete operation encoding
  - Create replay functionality with error recovery
  - Implement both synchronous (default) and batched fsync modes
  - Add checksumming for entries
- [ ] **Write WAL tests**
  - Test durability with simulated crashes
  - Verify replay correctness
  - Benchmark write performance with different sync options
  - Test error handling and recovery
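As a rough illustration of the WAL encoding and checksumming tasks above, a minimal record framing might look like the sketch below. The layout, field widths, and CRC-32 choice are assumptions for illustration, not the project's final format:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const (
	opPut    byte = 1
	opDelete byte = 2
)

// encodeRecord frames a single WAL entry as
// [crc32 (4)] [op (1)] [keyLen (4)] [valLen (4)] [key] [value].
func encodeRecord(op byte, key, value []byte) []byte {
	payload := make([]byte, 0, 9+len(key)+len(value))
	payload = append(payload, op)
	payload = binary.LittleEndian.AppendUint32(payload, uint32(len(key)))
	payload = binary.LittleEndian.AppendUint32(payload, uint32(len(value)))
	payload = append(payload, key...)
	payload = append(payload, value...)

	rec := make([]byte, 4, 4+len(payload))
	binary.LittleEndian.PutUint32(rec, crc32.ChecksumIEEE(payload))
	return append(rec, payload...)
}

// decodeRecord verifies the checksum and unpacks one entry; replay
// would stop (or fall back) on the first corrupt record.
func decodeRecord(rec []byte) (op byte, key, value []byte, err error) {
	if len(rec) < 13 {
		return 0, nil, nil, fmt.Errorf("wal: record too short")
	}
	want := binary.LittleEndian.Uint32(rec[:4])
	payload := rec[4:]
	if crc32.ChecksumIEEE(payload) != want {
		return 0, nil, nil, fmt.Errorf("wal: checksum mismatch")
	}
	op = payload[0]
	kLen := binary.LittleEndian.Uint32(payload[1:5])
	vLen := binary.LittleEndian.Uint32(payload[5:9])
	if uint32(len(payload)) < 9+kLen+vLen {
		return 0, nil, nil, fmt.Errorf("wal: truncated record")
	}
	key = payload[9 : 9+kLen]
	value = payload[9+kLen : 9+kLen+vLen]
	return op, key, value, nil
}

func main() {
	rec := encodeRecord(opPut, []byte("k1"), []byte("v1"))
	op, k, v, err := decodeRecord(rec)
	fmt.Println(op, string(k), string(v), err)
}
```

A length-prefixed checksum like this is what makes crash recovery tractable: a torn tail write fails validation and replay simply stops there.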
## Phase B: In-Memory Layer

- [ ] **Implement MemTable**
  - Create skip list data structure aligned to 64-byte cache lines
  - Add key/value insertion and lookup operations
  - Implement sorted key iteration
  - Add size tracking for flush threshold detection
- [ ] **Connect WAL replay to MemTable**
  - Create recovery logic to rebuild MemTable from WAL
  - Implement consistent snapshot reads during recovery
  - Handle errors during replay with appropriate fallbacks
- [ ] **Test concurrent read/write scenarios**
  - Verify reader isolation during writes
  - Test snapshot consistency guarantees
  - Benchmark read/write performance under load
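A minimal sketch of the size-tracking and tombstone behavior the MemTable needs, using a plain map in place of the cache-aligned skip list (names and the threshold are illustrative assumptions):

```go
package main

import (
	"fmt"
	"sort"
)

// memTable is a simplified stand-in for the skip-list MemTable: a map
// plus on-demand sorting, with approximate size tracking so the engine
// knows when to trigger a flush. Deletes are recorded as tombstones
// (nil values) rather than removed.
type memTable struct {
	data      map[string][]byte
	approxLen int // bytes of keys+values written so far
	threshold int
}

func newMemTable(threshold int) *memTable {
	return &memTable{data: map[string][]byte{}, threshold: threshold}
}

func (m *memTable) Put(key, value []byte) {
	m.approxLen += len(key) + len(value)
	m.data[string(key)] = append([]byte(nil), value...)
}

// Delete writes a tombstone; the marker must shadow older versions in
// SSTables until compaction garbage-collects it.
func (m *memTable) Delete(key []byte) {
	m.approxLen += len(key)
	m.data[string(key)] = nil
}

// Get reports (value, found). A found-but-nil result is a tombstone.
func (m *memTable) Get(key []byte) ([]byte, bool) {
	v, ok := m.data[string(key)]
	return v, ok
}

func (m *memTable) ShouldFlush() bool { return m.approxLen >= m.threshold }

// SortedKeys yields keys in order, as a flush to SSTable would need.
func (m *memTable) SortedKeys() []string {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	mt := newMemTable(16)
	mt.Put([]byte("b"), []byte("2"))
	mt.Put([]byte("a"), []byte("1"))
	mt.Delete([]byte("b"))
	fmt.Println(mt.SortedKeys(), mt.ShouldFlush())
}
```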
## Phase C: Persistent Storage

- [ ] **Design SSTable format**
  - Define 16KB block structure with restart points
  - Create checksumming for blocks (xxHash64)
  - Define index structure with entries every ~64KB
  - Design file footer with metadata (version, timestamp, key count, etc.)
- [ ] **Implement SSTable writer**
  - Add functionality to convert MemTable to blocks
  - Create sparse index generator
  - Implement footer writing with checksums
  - Add atomic file creation for crash safety
- [ ] **Build SSTable reader**
  - Implement block loading with validation
  - Create binary search through index
  - Develop iterator interface for scanning
  - Add error handling for corrupted files
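The footer task above could be prototyped like this. The field set and the CRC-32 trailer are assumptions standing in for the real on-disk format (the plan specifies xxHash64 for block checksums; stdlib CRC-32 is used here only to keep the sketch self-contained):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// footer is a hypothetical fixed-size SSTable footer; the real field
// set and offsets are defined by the format design in PLAN.md.
type footer struct {
	Version     uint32
	TimestampNs uint64
	KeyCount    uint64
	IndexOffset uint64 // where the sparse index begins in the file
}

const footerSize = 4 + 8 + 8 + 8 + 4 // fields + trailing checksum

func (f footer) encode() []byte {
	b := make([]byte, 0, footerSize)
	b = binary.LittleEndian.AppendUint32(b, f.Version)
	b = binary.LittleEndian.AppendUint64(b, f.TimestampNs)
	b = binary.LittleEndian.AppendUint64(b, f.KeyCount)
	b = binary.LittleEndian.AppendUint64(b, f.IndexOffset)
	return binary.LittleEndian.AppendUint32(b, crc32.ChecksumIEEE(b))
}

// decodeFooter validates size and checksum before trusting any field,
// which is how the reader detects truncated or corrupted files.
func decodeFooter(b []byte) (footer, error) {
	var f footer
	if len(b) != footerSize {
		return f, fmt.Errorf("sstable: bad footer size %d", len(b))
	}
	body, sum := b[:footerSize-4], binary.LittleEndian.Uint32(b[footerSize-4:])
	if crc32.ChecksumIEEE(body) != sum {
		return f, fmt.Errorf("sstable: footer checksum mismatch")
	}
	f.Version = binary.LittleEndian.Uint32(body[0:4])
	f.TimestampNs = binary.LittleEndian.Uint64(body[4:12])
	f.KeyCount = binary.LittleEndian.Uint64(body[12:20])
	f.IndexOffset = binary.LittleEndian.Uint64(body[20:28])
	return f, nil
}

func main() {
	f := footer{Version: 1, TimestampNs: 42, KeyCount: 100, IndexOffset: 4096}
	got, err := decodeFooter(f.encode())
	fmt.Println(got == f, err)
}
```

A fixed-size footer is convenient because the reader can seek to `fileSize - footerSize` without any other metadata.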
## Phase D: Basic Engine Integration

- [ ] **Implement Level 0 flush mechanism**
  - Create MemTable to SSTable conversion process
  - Implement file management and naming scheme
  - Add background flush triggering based on size
- [ ] **Create read path that merges data sources**
  - Implement read from current MemTable
  - Add reads from immutable MemTables awaiting flush
  - Create mechanism to read from Level 0 SSTable files
  - Build priority-based lookup across all sources
  - Implement unified iterator interface for all data sources
- [ ] **Refactoring (to be done after completing Phase D)**
  - Create a common iterator interface in the iterator package
  - Rename component-specific iterators (BlockIterator, MemTableIterator, etc.)
  - Update all iterators to implement the common interface directly
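The common iterator interface called for in the refactoring task might look like the following sketch, with a trivial slice-backed implementation to show the contract. Method names are assumptions, not the project's final API:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Iterator is a sketch of the shared interface that BlockIterator,
// MemTableIterator, etc. would all implement directly.
type Iterator interface {
	SeekToFirst()
	Seek(key []byte) // position at the first key >= the target
	Next() bool
	Key() []byte
	Value() []byte
	Valid() bool
}

type pair struct{ k, v []byte }

// sliceIterator iterates a pre-sorted slice; real implementations walk
// a skip list or SSTable blocks instead.
type sliceIterator struct {
	pairs []pair
	i     int
}

var _ Iterator = (*sliceIterator)(nil) // compile-time interface check

func (s *sliceIterator) SeekToFirst() { s.i = 0 }
func (s *sliceIterator) Seek(key []byte) {
	s.i = sort.Search(len(s.pairs), func(i int) bool {
		return bytes.Compare(s.pairs[i].k, key) >= 0
	})
}
func (s *sliceIterator) Next() bool    { s.i++; return s.Valid() }
func (s *sliceIterator) Valid() bool   { return s.i < len(s.pairs) }
func (s *sliceIterator) Key() []byte   { return s.pairs[s.i].k }
func (s *sliceIterator) Value() []byte { return s.pairs[s.i].v }

func main() {
	it := &sliceIterator{pairs: []pair{
		{[]byte("a"), []byte("1")},
		{[]byte("b"), []byte("2")},
	}}
	for it.SeekToFirst(); it.Valid(); it.Next() {
		fmt.Printf("%s=%s ", it.Key(), it.Value())
	}
	fmt.Println()
}
```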
## Phase E: Compaction

- [ ] **Implement tiered compaction strategy**
  - Create file selection algorithm based on overlap/size
  - Implement merge-sorted reading from input files
  - Add atomic output file generation
  - Create size ratio and file count based triggering
- [ ] **Handle tombstones and key deletion**
  - Implement tombstone markers
  - Create logic for tombstone garbage collection
  - Test deletion correctness across compactions
- [ ] **Manage file obsolescence and cleanup**
  - Implement safe file deletion after compaction
  - Create consistent file tracking
  - Add error handling for cleanup failures
- [ ] **Build background compaction**
  - Implement worker pool for compaction tasks
  - Add rate limiting to prevent I/O saturation
  - Create metrics for monitoring compaction progress
  - Implement priority scheduling for urgent compactions
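The tombstone garbage-collection logic above usually reduces to a rule like the sketch below (a common LSM policy, assumed here rather than taken from PLAN.md): keep a tombstone unless the compaction output is the bottommost level, because lower levels may still hold older versions of the key that the tombstone must shadow.

```go
package main

import "fmt"

// entry is a merged record as seen by compaction; the Tombstone flag
// mirrors what an IsTombstone check on an iterator would report.
type entry struct {
	Key       []byte
	Value     []byte
	Tombstone bool
}

// keepDuringCompaction decides whether an entry survives into the
// compaction output. Live values are always kept (assuming duplicate
// versions were already collapsed by the merge); a tombstone is kept
// only while some lower level could still contain the deleted key.
func keepDuringCompaction(e entry, bottommostLevel bool) bool {
	if !e.Tombstone {
		return true
	}
	return !bottommostLevel
}

func main() {
	del := entry{Key: []byte("k"), Tombstone: true}
	fmt.Println(keepDuringCompaction(del, false), keepDuringCompaction(del, true))
}
```

Dropping a tombstone too early resurrects deleted keys, which is why the deletion-correctness tests above matter.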
## Phase F: Basic Atomicity and Features

- [ ] **Implement merged iterator across all levels**
  - Create priority merging iterator
  - Add efficient seeking capabilities
  - Implement proper cleanup for resources
- [ ] **Add snapshot capability**
  - Create point-in-time view mechanism
  - Implement consistent reads across all data sources
  - Add resource tracking and cleanup
  - Test isolation guarantees
- [ ] **Implement atomic batch operations**
  - Create batch data structure for multiple operations
  - Implement atomic batch commit to WAL
  - Add crash recovery for batches
  - Design extensible interfaces for future transaction support
- [ ] **Add basic statistics and metrics**
  - Implement counters for operations
  - Add timing measurements for critical paths
  - Create exportable metrics interface
  - Test accuracy of metrics
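The priority merging iterator's core behavior — newest source wins on duplicate keys — can be sketched batch-style. A real implementation would stream results incrementally (e.g. via `container/heap`) and expose the iterator interface; the names here are illustrative:

```go
package main

import "fmt"

type kv struct{ key, val string }

// mergeSources merges already-sorted runs, ordered newest first in the
// slice (memtable before L0 files, etc.), keeping only the newest
// version of each key. This is the behavior a priority merging
// iterator provides one entry at a time.
func mergeSources(sources [][]kv) []kv {
	pos := make([]int, len(sources))
	var out []kv
	for {
		best := -1 // index of the source holding the smallest current key
		for s := range sources {
			if pos[s] >= len(sources[s]) {
				continue
			}
			// Strict < means earlier (newer) sources win ties.
			if best == -1 || sources[s][pos[s]].key < sources[best][pos[best]].key {
				best = s
			}
		}
		if best == -1 {
			return out // every source exhausted
		}
		cur := sources[best][pos[best]]
		out = append(out, cur)
		// Advance past this key in every source, discarding stale versions.
		for s := range sources {
			for pos[s] < len(sources[s]) && sources[s][pos[s]].key == cur.key {
				pos[s]++
			}
		}
	}
}

func main() {
	memtable := []kv{{"a", "a2"}, {"c", "c1"}}
	sstable := []kv{{"a", "a1"}, {"b", "b1"}}
	fmt.Println(mergeSources([][]kv{memtable, sstable}))
}
```

The linear minimum scan here is O(sources) per entry; a heap makes that O(log sources), which matters once many levels participate.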
## Phase G: Optimization and Benchmarking

- [ ] **Develop benchmark suite**
  - Create random/sequential write benchmarks
  - Implement point read and range scan benchmarks
  - Add compaction overhead measurements
  - Build reproducible benchmark harness
- [ ] **Optimize critical paths**
  - Profile and identify bottlenecks
  - Optimize memory usage patterns
  - Improve cache efficiency in hot paths
  - Reduce GC pressure for large operations
- [ ] **Tune default configuration**
  - Benchmark with different parameters
  - Determine optimal defaults for general use cases
  - Document configuration recommendations
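One way to build the reproducible harness is `testing.Benchmark`, which runs a benchmark function outside `go test`. The in-memory store below is a placeholder standing in for the engine's Put path:

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// store is a stand-in so the harness compiles on its own; a real
// benchmark would call the engine's Put instead.
type store struct{ data map[string][]byte }

func (s *store) Put(key, value []byte) { s.data[string(key)] = value }

func main() {
	s := &store{data: map[string][]byte{}}
	value := make([]byte, 100)

	randomWrites := testing.Benchmark(func(b *testing.B) {
		// Fixed seed keeps the key sequence identical across runs,
		// which is what makes the harness reproducible.
		rng := rand.New(rand.NewSource(42))
		key := make([]byte, 16)
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			rng.Read(key)
			s.Put(key, value)
		}
	})
	fmt.Println(randomWrites.N > 0, randomWrites.NsPerOp() >= 0)
}
```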
## Phase H: Optional Enhancements

- [ ] **Add Bloom filters**
  - Implement configurable Bloom filter
  - Add to SSTable format
  - Create adaptive sizing based on false positive rates
  - Benchmark improvement in read performance
- [ ] **Create monitoring hooks**
  - Add detailed internal event tracking
  - Implement exportable metrics
  - Create health check mechanisms
  - Add performance alerts
- [ ] **Add crash recovery testing**
  - Build fault injection framework
  - Create randomized crash scenarios
  - Implement validation for post-recovery state
  - Test edge cases in recovery
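A minimal Bloom filter along the lines of the first task above, using FNV-based double hashing from the standard library rather than the tuned hash family a production filter would want (sizing and names are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a fixed-size Bloom filter: m bits probed k times per key.
type bloom struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of hash probes
}

func newBloom(m, k uint32) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two 32-bit values from one 64-bit FNV hash; probe i
// uses h1 + i*h2 (the standard double-hashing scheme).
func (b *bloom) hashes(key []byte) (uint32, uint32) {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()
	return uint32(sum), uint32(sum >> 32)
}

func (b *bloom) Add(key []byte) {
	h1, h2 := b.hashes(key)
	for i := uint32(0); i < b.k; i++ {
		bit := (h1 + i*h2) % b.m
		b.bits[bit/64] |= 1 << (bit % 64)
	}
}

// MayContain can return false positives but never false negatives,
// which is what lets the reader skip SSTables safely.
func (b *bloom) MayContain(key []byte) bool {
	h1, h2 := b.hashes(key)
	for i := uint32(0); i < b.k; i++ {
		bit := (h1 + i*h2) % b.m
		if b.bits[bit/64]&(1<<(bit%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 4)
	f.Add([]byte("present"))
	fmt.Println(f.MayContain([]byte("present")), f.MayContain([]byte("absent")))
}
```

Adaptive sizing would pick m and k from the expected key count and the target false-positive rate before serializing the bit array into the SSTable.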
## API Implementation

- [ ] **Implement Engine interface**
  - `Put(ctx context.Context, key, value []byte, opts ...WriteOption) error`
  - `Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)`
  - `Delete(ctx context.Context, key []byte, opts ...WriteOption) error`
  - `Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error`
  - `NewIterator(opts IteratorOptions) Iterator`
  - `Snapshot() Snapshot`
  - `Close() error`
- [ ] **Implement error types**
  - `ErrIO` - I/O errors with recovery procedures
  - `ErrCorruption` - Data integrity issues
  - `ErrConfig` - Configuration errors
  - `ErrResource` - Resource exhaustion
  - `ErrConcurrency` - Race conditions
  - `ErrNotFound` - Key not found
- [ ] **Create comprehensive documentation**
  - API usage examples
  - Configuration guidelines
  - Performance characteristics
  - Error handling recommendations