From ee23a47a74bf5f9ea14044d874800450b092388d Mon Sep 17 00:00:00 2001
From: Jeremy Tregunna
Date: Sat, 19 Apr 2025 14:06:53 -0600
Subject: [PATCH] docs: added idea, plan, and todo docs

---
 IDEA.md |  52 +++++++++++++++
 PLAN.md | 154 +++++++++++++++++++++++++++++++++++++++++++
 TODO.md | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 go.mod  |   3 +
 4 files changed, 407 insertions(+)
 create mode 100644 IDEA.md
 create mode 100644 PLAN.md
 create mode 100644 TODO.md
 create mode 100644 go.mod

diff --git a/IDEA.md b/IDEA.md
new file mode 100644
index 0000000..2c69200
--- /dev/null
+++ b/IDEA.md
@@ -0,0 +1,52 @@
+# Go Storage: A Minimalist LSM Storage Engine
+
+## Vision
+
+Build a clean, composable, and educational storage engine in Go that follows Log-Structured Merge Tree (LSM) principles, focusing on simplicity while providing the building blocks needed for higher-level database implementations.
+
+## Goals
+
+### 1. Extreme Simplicity
+- Create minimal but complete primitives that can support various database paradigms (KV, relational, graph)
+- Prioritize readability and educational value over hyper-optimization
+- Use idiomatic Go with clear interfaces and documentation
+- Implement a single-writer architecture for simplicity and reduced concurrency complexity
+
+### 2. Durability + Performance
+- Implement the LSM architecture pattern: Write-Ahead Log → MemTable → SSTables
+- Provide configurable durability guarantees (sync vs. batched fsync)
+- Optimize for both point lookups and range scans
+
+### 3. Configurability
+- Store all configuration parameters in a versioned, persistent manifest
+- Allow tuning of memory usage, compaction behavior, and durability settings
+- Support reproducible startup states across restarts
+
+### 4. Composable Primitives
+- Design clean interfaces for fundamental operations (reads, writes, snapshots, iteration)
+- Enable building of higher-level abstractions (SQL, Gremlin, custom query languages)
+- Support both transactional and analytical workloads
+- Provide simple atomic write primitives that can be built upon:
+  - Leverage read snapshots from immutable LSM structure
+  - Support basic atomic batch operations
+  - Ensure crash recovery through proper WAL handling
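+
+To make the primitives concrete, they roughly imply an interface of the following shape. This is an illustration only; the names and signatures below are placeholders (the working API surface is sketched in TODO.md), not a committed design:
+
+```go
+package storage
+
+import "context"
+
+// Operation is a single write in an atomic batch; a nil Value marks a delete.
+type Operation struct {
+    Key   []byte
+    Value []byte
+}
+
+// Iterator walks keys in sorted order.
+type Iterator interface {
+    Seek(key []byte) bool
+    Next() bool
+    Key() []byte
+    Value() []byte
+    Close() error
+}
+
+// Snapshot is a consistent, point-in-time read view of the store.
+type Snapshot interface {
+    Get(ctx context.Context, key []byte) ([]byte, error)
+    NewIterator() Iterator
+    Close() error
+}
+
+// Engine is the top-level primitive that higher layers build on.
+type Engine interface {
+    Put(ctx context.Context, key, value []byte) error
+    Get(ctx context.Context, key []byte) ([]byte, error)
+    Delete(ctx context.Context, key []byte) error
+    Batch(ctx context.Context, ops []Operation) error
+    NewIterator() Iterator
+    Snapshot() Snapshot
+    Close() error
+}
+```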
+
+## Target Use Cases
+
+1. **Educational Tool**: Learn and teach storage engine internals
+2. **Embedded Storage**: Applications needing local, durable storage with predictable performance
+3. **Prototype Foundation**: Base layer for experimenting with novel database designs
+4. **Go Ecosystem Component**: Reusable storage layer for Go applications and services
+
+## Non-Goals
+
+1. **Feature Parity with Production Engines**: Not trying to compete with RocksDB, LevelDB, etc.
+2. **Multi-Node Distribution**: Focusing on single-node operation
+3. **Complex Query Planning**: Leaving higher-level query features to layers built on top
+
+## Success Criteria
+
+1. **Correctness**: Data is never lost or corrupted, even during crashes
+2. **Understandability**: Code is clear enough to serve as an educational reference
+3. **Performance**: Reasonable throughput and latency for common operations
+4. **Extensibility**: Can be built upon to create specialized database engines
\ No newline at end of file
diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..7b34e37
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,154 @@
+# Implementation Plan for Go Storage Engine
+
+## Architecture Overview
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────────────────┐
+│ Client API  │────▶│  MemTable   │────▶│ Immutable SSTable Files │
+└─────────────┘     └─────────────┘     └─────────────────────────┘
+       │                   ▲                         ▲
+       │                   │                         │
+       ▼                   │                         │
+┌─────────────┐            │             ┌─────────────────────────┐
+│   Write-    │────────────┘             │  Background Compaction  │
+│  Ahead Log  │                          │  Process                │
+└─────────────┘                          └─────────────────────────┘
+       │                                             │
+       │                                             │
+       ▼                                             ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                       Persistent Storage                        │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Package Structure
+
+```
+go-storage/
+├── cmd/
+│   └── storage-bench/     # Benchmarking tool
+│
+├── pkg/
+│   ├── config/            # Configuration and manifest
+│   ├── wal/               # Write-ahead logging with transaction markers
+│   ├── memtable/          # In-memory table implementation
+│   ├── sstable/           # SSTable read/write
+│   │   ├── block/         # Block format implementation
+│   │   └── footer/        # File footer and metadata
+│   ├── compaction/        # Compaction strategies
+│   ├── iterator/          # Merged iterator implementation
+│   ├── transaction/       # Transaction management with Snapshot + WAL
+│   │   ├── snapshot/      # Read snapshot implementation
+│   │   └── txbuffer/      # Transaction write buffer
+│   └── engine/            # Main engine implementation with single-writer architecture
+│
+└── internal/
+    ├── checksum/          # Checksum utilities (xxHash64)
+    └── utils/             # Shared internal utilities
+```
+
+## Development Phases
+
+### Phase A: Foundation (1-2 weeks)
+1. Set up project structure and Go module
+2. Implement config package with serialization/deserialization
+3. Build basic WAL with:
+   - Append operations (Put/Delete)
+   - Replay functionality
+   - Configurable fsync modes
+4. Write comprehensive tests for WAL durability
+
+### Phase B: In-Memory Layer (1 week)
+1. Implement MemTable with:
+   - Skip list data structure
+   - Sorted key iteration
+   - Size tracking for flush threshold
+2. Connect WAL replay to MemTable restore
+3. Test concurrent read/write scenarios
+
+### Phase C: Persistent Storage (2 weeks)
+1. Design and implement SSTable format:
+   - Block-based layout with restart points
+   - Checksummed blocks
+   - Index and metadata in footer
+2. Build SSTable writer:
+   - Convert MemTable to blocks
+   - Generate sparse index
+   - Write footer with checksums
+3. Implement SSTable reader:
+   - Block loading and validation
+   - Binary search through index
+   - Iterator interface
+
+### Phase D: Basic Engine Integration (1 week)
+1. Implement Level 0 flush mechanism:
+   - MemTable to SSTable conversion
+   - File management and naming
+2. Create read path that merges:
+   - Current MemTable
+   - Immutable MemTables awaiting flush
+   - Level 0 SSTable files
+
+### Phase E: Compaction (2 weeks)
+1. Implement a single, efficient compaction strategy:
+   - Simple tiered compaction approach
+2. Handle tombstones and key deletion
+3. Manage file obsolescence and cleanup
+4. Build background compaction scheduling
+
+### Phase F: Basic Atomicity and Advanced Features (2-3 weeks)
+1. Implement merged iterator across all levels
+2. Add snapshot capability for reads:
+   - Point-in-time view of the database
+   - Consistent reads across MemTable and SSTables
+3. Implement simple atomic batch operations:
+   - Support atomic multi-key writes
+   - Ensure proper crash recovery for batch operations
+   - Design interfaces that can be extended for full transactions
+4. Add basic statistics and metrics
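+
+As a rough illustration of item 3, the sketch below shows how a batch can be made atomic by writing it to the WAL as a single record before touching the MemTable. Every name here is an assumption for illustration, not the planned API:
+
+```go
+package storage
+
+// BatchOp is one write in an atomic batch; a nil Value marks a delete (tombstone).
+type BatchOp struct {
+    Key   []byte
+    Value []byte
+}
+
+// walWriter and memTable are stand-ins for the wal and memtable packages above.
+type walWriter interface {
+    // AppendBatch persists all ops as a single durable record.
+    AppendBatch(ops []BatchOp) error
+}
+
+type memTable interface {
+    Put(key, value []byte)
+    Delete(key []byte)
+}
+
+// commitBatch writes the whole batch to the WAL as one record before applying
+// it to the MemTable, so crash recovery replays the batch entirely or not at all.
+func commitBatch(w walWriter, mt memTable, ops []BatchOp) error {
+    if err := w.AppendBatch(ops); err != nil {
+        return err
+    }
+    for _, op := range ops {
+        if op.Value == nil {
+            mt.Delete(op.Key)
+        } else {
+            mt.Put(op.Key, op.Value)
+        }
+    }
+    return nil
+}
+```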
+
+### Phase G: Optimization and Benchmarking (1 week)
+1. Develop benchmark suite for:
+   - Random vs sequential writes
+   - Point reads vs range scans
+   - Compaction overhead and pauses
+2. Optimize critical paths based on profiling
+3. Tune default configuration parameters
+
+### Phase H: Optional Enhancements (as needed)
+1. Add Bloom filters to reduce disk reads
+2. Create monitoring hooks and detailed metrics
+3. Add crash recovery testing
+
+## Testing Strategy
+
+1. **Unit Tests**: Each component thoroughly tested in isolation
+2. **Integration Tests**: End-to-end tests for complete workflows
+3. **Property Tests**: Generate randomized operations and verify correctness
+4. **Crash Tests**: Simulate crashes and verify recovery
+5. **Benchmarks**: Measure performance across different workloads
+
+## Implementation Notes
+
+### Error Handling
+- Use descriptive error types and wrap errors with context
+- Implement recovery mechanisms for all critical operations
+- Validate checksums at every read opportunity
+
+### Concurrency
+- Implement single-writer architecture for the main write path
+- Allow concurrent readers (snapshots) to proceed without blocking
+- Use appropriate synchronization for reader-writer coordination
+- Ensure proper isolation between transactions
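+
+A minimal sketch of the single-writer pattern, assuming all mutations are funneled through one goroutine while readers work against snapshots (names are illustrative, not the real engine types):
+
+```go
+package storage
+
+type writeReq struct {
+    key, value []byte
+    done       chan error
+}
+
+// singleWriter serializes all mutations through one goroutine, so the write
+// path needs no locking; readers use snapshots and never block it.
+type singleWriter struct {
+    reqs  chan writeReq
+    apply func(key, value []byte) error // e.g. WAL append followed by MemTable insert
+}
+
+func newSingleWriter(apply func(key, value []byte) error) *singleWriter {
+    w := &singleWriter{reqs: make(chan writeReq, 64), apply: apply}
+    go w.loop()
+    return w
+}
+
+// loop is the only goroutine that ever mutates engine state.
+func (w *singleWriter) loop() {
+    for r := range w.reqs {
+        r.done <- w.apply(r.key, r.value)
+    }
+}
+
+// Put hands the write to the writer goroutine and waits for the outcome.
+func (w *singleWriter) Put(key, value []byte) error {
+    done := make(chan error, 1)
+    w.reqs <- writeReq{key: key, value: value, done: done}
+    return <-done
+}
+```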
+
+### Batch Operation Management
+- Use WAL for atomic batch operation durability
+- Leverage LSM's natural versioning for snapshots
+- Provide simple interfaces that can be built upon for transactions
+- Ensure proper crash recovery for batch operations
+
+### Go Idioms
+- Follow standard Go project layout
+- Use interfaces for component boundaries
+- Rely on Go's GC but manage large memory allocations carefully
+- Use context for cancellation where appropriate
\ No newline at end of file
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..f9487a7
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,198 @@
+# Go Storage Engine Todo List
+
+This document outlines the implementation tasks for the Go Storage Engine, organized by development phases. Follow these guidelines:
+
+- Work on tasks in the order they appear
+- Check off exactly one item (✓) before moving to the next unchecked item
+- Each phase must be completed before starting the next phase
+- Test thoroughly before marking an item complete
+
+## Phase A: Foundation
+
+- [ ] Set up project structure and Go module
+  - [ ] Create directory structure following the package layout in PLAN.md
+  - [ ] Initialize Go module and dependencies
+  - [ ] Set up testing framework
+
+- [ ] Implement config package
+  - [ ] Define configuration struct with serialization/deserialization
+  - [ ] Include configurable parameters for durability, compaction, memory usage
+  - [ ] Create manifest loading/saving functionality
+  - [ ] Add versioning support for config changes
+
+- [ ] Build Write-Ahead Log (WAL)
+  - [ ] Implement append-only file with atomic operations
+  - [ ] Add Put/Delete operation encoding
+  - [ ] Create replay functionality with error recovery
+  - [ ] Implement both synchronous (default) and batched fsync modes
+  - [ ] Add checksumming for entries
+
+- [ ] Write WAL tests
+  - [ ] Test durability with simulated crashes
+  - [ ] Verify replay correctness
+  - [ ] Benchmark write performance with different sync options
+  - [ ] Test error handling and recovery
+
+## Phase B: In-Memory Layer
+
+- [ ] Implement MemTable
+  - [ ] Create skip list data structure aligned to 64-byte cache lines
+  - [ ] Add key/value insertion and lookup operations
+  - [ ] Implement sorted key iteration
+  - [ ] Add size tracking for flush threshold detection
+
+- [ ] Connect WAL replay to MemTable
+  - [ ] Create recovery logic to rebuild MemTable from WAL
+  - [ ] Implement consistent snapshot reads during recovery
+  - [ ] Handle errors during replay with appropriate fallbacks
+
+- [ ] Test concurrent read/write scenarios
+  - [ ] Verify reader isolation during writes
+  - [ ] Test snapshot consistency guarantees
+  - [ ] Benchmark read/write performance under load
+
+## Phase C: Persistent Storage
+
+- [ ] Design SSTable format (see the sketch after this list)
+  - [ ] Define 16KB block structure with restart points
+  - [ ] Create checksumming for blocks (xxHash64)
+  - [ ] Define index structure with entries every ~64KB
+  - [ ] Design file footer with metadata (version, timestamp, key count, etc.)
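+
+A possible shape for that format, written out only to make the checklist concrete. The constants and footer fields beyond those named above (16KB blocks, xxHash64 checksums, ~64KB index granularity, version/timestamp/key count) are assumptions, not a frozen on-disk layout:
+
+```go
+package sstable
+
+const (
+    TargetBlockSize  = 16 * 1024 // target size of a data block before it is cut
+    IndexKeyInterval = 64 * 1024 // approx. bytes of data per sparse index entry
+    RestartInterval  = 16        // keys between restart points within a block
+    ChecksumSize     = 8         // xxHash64 appended to each block
+)
+
+// Footer sits at the end of the file and is read first when opening it.
+type Footer struct {
+    Version     uint32
+    Timestamp   int64  // creation time
+    KeyCount    uint64
+    IndexOffset uint64 // where the sparse index starts (assumed field)
+    IndexSize   uint32 // length of the sparse index (assumed field)
+    Checksum    uint64 // xxHash64 of the preceding footer fields
+}
+```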
+
+- [ ] Implement SSTable writer
+  - [ ] Add functionality to convert MemTable to blocks
+  - [ ] Create sparse index generator
+  - [ ] Implement footer writing with checksums
+  - [ ] Add atomic file creation for crash safety
+
+- [ ] Build SSTable reader
+  - [ ] Implement block loading with validation
+  - [ ] Create binary search through index
+  - [ ] Develop iterator interface for scanning
+  - [ ] Add error handling for corrupted files
+
+## Phase D: Basic Engine Integration
+
+- [ ] Implement Level 0 flush mechanism
+  - [ ] Create MemTable to SSTable conversion process
+  - [ ] Implement file management and naming scheme
+  - [ ] Add background flush triggering based on size
+
+- [ ] Create read path that merges data sources
+  - [ ] Implement read from current MemTable
+  - [ ] Add reads from immutable MemTables awaiting flush
+  - [ ] Create mechanism to read from Level 0 SSTable files
+  - [ ] Build priority-based lookup across all sources
+
+## Phase E: Compaction
+
+- [ ] Implement tiered compaction strategy
+  - [ ] Create file selection algorithm based on overlap/size
+  - [ ] Implement merge-sorted reading from input files
+  - [ ] Add atomic output file generation
+  - [ ] Create triggering based on size ratio and file count
+
+- [ ] Handle tombstones and key deletion
+  - [ ] Implement tombstone markers
+  - [ ] Create logic for tombstone garbage collection
+  - [ ] Test deletion correctness across compactions
+
+- [ ] Manage file obsolescence and cleanup
+  - [ ] Implement safe file deletion after compaction
+  - [ ] Create consistent file tracking
+  - [ ] Add error handling for cleanup failures
+
+- [ ] Build background compaction
+  - [ ] Implement worker pool for compaction tasks
+  - [ ] Add rate limiting to prevent I/O saturation
+  - [ ] Create metrics for monitoring compaction progress
+  - [ ] Implement priority scheduling for urgent compactions
+
+## Phase F: Basic Atomicity and Features
+
+- [ ] Implement merged iterator across all levels
+  - [ ] Create priority merging iterator
+  - [ ] Add efficient seeking capabilities
+  - [ ] Implement proper cleanup for resources
+
+- [ ] Add snapshot capability
+  - [ ] Create point-in-time view mechanism
+  - [ ] Implement consistent reads across all data sources
+  - [ ] Add resource tracking and cleanup
+  - [ ] Test isolation guarantees
+
+- [ ] Implement atomic batch operations
+  - [ ] Create batch data structure for multiple operations
+  - [ ] Implement atomic batch commit to WAL
+  - [ ] Add crash recovery for batches
+  - [ ] Design extensible interfaces for future transaction support
+
+- [ ] Add basic statistics and metrics
+  - [ ] Implement counters for operations
+  - [ ] Add timing measurements for critical paths
+  - [ ] Create exportable metrics interface
+  - [ ] Test accuracy of metrics
+
+## Phase G: Optimization and Benchmarking
+
+- [ ] Develop benchmark suite
+  - [ ] Create random/sequential write benchmarks
+  - [ ] Implement point read and range scan benchmarks
+  - [ ] Add compaction overhead measurements
+  - [ ] Build reproducible benchmark harness
+
+- [ ] Optimize critical paths
+  - [ ] Profile and identify bottlenecks
+  - [ ] Optimize memory usage patterns
+  - [ ] Improve cache efficiency in hot paths
+  - [ ] Reduce GC pressure for large operations
+
+- [ ] Tune default configuration
+  - [ ] Benchmark with different parameters
+  - [ ] Determine optimal defaults for general use cases
+  - [ ] Document configuration recommendations
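+
+For orientation, a write benchmark in the suite might look roughly like the sketch below (it would live in a `_test.go` file under `cmd/storage-bench`); `engine.Open`, the `Put` signature, and the import path are assumptions based on the planned but not yet implemented API:
+
+```go
+package bench
+
+import (
+    "context"
+    "math/rand"
+    "testing"
+
+    "git.canoozie.net/jer/go-storage/pkg/engine" // assumed path: module path + pkg/engine
+)
+
+// BenchmarkRandomPut measures random-key write throughput.
+func BenchmarkRandomPut(b *testing.B) {
+    eng, err := engine.Open(b.TempDir()) // assumed constructor
+    if err != nil {
+        b.Fatal(err)
+    }
+    defer eng.Close()
+
+    ctx := context.Background()
+    key := make([]byte, 16)
+    value := make([]byte, 128)
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        rand.Read(key) // random keys so writes are not accidentally sequential
+        if err := eng.Put(ctx, key, value); err != nil {
+            b.Fatal(err)
+        }
+    }
+}
+```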
+
+## Phase H: Optional Enhancements
+
+- [ ] Add Bloom filters
+  - [ ] Implement configurable Bloom filter
+  - [ ] Add to SSTable format
+  - [ ] Create adaptive sizing based on false positive rates
+  - [ ] Benchmark improvement in read performance
+
+- [ ] Create monitoring hooks
+  - [ ] Add detailed internal event tracking
+  - [ ] Implement exportable metrics
+  - [ ] Create health check mechanisms
+  - [ ] Add performance alerts
+
+- [ ] Add crash recovery testing
+  - [ ] Build fault injection framework
+  - [ ] Create randomized crash scenarios
+  - [ ] Implement validation for post-recovery state
+  - [ ] Test edge cases in recovery
+
+## API Implementation
+
+- [ ] Implement Engine interface
+  - [ ] `Put(ctx context.Context, key, value []byte, opts ...WriteOption) error`
+  - [ ] `Get(ctx context.Context, key []byte, opts ...ReadOption) ([]byte, error)`
+  - [ ] `Delete(ctx context.Context, key []byte, opts ...WriteOption) error`
+  - [ ] `Batch(ctx context.Context, ops []Operation, opts ...WriteOption) error`
+  - [ ] `NewIterator(opts IteratorOptions) Iterator`
+  - [ ] `Snapshot() Snapshot`
+  - [ ] `Close() error`
+
+- [ ] Implement error types
+  - [ ] `ErrIO` - I/O errors with recovery procedures
+  - [ ] `ErrCorruption` - Data integrity issues
+  - [ ] `ErrConfig` - Configuration errors
+  - [ ] `ErrResource` - Resource exhaustion
+  - [ ] `ErrConcurrency` - Race conditions
+  - [ ] `ErrNotFound` - Key not found
+
+- [ ] Create comprehensive documentation
+  - [ ] API usage examples
+  - [ ] Configuration guidelines
+  - [ ] Performance characteristics
+  - [ ] Error handling recommendations
\ No newline at end of file
diff --git a/go.mod b/go.mod
new file mode 100644
index 0000000..55d6233
--- /dev/null
+++ b/go.mod
@@ -0,0 +1,3 @@
+module git.canoozie.net/jer/go-storage
+
+go 1.24.2