# Go Storage: A Minimalist LSM Storage Engine

## Vision

Build a clean, composable, and educational storage engine in Go that follows Log-Structured Merge Tree (LSM) principles, focusing on simplicity while providing the building blocks needed for higher-level database implementations.

## Goals

### 1. Extreme Simplicity

- Create minimal but complete primitives that can support various database paradigms (KV, relational, graph)
- Prioritize readability and educational value over hyper-optimization
- Use idiomatic Go with clear interfaces and documentation
- Implement a single-writer architecture for simplicity and reduced concurrency complexity

### 2. Durability + Performance

- Implement the LSM architecture pattern: Write-Ahead Log → MemTable → SSTables
- Provide configurable durability guarantees (sync vs. batched fsync)
- Optimize for both point lookups and range scans

### 3. Configurability

- Store all configuration parameters in a versioned, persistent manifest
- Allow tuning of memory usage, compaction behavior, and durability settings
- Support reproducible startup states across restarts

### 4. Composable Primitives

- Design clean interfaces for fundamental operations (reads, writes, snapshots, iteration)
- Enable building of higher-level abstractions (SQL, Gremlin, custom query languages)
- Support both transactional and analytical workloads
- Provide simple atomic write primitives that can be built upon:
  - Leverage read snapshots from immutable LSM structure
  - Support basic atomic batch operations
  - Ensure crash recovery through proper WAL handling

## Target Use Cases

1. **Educational Tool**: Learn and teach storage engine internals
2. **Embedded Storage**: Applications needing local, durable storage with predictable performance
3. **Prototype Foundation**: Base layer for experimenting with novel database designs
4. **Go Ecosystem Component**: Reusable storage layer for Go applications and services

## Non-Goals

1. **Feature Parity with Production Engines**: Not trying to compete with RocksDB, LevelDB, etc.
2. **Multi-Node Distribution**: Focusing on single-node operation
3. **Complex Query Planning**: Leaving higher-level query features to layers built on top

## Success Criteria

1. **Correctness**: Data is never lost or corrupted, even during crashes
2. **Understandability**: Code is clear enough to serve as an educational reference
3. **Performance**: Reasonable throughput and latency for common operations
4. **Extensibility**: Can be built upon to create specialized database engines
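
## Illustrative Sketch: Atomic Batches and WAL Recovery

The goals above name a Write-Ahead Log → MemTable write path, atomic batch operations, and crash recovery through WAL handling. The following is a minimal sketch of how those three could fit together; it is not the project's actual design. All names (`Op`, `Engine`, `Apply`, `Replay`) and the record layout (length, CRC32, payload) are assumptions made for illustration. An in-memory buffer stands in for the log file, a plain map stands in for a sorted MemTable, and JSON is used for the payload only to keep the example short.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"hash/crc32"
)

// Op is one mutation in a batch (hypothetical type, for illustration only).
type Op struct {
	Key, Value string
	Delete     bool
}

// Engine is a toy single-writer engine: a buffer stands in for the WAL file
// and a map stands in for the MemTable.
type Engine struct {
	wal bytes.Buffer
	mem map[string]string
}

func NewEngine() *Engine { return &Engine{mem: map[string]string{}} }

// Apply appends the whole batch to the log as one checksummed record, then
// updates the MemTable. A real engine would fsync (or batch fsyncs) here.
func (e *Engine) Apply(ops []Op) {
	e.wal.Write(encodeBatch(ops))
	applyOps(e.mem, ops)
}

// encodeBatch frames a batch as: payload length (4 bytes) | CRC32 (4 bytes) | payload.
func encodeBatch(ops []Op) []byte {
	payload, _ := json.Marshal(ops) // cannot fail for this plain struct
	rec := make([]byte, 8, 8+len(payload))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(rec[4:8], crc32.ChecksumIEEE(payload))
	return append(rec, payload...)
}

// Replay rebuilds the MemTable from a log image. It stops at the first
// truncated or corrupt record, so a half-written batch is never applied.
func Replay(log []byte) map[string]string {
	mem := map[string]string{}
	for len(log) >= 8 {
		n := int(binary.LittleEndian.Uint32(log[0:4]))
		sum := binary.LittleEndian.Uint32(log[4:8])
		if len(log)-8 < n || crc32.ChecksumIEEE(log[8:8+n]) != sum {
			break // torn or corrupt tail: ignore it and everything after
		}
		var ops []Op
		if json.Unmarshal(log[8:8+n], &ops) != nil {
			break
		}
		applyOps(mem, ops)
		log = log[8+n:]
	}
	return mem
}

// applyOps applies a decoded batch to the in-memory table.
func applyOps(mem map[string]string, ops []Op) {
	for _, op := range ops {
		if op.Delete {
			delete(mem, op.Key)
		} else {
			mem[op.Key] = op.Value
		}
	}
}
```

Because each batch is a single record guarded by a checksum, recovery sees either all of a batch or none of it: replaying a log whose final record was cut short by a crash yields the state as of the last complete batch. Framing per batch rather than per key is one simple way to get atomic batches from a single-writer log; other layouts are possible.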