Go Storage: A Minimalist LSM Storage Engine
Vision
Build a clean, composable, and educational storage engine in Go that follows Log-Structured Merge Tree (LSM) principles, focusing on simplicity while providing the building blocks needed for higher-level database implementations.
Goals
1. Extreme Simplicity
- Create minimal but complete primitives that can support various database paradigms (KV, relational, graph)
- Prioritize readability and educational value over hyper-optimization
- Use idiomatic Go with clear interfaces and documentation
- Implement a single-writer architecture for simplicity and reduced concurrency complexity
2. Durability + Performance
- Implement the LSM architecture pattern: Write-Ahead Log → MemTable → SSTables
- Provide configurable durability guarantees (fsync on every write vs. batched fsync)
- Optimize for both point lookups and range scans
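The write path named above (Write-Ahead Log → MemTable → SSTables) could be sketched roughly as follows. This is a minimal illustration, not the project's actual API: `Engine`, `SyncMode`, `logFile`, `memLog`, and the length-prefixed record layout are all hypothetical names and choices, the "SSTables" are kept as sorted in-memory runs instead of files, and deletes and compaction are left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"sort"
)

// SyncMode is a hypothetical durability setting.
type SyncMode int

const (
	SyncEveryWrite SyncMode = iota // sync the log after every write
	SyncBatched                    // sync once every batchSize writes
)

// logFile is the small part of a file the WAL needs; *os.File satisfies it.
type logFile interface {
	Write(p []byte) (int, error)
	Sync() error
}

// memLog is an in-memory stand-in for the WAL file that counts Sync calls.
type memLog struct {
	buf   bytes.Buffer
	syncs int
}

func (m *memLog) Write(p []byte) (int, error) { return m.buf.Write(p) }
func (m *memLog) Sync() error                 { m.syncs++; return nil }

type entry struct {
	key   string
	value []byte
}

// Engine is single-writer, so it carries no locks.
type Engine struct {
	wal       logFile
	mode      SyncMode
	batchSize int
	pending   int               // writes appended since the last sync
	mem       map[string][]byte // MemTable
	memBytes  int               // approximate size; overwrites are counted again
	flushAt   int               // flush threshold in bytes
	tables    [][]entry         // sorted immutable runs, newest last
}

func NewEngine(wal logFile, mode SyncMode, batchSize, flushAt int) *Engine {
	return &Engine{wal: wal, mode: mode, batchSize: batchSize,
		flushAt: flushAt, mem: make(map[string][]byte)}
}

// Put appends to the log first, then updates the MemTable, then flushes if full.
func (e *Engine) Put(key string, value []byte) error {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(value)))
	rec := append(append(hdr[:], key...), value...)
	if _, err := e.wal.Write(rec); err != nil {
		return err
	}
	e.pending++
	if e.mode == SyncEveryWrite || e.pending >= e.batchSize {
		if err := e.wal.Sync(); err != nil {
			return err
		}
		e.pending = 0
	}
	e.mem[key] = value
	e.memBytes += len(key) + len(value)
	if e.memBytes >= e.flushAt {
		e.flush()
	}
	return nil
}

// flush freezes the MemTable into a sorted run and starts a fresh MemTable.
func (e *Engine) flush() {
	run := make([]entry, 0, len(e.mem))
	for k, v := range e.mem {
		run = append(run, entry{k, v})
	}
	sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
	e.tables = append(e.tables, run)
	e.mem = make(map[string][]byte)
	e.memBytes = 0
}

// Get checks the MemTable first, then the runs from newest to oldest.
func (e *Engine) Get(key string) ([]byte, bool) {
	if v, ok := e.mem[key]; ok {
		return v, true
	}
	for i := len(e.tables) - 1; i >= 0; i-- {
		run := e.tables[i]
		n := sort.Search(len(run), func(n int) bool { return run[n].key >= key })
		if n < len(run) && run[n].key == key {
			return run[n].value, true
		}
	}
	return nil, false
}
```

In this sketch the batched mode trades a window of possibly unsynced writes for fewer `Sync` calls, which is the durability and performance trade-off the goal describes.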
3. Configurability
- Store all configuration parameters in a versioned, persistent manifest
- Allow tuning of memory usage, compaction behavior, and durability settings
- Support reproducible startup states across restarts
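One way a versioned, persistent manifest might look is sketched below. The field names, the JSON encoding, the `MANIFEST` file name, and the version constant are assumptions made for illustration; the source does not specify a format. The sketch writes to a temporary file, syncs it, and renames it into place so a crash leaves either the old or the new manifest on disk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// manifestVersion is a hypothetical format version checked on load.
const manifestVersion = 1

// Manifest holds tunables that must survive restarts (field names are illustrative).
type Manifest struct {
	FormatVersion     int    `json:"format_version"`
	MemTableBytes     int64  `json:"memtable_bytes"`
	SyncMode          string `json:"sync_mode"`
	CompactionTrigger int    `json:"compaction_trigger"`
}

// SaveManifest writes the manifest to a temp file, syncs it, and renames it
// over the old one. A fuller version would also sync the parent directory.
func SaveManifest(dir string, m Manifest) error {
	m.FormatVersion = manifestVersion
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, "MANIFEST.tmp")
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "MANIFEST"))
}

// LoadManifest reads the manifest and rejects versions it does not understand.
func LoadManifest(dir string) (Manifest, error) {
	var m Manifest
	data, err := os.ReadFile(filepath.Join(dir, "MANIFEST"))
	if err != nil {
		return m, err
	}
	if err := json.Unmarshal(data, &m); err != nil {
		return m, err
	}
	if m.FormatVersion != manifestVersion {
		return m, fmt.Errorf("unsupported manifest version %d", m.FormatVersion)
	}
	return m, nil
}

// roundTrip saves a manifest into a scratch directory and loads it back.
func roundTrip(m Manifest) (Manifest, error) {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		return Manifest{}, err
	}
	defer os.RemoveAll(dir)
	if err := SaveManifest(dir, m); err != nil {
		return Manifest{}, err
	}
	return LoadManifest(dir)
}
```

Loading the same manifest on every start is what would give the reproducible startup state mentioned above; the version check leaves room for changing the format later.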
4. Composable Primitives
- Design clean interfaces for fundamental operations (reads, writes, snapshots, iteration)
- Enable building of higher-level abstractions (SQL, Gremlin, custom query languages)
- Support both transactional and analytical workloads
- Provide simple atomic write primitives that can be built upon:
  - Leverage read snapshots from immutable LSM structure
  - Support basic atomic batch operations
  - Ensure crash recovery through proper WAL handling
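The snapshot and atomic batch primitives listed above might compose along these lines. The types `Batch`, `Snapshot`, and `Store` and their methods are hypothetical, and copying a whole map on each batch is only a stand-in for the immutability that real SSTables would provide. The sketch assumes the single-writer model, so it uses no locks.

```go
package main

import "sort"

// op is one write inside a batch.
type op struct {
	key, value string
	del        bool
}

// Batch collects writes that become visible together or not at all.
type Batch struct{ ops []op }

func (b *Batch) Put(key, value string) { b.ops = append(b.ops, op{key: key, value: value}) }
func (b *Batch) Delete(key string)     { b.ops = append(b.ops, op{key: key, del: true}) }

// Snapshot is an immutable view of the store at one point in time.
type Snapshot struct{ data map[string]string }

// Get is a point lookup against the snapshot.
func (s *Snapshot) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

// Scan returns the keys in [start, end) in sorted order.
func (s *Snapshot) Scan(start, end string) []string {
	var keys []string
	for k := range s.data {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// Store holds the current snapshot; only one writer calls Apply.
type Store struct{ current *Snapshot }

func NewStore() *Store {
	return &Store{current: &Snapshot{data: map[string]string{}}}
}

// Snapshot returns the current view; later batches never modify it.
func (s *Store) Snapshot() *Snapshot { return s.current }

// Apply builds a new version from the current one and swaps it in as a
// single step, so readers see either none or all of the batch.
func (s *Store) Apply(b *Batch) {
	next := make(map[string]string, len(s.current.data)+len(b.ops))
	for k, v := range s.current.data {
		next[k] = v
	}
	for _, o := range b.ops {
		if o.del {
			delete(next, o.key)
		} else {
			next[o.key] = o.value
		}
	}
	s.current = &Snapshot{data: next}
}
```

A higher layer could, for example, hold a `Snapshot` for a long analytical scan while the writer keeps applying batches, since an existing snapshot is never changed after it is handed out.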
Target Use Cases
- Educational Tool: Learn and teach storage engine internals
- Embedded Storage: Applications needing local, durable storage with predictable performance
- Prototype Foundation: Base layer for experimenting with novel database designs
- Go Ecosystem Component: Reusable storage layer for Go applications and services
Non-Goals
- Feature Parity with Production Engines: Not trying to compete with RocksDB, LevelDB, etc.
- Multi-Node Distribution: Focusing on single-node operation
- Complex Query Planning: Leaving higher-level query features to layers built on top
Success Criteria
- Correctness: Data that has been synced to disk is never lost or corrupted, even during crashes
- Understandability: Code is clear enough to serve as an educational reference
- Performance: Reasonable throughput and latency for common operations
- Extensibility: Can be built upon to create specialized database engines