Go Storage: A Minimalist LSM Storage Engine

Vision

Build a clean, composable, and educational storage engine in Go that follows Log-Structured Merge Tree (LSM) principles, focusing on simplicity while providing the building blocks needed for higher-level database implementations.

Goals

1. Extreme Simplicity

  • Create minimal but complete primitives that can support various database paradigms (KV, relational, graph)
  • Prioritize readability and educational value over hyper-optimization
  • Use idiomatic Go with clear interfaces and documentation
  • Implement a single-writer architecture for simplicity and reduced concurrency complexity

2. Durability + Performance

  • Implement the LSM architecture pattern: Write-Ahead Log → MemTable → SSTables
  • Provide configurable durability guarantees (fsync on every write vs. batched fsync)
  • Optimize for both point lookups and range scans
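
The write path above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual design: the names `SyncMode`, `WAL`, `Engine`, `walFile`, and `memFile` are hypothetical, the record framing (two little-endian lengths followed by key and value) is an assumption, and plain Go maps stand in for a sorted MemTable and on-disk SSTables.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// SyncMode is a hypothetical durability setting.
type SyncMode int

const (
	SyncEveryWrite SyncMode = iota // fsync after every record
	SyncBatched                    // fsync once per `batch` records
)

// walFile is what the log needs from its backing file; *os.File satisfies it.
type walFile interface {
	Write(p []byte) (int, error)
	Sync() error
}

// memFile is an in-memory stand-in for *os.File that counts Sync calls.
type memFile struct {
	bytes.Buffer
	Syncs int
}

func (f *memFile) Sync() error { f.Syncs++; return nil }

// WAL appends length-prefixed records and syncs according to its mode.
type WAL struct {
	f       walFile
	mode    SyncMode
	batch   int // records per sync in SyncBatched mode
	pending int // records written since the last sync
}

func (w *WAL) Append(key, value []byte) error {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(value)))
	for _, p := range [][]byte{hdr[:], key, value} {
		if _, err := w.f.Write(p); err != nil {
			return err
		}
	}
	w.pending++
	if w.mode == SyncEveryWrite || w.pending >= w.batch {
		w.pending = 0
		return w.f.Sync()
	}
	return nil
}

// Engine wires the stages together: WAL, then MemTable, then flushed tables.
type Engine struct {
	wal      *WAL
	mem      map[string][]byte   // MemTable stand-in (a real one would be sorted)
	tables   []map[string][]byte // immutable flushed tables, newest last
	memLimit int                 // flush once the MemTable holds this many keys
}

func NewEngine(f walFile, mode SyncMode, batch, memLimit int) *Engine {
	return &Engine{
		wal:      &WAL{f: f, mode: mode, batch: batch},
		mem:      map[string][]byte{},
		memLimit: memLimit,
	}
}

// Put logs the write first, then applies it to the MemTable.
func (e *Engine) Put(key, value []byte) error {
	if err := e.wal.Append(key, value); err != nil {
		return err
	}
	e.mem[string(key)] = append([]byte(nil), value...)
	if len(e.mem) >= e.memLimit {
		e.tables = append(e.tables, e.mem) // "flush": freeze the MemTable
		e.mem = map[string][]byte{}
	}
	return nil
}

// Get checks the MemTable first, then flushed tables from newest to oldest.
func (e *Engine) Get(key []byte) ([]byte, bool) {
	if v, ok := e.mem[string(key)]; ok {
		return v, true
	}
	for i := len(e.tables) - 1; i >= 0; i-- {
		if v, ok := e.tables[i][string(key)]; ok {
			return v, true
		}
	}
	return nil, false
}

func main() {
	f := &memFile{}
	e := NewEngine(f, SyncBatched, 2, 2)
	for _, kv := range [][2]string{{"a", "1"}, {"b", "2"}, {"a", "3"}} {
		if err := e.Put([]byte(kv[0]), []byte(kv[1])); err != nil {
			panic(err)
		}
	}
	v, _ := e.Get([]byte("a"))
	fmt.Printf("a=%s tables=%d syncs=%d\n", v, len(e.tables), f.Syncs)
}
```

In batched mode a crash can lose the records written since the last sync, which is the trade-off the durability setting exposes. Passing an `*os.File` instead of `memFile` would make the log persistent without changing the rest of the sketch.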

3. Configurability

  • Store all configuration parameters in a versioned, persistent manifest
  • Allow tuning of memory usage, compaction behavior, and durability settings
  • Support reproducible startup states across restarts
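
One way to picture a versioned manifest is sketched below. Everything here is an assumption for illustration: the `Manifest` type, its field names, the default values, and the choice of JSON as the on-disk encoding are hypothetical, not something this document specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestVersion is the only format version this sketch understands.
const manifestVersion = 1

// Manifest holds hypothetical tuning parameters that must survive restarts.
type Manifest struct {
	Version           int    `json:"version"`
	MemTableBytes     int64  `json:"memtable_bytes"`
	SyncMode          string `json:"sync_mode"` // e.g. "every_write" or "batched"
	CompactionTrigger int    `json:"compaction_trigger"`
}

// DefaultManifest returns illustrative defaults, not recommended values.
func DefaultManifest() Manifest {
	return Manifest{
		Version:           manifestVersion,
		MemTableBytes:     32 << 20,
		SyncMode:          "batched",
		CompactionTrigger: 4,
	}
}

// Encode serializes the manifest. A real engine would write the bytes to a
// temporary file and rename it into place so a crash never leaves a partial file.
func (m Manifest) Encode() ([]byte, error) {
	return json.MarshalIndent(m, "", "  ")
}

// DecodeManifest parses a manifest and rejects versions it does not understand.
func DecodeManifest(data []byte) (Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return Manifest{}, fmt.Errorf("decode manifest: %w", err)
	}
	if m.Version != manifestVersion {
		return Manifest{}, fmt.Errorf("unsupported manifest version %d (want %d)", m.Version, manifestVersion)
	}
	return m, nil
}

func main() {
	data, err := DefaultManifest().Encode()
	if err != nil {
		panic(err)
	}
	m, err := DecodeManifest(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("loaded manifest v%d, sync mode %q\n", m.Version, m.SyncMode)
}
```

Checking the version on load is what makes startup reproducible: the engine either restores exactly the settings it persisted or refuses to start, rather than silently falling back to defaults.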

4. Composable Primitives

  • Design clean interfaces for fundamental operations (reads, writes, snapshots, iteration)
  • Enable building of higher-level abstractions (SQL, Gremlin, custom query languages)
  • Support both transactional and analytical workloads
  • Provide simple atomic write primitives that can be built upon:
    • Leverage read snapshots made possible by the immutable LSM structures
    • Support basic atomic batch operations
    • Ensure crash recovery through proper WAL handling
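
The primitives listed above might take a shape like the following. This is a sketch under stated assumptions: the `Store`, `Snapshot`, and `Op` names are hypothetical, and the in-memory `memStore` uses copy-on-write maps as a stand-in for the immutability that an LSM tree gets from its SSTables.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Op is one write in a batch; a nil Value marks a delete.
type Op struct {
	Key   string
	Value []byte
}

// Snapshot is a read-only, point-in-time view of the data.
type Snapshot interface {
	Get(key string) ([]byte, bool)
	Scan(start, end string) []string // keys in [start, end), sorted
}

// Store is a hypothetical top-level interface for higher layers to build on.
type Store interface {
	Apply(batch []Op) error // every op becomes visible together, or none do
	Snapshot() Snapshot
}

// memSnapshot is never mutated once it has been published.
type memSnapshot map[string][]byte

func (s memSnapshot) Get(key string) ([]byte, bool) {
	v, ok := s[key]
	return v, ok
}

func (s memSnapshot) Scan(start, end string) []string {
	var keys []string
	for k := range s {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

type memStore struct {
	mu  sync.Mutex  // serializes writers, mirroring the single-writer design
	cur memSnapshot // current published version
}

var _ Store = (*memStore)(nil)

func newMemStore() *memStore { return &memStore{cur: memSnapshot{}} }

// Apply builds the next version privately and publishes it in one step, so
// readers never observe a half-applied batch.
func (m *memStore) Apply(batch []Op) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	next := make(memSnapshot, len(m.cur)+len(batch))
	for k, v := range m.cur {
		next[k] = v
	}
	for _, op := range batch {
		if op.Key == "" {
			return fmt.Errorf("empty key in batch") // nothing published yet
		}
		if op.Value == nil {
			delete(next, op.Key)
		} else {
			next[op.Key] = op.Value
		}
	}
	m.cur = next
	return nil
}

// Snapshot returns the current version; later writes do not affect it.
func (m *memStore) Snapshot() Snapshot {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cur
}

func main() {
	s := newMemStore()
	_ = s.Apply([]Op{{Key: "a", Value: []byte("1")}, {Key: "b", Value: []byte("2")}})
	snap := s.Snapshot()
	_ = s.Apply([]Op{{Key: "a"}, {Key: "c", Value: []byte("3")}}) // delete a, add c
	fmt.Println(snap.Scan("a", "z"), s.Snapshot().Scan("a", "z"))
}
```

Copying the whole map on every batch is far too slow for real use; it is chosen here only because it makes the atomicity and snapshot-isolation properties easy to see. A higher-level layer (SQL, graph traversal) would depend only on the two interfaces, not on the implementation behind them.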

Target Use Cases

  1. Educational Tool: Learn and teach storage engine internals
  2. Embedded Storage: Applications needing local, durable storage with predictable performance
  3. Prototype Foundation: Base layer for experimenting with novel database designs
  4. Go Ecosystem Component: Reusable storage layer for Go applications and services

Non-Goals

  1. Feature Parity with Production Engines: Not trying to compete with RocksDB, LevelDB, etc.
  2. Multi-Node Distribution: Focusing on single-node operation
  3. Complex Query Planning: Leaving higher-level query features to layers built on top

Success Criteria

  1. Correctness: Data acknowledged as durable under the configured sync mode is never lost or corrupted, even across crashes
  2. Understandability: Code is clear enough to serve as an educational reference
  3. Performance: Reasonable throughput and latency for common operations
  4. Extensibility: Can be built upon to create specialized database engines