jer/kevo

Jeremy Tregunna ee23a47a74

docs: added idea, plan, and todo docs

2025-04-19 14:06:53 -06:00

Go Storage: A Minimalist LSM Storage Engine

Vision

Build a clean, composable, and educational storage engine in Go that follows Log-Structured Merge Tree (LSM) principles, focusing on simplicity while providing the building blocks needed for higher-level database implementations.

Goals

1. Extreme Simplicity

Create minimal but complete primitives that can support various database paradigms (KV, relational, graph)
Prioritize readability and educational value over hyper-optimization
Use idiomatic Go with clear interfaces and documentation
Implement a single-writer architecture for simplicity and reduced concurrency complexity
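
One way to picture the single-writer goal is a goroutine that owns all mutable state and receives writes over a channel. The sketch below is illustrative only: SingleWriter, writeReq, and runWriters are hypothetical names, not part of the project, and a plain map stands in for the real write path.

```go
package main

import (
	"fmt"
	"sync"
)

// writeReq is a hypothetical message carrying one write to the writer goroutine.
type writeReq struct {
	key, value string
	done       chan error
}

// SingleWriter funnels every mutation through one goroutine, so the
// underlying state needs no locking.
type SingleWriter struct {
	reqs    chan writeReq
	stopped chan struct{}
	data    map[string]string // touched only by the writer goroutine
}

func NewSingleWriter() *SingleWriter {
	w := &SingleWriter{
		reqs:    make(chan writeReq),
		stopped: make(chan struct{}),
		data:    map[string]string{},
	}
	go w.loop()
	return w
}

// loop applies writes one at a time, in arrival order.
func (w *SingleWriter) loop() {
	defer close(w.stopped)
	for r := range w.reqs {
		w.data[r.key] = r.value
		r.done <- nil
	}
}

// Put may be called from any goroutine; it blocks until the write is applied.
func (w *SingleWriter) Put(key, value string) error {
	done := make(chan error, 1)
	w.reqs <- writeReq{key: key, value: value, done: done}
	return <-done
}

// Close stops the writer and returns the final state. Reading the map is
// safe here because the writer goroutine has already exited.
func (w *SingleWriter) Close() map[string]string {
	close(w.reqs)
	<-w.stopped
	return w.data
}

// runWriters issues n concurrent Puts with distinct keys and reports how
// many keys the writer ended up holding.
func runWriters(n int) int {
	w := NewSingleWriter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = w.Put(fmt.Sprintf("k%d", i), "v")
		}(i)
	}
	wg.Wait()
	return len(w.Close())
}

func main() {
	fmt.Println(runWriters(8)) // prints 8
}
```

Callers can be concurrent, but mutations are serialized, which is the property that keeps the rest of the write path simple.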

2. Durability + Performance

Implement the LSM architecture pattern: Write-Ahead Log → MemTable → SSTables
Provide configurable durability guarantees (fsync on every write vs. batched fsync)
Optimize for both point lookups and range scans
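
The write path above can be sketched in a few dozen lines. This is a toy under stated assumptions, not the project's implementation: Engine, SyncMode, Open, Put, Flush, and runDemo are hypothetical names, the WAL record layout (two little-endian length prefixes followed by key and value) is invented for illustration, and the "SSTable" is just a sorted text file.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// SyncMode is a hypothetical durability setting.
type SyncMode int

const (
	SyncEveryWrite SyncMode = iota // fsync the WAL after each Put
	SyncBatched                    // fsync once every batchSize Puts
)

// Engine wires the stages together: WAL file, in-memory table, sorted files.
type Engine struct {
	wal       *os.File
	mem       map[string][]byte
	mode      SyncMode
	batchSize int
	unsynced  int
}

func Open(dir string, mode SyncMode, batchSize int) (*Engine, error) {
	wal, err := os.OpenFile(filepath.Join(dir, "wal.log"),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &Engine{wal: wal, mem: map[string][]byte{}, mode: mode, batchSize: batchSize}, nil
}

// Put appends the record to the WAL first, then applies it to the memtable.
func (e *Engine) Put(key, value []byte) error {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(value)))
	rec := append(append(hdr[:], key...), value...)
	if _, err := e.wal.Write(rec); err != nil {
		return err
	}
	e.unsynced++
	if e.mode == SyncEveryWrite || e.unsynced >= e.batchSize {
		if err := e.wal.Sync(); err != nil {
			return err
		}
		e.unsynced = 0
	}
	e.mem[string(key)] = append([]byte(nil), value...)
	return nil
}

// Flush writes the memtable in key order to a new file, clears the
// memtable, and returns the keys written.
func (e *Engine) Flush(path string) ([]string, error) {
	keys := make([]string, 0, len(e.mem))
	for k := range e.mem {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	for _, k := range keys {
		if _, err := fmt.Fprintf(f, "%s=%s\n", k, e.mem[k]); err != nil {
			return nil, err
		}
	}
	if err := f.Sync(); err != nil {
		return nil, err
	}
	e.mem = map[string][]byte{}
	return keys, nil
}

// runDemo writes three keys out of order into a temp directory and flushes them.
func runDemo() ([]string, error) {
	dir, err := os.MkdirTemp("", "lsm-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	e, err := Open(dir, SyncBatched, 2)
	if err != nil {
		return nil, err
	}
	defer e.wal.Close()
	for _, k := range []string{"b", "a", "c"} {
		if err := e.Put([]byte(k), []byte("v-"+k)); err != nil {
			return nil, err
		}
	}
	return e.Flush(filepath.Join(dir, "000001.sst"))
}

func main() {
	keys, err := runDemo()
	fmt.Println(keys, err) // prints [a b c] <nil>
}
```

With SyncBatched, a crash can lose up to batchSize-1 of the most recent writes, which is the trade-off the durability setting exposes.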

3. Configurability

Store all configuration parameters in a versioned, persistent manifest
Allow tuning of memory usage, compaction behavior, and durability settings
Support reproducible startup states across restarts
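
A versioned manifest could be as small as a struct with an explicit version field that is checked on load. The following is a minimal sketch, assuming JSON encoding; the field names, the default values, and the functions EncodeManifest and DecodeManifest are all hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CurrentManifestVersion is the only format version this sketch understands.
const CurrentManifestVersion = 1

// Manifest holds tunables that must survive restarts. Fields are illustrative.
type Manifest struct {
	Version           int    `json:"version"`
	MemTableBytes     int64  `json:"memtable_bytes"`
	CompactionTrigger int    `json:"compaction_trigger"`
	SyncMode          string `json:"sync_mode"`
}

// DefaultManifest returns placeholder defaults, not recommended values.
func DefaultManifest() Manifest {
	return Manifest{
		Version:           CurrentManifestVersion,
		MemTableBytes:     32 << 20,
		CompactionTrigger: 4,
		SyncMode:          "batched",
	}
}

// EncodeManifest serializes the manifest for persistence.
func EncodeManifest(m Manifest) ([]byte, error) {
	return json.MarshalIndent(m, "", "  ")
}

// DecodeManifest parses a manifest and rejects versions it does not know.
func DecodeManifest(data []byte) (Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return Manifest{}, fmt.Errorf("parse manifest: %w", err)
	}
	if m.Version != CurrentManifestVersion {
		return Manifest{}, fmt.Errorf("unsupported manifest version %d", m.Version)
	}
	return m, nil
}

func main() {
	data, _ := EncodeManifest(DefaultManifest())
	restored, err := DecodeManifest(data)
	fmt.Println(restored == DefaultManifest(), err) // prints true <nil>
}
```

To persist the encoded bytes safely, a common approach is to write them to a temporary file, fsync it, and rename it over the old manifest so a crash never leaves a half-written file.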

4. Composable Primitives

Design clean interfaces for fundamental operations (reads, writes, snapshots, iteration)
Enable building of higher-level abstractions (SQL, Gremlin, custom query languages)
Support both transactional and analytical workloads
Provide simple atomic write primitives that can be built upon:
- Leverage read snapshots from immutable LSM structure
- Support basic atomic batch operations
- Ensure crash recovery through proper WAL handling
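
The primitives listed above might be expressed as a pair of small interfaces. The sketch below is one possible shape, not the project's API: Store, Snapshot, Batch, Op, and NewMemStore are hypothetical names, and the in-memory implementation copies the whole map on each batch as a stand-in for the immutability that real SSTables provide.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Op is one mutation inside a batch.
type Op struct {
	Key    string
	Value  []byte
	Delete bool
}

// Batch collects mutations that must become visible together.
type Batch struct{ ops []Op }

func (b *Batch) Put(key string, value []byte) {
	b.ops = append(b.ops, Op{Key: key, Value: value})
}

func (b *Batch) Delete(key string) {
	b.ops = append(b.ops, Op{Key: key, Delete: true})
}

// Snapshot is a read-only, point-in-time view.
type Snapshot interface {
	Get(key string) ([]byte, bool)
	Keys() []string // sorted; stands in for an iterator
}

// Store is the write side plus snapshot creation.
type Store interface {
	Apply(b *Batch) error
	Snapshot() Snapshot
}

type memSnapshot map[string][]byte

func (s memSnapshot) Get(key string) ([]byte, bool) {
	v, ok := s[key]
	return v, ok
}

func (s memSnapshot) Keys() []string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

type memStore struct {
	mu  sync.Mutex
	cur memSnapshot // never mutated after it is published
}

func NewMemStore() Store { return &memStore{cur: memSnapshot{}} }

// Apply builds a new version containing every op in the batch, then
// publishes it in one step, so readers see all of the batch or none of it.
func (m *memStore) Apply(b *Batch) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	next := make(memSnapshot, len(m.cur)+len(b.ops))
	for k, v := range m.cur {
		next[k] = v
	}
	for _, op := range b.ops {
		if op.Delete {
			delete(next, op.Key)
		} else {
			next[op.Key] = op.Value
		}
	}
	m.cur = next
	return nil
}

// Snapshot returns the current version; later batches do not affect it.
func (m *memStore) Snapshot() Snapshot {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cur
}

func main() {
	s := NewMemStore()
	b1 := &Batch{}
	b1.Put("a", []byte("1"))
	b1.Put("b", []byte("2"))
	_ = s.Apply(b1)
	old := s.Snapshot()

	b2 := &Batch{}
	b2.Delete("a")
	b2.Put("c", []byte("3"))
	_ = s.Apply(b2)

	fmt.Println(old.Keys(), s.Snapshot().Keys()) // prints [a b] [b c]
}
```

Keeping the write interface to a single Apply call means a higher layer can build transactions on top without the engine knowing about them.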

Target Use Cases

Educational Tool: Learn and teach storage engine internals
Embedded Storage: Applications needing local, durable storage with predictable performance
Prototype Foundation: Base layer for experimenting with novel database designs
Go Ecosystem Component: Reusable storage layer for Go applications and services

Non-Goals

Feature Parity with Production Engines: Not trying to compete with RocksDB, LevelDB, etc.
Multi-Node Distribution: Focusing on single-node operation
Complex Query Planning: Leaving higher-level query features to layers built on top

Success Criteria

Correctness: Durably written data is never lost or corrupted, even during crashes
Understandability: Code is clear enough to serve as an educational reference
Performance: Reasonable throughput and latency for common operations
Extensibility: Can be built upon to create specialized database engines