2025-04-20 14:06:50 -06:00


Transaction Package Documentation

The transaction package implements ACID-compliant transactions for the Kevo engine. It provides a way to group multiple read and write operations into atomic units, ensuring data consistency and isolation.

Overview

Transactions in the Kevo engine follow a SQLite-inspired concurrency model built on reader-writer locks. Any number of read-only transactions can run at the same time, while a read-write transaction has exclusive access.

Key responsibilities of the transaction package include:

- Implementing atomic operations (all-or-nothing semantics)
- Managing isolation between concurrent transactions
- Providing a consistent view of data during transactions
- Supporting both read-only and read-write transactions
- Handling transaction commit and rollback

Architecture

Key Components

The transaction system consists of several interrelated components:

┌───────────────────────┐
│   Transaction (API)   │
└───────────┬───────────┘
            │
┌───────────▼───────────┐      ┌───────────────────────┐
│  EngineTransaction    │◄─────┤   TransactionCreator  │
└───────────┬───────────┘      └───────────────────────┘
            │
            ▼
┌───────────────────────┐      ┌───────────────────────┐
│     TxBuffer          │◄─────┤   Transaction         │
└───────────────────────┘      │      Iterators        │
                               └───────────────────────┘

- Transaction Interface: The public API for transaction operations
- EngineTransaction: Implementation of the Transaction interface
- TransactionCreator: Factory for creating transactions
- TxBuffer: In-memory storage for uncommitted changes
- Transaction Iterators: Special iterators that merge buffer and database state

ACID Properties Implementation

Atomicity

Transactions ensure all-or-nothing semantics through several mechanisms:

Write Buffering:
- All writes are stored in an in-memory buffer during the transaction
- No changes are applied to the database until commit
Batch Commit:
- At commit time, all changes are submitted as a single batch
- The WAL (Write-Ahead Log) ensures the batch is atomic
Rollback Support:
- Discarding the buffer effectively rolls back all changes
- No cleanup needed since changes weren't applied to the database
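
The three mechanisms above can be sketched in a few lines. This is a minimal illustration, not Kevo's implementation: the `store`, `op`, and `tx` types and the `ApplyBatch` method are hypothetical stand-ins, and a plain map replaces the WAL-backed engine.

```go
package main

import "fmt"

// store stands in for the engine. ApplyBatch is a hypothetical
// all-or-nothing entry point; in Kevo the WAL provides that guarantee.
type store struct{ data map[string]string }

// op is one buffered change: a put, or a delete when del is true.
type op struct {
	key, value string
	del        bool
}

func (s *store) ApplyBatch(ops []op) {
	for _, o := range ops {
		if o.del {
			delete(s.data, o.key)
		} else {
			s.data[o.key] = o.value
		}
	}
}

// tx buffers writes; nothing reaches the store before Commit.
type tx struct {
	s   *store
	ops []op
}

func (t *tx) Put(k, v string) { t.ops = append(t.ops, op{key: k, value: v}) }
func (t *tx) Delete(k string) { t.ops = append(t.ops, op{key: k, del: true}) }

// Commit hands every buffered change to the store as one batch.
func (t *tx) Commit() { t.s.ApplyBatch(t.ops); t.ops = nil }

// Rollback drops the buffer; the store was never touched.
func (t *tx) Rollback() { t.ops = nil }

func main() {
	s := &store{data: map[string]string{"a": "1"}}

	t1 := &tx{s: s}
	t1.Put("b", "2")
	t1.Delete("a")
	t1.Rollback()
	fmt.Println(len(s.data), s.data["a"]) // prints "1 1": store unchanged

	t2 := &tx{s: s}
	t2.Put("b", "2")
	t2.Delete("a")
	t2.Commit()
	fmt.Println(len(s.data), s.data["b"]) // prints "1 2": both changes applied
}
```

Because rollback only discards the slice of buffered operations, it cannot fail halfway and needs no undo log.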

Consistency

The engine maintains data consistency through:

Single-Writer Architecture:
- Only one write transaction can be active at a time
- Prevents inconsistent states from concurrent modifications
Write-Ahead Logging:
- All changes are logged before being applied
- System can recover to a consistent state after crashes
Key Ordering:
- Keys are maintained in sorted order throughout the system
- Ensures consistent iteration and range scan behavior

Isolation

The transaction system provides isolation by combining reader-writer locks with snapshot reads:

Reader-Writer Locks:
- Read-only transactions acquire shared (read) locks
- Read-write transactions acquire exclusive (write) locks
- Multiple readers can execute concurrently
- Writers have exclusive access
Read Snapshot Semantics:
- Readers see a consistent snapshot of the database
- Writes made by other transactions never become visible partway through a read transaction
Isolation Level:
- Effectively provides "serializable" isolation
- Transactions execute as if they were run one after another

Durability

Durability is ensured through the WAL (Write-Ahead Log):

WAL Integration:
- Transaction commits are written to the WAL first
- Only after WAL sync are changes considered committed
Sync Options:
- Transactions can use different WAL sync modes
- Configurable trade-off between performance and durability
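
To make the performance/durability trade-off concrete, here is a rough sketch of a log that calls fsync at different frequencies. The `SyncMode` type, its constant names, and the `wal` struct are assumptions made for this illustration; they are not Kevo's actual configuration names.

```go
package main

import (
	"fmt"
	"os"
)

// SyncMode is a hypothetical knob; the names are illustrative only.
type SyncMode int

const (
	SyncNever  SyncMode = iota // leave flushing to the operating system
	SyncEveryN                 // fsync after every N appended records
	SyncAlways                 // fsync on every append
)

type wal struct {
	f       *os.File
	mode    SyncMode
	every   int // N for SyncEveryN
	pending int // records appended since the last fsync
	syncs   int // number of fsync calls, kept for illustration
}

// Append writes one record and syncs according to the configured mode.
func (w *wal) Append(rec []byte) error {
	if _, err := w.f.Write(rec); err != nil {
		return err
	}
	w.pending++
	if w.mode == SyncAlways || (w.mode == SyncEveryN && w.pending >= w.every) {
		if err := w.f.Sync(); err != nil {
			return err
		}
		w.pending = 0
		w.syncs++
	}
	return nil
}

// newTempWAL opens a log in a temporary file and returns a cleanup func.
func newTempWAL(mode SyncMode, every int) (*wal, func(), error) {
	f, err := os.CreateTemp("", "wal-sketch-*.log")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() {
		f.Close()
		os.Remove(f.Name())
	}
	return &wal{f: f, mode: mode, every: every}, cleanup, nil
}

func main() {
	for _, mode := range []SyncMode{SyncNever, SyncEveryN, SyncAlways} {
		w, cleanup, err := newTempWAL(mode, 2)
		if err != nil {
			fmt.Println("temp file:", err)
			return
		}
		for i := 0; i < 4; i++ {
			if err := w.Append([]byte("record\n")); err != nil {
				fmt.Println("append:", err)
			}
		}
		fmt.Println("mode", mode, "fsync calls:", w.syncs)
		cleanup()
	}
}
```

Fewer fsync calls mean lower commit latency, at the price of a larger window of records that a crash could lose.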

Implementation Details

Transaction Lifecycle

A transaction follows this lifecycle:

Creation:
- Read-only: Acquires a read lock
- Read-write: Acquires a write lock (exclusive)
Operation Phase:
- Read operations check the buffer first, then the engine
- Write operations are stored in the buffer only
Commit:
- Read-only: Simply releases the read lock
- Read-write: Applies buffered changes via a WAL batch, then releases the write lock
Rollback:
- Discards the buffer
- Releases locks
- Marks transaction as closed
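
The lifecycle above maps naturally onto a `sync.RWMutex` plus a closed flag. The sketch below assumes hypothetical names (`engineStub`, `txn`, `errClosed`) and omits the buffer and WAL entirely; it only shows when the lock is taken and released. Like the real thing, a `txn` here is meant to be used from one goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical error for operations on a finished transaction.
var errClosed = errors.New("transaction already closed")

// engineStub owns the single lock shared by all transactions.
type engineStub struct{ mu sync.RWMutex }

type txn struct {
	mu       *sync.RWMutex
	readOnly bool
	closed   bool
}

// Begin acquires the lock up front: shared for readers, exclusive for writers.
func (e *engineStub) Begin(readOnly bool) *txn {
	if readOnly {
		e.mu.RLock()
	} else {
		e.mu.Lock()
	}
	return &txn{mu: &e.mu, readOnly: readOnly}
}

// release unlocks exactly once and marks the transaction closed.
func (t *txn) release() {
	if t.readOnly {
		t.mu.RUnlock()
	} else {
		t.mu.Unlock()
	}
	t.closed = true
}

func (t *txn) Commit() error {
	if t.closed {
		return errClosed
	}
	// A read-write transaction would apply its buffer as a WAL batch here.
	t.release()
	return nil
}

// Rollback is a no-op on a closed transaction, so deferring it is safe.
func (t *txn) Rollback() error {
	if t.closed {
		return nil
	}
	// The write buffer would be discarded here.
	t.release()
	return nil
}

func main() {
	e := &engineStub{}

	w := e.Begin(false)
	fmt.Println(w.Commit())   // <nil>
	fmt.Println(w.Rollback()) // <nil> (no-op after commit)
	fmt.Println(w.Commit())   // transaction already closed

	r1, r2 := e.Begin(true), e.Begin(true) // readers share the lock
	r1.Rollback()
	r2.Commit()
}
```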

Transaction Buffer

The transaction buffer is an in-memory staging area for changes:

Buffering Mechanism:
- Stores key-value pairs and deletion markers
- Maintains sorted order for efficient iteration
- Deduplicates repeated operations on the same key
Precedence Rules:
- Buffer operations take precedence over engine values
- Latest operation on a key within the buffer wins
Tombstone Handling:
- Deletions are stored as tombstones in the buffer
- Applied to the engine only on commit
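
One way to get sorted order, deduplication, and tombstones together is a slice kept sorted with binary search. The following is a sketch under that assumption; the `entry` and `buffer` types are hypothetical, and string keys are used for brevity where the engine works with byte slices.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one buffered operation; tombstone marks a deletion.
type entry struct {
	key       string
	value     string
	tombstone bool
}

// buffer keeps entries sorted by key, with at most one entry per key.
type buffer struct{ entries []entry }

// find returns the index where key is, or where it would be inserted.
func (b *buffer) find(key string) int {
	return sort.Search(len(b.entries), func(i int) bool {
		return b.entries[i].key >= key
	})
}

func (b *buffer) set(e entry) {
	i := b.find(e.key)
	if i < len(b.entries) && b.entries[i].key == e.key {
		b.entries[i] = e // latest operation on a key wins
		return
	}
	b.entries = append(b.entries, entry{})
	copy(b.entries[i+1:], b.entries[i:]) // shift the tail right by one
	b.entries[i] = e
}

func (b *buffer) Put(k, v string) { b.set(entry{key: k, value: v}) }
func (b *buffer) Delete(k string) { b.set(entry{key: k, tombstone: true}) }

// Get reports the buffered value, whether it is a deletion, and whether
// the buffer knows the key at all (if not, the caller asks the engine).
func (b *buffer) Get(k string) (value string, deleted, found bool) {
	i := b.find(k)
	if i < len(b.entries) && b.entries[i].key == k {
		return b.entries[i].value, b.entries[i].tombstone, true
	}
	return "", false, false
}

func main() {
	b := &buffer{}
	b.Put("c", "1")
	b.Put("a", "1")
	b.Put("b", "1")
	b.Delete("a")   // replaces the earlier put of "a"
	b.Put("c", "2") // replaces the earlier put of "c"
	for _, e := range b.entries {
		fmt.Printf("%s value=%q tombstone=%v\n", e.key, e.value, e.tombstone)
	}
}
```

Inserting into a sorted slice is O(n) per new key; a skip list or tree would scale better for very large transactions, which is a trade-off this sketch ignores.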

Transaction Iterators

Specialized iterators provide a merged view of buffer and engine data:

Merged View:
- Combines data from both the transaction buffer and the underlying engine
- Buffer entries take precedence over engine entries for the same key
Range Iterators:
- Support bounded iterations within a key range
- Enforce bounds checking on both buffer and engine data
Deletion Handling:
- Skip tombstones during iteration
- Hide engine keys that are deleted in the buffer
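
The merge rules above amount to a two-way merge over two sorted sequences. The sketch below works on slices rather than real iterators to stay short; `kv`, `bufEntry`, and `mergedRange` are hypothetical names, and the half-open range `[start, end)` is an assumption, since the text does not say whether the end bound is inclusive.

```go
package main

import "fmt"

type kv struct{ key, value string }

// bufEntry is a buffered operation; tombstone marks a deletion.
type bufEntry struct {
	kv
	tombstone bool
}

// inRange assumes a half-open interval [start, end).
func inRange(k, start, end string) bool { return k >= start && k < end }

// mergedRange merges two key-sorted inputs. Buffer entries override engine
// entries with the same key, and tombstones hide the key entirely.
func mergedRange(buf []bufEntry, eng []kv, start, end string) []kv {
	var out []kv
	i, j := 0, 0
	for i < len(buf) || j < len(eng) {
		switch {
		case j >= len(eng) || (i < len(buf) && buf[i].key < eng[j].key):
			// Key exists only in the buffer.
			if !buf[i].tombstone && inRange(buf[i].key, start, end) {
				out = append(out, buf[i].kv)
			}
			i++
		case i >= len(buf) || eng[j].key < buf[i].key:
			// Key exists only in the engine.
			if inRange(eng[j].key, start, end) {
				out = append(out, eng[j])
			}
			j++
		default:
			// Same key on both sides: the buffer wins, the engine entry is skipped.
			if !buf[i].tombstone && inRange(buf[i].key, start, end) {
				out = append(out, buf[i].kv)
			}
			i++
			j++
		}
	}
	return out
}

func main() {
	buf := []bufEntry{
		{kv: kv{"a", "A2"}},                  // overrides engine value
		{kv: kv{"c", ""}, tombstone: true},   // hides engine key "c"
		{kv: kv{"e", "E"}},                   // outside [a, e)
	}
	eng := []kv{{"a", "A1"}, {"b", "B"}, {"c", "C"}, {"d", "D"}, {"z", "Z"}}
	fmt.Println(mergedRange(buf, eng, "a", "e")) // [{a A2} {b B} {d D}]
}
```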

Concurrency Control

Reader-Writer Lock Model

The transaction system uses a simple reader-writer lock approach:

Lock Acquisition:
- Read-only transactions acquire shared (read) locks
- Read-write transactions acquire exclusive (write) locks
Concurrency Patterns:
- Multiple read-only transactions can run concurrently
- Read-write transactions run exclusively (no other transactions)
- A writer waits for active readers to finish and blocks new readers until it commits or rolls back
Lock Management:
- Locks are acquired at transaction start
- Released at commit or rollback
- Safety mechanisms prevent multiple releases
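
These concurrency patterns are the ones Go's `sync.RWMutex` provides, which can be observed directly with its non-blocking `TryRLock` and `TryLock` methods (available since Go 1.18). The `probe` helper below is written for this illustration only and says nothing about how Kevo wires its lock internally.

```go
package main

import (
	"fmt"
	"sync"
)

// probe records which lock attempts succeed in four situations.
func probe() [4]bool {
	var mu sync.RWMutex
	var got [4]bool

	mu.RLock()               // a read-only transaction is active
	got[0] = mu.TryRLock()   // second reader: allowed
	got[1] = mu.TryLock()    // writer while readers are active: refused
	mu.RUnlock()
	mu.RUnlock()

	got[2] = mu.TryLock()    // writer with no readers: allowed
	got[3] = mu.TryRLock()   // reader while the writer is active: refused
	mu.Unlock()
	return got
}

func main() {
	r := probe()
	fmt.Println("second reader admitted:      ", r[0])
	fmt.Println("writer admitted with readers:", r[1])
	fmt.Println("writer admitted when idle:   ", r[2])
	fmt.Println("reader admitted with writer: ", r[3])
}
```

A real transaction would use the blocking `RLock` and `Lock` calls; the `Try` variants are used here only so the program can report the outcome instead of waiting.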

Isolation Level

The system provides serializable isolation:

Serializable Semantics:
- Transactions behave as if executed one after another
- Rules out dirty reads, non-repeatable reads, and phantom reads
Implementation Strategy:
- Simple locking approach
- Write exclusivity ensures no write conflicts
- Read snapshots provide consistent views
Optimistic vs. Pessimistic:
- Uses a pessimistic approach with up-front locking
- Avoids need for validation or aborts due to conflicts

Common Usage Patterns

Basic Transaction Usage

// Start a read-write transaction.
// db is assumed to be an open engine instance; engine is the package that defines ErrKeyNotFound.
tx, err := db.BeginTransaction(false) // false = read-write
if err != nil {
    log.Fatal(err)
}

// Write a key (buffered until commit)
err = tx.Put([]byte("key1"), []byte("value1"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

// Read a key; a missing key is not treated as a failure
value, err := tx.Get([]byte("key2"))
if err != nil && !errors.Is(err, engine.ErrKeyNotFound) {
    tx.Rollback()
    log.Fatal(err)
}
if err == nil {
    fmt.Printf("key2: %s\n", value)
}

// Delete a key
err = tx.Delete([]byte("key3"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

// Commit the transaction
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Read-Only Transactions

// Start a read-only transaction.
// db is assumed to be an open engine instance; engine is the package that defines ErrKeyNotFound.
tx, err := db.BeginTransaction(true) // true = read-only
if err != nil {
    log.Fatal(err)
}
defer tx.Rollback() // Safe to call even after commit

// Perform read operations
value, err := tx.Get([]byte("key1"))
if err != nil && !errors.Is(err, engine.ErrKeyNotFound) {
    log.Fatal(err)
}
if err == nil {
    fmt.Printf("key1: %s\n", value)
}

// Iterate over a range of keys
iter := tx.NewRangeIterator([]byte("start"), []byte("end"))
for iter.SeekToFirst(); iter.Valid(); iter.Next() {
    fmt.Printf("%s: %s\n", iter.Key(), iter.Value())
}

// Commit (for read-only, this just releases resources)
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Batch Operations

// Start a read-write transaction (db is assumed to be an open engine instance)
tx, err := db.BeginTransaction(false)
if err != nil {
    log.Fatal(err)
}

// Perform multiple operations
for i := 0; i < 100; i++ {
    key := []byte(fmt.Sprintf("key%d", i))
    value := []byte(fmt.Sprintf("value%d", i))
    
    if err := tx.Put(key, value); err != nil {
        tx.Rollback()
        log.Fatal(err)
    }
}

// Commit as a single atomic batch
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Performance Considerations

Transaction Overhead

Transactions introduce some overhead compared to direct engine operations:

Locking Overhead:
- Every transaction acquires the lock at start and releases it at commit or rollback, which adds a fixed cost
- Write transactions block other transactions
Memory Usage:
- Transaction buffers consume memory
- Large transactions with many changes need more memory
Commit Cost:
- WAL batch writes and syncs add latency at commit time
- More changes in a transaction means higher commit cost

Optimization Strategies

Several strategies can improve transaction performance:

Transaction Sizing:
- Very large transactions increase memory pressure
- Very small transactions have higher per-operation overhead
- Find a balance based on your workload
Read-Only Preference:
- Use read-only transactions when possible
- They allow concurrency and have lower overhead
Batch Similar Operations:
- Group similar operations in a transaction
- Reduces overall transaction count
Key Locality:
- Group operations on related keys
- Improves cache locality and iterator efficiency
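
As an illustration of transaction sizing, a large load can be split into fixed-size chunks, each committed as its own transaction. The `Tx` interface below lists only the methods this sketch needs, and `putInChunks`, `pair`, `stats`, and `countingTx` are hypothetical helpers; the chunk size of 100 is arbitrary.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is the minimal slice of a transaction API that this sketch needs.
type Tx interface {
	Put(key, value []byte) error
	Commit() error
	Rollback() error
}

type pair struct{ key, value []byte }

// putInChunks writes pairs in transactions of at most chunk entries each.
// Each chunk is atomic on its own; the load as a whole is not.
func putInChunks(begin func() (Tx, error), pairs []pair, chunk int) error {
	if chunk <= 0 {
		return errors.New("chunk size must be positive")
	}
	for start := 0; start < len(pairs); start += chunk {
		end := start + chunk
		if end > len(pairs) {
			end = len(pairs)
		}
		tx, err := begin()
		if err != nil {
			return err
		}
		for _, p := range pairs[start:end] {
			if err := tx.Put(p.key, p.value); err != nil {
				tx.Rollback()
				return err
			}
		}
		if err := tx.Commit(); err != nil {
			return err
		}
	}
	return nil
}

// stats and countingTx exist only to make the sketch runnable.
type stats struct{ commits, puts int }

type countingTx struct{ s *stats }

func (c countingTx) Put(key, value []byte) error { c.s.puts++; return nil }
func (c countingTx) Commit() error               { c.s.commits++; return nil }
func (c countingTx) Rollback() error             { return nil }

func main() {
	s := &stats{}
	pairs := make([]pair, 250)
	for i := range pairs {
		pairs[i] = pair{
			key:   []byte(fmt.Sprintf("key%d", i)),
			value: []byte(fmt.Sprintf("value%d", i)),
		}
	}
	begin := func() (Tx, error) { return countingTx{s}, nil }
	if err := putInChunks(begin, pairs, 100); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("puts:", s.puts, "commits:", s.commits) // puts: 250 commits: 3
}
```

Chunking bounds the memory held by any one buffer and lets readers in between chunks, but a failure midway leaves earlier chunks committed, so it only suits loads that tolerate partial application.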

Limitations and Trade-offs

Concurrency Model Limitations

The simple locking approach has some trade-offs:

Writer Blocking:
- Only one writer at a time limits write throughput
- Long-running write transactions block other writers
No Write Concurrency:
- Unlike some databases, no support for row/key-level locking
- Entire database is locked for writes
No Deadlock Detection:
- Simple model doesn't need deadlock detection
- But also can't handle complex lock acquisition patterns

Error Handling

Transaction error handling requires some care:

Commit Errors:
- If commit fails, data is not persisted
- Application must decide whether to retry or report error
Rollback After Errors:
- Always rollback after encountering errors
- Prevents leaving locks held
Resource Leaks:
- Unclosed transactions can lead to lock leaks
- Use defer for Rollback() to ensure cleanup
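
These rules can be folded into a small helper so that callers cannot forget the rollback. The sketch assumes, as the read-only example in this document states, that calling Rollback after a successful Commit is safe. The `Tx` interface, `update`, `fakeTx`, and `errBoom` are hypothetical names used only here.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is the minimal slice of a transaction API that this sketch needs.
type Tx interface {
	Put(key, value []byte) error
	Commit() error
	Rollback() error
}

// update runs fn inside a transaction. The deferred Rollback releases the
// lock on every error path and is a no-op once Commit has succeeded.
func update(begin func() (Tx, error), fn func(Tx) error) error {
	tx, err := begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

var errBoom = errors.New("boom")

// fakeTx records what happened to it; it stands in for a real transaction.
type fakeTx struct{ committed, rolledBack, closed bool }

func (f *fakeTx) Put(key, value []byte) error { return nil }

func (f *fakeTx) Commit() error {
	f.committed, f.closed = true, true
	return nil
}

func (f *fakeTx) Rollback() error {
	if f.closed {
		return nil // already committed or rolled back
	}
	f.rolledBack, f.closed = true, true
	return nil
}

func main() {
	ok := &fakeTx{}
	err := update(func() (Tx, error) { return ok, nil }, func(tx Tx) error {
		return tx.Put([]byte("key1"), []byte("value1"))
	})
	fmt.Println(err, ok.committed, ok.rolledBack) // <nil> true false

	bad := &fakeTx{}
	err = update(func() (Tx, error) { return bad, nil }, func(Tx) error {
		return errBoom
	})
	fmt.Println(err, bad.committed, bad.rolledBack) // boom false true
}
```

Whether to retry after a failed Commit remains the caller's decision; the helper only guarantees that the lock is released.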

Advanced Concepts

Potential Future Enhancements

Several enhancements could improve the transaction system:

Optimistic Concurrency:
- Allow concurrent write transactions with validation at commit time
- Could improve throughput for workloads with few conflicts
Finer-Grained Locking:
- Key-range locks or partitioned locks
- Would allow more concurrency for non-overlapping operations
Savepoints:
- Partial rollback capability within transactions
- Useful for complex operations with recovery points
Nested Transactions:
- Support for transactions within transactions
- Would enable more complex application logic
