kevo/docs/transaction.md
Jeremy Tregunna 6fc3be617d
2025-04-20 14:06:50 -06:00

Transaction Package Documentation

The transaction package implements ACID-compliant transactions for the Kevo engine. It provides a way to group multiple read and write operations into atomic units, ensuring data consistency and isolation.

Overview

Transactions in the Kevo engine follow a SQLite-inspired concurrency model built on reader-writer locks. Multiple read-only transactions can run at the same time, while each write transaction gets exclusive access.

Key responsibilities of the transaction package include:

  • Implementing atomic operations (all-or-nothing semantics)
  • Managing isolation between concurrent transactions
  • Providing a consistent view of data during transactions
  • Supporting both read-only and read-write transactions
  • Handling transaction commit and rollback

Architecture

Key Components

The transaction system consists of several interrelated components:

┌───────────────────────┐
│   Transaction (API)   │
└───────────┬───────────┘
            │
┌───────────▼───────────┐      ┌───────────────────────┐
│  EngineTransaction    │◄─────┤   TransactionCreator  │
└───────────┬───────────┘      └───────────────────────┘
            │
            ▼
┌───────────────────────┐      ┌───────────────────────┐
│     TxBuffer          │◄─────┤   Transaction         │
└───────────────────────┘      │      Iterators        │
                               └───────────────────────┘
  1. Transaction Interface: The public API for transaction operations
  2. EngineTransaction: Implementation of the Transaction interface
  3. TransactionCreator: Factory pattern for creating transactions
  4. TxBuffer: In-memory storage for uncommitted changes
  5. Transaction Iterators: Special iterators that merge buffer and database state
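
As a rough illustration of how these components could relate in Go, the sketch below declares a transaction interface and a factory. The method names Get, Put, Delete, Commit, and Rollback are taken from the usage examples later in this document; the TransactionCreator method name, the ErrReadOnly error, and the map-backed toyTx are assumptions made only for this example, and toyTx does no buffering or locking.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// ErrKeyNotFound mirrors the sentinel error used in the usage examples.
	ErrKeyNotFound = errors.New("key not found")
	// ErrReadOnly is a hypothetical error for writes on read-only transactions.
	ErrReadOnly = errors.New("transaction is read-only")
)

// Transaction is an assumed shape for the public API.
type Transaction interface {
	Get(key []byte) ([]byte, error)
	Put(key, value []byte) error
	Delete(key []byte) error
	Commit() error
	Rollback() error
}

// TransactionCreator is an assumed factory interface.
type TransactionCreator interface {
	CreateTransaction(readOnly bool) (Transaction, error)
}

// toyTx is a map-backed stand-in that only shows the interface is satisfiable.
type toyTx struct {
	readOnly bool
	data     map[string][]byte
}

func (t *toyTx) Get(key []byte) ([]byte, error) {
	v, ok := t.data[string(key)]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

func (t *toyTx) Put(key, value []byte) error {
	if t.readOnly {
		return ErrReadOnly
	}
	t.data[string(key)] = value
	return nil
}

func (t *toyTx) Delete(key []byte) error {
	if t.readOnly {
		return ErrReadOnly
	}
	delete(t.data, string(key))
	return nil
}

func (t *toyTx) Commit() error   { return nil }
func (t *toyTx) Rollback() error { return nil }

type toyCreator struct{}

func (toyCreator) CreateTransaction(readOnly bool) (Transaction, error) {
	return &toyTx{readOnly: readOnly, data: map[string][]byte{}}, nil
}

func main() {
	var creator TransactionCreator = toyCreator{}
	tx, _ := creator.CreateTransaction(true)
	// A write on a read-only transaction is rejected.
	fmt.Println(tx.Put([]byte("k"), []byte("v")))
}
```

Keeping the factory behind its own interface lets calling code obtain transactions without depending on the concrete engine type.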

ACID Properties Implementation

Atomicity

Transactions ensure all-or-nothing semantics through several mechanisms:

  1. Write Buffering:

    • All writes are stored in an in-memory buffer during the transaction
    • No changes are applied to the database until commit
  2. Batch Commit:

    • At commit time, all changes are submitted as a single batch
    • The WAL (Write-Ahead Log) ensures the batch is atomic
  3. Rollback Support:

    • Discarding the buffer effectively rolls back all changes
    • No cleanup needed since changes weren't applied to the database
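
The three mechanisms above can be sketched as a list of staged operations that is either handed over in one call or dropped. This is a toy model, not the engine's code: the store type and its applyBatch method are hypothetical stand-ins for the engine and its WAL-backed batch write, and nothing here models crash recovery.

```go
package main

import "fmt"

// op is one staged write; del marks a deletion.
type op struct {
	key, value string
	del        bool
}

// store stands in for the engine. applyBatch is a hypothetical entry point
// that applies a whole batch; the real engine relies on the WAL for atomicity.
type store struct{ data map[string]string }

func (s *store) applyBatch(ops []op) {
	for _, o := range ops {
		if o.del {
			delete(s.data, o.key)
		} else {
			s.data[o.key] = o.value
		}
	}
}

// tx stages writes in memory and touches the store only on Commit.
type tx struct {
	s   *store
	ops []op
}

func (t *tx) Put(k, v string) { t.ops = append(t.ops, op{key: k, value: v}) }
func (t *tx) Delete(k string) { t.ops = append(t.ops, op{key: k, del: true}) }

// Commit submits every staged operation as one batch.
func (t *tx) Commit() {
	t.s.applyBatch(t.ops)
	t.ops = nil
}

// Rollback discards the staged operations; the store was never modified.
func (t *tx) Rollback() { t.ops = nil }

func main() {
	s := &store{data: map[string]string{"old": "1"}}

	t1 := &tx{s: s}
	t1.Put("a", "1")
	t1.Rollback()
	fmt.Println(len(s.data)) // still only the "old" key

	t2 := &tx{s: s}
	t2.Put("a", "1")
	t2.Delete("old")
	t2.Commit()
	fmt.Println(s.data)
}
```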

Consistency

The engine maintains data consistency through:

  1. Single-Writer Architecture:

    • Only one write transaction can be active at a time
    • Prevents inconsistent states from concurrent modifications
  2. Write-Ahead Logging:

    • All changes are logged before being applied
    • System can recover to a consistent state after crashes
  3. Key Ordering:

    • Keys are maintained in sorted order throughout the system
    • Ensures consistent iteration and range scan behavior

Isolation

The transaction system provides isolation through coarse-grained locking:

  1. Reader-Writer Locks:

    • Read-only transactions acquire shared (read) locks
    • Read-write transactions acquire exclusive (write) locks
    • Multiple readers can execute concurrently
    • Writers have exclusive access
  2. Read Snapshot Semantics:

    • Readers see a consistent snapshot of the database
    • Writes from other transactions never become visible mid-transaction, because a writer cannot commit while any reader holds its lock
  3. Isolation Level:

    • Effectively provides "serializable" isolation
    • Transactions execute as if they were run one after another

Durability

Durability is ensured through the WAL (Write-Ahead Log):

  1. WAL Integration:

    • Transaction commits are written to the WAL first
    • Only after WAL sync are changes considered committed
  2. Sync Options:

    • Transactions can use different WAL sync modes
    • Configurable trade-off between performance and durability
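
To make the sync trade-off concrete, here is a minimal sketch of a log that appends a batch and then optionally calls fsync. The SyncMode type and its two values are hypothetical names chosen for this example; the engine's actual sync options may be named and structured differently. Only standard library calls on os.File are used.

```go
package main

import (
	"fmt"
	"os"
)

// SyncMode is a hypothetical knob; the real option names may differ.
type SyncMode int

const (
	SyncNever    SyncMode = iota // leave flushing to the OS: fastest, least durable
	SyncOnCommit                 // fsync once per committed batch
)

type wal struct {
	f     *os.File
	mode  SyncMode
	syncs int // number of explicit fsync calls, kept for illustration
}

// openTempWAL creates a log backed by a temporary file and returns a cleanup func.
func openTempWAL(mode SyncMode) (*wal, func(), error) {
	f, err := os.CreateTemp("", "wal-sketch-*")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() {
		f.Close()
		os.Remove(f.Name())
	}
	return &wal{f: f, mode: mode}, cleanup, nil
}

// commitBatch appends every record, then syncs according to the mode.
// The batch counts as committed only once this returns nil.
func (w *wal) commitBatch(records [][]byte) error {
	for _, r := range records {
		if _, err := w.f.Write(r); err != nil {
			return err
		}
	}
	if w.mode == SyncOnCommit {
		w.syncs++
		return w.f.Sync()
	}
	return nil
}

func main() {
	w, cleanup, err := openTempWAL(SyncOnCommit)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer cleanup()
	err = w.commitBatch([][]byte{[]byte("put k1 v1\n"), []byte("del k2\n")})
	fmt.Println("commit error:", err, "fsyncs:", w.syncs)
}
```

Syncing once per batch rather than once per record is what makes grouping writes into a transaction cheaper than issuing them individually under a strict sync mode.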

Implementation Details

Transaction Lifecycle

A transaction follows this lifecycle:

  1. Creation:

    • Read-only: Acquires a read lock
    • Read-write: Acquires a write lock (exclusive)
  2. Operation Phase:

    • Read operations check the buffer first, then the engine
    • Write operations are stored in the buffer only
  3. Commit:

    • Read-only: Simply releases the read lock
    • Read-write: Applies buffered changes via a WAL batch, then releases write lock
  4. Rollback:

    • Discards the buffer
    • Releases locks
    • Marks transaction as closed
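
The lifecycle above can be sketched with a sync.RWMutex and a closed flag. This is an assumption-laden toy rather than the engine's implementation: the store and tx types, ErrTxClosed, and the use of atomic.Bool to make lock release happen exactly once are all choices made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// ErrTxClosed is a hypothetical error for committing a closed transaction.
var ErrTxClosed = errors.New("transaction already closed")

type store struct {
	mu   sync.RWMutex
	data map[string]string
}

type tx struct {
	s        *store
	readOnly bool
	closed   atomic.Bool
	buf      map[string]string
}

// Begin takes the shared lock for read-only transactions, the exclusive lock otherwise.
func (s *store) Begin(readOnly bool) *tx {
	if readOnly {
		s.mu.RLock()
	} else {
		s.mu.Lock()
	}
	return &tx{s: s, readOnly: readOnly, buf: map[string]string{}}
}

// Put stages a write in the buffer only. A real implementation would
// reject this on read-only transactions.
func (t *tx) Put(k, v string) { t.buf[k] = v }

// Get checks the buffer first, then falls back to the store.
func (t *tx) Get(k string) (string, bool) {
	if v, ok := t.buf[k]; ok {
		return v, true
	}
	v, ok := t.s.data[k]
	return v, ok
}

// Commit applies the buffer (read-write only) and releases the lock once.
func (t *tx) Commit() error {
	if !t.closed.CompareAndSwap(false, true) {
		return ErrTxClosed
	}
	if t.readOnly {
		t.s.mu.RUnlock()
		return nil
	}
	for k, v := range t.buf {
		t.s.data[k] = v
	}
	t.s.mu.Unlock()
	return nil
}

// Rollback discards the buffer and releases the lock; it is a no-op
// if the transaction was already committed or rolled back.
func (t *tx) Rollback() error {
	if !t.closed.CompareAndSwap(false, true) {
		return nil
	}
	t.buf = nil
	if t.readOnly {
		t.s.mu.RUnlock()
	} else {
		t.s.mu.Unlock()
	}
	return nil
}

func main() {
	s := &store{data: map[string]string{}}
	w := s.Begin(false)
	w.Put("k", "v")
	fmt.Println(w.Commit(), w.Rollback(), w.Commit())
}
```

Guarding both Commit and Rollback with the same compare-and-swap is one way to make a deferred Rollback safe after a successful Commit, since unlocking a Go mutex twice is a fatal error.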

Transaction Buffer

The transaction buffer is an in-memory staging area for changes:

  1. Buffering Mechanism:

    • Stores key-value pairs and deletion markers
    • Maintains sorted order for efficient iteration
    • Deduplicates repeated operations on the same key
  2. Precedence Rules:

    • Buffer operations take precedence over engine values
    • Latest operation on a key within the buffer wins
  3. Tombstone Handling:

    • Deletions are stored as tombstones in the buffer
    • Applied to the engine only on commit
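
A buffer with these properties could look like the sketch below: a slice kept sorted by key, with at most one entry per key, where a deletion is stored as an entry flagged as a tombstone. The type and method names are hypothetical, and string keys are used instead of byte slices to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one buffered operation; isDelete marks a tombstone.
type entry struct {
	key      string
	value    string
	isDelete bool
}

// buffer keeps entries sorted by key with at most one entry per key.
type buffer struct {
	entries []entry
}

// find returns the index of the first entry whose key is >= k.
func (b *buffer) find(k string) int {
	return sort.Search(len(b.entries), func(i int) bool {
		return b.entries[i].key >= k
	})
}

// upsert replaces an existing entry for the key (latest operation wins)
// or inserts a new one at its sorted position.
func (b *buffer) upsert(e entry) {
	i := b.find(e.key)
	if i < len(b.entries) && b.entries[i].key == e.key {
		b.entries[i] = e
		return
	}
	b.entries = append(b.entries, entry{})
	copy(b.entries[i+1:], b.entries[i:])
	b.entries[i] = e
}

func (b *buffer) Put(k, v string) { b.upsert(entry{key: k, value: v}) }
func (b *buffer) Delete(k string) { b.upsert(entry{key: k, isDelete: true}) }

// Get returns the buffered entry for k, if any. Callers must check
// isDelete before falling back to the engine.
func (b *buffer) Get(k string) (entry, bool) {
	i := b.find(k)
	if i < len(b.entries) && b.entries[i].key == k {
		return b.entries[i], true
	}
	return entry{}, false
}

func main() {
	b := &buffer{}
	b.Put("b", "1")
	b.Put("a", "1")
	b.Put("b", "2") // overwrites the earlier put on "b"
	b.Delete("a")   // replaces the put on "a" with a tombstone
	for _, e := range b.entries {
		fmt.Printf("%s value=%q tombstone=%v\n", e.key, e.value, e.isDelete)
	}
}
```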

Transaction Iterators

Specialized iterators provide a merged view of buffer and engine data:

  1. Merged View:

    • Combines data from both the transaction buffer and the underlying engine
    • Buffer entries take precedence over engine entries for the same key
  2. Range Iterators:

    • Support bounded iterations within a key range
    • Enforce bounds checking on both buffer and engine data
  3. Deletion Handling:

    • Skip tombstones during iteration
    • Hide engine keys that are deleted in the buffer
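
The merge rules above amount to a two-way merge of sorted inputs. The sketch below collects the merged view into a slice instead of exposing an iterator, purely for brevity. The kv type and mergedRange function are hypothetical, and the end bound is treated as exclusive, which is an assumption since this document does not say how the upper bound is handled.

```go
package main

import "fmt"

// kv is a key-value pair; tombstone marks a buffered deletion.
type kv struct {
	key, value string
	tombstone  bool
}

// mergedRange merges two key-sorted inputs over [start, end).
// On equal keys the buffer entry wins; tombstones hide the key entirely.
func mergedRange(buf, eng []kv, start, end string) []kv {
	var out []kv
	i, j := 0, 0
	for i < len(buf) || j < len(eng) {
		var cur kv
		switch {
		case j >= len(eng) || (i < len(buf) && buf[i].key < eng[j].key):
			cur = buf[i]
			i++
		case i >= len(buf) || eng[j].key < buf[i].key:
			cur = eng[j]
			j++
		default: // same key on both sides: buffer takes precedence
			cur = buf[i]
			i++
			j++
		}
		if cur.key < start || cur.key >= end || cur.tombstone {
			continue
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	buf := []kv{{key: "b", value: "B2"}, {key: "c", tombstone: true}}
	eng := []kv{{key: "a", value: "A"}, {key: "b", value: "B1"}, {key: "c", value: "C"}, {key: "d", value: "D"}}
	for _, e := range mergedRange(buf, eng, "a", "d") {
		fmt.Printf("%s: %s\n", e.key, e.value)
	}
}
```

In this run the buffered value for "b" replaces the engine's value, "c" is hidden by its tombstone, and "d" falls outside the range.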

Concurrency Control

Reader-Writer Lock Model

The transaction system uses a simple reader-writer lock approach:

  1. Lock Acquisition:

    • Read-only transactions acquire shared (read) locks
    • Read-write transactions acquire exclusive (write) locks
  2. Concurrency Patterns:

    • Multiple read-only transactions can run concurrently
    • Read-write transactions run exclusively (no other transactions)
    • Writers block new readers but never interrupt existing ones; a writer waits until active readers finish
  3. Lock Management:

    • Locks are acquired at transaction start
    • Released at commit or rollback
    • Safety mechanisms prevent multiple releases
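
The compatibility rules above match the behavior of Go's sync.RWMutex, which is a natural fit for this model, although whether the engine uses that exact type is an assumption here. The sketch probes the lock with the non-blocking TryLock and TryRLock methods (available since Go 1.18) so the result is deterministic without goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// lockCompatibility reports which lock requests succeed in three situations.
func lockCompatibility() (secondReader, writerWhileReading, readerWhileWriting bool) {
	var mu sync.RWMutex

	// One reader is active.
	mu.RLock()
	secondReader = mu.TryRLock()      // another reader can join
	writerWhileReading = mu.TryLock() // a writer cannot
	if secondReader {
		mu.RUnlock()
	}
	mu.RUnlock()

	// One writer is active.
	mu.Lock()
	readerWhileWriting = mu.TryRLock() // readers are shut out
	mu.Unlock()
	return
}

func main() {
	fmt.Println(lockCompatibility())
}
```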

Isolation Level

The system provides serializable isolation:

  1. Serializable Semantics:

    • Transactions behave as if executed one after another
    • No anomalies like dirty reads, non-repeatable reads, or phantoms
  2. Implementation Strategy:

    • Simple locking approach
    • Write exclusivity ensures no write conflicts
    • Read snapshots provide consistent views
  3. Optimistic vs. Pessimistic:

    • Uses a pessimistic approach with up-front locking
    • Avoids need for validation or aborts due to conflicts

Common Usage Patterns

Basic Transaction Usage

// Start a read-write transaction
tx, err := engine.BeginTransaction(false) // false = read-write
if err != nil {
    log.Fatal(err)
}

// Perform operations
err = tx.Put([]byte("key1"), []byte("value1"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

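value, err := tx.Get([]byte("key2"))
if err != nil && !errors.Is(err, engine.ErrKeyNotFound) {
    tx.Rollback()
    log.Fatal(err)
}
if err == nil {
    fmt.Printf("key2 = %s\n", value) // use the value so the snippet compiles
}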

// Delete a key
err = tx.Delete([]byte("key3"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

// Commit the transaction
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Read-Only Transactions

// Start a read-only transaction
tx, err := engine.BeginTransaction(true) // true = read-only
if err != nil {
    log.Fatal(err)
}
defer tx.Rollback() // Safe to call even after commit

// Perform read operations
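value, err := tx.Get([]byte("key1"))
if err != nil && !errors.Is(err, engine.ErrKeyNotFound) {
    log.Fatal(err)
}
if err == nil {
    fmt.Printf("key1 = %s\n", value) // use the value so the snippet compiles
}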

// Iterate over a range of keys
iter := tx.NewRangeIterator([]byte("start"), []byte("end"))
for iter.SeekToFirst(); iter.Valid(); iter.Next() {
    fmt.Printf("%s: %s\n", iter.Key(), iter.Value())
}

// Commit (for read-only, this just releases resources)
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Batch Operations

// Start a read-write transaction
tx, err := engine.BeginTransaction(false)
if err != nil {
    log.Fatal(err)
}

// Perform multiple operations
for i := 0; i < 100; i++ {
    key := []byte(fmt.Sprintf("key%d", i))
    value := []byte(fmt.Sprintf("value%d", i))
    
    if err := tx.Put(key, value); err != nil {
        tx.Rollback()
        log.Fatal(err)
    }
}

// Commit as a single atomic batch
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}

Performance Considerations

Transaction Overhead

Transactions introduce some overhead compared to direct engine operations:

  1. Locking Overhead:

    • Acquiring and releasing locks has some cost
    • Write transactions block other transactions
  2. Memory Usage:

    • Transaction buffers consume memory
    • Large transactions with many changes need more memory
  3. Commit Cost:

    • WAL batch writes and syncs add latency at commit time
    • More changes in a transaction means higher commit cost

Optimization Strategies

Several strategies can improve transaction performance:

  1. Transaction Sizing:

    • Very large transactions increase memory pressure
    • Very small transactions have higher per-operation overhead
    • Find a balance based on your workload
  2. Read-Only Preference:

    • Use read-only transactions when possible
    • They allow concurrency and have lower overhead
  3. Batch Similar Operations:

    • Group similar operations in a transaction
    • Reduces overall transaction count
  4. Key Locality:

    • Group operations on related keys
    • Improves cache locality and iterator efficiency
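
For the transaction sizing point, one possible approach is to split a large set of writes into fixed-size chunks and commit each chunk in its own transaction. The chunks helper below is hypothetical and only shows the splitting. Note the cost of this choice: atomicity then holds per chunk, not across the whole set.

```go
package main

import "fmt"

// chunks splits items into consecutive slices of at most size elements.
// It returns nil when size is not positive.
func chunks(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for len(items) > 0 {
		n := size
		if len(items) < n {
			n = len(items)
		}
		out = append(out, items[:n])
		items = items[n:]
	}
	return out
}

func main() {
	keys := make([]string, 250)
	for i := range keys {
		keys[i] = fmt.Sprintf("key%d", i)
	}
	// Each chunk would be written and committed in its own transaction.
	for i, c := range chunks(keys, 100) {
		fmt.Printf("transaction %d: %d puts\n", i, len(c))
	}
}
```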

Limitations and Trade-offs

Concurrency Model Limitations

The simple locking approach has some trade-offs:

  1. Writer Blocking:

    • Only one writer at a time limits write throughput
    • Long-running write transactions block every other transaction, readers included
  2. No Write Concurrency:

    • Unlike some databases, no support for row/key-level locking
    • Entire database is locked for writes
  3. No Deadlock Detection:

    • Simple model doesn't need deadlock detection
    • But also can't handle complex lock acquisition patterns

Error Handling

Transaction error handling requires some care:

  1. Commit Errors:

    • If commit fails, data is not persisted
    • Application must decide whether to retry or report error
  2. Rollback After Errors:

    • Always roll back after encountering errors
    • Prevents leaving locks held
  3. Resource Leaks:

    • Unclosed transactions can lead to lock leaks
    • Use defer for Rollback() to ensure cleanup
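
One way to apply these rules consistently is a small wrapper that begins a transaction, defers Rollback, runs a callback, and commits only if the callback succeeds. The withTx helper, the narrowed Tx interface, and fakeTx are hypothetical names for this sketch. It relies on Rollback being safe to call after Commit, as the read-only usage example in this document states.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is the subset of the transaction API this helper needs.
type Tx interface {
	Commit() error
	Rollback() error
}

// errBoom is a sample failure used in the demonstration.
var errBoom = errors.New("boom")

// withTx runs fn inside a transaction. The deferred Rollback releases the
// lock on every error path and is expected to be a no-op after Commit.
func withTx(begin func() (Tx, error), fn func(Tx) error) error {
	tx, err := begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

// fakeTx records which calls took effect.
type fakeTx struct {
	committed, rolledBack bool
}

func (f *fakeTx) Commit() error {
	f.committed = true
	return nil
}

func (f *fakeTx) Rollback() error {
	if !f.committed {
		f.rolledBack = true
	}
	return nil
}

func main() {
	ft := &fakeTx{}
	err := withTx(
		func() (Tx, error) { return ft, nil },
		func(Tx) error { return errBoom },
	)
	fmt.Println(err, ft.committed, ft.rolledBack)
}
```

Returning the error instead of calling log.Fatal matters here: log.Fatal exits the process immediately, so deferred calls such as the Rollback never run.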

Advanced Concepts

Potential Future Enhancements

Several enhancements could improve the transaction system:

  1. Optimistic Concurrency:

    • Allow concurrent write transactions with validation at commit time
    • Could improve throughput for workloads with few conflicts
  2. Finer-Grained Locking:

    • Key-range locks or partitioned locks
    • Would allow more concurrency for non-overlapping operations
  3. Savepoints:

    • Partial rollback capability within transactions
    • Useful for complex operations with recovery points
  4. Nested Transactions:

    • Support for transactions within transactions
    • Would enable more complex application logic