kevo/docs/transaction.md

# Transaction Package Documentation

The `transaction` package implements ACID-compliant transactions for the Kevo engine. It provides a way to group multiple read and write operations into atomic units, ensuring data consistency and isolation.

## Overview

Transactions in the Kevo engine follow a SQLite-inspired concurrency model using reader-writer locks. This approach provides a simple yet effective solution for concurrent access, allowing multiple simultaneous readers while ensuring exclusive write access.

Key responsibilities of the transaction package include:
- Implementing atomic operations (all-or-nothing semantics)
- Managing isolation between concurrent transactions
- Providing a consistent view of data during transactions
- Supporting both read-only and read-write transactions
- Handling transaction commit and rollback

## Architecture

### Key Components

The transaction system consists of several interrelated components:

```
┌───────────────────────┐
│   Transaction (API)   │
└───────────┬───────────┘
            │
┌───────────▼───────────┐      ┌───────────────────────┐
│  TransactionManager   │◄─────┤     EngineFacade      │
└───────────┬───────────┘      └───────────────────────┘
            │
            ▼
┌───────────▼───────────┐      ┌───────────────────────┐
│  EngineTransaction    │◄─────┤   StorageManager      │
└───────────┬───────────┘      └───────────────────────┘
            │
            ▼
┌───────────────────────┐      ┌───────────────────────┐
│     TxBuffer          │◄─────┤   Transaction          │
└───────────────────────┘      │      Iterators         │
                               └───────────────────────┘
```

1. **Transaction Interface**: The public API for transaction operations
2. **TransactionManager**: Handles transaction creation and tracking
3. **EngineTransaction**: Implementation of the Transaction interface
4. **StorageManager**: Provides the underlying storage operations
5. **TxBuffer**: In-memory storage for uncommitted changes
6. **Transaction Iterators**: Special iterators that merge buffer and database state

## ACID Properties Implementation

### Atomicity

Transactions ensure all-or-nothing semantics through several mechanisms:

1. **Write Buffering**:
   - All writes are stored in an in-memory buffer during the transaction
   - No changes are applied to the database until commit

2. **Batch Commit**:
   - At commit time, all changes are submitted as a single batch
   - The WAL (Write-Ahead Log) ensures the batch is atomic

3. **Rollback Support**:
   - Discarding the buffer effectively rolls back all changes
   - No cleanup needed since changes weren't applied to the database

### Consistency

The engine maintains data consistency through:

1. **Single-Writer Architecture**:
   - Only one write transaction can be active at a time
   - Prevents inconsistent states from concurrent modifications

2. **Write-Ahead Logging**:
   - All changes are logged before being applied
   - System can recover to a consistent state after crashes

3. **Key Ordering**:
   - Keys are maintained in sorted order throughout the system
   - Ensures consistent iteration and range scan behavior

### Isolation

The transaction system provides isolation using a simple but effective approach:

1. **Reader-Writer Locks**:
   - Read-only transactions acquire shared (read) locks
   - Read-write transactions acquire exclusive (write) locks
   - Multiple readers can execute concurrently
   - Writers have exclusive access

2. **Read Snapshot Semantics**:
   - Readers see a consistent snapshot of the database
   - New writes by other transactions aren't visible

3. **Isolation Level**:
   - Effectively provides "serializable" isolation
   - Transactions execute as if they were run one after another

### Durability

Durability is ensured through the WAL (Write-Ahead Log):

1. **WAL Integration**:
   - Transaction commits are written to the WAL first
   - Only after WAL sync are changes considered committed

2. **Sync Options**:
   - Transactions can use different WAL sync modes
   - Configurable trade-off between performance and durability

## Implementation Details

### Transaction Lifecycle

A transaction follows this lifecycle:

1. **Creation**:
   - Read-only: Acquires a read lock
   - Read-write: Acquires a write lock (exclusive)

2. **Operation Phase**:
   - Read operations check the buffer first, then the engine
   - Write operations are stored in the buffer only

3. **Commit**:
   - Read-only: Simply releases the read lock
   - Read-write: Applies buffered changes via a WAL batch, then releases write lock

4. **Rollback**:
   - Discards the buffer
   - Releases locks
   - Marks transaction as closed

### Transaction Buffer

The transaction buffer is an in-memory staging area for changes:

1. **Buffering Mechanism**:
   - Stores key-value pairs and deletion markers
   - Maintains sorted order for efficient iteration
   - Deduplicates repeated operations on the same key

2. **Precedence Rules**:
   - Buffer operations take precedence over engine values
   - Latest operation on a key within the buffer wins

3. **Tombstone Handling**:
   - Deletions are stored as tombstones in the buffer
   - Applied to the engine only on commit

### Transaction Iterators

Specialized iterators provide a merged view of buffer and engine data:

1. **Merged View**:
   - Combines data from both the transaction buffer and the underlying engine
   - Buffer entries take precedence over engine entries for the same key

2. **Range Iterators**:
   - Support bounded iterations within a key range
   - Enforce bounds checking on both buffer and engine data

3. **Deletion Handling**:
   - Skip tombstones during iteration
   - Hide engine keys that are deleted in the buffer

## Concurrency Control

### Reader-Writer Lock Model

The transaction system uses a simple reader-writer lock approach:

1. **Lock Acquisition**:
   - Read-only transactions acquire shared (read) locks
   - Read-write transactions acquire exclusive (write) locks

2. **Concurrency Patterns**:
   - Multiple read-only transactions can run concurrently
   - Read-write transactions run exclusively (no other transactions)
   - Writers block new readers, but don't interrupt existing ones

3. **Lock Management**:
   - Locks are acquired at transaction start
   - Released at commit or rollback
   - Safety mechanisms prevent multiple releases

### Isolation Level

The system provides serializable isolation:

1. **Serializable Semantics**:
   - Transactions behave as if executed one after another
   - No anomalies like dirty reads, non-repeatable reads, or phantoms

2. **Implementation Strategy**:
   - Simple locking approach
   - Write exclusivity ensures no write conflicts
   - Read snapshots provide consistent views

3. **Optimistic vs. Pessimistic**:
   - Uses a pessimistic approach with up-front locking
   - Avoids need for validation or aborts due to conflicts

## Common Usage Patterns

### Basic Transaction Usage

```go
// Start a read-write transaction
tx, err := engine.BeginTransaction(false) // false = read-write
if err != nil {
    log.Fatal(err)
}

// Perform operations
err = tx.Put([]byte("key1"), []byte("value1"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

value, err := tx.Get([]byte("key2"))
if err != nil && err != engine.ErrKeyNotFound {
    tx.Rollback()
    log.Fatal(err)
}

// Delete a key
err = tx.Delete([]byte("key3"))
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}

// Commit the transaction
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}
```

### Read-Only Transactions

```go
// Start a read-only transaction
tx, err := engine.BeginTransaction(true) // true = read-only
if err != nil {
    log.Fatal(err)
}
defer tx.Rollback() // Safe to call even after commit

// Perform read operations
value, err := tx.Get([]byte("key1"))
if err != nil && err != engine.ErrKeyNotFound {
    log.Fatal(err)
}

// Iterate over a range of keys
iter := tx.NewRangeIterator([]byte("start"), []byte("end"))
for iter.SeekToFirst(); iter.Valid(); iter.Next() {
    fmt.Printf("%s: %s\n", iter.Key(), iter.Value())
}

// Commit (for read-only, this just releases resources)
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}
```

### Batch Operations

```go
// Start a read-write transaction
tx, err := engine.BeginTransaction(false)
if err != nil {
    log.Fatal(err)
}

// Perform multiple operations
for i := 0; i < 100; i++ {
    key := []byte(fmt.Sprintf("key%d", i))
    value := []byte(fmt.Sprintf("value%d", i))

    if err := tx.Put(key, value); err != nil {
        tx.Rollback()
        log.Fatal(err)
    }
}

// Commit as a single atomic batch
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}
```

## Performance Considerations

### Transaction Overhead

Transactions introduce some overhead compared to direct engine operations:

1. **Locking Overhead**:
   - Acquiring and releasing locks has some cost
   - Write transactions block other transactions

2. **Memory Usage**:
   - Transaction buffers consume memory
   - Large transactions with many changes need more memory

3. **Commit Cost**:
   - WAL batch writes and syncs add latency at commit time
   - More changes in a transaction means higher commit cost

### Optimization Strategies

Several strategies can improve transaction performance:

1. **Transaction Sizing**:
   - Very large transactions increase memory pressure
   - Very small transactions have higher per-operation overhead
   - Find a balance based on your workload

2. **Read-Only Preference**:
   - Use read-only transactions when possible
   - They allow concurrency and have lower overhead

3. **Batch Similar Operations**:
   - Group similar operations in a transaction
   - Reduces overall transaction count

4. **Key Locality**:
   - Group operations on related keys
   - Improves cache locality and iterator efficiency

## Limitations and Trade-offs

### Concurrency Model Limitations

The simple locking approach has some trade-offs:

1. **Writer Blocking**:
   - Only one writer at a time limits write throughput
   - Long-running write transactions block other writers

2. **No Write Concurrency**:
   - Unlike some databases, no support for row/key-level locking
   - Entire database is locked for writes

3. **No Deadlock Detection**:
   - Simple model doesn't need deadlock detection
   - But also can't handle complex lock acquisition patterns

### Error Handling

Transaction error handling requires some care:

1. **Commit Errors**:
   - If commit fails, data is not persisted
   - Application must decide whether to retry or report error

2. **Rollback After Errors**:
   - Always rollback after encountering errors
   - Prevents leaving locks held

3. **Resource Leaks**:
   - Unclosed transactions can lead to lock leaks
   - Use defer for Rollback() to ensure cleanup

## Advanced Concepts

### Potential Future Enhancements

Several enhancements could improve the transaction system:

1. **Optimistic Concurrency**:
   - Allow concurrent write transactions with validation at commit time
   - Could improve throughput for workloads with few conflicts

2. **Finer-Grained Locking**:
   - Key-range locks or partitioned locks
   - Would allow more concurrency for non-overlapping operations

3. **Savepoints**:
   - Partial rollback capability within transactions
   - Useful for complex operations with recovery points

4. **Nested Transactions**:
   - Support for transactions within transactions
   - Would enable more complex application logic