kevo/docs/transaction.md
Jeremy Tregunna 0637c40a40
feat: big refactor cleaning up the engine code
2025-04-23 22:45:16 -06:00

# Transaction Package Documentation
The `transaction` package implements ACID-compliant transactions for the Kevo engine. It provides a way to group multiple read and write operations into atomic units, ensuring data consistency and isolation.
## Overview
Transactions in the Kevo engine follow a SQLite-inspired concurrency model using reader-writer locks. This approach provides a simple yet effective solution for concurrent access, allowing multiple simultaneous readers while ensuring exclusive write access.
Key responsibilities of the transaction package include:
- Implementing atomic operations (all-or-nothing semantics)
- Managing isolation between concurrent transactions
- Providing a consistent view of data during transactions
- Supporting both read-only and read-write transactions
- Handling transaction commit and rollback
## Architecture
### Key Components
The transaction system consists of several interrelated components:
```
┌───────────────────────┐
│   Transaction (API)   │
└───────────┬───────────┘
┌───────────▼───────────┐      ┌───────────────────────┐
│  TransactionManager   │◄─────┤     EngineFacade      │
└───────────┬───────────┘      └───────────────────────┘
┌───────────▼───────────┐      ┌───────────────────────┐
│   EngineTransaction   │◄─────┤    StorageManager     │
└───────────┬───────────┘      └───────────────────────┘
┌───────────▼───────────┐      ┌───────────────────────┐
│       TxBuffer        │◄─────┤      Transaction      │
└───────────────────────┘      │       Iterators       │
                               └───────────────────────┘
```
1. **Transaction Interface**: The public API for transaction operations
2. **TransactionManager**: Handles transaction creation and tracking
3. **EngineTransaction**: Implementation of the Transaction interface
4. **StorageManager**: Provides the underlying storage operations
5. **TxBuffer**: In-memory storage for uncommitted changes
6. **Transaction Iterators**: Special iterators that merge buffer and database state
## ACID Properties Implementation
### Atomicity
Transactions ensure all-or-nothing semantics through several mechanisms:
1. **Write Buffering**:
- All writes are stored in an in-memory buffer during the transaction
- No changes are applied to the database until commit
2. **Batch Commit**:
- At commit time, all changes are submitted as a single batch
- The WAL (Write-Ahead Log) ensures the batch is atomic
3. **Rollback Support**:
- Discarding the buffer effectively rolls back all changes
- No cleanup needed since changes weren't applied to the database
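
The three mechanisms above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Kevo's code: `op`, `store`, `tx`, and `applyBatch` are hypothetical names, and a plain map stands in for the engine and its WAL.

```go
package main

import "fmt"

// op is one buffered write; a nil value marks a deletion.
// Every name in this sketch is hypothetical, not Kevo's API.
type op struct {
	key   string
	value []byte
}

// store stands in for the engine. applyBatch applies all operations in
// one call, the point where a real engine would log a single WAL batch.
type store struct{ data map[string][]byte }

func (s *store) applyBatch(ops []op) {
	for _, o := range ops {
		if o.value == nil {
			delete(s.data, o.key)
		} else {
			s.data[o.key] = o.value
		}
	}
}

// tx buffers writes in memory; the store is untouched until Commit.
type tx struct {
	s   *store
	ops []op
}

func (t *tx) Put(k string, v []byte) { t.ops = append(t.ops, op{key: k, value: v}) }
func (t *tx) Delete(k string) { t.ops = append(t.ops, op{key: k}) }
func (t *tx) Commit() { t.s.applyBatch(t.ops); t.ops = nil }
func (t *tx) Rollback() { t.ops = nil } // nothing to undo in the store

func main() {
	s := &store{data: map[string][]byte{}}

	t1 := &tx{s: s}
	t1.Put("a", []byte("1"))
	t1.Rollback()
	fmt.Println("keys after rollback:", len(s.data))

	t2 := &tx{s: s}
	t2.Put("a", []byte("1"))
	t2.Put("b", []byte("2"))
	t2.Delete("a")
	t2.Commit()
	fmt.Println("keys after commit:", len(s.data))
}
```

Because rollback only drops the slice of pending operations, its cost does not depend on how much data the store holds.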
### Consistency
The engine maintains data consistency through:
1. **Single-Writer Architecture**:
- Only one write transaction can be active at a time
- Prevents inconsistent states from concurrent modifications
2. **Write-Ahead Logging**:
- All changes are logged before being applied
- System can recover to a consistent state after crashes
3. **Key Ordering**:
- Keys are maintained in sorted order throughout the system
- Ensures consistent iteration and range scan behavior
### Isolation
The transaction system provides isolation using a simple but effective approach:
1. **Reader-Writer Locks**:
- Read-only transactions acquire shared (read) locks
- Read-write transactions acquire exclusive (write) locks
- Multiple readers can execute concurrently
- Writers have exclusive access
2. **Read Snapshot Semantics**:
- Readers see a consistent snapshot of the database
- New writes by other transactions aren't visible
3. **Isolation Level**:
- Effectively provides "serializable" isolation
- Transactions execute as if they were run one after another
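
As a rough sketch of the locking rules above, the following uses Go's `sync.RWMutex` to show readers sharing the lock while a writer is kept out, and the reverse. The `db` type and `begin` function are hypothetical; whether Kevo uses `sync.RWMutex` specifically is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// db is a hypothetical stand-in for the engine: one RWMutex guards the
// whole database, as in the model described above.
type db struct{ mu sync.RWMutex }

// begin acquires the lock that matches the transaction mode and returns
// the function that releases it.
func (d *db) begin(readOnly bool) (release func()) {
	if readOnly {
		d.mu.RLock()
		return d.mu.RUnlock
	}
	d.mu.Lock()
	return d.mu.Unlock
}

func main() {
	d := &db{}

	// Two readers hold the shared lock at the same time.
	r1 := d.begin(true)
	r2 := d.begin(true)

	// A writer cannot be admitted while readers are active.
	fmt.Println("writer admitted:", d.mu.TryLock())

	r1()
	r2()

	// With no readers left, the writer gets exclusive access,
	// and a new reader is refused until it finishes.
	w := d.begin(false)
	fmt.Println("reader admitted:", d.mu.TryRLock())
	w()
}
```

`TryLock` and `TryRLock` are used here only to observe the lock state without blocking; a real transaction would call the blocking `Lock` and `RLock`.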
### Durability
Durability is ensured through the WAL (Write-Ahead Log):
1. **WAL Integration**:
- Transaction commits are written to the WAL first
- Only after WAL sync are changes considered committed
2. **Sync Options**:
- Transactions can use different WAL sync modes
- Configurable trade-off between performance and durability
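
To make the trade-off concrete, here is a small sketch of a log writer whose commit path either skips or issues an fsync. The `syncMode` values, the `wal` type, and `openTempWAL` are hypothetical; Kevo's actual option names and WAL layout may differ.

```go
package main

import (
	"fmt"
	"os"
)

// syncMode is a hypothetical setting; Kevo's real option names may differ.
type syncMode int

const (
	syncNone      syncMode = iota // never fsync: fastest, recent commits can be lost on power failure
	syncImmediate                 // fsync on every commit: slowest, most durable
)

// wal appends commit batches to a log file.
type wal struct {
	f     *os.File
	mode  syncMode
	syncs int // number of fsync calls issued, tracked for illustration
}

// openTempWAL creates the log in a temporary file so the sketch is runnable.
func openTempWAL(mode syncMode) (*wal, error) {
	f, err := os.CreateTemp("", "wal-sketch-*.log")
	if err != nil {
		return nil, err
	}
	return &wal{f: f, mode: mode}, nil
}

// appendBatch writes one commit batch and syncs according to the mode.
func (w *wal) appendBatch(batch []byte) error {
	if _, err := w.f.Write(batch); err != nil {
		return err
	}
	if w.mode == syncImmediate {
		w.syncs++
		return w.f.Sync()
	}
	return nil
}

// cleanup closes and removes the temporary log file.
func (w *wal) cleanup() error {
	name := w.f.Name()
	if err := w.f.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	for _, mode := range []syncMode{syncNone, syncImmediate} {
		w, err := openTempWAL(mode)
		if err != nil {
			panic(err)
		}
		for i := 0; i < 3; i++ {
			if err := w.appendBatch([]byte("batch\n")); err != nil {
				panic(err)
			}
		}
		fmt.Printf("mode=%d fsyncs=%d\n", mode, w.syncs)
		if err := w.cleanup(); err != nil {
			panic(err)
		}
	}
}
```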
## Implementation Details
### Transaction Lifecycle
A transaction follows this lifecycle:
1. **Creation**:
- Read-only: Acquires a read lock
- Read-write: Acquires a write lock (exclusive)
2. **Operation Phase**:
- Read operations check the buffer first, then the engine
- Write operations are stored in the buffer only
3. **Commit**:
- Read-only: Simply releases the read lock
- Read-write: Applies buffered changes via a WAL batch, then releases the write lock
4. **Rollback**:
- Discards the buffer
- Releases locks
- Marks transaction as closed
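
The operation, commit, and rollback phases can be sketched as a small state machine. This is an illustration only: locking is left out, maps stand in for the engine and the buffer, and `tx`, `begin`, and `errTxClosed` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errTxClosed is a hypothetical error returned once the transaction has
// been committed or rolled back.
var errTxClosed = errors.New("transaction closed")

type tx struct {
	engine map[string][]byte // committed state
	buf    map[string][]byte // uncommitted writes
	closed bool
}

func begin(engine map[string][]byte) *tx {
	return &tx{engine: engine, buf: map[string][]byte{}}
}

// Get checks the buffer first, then falls back to the engine.
func (t *tx) Get(k string) ([]byte, bool, error) {
	if t.closed {
		return nil, false, errTxClosed
	}
	if v, ok := t.buf[k]; ok {
		return v, true, nil
	}
	v, ok := t.engine[k]
	return v, ok, nil
}

// Put stores the write in the buffer only.
func (t *tx) Put(k string, v []byte) error {
	if t.closed {
		return errTxClosed
	}
	t.buf[k] = v
	return nil
}

// Commit applies the buffer to the engine and closes the transaction.
func (t *tx) Commit() error {
	if t.closed {
		return errTxClosed
	}
	for k, v := range t.buf {
		t.engine[k] = v
	}
	t.buf, t.closed = nil, true
	return nil
}

// Rollback discards the buffer; on a closed transaction it changes nothing.
func (t *tx) Rollback() error {
	t.buf, t.closed = nil, true
	return nil
}

func main() {
	engine := map[string][]byte{"k": []byte("old")}
	t := begin(engine)
	_ = t.Put("k", []byte("new"))
	v, _, _ := t.Get("k")
	fmt.Printf("inside tx: %s, engine: %s\n", v, engine["k"])
	_ = t.Commit()
	_, _, err := t.Get("k")
	fmt.Printf("after commit: engine=%s, Get err=%v\n", engine["k"], err)
}
```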
### Transaction Buffer
The transaction buffer is an in-memory staging area for changes:
1. **Buffering Mechanism**:
- Stores key-value pairs and deletion markers
- Maintains sorted order for efficient iteration
- Deduplicates repeated operations on the same key
2. **Precedence Rules**:
- Buffer operations take precedence over engine values
- Latest operation on a key within the buffer wins
3. **Tombstone Handling**:
- Deletions are stored as tombstones in the buffer
- Applied to the engine only on commit
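
A buffer with these properties can be sketched as a sorted slice with binary search. The `txBuffer` and `bufEntry` types below are hypothetical and use string keys for brevity; they are not Kevo's `TxBuffer`, whose internal structure is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// bufEntry is one staged change. String keys keep the sketch short;
// the usage examples in this document pass []byte keys.
type bufEntry struct {
	key       string
	value     []byte
	tombstone bool // true when the key is deleted in this transaction
}

// txBuffer keeps entries sorted by key, with at most one entry per key.
type txBuffer struct{ entries []bufEntry }

// find returns the index where key is stored or would be inserted.
func (b *txBuffer) find(key string) (int, bool) {
	i := sort.Search(len(b.entries), func(i int) bool {
		return b.entries[i].key >= key
	})
	return i, i < len(b.entries) && b.entries[i].key == key
}

// set overwrites an existing entry (latest operation wins) or inserts
// a new one at its sorted position.
func (b *txBuffer) set(e bufEntry) {
	i, found := b.find(e.key)
	if found {
		b.entries[i] = e
		return
	}
	b.entries = append(b.entries, bufEntry{})
	copy(b.entries[i+1:], b.entries[i:])
	b.entries[i] = e
}

func (b *txBuffer) Put(key string, value []byte) { b.set(bufEntry{key: key, value: value}) }
func (b *txBuffer) Delete(key string) { b.set(bufEntry{key: key, tombstone: true}) }

// Get reports the staged entry for key, if any. A tombstone that is
// found means the caller must not fall through to the engine.
func (b *txBuffer) Get(key string) (bufEntry, bool) {
	i, found := b.find(key)
	if !found {
		return bufEntry{}, false
	}
	return b.entries[i], true
}

func main() {
	var b txBuffer
	b.Put("c", []byte("3"))
	b.Put("a", []byte("1"))
	b.Put("a", []byte("1b")) // replaces the earlier Put on "a"
	b.Delete("b")
	for _, e := range b.entries {
		fmt.Printf("%s value=%q tombstone=%v\n", e.key, e.value, e.tombstone)
	}
}
```

Keeping the slice sorted on every write makes insertion linear in the buffer size; a skip list or balanced tree would be a natural alternative for very large transactions.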
### Transaction Iterators
Specialized iterators provide a merged view of buffer and engine data:
1. **Merged View**:
- Combines data from both the transaction buffer and the underlying engine
- Buffer entries take precedence over engine entries for the same key
2. **Range Iterators**:
- Support bounded iterations within a key range
- Enforce bounds checking on both buffer and engine data
3. **Deletion Handling**:
- Skip tombstones during iteration
- Hide engine keys that are deleted in the buffer
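
The merge itself is a two-pointer walk over two sorted sequences. The sketch below materializes the result as a slice rather than exposing an iterator, and it assumes a half-open range `[start, end)` with an empty `end` meaning unbounded; `kv`, `bufEntry`, and `mergedRange` are hypothetical names, and Kevo's actual bound semantics are not specified here.

```go
package main

import "fmt"

type kv struct{ key, value string }

// bufEntry is a staged change from the transaction buffer.
type bufEntry struct {
	key, value string
	tombstone  bool
}

// mergedRange merges sorted buffer entries with sorted engine pairs.
// For equal keys the buffer entry wins; tombstones hide the key.
// The range is [start, end); an empty end means no upper bound.
func mergedRange(buf []bufEntry, eng []kv, start, end string) []kv {
	inRange := func(k string) bool {
		return k >= start && (end == "" || k < end)
	}
	emit := func(out []kv, e bufEntry) []kv {
		if !e.tombstone && inRange(e.key) {
			out = append(out, kv{e.key, e.value})
		}
		return out
	}

	var out []kv
	i, j := 0, 0
	for i < len(buf) || j < len(eng) {
		switch {
		case j == len(eng) || (i < len(buf) && buf[i].key < eng[j].key):
			out = emit(out, buf[i]) // key exists only in the buffer
			i++
		case i == len(buf) || eng[j].key < buf[i].key:
			if inRange(eng[j].key) { // key exists only in the engine
				out = append(out, eng[j])
			}
			j++
		default:
			out = emit(out, buf[i]) // same key: buffer takes precedence
			i++
			j++
		}
	}
	return out
}

func main() {
	buf := []bufEntry{
		{key: "a", value: "A"},
		{key: "b", tombstone: true},
		{key: "d", value: "D"},
	}
	eng := []kv{{"b", "x"}, {"c", "C"}, {"d", "old"}, {"e", "E"}}

	fmt.Println(mergedRange(buf, eng, "", ""))
	fmt.Println(mergedRange(buf, eng, "b", "e"))
}
```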
## Concurrency Control
### Reader-Writer Lock Model
The transaction system uses a simple reader-writer lock approach:
1. **Lock Acquisition**:
- Read-only transactions acquire shared (read) locks
- Read-write transactions acquire exclusive (write) locks
2. **Concurrency Patterns**:
- Multiple read-only transactions can run concurrently
- Read-write transactions run exclusively (no other transactions)
- A writer waiting for the lock blocks new readers but does not interrupt readers that are already running
3. **Lock Management**:
- Locks are acquired at transaction start
- Released at commit or rollback
- Safety mechanisms prevent multiple releases
### Isolation Level
The system provides serializable isolation:
1. **Serializable Semantics**:
- Transactions behave as if executed one after another
- No anomalies like dirty reads, non-repeatable reads, or phantoms
2. **Implementation Strategy**:
- Simple locking approach
- Write exclusivity ensures no write conflicts
- Read snapshots provide consistent views
3. **Optimistic vs. Pessimistic**:
- Uses a pessimistic approach with up-front locking
- Avoids need for validation or aborts due to conflicts
## Common Usage Patterns
### Basic Transaction Usage
```go
// Start a read-write transaction
tx, err := engine.BeginTransaction(false) // false = read-write
if err != nil {
	log.Fatal(err)
}

// Perform operations
err = tx.Put([]byte("key1"), []byte("value1"))
if err != nil {
	tx.Rollback()
	log.Fatal(err)
}

// Read a key; a missing key is not treated as a failure here
value, err := tx.Get([]byte("key2"))
if err != nil && err != engine.ErrKeyNotFound {
	tx.Rollback()
	log.Fatal(err)
}
fmt.Printf("key2: %s\n", value)

// Delete a key
err = tx.Delete([]byte("key3"))
if err != nil {
	tx.Rollback()
	log.Fatal(err)
}

// Commit the transaction
if err := tx.Commit(); err != nil {
	log.Fatal(err)
}
```
### Read-Only Transactions
```go
// Start a read-only transaction
tx, err := engine.BeginTransaction(true) // true = read-only
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback() // Safe to call even after commit

// Perform read operations
value, err := tx.Get([]byte("key1"))
if err != nil && err != engine.ErrKeyNotFound {
	log.Fatal(err)
}
fmt.Printf("key1: %s\n", value)

// Iterate over a range of keys
iter := tx.NewRangeIterator([]byte("start"), []byte("end"))
for iter.SeekToFirst(); iter.Valid(); iter.Next() {
	fmt.Printf("%s: %s\n", iter.Key(), iter.Value())
}

// Commit (for read-only, this just releases resources)
if err := tx.Commit(); err != nil {
	log.Fatal(err)
}
```
### Batch Operations
```go
// Start a read-write transaction
tx, err := engine.BeginTransaction(false)
if err != nil {
	log.Fatal(err)
}

// Perform multiple operations
for i := 0; i < 100; i++ {
	key := []byte(fmt.Sprintf("key%d", i))
	value := []byte(fmt.Sprintf("value%d", i))
	if err := tx.Put(key, value); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
}

// Commit as a single atomic batch
if err := tx.Commit(); err != nil {
	log.Fatal(err)
}
```
## Performance Considerations
### Transaction Overhead
Transactions introduce some overhead compared to direct engine operations:
1. **Locking Overhead**:
- Acquiring and releasing locks has some cost
- Write transactions block other transactions
2. **Memory Usage**:
- Transaction buffers consume memory
- Large transactions with many changes need more memory
3. **Commit Cost**:
- WAL batch writes and syncs add latency at commit time
- More changes in a transaction means higher commit cost
### Optimization Strategies
Several strategies can improve transaction performance:
1. **Transaction Sizing**:
- Very large transactions increase memory pressure
- Very small transactions have higher per-operation overhead
- Find a balance based on your workload
2. **Read-Only Preference**:
- Use read-only transactions when possible
- They allow concurrency and have lower overhead
3. **Batch Similar Operations**:
- Group similar operations in a transaction
- Reduces overall transaction count
4. **Key Locality**:
- Group operations on related keys
- Improves cache locality and iterator efficiency
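
For the transaction sizing point, a bulk load can be split into bounded batches so that no single transaction buffers the whole data set. The helper below is a sketch: `inBatches` and `commitFunc` are hypothetical, and the callback stands in for "begin a transaction, write the batch, commit".

```go
package main

import (
	"errors"
	"fmt"
)

// commitFunc is a hypothetical callback that writes one batch of keys
// inside its own transaction and commits it.
type commitFunc func(batch []string) error

// inBatches splits keys into transactions of at most size entries.
// Each batch is atomic on its own; the run as a whole is not.
func inBatches(keys []string, size int, commit commitFunc) error {
	if size <= 0 {
		return errors.New("batch size must be positive")
	}
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		if err := commit(keys[start:end]); err != nil {
			return fmt.Errorf("batch starting at %d: %w", start, err)
		}
	}
	return nil
}

func main() {
	keys := make([]string, 10)
	for i := range keys {
		keys[i] = fmt.Sprintf("key%d", i)
	}
	err := inBatches(keys, 4, func(batch []string) error {
		fmt.Println("commit", len(batch), "keys")
		return nil
	})
	fmt.Println("err:", err)
}
```

The batch size trades memory and lock hold time against per-commit cost. Splitting also gives up all-or-nothing semantics across batches, so it only suits loads that can tolerate a partially applied run.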
## Limitations and Trade-offs
### Concurrency Model Limitations
The simple locking approach has some trade-offs:
1. **Writer Blocking**:
- Only one writer at a time limits write throughput
- Long-running write transactions block all other transactions, readers included
2. **No Write Concurrency**:
- Unlike some databases, no support for row/key-level locking
- Entire database is locked for writes
3. **No Deadlock Detection**:
- Simple model doesn't need deadlock detection
- But also can't handle complex lock acquisition patterns
### Error Handling
Transaction error handling requires some care:
1. **Commit Errors**:
- If commit fails, data is not persisted
- Application must decide whether to retry or report error
2. **Rollback After Errors**:
- Always rollback after encountering errors
- Prevents leaving locks held
3. **Resource Leaks**:
- Unclosed transactions can lead to lock leaks
- Use defer for Rollback() to ensure cleanup
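
These rules can be folded into a small wrapper so callers cannot forget the cleanup. The sketch assumes, as noted in the read-only example, that `Rollback` is safe to call after `Commit`. The `Tx` interface, `update`, `fakeTx`, and `errDemo` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is the subset of a transaction this helper needs. It is a
// hypothetical interface, not a type taken from Kevo.
type Tx interface {
	Commit() error
	Rollback() error
}

// update runs fn inside a transaction. The deferred Rollback covers
// every early return and relies on Rollback being safe after Commit.
func update(begin func() (Tx, error), fn func(Tx) error) error {
	tx, err := begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if err := fn(tx); err != nil {
		return err // the deferred Rollback releases the lock
	}
	return tx.Commit() // the caller decides whether to retry on error
}

// fakeTx records calls so the sketch can run without an engine.
type fakeTx struct{ committed, rolledBack bool }

func (f *fakeTx) Commit() error { f.committed = true; return nil }

func (f *fakeTx) Rollback() error {
	if !f.committed {
		f.rolledBack = true
	}
	return nil
}

// errDemo is a placeholder application error.
var errDemo = errors.New("validation failed")

func main() {
	ft := &fakeTx{}
	err := update(
		func() (Tx, error) { return ft, nil },
		func(Tx) error { return errDemo },
	)
	fmt.Printf("err=%v committed=%v rolledBack=%v\n", err, ft.committed, ft.rolledBack)
}
```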
## Advanced Concepts
### Potential Future Enhancements
Several enhancements could improve the transaction system:
1. **Optimistic Concurrency**:
- Allow concurrent write transactions with validation at commit time
- Could improve throughput for workloads with few conflicts
2. **Finer-Grained Locking**:
- Key-range locks or partitioned locks
- Would allow more concurrency for non-overlapping operations
3. **Savepoints**:
- Partial rollback capability within transactions
- Useful for complex operations with recovery points
4. **Nested Transactions**:
- Support for transactions within transactions
- Would enable more complex application logic