Jeremy Tregunna 60d401a615 docs: update documentation with information about replication

2025-04-29 15:03:03 -06:00

Kevo Client SDK Development Guide

This document provides technical guidance for developing client SDKs for Kevo in various programming languages. It focuses on the gRPC API, communication patterns, and best practices.

gRPC API Overview

Kevo exposes its functionality through a gRPC service defined in proto/kevo/service.proto. The service provides operations for:

Key-Value Operations - Basic get, put, and delete operations
Batch Operations - Atomic multi-key operations
Iterator Operations - Range scans and prefix scans
Transaction Operations - Support for ACID transactions
Administrative Operations - Statistics and compaction
Replication Operations - Node role discovery and topology information

Service Definition

The main service is KevoService, which contains the following RPC methods:

Key-Value Operations

Get(GetRequest) returns (GetResponse): Retrieves a value by key
Put(PutRequest) returns (PutResponse): Stores a key-value pair
Delete(DeleteRequest) returns (DeleteResponse): Removes a key-value pair

Batch Operations

BatchWrite(BatchWriteRequest) returns (BatchWriteResponse): Performs multiple operations atomically

Iterator Operations

Scan(ScanRequest) returns (stream ScanResponse): Streams key-value pairs in a range

Transaction Operations

BeginTransaction(BeginTransactionRequest) returns (BeginTransactionResponse): Starts a new transaction
CommitTransaction(CommitTransactionRequest) returns (CommitTransactionResponse): Commits a transaction
RollbackTransaction(RollbackTransactionRequest) returns (RollbackTransactionResponse): Aborts a transaction
TxGet(TxGetRequest) returns (TxGetResponse): Get operation in a transaction
TxPut(TxPutRequest) returns (TxPutResponse): Put operation in a transaction
TxDelete(TxDeleteRequest) returns (TxDeleteResponse): Delete operation in a transaction
TxScan(TxScanRequest) returns (stream TxScanResponse): Scan operation in a transaction

Administrative Operations

GetStats(GetStatsRequest) returns (GetStatsResponse): Retrieves database statistics
Compact(CompactRequest) returns (CompactResponse): Triggers compaction

Replication Operations

GetNodeInfo(GetNodeInfoRequest) returns (GetNodeInfoResponse): Retrieves information about the node's role and replication topology

Implementation Considerations

When implementing a client SDK, consider the following aspects:

Connection Management

Establish Connection: Create and maintain a gRPC connection to the server
Connection Pooling: Implement connection pooling for performance (if the language/platform supports it)
Timeout Handling: Set appropriate timeouts for connection establishment and requests
TLS Support: Support secure communications with TLS
Replication Awareness: Discover node roles and maintain appropriate connections

// Connection options example
options = {
  endpoint: "localhost:50051",
  connectTimeout: 5000,  // milliseconds
  requestTimeout: 10000, // milliseconds
  poolSize: 5,           // number of connections
  tlsEnabled: false,
  certPath: "/path/to/cert.pem",
  keyPath: "/path/to/key.pem",
  caPath: "/path/to/ca.pem",
  
  // Replication options
  discoverTopology: true, // automatically discover node role and topology
  autoRouteWrites: true,  // automatically route writes to primary
  autoRouteReads: true    // route reads to replicas when possible
}
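
As a rough illustration, the options above could be modeled as a typed structure that is validated once at construction. The sketch below uses Python; `ClientOptions` and its field names simply mirror the pseudo-code and are assumptions, not part of any published Kevo SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientOptions:
    """Hypothetical options holder mirroring the pseudo-code above."""
    endpoint: str = "localhost:50051"
    connect_timeout_ms: int = 5000
    request_timeout_ms: int = 10000
    pool_size: int = 5
    tls_enabled: bool = False
    cert_path: Optional[str] = None
    key_path: Optional[str] = None
    ca_path: Optional[str] = None
    discover_topology: bool = True
    auto_route_writes: bool = True
    auto_route_reads: bool = True

    def __post_init__(self) -> None:
        # Fail fast on values that can never work.
        if not self.endpoint:
            raise ValueError("endpoint must not be empty")
        if self.connect_timeout_ms <= 0 or self.request_timeout_ms <= 0:
            raise ValueError("timeouts must be positive")
        if self.pool_size < 1:
            raise ValueError("pool_size must be at least 1")
        # A client certificate is only usable together with its private key.
        if (self.cert_path is None) != (self.key_path is None):
            raise ValueError("cert_path and key_path must be set together")

    @property
    def request_timeout_s(self) -> float:
        # Python's grpc package takes per-call timeouts in seconds.
        return self.request_timeout_ms / 1000.0
```

Validating in one place keeps misconfiguration errors out of the request path, where they would otherwise surface as confusing connection failures.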

Basic Operations

Implement clean, idiomatic methods for basic operations:

// Example operations (in pseudo-code)
client.get(key) -> [value, found]
client.put(key, value, sync) -> success
client.delete(key, sync) -> success

// With proper error handling
try {
  [value, found] = client.get(key)
} catch (Exception e) {
  // Handle errors
}

Batch Operations

Batch operations should be atomic from the client's perspective (either every operation in the batch is applied or none is):

// Example batch write
operations = [
  { type: "put", key: key1, value: value1 },
  { type: "put", key: key2, value: value2 },
  { type: "delete", key: key3 }
]

success = client.batchWrite(operations, sync)
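
One possible way to assemble such a batch on the client side is a small builder that validates every entry before anything is sent. This is a minimal Python sketch; `Operation` and `Batch` are hypothetical names, and rejecting empty keys is an assumption rather than a documented Kevo rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Operation:
    type: str                      # "put" or "delete", as in the pseudo-code
    key: bytes
    value: Optional[bytes] = None  # unused for deletes


class Batch:
    """Hypothetical builder that collects operations for one BatchWrite call."""

    def __init__(self) -> None:
        self._ops: List[Operation] = []

    def put(self, key: bytes, value: bytes) -> "Batch":
        self._ops.append(Operation("put", key, value))
        return self

    def delete(self, key: bytes) -> "Batch":
        self._ops.append(Operation("delete", key))
        return self

    def operations(self) -> List[Operation]:
        # Validate everything up front so a malformed entry is rejected
        # before any request is sent (empty keys are assumed invalid here).
        for op in self._ops:
            if not op.key:
                raise ValueError("batch operation has an empty key")
        return list(self._ops)
```

The whole list returned by `operations()` would then be sent in a single `BatchWrite` request, leaving atomicity to the server.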

Streaming Operations

For scan operations, implement both streaming and iterator patterns based on language idioms:

// Streaming example
client.scan(prefix, startKey, endKey, limit, function(key, value) {
  // Process each key-value pair
})

// Iterator example
iterator = client.scan(prefix, startKey, endKey, limit)
while (iterator.hasNext()) {
  [key, value] = iterator.next()
  // Process each key-value pair
}
iterator.close()
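
In languages with generators, the iterator pattern can release the stream automatically. The Python sketch below assumes the stream object is iterable and exposes `cancel()` (as the object returned by a server-streaming call in Python's grpc package does); `FakeStream` is a stand-in so the example runs without a server, and treating `limit=0` as "no limit" is an assumption.

```python
from typing import Iterable, Iterator, Tuple


class FakeStream:
    """Stand-in for a server-streaming call; yields (key, value) pairs."""

    def __init__(self, items: Iterable[Tuple[bytes, bytes]]) -> None:
        self._it = iter(items)
        self.cancelled = False

    def __iter__(self) -> Iterator[Tuple[bytes, bytes]]:
        return self._it

    def cancel(self) -> None:
        self.cancelled = True


def scan(stream: FakeStream, limit: int = 0) -> Iterator[Tuple[bytes, bytes]]:
    """Yield pairs from the stream, releasing it even on early exit."""
    count = 0
    try:
        for key, value in stream:
            yield key, value
            count += 1
            if limit and count >= limit:
                break
    finally:
        # Runs on exhaustion, on break, and when the generator is closed.
        stream.cancel()
```

Because the cleanup sits in `finally`, a caller that stops iterating early and closes the generator still releases the underlying stream.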

Transaction Support

Provide a transaction API with proper resource management:

// Transaction example
tx = client.beginTransaction(readOnly)
try {
  val = tx.get(key)
  tx.put(key2, value2)
  tx.commit()
} catch (Exception e) {
  tx.rollback()
  throw e
}

Consider implementing a transaction callback pattern for better resource management (if the language supports it):

// Transaction callback pattern
client.transaction(function(tx) {
  // Operations inside transaction
  val = tx.get(key)
  tx.put(key2, value2)
  // Auto-commit if no exceptions
})
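
In Python, the callback pattern maps naturally onto a context manager. The following is a minimal sketch under stated assumptions: `FakeTransaction` is an in-memory stand-in for the server-side transaction, and `transaction` is a hypothetical helper name. A real SDK would issue the BeginTransaction, CommitTransaction, and RollbackTransaction RPCs at the corresponding points.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class FakeTransaction:
    """In-memory stand-in for a server-side transaction (hypothetical)."""

    def __init__(self, store: Dict[bytes, bytes]) -> None:
        self._store = store
        self._writes: Dict[bytes, bytes] = {}
        self.state = "open"

    def get(self, key: bytes) -> Optional[bytes]:
        # Reads see the transaction's own pending writes first.
        return self._writes.get(key, self._store.get(key))

    def put(self, key: bytes, value: bytes) -> None:
        self._writes[key] = value

    def commit(self) -> None:
        self._store.update(self._writes)
        self.state = "committed"

    def rollback(self) -> None:
        self._writes.clear()
        self.state = "rolled_back"


@contextmanager
def transaction(store: Dict[bytes, bytes]) -> Iterator[FakeTransaction]:
    tx = FakeTransaction(store)
    try:
        yield tx
    except BaseException:
        # Any error inside the block discards the pending writes.
        tx.rollback()
        raise
    else:
        # Auto-commit when the block finishes without an exception.
        tx.commit()
```

Callers then write `with transaction(store) as tx: ...` and cannot forget to commit or roll back.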

Error Handling and Retries

Error Categories: Map gRPC error codes to meaningful client-side errors
Retry Policy: Implement exponential backoff with jitter for transient errors
Error Context: Provide detailed error information

// Retry policy example
retryPolicy = {
  maxRetries: 3,
  initialBackoffMs: 100,
  maxBackoffMs: 2000,
  backoffFactor: 1.5,
  jitter: 0.2
}
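
To make the policy concrete, the delays it produces can be computed as below. This Python sketch interprets `jitter: 0.2` as a random adjustment of plus or minus 20% of each delay, which is an assumption about the intended meaning; `RetryPolicy` and `backoff_delays` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 3
    initial_backoff_ms: float = 100
    max_backoff_ms: float = 2000
    backoff_factor: float = 1.5
    jitter: float = 0.2  # assumed to mean +/- 20% of the delay


def backoff_delays(policy: RetryPolicy,
                   rand: Callable[[], float] = random.random) -> List[float]:
    """Return the delay in milliseconds to wait before each retry."""
    delays = []
    for attempt in range(policy.max_retries):
        # Exponential growth, capped at the configured maximum.
        base = min(policy.initial_backoff_ms * policy.backoff_factor ** attempt,
                   policy.max_backoff_ms)
        # Map rand() in [0, 1) to a multiplier in [1 - jitter, 1 + jitter).
        spread = 1 + policy.jitter * (2 * rand() - 1)
        delays.append(base * spread)
    return delays
```

With the values shown above and the jitter held at its midpoint, the base delays are 100 ms, 150 ms, and 225 ms. The `rand` parameter exists only so tests can make the jitter deterministic.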

Performance Considerations

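Message Size Limits: Set explicit send and receive message size limits so that large values, batches, and scan responses are not rejected (many gRPC implementations default to a 4 MiB receive limit)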
Stream Management: Properly handle long-running streams

// Performance options example
options = {
  maxMessageSize: 16 * 1024 * 1024  // 16MB
}
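
As one way to apply this setting, an SDK built on Python's grpc package could translate `maxMessageSize` into channel arguments. The argument names `grpc.max_send_message_length` and `grpc.max_receive_message_length` are standard gRPC channel arguments; the helper `channel_options` is a hypothetical name used for illustration.

```python
from typing import List, Tuple


def channel_options(max_message_size: int) -> List[Tuple[str, int]]:
    """Build gRPC channel arguments from the SDK-level size limit."""
    if max_message_size <= 0:
        raise ValueError("max_message_size must be positive")
    return [
        ("grpc.max_send_message_length", max_message_size),
        ("grpc.max_receive_message_length", max_message_size),
    ]
```

The resulting list would be passed as the `options` argument when creating the channel, for example `grpc.insecure_channel(endpoint, options=channel_options(16 * 1024 * 1024))`.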

Key Implementation Areas

Key and Value Types

All keys and values are represented as binary data (bytes in protobuf). Your SDK should handle conversions between language-specific types and byte arrays.
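
A minimal sketch of such a conversion layer in Python might look like this. The helper names are hypothetical, and encoding strings as UTF-8 is an assumed SDK convention, not something Kevo itself prescribes.

```python
from typing import Union

BytesLike = Union[bytes, bytearray, memoryview, str]


def to_bytes(data: BytesLike) -> bytes:
    """Convert a caller-supplied key or value to the bytes sent on the wire."""
    if isinstance(data, str):
        # Assumed convention: text is encoded as UTF-8.
        return data.encode("utf-8")
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    raise TypeError(f"unsupported key/value type: {type(data).__name__}")


def to_text(data: bytes) -> str:
    """Decode a stored value that the caller knows to be UTF-8 text."""
    return data.decode("utf-8")
```

Keeping the conversion explicit avoids silently stringifying numbers or other objects, which would make keys written by one SDK unreadable from another.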

The `sync` Parameter

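In operations that modify data (Put, Delete, BatchWrite), the sync parameter determines whether the operation waits for data to be durably persisted before returning. Waiting costs write latency but means an acknowledged write has reached durable storage, so SDKs should expose this parameter explicitly to let callers balance performance against durability.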

Transaction IDs

Transaction IDs are strings generated by the server on transaction creation. Clients must store and pass these IDs for all operations within a transaction.

Scan Operation Parameters

prefix: Optional prefix to filter keys (when provided, start_key/end_key are ignored)
start_key: Start of the key range (inclusive)
end_key: End of the key range (exclusive)
limit: Maximum number of results to return
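
The interaction between these parameters can be pinned down with a small in-memory reference, which is also handy as a test double. The Python sketch below follows the rules listed above and additionally assumes that keys are ordered bytewise, that empty values mean "not set", and that a limit of 0 means "no limit"; none of these three assumptions is confirmed by this guide.

```python
from typing import Dict, List, Tuple


def scan_reference(data: Dict[bytes, bytes],
                   prefix: bytes = b"",
                   start_key: bytes = b"",
                   end_key: bytes = b"",
                   limit: int = 0) -> List[Tuple[bytes, bytes]]:
    """Reference semantics for scan parameters over an in-memory dict."""
    results: List[Tuple[bytes, bytes]] = []
    for key in sorted(data):
        if prefix:
            # A prefix takes precedence; start_key/end_key are ignored.
            if not key.startswith(prefix):
                continue
        else:
            if start_key and key < start_key:   # start is inclusive
                continue
            if end_key and key >= end_key:      # end is exclusive
                continue
        results.append((key, data[key]))
        if limit and len(results) >= limit:
            break
    return results
```

Comparing an SDK's scan results against a reference like this in tests helps catch off-by-one mistakes at range boundaries.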

Node Role and Replication Support

When implementing an SDK for a Kevo cluster with replication, your client should:

Discover Node Role: On connection, query the server for node role information
Connection Management: Maintain appropriate connections based on node role:
- When connected to a primary, optionally connect to available replicas for reads
- When connected to a replica, connect to the primary for writes
Operation Routing: Direct operations to the appropriate node:
- Read operations: Can be directed to replicas when available
- Write operations: Must be directed to the primary
Connection Recovery: Handle connection failures with automatic reconnection

Node Role Discovery

// Get node information on connection
nodeInfo = client.getNodeInfo()

// Check node role
if (nodeInfo.role == "primary") {
  // Connected to primary
  // Optionally connect to replicas for read distribution
  for (replica in nodeInfo.replicas) {
    if (replica.available) {
      connectToReplica(replica.address)
    }
  }
} else if (nodeInfo.role == "replica") {
  // Connected to replica
  // Connect to primary for writes
  connectToPrimary(nodeInfo.primaryAddress)
}

Operation Routing

// Get operation
function get(key) {
  if (nodeInfo.role == "primary" && hasReplicaConnections()) {
    // Try to read from replica
    try {
      return readFromReplica(key)
    } catch (error) {
      // Fall back to primary if replica read fails
      return readFromPrimary(key)
    }
  } else {
    // Read from current connection
    return readFromCurrent(key)
  }
}

// Put operation
function put(key, value) {
  if (nodeInfo.role == "replica" && hasPrimaryConnection()) {
    // Route write to primary
    return writeToPrimary(key, value)
  } else {
    // Write to current connection
    return writeToCurrent(key, value)
  }
}
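
The routing rules above can be exercised without a cluster by using in-memory stand-ins. In this simplified Python sketch, `FakeNode` and `Router` are hypothetical names, the router is handed its primary and replica connections directly instead of deriving them from `nodeInfo`, and the `prefer_replica` flag mirrors the `preferReplica` option used elsewhere in this guide.

```python
from typing import Dict, Optional


class FakeNode:
    """In-memory stand-in for a connection to one Kevo node."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return self.data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.data[key] = value


class Router:
    """Routes reads to a replica when possible and writes to the primary."""

    def __init__(self, primary: FakeNode,
                 replica: Optional[FakeNode] = None) -> None:
        self.primary = primary
        self.replica = replica

    def get(self, key: bytes, prefer_replica: bool = True) -> Optional[bytes]:
        if prefer_replica and self.replica is not None:
            try:
                return self.replica.get(key)
            except ConnectionError:
                pass  # fall back to the primary below
        return self.primary.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        # Writes always go to the primary.
        self.primary.put(key, value)
```

Since nothing copies data from the fake primary to the fake replica, a replica read issued right after a write returns nothing, which is a convenient way to reproduce replica lag in SDK tests.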

Common Pitfalls

Stream Resource Leaks: Always close streams properly
Transaction Resource Leaks: Always commit or roll back transactions
Large Result Sets: Implement proper pagination or streaming for large scans
Connection Management: Properly handle connection failures and reconnection
Timeout Handling: Set appropriate timeouts for different operations
Role Discovery: Discover node role at connection time and after reconnections
Write Routing: Always route writes to the primary node
Read-after-Write: Be aware of potential replica lag in read-after-write scenarios

Example Usage Patterns

To ensure a consistent experience across different language SDKs, consider implementing these common usage patterns:

Simple Usage

// Create client
client = new KevoClient("localhost:50051")

// Connect
client.connect()

// Key-value operations
client.put("key", "value")
value = client.get("key")
client.delete("key")

// Close connection
client.close()

Advanced Usage with Options

// Create client with options
options = {
  endpoint: "kevo-server:50051",
  connectTimeout: 5000,
  requestTimeout: 10000,
  tlsEnabled: true,
  certPath: "/path/to/cert.pem",
  // ... more options
}
client = new KevoClient(options)

// Connect with context
client.connect(context)

// Batch operations
operations = [
  { type: "put", key: "key1", value: "value1" },
  { type: "put", key: "key2", value: "value2" },
  { type: "delete", key: "key3" }
]
client.batchWrite(operations, true)  // sync=true

// Transaction
client.transaction(function(tx) {
  value = tx.get("key1")
  tx.put("key2", "updated-value")
  tx.delete("key3")
})

// Iterator
iterator = client.scan({ prefix: "user:" })
while (iterator.hasNext()) {
  [key, value] = iterator.next()
  // Process each key-value pair
}
iterator.close()

// Close connection
client.close()

Replication Usage

// Create client with replication options
options = {
  endpoint: "kevo-replica:50051",  // Connect to any node (primary or replica)
  discoverTopology: true,          // Automatically discover node role
  autoRouteWrites: true,           // Route writes to primary
  autoRouteReads: true             // Distribute reads to replicas when possible
}
client = new KevoClient(options)

// Connect and discover topology
client.connect()

// Get node role information
nodeInfo = client.getNodeInfo()
console.log("Connected to " + nodeInfo.role + " node")

if (nodeInfo.role == "primary") {
  console.log("This node has " + nodeInfo.replicas.length + " replicas")
} else if (nodeInfo.role == "replica") {
  console.log("Primary node is at " + nodeInfo.primaryAddress)
}

// Operations automatically routed to appropriate nodes
client.put("key1", "value1")    // Routed to primary
value = client.get("key1")      // May be routed to a replica if available

// Different routing behavior can be explicitly set
value = client.get("key2", { preferReplica: false })  // Force primary read

// Manual routing for advanced use cases
client.withPrimary(function(primary) {
  // These operations are executed directly on the primary
  primary.get("key3")
  primary.put("key4", "value4")
})

// Close all connections
client.close()

Testing Your SDK

When testing your SDK implementation, consider these scenarios:

Basic Operations: Simple get, put, delete operations
Concurrency: Multiple concurrent operations
Error Handling: Server errors, timeouts, network issues
Connection Management: Reconnection after server restart
Large Data: Large keys and values, many operations
Transactions: ACID properties, concurrent transactions
Performance: Throughput, latency, resource usage
Replication:
- Node role discovery
- Write redirection from replica to primary
- Read distribution to replicas
- Connection handling when nodes are unavailable
- Read-after-write scenarios with potential replica lag

Conclusion

When implementing a Kevo client SDK, focus on providing an idiomatic experience for the target language while correctly handling the underlying gRPC communication details. The goal is to make the client API intuitive for developers familiar with the language, while ensuring correct and efficient interaction with the Kevo server.
