InterviewStack.io

Data Consistency and Distributed Transactions Questions

In-depth focus on data consistency models and practical approaches to maintaining correctness across distributed components. Covers strong consistency models including linearizability and serializability, causal consistency, eventual consistency, and the implications of each for replication, latency, and user experience. Discusses CAP theorem implications for consistency choices, idempotency, exactly-once and at-least-once semantics, concurrency control and isolation levels, handling race conditions and conflict resolution, and concrete patterns for coordinating updates across services such as two-phase commit, three-phase commit, and the saga pattern with compensating transactions. Also includes operational challenges like retries, timeouts, ordering, clocks and monotonic timestamps, trade-offs between throughput and consistency, and when eventual consistency is acceptable versus when strong consistency is required for correctness (for example, financial systems versus social feeds).

Medium · Technical
Compare Apache Flink and Spark Structured Streaming in how they implement exactly-once processing guarantees. Discuss checkpointing mechanisms, state snapshots, fault tolerance, sink semantics (idempotent sinks vs two-phase-commit), and how each framework's design affects latency, state size, and operational complexity for production data pipelines.
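
A concise way to anchor the sink-semantics part of an answer is an idempotent (upsert) sink, one of the two strategies the question contrasts. The sketch below is framework-agnostic: the class and method names (KeyValueSink, apply_batch) are illustrative rather than Flink or Spark API, and an in-memory dict stands in for an external store.

```python
# Minimal sketch of an idempotent (upsert) sink. Replaying a batch after
# checkpoint recovery rewrites the same keys with the same values, so
# at-least-once delivery becomes effectively-once for readers of the store.

class KeyValueSink:
    def __init__(self):
        self.store = {}  # stand-in for an external key-value store / database

    def apply_batch(self, records):
        # records: iterable of (key, value); the key must be derived
        # deterministically from the event (e.g. order_id), never from
        # processing time, or replays would create new rows.
        for key, value in records:
            self.store[key] = value  # upsert: duplicates are harmless


# Replaying the same batch (as happens when a job restarts from its last
# checkpoint) leaves the store unchanged.
sink = KeyValueSink()
batch = [("order-42", {"status": "PAID"}), ("order-43", {"status": "NEW"})]
sink.apply_batch(batch)
sink.apply_batch(batch)  # duplicate delivery after recovery
assert sink.store["order-42"] == {"status": "PAID"}
```
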
Medium · Technical
In a microservices architecture, Service A writes an order and emits an event; Service B consumes the event and decrements inventory. Users report occasional oversells. Describe likely root causes (ordering, retries, race conditions, duplicate events) and propose a design to eliminate oversells without global locking. Include patterns such as reservations, idempotent consumers, outbox/CDC, and saga/compensation strategies.
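
Two of the named patterns, an idempotent consumer plus a conditional decrement, can be sketched as below. The SQLite schema, table names, and event ids are invented for illustration; a real design would pair this consumer-side guard with an outbox/CDC pipeline on the producer side.

```python
# Hedged sketch: dedupe on event id (idempotent consumer) and guard the
# decrement with a WHERE clause so concurrent consumers cannot drive stock
# below zero. Both writes share one transaction.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, available INTEGER NOT NULL);
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
""")
with conn:
    conn.execute("INSERT INTO inventory VALUES ('sku-1', 1)")

def handle_order_event(event_id: str, sku: str, qty: int) -> bool:
    """Returns True if inventory was decremented; False if the event was a
    duplicate or stock was insufficient (caller rejects or compensates)."""
    with conn:
        # Idempotent consumer: a redelivered event id is silently ignored.
        if conn.execute("SELECT 1 FROM processed_events WHERE event_id = ?",
                        (event_id,)).fetchone():
            return False
        conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
        # Conditional decrement: the WHERE clause is the oversell guard.
        cur = conn.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE sku = ? AND available >= ?",
            (qty, sku, qty),
        )
        return cur.rowcount == 1

assert handle_order_event("evt-1", "sku-1", 1) is True
assert handle_order_event("evt-1", "sku-1", 1) is False  # duplicate delivery
assert handle_order_event("evt-2", "sku-1", 1) is False  # would oversell
```
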
Hard · Technical
Explain Hybrid Logical Clocks (HLC): how they combine wall-clock time with logical counters to ensure monotonic timestamps in geo-distributed systems. Provide a concrete example of using HLC to timestamp events to achieve causal ordering in an event store, and discuss how HLC handles clock skew and ordering guarantees compared to pure logical or pure physical clocks.
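
The update rules at the heart of an answer fit in a short sketch: a timestamp is a (wall, counter) pair, bumped on local/send events and merged on receive. This is a minimal Python rendering of the standard HLC rules, with the event-store integration and clock-skew bounds left out.

```python
import time
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class HLCTimestamp:
    wall: int     # max physical time observed so far (ms since epoch)
    counter: int  # logical counter; breaks ties when wall time stalls or skews

class HLC:
    def __init__(self, now_ms=lambda: int(time.time() * 1000)):
        self.now_ms = now_ms
        self.last = HLCTimestamp(0, 0)

    def send(self) -> HLCTimestamp:
        """Timestamp a local event or outgoing message; always monotonic."""
        pt = self.now_ms()
        if pt > self.last.wall:
            self.last = HLCTimestamp(pt, 0)
        else:
            # Physical clock stalled or went backwards: bump the counter.
            self.last = HLCTimestamp(self.last.wall, self.last.counter + 1)
        return self.last

    def receive(self, remote: HLCTimestamp) -> HLCTimestamp:
        """Merge a remote timestamp so causally later events sort later."""
        pt = self.now_ms()
        wall = max(pt, self.last.wall, remote.wall)
        if wall == self.last.wall == remote.wall:
            counter = max(self.last.counter, remote.counter) + 1
        elif wall == self.last.wall:
            counter = self.last.counter + 1
        elif wall == remote.wall:
            counter = remote.counter + 1
        else:
            counter = 0
        self.last = HLCTimestamp(wall, counter)
        return self.last
```

Because HLCTimestamp orders lexicographically on (wall, counter), events written to an event store with these timestamps sort consistently with causality even when nodes' physical clocks disagree.
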
Hard · System Design
Design a global ID generation scheme that provides collision-free IDs across regions with approximate monotonicity per region and no central coordinator. Discuss design options such as Snowflake-like identifiers (timestamp+node+sequence), HLC-based IDs, and trade-offs for clock skew, throughput, sortability in downstream systems, and reclaiming or rollover strategies.
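
One of the options, a Snowflake-like generator, can be sketched in a few lines. The bit widths (timestamp / 10-bit node / 12-bit sequence) and the custom epoch below are illustrative choices rather than a fixed standard, and the clock-skew handling is deliberately simplistic.

```python
import time
import threading

CUSTOM_EPOCH_MS = 1_600_000_000_000  # assumed epoch; chosen once, never changed
NODE_BITS, SEQ_BITS = 10, 12
MAX_SEQ = (1 << SEQ_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, node_id: int):
        assert 0 <= node_id < (1 << NODE_BITS)
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            if now < self.last_ms:
                # Clock moved backwards: refuse to reuse timestamps rather than
                # risk collisions; real systems wait, alarm, or fall back to HLC.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & MAX_SEQ
                if self.seq == 0:
                    # Sequence exhausted within this millisecond: spin to the next.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            # timestamp | node | sequence: sortable per node, unique across nodes.
            return (now << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self.seq

gen = SnowflakeGenerator(node_id=3)
ids = [gen.next_id() for _ in range(5)]
assert ids == sorted(set(ids))  # unique and monotonically increasing per node
```
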
Medium · Technical
As a senior data engineer, you must lead the team decision between adopting sagas or distributed transactions (e.g., 2PC) for cross-service updates. Describe the evaluation criteria you would use (SLOs, failure modes, operational complexity, throughput, recovery time), experiments or proof-of-concepts you'd run, and how you'd present a recommendation and migration plan to engineering and product stakeholders.
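
A proof-of-concept for the saga option might start from something as small as the orchestrator below: run steps in order and, on failure, run the compensations of already completed steps in reverse. The step names and the simulated failure are invented for the demo; a production version would persist saga state and retry compensations.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation) callables.
    Returns True on success; on failure, compensates completed steps (LIFO)."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done, undo in reversed(completed):
                undo()  # compensating transaction; must itself be idempotent
            return False
    return True

# Example: reserve inventory, then charging payment fails, so the
# reservation is released (the failed step itself is never compensated).
log = []

def reserve_inventory(): log.append("reserved")
def release_inventory(): log.append("released")
def charge_payment():    raise RuntimeError("card declined")  # simulated failure
def refund_payment():    log.append("refunded")

steps = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_payment", charge_payment, refund_payment),
]
assert run_saga(steps) is False
assert log == ["reserved", "released"]
```
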
