InterviewStack.io

Data Consistency and Distributed Transactions Questions

In-depth focus on data consistency models and practical approaches to maintaining correctness across distributed components. Covers strong consistency models including linearizability and serializability, causal consistency, eventual consistency, and the implications of each for replication, latency, and user experience. Discusses CAP theorem implications for consistency choices, idempotency, exactly-once and at-least-once semantics, concurrency control and isolation levels, handling race conditions and conflict resolution, and concrete patterns for coordinating updates across services such as two-phase commit, three-phase commit, and the saga pattern with compensating transactions. Also includes operational challenges like retries, timeouts, ordering, clocks and monotonic timestamps, trade-offs between throughput and consistency, and when eventual consistency is acceptable versus when strong consistency is required for correctness (for example, financial systems versus social feeds).

Medium · Technical
Compare Apache Flink and Spark Structured Streaming in how they implement exactly-once processing guarantees. Discuss checkpointing mechanisms, state snapshots, fault tolerance, sink semantics (idempotent sinks vs two-phase-commit), and how each framework's design affects latency, state size, and operational complexity for production data pipelines.
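
A concise way to anchor the sink-semantics part of an answer is an idempotent (upsert) sink, one of the two strategies the question contrasts. The sketch below is framework-agnostic: the class and method names (KeyValueSink, apply_batch) are illustrative rather than Flink or Spark API, and an in-memory dict stands in for an external store.

```python
# Minimal sketch of an idempotent (upsert) sink. Replaying a batch after
# checkpoint recovery rewrites the same keys with the same values, so
# at-least-once delivery becomes effectively-once for readers of the store.

class KeyValueSink:
    def __init__(self):
        self.store = {}  # stand-in for an external key-value store / database

    def apply_batch(self, records):
        # records: iterable of (key, value); the key must be derived
        # deterministically from the event (e.g. order_id), never from
        # processing time, or replays would create new rows.
        for key, value in records:
            self.store[key] = value  # upsert: duplicates are harmless


# Replaying the same batch (as happens when a job restarts from its last
# checkpoint) leaves the store unchanged.
sink = KeyValueSink()
batch = [("order-42", {"status": "PAID"}), ("order-43", {"status": "NEW"})]
sink.apply_batch(batch)
sink.apply_batch(batch)  # duplicate delivery after recovery
assert sink.store["order-42"] == {"status": "PAID"}
```
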
Medium · Technical
In a microservices architecture, Service A writes an order and emits an event; Service B consumes the event and decrements inventory. Users report occasional oversells. Describe likely root causes (ordering, retries, race conditions, duplicate events) and propose a design to eliminate oversells without global locking. Include patterns such as reservations, idempotent consumers, outbox/CDC, and saga/compensation strategies.
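
Two of the named patterns, an idempotent consumer plus a conditional decrement, can be sketched as below. The SQLite schema, table names, and event ids are invented for illustration; a real design would pair this consumer-side guard with an outbox/CDC pipeline on the producer side.

```python
# Hedged sketch: dedupe on event id (idempotent consumer) and guard the
# decrement with a WHERE clause so concurrent consumers cannot drive stock
# below zero. Both writes share one transaction.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, available INTEGER NOT NULL);
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
""")
with conn:
    conn.execute("INSERT INTO inventory VALUES ('sku-1', 1)")

def handle_order_event(event_id: str, sku: str, qty: int) -> bool:
    """Returns True if inventory was decremented; False if the event was a
    duplicate or stock was insufficient (caller rejects or compensates)."""
    with conn:
        # Idempotent consumer: a redelivered event id is silently ignored.
        if conn.execute("SELECT 1 FROM processed_events WHERE event_id = ?",
                        (event_id,)).fetchone():
            return False
        conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
        # Conditional decrement: the WHERE clause is the oversell guard.
        cur = conn.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE sku = ? AND available >= ?",
            (qty, sku, qty),
        )
        return cur.rowcount == 1

assert handle_order_event("evt-1", "sku-1", 1) is True
assert handle_order_event("evt-1", "sku-1", 1) is False  # duplicate delivery
assert handle_order_event("evt-2", "sku-1", 1) is False  # would oversell
```
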
Hard · Technical
Explain Hybrid Logical Clocks (HLC): how they combine wall-clock time with logical counters to ensure monotonic timestamps in geo-distributed systems. Provide a concrete example of using HLC to timestamp events to achieve causal ordering in an event store, and discuss how HLC handles clock skew and ordering guarantees compared to pure logical or pure physical clocks.
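
The update rules at the heart of an answer fit in a short sketch: a timestamp is a (wall, counter) pair, bumped on local/send events and merged on receive. This is a minimal Python rendering of the standard HLC rules, with the event-store integration and clock-skew bounds left out.

```python
import time
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class HLCTimestamp:
    wall: int     # max physical time observed so far (ms since epoch)
    counter: int  # logical counter; breaks ties when wall time stalls or skews

class HLC:
    def __init__(self, now_ms=lambda: int(time.time() * 1000)):
        self.now_ms = now_ms
        self.last = HLCTimestamp(0, 0)

    def send(self) -> HLCTimestamp:
        """Timestamp a local event or outgoing message; always monotonic."""
        pt = self.now_ms()
        if pt > self.last.wall:
            self.last = HLCTimestamp(pt, 0)
        else:
            # Physical clock stalled or went backwards: bump the counter.
            self.last = HLCTimestamp(self.last.wall, self.last.counter + 1)
        return self.last

    def receive(self, remote: HLCTimestamp) -> HLCTimestamp:
        """Merge a remote timestamp so causally later events sort later."""
        pt = self.now_ms()
        wall = max(pt, self.last.wall, remote.wall)
        if wall == self.last.wall == remote.wall:
            counter = max(self.last.counter, remote.counter) + 1
        elif wall == self.last.wall:
            counter = self.last.counter + 1
        elif wall == remote.wall:
            counter = remote.counter + 1
        else:
            counter = 0
        self.last = HLCTimestamp(wall, counter)
        return self.last
```

Because HLCTimestamp orders lexicographically on (wall, counter), events written to an event store with these timestamps sort consistently with causality even when nodes' physical clocks disagree.
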
Hard · System Design
Design a global ID generation scheme that provides collision-free IDs across regions with approximate monotonicity per region and no central coordinator. Discuss design options such as Snowflake-like identifiers (timestamp+node+sequence), HLC-based IDs, and trade-offs for clock skew, throughput, sortability in downstream systems, and reclaiming or rollover strategies.
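
One of the options, a Snowflake-like generator, can be sketched in a few lines. The bit widths (timestamp / 10-bit node / 12-bit sequence) and the custom epoch below are illustrative choices rather than a fixed standard, and the clock-skew handling is deliberately simplistic.

```python
import time
import threading

CUSTOM_EPOCH_MS = 1_600_000_000_000  # assumed epoch; chosen once, never changed
NODE_BITS, SEQ_BITS = 10, 12
MAX_SEQ = (1 << SEQ_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, node_id: int):
        assert 0 <= node_id < (1 << NODE_BITS)
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            if now < self.last_ms:
                # Clock moved backwards: refuse to reuse timestamps rather than
                # risk collisions; real systems wait, alarm, or fall back to HLC.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & MAX_SEQ
                if self.seq == 0:
                    # Sequence exhausted within this millisecond: spin to the next.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            # timestamp | node | sequence: sortable per node, unique across nodes.
            return (now << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self.seq

gen = SnowflakeGenerator(node_id=3)
ids = [gen.next_id() for _ in range(5)]
assert ids == sorted(set(ids))  # unique and monotonically increasing per node
```
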
Medium · Technical
As a senior data engineer, you must lead the team decision between adopting sagas or distributed transactions (e.g., 2PC) for cross-service updates. Describe the evaluation criteria you would use (SLOs, failure modes, operational complexity, throughput, recovery time), experiments or proof-of-concepts you'd run, and how you'd present a recommendation and migration plan to engineering and product stakeholders.
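
A proof-of-concept for the saga option might start from something as small as the orchestrator below: run steps in order and, on failure, run the compensations of already completed steps in reverse. The step names and the simulated failure are invented for the demo; a production version would persist saga state and retry compensations.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation) callables.
    Returns True on success; on failure, compensates completed steps (LIFO)."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done, undo in reversed(completed):
                undo()  # compensating transaction; must itself be idempotent
            return False
    return True

# Example: reserve inventory, then charging payment fails, so the
# reservation is released (the failed step itself is never compensated).
log = []

def reserve_inventory(): log.append("reserved")
def release_inventory(): log.append("released")
def charge_payment():    raise RuntimeError("card declined")  # simulated failure
def refund_payment():    log.append("refunded")

steps = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_payment", charge_payment, refund_payment),
]
assert run_saga(steps) is False
assert log == ["reserved", "released"]
```
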
