InterviewStack.io

Data Consistency and Recovery Questions

Covers the spectrum of data consistency models used in distributed systems and the operational practices for detecting and recovering from inconsistency. Topics include the strong guarantees provided by ACID (atomicity, consistency, isolation, durability) transactions and synchronous replication, and weaker models such as eventual consistency and causal consistency, along with read guarantees such as read-your-writes and monotonic reads. Explain the trade-offs between consistency, availability, and latency, and how those trade-offs influence architecture decisions, user experience, and cost. Discuss replication strategies, including synchronous replication, asynchronous replication, and read replicas, and how replication modes affect staleness and failure behavior. Include coordination and consensus mechanisms for achieving stronger guarantees, for example leader-based replication and consensus protocols, and distributed transaction approaches such as two-phase commit. Cover operational concerns: how consistency choices change testing, deployment, monitoring, and incident response. Describe detection and recovery techniques for inconsistency, such as validation checks, reconciliation and anti-entropy processes, tombstones and conflict resolution strategies, vector clocks or conflict-free replicated data types (CRDTs) to resolve concurrent updates, point-in-time recovery and backups, and procedures for partial repairs, rollbacks, and replays. At senior levels, also address how consistency decisions shape runbooks, alerting, and post-incident analysis.
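
As one illustration of the vector clocks mentioned above, the following is a minimal sketch of how two clocks might be compared to decide whether one update happened before another or whether the two are concurrent and need conflict resolution. The function name `compare_clocks` and the dict-of-counters representation are assumptions made for this example, not part of any particular database's API.

```python
from typing import Dict


def compare_clocks(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Compare two vector clocks (hypothetical representation: node id -> counter).

    Returns "before" if a happened before b, "after" if b happened before a,
    "equal" if they are identical, and "concurrent" otherwise.
    A node missing from a clock is treated as having counter 0.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the updates are concurrent and must be
    # reconciled by some application-level or CRDT merge rule.
    return "concurrent"
```

Only the "concurrent" result calls for conflict resolution; in the other cases the later version can simply replace the earlier one.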

Hard · Technical
Explain the CAP theorem and its practical implications for SRE decisions. Give three concrete architecture choices where you would sacrifice consistency for availability or vice versa, and quantify the latency and cost impacts of each choice.
Medium · Technical
Implement a last-write-wins (LWW) register merge function in Python that accepts (value, timestamp, node_id) tuples and defines deterministic tie-breaking for equal timestamps. Describe how you would mitigate clock skew problems in production and what consequences they have for correctness.
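
A minimal sketch of one possible answer, not a reference solution: the merge picks the tuple with the larger timestamp and, on equal timestamps, the larger `node_id`. The names `Stamped` and `lww_merge` are hypothetical, and the sketch assumes that a single node never issues two different writes with the same timestamp.

```python
from typing import Any, NamedTuple


class Stamped(NamedTuple):
    """A register state as described in the question (names are illustrative)."""
    value: Any
    timestamp: float
    node_id: str


def lww_merge(a: Stamped, b: Stamped) -> Stamped:
    """Return the winning state under last-write-wins.

    Ordering is by (timestamp, node_id): the higher timestamp wins, and equal
    timestamps are broken by the lexicographically larger node_id. Because both
    replicas apply the same total order, the merge is commutative, associative,
    and idempotent, provided (timestamp, node_id) pairs are unique per write.
    """
    if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id):
        return a
    return b
```

Because the winner depends only on wall-clock timestamps, a node whose clock runs fast can overwrite writes that actually happened later, and the lost update is silent. Typical mitigations to discuss include monitoring and bounding clock skew, or replacing wall-clock timestamps with logical or hybrid logical clocks.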
Medium · Behavioral
Behavioral: Tell me about a time you were on call for a system that experienced data inconsistency. Use the STAR method to describe the situation, what you discovered, the actions you took to contain and fix the issue, how you prevented recurrence, and what you learned as an SRE.
Medium · Technical
You're asked to design an anti-entropy repair process for a distributed key-value store that runs repeatedly without impacting foreground traffic. Explain design choices: Merkle trees versus hashing ranges, granularity of work units, scheduling windows, backoff strategies, and how to surface progress to SRE dashboards.
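
To make the "hashing ranges" option concrete, here is a small sketch, under simplifying assumptions, of how two replicas might compare per-bucket digests so that only mismatched buckets need repair. It models each replica as an in-memory dict and uses the hypothetical helpers `bucket_of`, `range_digests`, and `mismatched_buckets`; a Merkle tree would add parent hashes over these buckets so that matching subtrees can be skipped in one comparison.

```python
import hashlib
from typing import Dict, List


def bucket_of(key: str, num_buckets: int) -> int:
    """Map a key to a bucket using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def range_digests(store: Dict[str, str], num_buckets: int) -> List[str]:
    """Compute one digest per bucket over the bucket's sorted key/value pairs."""
    hashers = [hashlib.sha256() for _ in range(num_buckets)]
    for key in sorted(store):  # sorted so both replicas hash in the same order
        hasher = hashers[bucket_of(key, num_buckets)]
        for field in (key, store[key]):
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            hasher.update(len(data).to_bytes(4, "big"))
            hasher.update(data)
    return [h.hexdigest() for h in hashers]


def mismatched_buckets(
    a: Dict[str, str], b: Dict[str, str], num_buckets: int = 16
) -> List[int]:
    """Return indices of buckets whose digests differ between two replicas."""
    da = range_digests(a, num_buckets)
    db = range_digests(b, num_buckets)
    return [i for i in range(num_buckets) if da[i] != db[i]]
```

Each mismatched bucket is a natural unit of repair work: the bucket count sets the granularity, and a scheduler can process buckets one at a time, backing off when foreground latency rises.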
Hard · Technical
Discuss how encryption (at rest, in transit, and application-level end-to-end encryption) can complicate consistency checks and recovery processes. Provide concrete examples where encrypted payloads prevent Merkle-tree comparisons or make deduplication and tombstone processing harder, and propose mitigations.
