Distributed Systems Security Questions

Security considerations and patterns for distributed systems and multi-service environments. Topics include service-to-service authentication and authorization, key management and secret rotation at scale, implications of eventual consistency for access control decisions, securing inter-service communication, distributed logging and auditing, handling security during partial failures and network partitions, Byzantine fault-tolerant scenarios and the impact of consensus on security, trade-offs among availability, confidentiality, and integrity across regions, and designing resilient defenses for systems spanning multiple data centers or organizational boundaries.

Hard | Technical
Compare opaque tokens that require introspection with self-contained JWTs in a global microservices environment that experiences intermittent network partitions. Discuss revocation complexity, cacheability, introspection latency, consistency of revocation decisions, and proposed architectures for both approaches. Provide guidelines for SREs on when to prefer opaque tokens vs JWTs based on trust boundaries and SLOs.
Easy | Technical
Scenario: A developer accidentally included production database credentials in a container image, and the image was deployed to production. As the on-call SRE, list your immediate containment steps, how you would rotate and replace the credentials with zero or low downtime, what forensic evidence you would collect, and the longer-term process and automation changes that would prevent recurrence (e.g., image scanning, secret detection, pipeline changes).
Medium | Technical
Compare CRLs and OCSP for certificate revocation at scale. Explain operational challenges for high-QPS services, how OCSP stapling and OCSP responders affect latency and revocation timeliness, the impact of caching on revocation windows, and SRE choices to balance revocation promptness and performance (e.g., short-lived certs, stapling, local responders).
Easy | Technical
Why are tamper-evident and append-only audit logs important for distributed systems? As an SRE, describe design patterns to make logs tamper-evident across services and regions (e.g., write-once storage, hash chains, Merkle trees, signed events), how to ensure ordering per entity, and operational controls to preserve logs during incidents.
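
To make the hash-chain pattern named in this question concrete, here is a minimal sketch in Python. It assumes an in-memory list as the log and SHA-256 over a canonical JSON encoding. The names `GENESIS`, `append_event`, and `verify_chain` are hypothetical and chosen for illustration, not taken from any particular library.

```python
import hashlib
import json

# Hypothetical fixed starting value for the chain (an assumption of this sketch).
GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited, reordered, or removed interior record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


audit_log = []
append_event(audit_log, {"entity": "user-1", "action": "login"})
append_event(audit_log, {"entity": "user-1", "action": "read-secret"})
chain_ok = verify_chain(audit_log)
```

A chain like this detects modification of existing records but not truncation of the tail, so a fuller answer would also anchor the latest hash somewhere the log writer cannot rewrite (for example, write-once storage or a signed checkpoint).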
Medium | System Design
Design a distributed auditing pipeline that collects security events from 200 services across three regions at a sustained rate of 100k events/sec. Requirements: tamper-evident storage, per-entity ordered audit history, low-latency querying for investigations, 7-year retention, and resilience to partitions. Describe ingestion, transport, storage choices, cryptographic integrity controls (e.g., hash chains, Merkle trees), and how SREs operate restores and audit queries.
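
A rough sizing pass is a useful first step for a design like this. The sketch below works through the arithmetic for the stated rate and retention period. The average event size of 500 bytes is an assumption for illustration only (the question does not give one), and the figure ignores replication, indexes, compression, and leap days.

```python
EVENTS_PER_SEC = 100_000   # sustained rate stated in the question
RETENTION_YEARS = 7        # retention stated in the question
AVG_EVENT_BYTES = 500      # ASSUMPTION: not given in the question

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365        # ignores leap days


def retention_estimate(events_per_sec: int, years: int, avg_event_bytes: int) -> dict:
    """Estimate raw event counts and storage before replication or compression."""
    events_per_day = events_per_sec * SECONDS_PER_DAY
    total_events = events_per_day * DAYS_PER_YEAR * years
    total_bytes = total_events * avg_event_bytes
    return {
        "events_per_day": events_per_day,
        "total_events": total_events,
        "total_bytes": total_bytes,
        "total_petabytes": total_bytes / 1e15,  # decimal petabytes
    }


estimate = retention_estimate(EVENTS_PER_SEC, RETENTION_YEARS, AVG_EVENT_BYTES)
print(estimate)
```

Under these assumptions the pipeline handles about 8.64 billion events per day and roughly 11 PB of raw data over seven years, which points toward tiered storage (a hot, queryable tier plus a cheaper archival tier) rather than a single store.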
