Distributed Systems Security Questions

Security considerations and patterns for distributed systems and multi-service environments. Topics include service-to-service authentication and authorization, key management and secret rotation at scale, implications of eventual consistency for access-control decisions, securing inter-service communication, distributed logging and auditing, handling security during partial failures and network partitions, Byzantine fault-tolerant scenarios and the impact of consensus on security, tradeoffs between availability, confidentiality, and integrity across regions, and designing resilient defenses for systems that span multiple data centers or organizational boundaries.

Hard | System Design

Design a certificate-pinning strategy for inter-service TLS to prevent fraudulent CA-issued certificates from being trusted. Requirements: allow occasional CA rotation without breaking services, provide rollout and emergency unpin mechanisms, and minimize operational risk of accidental outages. Describe pin distribution, verification algorithm, fallback, and testing approach.

Medium | Technical

A malicious library was introduced into your build pipeline, potentially compromising build artifacts that were promoted to production. As the SRE responsible for platform security, outline immediate containment (stop pipelines, isolate images), forensic measures (comparing against reproducible builds, tracing binary provenance), and longer-term supply-chain defenses: artifact signing, reproducible builds, provenance attestations (e.g., SLSA), and least-privilege build agents.

Medium | Technical

Implement a thread-safe sliding-window rate limiter in Go to protect login endpoints from brute-force attacks. The limiter should expose:

type Limiter struct { /* ... */ }
func NewLimiter(max int, window time.Duration) *Limiter
func (l *Limiter) Allow(key string) bool

Requirements: per-key limits, reasonable memory usage, concurrency-safety for many goroutines. Focus on algorithm (bucketized counters or sliding logs) and correctness under concurrent access.

Hard | Technical

An employee with privileged access exfiltrates service account keys that allow access to several production services. As the SRE on-call lead, describe detection signals (abnormal API usage, unexpected IPs, mass listing operations), immediate remediation (revoke keys, rotate credentials, isolate systems), evidence gathering for legal teams, and long-term mitigations (just-in-time access, stronger attestation, split keys, and session-based ephemeral credentials).

Hard | System Design

Design a zero-trust architecture that spans multiple organizations and clusters: 4 Kubernetes clusters, 3 cloud providers, and several third-party SaaS integrations. Requirements: per-service identity, cross-org trust federation, least-privilege access, full auditability, and the ability to revoke cross-org access within 30 seconds. Describe trust model (federated vs centralized CA), certificate issuance and attestation flows, policy enforcement points, and the SRE operational model for rotation, emergency revocation, and audits.
