InterviewStack.io LogoInterviewStack.io

Probabilistic Data Structures & Hashing Questions

Covers probabilistic data structures (e.g., Bloom filters, Count-Min Sketch, HyperLogLog, Cuckoo filters) and hashing techniques, including space-time trade-offs, false positive rates, distinct element estimation, and practical applications in databases, caches, streaming analytics, and network systems.

HardSystem Design
0 practiced
Design a hybrid cache invalidation and membership check system where Bloom filters reduce DB reads but the underlying cache supports deletions and TTLs. Describe how you will avoid stale-positive responses from Bloom filters (which might think a deleted key exists) and propose practical mechanisms such as counting filters, versioned filters/epochs, write-through invalidation, or TTL-aligned filter rotations. Analyze memory and latency trade-offs.
HardTechnical
0 practiced
A production RocksDB-backed key-value store uses a Bloom filter to avoid unnecessary SST file lookups. Observability shows the Bloom filter's false positive rate is an order of magnitude higher than expected. List a systematic debugging plan: metrics to collect (bit density, estimated n, k used, hash seeds), likely root causes (serialization/parameter mismatch, corruption, wrong k, unbounded growth of n), and immediate mitigations to reduce impact on latency and DB IO.
MediumTechnical
0 practiced
You manage a data warehouse storing daily partitions of user activity in BigQuery/Redshift. Propose a design to store and query approximate distinct counts using HyperLogLog per partition. Include example schema elements (how the sketch is stored), sample SQL pseudo-code to aggregate HLLs across dates, and an approach to support ad-hoc queries that require exact counts for specific segments.
MediumTechnical
0 practiced
If a Bloom filter uses double hashing (two independent hash functions) to generate k hash values instead of k independent hash functions, analyze how that affects the expected false positive probability. State assumptions and discuss under what circumstances double hashing is an acceptable approximation.
EasyTechnical
0 practiced
Define 'mergeability' for probabilistic sketches. For Bloom filters, Count-Min Sketch, HyperLogLog, and Cuckoo filters, state whether they are mergeable, describe how merging works where applicable, and list any constraints required to merge safely (e.g., same parameters, precision).

Unlock Full Question Bank

Get access to hundreds of Probabilistic Data Structures & Hashing interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.