Data Structure Selection and Trade Offs Questions

Skill in selecting appropriate data structures and algorithmic approaches for practical problems and performance constraints. Candidates should demonstrate how to choose between arrays lists maps sets trees heaps and specialized structures based on access patterns memory and CPU requirements and concurrency considerations. Coverage includes case based selection for domain specific systems such as games inventory or spatial indexing where structures like quadtrees or spatial hashing are appropriate, and language specific considerations such as value versus reference types or object pooling. Emphasis is on explaining rationale trade offs and expected performance implications in concrete scenarios.

HardSystem Design

0 practiced

Design a globally-distributed per-user rate limiter that enforces 100 requests per minute per user across multiple regions with eventual consistency tolerances. Discuss data structure choices for counters (strict counters, leaky bucket, token bucket, approximate counters), where to store state (Redis, in-process, database), and tradeoffs between accuracy, latency, and cost.

MediumTechnical

0 practiced

Implement the Misra-Gries (frequent items) streaming algorithm in Python. Function signature: def misra_gries(stream: Iterable[str], k: int) -> Dict[str,int]. The stream is too large to store; you may maintain up to k-1 counters. Return the candidate heavy hitters and their counters. Explain limitations of the output.

HardTechnical

0 practiced

A Java-based ETL microservice creates and discards millions of short-lived small objects per second. Explain tradeoffs between using object pooling and ephemeral allocation. Discuss how each approach affects garbage collection behavior, throughput, memory fragmentation, and code complexity. Provide practical guidelines for when to implement pooling.

HardTechnical

0 practiced

You need to build a low-latency similarity search service for 100 million 768-dimensional vectors. Compare LSH, IVF (inverted file), and HNSW index approaches in terms of memory cost, query latency, recall, update cost, and how they fit into an ETL pipeline that periodically rebuilds indexes. Which would you choose for 99th percentile latency under 20 ms and why?

HardTechnical

0 practiced

For a wide analytic table that receives frequent appends and occasional point updates, compare copy-on-write (COW) vs merge-on-read (MOR) architectures (e.g., COW Parquet vs MOR Delta/ Hudi). Discuss data structure and file layout implications, query latency for fresh data, and maintenance costs for compaction and vacuuming.

Unlock Full Question Bank

Get access to hundreds of Data Structure Selection and Trade Offs interview questions and detailed answers.

Join thousands of developers preparing for their dream job.