InterviewStack.io

Python Data Structures and Algorithms Questions

Core Python data structure and algorithm knowledge used for manipulating collections and solving common data processing problems. Candidates should know built-in types such as lists, dictionaries, sets, and tuples and their performance characteristics; be able to implement and reason about searching, sorting, counting, deduplication, and frequency analysis tasks; and choose appropriate algorithms and data structures for time and space efficiency. Familiarity with Python standard library utilities such as collections.Counter, defaultdict, deque, and heapq is expected, as is writing clear, Pythonic code that handles edge cases. Questions may include algorithmic trade-offs, complexity analysis, and applying these techniques to practical data manipulation problems where custom logic is required beyond what pandas or NumPy provide.

Medium · Technical
36 practiced
Design a memory-efficient Python pipeline to deduplicate a 200GB CSV file on a key column using an 8GB RAM machine. The pipeline should read chunks, hash-partition rows to disk, deduplicate partitions in memory, and write final output. Provide pseudo-code and discuss how to choose partition count, handle failures, and rerun without reprocessing successfully completed partitions.
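
One possible shape for an answer, sketched with the standard library only. The function names (`partition_rows`, `dedupe_partition`), the `part-NNNNN.csv` file layout, and the "keep the first row seen per key" policy are assumptions made for illustration, not part of the question.

```python
import csv
import os
import zlib


def partition_rows(src_path, out_dir, key, n_parts):
    """Pass 1: stream the CSV once and route each row to a partition file."""
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fields = reader.fieldnames
        files = [
            open(os.path.join(out_dir, f"part-{i:05d}.csv"), "w", newline="")
            for i in range(n_parts)
        ]
        try:
            writers = [csv.DictWriter(f, fieldnames=fields) for f in files]
            for writer in writers:
                writer.writeheader()
            for row in reader:
                # crc32 is stable across processes and runs, unlike the
                # built-in hash() on str, so reruns route keys identically.
                idx = zlib.crc32(row[key].encode("utf-8")) % n_parts
                writers[idx].writerow(row)
        finally:
            for f in files:
                f.close()


def dedupe_partition(part_path, done_path, key):
    """Pass 2: dedupe one partition in memory, keeping the first row per key."""
    if os.path.exists(done_path):
        return  # finished in an earlier run; skip on rerun
    seen = set()
    tmp_path = done_path + ".tmp"
    with open(part_path, newline="") as src, open(tmp_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] not in seen:
                seen.add(row[key])
                writer.writerow(row)
    # The rename happens only after a complete write, so the presence of
    # done_path doubles as the "partition completed" marker.
    os.replace(tmp_path, done_path)
```

Because all rows with the same key land in the same partition, deduplicating each partition independently is enough. The partition count would be chosen so that the distinct keys of the largest partition fit comfortably in RAM, while staying below the operating system's open-file limit; skewed keys may call for more partitions than a simple size ratio suggests. The sketch makes only pass 2 resumable; pass 1 would need its own completion marker.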
Hard · System Design
16 practiced
Design a distributed LRU caching strategy for a fleet of Python data-processing workers where each worker has a local in-memory cache and Redis is available as a shared store. Explain consistency models (eventual vs strong), eviction coordination, cache-aside vs write-through, warm-up/rehydration, and how to minimize cross-worker cache misses and network overhead.
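
As a small starting point for the cache-aside part of this question, the sketch below layers a per-worker LRU over a shared store. `LocalLRU` and `cache_aside_get` are hypothetical names, and the shared store is modelled as a plain dict standing in for Redis; a real deployment would swap in a Redis client and add TTLs or invalidation, which this sketch does not attempt.

```python
from collections import OrderedDict


class LocalLRU:
    """Per-worker in-memory LRU cache with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


_MISSING = object()


def cache_aside_get(key, local, shared, load):
    """Look up local cache, then the shared store, then the source of truth.

    `shared` is any dict-like mapping (an assumption standing in for Redis);
    `load` is a hypothetical callable that fetches the value on a full miss.
    """
    value = local.get(key, _MISSING)
    if value is not _MISSING:
        return value
    if key in shared:
        value = shared[key]
    else:
        value = load(key)
        shared[key] = value  # populate the shared tier for other workers
    local.put(key, value)
    return value
```

With this layout each worker evicts independently, so consistency is eventual at best: a value updated in the shared store stays stale in a worker's local cache until it is evicted or explicitly invalidated.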
Hard · Technical
24 practiced
Given a Python ETL job with nested loops over lists of dictionaries that is CPU-bound, outline a practical profiling and optimization plan. Include how to use profilers (cProfile, line_profiler), algorithmic refactoring (using dict/set for O(1) lookups), replacing Python loops with built-ins, and when to use Cython or rewrite hotspots in C/NumPy. Provide code examples showing a nested-loop -> hash-join refactor.
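
The nested-loop to hash-join refactor the question asks for might look like the following. The record shapes (`orders` with a `customer_id` field, `customers` with `id` and `name`) are invented for illustration.

```python
from collections import defaultdict


def join_nested(orders, customers):
    """Baseline: O(len(orders) * len(customers)) comparisons."""
    out = []
    for o in orders:
        for c in customers:
            if o["customer_id"] == c["id"]:
                out.append({**o, "customer_name": c["name"]})
    return out


def join_hashed(orders, customers):
    """Hash join: build an index once, then probe it per order."""
    by_id = defaultdict(list)
    for c in customers:
        by_id[c["id"]].append(c)  # keeps duplicate ids, in input order
    return [
        {**o, "customer_name": c["name"]}
        for o in orders
        # .get avoids inserting empty lists for unmatched ids
        for c in by_id.get(o["customer_id"], ())
    ]
```

The hashed version does O(len(orders) + len(customers)) work plus the size of the output, at the cost of holding the index in memory. Before refactoring, a run such as `python -m cProfile -s cumtime script.py` (where `script.py` is a placeholder for the job's entry point) helps confirm that the nested loop is the actual hotspot.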
Easy · Technical
19 practiced
In Python, implement a function flatten(list_of_lists: Iterable[Iterable[int]]) -> List[int] that flattens a two-level nested collection into a single list. The solution should handle empty sublists and very large inputs; additionally provide a generator-based lazy version that yields elements one-by-one. Explain time and space complexity for both approaches and when you'd prefer the generator in a data pipeline.
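
A minimal sketch of both variants. The name `iter_flatten` for the lazy version is an assumption; the question only fixes the name and signature of `flatten`.

```python
from itertools import chain
from typing import Iterable, Iterator, List


def flatten(list_of_lists: Iterable[Iterable[int]]) -> List[int]:
    """Eager version: materializes every element into one list."""
    return list(chain.from_iterable(list_of_lists))


def iter_flatten(list_of_lists: Iterable[Iterable[int]]) -> Iterator[int]:
    """Lazy version: yields elements one by one, never building a full list."""
    for sub in list_of_lists:
        yield from sub  # empty sublists simply yield nothing
```

Both run in O(n) time for n total elements. The eager version needs O(n) extra space for the result, while the generator needs O(1) extra space beyond the input itself, which makes it the better fit when a downstream pipeline stage consumes elements as a stream or when the input is too large to hold twice in memory.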
Easy · Technical
23 practiced
Using collections.Counter, implement top_n(items: Iterable[str], n: int) -> List[Tuple[str,int]] that returns the top-n frequent items. Explain how Counter.most_common handles ties and describe a deterministic tie-breaking strategy (e.g., secondary sort by item). Discuss performance implications for very large inputs.
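
One way to get deterministic ties is sketched below. `Counter.most_common` orders items with equal counts by first encounter, so its output depends on input order; here ties are instead broken by the item itself, in ascending order, which is one possible policy among several.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def top_n(items: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent items; ties broken by item, ascending."""
    if n <= 0:
        return []
    counts = Counter(items)  # one pass; memory grows with distinct items
    # Smallest by (-count, item) means highest count first, then
    # alphabetical order among items that share a count.
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Counting takes O(m) time for m input items, and the selection step avoids a full sort of the distinct items when n is small. Memory is proportional to the number of distinct items rather than the input length, so the function can consume a generator over a very large input as long as the distinct keys fit in RAM.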
