InterviewStack.io

Python Data Structures and Algorithms Questions

Core Python data structure and algorithm knowledge used for manipulating collections and solving common data processing problems. Candidates should know built-in types such as lists, dictionaries, sets, and tuples and their performance characteristics; be able to implement and reason about searching, sorting, counting, deduplication, and frequency analysis tasks; and choose appropriate algorithms and data structures for time and space efficiency. Familiarity with Python standard library utilities such as collections.Counter, defaultdict, deque, and heapq is expected, as is writing clear, Pythonic code that handles edge cases. Questions may cover algorithmic trade-offs, complexity analysis, and applying these techniques to practical data manipulation problems where custom logic is required beyond what pandas or NumPy provide.

Medium | Technical
Implement a RunningMedian class in Python with methods add(num: float) and median() -> float using two heaps (a max-heap for the lower half and a min-heap for the upper half). Ensure correctness with duplicate values and explain the time complexity of add and median operations.
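One possible shape for an answer is sketched below. It assumes the standard trick of storing negated values so that heapq, which only provides a min-heap, can act as a max-heap for the lower half. The attribute names `_lower` and `_upper` and the choice to raise `ValueError` on an empty stream are illustrative assumptions, not part of the question.

```python
import heapq


class RunningMedian:
    """Track the median of a stream using two heaps."""

    def __init__(self) -> None:
        self._lower: list[float] = []  # max-heap of the lower half (values negated)
        self._upper: list[float] = []  # min-heap of the upper half

    def add(self, num: float) -> None:
        # Route the new value through the lower heap so that the largest
        # element of the lower half always moves up into the upper half.
        heapq.heappush(self._lower, -num)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        # Rebalance so the lower half holds either the same number of
        # elements as the upper half or exactly one more.
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def median(self) -> float:
        if not self._lower:
            raise ValueError("median of an empty stream is undefined")
        if len(self._lower) > len(self._upper):
            return float(-self._lower[0])
        return (-self._lower[0] + self._upper[0]) / 2.0
```

With this layout, add costs O(log n) because it performs a constant number of heap pushes and pops, and median costs O(1) because it only reads the heap roots. Duplicates need no special handling: equal values can sit in either half without breaking the invariant that every element of the lower half is less than or equal to every element of the upper half.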
Medium | Technical
You're building a token-to-id vocabulary for a transformer model from terabytes of raw text that don't fit into memory. Describe and sketch Python code for a streaming, memory-efficient pipeline that: reads files line-by-line, tokenizes, counts frequencies incrementally, prunes rare tokens, and writes a stable vocabulary file. Discuss sharding, external counting, and parallelization approaches.
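A minimal sketch of the counting and vocabulary-building steps follows. Several things here are assumptions made for illustration: whitespace tokenization, hash-based sharding with `zlib.crc32` so that each worker only holds the counts for its own slice of the token space, the special tokens `<pad>` and `<unk>`, and the tab-separated output format. All function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield tokens lazily; whitespace splitting stands in for a real tokenizer."""
    for line in lines:
        yield from line.split()


def count_shard(lines: Iterable[str], shard: int, num_shards: int) -> Counter:
    """Count only the tokens whose hash falls into this shard.

    crc32 is used instead of hash() because str hashing is randomized
    per process, which would break sharding across workers.
    """
    counts: Counter = Counter()
    for tok in iter_tokens(lines):
        if zlib.crc32(tok.encode("utf-8")) % num_shards == shard:
            counts[tok] += 1
    return counts


def build_vocab(
    counts: Counter,
    min_count: int,
    specials: Tuple[str, ...] = ("<pad>", "<unk>"),
) -> Dict[str, int]:
    """Prune rare tokens and assign ids in a deterministic order."""
    kept = [(t, c) for t, c in counts.items() if c >= min_count and t not in specials]
    # Sort by descending count, then by token, so reruns give identical ids.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    tokens = list(specials) + [t for t, _ in kept]
    return {tok: idx for idx, tok in enumerate(tokens)}


def write_vocab(vocab: Dict[str, int], fh: TextIO) -> None:
    """Write one 'token<TAB>id' pair per line, ordered by id."""
    for tok, idx in sorted(vocab.items(), key=lambda kv: kv[1]):
        fh.write(f"{tok}\t{idx}\n")
```

In a real pipeline `lines` would be an open file handle (iterated line by line, so the file is never fully loaded), each shard would run in its own process, and the per-shard counters would be merged or written out separately. Because every token maps to exactly one shard, the shard outputs never need their counts reconciled with each other.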
Hard | Technical
Compare storage and access strategies for very large embedding tables (hundreds of millions of vectors) in Python-based systems: in-memory NumPy arrays, memory-mapped NumPy (memmap), on-disk key-value stores (LMDB/RocksDB), and approximate index libraries (FAISS/Annoy). For each, describe read latency, memory usage, update capabilities, and suitability for nearest-neighbor search vs training.
Medium | Technical
In a machine learning preprocessing pipeline you must represent sparse, high-dimensional feature vectors for millions of examples in Python. Compare representation options: dict mapping index->value, list-of-(index,value) tuples, and scipy.sparse CSR/CSC. Explain how to implement efficient dot product between two sparse vectors and strategies to batch-convert to dense for GPU training.
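The dot-product part of this question can be sketched for the two pure-Python representations as shown below. The function names are hypothetical, and the list-of-tuples version assumes each list is sorted by index with no repeated indices.

```python
from __future__ import annotations


def dot_dict(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two index->value dicts.

    Iterating over the smaller dict and probing the larger one keeps the
    cost at O(min(len(a), len(b))) expected hash lookups.
    """
    if len(a) > len(b):
        a, b = b, a
    return sum((v * b[i] for i, v in a.items() if i in b), 0.0)


def dot_sorted(a: list[tuple[int, float]], b: list[tuple[int, float]]) -> float:
    """Dot product of two index-sorted (index, value) lists via a two-pointer merge.

    Runs in O(len(a) + len(b)) with no hashing.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:
            total += va * vb
            i += 1
            j += 1
        elif ia < ib:
            i += 1
        else:
            j += 1
    return total
```

The dict form suits random access and incremental updates, while the sorted-list form is closer to how CSR stores each row (parallel index and value arrays), which is one reason the merge-style loop tends to carry over naturally to array-based implementations.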
Easy | Technical
Explain how the heapq module implements a binary heap in Python. Give the time complexity of heappush, heappop, heapify, and demonstrate two ways to obtain the k largest elements from a list: using heapq.nlargest and by maintaining a fixed-size heap of size k. Discuss memory and time trade-offs for each approach.
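The two approaches named in the question could look roughly like this. The function names are hypothetical, and the fixed-size version assumes the input may be any iterable, including a stream too large to hold in memory.

```python
import heapq
from typing import Iterable, List


def k_largest_nlargest(values: Iterable[float], k: int) -> List[float]:
    """Delegate to the standard library; results come back in descending order."""
    return heapq.nlargest(k, values)


def k_largest_fixed_heap(values: Iterable[float], k: int) -> List[float]:
    """Keep a min-heap of at most k elements while scanning the input once."""
    if k <= 0:
        return []
    heap: List[float] = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # heap[0] is the smallest of the current top k; replace it.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

For reference, heappush and heappop are O(log n) and heapify is O(n). The fixed-size heap runs in O(n log k) time with O(k) extra memory, which matters when the input is a stream. heapq.nlargest is the more readable choice for an in-memory list; the Python documentation notes that it performs best for small k and that sorted() is usually faster when k is close to the size of the input.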
