InterviewStack.io

Data Processing & Matrix Operations Questions

Covers data processing concepts (ETL/ELT, batch and streaming pipelines, data transformation, quality, and schema design) as well as matrix operations (linear algebra basics such as matrix multiplication, decompositions, eigenvalues, and singular value decomposition) that underpin analytics workloads and ML systems within data engineering & analytics infrastructure.

Medium | Technical
45 practiced
Design an approach to compress and quantize a large embedding table to reduce memory usage from 32-bit floats to 8-bit or 4-bit for inference with minimal accuracy loss. Discuss per-row vs per-tensor quantization, uniform vs learned codebook (k-means) methods, hardware considerations, and how to evaluate the trade-off.
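As a starting point for the per-row vs per-tensor discussion, here is a minimal sketch of symmetric uniform 8-bit quantization in NumPy. The function names `quantize_int8` and `dequantize`, the symmetric range of [-127, 127], and the synthetic table are assumptions made for illustration. The sketch does not cover 4-bit packing or learned k-means codebooks.

```python
import numpy as np

def quantize_int8(table, per_row=True):
    """Symmetric uniform quantization of a 2-D float table to int8 codes."""
    table = np.asarray(table, dtype=np.float32)
    if per_row:
        # One scale per embedding row, shape (rows, 1).
        max_abs = np.abs(table).max(axis=1, keepdims=True)
    else:
        # A single scale shared by the whole table, shape (1, 1).
        max_abs = np.abs(table).max(keepdims=True)
    # Guard against all-zero rows so we never divide by zero.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    codes = np.clip(np.rint(table / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Map int8 codes back to float32 using the stored scale(s)."""
    return codes.astype(np.float32) * scale

# Synthetic table whose rows have very different magnitudes (an assumption
# chosen to make the per-row vs per-tensor difference visible).
rng = np.random.default_rng(0)
row_scales = rng.uniform(0.01, 10.0, size=(1000, 1))
table = (rng.normal(size=(1000, 64)) * row_scales).astype(np.float32)

for per_row in (True, False):
    codes, scale = quantize_int8(table, per_row=per_row)
    mse = float(np.mean((table - dequantize(codes, scale)) ** 2))
    print("per-row" if per_row else "per-tensor", "reconstruction MSE:", mse)
```

Reconstruction error is only a proxy. An answer would normally also compare a downstream metric (for example, retrieval recall or model accuracy) before and after quantization, and weigh the extra memory for per-row scales against the error reduction.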
Easy | Technical
41 practiced
Implement a Python function using NumPy that performs batched matrix multiplication for two inputs A and B with shapes (B, N, M) and (B, M, K). The function should: 1) accept a batch size of 1 on either input and broadcast it against the other, 2) avoid unnecessary copies, and 3) return a result of shape (B, N, K). Show example input shapes and the expected output shape.
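One possible answer, as a sketch rather than a reference solution: rely on `np.matmul`, which treats the leading dimension as a batch dimension and broadcasts a batch size of 1 without materializing a repeated copy of the input. The function name `batched_matmul` and the explicit shape checks are assumptions added for illustration.

```python
import numpy as np

def batched_matmul(a, b):
    """Multiply a of shape (B, N, M) by b of shape (B, M, K).

    Either input may have a batch size of 1, in which case it is
    broadcast against the other input's batch size.
    """
    # np.asarray returns the same object for existing ndarrays (no copy).
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != 3 or b.ndim != 3:
        raise ValueError("expected 3-D inputs of shape (B, N, M) and (B, M, K)")
    if a.shape[2] != b.shape[1]:
        raise ValueError(f"inner dimensions differ: {a.shape[2]} vs {b.shape[1]}")
    if a.shape[0] != b.shape[0] and 1 not in (a.shape[0], b.shape[0]):
        raise ValueError(f"batch sizes {a.shape[0]} and {b.shape[0]} cannot be broadcast")
    # matmul broadcasts the batch dimension and contracts the last axis of a
    # with the second-to-last axis of b.
    return np.matmul(a, b)

a = np.ones((4, 2, 3))   # (B=4, N=2, M=3)
b = np.ones((1, 3, 5))   # (B=1, M=3, K=5), broadcast over the batch
out = batched_matmul(a, b)
print(out.shape)         # (4, 2, 5)
```

The explicit batch check is stricter than NumPy's general broadcasting rules on purpose: it rejects inputs that are not 3-D instead of silently accepting them.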
Hard | Technical
43 practiced
Discuss numerical issues when computing SVD for matrices with huge dynamic range or nearly repeated singular values. Compare algorithms like divide-and-conquer SVD and Jacobi SVD in terms of accuracy and speed. Explain implications of algorithm choice for downstream ML pipelines.
Hard | Technical
49 practiced
You must reshard a massive embedding table across nodes to redistribute memory hotspots while keeping serving available. Provide a detailed zero-downtime resharding protocol: migration steps, routing/versioning, handling writes (dual-write or redirect), verification, and rollback procedures.
Medium | Technical
47 practiced
Your training loss decreases on a development set, but validation loss increases after a preprocessing change that altered the shuffle order. Outline a debugging plan to determine whether the issue is due to data leakage, label mismatch, preprocessing nondeterminism, or a bug in data splits. Include quick checks and deeper invariants to validate.
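Some of the quick checks can be sketched with content fingerprints, assuming each example is a `(features, label)` pair with a tuple of hashable feature values. The helper names `fingerprint`, `split_overlap`, `label_conflicts`, and `dataset_digest` are hypothetical, and exact-match hashing will miss near-duplicates, so treat this as a first pass only.

```python
import hashlib
from collections import defaultdict

def fingerprint(values):
    """Stable content hash of a sequence of values."""
    payload = "\x1f".join(repr(v) for v in values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def split_overlap(train, val):
    """Leakage check: feature fingerprints present in both splits."""
    train_keys = {fingerprint(features) for features, _ in train}
    val_keys = {fingerprint(features) for features, _ in val}
    return train_keys & val_keys

def label_conflicts(rows):
    """Label-mismatch check: identical features seen with different labels."""
    labels = defaultdict(set)
    for features, label in rows:
        labels[fingerprint(features)].add(label)
    return {key: seen for key, seen in labels.items() if len(seen) > 1}

def dataset_digest(rows):
    """Order-independent digest: unchanged by shuffling, changed by any edit."""
    keys = sorted(fingerprint(tuple(features) + (label,)) for features, label in rows)
    return hashlib.sha256("".join(keys).encode("utf-8")).hexdigest()

train = [((1.0, "a"), 0), ((2.0, "b"), 1), ((3.0, "c"), 0)]
val = [((2.0, "b"), 1), ((4.0, "d"), 1)]

print(len(split_overlap(train, val)))                       # 1
print(len(label_conflicts(train + [((1.0, "a"), 1)])))      # 1
print(dataset_digest(train) == dataset_digest(train[::-1])) # True
```

Comparing `dataset_digest` for each split before and after the preprocessing change separates a pure reordering (digest unchanged) from a change in content or split membership (digest differs). Running the pipeline twice with the same seed and comparing digests checks for nondeterminism.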
