Data-Centric Algorithmic Problem Solving Questions
Foundational algorithm design and data-structure concepts with an emphasis on data-centric problem solving. Covers algorithmic paradigms (e.g., greedy, dynamic programming, divide-and-conquer, graph algorithms), data structures, complexity analysis, and practical approaches to solving computational problems using data.
Medium · Technical
38 practiced
An operational dashboard queries a fact table with 500M rows and is taking multiple seconds to render. Describe a step-by-step approach to profile, diagnose, and optimize the query and dashboard. Discuss use of EXPLAIN, index design, schema changes, materialized views, pre-aggregation, caching, and monitoring.
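One way to start on a question like this is to compare the query plan before and after adding an index, then check that a pre-aggregated rollup returns the same numbers as the raw query. The sketch below uses Python's standard-library sqlite3 module on a tiny in-memory table. The table `fact_events`, the index `idx_fact_day`, and the rollup `daily_rollup` are hypothetical names chosen for illustration, and the plan text is SQLite's, not that of any particular production warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table standing in for the 500M-row table in the question.
conn.execute(
    "CREATE TABLE fact_events ("
    "id INTEGER PRIMARY KEY, event_day TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_events (event_day, region, amount) VALUES (?, ?, ?)",
    [
        (f"2025-01-{day:02d}", region, 1.0)
        for day in range(1, 29)
        for region in ("eu", "us", "apac")
    ],
)

DASHBOARD_SQL = (
    "SELECT region, SUM(amount) FROM fact_events "
    "WHERE event_day >= '2025-01-20' GROUP BY region"
)


def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Step 1: profile. Without an index the plan is a full scan of the table.
plan_before = plan(DASHBOARD_SQL)

# Step 2: add an index that covers the filter column and the selected columns.
conn.execute("CREATE INDEX idx_fact_day ON fact_events (event_day, region, amount)")
plan_after = plan(DASHBOARD_SQL)

# Step 3: pre-aggregate into a small rollup the dashboard can read instead.
conn.execute(
    "CREATE TABLE daily_rollup AS "
    "SELECT event_day, region, SUM(amount) AS total "
    "FROM fact_events GROUP BY event_day, region"
)
direct_result = conn.execute(DASHBOARD_SQL + " ORDER BY region").fetchall()
rollup_result = conn.execute(
    "SELECT region, SUM(total) FROM daily_rollup "
    "WHERE event_day >= '2025-01-20' GROUP BY region ORDER BY region"
).fetchall()

print(plan_before)
print(plan_after)
print(direct_result == rollup_result)
```

In a real system the rollup would need a refresh strategy (for example a scheduled job or a materialized view), and the trade-off between freshness and query cost is part of what the question asks you to discuss.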
Medium · Technical
44 practiced
Table: user_events(event_id PK, user_id INT, event_type TEXT, event_ts TIMESTAMP). Write an ANSI SQL query to deduplicate by (user_id, event_type) keeping only the row with the latest event_ts per pair. Show the SELECT approach (no destructive deletes) and mention how you'd perform deletion safely in production.
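A common non-destructive answer ranks rows within each (user_id, event_type) pair with ROW_NUMBER() and keeps rank 1. The sketch below runs that query through Python's standard-library sqlite3 module, assuming a SQLite build with window-function support (3.25 or later). The sample rows and the tie-break on `event_id` are assumptions added for illustration; the question does not say how ties on `event_ts` should be resolved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_events (
    event_id   INTEGER PRIMARY KEY,
    user_id    INTEGER,
    event_type TEXT,
    event_ts   TEXT  -- ISO-8601 text sorts chronologically in SQLite
);
INSERT INTO user_events VALUES
    (1, 10, 'login', '2025-01-01 08:00:00'),
    (2, 10, 'login', '2025-01-02 09:30:00'),
    (3, 10, 'click', '2025-01-01 08:05:00'),
    (4, 20, 'login', '2025-01-03 12:00:00'),
    (5, 20, 'login', '2025-01-03 12:00:00');
""")

# Rank rows per (user_id, event_type), newest first. The event_id tie-break
# is an assumption: it makes the result deterministic when timestamps match.
DEDUP_SQL = """
SELECT event_id, user_id, event_type, event_ts
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, event_type
               ORDER BY event_ts DESC, event_id DESC
           ) AS rn
    FROM user_events e
) ranked
WHERE rn = 1
ORDER BY event_id
"""

latest_rows = conn.execute(DEDUP_SQL).fetchall()
for row in latest_rows:
    print(row)
```

For the destructive step, one cautious approach is to copy the rows with `rn > 1` into a backup table first, delete them by `event_id` in small batches inside transactions, and compare row counts before and after.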
Medium · Technical
38 practiced
A product analytics dashboard needs near-real-time metrics for user activity and accurate daily aggregates for finance. Design an ETL/data pipeline: choose between batch, micro-batch, and streaming, and justify trade-offs in latency, cost, complexity, and consistency. Propose a hybrid approach if appropriate.
Hard · Technical
38 practiced
You have tables: users(id PK, created_at TIMESTAMP, country_id), orders(id PK, user_id FK, total_amount, created_at TIMESTAMP), countries(id PK, name). A report runs slowly: SELECT c.name, COUNT(o.id) FROM countries c JOIN users u ON u.country_id = c.id JOIN orders o ON o.user_id = u.id WHERE o.created_at >= '2025-01-01' GROUP BY c.name; EXPLAIN shows a sequential scan on orders (500M rows) followed by nested-loop joins. Propose query rewrites, indexes, partitioning, or pre-aggregation strategies to improve latency and discuss trade-offs.
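One rewrite worth considering is to filter and aggregate `orders` per user before joining, so the joins process one row per active user rather than one row per order. The sketch below checks on a tiny in-memory SQLite database that this rewrite returns the same counts as the original report. The sample data and the index name `idx_orders_created_user` are hypothetical, and whether the rewrite actually helps depends on the target database's planner and data distribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT, country_id INTEGER);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL, created_at TEXT
);
INSERT INTO countries VALUES (1, 'Canada'), (2, 'Japan');
INSERT INTO users VALUES
    (1, '2024-05-01', 1), (2, '2024-06-01', 1), (3, '2024-07-01', 2);
INSERT INTO orders VALUES
    (1, 1, 10.0, '2024-12-31'),
    (2, 1, 20.0, '2025-01-05'),
    (3, 2, 15.0, '2025-02-01'),
    (4, 3, 30.0, '2025-03-01'),
    (5, 3,  5.0, '2025-03-02');
-- Hypothetical index: supports the date filter and carries user_id for grouping.
CREATE INDEX idx_orders_created_user ON orders (created_at, user_id);
""")

# The report from the question, with ORDER BY added for a stable comparison.
ORIGINAL_SQL = """
SELECT c.name, COUNT(o.id)
FROM countries c
JOIN users u ON u.country_id = c.id
JOIN orders o ON o.user_id = u.id
WHERE o.created_at >= '2025-01-01'
GROUP BY c.name
ORDER BY c.name
"""

# Rewrite: collapse orders to one row per user first, then join and sum.
REWRITTEN_SQL = """
SELECT c.name, SUM(per_user.order_count)
FROM (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    WHERE created_at >= '2025-01-01'
    GROUP BY user_id
) per_user
JOIN users u ON u.id = per_user.user_id
JOIN countries c ON c.id = u.country_id
GROUP BY c.name
ORDER BY c.name
"""

original = conn.execute(ORIGINAL_SQL).fetchall()
rewritten = conn.execute(REWRITTEN_SQL).fetchall()
print(original, rewritten)
```

The same per-user subquery is also a natural shape for a pre-aggregated table keyed by day and country, which trades storage and refresh cost for much smaller scans at report time.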
Hard · System Design
44 practiced
Design a system that maintains near-real-time daily active user counts and per-country counters for a global product with 200M users and peak ingestion of 50k events/sec. Requirements: <5s latency for updates, at-least-once ingestion, queryable latest counters for dashboards, horizontal scalability, and fault tolerance. Describe components, state management, consistency, and recovery.
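With at-least-once ingestion the same event can arrive more than once, so counter updates need to be idempotent. The sketch below shows one minimal, single-process way to get that property: keep a per-day set of seen users and increment a country counter only on a user's first appearance that day. The class `DailyActiveCounters` and its methods are hypothetical names. At 200M users a real design would more likely shard this state across workers, checkpoint it for recovery, or replace the exact set with an approximate structure such as HyperLogLog.

```python
from collections import defaultdict


class DailyActiveCounters:
    """Exact, in-memory daily active user counters (illustrative only)."""

    def __init__(self):
        self.seen = defaultdict(set)  # day -> set of user ids seen that day
        # day -> country -> number of distinct users first seen there
        self.per_country = defaultdict(lambda: defaultdict(int))

    def apply(self, day, user_id, country):
        # Idempotent update: a replayed or duplicate event changes nothing.
        if user_id in self.seen[day]:
            return False
        self.seen[day].add(user_id)
        self.per_country[day][country] += 1
        return True

    def dau(self, day):
        return len(self.seen.get(day, ()))

    def country_counts(self, day):
        return dict(self.per_country.get(day, {}))


counters = DailyActiveCounters()
events = [
    ("2025-01-01", 1, "CA"),
    ("2025-01-01", 2, "JP"),
    ("2025-01-01", 1, "CA"),  # duplicate delivery of the first event
    ("2025-01-01", 3, "CA"),
    ("2025-01-02", 1, "CA"),
    ("2025-01-01", 2, "JP"),  # replay after a simulated consumer restart
]
for day, user_id, country in events:
    counters.apply(day, user_id, country)

dau_day1 = counters.dau("2025-01-01")
dau_day2 = counters.dau("2025-01-02")
by_country_day1 = counters.country_counts("2025-01-01")
print(dau_day1, dau_day2, by_country_day1)
```

A simplifying assumption here is that a user is attributed to the country of their first event of the day; a user who changes country mid-day is not recounted.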