InterviewStack.io

Storage Services and Data Management Questions

Know the primary storage options. Object storage (S3, Azure Blob, GCS) holds unstructured data at scale and is highly available and cost-effective. Block storage (EBS, Azure Managed Disks) backs VM disks and is optimized for IOPS and throughput. Databases split into relational (RDS, Azure SQL, Cloud SQL) for structured data with relationships, and NoSQL (DynamoDB, Cosmos DB, Firestore) for flexible schemas and scale. Understand the durability and consistency model of each option, and know when to use each storage type based on data characteristics and access patterns.

Easy | Technical
57 practiced
Compare a data lake (S3/GCS) and a data warehouse (BigQuery/Redshift) for ML workloads. Discuss schema requirements, query latency, cost model (storage vs query), and typical roles (raw data landing, ETL, large-scale feature extraction). When should an ML team adopt a lakehouse (Delta/Iceberg) architecture instead of separate lake and warehouse?
Easy | Technical
69 practiced
Given a transactions table with schema (transaction_id PK, user_id INT, amount DECIMAL, occurred_at TIMESTAMP), write a PostgreSQL-compatible SQL query that computes each user's mean and standard deviation of amount over the past 365 days and flags transactions whose amount is > mean + 3 * stddev. Explain how you handle users with fewer than two transactions in the window.
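The question asks for SQL; the Python below is not that answer, only a minimal reference sketch of the flagging logic that a query could be checked against. It assumes the sample standard deviation (which is what PostgreSQL's STDDEV computes) and assumes that users with fewer than two transactions in the window are never flagged, mirroring the fact that STDDEV_SAMP returns NULL for a single row. The function name `flag_outliers`, the tuple layout, and the use of floats instead of DECIMAL are illustrative choices, not part of the original question.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def flag_outliers(transactions, now, window_days=365, k=3.0):
    """Return sorted transaction ids whose amount exceeds mean + k * stddev.

    transactions: iterable of (transaction_id, user_id, amount, occurred_at).
    Statistics are computed per user over the last `window_days` days.
    """
    cutoff = now - timedelta(days=window_days)

    # Group in-window transactions by user.
    by_user = defaultdict(list)
    for tx in transactions:
        if cutoff <= tx[3] <= now:
            by_user[tx[1]].append(tx)

    flagged = []
    for txs in by_user.values():
        if len(txs) < 2:
            # Sample stddev is undefined for one row; skip, as SQL NULL would.
            continue
        amounts = [tx[2] for tx in txs]
        threshold = mean(amounts) + k * stdev(amounts)
        flagged.extend(tx[0] for tx in txs if tx[2] > threshold)
    return sorted(flagged)
```

One point worth raising in an answer: because the outlier itself is included in the mean and stddev, a single extreme value among n rows can be at most (n - 1) / sqrt(n) sample standard deviations above the mean, so with k = 3 nothing can be flagged for users with fewer than 11 transactions in the window.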
Hard | Technical
66 practiced
Compare how DynamoDB, Cloud Spanner, and Cassandra implement replication and consistency. For each system describe the underlying techniques (quorum reads/writes, Paxos/Raft, synchronized clocks/TrueTime, vector clocks), the guarantees they provide, and operational trade-offs relevant to ML serving and metadata stores.
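For the quorum part of this question, the usual overlap condition can be sketched in a few lines. This is a simplified model of Cassandra-style tunable consistency, assuming N replicas, a read quorum R, and a write quorum W; the helper names `quorums_overlap` and `majority` are hypothetical, and the model ignores hinted handoff, read repair, and failures.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With N = 3, majority reads and writes (R = W = 2) overlap, so a read contacts at least one replica that holds the latest acknowledged write. R = W = 1 does not overlap, which is the eventually consistent configuration that trades freshness for latency.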
Hard | Technical
73 practiced
Design a system to perform fast online joins between incoming request context and streaming features with strict tail latency requirements (99.9th percentile < 20ms). Decide where to store features (in-process cache, Redis/Memcached, DynamoDB), how to ensure freshness and consistency, and strategies to reduce tail latency including speculative reads, timeouts, batching, and connection pooling.
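One of the tail-latency strategies named above, speculative (hedged) reads combined with a hard timeout, can be sketched with asyncio. This is a minimal illustration under stated assumptions: `fetch(replica, key)` is a hypothetical coroutine supplied by the caller, `replicas` is an ordered list of at least one endpoint, and the 5 ms hedge delay and 20 ms deadline are placeholder values rather than recommendations.

```python
import asyncio


async def hedged_read(fetch, key, replicas, hedge_after=0.005, deadline=0.020):
    """Read from replicas[0]; if it has not answered after `hedge_after`
    seconds, also ask replicas[1]. Return whichever finishes first, or raise
    asyncio.TimeoutError once `deadline` seconds have passed overall.

    Simplification: if the first task to finish raised, that error propagates.
    """

    async def run():
        tasks = [asyncio.create_task(fetch(replicas[0], key))]
        try:
            # Give the primary a head start before spending a second request.
            done, _ = await asyncio.wait(tasks, timeout=hedge_after)
            if not done and len(replicas) > 1:
                tasks.append(asyncio.create_task(fetch(replicas[1], key)))
            done, _ = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED
            )
            return done.pop().result()
        finally:
            # Cancel the loser (a no-op for tasks that already finished).
            for task in tasks:
                task.cancel()

    return await asyncio.wait_for(run(), timeout=deadline)
```

Setting the hedge delay near the observed p95 latency of the store is a common starting point, since it bounds the extra load to roughly the fraction of requests that are already slow; the right value depends on measurements from the actual system.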
Medium | Technical
62 practiced
You're using DynamoDB with a timestamp-based partition key and observe hot partitions during peak ingestion. Propose schema and access-pattern changes to eliminate hotspots while preserving query semantics (for example, ability to read the most recent N items per user). Provide concrete alternative key designs and explain trade-offs.
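Two candidate key designs for this question can be sketched as plain key-building functions. Both are illustrative assumptions rather than a prescribed answer: the attribute names `pk` and `sk`, the `USER#`/`TS#`/`DAY#`/`SHARD#` prefixes, the helper names, and the default of 16 shards are all hypothetical. The first design partitions by user and sorts by timestamp; the second keeps a time-bucketed partition key but spreads each bucket across a fixed number of shards.

```python
import hashlib
from datetime import datetime, timezone


def user_item_key(user_id: int, occurred_at: datetime, item_id: str) -> dict:
    """Design A: partition by user, sort by time.

    Expects a timezone-aware datetime. The fixed-width UTC timestamp makes
    lexicographic order of the sort key match chronological order.
    """
    ts = occurred_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return {"pk": f"USER#{user_id}", "sk": f"TS#{ts}#{item_id}"}


def sharded_time_bucket(occurred_at: datetime, item_id: str,
                        shard_count: int = 16) -> str:
    """Design B: day bucket plus a deterministic shard suffix.

    Writes for one day spread over `shard_count` partition keys instead of one.
    """
    day = occurred_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"DAY#{day}#SHARD#{shard:02d}"
```

With design A, the most recent N items per user map to a single Query on the partition key with ScanIndexForward set to false and Limit set to N. With design B, reading a full day requires one query per shard and a client-side merge, which is the main trade-off of write sharding.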
