Storage Services and Data Management Questions
Know the primary storage options. Object storage (S3, Azure Blob Storage, GCS) holds unstructured data at scale and is highly available and cost-effective. Block storage (EBS, Azure Managed Disks) backs VM disks and is provisioned for IOPS and throughput. Databases split into relational (RDS, Azure SQL, Cloud SQL) for structured data with relationships, and NoSQL (DynamoDB, Cosmos DB, Firestore) for flexible schemas and horizontal scale. Understand the durability and consistency model of each option, and know when to use each storage type based on data characteristics and access patterns.
Hard, Technical
Design a system to perform fast online joins between incoming request context and streaming features with strict tail latency requirements (99.9th percentile < 20ms). Decide where to store features (in-process cache, Redis/Memcached, DynamoDB), how to ensure freshness and consistency, and strategies to reduce tail latency including speculative reads, timeouts, batching, and connection pooling.
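One of the tail-latency strategies named above, speculative (hedged) reads under a hard deadline, can be sketched as follows. This is a minimal illustration with asyncio, not a production client: `fetch_feature` is a hypothetical stand-in for a real feature-store read, and the replica names, the 5 ms hedge delay, and the 20 ms deadline are assumed values chosen to match the question.

```python
import asyncio
import random


async def fetch_feature(replica: str, key: str) -> dict:
    """Hypothetical stand-in for a feature-store read (for example a cache GET)."""
    # Simulate a mostly fast backend with an occasional slow outlier.
    delay = 0.002 if random.random() < 0.95 else 0.050
    await asyncio.sleep(delay)
    return {"key": key, "replica": replica}


async def hedged_read(key, replicas, hedge_after=0.005, deadline=0.020,
                      fetch=fetch_feature):
    """Read from the first replica; if it has not answered within
    `hedge_after` seconds, also ask the second replica and take whichever
    answers first. Returns None if nothing answers before `deadline`."""

    async def race():
        tasks = [asyncio.create_task(fetch(replicas[0], key))]
        try:
            # Give the primary a short head start.
            done, _ = await asyncio.wait(tasks, timeout=hedge_after)
            if not done:
                if len(replicas) > 1:
                    # Primary is slow: send the speculative second request.
                    tasks.append(asyncio.create_task(fetch(replicas[1], key)))
                done, _ = await asyncio.wait(
                    tasks, return_when=asyncio.FIRST_COMPLETED
                )
            return next(iter(done)).result()
        finally:
            # Cancel the loser (or everything, if the deadline fired).
            for task in tasks:
                task.cancel()

    try:
        return await asyncio.wait_for(race(), timeout=deadline)
    except asyncio.TimeoutError:
        return None  # caller falls back to default feature values
```

Calling `asyncio.run(hedged_read("user:42", ["replica-a", "replica-b"]))` returns the first response or `None`. Setting the hedge delay near the backend's p95 latency keeps the extra load small, since only the slowest few percent of requests trigger a second read.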
Medium, System Design
Design a backup and restore plan for model artifacts and datasets for a company with ~1,000 production models and ~10 PB of training data. Cover object versioning, cross-region replication, retention policies, recovery time objectives (RTO) and recovery point objectives (RPO), cost controls, and how you would periodically test restores.
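A quick back-of-the-envelope check helps when setting an RTO for a dataset of this size. The sketch below only does arithmetic: the 100 Gbit/s aggregate restore bandwidth and the `efficiency` factor are assumptions for illustration, not figures from any provider.

```python
def restore_hours(data_bytes: float, aggregate_gbit_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Hours needed to copy `data_bytes` at the given aggregate bandwidth.

    `efficiency` (0..1) is an assumed fudge factor for protocol overhead
    and throttling; 1.0 means the full line rate is achieved.
    """
    bytes_per_second = aggregate_gbit_per_s * 1e9 / 8 * efficiency
    return data_bytes / bytes_per_second / 3600


def meets_rto(data_bytes: float, aggregate_gbit_per_s: float,
              rto_hours: float, efficiency: float = 1.0) -> bool:
    """True if a full restore would finish within the RTO."""
    return restore_hours(data_bytes, aggregate_gbit_per_s, efficiency) <= rto_hours


if __name__ == "__main__":
    ten_pb = 10e15  # 10 PB in decimal bytes
    hours = restore_hours(ten_pb, aggregate_gbit_per_s=100)
    print(f"Full restore at 100 Gbit/s: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions a full 10 PB restore takes roughly nine days, which suggests tiering the plan: a short RTO for model artifacts and recent data, and a longer one for the bulk training corpus.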
Easy, Technical
Describe block storage (e.g., EBS, Azure Managed Disks). What are the key performance characteristics (IOPS, throughput, latency) and typical ML use cases such as ephemeral scratch space for distributed training, database backing disks, or caching? Compare instance store vs attached block volumes and explain implications for fault tolerance and checkpointing.
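The relationship between IOPS, I/O size, and throughput can be made concrete with a small calculation. This is a sketch under stated assumptions: the limits passed in below (3,000 IOPS, 125 MiB/s) are illustrative volume settings, and real volumes add latency and burst behavior that this ignores.

```python
def effective_throughput_mib_s(iops_limit: float, throughput_limit_mib_s: float,
                               io_size_kib: float) -> float:
    """Achievable throughput is capped by whichever limit is hit first:
    IOPS times I/O size, or the volume's throughput ceiling."""
    iops_bound_mib_s = iops_limit * io_size_kib / 1024
    return min(iops_bound_mib_s, throughput_limit_mib_s)


def checkpoint_write_seconds(checkpoint_gib: float, throughput_mib_s: float) -> float:
    """Time to write a checkpoint sequentially at the given throughput."""
    return checkpoint_gib * 1024 / throughput_mib_s


if __name__ == "__main__":
    # Small random I/O (16 KiB) is IOPS-bound; large sequential I/O
    # (256 KiB) is throughput-bound.
    for io_kib in (16, 256):
        mib_s = effective_throughput_mib_s(3000, 125, io_kib)
        print(f"{io_kib} KiB I/O: {mib_s:.1f} MiB/s")
    print(f"10 GiB checkpoint: {checkpoint_write_seconds(10, 125):.1f} s")
```

With these numbers, 16 KiB I/O tops out near 47 MiB/s while 256 KiB I/O saturates the 125 MiB/s ceiling, which is why checkpoint writes should use large sequential I/O.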
Hard, Technical
Write an optimized SQL query (pseudocode acceptable) for a data warehouse (e.g., BigQuery) that computes rolling 30-day aggregated features (sum, count, avg) per user from an events table partitioned by date. Explain how you minimize scanned data and cost using partition pruning, daily incremental pre-aggregation, clustering by user_id, and materialized views.
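The shape of such a query can be sketched and run locally with Python's built-in `sqlite3`. This is an illustration, not BigQuery SQL: the table `events(user_id, event_date, value)` and its column names are assumptions, the date arithmetic uses SQLite's `date()` function, and partition pruning, clustering, and materialized views exist only in the warehouse. What carries over is the structure: filter on the date column first, reduce to one row per user per day, then combine the daily rows.

```python
import sqlite3

# Assumed schema: events(user_id TEXT, event_date TEXT 'YYYY-MM-DD', value REAL)
ROLLING_SQL = """
WITH daily AS (
    -- Step 1: one pre-aggregated row per user per day. In a warehouse this
    -- would be an incrementally maintained table, and the date filter is
    -- what lets the engine skip partitions outside the window.
    SELECT user_id,
           event_date,
           SUM(value) AS day_sum,
           COUNT(*)   AS day_count
    FROM events
    WHERE event_date BETWEEN date(:as_of, '-29 days') AND :as_of
    GROUP BY user_id, event_date
)
-- Step 2: combine daily rows. The average is sum / count over the window,
-- not an average of daily averages.
SELECT user_id,
       SUM(day_sum)                        AS sum_30d,
       SUM(day_count)                      AS count_30d,
       SUM(day_sum) * 1.0 / SUM(day_count) AS avg_30d
FROM daily
GROUP BY user_id
ORDER BY user_id
"""


def rolling_features(conn: sqlite3.Connection, as_of: str) -> list:
    """Return (user_id, sum_30d, count_30d, avg_30d) rows for the 30 days
    ending on `as_of` (inclusive), given as an ISO date string."""
    return conn.execute(ROLLING_SQL, {"as_of": as_of}).fetchall()
```

Storing `day_sum` and `day_count` rather than a daily average is the design choice that makes the daily table reusable: any window length can be recomputed from it without rescanning raw events.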
Medium, System Design
Design the storage architecture for batch training pipelines that must handle 5 PB of raw image data stored in object storage and consumed by distributed training jobs. Include file format choices, partitioning strategies, consistent snapshotting for reproducibility, catalog/metadata design, and practical steps to optimize read throughput and cost for many concurrent training jobs.
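One common way to get consistent snapshots on top of versioned object storage is a manifest that pins the exact object versions a training run reads. The sketch below assumes each object is described by a hypothetical `(key, version_id, size_bytes)` tuple obtained from a bucket listing; the manifest layout and the `build_manifest` helper are illustrative, not a standard format.

```python
import hashlib
import json


def build_manifest(objects):
    """Build a snapshot manifest from (key, version_id, size_bytes) tuples.

    The snapshot ID is a SHA-256 hash over the sorted entries, so the same
    set of object versions always yields the same ID regardless of listing
    order, and any changed version yields a different ID.
    """
    entries = [
        {"key": key, "version_id": version_id, "size": size}
        for key, version_id, size in sorted(objects)
    ]
    payload = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {
        "snapshot_id": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "total_bytes": sum(entry["size"] for entry in entries),
        "entries": entries,
    }


if __name__ == "__main__":
    manifest = build_manifest([
        ("images/shard-0001.tar", "v3", 1_000_000),
        ("images/shard-0000.tar", "v1", 2_000_000),
    ])
    print(manifest["snapshot_id"][:12], manifest["total_bytes"])
```

Recording the snapshot ID alongside each training run in the metadata catalog lets a later job re-read exactly the same object versions, even after the underlying keys have been overwritten.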