InterviewStack.io

Technical Tools and Stack Proficiency Questions

Assessment of a candidate's practical proficiency across the technology stack and tools relevant to their role. This includes the ability to list and explain hands-on experience with programming languages, frameworks, libraries, cloud platforms, data and machine learning tooling, analytics and visualization tools, and design and prototyping software. Candidates should demonstrate depth, not just familiarity, by describing specific problems they solved with each tool, trade-offs between alternatives, integration points, deployment and operational considerations, and examples of end-to-end workflows. The description covers developer and data scientist stacks such as Python and C++, machine learning frameworks like TensorFlow and PyTorch, cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, as well as design and research tools such as Figma and Adobe Creative Suite. Interviewers may probe for evidence of hands-on tasks, configuration and troubleshooting, performance or cost trade-offs, versioning and collaboration practices, and how the candidate keeps skills current.

Medium · Technical · 62 practiced
You have two large datasets in S3: orders (~10B rows) and users (~50M rows), but some user_ids are highly skewed. In PySpark, implement an efficient join strategy to avoid OOMs and excessive shuffle. Provide pseudocode showing repartitioning, broadcast thresholds, and a salting approach to handle skewed keys. Explain key Spark configurations you would tune and how you'd validate improvements.
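The mechanics of salting can be sketched in plain Python before translating it to PySpark (the salt factor and column names here are illustrative, and the PySpark equivalents are given in comments): the skewed large side gets a random salt, the small side is replicated once per salt value, and the join key becomes (user_id, salt).

```python
import random

SALT_BUCKETS = 16  # illustrative; tune to the observed skew factor


def salt_large_side(orders):
    """Orders side: append a random salt so one hot user_id spreads
    across SALT_BUCKETS shuffle partitions.
    PySpark equivalent:
      orders.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))
    """
    return [(o["user_id"], random.randrange(SALT_BUCKETS), o) for o in orders]


def explode_small_side(users):
    """Users side: replicate each row once per salt value.
    PySpark equivalent:
      users.crossJoin(spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt"))
    """
    return [(u["user_id"], s, u) for u in users for s in range(SALT_BUCKETS)]


def salted_join(orders, users):
    """Inner join on (user_id, salt) instead of user_id alone, so the
    hot key's rows no longer land in a single partition."""
    index = {(uid, s): u for uid, s, u in explode_small_side(users)}
    return [
        (o, index[(uid, s)])
        for uid, s, o in salt_large_side(orders)
        if (uid, s) in index
    ]
```

In real PySpark you would combine this with `spark.sql.autoBroadcastJoinThreshold` (50M users is likely too large to broadcast whole, but a filtered table of only the hot keys may fit), repartitioning on the salted key, and validation by comparing per-partition sizes and stage times in the Spark UI before and after.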
Medium · Technical · 62 practiced
Sketch a Terraform module structure to provision an S3 bucket with lifecycle rules, an EMR/Dataproc or Databricks cluster, and IAM roles/policies for a data pipeline. Explain remote state management (locking), how to handle secrets securely, and testing strategies for infrastructure changes (plan, staging apply, automated checks).
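One of the "automated checks" mentioned above can be sketched as a Python policy test over the JSON a plan produces via `terraform show -json tfplan`. This is a minimal, hypothetical guardrail: the resource addresses are invented, and the `lifecycle_rule` attribute name assumes an AWS-provider schema where lifecycle rules are inline on the bucket resource.

```python
import json


def check_plan(plan_json: str) -> list:
    """Return a list of policy violations found in a Terraform plan JSON
    (the `resource_changes` structure produced by `terraform show -json`)."""
    plan = json.loads(plan_json)
    violations = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        # Guardrail 1: never destroy the pipeline's data bucket.
        if change["type"] == "aws_s3_bucket" and "delete" in actions:
            violations.append(f"refusing to delete bucket {change['address']}")
        # Guardrail 2: every created bucket must carry lifecycle rules.
        if change["type"] == "aws_s3_bucket" and "create" in actions:
            after = change["change"].get("after") or {}
            if not after.get("lifecycle_rule"):
                violations.append(f"{change['address']} has no lifecycle rules")
    return violations
```

In CI this would run after `terraform plan -out=tfplan`, failing the build on any violation before a staged apply.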
Hard · Technical · 55 practiced
A nightly Spark job on a 200-node EMR cluster processing 50 TB frequently hits OOMs and long GC pauses. Provide a step-by-step tuning plan: profiling approach (Spark UI, GC logs), tuning executor cores and memory, setting spark.sql.shuffle.partitions, memory fractions (storage vs execution), serialization (Kryo), dynamic allocation, shuffle/io configs, and JVM GC settings. Provide a sample spark-submit configuration and explain trade-offs.
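A starting-point answer to the "sample spark-submit configuration" part might look like the map below. The numeric values are illustrative guesses for 16-vCPU/128 GB-class executors, not tuned settings; real values must come from the Spark UI and GC logs. All keys are standard Spark configuration properties.

```python
# Illustrative starting points; profile before and after each change.
SPARK_CONF = {
    "spark.executor.cores": "5",             # ~5 cores/executor balances parallelism vs GC
    "spark.executor.memory": "20g",
    "spark.executor.memoryOverhead": "4g",   # off-heap headroom for shuffle buffers
    "spark.sql.shuffle.partitions": "4000",  # scale with total cores and 50 TB input
    "spark.memory.fraction": "0.6",          # execution+storage share of the heap
    "spark.memory.storageFraction": "0.3",   # protected storage share within that
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true", # required for dynamic allocation on YARN
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35",
}


def to_spark_submit(conf: dict) -> str:
    """Render the config map as spark-submit --conf flags."""
    return "spark-submit " + " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
```

The trade-offs run through the question itself: more partitions reduce per-task memory pressure but add scheduling overhead, larger heaps lengthen GC pauses (hence G1GC), and Kryo cuts serialization cost at the price of registering custom classes.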
Hard · Technical · 86 practiced
Design an end-to-end CDC pipeline delivering transactional updates from MySQL into an analytics store using Debezium + Kafka + a sink such as Snowflake or BigQuery. Ensure near real-time replication with minimal duplicates and explain handling of schema changes, ordering, multi-row transactions, idempotency at the sink, bootstrapping initial data, and reconciliation strategies.
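The sink-side idempotency piece of such a design can be sketched as follows: collapse each micro-batch of Debezium-style events to the latest state per primary key (by source offset) before issuing a single MERGE. The event shape here is a simplified assumption (`pk`, `op`, `offset`, `after`), using Debezium's operation codes.

```python
def collapse_batch(events):
    """Reduce a micro-batch of CDC events to one event per primary key,
    keeping only the latest by source offset. Debezium op codes:
    'c' create, 'u' update, 'd' delete, 'r' snapshot read."""
    latest = {}
    for e in events:
        pk = e["pk"]
        if pk not in latest or e["offset"] > latest[pk]["offset"]:
            latest[pk] = e
    # Rows to MERGE-upsert vs. keys to delete at the sink.
    upserts = [e["after"] for e in latest.values() if e["op"] in ("c", "u", "r")]
    deletes = [e["pk"] for e in latest.values() if e["op"] == "d"]
    return upserts, deletes
```

Because the reduction is deterministic and the sink applies it via MERGE keyed on the primary key, replaying the same batch after a failure produces the same end state; recording the max applied offset transactionally alongside the data makes replays explicit no-ops.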
Medium · Technical · 48 practiced
Explain how to implement idempotent ETL tasks and safe retries in Airflow for a multi-step pipeline that writes to a data warehouse. Provide Python pseudocode or SQL examples showing upsert patterns, use of staging tables, run identifiers, and how you persist checkpoints so retries do not duplicate data or corrupt transactional boundaries.
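The staging-table-plus-run-identifier pattern can be demonstrated end to end with SQLite standing in for the warehouse (table names and the `idempotent_load` helper are illustrative; a real warehouse would use MERGE instead of SQLite's ON CONFLICT):

```python
import sqlite3


def idempotent_load(conn, run_id, rows):
    """One retry-safe pipeline step: load into a staging table keyed by
    run_id, then upsert into the target. Re-running the same run_id first
    wipes any partial staged data, then re-applies the same upsert, so a
    retry can never duplicate rows. In Airflow, run_id would come from
    the dag_run's run identifier."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS staging (run_id TEXT, id INTEGER, val TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, val TEXT)")
    # Idempotent staging: remove leftovers from a failed earlier attempt.
    cur.execute("DELETE FROM staging WHERE run_id = ?", (run_id,))
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                    [(run_id, r["id"], r["val"]) for r in rows])
    # Upsert from staging; Snowflake/BigQuery would use MERGE here.
    cur.execute("""
        INSERT INTO target (id, val)
        SELECT id, val FROM staging WHERE run_id = ?
        ON CONFLICT(id) DO UPDATE SET val = excluded.val
    """, (run_id,))
    conn.commit()
```

Calling `idempotent_load` twice with the same `run_id` leaves the target unchanged, which is exactly the property Airflow's retries require; the committed staging rows also serve as a checkpoint of what each run produced.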
