InterviewStack.io

Monitoring Tools and Observability Questions

Covers hands-on familiarity with modern monitoring and observability platforms and the practices for instrumenting and operating production systems. Candidates should be able to describe one or more tools, such as Prometheus, Grafana, Datadog, or CloudWatch, and explain how to write queries, design dashboards, and configure alerts. Topics include metrics collection, time series databases, log aggregation, distributed tracing, and the query languages these platforms use. Also covers integrating monitoring with incident management systems such as PagerDuty and Opsgenie, defining service level indicators and objectives, setting alerting thresholds to reduce noise, and using dashboards and alerts to troubleshoot performance and availability issues.

Hard | System Design
Design a resilient OpenTelemetry Collector deployment for edge services with intermittent connectivity. Include local buffering strategies, disk usage limits, batching, retry and back-off policies, telemetry prioritization for limited bandwidth, and considerations for delivery semantics (at-least-once vs best-effort).
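Two of the ideas named in this question, capped exponential back-off and priority-aware buffering under a size limit, can be illustrated with a minimal Python sketch. It is not Collector configuration: the names `backoff_delays` and `BoundedBuffer`, the parameter names, and the default values are all hypothetical and chosen only for illustration. Dropping data on overflow corresponds to best-effort delivery; an at-least-once design would persist items and retry until acknowledged instead.

```python
def backoff_delays(initial=1.0, multiplier=2.0, max_interval=30.0, attempts=6):
    """Return the wait (seconds) before each retry, doubling up to a cap.

    Hypothetical parameters; a real deployment would usually add random jitter.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, max_interval))
        delay *= multiplier
    return delays


class BoundedBuffer:
    """Fixed-capacity buffer that evicts the lowest-priority, oldest item first."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = []   # tuples of (priority, sequence number, payload)
        self.seq = 0      # monotonically increasing, so tuples never tie
        self.dropped = 0  # count of evicted items, worth exporting as a metric

    def put(self, payload, priority):
        self.items.append((priority, self.seq, payload))
        self.seq += 1
        if len(self.items) > self.max_items:
            victim = min(self.items)  # lowest priority, then oldest
            self.items.remove(victim)
            self.dropped += 1

    def drain(self):
        """Return payloads highest priority first, oldest first within a priority."""
        ordered = sorted(self.items, key=lambda t: (-t[0], t[1]))
        self.items.clear()
        return [payload for _, _, payload in ordered]


buf = BoundedBuffer(max_items=2)
buf.put("debug log", priority=0)
buf.put("error trace", priority=2)
buf.put("latency metric", priority=1)  # over capacity: the debug log is evicted
batch = buf.drain()
```

When connectivity returns, draining highest priority first means limited bandwidth is spent on the most valuable telemetry before the link drops again.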
Medium | Technical
Describe how to integrate Prometheus Alertmanager or Grafana alerts with PagerDuty and Slack for a multi-team organization. Cover authentication, deduplication/routing rules, escalation policies, and approaches to avoid paging the wrong on-call group during noisy events.
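The routing and deduplication part of this question can be sketched in plain Python. This is a conceptual model under stated assumptions, not Alertmanager or Grafana configuration: the `team` label, the receiver names, and the `route` function are hypothetical. The idea shown is that alerts are routed by an ownership label, grouped so that many firing instances produce one notification, and sent to a non-paging fallback when ownership is unknown.

```python
from collections import defaultdict

# Hypothetical mapping from an ownership label to a paging receiver.
ROUTES = {
    "payments": "pagerduty-payments",
    "search": "pagerduty-search",
}
# Alerts without a known owner go to a chat channel instead of paging anyone.
DEFAULT_RECEIVER = "slack-unrouted"


def route(alerts):
    """Group alerts by (receiver, alertname); each group is one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        receiver = ROUTES.get(alert.get("team"), DEFAULT_RECEIVER)
        groups[(receiver, alert["alertname"])].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "HighErrorRate", "team": "payments", "instance": "api-1"},
    {"alertname": "HighErrorRate", "team": "payments", "instance": "api-2"},
    {"alertname": "DiskFull", "instance": "db-7"},  # no team label
]
notifications = route(alerts)
```

Here the two `HighErrorRate` alerts collapse into a single page for the payments on-call, and the unlabeled `DiskFull` alert lands in the fallback channel rather than paging an arbitrary team.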
Easy | Technical
Explain metric cardinality, why high cardinality is harmful for Prometheus and other TSDBs, and provide three concrete strategies you would recommend to a client to control cardinality (with examples such as label whitelisting, relabeling, or bucketing). For each strategy, state the trade-off.
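As a rough illustration of why cardinality grows so quickly, the worst-case number of time series for one metric is the product of the distinct values of each label. The sketch below uses made-up label sets and hypothetical helper names (`series_count`, `bucket_status`) to compare an uncontrolled metric with one where an unbounded label is dropped and status codes are bucketed into classes.

```python
def series_count(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    total = 1
    for values in label_values.values():
        total *= len(values)
    return total


def bucket_status(code):
    """Collapse an HTTP status code into its class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"


# Made-up label sets for a single request-counter metric.
raw = {
    "method": {"GET", "POST", "PUT"},
    "status": {200, 201, 204, 301, 400, 401, 404, 500, 503},
    "user_id": set(range(10_000)),  # unbounded identifier: the main offender
}

# Drop the user_id label entirely and bucket status codes by class.
controlled = {
    "method": raw["method"],
    "status": {bucket_status(code) for code in raw["status"]},
}

before = series_count(raw)        # 3 * 9 * 10_000 = 270_000
after = series_count(controlled)  # 3 * 4 = 12
```

The trade-off is visible in the same numbers: after bucketing, a 401 can no longer be told apart from a 404, and per-user breakdowns have to come from logs or traces rather than metrics.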
Medium | Technical
Design a log retention and storage-tiering plan to meet a 90-day compliance requirement while minimizing cost. Include hot/warm/cold tiers, indexing vs raw retention, compression and parsing trade-offs, and example filters to significantly reduce ingested log volume without losing forensic capability.
Hard | Technical
Design a monitoring test plan to validate new observability instrumentation before rolling it out to production. Include unit tests for metric emission, integration tests that validate that traces and logs propagate, synthetic checks, CI gating criteria, and acceptance criteria for metrics, logs, and traces.
