What This Stack Does
Observability isn't a single tool — it's three pillars (metrics, logs, traces) working together to answer the question 'what is happening in my system right now, and why?' This stack covers each pillar with the tools that have earned their position through years of production use. The open-source core keeps costs predictable, while Datadog is included as the managed alternative for teams that prefer operational simplicity over cost optimization.
Metrics Collection and Dashboard Visualization
Prometheus handles metrics: CPU usage, request latency, error rates, queue depths, and custom application counters. Its pull-based architecture scrapes /metrics endpoints at configured intervals and stores the samples as time series that you query with PromQL. For Kubernetes environments, Prometheus is the de facto standard; it is a CNCF graduated project, and most of the CNCF monitoring ecosystem exposes or consumes its metric format. Self-hosted Prometheus is free; add Grafana Mimir or Thanos for long-term storage and high availability.
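To make the pull model concrete, here is a minimal sketch of the application side of a scrape, using only the Python standard library. It assumes a hypothetical in-process counter table (REQUEST_COUNTS) and an arbitrary port; a real service would more likely use an official Prometheus client library, and this sketch skips details such as label-value escaping and HELP lines.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters keyed by (metric name, label pairs).
REQUEST_COUNTS = {
    ("http_requests_total", (("method", "GET"), ("status", "200"))): 42,
    ("http_requests_total", (("method", "POST"), ("status", "500"))): 3,
}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    seen = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in seen:
            # One TYPE line per metric family, placed before its samples.
            lines.append(f"# TYPE {name} counter")
            seen.add(name)
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus pulls from this path; the application never pushes.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at that port would let Prometheus collect both samples on each scrape interval. The application only exposes current values; timestamps and history are handled on the Prometheus side.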
Grafana is the visualization layer that ties everything together. It queries Prometheus for metrics, Elasticsearch for logs, and any of 150+ other data sources — displaying them in unified dashboards that show system health at a glance. Pre-built dashboard libraries for Kubernetes, databases, and popular services accelerate setup. Custom dashboards with template variables let you drill down from cluster overview to individual pod metrics. Grafana is free to self-host; Grafana Cloud offers a managed experience with a generous free tier.
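The drill-down idea can be sketched in a few lines: a dashboard variable is substituted into a PromQL expression, and the result is sent to Prometheus. The sketch below imitates that substitution with Python's string.Template and builds a URL for the instant-query endpoint of the Prometheus HTTP API. The metric name, the namespace label, and the base URL are assumptions for illustration, and Grafana's own variable interpolation has more features than this.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical panel query; the metric and label names are illustrative only.
PANEL_QUERY = Template(
    'sum by (pod) (rate(http_requests_total{namespace="$namespace"}[5m]))'
)


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def panel_url(base_url, namespace):
    """Fill in the dashboard variable, then build the query URL."""
    promql = PANEL_QUERY.substitute(namespace=namespace)
    return instant_query_url(base_url, promql)
```

For example, panel_url("http://prometheus:9090", "prod") yields a per-pod request rate query scoped to the prod namespace. Changing the variable value re-scopes every panel that uses it, which is what makes one dashboard reusable across clusters and namespaces.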
Error Tracking and Log Aggregation
Sentry provides application-level error tracking and performance monitoring. When your code throws an exception, Sentry captures the full stack trace, request context, user information, and breadcrumbs showing the sequence of events that led to the error. Release tracking associates errors with specific deployments. The performance monitoring features show transaction traces with latency breakdowns per service. For developers debugging production issues, Sentry's context-rich error reports are dramatically more useful than grepping log files.
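To illustrate why breadcrumbs and context make an error report more useful than a bare log line, here is a conceptual sketch in plain Python. It is not Sentry's SDK and does not follow Sentry's event schema; every function name, field name, and value (including the release string and the breadcrumb limit) is hypothetical.

```python
import traceback
from collections import deque
from datetime import datetime, timezone

# Keep only the most recent breadcrumbs; the limit of 100 is an assumption.
_breadcrumbs = deque(maxlen=100)


def add_breadcrumb(category, message):
    """Record a small event that may explain a later error."""
    _breadcrumbs.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "category": category,
        "message": message,
    })


def build_error_event(exc, release, user_id=None, request_path=None):
    """Bundle an exception with the context needed to debug it."""
    return {
        "exception_type": type(exc).__name__,
        "message": str(exc),
        "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "release": release,  # ties the error to a specific deployment
        "user": {"id": user_id},
        "request": {"path": request_path},
        "breadcrumbs": list(_breadcrumbs),
    }


def checkout(cart):
    add_breadcrumb("cart", f"loaded {len(cart)} items")
    add_breadcrumb("payment", "charging card")
    # Raises KeyError when an item has no price.
    return sum(item["price"] for item in cart)


def run_checkout_example():
    """Trigger a failure and return the event that would be reported."""
    try:
        checkout([{"sku": "A1"}])
    except KeyError as exc:
        return build_error_event(
            exc, release="shop@1.4.2", user_id="u-123", request_path="/checkout"
        )
```

The resulting event carries the stack trace, the release, the affected user, and the two breadcrumbs that preceded the failure, so the sequence of events is visible without searching logs by hand.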
Elasticsearch serves as the log aggregation and search layer. Application logs, infrastructure logs, and audit logs are shipped to Elasticsearch via Filebeat, Fluentd, or Logstash, where they're indexed for fast full-text search. When Prometheus alerts fire or Sentry captures an error, Elasticsearch is where you search for the surrounding log context. Many teams use Grafana with the Elasticsearch data source for a unified dashboard experience rather than running Kibana separately.
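Searching for the log context around an alert or error usually means filtering by service and by a time window. The following sketch builds such a search body with Elasticsearch's query DSL (a bool query with term and range filters). The field names @timestamp and service.name are assumptions that depend on how your shipper and index mapping are configured, and the default window and result size are arbitrary.

```python
from datetime import datetime, timedelta, timezone


def surrounding_logs_query(service, error_time, window_seconds=60, size=200):
    """Build a search body for one service's logs around an error time."""
    window = timedelta(seconds=window_seconds)
    return {
        "size": size,
        "sort": [{"@timestamp": "asc"}],  # oldest first, to read as a timeline
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service.name": service}},
                    {"range": {"@timestamp": {
                        "gte": (error_time - window).isoformat(),
                        "lte": (error_time + window).isoformat(),
                    }}},
                ]
            }
        },
    }


# Example: logs from a hypothetical "checkout" service, 30 seconds either side.
example_body = surrounding_logs_query(
    "checkout",
    datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
    window_seconds=30,
)
```

The body would be sent as JSON to the _search endpoint of the relevant index. Filters are used instead of scored matches because relevance ranking adds nothing when the goal is a chronological view.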
The Bottom Line
Datadog is included as the all-in-one alternative that replaces the entire open-source stack with a single managed platform. If your team lacks the expertise or desire to operate Prometheus, Grafana, Elasticsearch, and Sentry separately, Datadog provides metrics, logs, traces, error tracking, and more in one SaaS product. The trade-off is cost versus simplicity: Datadog's bill grows with the number of hosts and the volume of logs and traces you send, so it can climb quickly as infrastructure scales. Many organizations settle on a hybrid, using Prometheus and Grafana for Kubernetes metrics and Datadog for application-level observability.