Observability & Monitoring Stack

Full-stack observability, from infrastructure metrics to application errors, built on tools that engineering teams rely on in production.

What This Stack Does

Observability isn't a single tool. It rests on three pillars (metrics, logs, and traces) that work together to answer one question: what is happening in my system right now, and why? This stack covers each pillar with tools that have seen years of heavy production use. The open-source core keeps costs predictable, while Datadog is included as the managed alternative for teams that prefer operational simplicity over cost optimization.

Metrics Collection and Dashboard Visualization

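Prometheus handles metrics: CPU usage, request latency, error rates, queue depths, and custom application counters. Its pull-based architecture scrapes HTTP /metrics endpoints at configured intervals and stores the samples as time series, which you query with PromQL (for example, per-second request rates or latency percentiles over a sliding window). Alerting rules evaluate PromQL expressions and hand firing alerts to Alertmanager for routing and deduplication. In Kubernetes environments Prometheus is the de facto standard: it is a graduated CNCF project, and Kubernetes components expose their own metrics in its text format. Self-hosted Prometheus is free, but its local storage is neither clustered nor replicated, so add Grafana Mimir or Thanos for long-term retention and high availability.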

Grafana is the visualization layer that ties everything together. It queries Prometheus for metrics, Elasticsearch for logs, and 150+ other data sources through plugins, displaying them side by side in dashboards that show system health at a glance. Community dashboards published on grafana.com for Kubernetes, databases, and popular services speed up setup. Template variables let a single dashboard drill down from a cluster overview to an individual pod. Grafana is free to self-host; Grafana Cloud offers a managed version with a free tier.
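The drill-down pattern is easiest to see in a dashboard's JSON model. The sketch below builds a trimmed dashboard with two chained template variables (`namespace`, then `pod` filtered by `$namespace`) and one panel whose PromQL uses both. It assumes a Prometheus data source with UID `prometheus`, the `kube_pod_info` metric from kube-state-metrics, and `container_cpu_usage_seconds_total` from cAdvisor; all of these are assumptions to adjust for your cluster.

```python
import json

# Assumed data source UID; replace with the UID of your Prometheus data source.
PROM_DS = {"type": "prometheus", "uid": "prometheus"}


def query_variable(name: str, query: str, multi: bool = False) -> dict:
    """A dashboard variable whose options come from a data source query."""
    return {
        "name": name,
        "type": "query",
        "datasource": PROM_DS,
        "query": query,
        "refresh": 1,  # re-run the query when the dashboard loads
        "multi": multi,
        "includeAll": multi,
    }


def build_dashboard(title: str) -> dict:
    return {
        "title": title,
        "templating": {
            "list": [
                # label_values() is Grafana's Prometheus helper for variable options.
                query_variable("namespace", "label_values(kube_pod_info, namespace)"),
                query_variable(
                    "pod",
                    'label_values(kube_pod_info{namespace="$namespace"}, pod)',
                    multi=True,
                ),
            ]
        },
        "panels": [
            {
                "type": "timeseries",
                "title": "CPU cores used per pod",
                "datasource": PROM_DS,
                "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
                "targets": [
                    {
                        "refId": "A",
                        "expr": (
                            "sum by (pod) (rate(container_cpu_usage_seconds_total"
                            '{namespace="$namespace", pod=~"$pod"}[5m]))'
                        ),
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard("Kubernetes pods"), indent=2))
```

The printed JSON can be pasted into Grafana's dashboard import screen, or wrapped as `{"dashboard": ..., "overwrite": true}` and POSTed to the `/api/dashboards/db` HTTP API. Chaining the `pod` query on `$namespace` is what makes the cluster-to-pod drill-down work.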

Error Tracking and Log Aggregation

Sentry provides application-level error tracking and performance monitoring. When your code throws an unhandled exception, the Sentry SDK captures the full stack trace, request context, user information, and breadcrumbs: the sequence of events (HTTP requests, database queries, log messages, UI actions) that led up to the error. Release tracking associates each error with the release in which it occurred. Performance monitoring records transaction traces with per-span latency breakdowns, including calls across services. For developers debugging production issues, these context-rich reports are usually much faster to act on than grepping log files.
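To show what "breadcrumbs plus stack trace" means in practice, here is a toy model of the event an error tracker assembles. `ToyErrorReporter` is entirely hypothetical and is not the Sentry SDK; in the real Python SDK the analogous entry points are `sentry_sdk.init(release=...)`, `sentry_sdk.add_breadcrumb(...)`, and `sentry_sdk.capture_exception(...)`, and most breadcrumbs are recorded automatically by its integrations.

```python
import traceback
from collections import deque
from datetime import datetime, timezone
from typing import Optional


class ToyErrorReporter:
    """Hypothetical model of an error tracker's event payload; NOT the Sentry SDK."""

    def __init__(self, release: str, max_breadcrumbs: int = 100) -> None:
        self.release = release
        # Bounded buffer: once full, the oldest breadcrumb is dropped.
        self.breadcrumbs = deque(maxlen=max_breadcrumbs)
        self.events = []

    def add_breadcrumb(self, category: str, message: str) -> None:
        """Record one step on the path toward a possible error."""
        self.breadcrumbs.append(
            {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "category": category,
                "message": message,
            }
        )

    def capture_exception(self, exc: BaseException, user: Optional[dict] = None) -> dict:
        """Bundle the exception, its stack frames, and recent breadcrumbs."""
        frames = traceback.extract_tb(exc.__traceback__)
        event = {
            "release": self.release,
            "exception": {"type": type(exc).__name__, "value": str(exc)},
            "stacktrace": [
                {"filename": f.filename, "function": f.name, "lineno": f.lineno}
                for f in frames
            ],
            "breadcrumbs": list(self.breadcrumbs),
            "user": user or {},
        }
        self.events.append(event)  # a real client would send this to a server
        return event
```

The bounded buffer is the key design choice: breadcrumbs are cheap to record on every request, and only the most recent ones travel with the error, which keeps event size predictable.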

Elasticsearch serves as the log aggregation and search layer. Application logs, infrastructure logs, and audit logs are shipped to Elasticsearch via Filebeat, Fluentd, or Logstash, where they're indexed for fast full-text search. When Prometheus alerts fire or Sentry captures an error, Elasticsearch is where you search for the surrounding log context. Many teams use Grafana with the Elasticsearch data source for a unified dashboard experience rather than running Kibana separately.
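Searching for "surrounding log context" usually means a time-window query scoped to one service. The sketch below builds an Elasticsearch Query DSL body for the minutes around an error, oldest first. The field names `service.name` and `@timestamp` follow the Elastic Common Schema and are assumptions about your log mapping; `term` on `service.name` also assumes that field is mapped as a keyword.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def log_context_query(
    service: str,
    error_time: datetime,
    window_minutes: int = 5,
    text: Optional[str] = None,
    size: int = 200,
) -> dict:
    """Build a _search body for logs around an error, oldest first."""
    start = error_time - timedelta(minutes=window_minutes)
    end = error_time + timedelta(minutes=window_minutes)
    query: dict = {
        "bool": {
            # filter clauses narrow results without affecting relevance scoring
            "filter": [
                {"term": {"service.name": service}},
                {"range": {"@timestamp": {"gte": start.isoformat(), "lte": end.isoformat()}}},
            ]
        }
    }
    if text:
        # Optional full-text match on the analyzed message field.
        query["bool"]["must"] = [{"match": {"message": text}}]
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": query,
    }


if __name__ == "__main__":
    t = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(json.dumps(log_context_query("checkout", t, text="timeout"), indent=2))
```

The resulting JSON is the body of a `_search` request against your log indices (for example, a hypothetical `logs-*` pattern), sent from Kibana Dev Tools, curl, or an official client. Putting the service and time constraints in `filter` rather than `must` lets Elasticsearch cache them and skip scoring.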

The Bottom Line

Datadog is included as the all-in-one alternative that replaces the entire open-source stack with a single managed platform. If your team lacks the expertise or the appetite to operate Prometheus, Grafana, Elasticsearch, and Sentry separately, Datadog provides metrics, logs, traces, error tracking, and more in one SaaS product. The trade-off is cost versus simplicity: Datadog prices each product separately (infrastructure monitoring and APM per host, log management by volume), so the bill grows with infrastructure size. Many organizations therefore run a hybrid, using Prometheus and Grafana for Kubernetes metrics and Datadog for application-level observability.
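Because infrastructure monitoring is billed per host, cost grows linearly with fleet size. This small sketch multiplies the list prices quoted in this section's pricing table ($15 Pro, $23 Enterprise, per host per month) by host count; it deliberately ignores APM, log volume, discounts, and any other add-ons, so treat it as a lower bound rather than a quote.

```python
# List prices quoted in this section's pricing table (USD per host per month,
# infrastructure monitoring only). Verify current pricing before relying on it.
PER_HOST_MONTHLY = {"pro": 15, "enterprise": 23}


def annual_infra_cost(hosts: int, tier: str) -> int:
    """Annual infrastructure-monitoring list cost; APM, logs, etc. are extra."""
    if hosts < 0:
        raise ValueError("hosts must be non-negative")
    return hosts * PER_HOST_MONTHLY[tier] * 12


for fleet in (10, 100, 1000):
    print(fleet, {tier: annual_infra_cost(fleet, tier) for tier in PER_HOST_MONTHLY})
```

At these list prices, 100 hosts on Pro come to $18,000 a year before any APM or log charges, and ten times the hosts means ten times the bill. That linear growth is what pushes larger fleets toward the hybrid approach.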

Tool	Role	Pricing	Open Source
Grafana	Dashboards & Unified Visualization	Self-hosted free under AGPL v3. Grafana Cloud free tier available. Cloud Pro from $19/mo + usage. Enterprise from a $25,000/year spend commit.	Yes
Prometheus	Metrics Collection & Alerting	Free and open source (Apache 2.0). No commercial version.	Yes
Sentry	Error Tracking & Performance	Developer free (5K errors/mo). Team $26/mo. Business $80/mo. Self-hosted free.	Source-available (FSL); SDKs MIT
Elasticsearch	Log Aggregation & Search	Self-managed Basic is free; Elastic Cloud Hosted and Serverless offer free trials with usage/resource-based pricing.	Yes
Datadog	All-in-One Alternative	Infrastructure monitoring: free tier (up to 5 hosts), Pro from $15/host/mo, Enterprise from $23/host/mo (billed annually). APM and logs priced separately.	No
