20 tools tagged
Developer-focused APM for Ruby, Elixir, Node.js, and Python
AppSignal is an application performance monitoring platform for Ruby, Elixir, Node.js, Python, and frontend JavaScript. It combines error tracking, performance monitoring, host metrics, anomaly detection, and uptime checks in a single dashboard with per-request pricing. The Netherlands-based company serves over 10,000 developers with an interface designed for clarity over enterprise complexity.
Autonomous Kubernetes and GPU infrastructure optimization
ScaleOps provides autonomous real-time management of Kubernetes and GPU infrastructure, reducing cloud costs by up to 80 percent without manual configuration. Backed by $130 million in Series C funding at an $800 million valuation, it serves enterprises including Adobe, Wiz, DocuSign, and Salesforce. The platform continuously rightsizes pods, optimizes replicas, manages nodes, and allocates GPUs based on live workload demand rather than static configurations.
All-in-one open-source observability — logs, metrics, traces, RUM
OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, with native OpenTelemetry support, a built-in query UI, dashboards, and alerts. Designed for AI and cloud-native workloads at petabyte scale. Over 15,000 GitHub stars.
Observability data accessible to AI agents via MCP
Netdata's MCP integration exposes infrastructure monitoring, discovery, and root-cause analysis capabilities to AI agents. Built into the 78K+ star Netdata monitoring platform, it lets agents query real-time metrics, explore system health, investigate incidents, and generate observability reports through the Model Context Protocol.
AI-driven log analysis with zero false positives
Dash0 is an AI-driven observability platform focused on log analysis. It auto-structures unstructured logs, provides instant alerting that it claims produces zero false positives, and delivers full-stack tracing. It uses AI to transform raw log data into structured, searchable events without manual parsing configuration, which speeds up log-based debugging for engineering teams.
AI-powered incident management in Slack and Teams
Rootly is an AI-native incident management platform that runs entirely within Slack and Microsoft Teams, automating incident workflows from detection through postmortem. It reduces manual incident overhead with AI-generated summaries, automated role assignments, escalation paths, and postmortem drafts. For enterprise use, it holds SOC 2 Type II, GDPR, and HIPAA compliance certifications.
Kubernetes troubleshooting with event context
Komodor is a Kubernetes troubleshooting platform that extracts event and change context from clusters, correlating deployments, config changes, and infrastructure events to quickly identify the root cause of pod failures. Its Slack integration delivers incident context directly into team channels, helping SRE and platform teams reduce mean time to resolution by connecting the dots between what changed and what broke.
Evaluation-first LLM and agent observability
Confident AI is an evaluation-first observability platform that scores every trace and span with 50+ metrics, alerting on quality drops in LLM and agent applications. It goes beyond traditional APM by treating evaluation as core observability, providing actionable insights that help teams understand not just whether their AI applications are running but whether they are producing correct and useful outputs.
Preemptive observability in the IDE
Digma provides preemptive observability by catching runtime issues and performance bottlenecks directly in the IDE before code is committed. It shifts observability left by helping developers understand how their code will behave in production using real telemetry data, surfacing slow queries, N+1 problems, and error-prone code paths during development rather than after deployment.
AI observability with security posture management
Coralogix uses AI to provide actionable insights across logs and traces, with a dedicated AI-SPM dashboard for tracking prompt injections and data leaks in AI applications. Offered on a pay-per-use model with no upfront fees, it integrates security posture management directly into the observability stack, which suits teams running both traditional and AI-powered production workloads.
Cloud-native observability with AI correlation
Middleware is a cloud-native observability platform that provides real-time insights into Kubernetes environments using AI to correlate metrics, logs, and traces for faster troubleshooting. It simplifies the debugging of complex microservice clusters by automatically connecting related signals across distributed systems, with a freemium model accessible to teams of all sizes.
Data and AI observability for production pipelines
Monte Carlo is a leading data observability platform that has recently expanded into AI observability, providing end-to-end lineage tracking from data tables to LLM outputs. It detects data downtime, including freshness issues, volume anomalies, schema changes, and distribution drift, across data warehouses, lakes, and AI pipelines, helping teams ensure that the data powering their AI applications is reliable.
Open-source ML and LLM monitoring with 100+ metrics
Evidently AI is an open-source platform with 100+ pre-built metrics for monitoring data quality, model performance, and data drift in AI/ML pipelines. Available under Apache 2.0 with a cloud version, it helps teams detect when production data shifts away from training distributions, LLM output quality degrades, or feature pipelines introduce anomalies that silently degrade model accuracy.
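One common way to quantify a shift between training and production data is the Population Stability Index (PSI). The sketch below is a plain-Python illustration of that idea, not Evidently's API; the function name, the bin count, and the smoothing constant `eps` are all assumptions made for the example.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two numeric samples, binned on the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the reference. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, though the right threshold depends on the data.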
Open-source distributed tracing for microservice observability
Jaeger is a CNCF-graduated open-source distributed tracing platform developed by Uber Technologies for monitoring microservice architectures. It captures trace data as requests flow through distributed systems, visualizing service dependencies and identifying latency bottlenecks. Jaeger supports OpenTelemetry natively and integrates with Grafana, Elasticsearch, and Kafka for production-scale observability with configurable storage backends.
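To illustrate how trace data can point to a latency bottleneck, the sketch below computes each span's self time (its duration minus the time spent in its direct children) for a toy trace. The `Span` type and its field names are hypothetical and do not reflect Jaeger's data model or API; the example also assumes that child spans do not overlap in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    duration_ms: float


def self_times(spans):
    """Map span_id -> duration minus the summed duration of direct children."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def bottleneck(spans):
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# Toy trace: gateway -> orders -> database.
trace = [
    Span("a", None, "gateway", 120.0),
    Span("b", "a", "orders", 100.0),
    Span("c", "b", "database", 85.0),
]
```

In this toy trace the gateway span has the longest total duration, but most of that time is spent downstream, so `bottleneck(trace)` returns the database span.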
Full-stack observability with AI-powered monitoring
New Relic is a full-stack observability platform combining APM, infrastructure monitoring, log management, distributed tracing, browser and mobile monitoring, synthetics, and AIOps in a single dashboard with 50+ capabilities and 780+ integrations. It features AI-powered anomaly detection, incident correlation, root cause analysis, and an SRE agent for automated troubleshooting. Pricing is usage-based, with 100 GB of free data ingest per month and one free full platform user. Used by 16,000+ organizations.
Open-source AI observability for models and data pipelines
WhyLabs is a now fully open-sourced AI observability platform for monitoring ML models, LLM apps, and data pipelines. It is built on whylogs for privacy-preserving data logging and LangKit for LLM monitoring. It provides continuous drift detection, data quality monitoring, anomaly alerting, and LLM security, including prompt injection and hallucination detection. It processes 100% of data without sampling across tabular, image, text, and embedding types. Incubated at the Allen Institute for AI.
Open-source monitoring and alerting toolkit, the CNCF standard for metrics collection
Prometheus is the open-source monitoring system and time-series database that has become the CNCF standard for metrics collection in cloud-native environments. It features a powerful query language (PromQL), pull-based metrics collection, a multi-dimensional data model, and built-in alerting via Alertmanager. The foundation of modern Kubernetes observability.
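To make the multi-dimensional data model concrete, here is a minimal sketch of how a single labeled sample can be rendered in the Prometheus text exposition format, which is what a scrape target serves when Prometheus pulls from it. The helper names and the example metric are hypothetical; a real service would normally rely on an official client library rather than hand-rolled formatting.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as `name{k="v",...} value` (hypothetical helper)."""
    if not labels:
        return f"{name} {value}"
    # Sort labels so the output is deterministic.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    print(render_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
```

Each distinct combination of label values identifies its own time series, so a PromQL selector such as `http_requests_total{method="GET"}` matches this series by label.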
Application monitoring and error tracking that helps developers fix issues faster
Sentry is the leading error tracking and performance monitoring platform for developers. It captures and aggregates errors with full stack traces, breadcrumbs, and context across 100+ platforms, and it features session replay, performance tracing, and code-level profiling. Used by over 100,000 organizations. An open-source self-hosted option is available.
Cloud-scale monitoring, security, and analytics platform for modern infrastructure
Datadog is a comprehensive observability platform that unifies metrics, traces, logs, and security signals across your entire stack. Used by thousands of enterprises to monitor cloud infrastructure and applications in real time. Offers 750+ integrations, AI-powered alerting, and end-to-end visibility. The industry leader in cloud monitoring with support for AWS, Azure, GCP, and Kubernetes.
Open-source observability platform for metrics, logs, and traces visualization
Grafana is the leading open-source platform for monitoring and observability visualization. It connects to virtually any data source (Prometheus, Elasticsearch, InfluxDB, PostgreSQL, CloudWatch, Datadog, and 150+ others) to create beautiful, interactive dashboards. Used by millions of users at companies like Bloomberg, JPMorgan, eBay, and PayPal. Grafana Cloud offers a fully managed experience with a generous free tier. The CNCF ecosystem standard for metrics visualization.