AI Monitoring & Observability

Traceway

OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.

Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.

open-source
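
To illustrate what "accepts OTLP/HTTP" means in practice, here is a minimal sketch that builds a one-span OTLP/HTTP JSON body using only the Python standard library. The field names follow the public OTLP JSON encoding, and `/v1/traces` is the standard OTLP/HTTP traces path. The service and span names are made up, and the ingest host and any authentication headers for a Traceway deployment are not given in this listing, so they are left out rather than guessed.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str, duration_ms: int) -> dict:
    """Build a minimal OTLP/HTTP JSON body containing a single span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 random bytes -> 16 hex chars
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are encoded as decimal strings in OTLP JSON
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


# Illustrative values only: "checkout" and "charge-card" are made-up names.
payload = build_otlp_trace_payload("checkout", "charge-card", duration_ms=42)
body = json.dumps(payload).encode("utf-8")
```

Sending it is then a plain HTTP POST of `body` with `Content-Type: application/json` to the backend's `/v1/traces` endpoint. In real applications an OpenTelemetry SDK exporter would normally produce this payload for you.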

Judgeval

Open-source post-building layer for agents — tracing, evals, and online monitoring

Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.

open-source

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

open-source

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is an open-source Python toolkit from Tracer Cloud for building AI SRE agents that investigate and respond to production incidents. It ships with connectors to Prometheus, Grafana, Kubernetes and incident platforms, plus a simulation harness that replays past incidents so teams can benchmark agent accuracy before trusting it on live pager rotations.

open-source

Evolver

Self-evolution engine for AI agents with auditable updates

Evolver is an open-source self-evolution engine for AI agents that turns run logs into auditable, reviewable updates via its Genome Evolution Protocol. Instead of ad hoc prompt tweaking, teams collect traces and Evolver proposes versioned diffs to prompts, tools and workflows that engineers can approve, reject or roll back like code.

open-source

CodeBurn

See where your AI coding tokens actually go

CodeBurn is an open-source TUI dashboard and CLI that shows where your AI coding tokens actually go, broken down by task type, tool, model, MCP server, and project. It reads local session data directly from Claude Code, Codex, Cursor, OpenCode, Pi, and GitHub Copilot, with no wrapper, proxy, or API keys required, and layers on one-shot success rates so you can see whether the AI gets work right on the first try or burns budget on edit/test/fix retries. It ships with a macOS menu bar widget and CSV/JSON export.

free · Open Source

Weights & Biases

ML experiment tracking and model monitoring

Weights & Biases (W&B) is an AI developer platform for experiment tracking, model monitoring, and ML workflow orchestration. Weave extends W&B with LLM ops capabilities for prompt engineering, evaluation, and deployment. Together they let teams track experiments, monitor model performance in production, manage datasets, log LLM application traces, and collaborate on ML projects through visualization dashboards and automated logging, with SSO and RBAC for enterprise compliance.

freemium

Resolve AI

AI-powered production incident resolution

Resolve AI automates production incident investigation, diagnosis, and remediation, acting as an AI SRE that participates in every on-call rotation. It investigates incidents autonomously, pursuing multiple hypotheses in parallel and validating them against real evidence. It also creates code snippets, drafts PRs, generates post-mortems, and helps onboard new teammates with instant answers about code and infrastructure. The vendor claims 5x faster MTTR and 87% faster incident investigations.

paid

Hopsworks

AI Lakehouse with Feature Store for real-time ML

Hopsworks is a data-intensive AI platform combining a Python-centric Feature Store with MLOps capabilities for production ML systems. It provides sub-millisecond feature retrieval powered by RonDB, dual offline and online storage for batch and real-time inference, experiment tracking, a model registry, and deployment pipelines. It is available as a managed cloud service on AWS, Azure, and GCP, self-hosted on Kubernetes, or as a serverless platform.

freemium · Open Source

RagaAI Catalyst

AI testing and evaluation for agents and LLM apps

RagaAI Catalyst is a Python SDK for observability, monitoring, and evaluation of LLM and agentic applications. It provides agent tracing with execution graph visualization, a self-hosted dashboard with analytics, synthetic data generation, a multi-metric evaluation framework, and guardrail management. It is built for teams running production RAG systems and AI agents who need systematic testing, debugging, and performance optimization workflows.

open-source

Laminar

Open-source observability for AI agents

Laminar is an open-source observability platform for AI agents providing tracing, evaluation, and analytics for LLM applications. It integrates with Vercel AI SDK, LangChain, OpenAI, and Anthropic with a single line of code. Features include OpenTelemetry-native SDKs, an extensible evaluation framework with CI/CD support, SQL access to traces and metrics, and a visual debugging timeline for agent reasoning and actions.

freemium · Open Source

New API

Unified LLM API gateway and proxy hub

New API is an open-source multi-tenant AI gateway that aggregates and distributes LLM API requests across providers like OpenAI, Claude, and Gemini through a unified proxy interface. It cross-converts requests into OpenAI-compatible, Claude-compatible, or Gemini-compatible formats, with built-in channel management, quota control, token-based authentication, and billing capabilities. Deploy via Docker with SQLite or MySQL for centralized model management.

open-source

Sentrial

Production monitoring platform for AI agent reliability

Sentrial is a YC W26-backed monitoring platform for AI agent reliability in production. It semantically detects loops, hallucinations, tool misuse, and user frustration in real time, then diagnoses root causes and recommends fixes. The platform claims a 70% MTTR reduction via automated remediation, including rollback, model retraining triggers, and webhooks. Sentrial positions itself as the Datadog for teams deploying autonomous AI agents at scale.

paid

Sonarly

AI production engineer that auto-triages and fixes alerts

Sonarly is a YC W26-backed AI production engineer that autonomously triages production alerts, deduplicates them by root cause, and sends ready-to-merge pull request fixes. It connects to monitoring tools like Sentry and Datadog, analyzes alert patterns to identify the underlying issue, and generates code fixes or optimization recommendations. Built on Claude APIs, Sonarly reduces mean time to resolution for production incidents while minimizing alert fatigue for engineering teams.

paid

AI-Infra-Guard

AI red teaming and infrastructure security scanner by Tencent

AI-Infra-Guard is Tencent's open-source AI security platform providing one-click evaluation of AI infrastructure risks across five modules. It covers insecure config detection, multi-agent workflow evaluation, MCP server scanning across 14 risk categories, vulnerability scanning for 55+ AI frameworks with 1,000+ CVE mappings, and jailbreak evaluation for prompt robustness. Deployable via Docker with academic backing from Peking and Fudan Universities.

open-source

Manifest

Smart LLM router that cuts inference costs up to 70%

Manifest is an open-source smart model router that intelligently routes LLM requests to the cheapest capable model, reducing inference costs by up to 70% without sacrificing output quality. It uses a 23-dimension scoring algorithm to evaluate 300+ models across providers including OpenAI, Anthropic, Google, and DeepSeek, with automatic fallbacks and budget controls. Manifest can be deployed as a cloud service, local plugin, or self-hosted Docker container with transparent routing logic.

freemium · Open Source
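
The idea of routing to the "cheapest capable model" with fallbacks and budget controls can be sketched as follows. This is a toy illustration, not Manifest's scoring algorithm: the single `capability` score stands in for its multi-dimension scoring, and the model names, prices, and thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative numbers only
    capability: float          # 0..1, hypothetical quality score


def rank_candidates(models: list[Model], required_capability: float) -> list[Model]:
    """Return the models that are capable enough, cheapest first."""
    capable = [m for m in models if m.capability >= required_capability]
    return sorted(capable, key=lambda m: m.cost_per_1k_tokens)


def route(
    models: list[Model],
    required_capability: float,
    max_cost_per_1k: float,
    unavailable: frozenset = frozenset(),
) -> Optional[Model]:
    """Pick the cheapest capable model that is available and within budget.

    Skipping unavailable models gives a simple fallback chain; returning
    None signals that no model satisfies the budget control.
    """
    for model in rank_candidates(models, required_capability):
        if model.name in unavailable:
            continue  # fall back to the next-cheapest capable model
        if model.cost_per_1k_tokens <= max_cost_per_1k:
            return model
    return None


# Hypothetical catalog for demonstration.
CATALOG = [
    Model("small", 0.2, 0.55),
    Model("medium", 1.0, 0.75),
    Model("large", 5.0, 0.95),
]

choice = route(CATALOG, required_capability=0.7, max_cost_per_1k=10.0)
fallback = route(CATALOG, 0.7, 10.0, unavailable=frozenset({"medium"}))
```

Here `choice` is the "medium" model, and `fallback` is "large" because "medium" is marked unavailable. A real router would derive the required capability from the request itself.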

CozeLoop

Full-lifecycle AI agent optimization and monitoring

CozeLoop is an open-source AI agent optimization platform from ByteDance's Coze ecosystem providing full-lifecycle management from development to production monitoring. It enables developers to debug agent prompts, evaluate agent performance across test cases, optimize reasoning processes, and monitor deployed agents in real time. Built on Go and React with SDKs for Go, Python, and Node.js, CozeLoop is designed for enterprise-grade AI agent development and operation.

open-source

PostHog

Open-source product analytics, session replay, and feature flags

PostHog is an all-in-one open-source product analytics platform that combines event tracking, session replay, feature flags, A/B testing, error tracking, and user surveys in a single tool. With a generous free tier covering 1 million events per month, it replaces the need for separate tools like Mixpanel, Amplitude, LaunchDarkly, and Hotjar. Self-host on your own infrastructure for complete data ownership, or use the managed cloud with warehousing and SQL query access.

freemium · Open Source

Keep

Open-source AIOps alert management platform

Keep is an open-source AIOps platform that provides a single pane of glass for alerts from monitoring tools such as Datadog, PagerDuty, and Grafana via 50+ integrations. It uses AI to correlate, deduplicate, and enrich alerts, reducing noise and helping on-call teams focus on real incidents. Keep includes workflow automation, bidirectional sync with ticketing systems, and a modern web dashboard.

open-source
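
Alert deduplication of the kind described above is commonly built on fingerprinting: alerts that share a fingerprint collapse into one entry. The sketch below is a generic illustration, not Keep's implementation or API. The alert fields and the choice of fingerprint fields are assumptions.

```python
import hashlib
from collections import defaultdict

# Hypothetical choice: two alerts are "the same" if service and name match,
# regardless of which monitoring tool reported them.
FINGERPRINT_FIELDS = ("service", "name")


def fingerprint(alert: dict) -> str:
    """Derive a short, stable identifier from the fields that define identity."""
    key = "|".join(str(alert.get(field, "")) for field in FINGERPRINT_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def deduplicate(alerts: list[dict]) -> list[dict]:
    """Collapse alerts with the same fingerprint into one summary entry."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for alert in alerts:
        groups[fingerprint(alert)].append(alert)
    return [
        {
            "service": group[0].get("service"),
            "name": group[0].get("name"),
            "fingerprint": fp,
            "count": len(group),
            "sources": sorted({a.get("source", "unknown") for a in group}),
        }
        for fp, group in groups.items()
    ]


# Made-up alerts: the first two describe the same problem from different tools.
alerts = [
    {"source": "datadog", "service": "api", "name": "HighLatency"},
    {"source": "grafana", "service": "api", "name": "HighLatency"},
    {"source": "datadog", "service": "db", "name": "DiskFull"},
]
deduped = deduplicate(alerts)
```

With these inputs, `deduped` holds two entries: one `HighLatency` alert with a count of 2 and both sources listed, and one `DiskFull` alert. AI-based correlation would go further by grouping alerts that differ in name but share a cause.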

RouteLLM

Intelligent model router that balances cost and quality across LLM providers

RouteLLM by LMSYS routes LLM requests to the most cost-effective model that can handle each query's complexity. It uses learned routing models to classify whether a query needs a powerful expensive model or can be handled by a cheaper alternative, reducing costs by up to 85% while maintaining quality. Supports OpenAI, Anthropic, and other providers through an OpenAI-compatible API.

open-source
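
The routing decision described above reduces to a score and a threshold: a router estimates how much a query needs the strong model, and a threshold sets the cost/quality trade-off. The sketch below shows only that shape. It does not use the RouteLLM library, and `toy_score` is a hand-written heuristic standing in for a learned routing model. The model names are placeholders.

```python
from typing import Callable


def make_router(
    score_fn: Callable[[str], float],
    threshold: float,
    strong_model: str,
    weak_model: str,
) -> Callable[[str], str]:
    """Return a function that maps a query to a model name.

    Raising the threshold sends more traffic to the cheaper model.
    """
    def route(query: str) -> str:
        return strong_model if score_fn(query) >= threshold else weak_model
    return route


def toy_score(query: str) -> float:
    """Stand-in for a learned router: a crude 'difficulty' estimate in [0, 1]."""
    hints = ("prove", "derive", "step by step", "optimize")
    score = min(len(query.split()) / 50, 0.5)  # longer queries score higher
    if any(hint in query.lower() for hint in hints):
        score += 0.5  # reasoning-heavy keywords push toward the strong model
    return score


# Placeholder model names, not real identifiers.
router = make_router(toy_score, threshold=0.5,
                     strong_model="strong-model", weak_model="weak-model")

easy = router("What is the capital of France?")
hard = router("Prove that the square root of 2 is irrational.")
```

Here `easy` resolves to the weak model and `hard` to the strong one. In a learned router, the score would come from a model trained on preference or quality data rather than keyword matching.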

Robusta

CNCF Sandbox Kubernetes alert enrichment and automation platform

Robusta is a CNCF Sandbox project that enriches Kubernetes alerts with diagnostic context and automates remediation workflows. It intercepts Prometheus alerts and attaches relevant logs, pod status, resource metrics, and troubleshooting suggestions before delivering them to Slack, Teams, or PagerDuty. It also supports custom playbooks for automated incident response and AI-powered root cause analysis.

open-source

Metoro

AI-powered SRE agent for Kubernetes troubleshooting

Metoro is an AI SRE platform for Kubernetes that combines observability with autonomous troubleshooting. Its Guardian agent monitors cluster health, correlates metrics, logs, and traces to identify root causes, and suggests remediation actions. It also features an MCP server for integration with AI coding agents and natural language querying of infrastructure state.

freemium

Coroot

Zero-instrumentation Kubernetes observability powered by eBPF

Coroot is an open-source observability platform that uses eBPF to automatically instrument Kubernetes applications without code changes. It provides application maps, latency analysis, log correlation, and continuous profiling with automatic anomaly detection. Instead of manual instrumentation, its agents capture metrics, traces, and logs at the kernel level.

open-source

OpenLIT

OpenTelemetry-native observability for LLM applications with evals and GPU monitoring

OpenLIT is an open-source AI engineering platform that provides OpenTelemetry-native observability for LLM applications. It combines distributed tracing, evaluation, prompt management, a secrets vault, and GPU telemetry in a single self-hostable stack. With 50+ integrations across LLM providers and frameworks, it lets teams monitor AI applications using their existing observability backends like Grafana, Datadog, or Jaeger.

open-source