Using AI tools to identify, diagnose, and fix bugs — from automated error analysis to intelligent stack trace interpretation and root cause detection
Showing 24 of 111 tools
Open-source post-building layer for agents — tracing, evals, and online monitoring
Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.
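As a rough illustration of what decorator-based instrumentation looks like in general, here is a minimal standard-library sketch. It is not Judgeval's API: the `traced` decorator, the `SPANS` list, and the `answer` function are hypothetical names invented for this example, and a real setup would export spans to a tracing backend rather than keep them in memory.

```python
import functools
import time

# Hypothetical in-memory sink; a real tracer would export spans elsewhere.
SPANS = []


def traced(func):
    """Record the name, duration, and outcome of every call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"name": func.__name__, "status": "ok"}
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Keep the failure on the span, then let the caller see it too.
            span["status"] = "error"
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)

    return wrapper


@traced
def answer(question: str) -> str:
    # Stand-in for an LLM or agent call (hypothetical).
    return question.upper()


if __name__ == "__main__":
    answer("why did the build fail?")
    print(SPANS[-1]["name"], SPANS[-1]["status"])
```

The point of the pattern is that the wrapped function stays unchanged: timing and error capture live entirely in the decorator, so instrumentation can be added or removed one line at a time.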
Open-source observability and self-healing layer for AI agents
TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.
One tool for intercepting, mocking, and replaying HTTP — acquired by BrowserStack
Requestly is an open-source HTTP interceptor, API client, and session replay tool that lets developers modify, mock, and debug network traffic without leaving the browser. Acquired by BrowserStack and trusted by 200,000+ developers, it bundles a Chrome extension, a full API client, mock servers, and shareable session captures into one free-plus-commercial product.
Open-source toolkit for building AI SRE incident response agents
OpenSRE is an open-source Python toolkit from Tracer Cloud for building AI SRE agents that investigate and respond to production incidents. It ships with connectors to Prometheus, Grafana, Kubernetes and incident platforms, plus a simulation harness that replays past incidents so teams can benchmark agent accuracy before trusting it on live pager rotations.
Self-evolution engine for AI agents with auditable updates
Evolver is an open-source self-evolution engine for AI agents that turns run logs into auditable, reviewable updates via its Genome Evolution Protocol. Instead of ad hoc prompt tweaking, teams collect traces and Evolver proposes versioned diffs to prompts, tools and workflows that engineers can approve, reject or roll back like code.
Official Chrome DevTools MCP server for coding agents
chrome-devtools-mcp is the Chrome DevTools team's official MCP server that lets coding agents control and inspect a live Chrome browser with first-party Chrome DevTools Protocol fidelity. It exposes Network inspection, Performance traces, Lighthouse audits, console output, and structured DOM snapshots as typed MCP tools, so agents can debug real pages and ship reliable web performance investigations without resorting to brittle DOM scraping.
AI-powered production incident resolution
Resolve AI automates production incident investigation, diagnosis, and remediation, acting as an AI SRE that participates in every on-call rotation. It autonomously investigates incidents by pursuing multiple hypotheses in parallel, validates them against real evidence, creates code snippets and drafts PRs, generates post-mortems, and onboards new teammates with instant answers about code and infrastructure. The company claims 5x faster MTTR and 87% faster incident investigations.
Java diagnostic and troubleshooting tool
Arthas is Alibaba's open-source Java diagnostic tool that lets developers troubleshoot production issues without modifying code or restarting servers. It attaches to running JVM processes to inspect class loading, decompile classes, trace method invocations, monitor performance metrics, and view real-time stack traces. Supports JDK 6+ with both telnet and WebSocket interfaces for local and remote diagnostics across Linux, macOS, and Windows.
Production monitoring platform for AI agent reliability
Sentrial is a YC W26-backed monitoring platform for AI agent reliability in production. It semantically detects loops, hallucinations, tool misuse, and user frustration in real-time, then diagnoses root causes and recommends fixes. The platform claims 70% MTTR reduction via automated remediation including rollback, model retraining triggers, and webhooks. Sentrial positions itself as the Datadog for teams deploying autonomous AI agents at scale.
AI production engineer that auto-triages and fixes alerts
Sonarly is a YC W26-backed AI production engineer that autonomously triages production alerts, deduplicates them by root cause, and sends ready-to-merge pull request fixes. It connects to monitoring tools like Sentry and Datadog, analyzes alert patterns to identify the underlying issue, and generates code fixes or optimization recommendations. Built on Claude APIs, Sonarly reduces mean time to resolution for production incidents while minimizing alert fatigue for engineering teams.
Open-source product analytics, session replay, and feature flags
PostHog is an all-in-one open-source product analytics platform that combines event tracking, session replay, feature flags, A/B testing, error tracking, and user surveys in a single tool. With a generous free tier covering 1 million events per month, it replaces the need for separate tools like Mixpanel, Amplitude, LaunchDarkly, and Hotjar. Self-host on your own infrastructure for complete data ownership, or use the managed cloud with warehousing and SQL query access.
Open-source AIOps alert management platform
Keep is an open-source AIOps platform that provides a single pane of glass for alerts from monitoring tools like Datadog, PagerDuty, and Grafana, with 50+ integrations in total. It uses AI to correlate, deduplicate, and enrich alerts, reducing noise and helping on-call teams focus on real incidents. Keep includes workflow automation, bidirectional sync with ticketing systems, and a modern web dashboard.
Extremely fast Python type checker written in Rust
ty is an extremely fast Python type checker built in Rust by Astral, the team behind Ruff and uv. It performs full type inference, supports PEP 695 type parameter syntax, and checks Python code orders of magnitude faster than mypy or pyright. ty completes the Astral Python toolchain alongside Ruff for linting and uv for package management, giving developers a unified Rust-powered development experience.
Run GitHub Actions locally for fast feedback
Act is an open-source tool that runs GitHub Actions workflows locally using Docker containers that match GitHub's execution environment. It provides instant feedback on workflow changes without pushing to a repository, supports matrix builds, secret management, and artifact handling. Act can also replace Makefiles by using workflow files as task definitions, making it useful for both CI/CD development and local task automation across development teams.
CNCF Sandbox Kubernetes alert enrichment and automation platform
Robusta is a CNCF Sandbox project that enriches Kubernetes alerts with diagnostic context and automates remediation workflows. It intercepts Prometheus alerts, attaches relevant logs, pod status, resource metrics, and troubleshooting suggestions before delivering them to Slack, Teams, or PagerDuty. Supports custom playbooks for automated incident response and AI-powered root cause analysis.
AI-powered SRE agent for Kubernetes troubleshooting
Metoro is an AI SRE platform for Kubernetes that combines observability with autonomous troubleshooting. Its Guardian agent monitors cluster health, correlates metrics, logs, and traces to identify root causes, and suggests remediation actions. Features an MCP server for integration with AI coding agents and natural language querying of infrastructure state.
Zero-instrumentation Kubernetes observability powered by eBPF
Coroot is an open-source observability platform that uses eBPF to automatically instrument Kubernetes applications without code changes. It provides application maps, latency analysis, log correlation, and continuous profiling with automatic anomaly detection. Its agents capture metrics, traces, and logs at the kernel level, removing the need for manual instrumentation.
Bayesian git bisection for finding commits that caused flaky tests
Git Bayesect applies Bayesian inference to git bisection, solving the problem of finding commits that introduced non-deterministic bugs like flaky tests. Unlike standard git bisect, which requires a definite pass-or-fail verdict at each commit, Git Bayesect handles probabilistic outcomes where a test sometimes passes and sometimes fails, using entropy minimization to efficiently narrow down the culprit commit.
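To make the idea concrete, here is a minimal sketch of Bayesian bisection with entropy minimization. It is an illustration under stated assumptions, not Git Bayesect's actual implementation: it assumes exactly one culprit commit, a uniform prior, and known failure rates before and after the culprit (`p_fail_before`, `p_fail_after`). All function names and parameters are hypothetical.

```python
import math
import random


def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def update(posterior, tested, failed, p_fail_before, p_fail_after):
    """Bayes update of P(culprit == c) after one test run at commit `tested`."""
    weighted = []
    for c, prior in enumerate(posterior):
        # The bug is present at `tested` only if the culprit is at or before it.
        p_fail = p_fail_after if tested >= c else p_fail_before
        likelihood = p_fail if failed else 1.0 - p_fail
        weighted.append(prior * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


def expected_entropy(posterior, tested, p_fail_before, p_fail_after):
    """Average posterior entropy over both possible outcomes at `tested`."""
    p_fail = sum(
        prior * (p_fail_after if tested >= c else p_fail_before)
        for c, prior in enumerate(posterior)
    )
    result = 0.0
    for failed, weight in ((True, p_fail), (False, 1.0 - p_fail)):
        if weight > 0:
            after = update(posterior, tested, failed, p_fail_before, p_fail_after)
            result += weight * entropy(after)
    return result


def bayesect(n_commits, run_test, p_fail_before=0.02, p_fail_after=0.5,
             confidence=0.95, max_runs=500):
    """Return (most likely culprit index, its posterior probability)."""
    posterior = [1.0 / n_commits] * n_commits  # uniform prior (assumption)
    for _ in range(max_runs):
        if max(posterior) >= confidence:
            break
        # Greedy step: test the commit that minimizes expected entropy.
        commit = min(
            range(n_commits),
            key=lambda i: expected_entropy(
                posterior, i, p_fail_before, p_fail_after
            ),
        )
        posterior = update(
            posterior, commit, run_test(commit), p_fail_before, p_fail_after
        )
    best = max(range(n_commits), key=posterior.__getitem__)
    return best, posterior[best]


if __name__ == "__main__":
    rng = random.Random(0)
    culprit = 13  # hidden from the search; used only by the simulated test

    def flaky_test(commit):
        # Fails half the time once the culprit is included, never before.
        return commit >= culprit and rng.random() < 0.5

    print(bayesect(20, flaky_test))
```

Because a single passing run no longer rules a commit out, the search may test the same commit several times; the posterior records how much each run actually narrowed things down, and the loop stops once one commit holds enough of the probability mass.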
Open-source LLMOps platform for prompt management and evaluation
Agenta is an open-source LLMOps platform that combines prompt engineering playgrounds, prompt version management, LLM evaluation, and observability in a unified interface. It supports 50+ LLMs with side-by-side prompt comparison, A/B testing, human evaluation workflows, and OpenTelemetry-native tracing. It is self-hostable and has 4,000+ GitHub stars.
All-in-one open-source observability — logs, metrics, traces, RUM
OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, with native OpenTelemetry support, a built-in query UI, dashboards, and alerts. Designed for AI and cloud-native workloads at petabyte scale. Over 15,000 GitHub stars.
Slack-native incident management with AI SRE agent
Incident.io is a Slack-native incident management platform with an AI SRE that autonomously investigates alerts, correlates deployments with telemetry, and drafts fix pull requests. Used by Buffer (70% fewer critical incidents), Favor (37% MTTR reduction), Intercom, and Productboard. Features include automated workflows, on-call scheduling, post-incident learning, and status pages. Integrates with PagerDuty, Datadog, GitHub, Jira, and 100+ tools.
Observability platform purpose-built for Python and Pydantic AI apps
Pydantic Logfire is an observability platform built by the Pydantic team specifically for Python AI applications. It provides structured logging, distributed tracing, and metrics with native understanding of Pydantic models, FastAPI, and AI framework data types. Auto-instruments OpenAI, Anthropic, LangChain, and other LLM providers. Built on OpenTelemetry for vendor-neutral data export. Offers a managed cloud dashboard with a generous free tier for development and small-scale production use.
Non-agent approach to automated software engineering via localize-and-repair
Agentless takes a deliberate non-agent approach to LLM-powered software engineering. Instead of autonomous agents making tool calls, it uses a structured localize-then-repair pipeline: first narrowing down which files and functions are relevant, then generating targeted patches. Achieved competitive SWE-Bench results at $0.34 average cost per issue. Adopted by OpenAI for o3 evaluations. 3,000+ GitHub stars, MIT licensed. A counterpoint to the agent-heavy trend in AI coding tools.
OpenTelemetry-based observability SDK for LLM applications
Traceloop's OpenLLMetry is an open-source observability SDK that instruments LLM applications using the OpenTelemetry standard. It auto-traces calls to OpenAI, Anthropic, Cohere, Pinecone, ChromaDB, LangChain, and other AI frameworks, sending data to any OTEL-compatible backend like Datadog, Grafana, Jaeger, or Honeycomb. Backed by Battery Ventures with 2,000+ GitHub stars. Ideal for teams already using OpenTelemetry who want LLM observability without vendor lock-in.