Best Tools for AI-Powered Debugging

Using AI tools to identify, diagnose, and fix bugs — from automated error analysis to intelligent stack trace interpretation and root cause detection

Showing 24 of 113 tools

Accomplish Coworker

Open-source desktop AI coworker for browsing and code execution.

Accomplish Coworker is an MIT-licensed open-source AI coworker that runs on the desktop, combining computer-use style browsing with code execution so agents can research, implement, run, and debug workflows in one local environment.

open-source · Telemetry

Safari MCP Server

Apple's Safari-native MCP server for web debugging agents

Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.

free · Telemetry

Judgeval

Open-source post-building layer for agents — tracing, evals, and online monitoring

Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.

open-source
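
The "single decorator" instrumentation described for Judgeval can be illustrated with a plain-Python sketch. This is not Judgeval's API: `traced` and `SPANS` are hypothetical names, and the in-memory list stands in for whatever exporter a real tracer would use.

```python
import functools
import time

SPANS = []  # hypothetical in-memory sink; a real tracer would export spans elsewhere


def traced(func):
    """Record a span (name, duration, error) for every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        span = {"name": func.__name__, "error": None}
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Keep the failure on the span, then let it propagate unchanged.
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)
    return wrapper


@traced
def answer(question):
    """Stand-in for an agent step or LLM call."""
    return question.upper()
```

Calling `answer("hi")` returns the result as usual and appends one span to `SPANS`, which an evaluation step could later score.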

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

open-source

Requestly

One tool for intercepting, mocking, and replaying HTTP — acquired by BrowserStack

Requestly is a BrowserStack-backed API client, HTTP interceptor, mock server, and session replay tool for frontend and QA teams. Its current product is commercial and led by the API client, while the legacy open-source interceptor code is licensed under AGPLv3. The free plan covers individual workflows, and Pro lists at $12/user/month billed monthly or $9/user/month billed annually for collaborative QA and frontend debugging teams.

freemium

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.

open-source

Evolver

Self-evolution engine for AI agents with auditable updates

Evolver is an open-source self-evolution engine for AI agents that turns run logs into auditable, reviewable updates via its Genome Evolution Protocol. Instead of ad hoc prompt tweaking, teams collect traces and Evolver proposes versioned diffs to prompts, tools and workflows that engineers can approve, reject or roll back like code.

open-source

chrome-devtools-mcp

Official Chrome DevTools MCP server for coding agents

chrome-devtools-mcp is the Chrome DevTools team's official MCP server that lets coding agents control and inspect a live Chrome browser with first-party Chrome DevTools Protocol fidelity. It exposes Network inspection, Performance traces, Lighthouse audits, console output, and structured DOM snapshots as typed MCP tools, so agents can debug real pages and ship reliable web performance investigations without resorting to brittle DOM scraping.

open-source

Resolve AI

AI-powered production incident resolution

Resolve AI automates production incident investigation, diagnosis, and remediation, acting as an AI SRE that participates in every on-call rotation. It investigates incidents autonomously, pursuing multiple hypotheses in parallel and validating them against real evidence. It also creates code snippets, drafts PRs, generates post-mortems, and onboards new teammates with instant answers about code and infrastructure. The vendor claims 5x faster MTTR and 87% faster incident investigations.

paid

Arthas

Java diagnostic and troubleshooting tool

Arthas is Alibaba's open-source Java diagnostic tool that lets developers troubleshoot production issues without modifying code or restarting servers. It attaches to running JVM processes to inspect class loading, decompile classes, trace method invocations, monitor performance metrics, and view real-time stack traces. It supports JDK 6+ and offers both telnet and WebSocket interfaces for local and remote diagnostics across Linux, macOS, and Windows.

open-source

Sentrial

Production monitoring platform for AI agent reliability

Sentrial is a YC W26-backed monitoring platform for AI agent reliability in production. It semantically detects loops, hallucinations, tool misuse, and user frustration in real time, then diagnoses root causes and recommends fixes. The platform claims a 70% MTTR reduction via automated remediation, including rollback, model retraining triggers, and webhooks. Sentrial positions itself as the Datadog for teams deploying autonomous AI agents at scale.

paid

Sonarly

AI production engineer that auto-triages and fixes alerts

Sonarly is a YC W26-backed AI production engineer that autonomously triages production alerts, deduplicates them by root cause, and sends ready-to-merge pull request fixes. It connects to monitoring tools like Sentry and Datadog, analyzes alert patterns to identify the underlying issue, and generates code fixes or optimization recommendations. Built on Claude APIs, Sonarly reduces mean time to resolution for production incidents while minimizing alert fatigue for engineering teams.

paid

PostHog

Open-source product analytics, session replay, and feature flags

PostHog is an open-source product and data tools platform for analytics, session replay, feature flags, experiments, surveys, error tracking, web analytics, data warehouse, CDP and LLM observability workflows. It suits developer-led teams that want one integrated product OS instead of many separate tools.

freemium · Open Source

Keep

Open-source AIOps alert management platform

Keep is an open-source AIOps platform that provides a single pane of glass for alerts from monitoring tools such as Datadog, PagerDuty, and Grafana, with 50+ integrations in total. It uses AI to correlate, deduplicate, and enrich alerts, reducing noise and helping on-call teams focus on real incidents. Keep includes workflow automation, bidirectional sync with ticketing systems, and a modern web dashboard.

open-source
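
To make the deduplication idea concrete, here is a minimal rule-based sketch that groups alerts by a fingerprint. It is a simplification and does not reflect Keep's implementation or its AI correlation; the field names (`service`, `name`, `source`) and both functions are assumptions made for illustration.

```python
from collections import defaultdict


def fingerprint(alert, keys=("service", "name")):
    """Build a grouping key from the fields assumed to identify one underlying problem."""
    return tuple(alert.get(k) for k in keys)


def deduplicate(alerts):
    """Collapse alerts that share a fingerprint into one record with a count and sources."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[fingerprint(alert)].append(alert)

    merged = []
    for (service, name), items in groups.items():
        merged.append({
            "service": service,
            "name": name,
            "count": len(items),
            "sources": sorted({a["source"] for a in items}),
        })
    return merged


# Hypothetical alerts: two tools report the same problem, one reports another.
alerts = [
    {"service": "checkout", "name": "HighLatency", "source": "datadog"},
    {"service": "checkout", "name": "HighLatency", "source": "grafana"},
    {"service": "search", "name": "ErrorRate", "source": "datadog"},
]
incidents = deduplicate(alerts)
```

Here the three incoming alerts collapse into two records, the first carrying both reporting sources. A production system would need fuzzier matching than exact field equality.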

ty

Extremely fast Python type checker written in Rust

ty is an extremely fast Python type checker built in Rust by Astral, the team behind Ruff and uv. It performs full type inference, supports PEP 695 type parameter syntax, and checks Python code orders of magnitude faster than mypy or pyright. ty completes the Astral Python toolchain alongside Ruff for linting and uv for package management, giving developers a unified Rust-powered development experience.

open-source

Act

Run GitHub Actions locally for fast feedback

Act is an open-source tool that runs GitHub Actions workflows locally using Docker containers that match GitHub's execution environment. It provides instant feedback on workflow changes without pushing to a repository, supports matrix builds, secret management, and artifact handling. Act can also replace Makefiles by using workflow files as task definitions, making it useful for both CI/CD development and local task automation across development teams.

open-source

Robusta

CNCF Sandbox Kubernetes alert enrichment and automation platform

Robusta is a CNCF Sandbox project that enriches Kubernetes alerts with diagnostic context and automates remediation workflows. It intercepts Prometheus alerts and attaches relevant logs, pod status, resource metrics, and troubleshooting suggestions before delivering them to Slack, Teams, or PagerDuty. It also supports custom playbooks for automated incident response and offers AI-powered root cause analysis.

open-source

Metoro

AI-powered SRE agent for Kubernetes troubleshooting

Metoro is an AI SRE platform for Kubernetes that combines observability with autonomous troubleshooting. Its Guardian agent monitors cluster health, correlates metrics, logs, and traces to identify root causes, and suggests remediation actions. It also provides an MCP server for integration with AI coding agents and supports natural language querying of infrastructure state.

freemium

Coroot

Zero-instrumentation Kubernetes observability powered by eBPF

Coroot is an open-source observability platform that uses eBPF to automatically instrument Kubernetes applications without code changes. It provides application maps, latency analysis, log correlation, and continuous profiling with automatic anomaly detection. Instead of manual instrumentation, its agents capture metrics, traces, and logs at the kernel level.

open-source

Git Bayesect

Bayesian git bisection for finding commits that caused flaky tests

Git Bayesect applies Bayesian inference to git bisection to find commits that introduced non-deterministic bugs such as flaky tests. Standard git bisect requires a binary pass/fail result at each step, whereas Git Bayesect handles probabilistic outcomes, where the same test may pass on one run and fail on the next, and uses entropy minimization to narrow down the culprit commit efficiently.

open-source
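
The idea of Bayesian bisection with entropy minimization can be sketched in a few lines of Python. This is an illustration of the general technique, not Git Bayesect's actual code: the function names, the uniform prior, and the fixed failure rates `p_fail_old` and `p_fail_new` are all assumptions.

```python
import math


def posterior(n_commits, observations, p_fail_old=0.05, p_fail_new=0.5):
    """Posterior over which commit index (0..n_commits-1) introduced the flakiness.

    observations is a list of (commit_index, failed) pairs. Under hypothesis c,
    commits before c fail with p_fail_old and commits from c onward with p_fail_new.
    """
    weights = []
    for culprit in range(n_commits):
        w = 1.0  # uniform prior (assumed)
        for idx, failed in observations:
            p = p_fail_new if idx >= culprit else p_fail_old
            w *= p if failed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def next_commit_to_test(n_commits, observations, p_fail_old=0.05, p_fail_new=0.5):
    """Pick the commit whose test result minimizes the expected posterior entropy."""
    current = posterior(n_commits, observations, p_fail_old, p_fail_new)
    best_idx, best_h = None, float("inf")
    for idx in range(n_commits):
        # Predictive probability that a run at idx fails, under the current posterior.
        p_fail = sum(
            pc * (p_fail_new if idx >= c else p_fail_old)
            for c, pc in enumerate(current)
        )
        expected_h = 0.0
        for failed, p_outcome in ((True, p_fail), (False, 1.0 - p_fail)):
            updated = posterior(
                n_commits, observations + [(idx, failed)], p_fail_old, p_fail_new
            )
            expected_h += p_outcome * entropy(updated)
        if expected_h < best_h:
            best_idx, best_h = idx, expected_h
    return best_idx


# Hypothetical history: commit 5 failed three times, commit 2 passed six times.
runs = [(5, True)] * 3 + [(2, False)] * 6
beliefs = posterior(8, runs)
suggestion = next_commit_to_test(8, runs)
```

With these runs, most of the probability mass lands on commits 3 to 5, and the suggested next test falls inside that range because it splits the remaining candidates. Unlike classic bisection, a single pass never rules a commit out; it only shifts the posterior.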

Agenta

Open-source LLMOps platform for prompt management and evaluation

Agenta is an open-source LLMOps platform that combines prompt engineering playgrounds, prompt version management, LLM evaluation, and observability in a unified interface. It supports 50+ LLMs with side-by-side prompt comparison, A/B testing, human evaluation workflows, and OpenTelemetry-native tracing. It is self-hostable and has 4,000+ GitHub stars.

open-source

OpenObserve

All-in-one open-source observability — logs, metrics, traces, RUM

OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, and offers native OpenTelemetry support, a built-in query UI, dashboards, and alerts. It is designed for AI and cloud-native workloads at petabyte scale and has over 15,000 GitHub stars.

open-source

Incident.io

Slack-native incident management with AI SRE agent

Incident.io is a Slack- and Microsoft Teams-native incident management platform with AI SRE investigation, on-call scheduling, status pages, and post-incident learning in one product. Vendor case studies cite Buffer reducing critical incidents by 70% and Favor reducing MTTR by 37%. It integrates with PagerDuty, Datadog, GitHub, Jira, and 100+ tools for incident response and operational workflows.

freemium

Pydantic Logfire

Observability platform purpose-built for Python and Pydantic AI apps

Pydantic Logfire is an observability platform built by the Pydantic team specifically for Python AI applications. It provides structured logging, distributed tracing, and metrics with native understanding of Pydantic models, FastAPI, and AI framework data types. It auto-instruments OpenAI, Anthropic, LangChain, and other LLM providers and frameworks, and it is built on OpenTelemetry for vendor-neutral data export. A managed cloud dashboard is available with a free tier for development and small-scale production use.

freemium