65 tools tagged
Showing 24 of 65 tools
Persistent memory layer for AI coding agents — keeps Claude Code, Codex, Cursor, and any MCP agent in context across sessions
agentmemory is an open-source MCP server that gives AI coding agents persistent, cross-session memory. Built on hybrid vector-graph search, it achieves 95.2% recall on the LongMemEval-S benchmark while using up to 92% fewer context tokens than naive context injection. Works out of the box with Claude Code, Codex, Cursor, Windsurf, Cline, OpenCode, Kilo Code, Hermes, and any MCP client through 51 MCP tools plus 12 hooks and 4 skills.
OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.
Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.
Open-source post-building layer for agents — tracing, evals, and online monitoring
Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.
Open-source observability and self-healing layer for AI agents
TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.
Self-evolution engine for AI agents with auditable updates
Evolver is an open-source self-evolution engine for AI agents that turns run logs into auditable, reviewable updates via its Genome Evolution Protocol. Instead of ad hoc prompt tweaking, teams collect traces and Evolver proposes versioned diffs to prompts, tools and workflows that engineers can approve, reject or roll back like code.
See where your AI coding tokens actually go
Open-source TUI dashboard and CLI that shows where your AI coding tokens actually go, broken down by task type, tool, model, MCP server, and project. CodeBurn reads local session data directly from Claude Code, Codex, Cursor, OpenCode, Pi, and GitHub Copilot — no wrapper, proxy, or API keys — and layers on one-shot success rates so you can see whether the AI nails work first try or burns budget on edit/test/fix retries. Ships with a macOS menu bar widget and CSV/JSON export.
ML experiment tracking and model monitoring
Weights and Biases is the AI developer platform for experiment tracking, model monitoring, and ML workflow orchestration. Weave extends W&B with LLM ops capabilities for prompt engineering, evaluation, and deployment. Enables teams to track experiments, monitor model performance in production, manage datasets, log LLM application traces, and collaborate on ML projects with visualization dashboards, automated logging, and enterprise SSO and RBAC compliance.
AI-powered production incident resolution
Resolve AI automates production incident investigation, diagnosis, and remediation acting as an AI SRE that participates in every on-call rotation. Autonomously investigates incidents pursuing multiple hypotheses in parallel, validates against real evidence, creates code snippets and drafts PRs, generates post-mortems, and onboards new teammates with instant answers about code and infrastructure. Drives 5x faster MTTR and 87% faster incident investigations.
AI testing and evaluation for agents and LLM apps
RagaAI Catalyst is a comprehensive Python SDK for observability, monitoring, and evaluation of LLM and agentic applications. Provides agent tracing with execution graph visualization, self-hosted dashboard with analytics, synthetic data generation, multi-metric evaluation framework, and guardrail management. Built for teams running production RAG systems and AI agents who need systematic testing, debugging, and performance optimization workflows.
Open-source observability for AI agents
Laminar is an open-source observability platform for AI agents providing tracing, evaluation, and analytics for LLM applications. It integrates with Vercel AI SDK, LangChain, OpenAI, and Anthropic with a single line of code. Features include OpenTelemetry-native SDKs, an extensible evaluation framework with CI/CD support, SQL access to traces and metrics, and a visual debugging timeline for agent reasoning and actions.
Open-source product analytics, session replay, and feature flags
PostHog is an all-in-one open-source product analytics platform that combines event tracking, session replay, feature flags, A/B testing, error tracking, and user surveys in a single tool. With a generous free tier covering 1 million events per month, it replaces the need for separate tools like Mixpanel, Amplitude, LaunchDarkly, and Hotjar. Self-host on your own infrastructure for complete data ownership, or use the managed cloud with warehousing and SQL query access.
Postgres sync engine for local-first and real-time applications
ElectricSQL is a sync engine that keeps local application state synchronized with PostgreSQL in real-time. It enables local-first architectures where apps work offline with instant responsiveness, syncing data bidirectionally when connectivity is available. Supports partial replication with shape-based subscriptions to sync only relevant data subsets to each client.
Graph-relational database with EdgeQL query language, formerly EdgeDB
Gel (formerly EdgeDB) is a graph-relational database that combines the relational model with graph database traversal capabilities through its EdgeQL query language. Built on PostgreSQL, it eliminates the object-relational impedance mismatch with a type system that maps directly to application data models. Features built-in migrations, authentication, and an interactive web UI.
AI-powered test generation agent for automated code coverage improvement
qodo-cover (formerly Cover Agent) is an open-source AI agent that automatically generates meaningful unit tests to improve code coverage. It analyzes existing code and test patterns to produce tests that follow project conventions and target uncovered branches. Uses an iterative approach where generated tests are verified by running them, discarding those that fail. MIT licensed with over 5,300 GitHub stars.
CNCF Sandbox chaos engineering framework for Kubernetes resilience
Krkn is a CNCF Sandbox chaos engineering tool that tests Kubernetes cluster resilience by injecting controlled failures. It simulates pod kills, node failures, network partitions, CPU/memory pressure, and zone outages. Krkn-AI adds AI-powered scenario generation that suggests chaos experiments based on cluster topology. Supports CI/CD integration for automated resilience testing in deployment pipelines.
CNCF Sandbox Kubernetes alert enrichment and automation platform
Robusta is a CNCF Sandbox project that enriches Kubernetes alerts with diagnostic context and automates remediation workflows. It intercepts Prometheus alerts, attaches relevant logs, pod status, resource metrics, and troubleshooting suggestions before delivering them to Slack, Teams, or PagerDuty. Supports custom playbooks for automated incident response and AI-powered root cause analysis.
AI-powered SRE agent for Kubernetes troubleshooting
Metoro is an AI SRE platform for Kubernetes that combines observability with autonomous troubleshooting. Its Guardian agent monitors cluster health, correlates metrics, logs, and traces to identify root causes, and suggests remediation actions. Features an MCP server for integration with AI coding agents and natural language querying of infrastructure state.
Git-native AI agent session capture and reasoning traceability
Checkpoints by Entire captures the full reasoning context behind AI-generated code directly in Git. Founded by former GitHub CEO Thomas Dohmke with a 60 million dollar seed round, it records transcripts, prompts, files touched, token usage, and tool calls alongside every commit. Session metadata lives on a separate branch keeping your history clean, with rewind capabilities to restore any previous agent checkpoint when things go sideways.
OpenTelemetry-native observability for LLM applications with evals and GPU monitoring
OpenLIT is an open-source AI engineering platform that provides OpenTelemetry-native observability for LLM applications. It combines distributed tracing, evaluation, prompt management, a secrets vault, and GPU telemetry in a single self-hostable stack. With 50+ integrations across LLM providers and frameworks, it lets teams monitor AI applications using their existing observability backends like Grafana, Datadog, or Jaeger.
ACP skill definitions giving coding agents HuggingFace ML superpowers
Hugging Face Skills is the official collection of ACP skill definitions that give AI coding agents access to HuggingFace ML capabilities. The 13 skills cover LLM fine-tuning with TRL, vision model training, dataset management, model evaluation, and cloud job submission on HF infrastructure. Compatible with Claude Code, Codex, Gemini CLI, and Cursor via a single npx command.
Open-source LLMOps platform for prompt management and evaluation
Agenta is an open-source LLMOps platform that combines prompt engineering playgrounds, prompt version management, LLM evaluation, and observability in a unified interface. It supports 50+ LLM models with side-by-side prompt comparison, A/B testing, human evaluation workflows, and OpenTelemetry-native tracing. Self-hostable with 4,000+ GitHub stars.
All-in-one open-source observability — logs, metrics, traces, RUM
OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, with native OpenTelemetry support, a built-in query UI, dashboards, and alerts. Designed for AI and cloud-native workloads at petabyte scale. Over 15,000 GitHub stars.
Open-source LLM gateway with built-in optimization and A/B testing
TensorZero is an open-source LLMOps platform in Rust that unifies an LLM gateway, observability, prompt optimization, and A/B experimentation in a single binary. It routes requests across providers with sub-millisecond P99 latency at 10K+ QPS while capturing structured data for continuous improvement. Supports dynamic in-context learning, fine-tuning workflows, and production feedback loops. Backed by $7.3M seed funding, 11K+ GitHub stars.
LLM evaluation and tracking with RAG triad metrics
TruLens is an open-source framework for evaluating and tracking LLM experiments with feedback functions, RAG triad metrics (answer relevance, context relevance, groundedness), and Honest/Harmless/Helpful evaluations. Features a unified Metric API for systematic evaluation of RAG pipelines and AI agents. 3,200+ GitHub stars, MIT licensed. Snowflake partnership adds enterprise integration. Supports LangChain, LlamaIndex, and custom LLM applications.