Showing 24 of 39 tools
Git-native AI agent session capture and reasoning traceability
Checkpoints by Entire captures the full reasoning context behind AI-generated code directly in Git. Entire, founded by former GitHub CEO Thomas Dohmke with a $60 million seed round, records transcripts, prompts, files touched, token usage, and tool calls alongside every commit. Session metadata lives on a separate branch, keeping your history clean, with rewind capabilities to restore any previous agent checkpoint when things go sideways.
OpenTelemetry-native observability for LLM applications with evals and GPU monitoring
OpenLIT is an open-source AI engineering platform that provides OpenTelemetry-native observability for LLM applications. It combines distributed tracing, evaluation, prompt management, a secrets vault, and GPU telemetry in a single self-hostable stack. With 50+ integrations across LLM providers and frameworks, it lets teams monitor AI applications using their existing observability backends like Grafana, Datadog, or Jaeger.
Open-source observability platform unifying logs, traces, and session replays
HyperDX is an open-source observability platform that correlates session replays, logs, metrics, traces, and errors in a single interface powered by ClickHouse and OpenTelemetry. Acquired by ClickHouse in 2025, it now forms the visualization layer of ClickStack. It offers schema-agnostic querying on any ClickHouse cluster, intuitive full-text and property search syntax, and blazing-fast analytics. Available as a self-hosted Docker deployment or a managed cloud service with a free tier.
Open-source LLMOps platform for prompt management and evaluation
Agenta is an open-source LLMOps platform that combines prompt engineering playgrounds, prompt version management, LLM evaluation, and observability in a unified interface. It supports 50+ LLM models with side-by-side prompt comparison, A/B testing, human evaluation workflows, and OpenTelemetry-native tracing. Self-hostable with 4,000+ GitHub stars.
All-in-one open-source observability — logs, metrics, traces, RUM
OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, with native OpenTelemetry support, a built-in query UI, dashboards, and alerts. Designed for AI and cloud-native workloads at petabyte scale. Over 15,000 GitHub stars.
Open-source LLM gateway with built-in optimization and A/B testing
TensorZero is an open-source LLMOps platform written in Rust that unifies an LLM gateway, observability, prompt optimization, and A/B experimentation in a single binary. It routes requests across providers with sub-millisecond P99 latency at 10K+ QPS while capturing structured data for continuous improvement. It supports dynamic in-context learning, fine-tuning workflows, and production feedback loops, and is backed by $7.3M in seed funding with 11K+ GitHub stars.
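The gateway-plus-experimentation pattern described above can be sketched in a few lines: assign each request deterministically to a prompt/model variant, record feedback, and compare. This is an illustrative toy, not TensorZero's actual API; the class and method names are invented for the example.

```python
import hashlib

class VariantRouter:
    """Toy sketch of gateway-style A/B routing: assign each request
    to a variant and record feedback for comparison.
    (Illustrative only -- not TensorZero's actual API.)"""

    def __init__(self, variants):
        self.variants = list(variants)
        self.feedback = {v: [] for v in self.variants}

    def assign(self, request_id: str) -> str:
        # Deterministic assignment: hash the request ID so retries
        # of the same request always hit the same variant.
        digest = hashlib.sha256(request_id.encode()).digest()
        return self.variants[digest[0] % len(self.variants)]

    def record(self, variant: str, score: float) -> None:
        self.feedback[variant].append(score)

    def best(self) -> str:
        # Pick the variant with the highest mean feedback score.
        def mean(v):
            scores = self.feedback[v]
            return sum(scores) / len(scores) if scores else 0.0
        return max(self.variants, key=mean)

router = VariantRouter(["prompt_a", "prompt_b"])
router.record("prompt_a", 0.9)
router.record("prompt_b", 0.4)
print(router.best())  # prompt_a
```

Hashing the request ID (rather than random assignment) keeps retries of the same request on the same variant, which matters when comparing feedback fairly.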
Observability data accessible to AI agents via MCP
Netdata's MCP integration exposes infrastructure monitoring, discovery, and root-cause analysis capabilities to AI agents. Built into the 78K+ star Netdata monitoring platform, it lets agents query real-time metrics, explore system health, investigate incidents, and generate observability reports through the Model Context Protocol.
Data and AI observability for enterprise teams
Monte Carlo is the leading data and AI observability platform, using ML to monitor pipelines, warehouses, and lakes for quality issues. It detects freshness delays, volume anomalies, schema changes, and distribution shifts before they impact analytics. With 500+ deployments at Nasdaq, Honeywell, and Roche, it provides automated root cause analysis, incident management, and field-level lineage tracking from data tables to LLM outputs. Available on AWS and Azure Marketplace.
AI-driven log analysis with zero false positives
Dash0 is an AI-driven observability platform focused on log analysis that auto-structures unstructured logs, provides instant alerting with zero false positives, and delivers full-stack tracing capabilities. It uses AI to transform raw log data into structured, searchable events without requiring manual parsing configuration, making log-based debugging significantly faster for engineering teams.
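Auto-structuring a raw log line amounts to extracting a timestamp, a level, and any embedded key=value pairs into queryable fields. The sketch below shows the general idea with a single regex; the pattern and field names are illustrative, not Dash0's actual parser.

```python
import re

# Toy sketch of turning an unstructured log line into a searchable event.
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)"
)

def structure(line: str) -> dict:
    m = LOG_PATTERN.match(line)
    if not m:
        return {"message": line}  # fall back to raw text
    event = m.groupdict()
    # Promote key=value pairs in the message to top-level fields.
    for key, value in re.findall(r"(\w+)=(\S+)", event["message"]):
        event[key] = value
    return event

event = structure("2025-01-15T10:32:07 ERROR db query failed user=42 latency_ms=930")
print(event["level"], event["user"])  # ERROR 42
```

Real platforms infer patterns across millions of lines rather than relying on a hand-written regex, but the output shape (structured, filterable events) is the same.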
Evaluation-first LLM and agent observability
Confident AI is an evaluation-first observability platform that scores every trace and span with 50+ metrics, alerting on quality drops in LLM and agent applications. It goes beyond traditional APM by treating evaluation as core observability, providing actionable insights that help teams understand not just whether their AI applications are running but whether they are producing correct and useful outputs.
Preemptive observability in the IDE
Digma provides preemptive observability by catching runtime issues and performance bottlenecks directly in the IDE before code is committed. It shifts observability left by helping developers understand how their code will behave in production using real telemetry data, surfacing slow queries, N+1 problems, and error-prone code paths during development rather than after deployment.
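An N+1 problem shows up in telemetry as the same query template repeating many times within one request trace. A minimal detector over span records might look like the following; the span shape and threshold are assumptions for illustration, not Digma's actual analysis.

```python
from collections import Counter

def detect_n_plus_one(spans, threshold=5):
    """Toy N+1 detector: flag any DB query template that repeats
    many times within a single request trace. Illustrative only."""
    queries = Counter(s["query"] for s in spans if s.get("kind") == "db")
    return [
        {"query": query, "count": count}
        for query, count in queries.items()
        if count >= threshold
    ]

# One HTTP request that fired the same parameterized query 20 times:
trace = [{"kind": "db", "query": "SELECT * FROM orders WHERE user_id = ?"}] * 20
trace.append({"kind": "http", "query": None})
print(detect_n_plus_one(trace))
```

Counting by query *template* (with literals already parameterized) is what makes the repetition visible; counting raw SQL strings with inlined values would hide the pattern.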
Open-source AI-powered log analysis by Salesforce
LogAI is an open-source log analysis platform by Salesforce Research that uses deep learning to detect anomalies in large-scale system logs. It provides research-backed autonomous log troubleshooting capabilities, applying ML models to identify patterns, cluster log events, and surface anomalies that would be invisible in manual log review across high-volume production environments.
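Log clustering typically starts by collapsing the variable parts of each line (IDs, addresses, counters) into a shared template, so that millions of lines reduce to a handful of patterns and rare templates stand out as anomalies. This stdlib sketch shows the idea; LogAI's actual models are far more sophisticated.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Collapse variable parts so similar log lines share one
    template -- a toy version of log clustering, not LogAI's models."""
    line = re.sub(r"0x[0-9a-f]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)

def cluster(lines):
    return Counter(template(l) for l in lines)

logs = [
    "connection from 10.0.0.1 port 443",
    "connection from 10.0.0.2 port 8080",
    "disk full on /dev/sda1",
]
counts = cluster(logs)
print(counts["connection from <NUM>.<NUM>.<NUM>.<NUM> port <NUM>"])  # 2
```

In a real pipeline, templates with very low counts (or sudden count spikes) are the anomaly candidates surfaced to operators.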
AI observability with security posture management
Coralogix uses AI to provide actionable insights across logs and traces with a dedicated AI-SPM dashboard for tracking prompt injections and data leaks in AI applications. Its pay-per-use model with no upfront fees integrates security posture management directly into the observability stack, making it uniquely positioned for teams running both traditional and AI-powered production workloads.
Cloud-native observability with AI correlation
Middleware is a cloud-native observability platform that provides real-time insights into Kubernetes environments using AI to correlate metrics, logs, and traces for faster troubleshooting. It simplifies the debugging of complex microservice clusters by automatically connecting related signals across distributed systems, with a freemium model accessible to teams of all sizes.
OpenTelemetry-native LLM observability instrumentation
OpenLLMetry by Traceloop is an open-source instrumentation library with 7,000+ GitHub stars that adds OpenTelemetry-native tracing to LLM and AI agent applications. It captures detailed traces of model calls including latency, token usage, costs, and error rates, exporting data to any OpenTelemetry-compatible backend like Grafana, Datadog, or Jaeger for vendor-neutral AI observability.
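The core of LLM-call instrumentation is wrapping each provider call so latency, token usage, and cost land on a span. The decorator below sketches that mechanic with a plain dict standing in for a span; OpenLLMetry does this with real OpenTelemetry spans and exporters, so the names here are illustrative.

```python
import time
from functools import wraps

SPANS = []  # in a real setup these would export to an OTLP backend

def traced_llm_call(fn):
    """Toy sketch of LLM instrumentation: time the call and record
    token usage on a span-like record. Illustrative only."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        SPANS.append({
            "name": fn.__name__,
            "latency_s": time.perf_counter() - start,
            "prompt_tokens": result.get("prompt_tokens", 0),
            "completion_tokens": result.get("completion_tokens", 0),
        })
        return result
    return wrapper

@traced_llm_call
def fake_completion(prompt: str) -> dict:
    # Stand-in for a provider call; returns a usage payload like most APIs.
    return {"text": "ok", "prompt_tokens": len(prompt.split()), "completion_tokens": 1}

fake_completion("hello world")
print(SPANS[0]["prompt_tokens"])  # 2
```

Because the wrapper only reads the usage payload the provider already returns, instrumentation of this style adds negligible overhead to the call itself.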
Open-source ML and LLM monitoring with 100+ metrics
Evidently AI is an open-source platform with 100+ pre-built metrics for monitoring data quality, model performance, and data drift in AI/ML pipelines. Available under Apache 2.0 with a cloud version, it helps teams detect when production data shifts away from training distributions, LLM output quality degrades, or feature pipelines introduce anomalies that silently degrade model accuracy.
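One of the classic drift metrics in this family is the Population Stability Index: bin the training distribution, compare the production sample's bin fractions against it, and alert when the score crosses a threshold. The standalone version below is illustrative; Evidently ships this and many related metrics prebuilt.

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index -- a toy drift check comparing a
    production sample against the training distribution."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values above the training max

    def frac(sample, i):
        n = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(n / len(sample), 1e-6)  # avoid log(0) on empty bins

    return sum(
        (frac(actual, i) - frac(expected, i))
        * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

train = [1, 2, 3, 4, 5, 6, 7, 8]
same = [2, 3, 5, 6, 7, 8, 1, 4]
shifted = [20, 21, 22, 23, 24, 25, 26, 27]
print(psi(train, same) < 0.1 < psi(train, shifted))  # True
```

A common rule of thumb is PSI < 0.1 for no meaningful shift and PSI > 0.25 for significant drift, though thresholds should be tuned per feature.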
Lightweight eval library for LLM applications
OpenEvals is a lightweight evaluation library from the LangChain team for testing LLM application quality using LLM-as-judge patterns. It provides pre-built prompt sets and evaluation functions that score model outputs against criteria like accuracy, relevance, coherence, and safety without requiring complex infrastructure. Available as both Python and JavaScript packages, OpenEvals complements OpenAI Evals with a simpler, framework-agnostic approach to quality measurement in agentic workflows.
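The LLM-as-judge pattern boils down to wrapping a model call with a grading prompt and parsing the score it returns. The sketch below shows that shape with a stubbed model; the prompt wording and parsing are illustrative assumptions, not OpenEvals' actual prebuilt judges.

```python
def make_judge(criterion: str, model_call):
    """Toy LLM-as-judge: wrap a model call with a grading prompt
    and parse a 0-1 score. Illustrative only."""
    def judge(inputs: str, outputs: str) -> float:
        prompt = (
            f"Grade the response for {criterion} on a scale of 0 to 1.\n"
            f"Question: {inputs}\nResponse: {outputs}\nScore:"
        )
        reply = model_call(prompt)
        try:
            return max(0.0, min(1.0, float(reply.strip())))
        except ValueError:
            return 0.0  # unparseable reply counts as a failing score
    return judge

# Stub model: a real setup would call an LLM provider here.
def stub_model(prompt: str) -> str:
    return "1.0" if "Paris" in prompt else "0.0"

correctness = make_judge("correctness", stub_model)
print(correctness("Capital of France?", "Paris"))  # 1.0
```

Clamping and defaulting the parsed score matters in practice, since judge models occasionally reply with prose instead of a bare number.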
Engineering intelligence for DORA metrics and workflow automation
LinearB is a software engineering intelligence platform trusted by over 3,000 engineering leaders to track DORA metrics, cycle time broken into four phases (coding, pickup, review, deploy), and developer workflow patterns. It connects to Git repos and project management tools, benchmarks team performance against 8.1M+ pull requests from 4,800 organizations, and automates workflow improvements via gitStream — a policy-as-code engine for PR routing, labeling, and review automation.
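The four-phase cycle-time breakdown is just successive differences between PR lifecycle timestamps. A minimal sketch, assuming hypothetical event names (LinearB derives these from Git and PM tool data, not this record format):

```python
from datetime import datetime

def cycle_time_phases(events: dict) -> dict:
    """Toy breakdown of PR cycle time into four phases:
    coding  (first commit -> PR opened),
    pickup  (PR opened -> first review),
    review  (first review -> merge),
    deploy  (merge -> release).
    Event names are illustrative."""
    order = ["first_commit", "pr_opened", "first_review", "merged", "deployed"]
    phases = ["coding", "pickup", "review", "deploy"]
    ts = [datetime.fromisoformat(events[k]) for k in order]
    # Hours elapsed between each consecutive pair of milestones.
    return {p: (b - a).total_seconds() / 3600
            for p, a, b in zip(phases, ts, ts[1:])}

pr = {
    "first_commit": "2025-01-10T09:00",
    "pr_opened":    "2025-01-10T17:00",
    "first_review": "2025-01-11T10:00",
    "merged":       "2025-01-11T15:00",
    "deployed":     "2025-01-11T16:00",
}
print(cycle_time_phases(pr)["coding"])  # 8.0
```

Splitting the total this way is what makes the metric actionable: a long "pickup" phase points at review staffing, while a long "deploy" phase points at release process.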
Framework for evaluating LLM and agent performance
OpenAI Evals is an open-source framework and benchmark registry for evaluating LLM performance on custom tasks. It provides infrastructure for writing evaluation prompts, running them against models, and recording results in a structured format for comparison. The hosted Evals API on the OpenAI platform adds managed run tracking, dataset management, and programmatic access to evaluation pipelines. With 17,700+ GitHub stars, it serves as a foundation for systematic LLM quality measurement.
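At its simplest, an eval run is a loop: feed each sample's input to a completion function, compare the output to the ideal answer, and record structured results. This toy harness mirrors that basic match-style eval; the record format is an illustrative assumption, and the arithmetic lambda stands in for a real model call.

```python
def run_eval(samples, completion_fn):
    """Toy eval harness: run each sample through the model and
    check the output against the ideal answer. Illustrative only."""
    records = []
    for sample in samples:
        output = completion_fn(sample["input"])
        records.append({
            "input": sample["input"],
            "output": output,
            "correct": output.strip() == sample["ideal"].strip(),
        })
    accuracy = sum(r["correct"] for r in records) / len(records)
    return {"accuracy": accuracy, "records": records}

samples = [
    {"input": "2+2=", "ideal": "4"},
    {"input": "3*3=", "ideal": "9"},
]
# Stub "model" that evaluates the arithmetic; a real run would call
# a chat completion API here.
result = run_eval(samples, lambda q: str(eval(q.rstrip("="))))
print(result["accuracy"])  # 1.0
```

Persisting per-sample records, not just the aggregate score, is what makes regressions between model versions diagnosable.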
Observability and lifecycle management for AI agents
AgentOps is an observability platform for monitoring, debugging, and managing the lifecycle of AI agents. It provides session replays with time-travel debugging, detailed event timelines showing every tool call and LLM interaction, cost tracking per session, and anomaly detection for agents stuck in loops or making unexpected decisions. AgentOps integrates with CrewAI, LangChain, AutoGen, and other frameworks through a lightweight SDK that requires just two lines of code to instrument.
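Loop detection for agents can be as simple as checking whether the tail of the action history repeats a short cycle several times (e.g. search → read → search → read …). The thresholds and function below are illustrative assumptions, not AgentOps' actual logic.

```python
def stuck_in_loop(actions, cycle_len=2, repeats=3):
    """Toy loop detector: flag an agent whose recent actions repeat
    the same short cycle several times. Illustrative only."""
    window = cycle_len * repeats
    if len(actions) < window:
        return False
    tail = actions[-window:]
    cycle = tail[:cycle_len]
    # True iff the whole window is the same cycle repeated.
    return all(tail[i] == cycle[i % cycle_len] for i in range(window))

healthy = ["plan", "search", "read", "summarize", "answer", "done"]
looping = ["plan"] + ["search", "read"] * 3
print(stuck_in_loop(healthy), stuck_in_loop(looping))  # False True
```

Production detectors also compare tool-call *arguments*, since an agent re-reading different pages is progress while re-reading the same page is a loop.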
Agentic application security from prompt to cloud
Cycode is an AI-native application security platform that converges application security testing (AST), software supply chain security (SSCS), and application security posture management (ASPM) into a single solution, with the Maestro AI orchestrator managing multi-agent security workflows. It provides native SAST, SCA, secrets detection, IaC scanning, and container security, alongside ConnectorX integration with 100+ third-party tools. Cycode's AI Exploitability Agent reduces false positives by 94%, and its Context Intelligence Graph maps risk across code, pipelines, and runtime environments.
AI-powered Kubernetes diagnostics in plain English
K8sGPT is a CNCF Sandbox project that scans Kubernetes clusters, diagnoses issues, and explains problems in plain English with actionable remediation steps. It codifies SRE expertise into built-in analyzers for Pods, Services, Deployments, Ingress, PVCs, CronJobs, and more. K8sGPT connects to AI backends including OpenAI, Azure OpenAI, Google Gemini, Amazon Bedrock, Cohere, and local models via Ollama, with data anonymization to protect sensitive cluster information.
LLM monitoring and quality assurance platform
LangWatch is an open-source LLM monitoring platform providing real-time observability, quality evaluation, and analytics for AI applications. It offers trace-level debugging, automated quality checks with custom evaluators, user feedback tracking, cost analytics, and alerting on quality degradation, plus topic clustering for understanding user-intent patterns. It integrates with LangChain, LlamaIndex, OpenAI, and other frameworks via Python and TypeScript SDKs.