Evidently AI

Open-source ML and LLM monitoring with 100+ metrics

Open Source

Evidently AI is an open-source platform with 100+ pre-built metrics for monitoring data quality, model performance, and data drift in AI/ML pipelines. It is available under Apache 2.0, with a cloud version, and helps teams detect when production data shifts away from training distributions, when LLM output quality degrades, or when feature pipelines introduce anomalies that silently reduce model accuracy.

Evidently AI provides a comprehensive monitoring toolkit for machine learning and LLM applications with over 100 pre-built metrics covering data quality, data drift, model performance, and target drift. Teams can detect when production data distributions shift away from training data, identify features that are degrading model accuracy, and monitor LLM output quality metrics like hallucination rates, response relevance, and toxicity scores.
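To make the idea of data drift concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), written with the Python standard library only. It is not Evidently's API or necessarily one of its metrics; the function name `psi`, the bin count, and the synthetic samples are all assumptions chosen for illustration.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Illustrative helper (hypothetical, not part of Evidently).
    Bins are equal-width over the range of the reference sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data: "training" sample, a fresh sample from the same
# distribution, and a sample whose mean has shifted by one unit.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print("PSI, same distribution:   ", psi(train, same))
print("PSI, shifted distribution:", psi(train, shifted))
```

A larger PSI means the production sample looks less like the reference sample. Where to draw the line between acceptable and drifted is a judgment call that depends on the feature and the model.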

The platform generates visual reports and dashboards that make complex statistical concepts accessible to engineering teams without requiring deep data science expertise. Monitoring can be configured as batch jobs for periodic analysis or real-time pipelines for continuous production monitoring. Custom metrics and test suites allow teams to define application-specific quality criteria that trigger alerts when thresholds are breached.
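The threshold-based checks described above can be pictured with a small standard-library sketch. This shows the general pattern only, not Evidently's test suite API; `Check`, `run_checks`, `missing_rate`, and the 5% threshold are hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named metric with an upper bound (hypothetical structure)."""
    name: str
    metric: Callable[[list], float]
    max_value: float


def run_checks(batch, checks):
    """Return one message per check whose metric exceeds its threshold."""
    failures = []
    for check in checks:
        value = check.metric(batch)
        if value > check.max_value:
            failures.append(f"{check.name}: {value:.3f} > {check.max_value}")
    return failures


def missing_rate(batch):
    """Share of missing (None) values in a batch of feature values."""
    return sum(v is None for v in batch) / len(batch)


# Assumed quality criterion: at most 5% missing values per batch.
checks = [Check("missing_rate", missing_rate, 0.05)]

batch = [1.0, None, 2.0, None]
alerts = run_checks(batch, checks)
print(alerts)
```

In a batch job, a non-empty result would be the point at which a pipeline raises an alert or fails the run. A real deployment would route these messages to whatever alerting channel the team already uses.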

Evidently AI is open-source under the Apache 2.0 license, with an active GitHub community that contributes new metric types and integrations. A cloud version is available for teams that prefer managed infrastructure. The platform integrates with popular MLOps tools including MLflow, Airflow, and Grafana, so it fits into existing data infrastructure rather than requiring a complete stack replacement.

Pricing

Free open-source (Apache 2.0); cloud version available

Platforms

Python, MLflow, Airflow, Grafana, Docker

Related Tools

Latitude

Sentry-style observability for AI agent conversations

Latitude is an agent observability platform for teams that need to inspect LLM traces, conversations, issues, and evaluation feedback in one workflow. Its public repo and docs position it as a Sentry-style monitor for AI agents, with semantic search, issue detection, annotations, MCP-assisted fixes, and cloud or self-hosted deployment paths for production debugging.

freemium, Open Source, Telemetry

Spotlight by Backplanes

Session reports for Claude Code and Codex runs

Spotlight by Backplanes turns completed Claude Code and Codex sessions into concise reports for engineering, security, and spend review. The CLI installs on macOS, Linux, or WSL 2, watches sessions after they finish, redacts PII and credentials locally before upload, then summarizes files touched, commands run, external domains reached, scope drift, risky actions, and next-session improvements.

freemium, Telemetry

Traceway

OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.

Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.

Open Source

Judgeval

Open-source post-building layer for agents — tracing, evals, and online monitoring

Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.

Open Source

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

Open Source

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.

Open Source
