Keep

Open-source AIOps alert management platform

Open Source

Keep is an open-source AIOps platform that provides a single pane of glass for alerts from monitoring tools such as Datadog, PagerDuty, and Grafana, with 50+ integrations in total. It uses AI to correlate, deduplicate, and enrich alerts, reducing noise and helping on-call teams focus on real incidents. Keep also includes workflow automation, bidirectional sync with ticketing systems, and a modern web dashboard.

Keep addresses the alert fatigue problem that plagues engineering teams using multiple monitoring tools. Instead of checking Datadog, PagerDuty, Grafana, Sentry, and AWS CloudWatch separately, Keep aggregates alerts from 50+ sources into a unified timeline. Its AI engine automatically correlates related alerts, deduplicates redundant notifications, and enriches alert context by pulling relevant data from connected systems, reducing the volume of notifications that require human attention.
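To make deduplication and correlation concrete, here is a minimal Python sketch of the general technique. It is not Keep's implementation and does not use Keep's API: the `Alert` fields, the fingerprint key (source, name, service), and the five-minute correlation window are all illustrative assumptions, and Keep's AI-driven correlation is not reproduced here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert shape; real providers emit far richer payloads.
    source: str       # e.g. "datadog", "grafana"
    name: str         # alert rule name
    service: str      # affected service
    timestamp: float  # seconds since epoch


def fingerprint(alert: Alert) -> str:
    """Hash the fields assumed to identify 'the same' alert."""
    key = f"{alert.source}|{alert.name}|{alert.service}"
    return hashlib.sha256(key.encode()).hexdigest()


def deduplicate(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Collapse repeated notifications that share a fingerprint into one group."""
    groups: dict[str, list[Alert]] = {}
    for alert in alerts:
        groups.setdefault(fingerprint(alert), []).append(alert)
    return groups


def correlate(alerts: list[Alert], window: float = 300.0) -> list[list[Alert]]:
    """Naive correlation: an alert joins the open incident for its service if it
    arrives within `window` seconds of that incident's latest alert."""
    incidents: list[list[Alert]] = []
    open_by_service: dict[str, list[Alert]] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_service.get(alert.service)
        if current and alert.timestamp - current[-1].timestamp <= window:
            current.append(alert)
        else:
            current = [alert]
            open_by_service[alert.service] = current
            incidents.append(current)
    return incidents


# Example: two duplicate Datadog alerts plus a related Grafana alert on the
# same service, and an unrelated Sentry alert on another service.
alerts = [
    Alert("datadog", "HighLatency", "checkout", 0.0),
    Alert("datadog", "HighLatency", "checkout", 30.0),
    Alert("grafana", "ErrorRate", "checkout", 60.0),
    Alert("sentry", "Exception", "search", 90.0),
]
print(len(deduplicate(alerts)))               # 3 distinct alerts
print([len(i) for i in correlate(alerts)])    # [3, 1]
```

Four raw notifications become three distinct alerts and two incidents. A production system would also need to close stale incidents and weigh signals beyond service name and arrival time.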

The platform includes a workflow engine that automates common incident response patterns: escalating alerts based on severity and elapsed time, creating tickets in Jira or Linear when certain conditions are met, sending notifications to Slack or Teams channels, and triggering runbook automations. Bidirectional sync ensures that actions taken in Keep are reflected in source monitoring tools and vice versa. The web dashboard provides filterable views, alert timelines, and analytics on alert patterns and mean time to resolution (MTTR).
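As an illustration of severity- and time-based escalation, the following Python sketch evaluates a list of rules against a single alert. It does not use Keep's own workflow syntax; `Rule`, the severity levels, and action names such as `create_ticket` and `notify_slack` are hypothetical placeholders for the ticketing and chat integrations described above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordering; IntEnum lets rules compare severities with >=.
    INFO = 1
    WARNING = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class AlertState:
    name: str
    severity: Severity
    age_minutes: float
    acknowledged: bool


@dataclass
class Rule:
    min_severity: Severity
    unacked_after_minutes: float  # fire once unacknowledged for this long
    actions: tuple[str, ...]      # hypothetical action names


def evaluate(alert: AlertState, rules: list[Rule]) -> list[str]:
    """Return the de-duplicated actions of every rule the alert satisfies."""
    actions: list[str] = []
    for rule in rules:
        if (alert.severity >= rule.min_severity
                and not alert.acknowledged
                and alert.age_minutes >= rule.unacked_after_minutes):
            for action in rule.actions:
                if action not in actions:
                    actions.append(action)
    return actions


RULES = [
    # Critical alerts page on-call immediately.
    Rule(Severity.CRITICAL, 0, ("page_oncall", "notify_slack")),
    # Anything at WARNING or above left unacknowledged for 15 minutes gets a ticket.
    Rule(Severity.WARNING, 15, ("notify_slack", "create_ticket")),
]

print(evaluate(AlertState("db-down", Severity.CRITICAL, 1, False), RULES))
# ['page_oncall', 'notify_slack']
print(evaluate(AlertState("disk-80pct", Severity.WARNING, 20, False), RULES))
# ['notify_slack', 'create_ticket']
```

Keeping rules as data rather than code mirrors the general idea of a workflow engine: new escalation paths can be added without changing the evaluator.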

With over 11,600 GitHub stars and active development, Keep is one of the most prominent open-source alternatives to enterprise AIOps platforms such as BigPanda and Moogsoft. It deploys via Docker Compose or Kubernetes Helm charts and stores its data in PostgreSQL. The platform is built with Python and React, is extensible through a provider plugin system, and is distributed under the MIT license. For teams drowning in alerts from multiple tools, Keep provides the consolidation layer that makes on-call manageable.

Pricing

Free and open source under the MIT license

Platforms

Docker, Kubernetes — 50+ monitoring integrations

Related Tools

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

Open Source

Latitude

Sentry-style observability for AI agent conversations

Latitude is an agent observability platform for teams that need to inspect LLM traces, conversations, issues, and evaluation feedback in one workflow. Its public repo and docs position it as a Sentry-style monitor for AI agents, with semantic search, issue detection, annotations, MCP-assisted fixes, and cloud or self-hosted deployment paths for production debugging.

Freemium, Open Source, Telemetry

Spotlight by Backplanes

Session reports for Claude Code and Codex runs

Spotlight by Backplanes turns completed Claude Code and Codex sessions into concise reports for engineering, security, and spend review. The CLI installs on macOS, Linux, or WSL 2, watches sessions after they finish, redacts PII and credentials locally before upload, then summarizes files touched, commands run, external domains reached, scope drift, risky actions, and next-session improvements.

Freemium, Telemetry

Traceway

OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.

Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.

Open Source

Judgeval

Open-source post-building layer for agents — tracing, evals, and online monitoring

Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.

Open Source

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. Bring-your-own-key (BYOK) support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

Open Source