Langfuse

Open-source LLM engineering platform for observability

Open Source

Langfuse is an open-source LLM engineering platform, with 29K+ GitHub stars, for tracing, evaluating, and monitoring AI applications. Acquired by ClickHouse, it provides detailed traces of LLM calls, prompt management with versioning, dataset-based evaluation, user feedback collection, and cost tracking. It is framework-agnostic, with native integrations for LangChain, LlamaIndex, the OpenAI SDK, and the Vercel AI SDK, and it is available both as a self-hosted deployment and as a managed cloud service.

Langfuse is the most popular open-source LLM observability platform, providing tracing, evaluation, and monitoring for AI applications. It has more than 29,000 GitHub stars, was acquired by ClickHouse for its analytical capabilities, and has become a critical part of the LLM engineering stack.

The tracing system captures detailed information about every LLM call including inputs, outputs, latency, token usage, and costs. Complex agent workflows with multiple LLM calls are visualized as nested traces, making it easy to debug and optimize multi-step applications.
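To make the idea of nested traces concrete, here is a minimal sketch in plain Python. It does not use the Langfuse SDK; the `Span` class, its field names, and the sample values are all hypothetical, chosen only to show how per-call latency, token usage, and cost can roll up through a tree of steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One step in a trace. Hypothetical structure, not Langfuse's schema."""
    name: str
    latency_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    children: List["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Tokens used by this step plus everything nested under it.
        own = self.input_tokens + self.output_tokens
        return own + sum(child.total_tokens() for child in self.children)

    def total_cost(self) -> float:
        # Cost of this step plus everything nested under it.
        return self.cost_usd + sum(child.total_cost() for child in self.children)

    def render(self, depth: int = 0) -> List[str]:
        # One line per span, indented by nesting depth.
        indent = "  " * depth
        own_tokens = self.input_tokens + self.output_tokens
        lines = [f"{indent}{self.name} ({self.latency_ms:.0f} ms, {own_tokens} tokens)"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Made-up example: an agent run with one retrieval step and two LLM calls.
trace = Span(
    name="answer_question",
    latency_ms=1250,
    children=[
        Span("retrieve_docs", latency_ms=200),
        Span("plan", latency_ms=400, input_tokens=120, output_tokens=30, cost_usd=0.0003),
        Span("generate", latency_ms=650, input_tokens=800, output_tokens=200, cost_usd=0.002),
    ],
)

print("\n".join(trace.render()))
print("total tokens:", trace.total_tokens())
```

A tree like this is what lets a dashboard show which nested step dominated latency or spend in a multi-step run.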

Prompt management with versioning allows teams to iterate on prompts, track changes, and deploy specific versions to production. Dataset-based evaluation enables systematic testing with custom metrics and LLM-as-judge evaluators. User feedback collection creates ground truth for continuous improvement.
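As a rough illustration of what prompt versioning with deployable versions involves, the sketch below keeps every version of a prompt and uses a movable label to decide which one is live. This is an in-memory toy, not Langfuse's API; `PromptRegistry`, its methods, and the `production` label name are assumptions made for the example.

```python
from typing import Dict, List


class PromptRegistry:
    """In-memory prompt store with versions and labels (illustrative only)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._labels: Dict[str, Dict[str, int]] = {}

    def create(self, name: str, template: str) -> int:
        # Append a new immutable version and return its 1-based number.
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def label(self, name: str, label: str, version: int) -> None:
        # Point a label such as "production" at an existing version.
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        self._labels.setdefault(name, {})[label] = version

    def get(self, name: str, label: str = "production") -> str:
        # Resolve the label to a version, then return that version's template.
        version = self._labels[name][label]
        return self._versions[name][version - 1]


registry = PromptRegistry()
v1 = registry.create("summarize", "Summarize: {text}")
v2 = registry.create("summarize", "Summarize in one sentence: {text}")

registry.label("summarize", "production", v1)
print(registry.get("summarize"))  # still serves version 1

registry.label("summarize", "production", v2)  # promote version 2
print(registry.get("summarize"))
```

Because application code asks for a label instead of a fixed version, promoting or rolling back a prompt does not require a redeploy.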

Langfuse is framework-agnostic with native integrations for LangChain, LlamaIndex, OpenAI SDK, Vercel AI SDK, and more. Cost tracking aggregates LLM spending across models and provides breakdowns by user, feature, or time period.
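The cost breakdowns described above come down to pricing each call from its token counts and grouping the results. The sketch below shows that arithmetic in plain Python; the model names, per-million-token prices, and record fields are invented for illustration and do not reflect Langfuse's data model or any provider's real pricing.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical prices in USD per one million tokens: (input, output).
PRICES = {
    "model-a": (1.0, 2.0),
    "model-b": (10.0, 30.0),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Price one call from its token counts.
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_by(records: Iterable[dict], key: str) -> Dict[str, float]:
    # Sum call costs grouped by one record field, e.g. "user" or "feature".
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += call_cost(
            record["model"], record["input_tokens"], record["output_tokens"]
        )
    return dict(totals)


# Made-up usage records, one per LLM call.
records = [
    {"user": "alice", "feature": "chat", "model": "model-a",
     "input_tokens": 1000, "output_tokens": 500},
    {"user": "alice", "feature": "search", "model": "model-b",
     "input_tokens": 2000, "output_tokens": 100},
    {"user": "bob", "feature": "chat", "model": "model-a",
     "input_tokens": 500, "output_tokens": 250},
]

print(cost_by(records, "user"))
print(cost_by(records, "feature"))
```

The same grouping works for a time period if each record carries a date field to use as the key.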

Both self-hosted and managed cloud deployment are available. The open-source version can be deployed via Docker with full feature parity. The managed cloud offers additional convenience with automatic updates and scaling.

Pricing

Hobby free / Core from $29/mo / Pro from $199/mo

Platforms

Web, Self-hosted, Docker, Python, JS/TS SDK

Alternatives

Laminar

Open-source observability for AI agents

Laminar is an open-source observability platform for AI agents providing tracing, evaluation, and analytics for LLM applications. It integrates with Vercel AI SDK, LangChain, OpenAI, and Anthropic with a single line of code. Features include OpenTelemetry-native SDKs, an extensible evaluation framework with CI/CD support, SQL access to traces and metrics, and a visual debugging timeline for agent reasoning and actions.

freemium, Open Source

Weights & Biases

ML experiment tracking and model monitoring

Weights & Biases is an AI developer platform for experiment tracking, artifact and model lineage, model monitoring, and Weave-based LLM evaluation. It helps teams log runs, compare metrics, manage datasets and model artifacts, and collaborate through dashboards, reports, alerts, SSO/RBAC controls, and hosted or self-managed deployment options.

freemium

Braintrust

LLM evaluation and prompt engineering platform

Braintrust is an AI observability and evaluation platform for tracing LLM applications, building datasets, running prompt/model experiments, scoring outputs, and turning production feedback into regression tests. It fits teams that need repeatable quality gates for AI releases rather than one-off prompt demos.

freemium

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

Open Source

Related Tools

Hermes Agent

Top Pick

Open-source AI agent framework with persistent memory, reusable skills, tools, and messaging gateways

Hermes Agent is an open-source AI agent framework with persistent memory, reusable skills, 40+ tools, cron jobs, and messaging gateways.

Open Source

Safari MCP Server

Apple's Safari-native MCP server for web debugging agents

Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.

free, Telemetry

BeeAI Framework

Python and TypeScript framework for production multi-agent systems

BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.

Open Source, Telemetry

Superserve

Open-source Firecracker sandboxes for long-running AI agents

Superserve is an open-source sandbox infrastructure layer for AI agents that need durable computers instead of short-lived shells. It runs isolated Firecracker microVMs, supports pause, resume, snapshot, fork, preview URLs, MCP connectivity, SDK/API control, Docker workloads, and self-hosting, while the hosted service adds pay-as-you-go agent sandboxes for teams.

Open Source

Latitude

Sentry-style observability for AI agent conversations

Latitude is an agent observability platform for teams that need to inspect LLM traces, conversations, issues, and evaluation feedback in one workflow. Its public repo and docs position it as a Sentry-style monitor for AI agents, with semantic search, issue detection, annotations, MCP-assisted fixes, and cloud or self-hosted deployment paths for production debugging.

freemium, Open Source, Telemetry

Anthropic Agent Skills

Official Claude Agent Skills examples, spec, and plugin marketplace for reusable agent capabilities

Anthropic Agent Skills is Anthropic's official reference repo and Claude Code plugin marketplace for reusable Skill folders. It packages example SKILL.md workflows, document skills, a Claude API skill, templates, and the Agent Skills spec so teams can turn repeatable instructions, scripts, and resources into on-demand Claude capabilities instead of copying prompts across sessions.

free, Telemetry

Comparisons

Phoenix vs Langfuse — Arize AI Observability Platform vs Open-Source LLM Analytics

Phoenix and Langfuse both provide observability for LLM applications but approach the problem from different perspectives. Phoenix by Arize focuses on OpenTelemetry-native tracing with built-in evaluation frameworks and experiment tracking for systematically improving AI quality. Langfuse provides lightweight prompt management, session tracking, and cost analytics through a developer-friendly dashboard with broader framework integrations.

Langfuse

OpenLIT vs Langfuse — OpenTelemetry-Native vs Purpose-Built LLM Observability

OpenLIT and Langfuse both provide tracing and evaluation for LLM applications but take architecturally different approaches. Langfuse offers a dedicated observability platform with its own purpose-built dashboard for AI-specific workflows. OpenLIT instruments LLM calls as standard OpenTelemetry spans, routing traces into whatever observability backend teams already operate — Grafana, Datadog, Jaeger, or any OTel-compatible system.

OpenLIT, Langfuse

Langfuse vs Portkey — Open-Source LLM Observability vs AI Gateway and Routing Platform

Langfuse and Portkey address different layers of LLM operations. Langfuse is an open-source observability platform for tracing, evaluation, and prompt management of LLM applications. Portkey is an AI gateway that routes requests across 200+ providers with caching, fallbacks, load balancing, and cost tracking, adding monitoring on top of its core routing functionality.

Langfuse, Portkey

Traceloop vs Langfuse — OpenTelemetry-Native LLM Observability vs Dedicated Tracing Platform

Traceloop (OpenLLMetry) and Langfuse both provide LLM application observability, but through different architectural approaches. Traceloop extends the OpenTelemetry standard with LLM-specific instrumentation, sending data to any OTEL backend. Langfuse offers a dedicated tracing platform with prompt management and evaluation built in. This comparison helps teams choose between infrastructure integration and purpose-built LLM analytics.

Traceloop, Langfuse

Langfuse vs Helicone — Open-Source LLM Tracing vs Lightweight Observability Proxy

Langfuse and Helicone are the two leading open-source LLM observability platforms, but they differ in architecture and depth. Langfuse provides comprehensive tracing with prompt management, evaluation, and dataset curation. Helicone operates as a lightweight proxy that requires zero code changes — just swap your API base URL. This comparison helps teams choose between deep observability and frictionless integration for their LLM applications.

Langfuse, Helicone

Langfuse vs LangSmith — Open-Source vs Commercial LLM Observability Platforms Compared

Langfuse and LangSmith are the leading LLM observability platforms for monitoring, tracing, and evaluating AI applications in production. Langfuse is open-source and self-hostable with a generous free tier, supporting integrations across LangChain, LlamaIndex, OpenAI, and dozens of frameworks. LangSmith is LangChain's commercial platform with zero-config integration for the LangChain ecosystem. Both help developers understand what their LLM applications are doing — the choice depends on your stack and deployment requirements.

Langfuse, LangSmith

Monte Carlo vs Langfuse vs Braintrust — AI Observability & Data Quality Platforms Compared

AI observability spans two distinct domains: monitoring the quality of data flowing into AI systems and monitoring the quality of AI outputs themselves. This comparison examines three platforms covering different parts of this spectrum: Monte Carlo as the enterprise leader in data observability that has expanded into AI monitoring, Langfuse as an open-source LLM engineering platform focused on tracing and evaluation, and Braintrust as a modern AI product quality platform with evaluation and prompt management.

Langfuse, Braintrust, Monte Carlo

OpenLLMetry vs Langfuse vs Helicone — Open-Source LLM Observability Platforms Compared

LLM observability has become a non-negotiable requirement for production AI applications in 2026. Teams need to trace prompts and completions, track token costs, debug latency issues, and evaluate output quality. This comparison examines three leading open-source approaches: OpenLLMetry as a vendor-neutral instrumentation layer built on OpenTelemetry standards, Langfuse as a full-featured LLM observability platform with evaluation workflows, and Helicone as a proxy-based solution optimized for instant setup and cost tracking.

OpenLLMetry, Langfuse, Helicone

LangSmith vs Langfuse vs Helicone — LLM Observability Platform Comparison

This comparison covers three platforms for monitoring, debugging, and evaluating LLM applications in production. LangSmith is LangChain's integrated solution, Langfuse is the most popular open-source alternative and has been acquired by ClickHouse, and Helicone offers the simplest setup through a single-line proxy integration.

LangSmith, Langfuse, Helicone