Traceway Review: The 90-Second Self-Hosted Observability Stack with LLM Tracing Built In

Traceway is a new MIT-licensed observability platform that bundles logs, traces, metrics, exceptions, session replay, and AI/LLM tracing into a single self-hosted stack that deploys in about ninety seconds. For teams running LLM-powered applications who want production-grade tracing without third-party SaaS, it removes the forced choice between assembling a Prometheus-based stack by hand and paying Datadog rates.

Reviewed by Raşit Akyol on May 15, 2026

Overall: 84
Speed: 88
Privacy: 92
Dev Experience: 86

What Traceway Does

Traceway is an MIT-licensed observability platform that runs every signal a modern team needs — logs, distributed traces, metrics, exceptions, session replay, and AI/LLM tracing — through a single self-hosted stack. It is OpenTelemetry-native: you point any OTel SDK at the backend over OTLP/HTTP and data flows in without a Collector, without a proprietary vendor SDK, and without per-event metering. The deploy story is the differentiator. A single docker-compose up brings the entire stack online in roughly ninety seconds, which is closer to how developers expect modern infrastructure to behave than the typical Prometheus-Loki-Tempo-Grafana assembly job.
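As a rough illustration of what "pointing an OTel SDK at the backend" involves, the sketch below builds the standard OpenTelemetry environment variables for OTLP/HTTP export and shows the per-signal URLs an SDK derives from the base endpoint. The variable names and the `/v1/<signal>` paths come from the OpenTelemetry specification. The address `http://localhost:4318` and the service name `checkout` are assumptions for the example, not values confirmed for Traceway, and both helper functions are hypothetical.

```python
def otlp_http_env(base_url: str, service_name: str) -> dict:
    """Return standard OpenTelemetry SDK environment variables for OTLP/HTTP export."""
    return {
        # Base endpoint; SDKs append the per-signal path to it.
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/"),
        # OTLP over HTTP with protobuf payloads.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        # Name that identifies this service in the backend.
        "OTEL_SERVICE_NAME": service_name,
    }


def signal_urls(base_url: str) -> dict:
    """Return the per-signal OTLP/HTTP URLs derived from a base endpoint."""
    base = base_url.rstrip("/")
    return {signal: f"{base}/v1/{signal}" for signal in ("traces", "metrics", "logs")}


if __name__ == "__main__":
    # Assumed local address; check the project's own docs for the real one.
    for key, value in otlp_http_env("http://localhost:4318", "checkout").items():
        print(f"export {key}={value}")
```

Because these variables are read by OpenTelemetry SDKs in general, the same settings should carry over between languages, which is the practical benefit of an OTel-native backend.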

Architecture and the 90-Second Deploy Promise

Under the hood, Traceway is a deliberately minimal Go/Gin backend that fronts ClickHouse for telemetry storage and PostgreSQL for relational data. A SvelteKit dashboard handles the UI, and the entire system is shipped as Docker images that wire themselves together via compose. For teams that do not want to operate ClickHouse, there is an embedded SQLite mode that runs inside any Go application with zero external dependencies — useful for smaller deployments, local development, or single-tenant SaaS products that want observability without spinning up extra infrastructure.

The 90-second deploy claim holds up in practice. On a fresh VPS or laptop, the gap between cloning the repo and pointing your first OTel SDK at it is genuinely a couple of minutes, not the half-day of YAML wrangling that the equivalent open-source stack typically takes. That bar matters because the alternative for most teams is either paying Datadog rates or building their own stack from a half-dozen open-source pieces, and both are expensive in different ways.

AI and LLM Tracing as a First-Class Signal

Where Traceway pulls clearly ahead of general-purpose alternatives is its AI observability layer. It captures LLM cost, token counts, latency, and full conversation traces across providers, with native support for OpenRouter and any OTel-compatible AI gateway. For teams building LLM-powered applications, this means production-grade tracing without routing sensitive prompts through a third-party SaaS — the entire trail stays inside your own infrastructure, which is often the difference between a tool you can use on customer data and one you cannot.
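To make the captured signals concrete, here is a minimal sketch of how cost, token counts, and latency for one LLM call could be turned into span attributes. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions. The model name, the prices, and the `llm.cost_usd` and `llm.latency_ms` keys are hypothetical, and nothing here is taken from Traceway's own schema.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class LLMCall:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def call_cost_usd(call: LLMCall) -> float:
    """Estimate the cost of one call from its token counts and the price table."""
    price = PRICES_PER_MILLION[call.model]
    total = call.input_tokens * price["input"] + call.output_tokens * price["output"]
    return total / 1_000_000


def span_attributes(call: LLMCall) -> dict:
    """Build a flat attribute dict suitable for attaching to a trace span."""
    return {
        "gen_ai.request.model": call.model,
        "gen_ai.usage.input_tokens": call.input_tokens,
        "gen_ai.usage.output_tokens": call.output_tokens,
        "llm.cost_usd": round(call_cost_usd(call), 6),  # hypothetical attribute name
        "llm.latency_ms": call.latency_ms,  # hypothetical attribute name
    }
```

Keeping the cost calculation next to the span attributes means the prompt, the token counts, and the derived cost all stay in the same self-hosted trace rather than in a separate billing export.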

This positions Traceway in the same conversation as Langfuse, Pydantic Logfire, and OpenLIT, but with a meaningfully wider scope. Those three are AI-observability tools that also do some general telemetry; Traceway is a general telemetry tool that treats AI tracing as a peer signal rather than a bolt-on. For a team running mixed workloads — traditional services plus a few LLM-powered features — that wider scope removes the need to operate two observability stacks side by side.

SDK Coverage and Alerting

Backend SDK coverage spans Go (Gin, Chi, Fiber, net/http), Node.js (NestJS, Hono, Cloudflare Workers), and PHP (Symfony). On the frontend, Next.js, React, Vue, and Svelte are supported with session replay included rather than as a separate $200/month add-on. Mobile coverage extends to Flutter, Android, and React Native. The list reads like a checklist for a typical modern stack, which is the right bar for a platform claiming to be a single-pane-of-glass replacement for assembling multiple tools.

Alerts route to Slack, GitHub, email, or arbitrary webhooks, and the dashboard ranks endpoints by Apdex score and an Impact-Score that surfaces what actually matters under load. The alerting model is deliberately straightforward — there is no PromQL-flavored rule engine to learn before you can get a useful page. For most teams that is the right tradeoff; for teams that already have a Prometheus-native alerting culture it may feel underspecified.
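For readers unfamiliar with Apdex, the sketch below implements the standard definition: requests at or under a threshold T count as satisfied, requests up to 4T count as tolerating at half weight, and everything slower counts as frustrated. The threshold values, the endpoint names, and the `rank_endpoints` helper are assumptions for illustration. The review does not describe Traceway's threshold or how its Impact-Score is computed, so neither is modeled here.

```python
def apdex(latencies_ms, threshold_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    satisfied = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    tolerating = sum(1 for ms in latencies_ms if threshold_ms < ms <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def rank_endpoints(samples, threshold_ms):
    """Return (endpoint, score) pairs sorted with the worst Apdex score first."""
    scored = [(endpoint, apdex(values, threshold_ms)) for endpoint, values in samples.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

Sorting worst-first mirrors the idea of a dashboard that surfaces the endpoints most in need of attention, without requiring a query language to get there.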

Pricing, Licensing, and Production Tradeoffs

The license model is the cleanest in the category. Traceway is MIT with no open-core split, meaning every feature in the GitHub repo is available in the self-hosted build, including session replay, AI tracing, and the full dashboard. There is a managed Traceway Cloud option for teams that prefer not to operate their own infrastructure, but it runs the same MIT codebase rather than a proprietary fork, so pricing is the only meaningful difference, not capability. The Hacker News reception in May 2026 (167 points, 69 comments) reflected exactly this clarity: most of the conversation was about the deploy speed and the no-strings license rather than about feature gates.

Production tradeoffs are real but bounded. ClickHouse and PostgreSQL still need to be operated, and at very high event volumes the ClickHouse layer becomes the operational center of gravity. The 29 shipped releases and the active Discord community suggest the project is well past the experiment phase, but it is not yet at the scale of Grafana's ecosystem or Datadog's enterprise tooling. For teams that want a single MIT-licensed observability stack with AI tracing built in and a deploy that takes minutes rather than days, the gap is acceptable.

The Bottom Line

Traceway hits a specific and underserved niche: a modern, MIT-licensed, OpenTelemetry-native observability platform that treats AI tracing as a peer signal and deploys in roughly the time it takes to make coffee. For self-hosted-first teams running LLM-powered applications, it removes the forced choice between cobbling Prometheus and friends together and paying Datadog rates. The community is still smaller than those of the incumbents and the team is still small, but the architectural decisions and the licensing clarity make it the most interesting open-source observability project to land in 2026 so far.

Pros

  • MIT license with no open-core split — session replay, AI tracing, and the full dashboard are all in the free build
  • OpenTelemetry-native ingest over OTLP/HTTP with no Collector, no proprietary SDK, and no per-event metering
  • 90-second Docker Compose deploy genuinely works — closer to modern infra ergonomics than the typical Prometheus-Loki-Tempo assembly
  • AI observability captures LLM cost, tokens, latency, and full conversation traces inside your own infrastructure
  • Embedded SQLite mode runs inside any Go app with zero external dependencies — useful for small or single-tenant deployments
  • Broad SDK coverage across Go, Node.js, PHP, Next.js, React, Vue, Svelte, Flutter, Android, and React Native
  • Managed Traceway Cloud runs the same MIT codebase rather than a proprietary fork — no capability gap to migrate later

Cons

  • ClickHouse remains the operational center of gravity at high event volumes — you still need to know how to operate it
  • Alerting is deliberately simple — no PromQL-flavored rule engine, which may feel underspecified for Prometheus-native teams
  • Ecosystem and community are still small compared to Grafana, Datadog, or the broader OpenTelemetry vendor field
  • AI gateway support is currently focused on OpenRouter and OTel-compatible providers — direct integrations for major SaaS LLM vendors are still maturing

Verdict

Traceway is the most interesting open-source observability project to land in 2026 so far. The MIT license with no open-core split, the OpenTelemetry-native ingest, and the 90-second deploy together hit a specific niche that the incumbents have left underserved. For a team that is comfortable operating ClickHouse and runs an LLM-heavy workload, it is the strongest single-tool choice in the category.

Alternatives to Traceway

OpenLIT

OpenTelemetry-native observability for LLM applications with evals and GPU monitoring

OpenLIT is an open-source AI engineering platform that provides OpenTelemetry-native observability for LLM applications. It combines distributed tracing, evaluation, prompt management, a secrets vault, and GPU telemetry in a single self-hostable stack. With 50+ integrations across LLM providers and frameworks, it lets teams monitor AI applications using their existing observability backends like Grafana, Datadog, or Jaeger.

Open Source

OpenObserve

All-in-one open-source observability — logs, metrics, traces, RUM

OpenObserve is an open-source observability platform that unifies logs, metrics, traces, and real user monitoring in a single binary. It claims 140x lower storage costs than Elasticsearch through columnar storage and compression, with native OpenTelemetry support, a built-in query UI, dashboards, and alerts. Designed for AI and cloud-native workloads at petabyte scale. Over 15,000 GitHub stars.

Open Source

Pydantic Logfire

Observability platform purpose-built for Python and Pydantic AI apps

Pydantic Logfire is an observability platform built by the Pydantic team specifically for Python AI applications. It provides structured logging, distributed tracing, and metrics with native understanding of Pydantic models, FastAPI, and AI framework data types. Auto-instruments OpenAI, Anthropic, LangChain, and other LLM providers. Built on OpenTelemetry for vendor-neutral data export. Offers a managed cloud dashboard with a generous free tier for development and small-scale production use.

Freemium