TensorZero

Open-source LLM gateway with built-in optimization and A/B testing

Open Source

TensorZero is an open-source LLMOps platform written in Rust that unifies an LLM gateway, observability, prompt optimization, and A/B experimentation in a single binary. It routes requests across providers with sub-millisecond P99 latency overhead at 10K+ QPS while capturing structured data for continuous improvement. It supports dynamic in-context learning, fine-tuning workflows, and production feedback loops. The project has raised $7.3M in seed funding and has 11K+ GitHub stars.

TensorZero addresses a fundamental gap in the LLM tooling landscape: most gateway and observability tools treat inference as a stateless request-response cycle, missing the opportunity to learn from production data. TensorZero combines routing, observability, and optimization into a single Rust binary that processes LLM requests with sub-millisecond overhead while building a structured dataset of inputs, outputs, and feedback signals. This data flywheel enables continuous improvement through A/B testing of prompts, dynamic in-context learning that adapts examples based on recent successes, and automated fine-tuning pipelines that produce specialized models from production traffic.
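
Dynamic in-context learning, as described above, can be sketched as: keep production examples together with their feedback, and for each new request pick the most similar successful ones as few-shot demonstrations. This is a minimal sketch under stated assumptions: a toy bag-of-words cosine similarity stands in for a real embedding model, and `Example`, `select_examples`, and `build_prompt` are hypothetical names, not TensorZero's API or its actual implementation.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    """A stored production inference with its feedback signal (hypothetical schema)."""
    input_text: str
    output_text: str
    feedback: float  # hypothetical convention: 1.0 = success, 0.0 = failure


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query: str, history: list[Example], k: int = 2) -> list[Example]:
    """Pick the k successful past examples whose inputs are most similar to the query."""
    q = bag_of_words(query)
    successes = [ex for ex in history if ex.feedback >= 1.0]
    successes.sort(key=lambda ex: cosine(q, bag_of_words(ex.input_text)), reverse=True)
    return successes[:k]


def build_prompt(query: str, examples: list[Example]) -> str:
    """Prepend the selected examples as few-shot demonstrations."""
    shots = "\n\n".join(f"Input: {ex.input_text}\nOutput: {ex.output_text}" for ex in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```

Filtering on feedback before ranking is what ties the examples to "recent successes": failed inferences never reach the prompt, however similar they are.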

The gateway layer supports all major model providers, including OpenAI, Anthropic, Google, and AWS Bedrock, as well as any OpenAI-compatible endpoint. Configuration is declarative, through TOML files that define functions, variants, and routing strategies. Each function can have multiple variants (different prompts, models, or parameters), and traffic is split across them automatically for experimentation. Built-in metrics track latency, cost, token usage, and custom evaluation scores, and all data is stored in ClickHouse for high-performance analytics. The system supports structured outputs via JSON schemas, tool calling, and multi-step inference chains, with full observability at each step.
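
Weighted traffic splitting between variants can be sketched as follows, assuming each request carries an episode ID that should stay pinned to one variant for the whole episode. In a real deployment the weights would come from the declarative configuration; here `VARIANT_WEIGHTS` and `pick_variant` are hypothetical, are not TensorZero's config schema, and TensorZero's own sampling logic may differ.

```python
from __future__ import annotations

import hashlib

# Hypothetical weights for three variants of one function (not TensorZero's config schema).
VARIANT_WEIGHTS = {"prompt_v1": 0.5, "prompt_v2": 0.3, "smaller_model": 0.2}


def pick_variant(episode_id: str, weights: dict[str, float]) -> str:
    """Map an episode ID to a variant with probability proportional to its weight.

    Hashing the ID (instead of calling random()) keeps every request in an
    episode on the same variant, so feedback can be attributed cleanly.
    """
    total = sum(weights.values())
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    # Top 53 bits of the hash -> a float uniformly spread over [0, 1).
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53 * total
    cumulative = 0.0
    for name in sorted(weights):  # fixed order so assignments are reproducible
        cumulative += weights[name]
        if point < cumulative:
            return name
    return sorted(weights)[-1]  # guard against floating-point rounding at the top edge
```

Using only 53 bits of the hash keeps the division exact in double precision, so `point` never rounds up to `total`.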

TensorZero's Rust implementation delivers sub-millisecond P99 latency overhead at sustained throughput exceeding 10,000 queries per second, making it suitable for latency-sensitive production deployments. The platform manages approximately one percent of global LLM API spend and is used by Fortune 10 companies. It has over 11,000 GitHub stars and $7.3M in seed funding, and it briefly became the top trending repository on GitHub. TensorZero is an example of the emerging category of LLM platforms that close the loop between inference and optimization.
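
A P99 figure is the latency that 99% of requests stay at or under, so a handful of slow outliers does not move it. The sketch below uses the nearest-rank definition, which is one of several common percentile conventions; the convention behind TensorZero's published benchmark is not stated here, and the sample values are made up for illustration.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up gateway overheads in milliseconds: 98 fast requests and two slow outliers.
overheads_ms = [0.2] * 98 + [0.9, 5.0]
p99 = percentile(overheads_ms, 99)  # 0.9: the single 5.0 ms outlier does not move P99
```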

Pricing

Free self-hosted (Apache-2.0); TensorZero Cloud coming soon

Platforms

Self-hosted Rust binary — Docker, any platform with ClickHouse

Alternatives

LiteLLM

Unified API proxy for 100+ LLMs

Drop-in OpenAI-compatible proxy supporting 100+ LLM providers with load balancing, spend tracking, rate limiting, and fallback routing. Acts as a unified gateway for all your AI model calls, letting teams switch between providers, enforce budgets, and add reliability layers without changing application code. Essential infrastructure for multi-model AI architectures.

Open Source
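
Fallback routing, in its simplest form, means trying providers in a fixed order and moving on when one errors. A minimal sketch with stub provider callables; `complete_with_fallback`, `AllProvidersFailed`, and the stubs are hypothetical and are not LiteLLM's API.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the fallback chain has errored."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code would catch only retryable errors (timeouts, 429s, 5xx)
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stub providers standing in for real SDK calls.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"
```

Returning the provider name alongside the response lets the caller log which backend actually served each request.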

Portkey

AI gateway with observability, routing, and guardrails

Portkey is an AI gateway and observability platform providing a unified API for 200+ LLM providers with intelligent routing, caching, rate limiting, and guardrails. Route requests across OpenAI, Anthropic, Google, and more with automatic failover, load balancing, and cost optimization. Features request logging, prompt management, evaluation tools, and real-time monitoring. The open-source gateway can be self-hosted; Portkey Cloud adds managed observability and team features.

Freemium, Open Source
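
Gateway caching usually starts with exact-match lookups: an identical request within a time window returns the stored response instead of calling the provider again. A minimal sketch, assuming an in-memory store and a canonical-JSON cache key; `ResponseCache` is hypothetical and is not Portkey's implementation.

```python
from __future__ import annotations

import hashlib
import json
import time
from typing import Any, Callable, Optional


class ResponseCache:
    """Exact-match response cache keyed on the full request body, with a time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # cache key -> (stored_at, response)

    @staticmethod
    def key(request: dict) -> str:
        # Canonical JSON so that key order inside the request does not change the cache key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict) -> Optional[Any]:
        k = self.key(request)
        entry = self._store.get(k)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[k]  # expired: evict and report a miss
            return None
        return response

    def put(self, request: dict, response: Any) -> None:
        self._store[self.key(request)] = (self.clock(), response)
```

The injectable `clock` parameter is a design choice for testability: expiry can be checked without sleeping.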

Helicone

Open-source LLM observability through a single-line proxy

Helicone is an open-source LLM observability and AI gateway platform offering proxy-based request logging, cost tracking, latency monitoring, caching, rate limits, user analytics, prompt tools, and HQL (Helicone Query Language). It supports OpenAI, Anthropic, Azure, LiteLLM, Anyscale, Together AI, and OpenRouter integrations. It now presents itself as part of Mintlify while continuing to offer managed and self-hosted gateway and observability deployments.

Freemium, Open Source
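
Proxy-based cost tracking comes down to reading token counts from each logged response and multiplying by per-token prices. A minimal sketch; the model name, the prices, and `request_cost` are all hypothetical and do not come from Helicone's API or pricing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Hypothetical price table; real prices differ by provider and change over time.
PRICES = {"example-model": Price(input_per_million=3.00, output_per_million=15.00)}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one logged request, from its token counts."""
    price = PRICES[model]
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


# 1,200 input tokens and 300 output tokens: (1,200 * 3 + 300 * 15) / 1,000,000 = $0.0081
cost = request_cost("example-model", 1_200, 300)
```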

OpenRouter

Unified API gateway for hundreds of AI models

Unified API gateway providing access to 500+ AI models from leading providers through a single OpenAI-compatible interface. OpenRouter eliminates the need to manage separate keys, billing, and integrations across providers like OpenAI, Anthropic, Google, and Meta, with built-in plugins for web search, PDF processing, automatic fallback routing, and per-model cost tracking.

API usage-based

Related Tools

Latitude

Sentry-style observability for AI agent conversations

Latitude is an agent observability platform for teams that need to inspect LLM traces, conversations, issues, and evaluation feedback in one workflow. Its public repo and docs position it as a Sentry-style monitor for AI agents, with semantic search, issue detection, annotations, MCP-assisted fixes, and cloud or self-hosted deployment paths for production debugging.

Freemium, Open Source, Telemetry
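
Sentry-style issue detection generally works by fingerprinting failures so that occurrences differing only in IDs or numbers collapse into one issue. A minimal sketch of that generic grouping technique; `fingerprint` and `group_issues` are hypothetical, and Latitude's own issue detection may work quite differently (for example, with semantic rather than textual matching).

```python
import re
from collections import defaultdict

UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
NUMBER_RE = re.compile(r"\d+")


def fingerprint(message: str) -> str:
    """Normalize a failure message so variants differing only in IDs or numbers collapse together."""
    normalized = UUID_RE.sub("<uuid>", message.lower())
    return NUMBER_RE.sub("<n>", normalized)


def group_issues(failures):
    """Group (trace_id, message) pairs into issues keyed by fingerprint."""
    issues = defaultdict(list)
    for trace_id, message in failures:
        issues[fingerprint(message)].append(trace_id)
    return dict(issues)
```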

Spotlight by Backplanes

Session reports for Claude Code and Codex runs

Spotlight by Backplanes turns completed Claude Code and Codex sessions into concise reports for engineering, security, and spend review. The CLI installs on macOS, Linux, or WSL 2, watches sessions after they finish, redacts PII and credentials locally before upload, then summarizes files touched, commands run, external domains reached, scope drift, risky actions, and next-session improvements.

Freemium, Telemetry
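
Local redaction before upload typically means pattern-matching secrets and PII on the client and replacing them with labeled placeholders. A minimal regex-based sketch; the patterns and `redact` are hypothetical and far from exhaustive, and Spotlight's actual redaction rules are not described on this page.

```python
import re

# Hypothetical patterns; a production redactor needs far broader coverage.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("BEARER_TOKEN", re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with a labeled placeholder, in list order."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Running this on the client, before anything leaves the machine, is what makes the redaction "local": the server only ever sees placeholders.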

Traceway

OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.

Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.

Open Source

Judgeval

Open-source post-building layer for agents — tracing, evals, and online monitoring

Judgeval is the open-source post-building layer for AI agents from Judgment Labs, providing OpenTelemetry-based tracing, hosted and custom evaluation scorers, and online behavior monitoring for LLM-powered applications. Instrument any function with a single decorator, score live production traffic against faithfulness and instruction-adherence checks, and feed real-world failures back into reinforcement learning or supervised fine-tuning loops.

Open Source
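
The single-decorator instrumentation pattern can be sketched generically: wrap the function, record its inputs, its output or error, and its duration, then hand the resulting span to a sink. A minimal sketch; `traced` and `SPANS` are hypothetical and are not Judgeval's decorator or API, which exports OpenTelemetry-based traces instead of appending to a list.

```python
import functools
import time
import uuid

SPANS = []  # hypothetical in-memory sink; a real tracer would export spans elsewhere


def traced(func):
    """Record one span per call: name, arguments, output or error, and wall-clock duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"id": uuid.uuid4().hex, "name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            span["output"] = result
            return result
        except Exception as exc:
            span["error"] = repr(exc)
            raise  # tracing must not swallow the application's exceptions
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)
    return wrapper
```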

TraceRoot

Open-source observability and self-healing layer for AI agents

TraceRoot is a YC S25-backed open-source observability platform purpose-built for AI agents and LLM apps. It combines OpenTelemetry-compatible tracing with an agentic debugging runtime that reads your source code, correlates failures with recent commits, and proposes fix PRs automatically. BYOK support spans seven LLM providers; the entire stack runs self-hosted via Docker Compose, with TraceRoot Cloud available for managed deployments.

Open Source

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.

Open Source