Arize Phoenix is an open-source observability and evaluation tool for LLM and ML applications. Part of the Arize AI ecosystem, Phoenix provides a self-hosted, pip-installable alternative to cloud-hosted observability platforms.
One key differentiator is 3D UMAP visualization for analyzing embedding distributions, detecting drift, and identifying clusters in production data. RAG-specific evaluations measure retrieval quality, relevance, and groundedness, so that different chunking and retrieval strategies can be compared.
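As a rough illustration of what a retrieval-quality measurement can look like, the sketch below computes precision@k and a hit flag for two retrieval strategies. It is a minimal sketch, not Phoenix's own evaluation API: the function names, document IDs, and relevance labels are all hypothetical, and in practice the relevance labels would come from an evaluator instead of being hard-coded.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by documents labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Slots left empty because fewer than k documents were retrieved count as misses.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def hit_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant document appears in the top k."""
    return any(doc_id in relevant for doc_id in retrieved[:k])


# Hypothetical relevance labels and ranked results for a single query.
relevant_docs: Set[str] = {"d1", "d3"}
results_by_strategy: Dict[str, List[str]] = {
    "small_chunks": ["d1", "d7", "d3", "d9"],
    "large_chunks": ["d7", "d9", "d1", "d3"],
}

# Compare the two strategies on the same query at k=2.
scores = {
    name: precision_at_k(ranked, relevant_docs, k=2)
    for name, ranked in results_by_strategy.items()
}
```

Averaging such scores over many queries gives one number per strategy, which is the kind of comparison the evaluations above are meant to support.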
LLM-as-judge scoring automates output quality assessment using configurable evaluation templates. Detailed trace inspection follows requests through multi-step agent workflows with latency, token usage, and cost breakdowns at each step.
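To make the per-step breakdown concrete, here is a minimal sketch of how latency, token usage, and cost could be rolled up across the steps of one trace. The `Step` class, the step names, and the per-token prices are hypothetical and chosen only for illustration; they do not reflect Phoenix's internal data model or any real pricing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One step of a multi-step workflow (hypothetical structure)."""
    name: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


# Assumed prices in USD per 1,000 tokens, for illustration only.
PROMPT_PRICE_PER_1K = 0.5
COMPLETION_PRICE_PER_1K = 1.5


def step_cost(step: Step) -> float:
    """Cost of a single step under the assumed prices."""
    return (
        step.prompt_tokens * PROMPT_PRICE_PER_1K
        + step.completion_tokens * COMPLETION_PRICE_PER_1K
    ) / 1000


def summarize_trace(steps: List[Step]) -> Dict[str, float]:
    """Total latency, tokens, and cost, assuming the steps ran sequentially."""
    return {
        "latency_ms": sum(s.latency_ms for s in steps),
        "tokens": sum(s.prompt_tokens + s.completion_tokens for s in steps),
        "cost_usd": sum(step_cost(s) for s in steps),
    }


# A made-up three-step trace.
trace = [
    Step("retrieve", latency_ms=120.0, prompt_tokens=0, completion_tokens=0),
    Step("generate", latency_ms=850.0, prompt_tokens=1000, completion_tokens=2000),
    Step("summarize", latency_ms=300.0, prompt_tokens=2000, completion_tokens=0),
]
summary = summarize_trace(trace)
```

Keeping the per-step values alongside the totals is what makes it possible to see which step dominates latency or cost.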
OpenTelemetry-based instrumentation keeps setup close to zero-code, with auto-instrumentation for LangChain, LlamaIndex, OpenAI, Anthropic, and other frameworks. Phoenix runs locally after a simple pip install, which makes it a quick way to add observability to LLM applications.
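In practice, setup for the frameworks listed above tends to be a few lines. The sketch below assumes that the `arize-phoenix` package and the relevant OpenInference instrumentation package (for example `openinference-instrumentation-openai`) have been installed with pip, and that the installed version of `phoenix.otel.register` accepts the `auto_instrument` flag. The `setup_tracing` helper and the project name are hypothetical; the Phoenix documentation for the version in use is the authority on exact parameters.

```python
def setup_tracing(project_name: str = "my-llm-app"):
    """Start a local Phoenix app and send OpenTelemetry traces to it.

    Hypothetical helper; assumes Phoenix and an OpenInference
    instrumentation package are installed in the environment.
    """
    # Imported lazily so this module can be loaded without Phoenix installed.
    import phoenix as px
    from phoenix.otel import register

    # Launch the Phoenix UI from the current Python process.
    session = px.launch_app()

    # Create a tracer provider that exports to Phoenix. With
    # auto_instrument=True (assumed to be supported by the installed
    # version), installed OpenInference instrumentors are activated.
    tracer_provider = register(project_name=project_name, auto_instrument=True)
    return session, tracer_provider
```

Calling `setup_tracing()` once at application startup, before any LLM client is used, should be enough for subsequent calls made through an instrumented framework to appear as traces in the local Phoenix UI.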