What This Stack Does
As AI applications move to production, managing LLM provider interactions becomes a significant operational concern. This stack provides the infrastructure layer for routing, fallback, cost optimization, and monitoring across all your LLM API calls.
Local Development and Unified Routing
Ollama provides local inference for development, testing, and privacy-sensitive workloads. Running models locally at zero marginal cost means you can iterate on prompts, run test suites, and develop without burning API credits.
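To give a sense of how lightweight the local loop is, here is a minimal sketch that calls an Ollama server on its default port (11434) through its REST API. It assumes a model such as llama3 has already been pulled with `ollama pull llama3`; the prompt is just an example.

```python
import requests

# Call a locally running Ollama server (default port 11434).
# Assumes "llama3" has been pulled beforehand with `ollama pull llama3`.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize the benefits of local LLM inference in one sentence.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```

Because the call is local, the same loop can run inside a test suite or a prompt-iteration script without any API spend.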
LiteLLM serves as the API gateway that unifies 100+ LLM providers behind a single OpenAI-compatible interface. Route requests to different providers based on model capability, cost, or latency, with automatic fallbacks when a provider fails.
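As a rough illustration, the sketch below sends the same OpenAI-style request to several providers through litellm.completion, falling back manually when one fails. The model names are examples and provider keys are read from the usual environment variables; LiteLLM also ships built-in fallback handling via its Router, so the explicit loop is only there to make the idea visible.

```python
import litellm

# One OpenAI-style call shape; the provider is selected by the model string.
messages = [{"role": "user", "content": "Classify this ticket: 'App crashes on login.'"}]

# Try a primary model, then fall back to alternatives on failure.
for model in ["gpt-4o-mini", "claude-3-haiku-20240307", "ollama/llama3"]:
    try:
        result = litellm.completion(model=model, messages=messages)
        print(model, "->", result.choices[0].message.content)
        break
    except Exception as exc:
        print(f"{model} failed ({exc}); trying next provider")
```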
Observability and Production Optimization
Langfuse provides the observability layer: it traces every LLM call with inputs, outputs, latency, token counts, and cost. Self-hostable and open source, it gives you the monitoring data needed to optimize prompts and track output quality.
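One low-effort way to get traces into Langfuse is LiteLLM's Langfuse callback: set the Langfuse keys in the environment and register the callback. The keys and host below are placeholders (the host would point at your self-hosted instance), and the metadata fields are illustrative of what can be attached to a trace.

```python
import os
import litellm

# Placeholder credentials; a self-hosted Langfuse sets LANGFUSE_HOST to its own URL.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://langfuse.example.internal"  # hypothetical URL

# Report every successful LiteLLM call to Langfuse.
litellm.success_callback = ["langfuse"]

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Draft a one-line release note."}],
    metadata={"trace_name": "release-notes", "session_id": "demo-session"},
)
print(response.choices[0].message.content)
```

With the callback in place, every call made through LiteLLM shows up in Langfuse with its prompt, completion, latency, and token cost, so no per-call instrumentation is needed.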
Portkey adds production-grade gateway features: semantic caching, request retries, budget limits, and detailed analytics. For teams with significant LLM API spend, Portkey's optimization features can reduce costs by 30-50%.
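Here is a minimal sketch of the Portkey client, assuming the portkey-ai SDK and a virtual key that maps to stored provider credentials. The API key, virtual key, and the cache/retry config are placeholders meant to illustrate Portkey's gateway config format, not an exact schema.

```python
from portkey_ai import Portkey

# Placeholder keys; the config attaches gateway behavior to every request.
client = Portkey(
    api_key="PORTKEY_API_KEY",         # placeholder
    virtual_key="openai-virtual-key",  # placeholder: maps to a stored provider key
    config={
        "cache": {"mode": "semantic"},  # serve semantically similar prompts from cache
        "retry": {"attempts": 3},       # retry transient provider failures
    },
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain semantic caching in one sentence."}],
)
print(completion.choices[0].message.content)
```

Because the client is OpenAI-compatible, the caching, retry, and budget policies apply without changing application code beyond the client construction.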
The Bottom Line
Together, these tools create a complete LLM infrastructure stack: local development with Ollama, production routing with LiteLLM, observability with Langfuse, and optimization with Portkey. Each component is independently replaceable.