Higress is an API gateway developed by Alibaba Cloud that bridges traditional API infrastructure and the emerging requirements of AI-native applications. While conventional gateways handle request routing, rate limiting, and authentication based on HTTP semantics, Higress extends these concepts to understand AI-specific traffic patterns. Token-based rate limiting controls costs by counting LLM tokens rather than raw requests. Model routing directs traffic to different LLM providers based on request characteristics. Prompt caching reduces latency and cost for repeated queries.
The gateway is built on the battle-tested Envoy proxy and Istio service mesh, inheriting their performance, reliability, and extensibility while adding an AI-aware control plane. A particularly distinctive feature is native MCP server hosting, which lets teams expose tools and data sources to AI agents through the Model Context Protocol directly from the gateway layer. This eliminates the need for separate MCP server infrastructure and centralizes agent-to-tool communication through existing API management workflows.
Higress powers production workloads at Alibaba Cloud, supporting their Tongyi Bailian and PAI AI platforms. The project has over 8,000 GitHub stars and is Apache 2.0 licensed. It represents a category of infrastructure that barely exists in Western developer tool directories — AI-native API gateways that understand the specific traffic patterns, cost models, and integration requirements of LLM-powered applications. Plugin support via WASM allows custom routing logic without gateway restarts.