Higress

AI-native API gateway by Alibaba with MCP server hosting and LLM routing

Open Source

Higress is an open-source AI-native API gateway developed by Alibaba that combines traditional API management with LLM-specific capabilities like token-based rate limiting, model routing, prompt caching, and MCP server hosting. Built on Envoy and Istio, it provides enterprise-grade traffic management while natively understanding AI workload patterns including streaming responses, long-lived connections, and multi-model fallback chains.

Higress is an API gateway developed by Alibaba Cloud that bridges traditional API infrastructure and the emerging requirements of AI-native applications. While conventional gateways handle request routing, rate limiting, and authentication based on HTTP semantics, Higress extends these concepts to understand AI-specific traffic patterns. Token-based rate limiting controls costs by counting LLM tokens rather than raw requests. Model routing directs traffic to different LLM providers based on request characteristics. Prompt caching reduces latency and cost for repeated queries.
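To make the idea of token-based rate limiting concrete, here is a minimal sketch of a limiter that budgets LLM tokens per consumer instead of counting requests. It is an illustration only, not Higress code or its plugin API: the class name `TokenBudgetLimiter`, the fixed-window strategy, and the consumer identifiers are all assumptions made for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TokenBudgetLimiter:
    """Hypothetical fixed-window limiter that counts LLM tokens, not requests."""

    tokens_per_window: int
    window_seconds: float = 60.0
    # consumer -> (window start time, tokens used in that window)
    _usage: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def allow(self, consumer: str, tokens: int, now: Optional[float] = None) -> bool:
        """Return True and record usage if the consumer still has token budget."""
        now = time.monotonic() if now is None else now
        start, used = self._usage.get(consumer, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0  # the previous window expired, start a new one
        if used + tokens > self.tokens_per_window:
            self._usage[consumer] = (start, used)
            return False  # over budget: a gateway would typically reject here
        self._usage[consumer] = (start, used + tokens)
        return True


if __name__ == "__main__":
    limiter = TokenBudgetLimiter(tokens_per_window=1000)
    print(limiter.allow("team-a", 600, now=0.0))  # within budget
    print(limiter.allow("team-a", 500, now=1.0))  # would exceed 1000 tokens
```

Under this scheme one large prompt can exhaust a budget that many small requests would not, which is why counting tokens tracks LLM cost more closely than counting requests.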

The gateway is built on the battle-tested Envoy proxy and Istio service mesh, inheriting their performance, reliability, and extensibility while adding an AI-aware control plane. A particularly distinctive feature is native MCP server hosting, which lets teams expose tools and data sources to AI agents through the Model Context Protocol directly from the gateway layer. This eliminates the need for separate MCP server infrastructure and centralizes agent-to-tool communication through existing API management workflows.
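For readers unfamiliar with the Model Context Protocol, the sketch below shows the general shape of the JSON-RPC 2.0 message an agent sends to invoke a tool on an MCP server. It does not show how Higress hosts or configures MCP servers; the helper `build_tool_call`, the tool name `lookup_order`, and its arguments are hypothetical and exist only for illustration.

```python
import json
from itertools import count

# Simple incrementing request ids; JSON-RPC only requires them to be unique per session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # `lookup_order` is a made-up tool name used only for this example.
    request = build_tool_call("lookup_order", {"order_id": "A-1001"})
    print(json.dumps(request, indent=2))
```

When tools are exposed from the gateway layer, messages of this kind pass through the same place that already handles routing and authentication for ordinary API traffic.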

Higress powers production workloads at Alibaba Cloud, supporting its Tongyi Bailian and PAI AI platforms. The project has over 8,000 GitHub stars and is Apache 2.0 licensed. It represents a category of infrastructure that barely exists in Western developer tool directories: AI-native API gateways that understand the specific traffic patterns, cost models, and integration requirements of LLM-powered applications. Plugin support via WASM allows custom routing logic without gateway restarts.

Pricing

Free open-source; enterprise support via Alibaba Cloud

Platforms

Linux, Kubernetes, Docker

Related Tools

Hermes Agent

Top Pick

Open-source AI agent framework with persistent memory, reusable skills, tools, and messaging gateways

Hermes Agent is an open-source AI agent framework with persistent memory, reusable skills, 40+ tools, cron jobs, and messaging gateways.

Open Source

Safari MCP Server

Apple's Safari-native MCP server for web debugging agents

Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.

Free, Telemetry

Headroom

Context compression for LLM apps and coding agents

Headroom is an Apache-2.0 context compression layer for LLM apps and coding agents. It compresses tool output, logs, files, RAG chunks, and agent history through a local library, proxy, wrapper, or MCP server, with retrieval hooks for bringing originals back when needed. Treat its savings numbers as Headroom-reported benchmarks, not independent aicoolies measurements.

Open Source, Telemetry

Codebase Memory MCP

Codebase knowledge graph MCP server for AI coding agents

Codebase Memory MCP is an MIT-licensed MCP server that turns a repository into a persistent code knowledge graph for AI coding agents. It gives Claude Code, Cursor, Codex-style agents, and other MCP clients structural queries for functions, classes, call chains, routes, and architecture, helping them explore large projects without repeatedly rereading files or relying only on broad search.

Open Source, Telemetry

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

Open Source

BeeAI Framework

Python and TypeScript framework for production multi-agent systems

BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.

Open Source, Telemetry