Best tools for Self-Hosted Deployment
Deploying and managing applications on self-hosted infrastructure
Showing 24 of 171 tools
Headroom
Context compression for LLM apps and coding agents
Headroom is an Apache-2.0 context compression layer for LLM apps and coding agents. It compresses tool output, logs, files, RAG chunks, and agent history through a local library, proxy, wrapper, or MCP server, with retrieval hooks for bringing originals back when needed. Treat its savings numbers as Headroom-reported benchmarks, not independent aicoolies measurements.
Codebase Memory MCP
Codebase knowledge graph MCP server for AI coding agents
Codebase Memory MCP is an MIT-licensed MCP server that turns a repository into a persistent code knowledge graph for AI coding agents. It gives Claude Code, Cursor, Codex-style agents, and other MCP clients structural queries for functions, classes, call chains, routes, and architecture, helping them explore large projects without repeatedly rereading files or relying only on broad search.
KubeAI
Kubernetes operator for serving AI inference workloads
KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.
BeeAI Framework
Python and TypeScript framework for production multi-agent systems
BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.
Supabase MCP
MCP server for connecting AI assistants to Supabase projects
Supabase MCP is Supabase's Apache-2.0 server for connecting AI assistants to Supabase projects. It can expose database, configuration, and project-management workflows to MCP clients such as Cursor, Claude, and Windsurf, while the official docs emphasize permission and security review before production use, SQL changes, or high-privilege database access.
Klavis AI
MCP integration platform for agent tool use at scale
Klavis AI is an Apache-2.0 MCP integration platform for teams connecting AI agents to external SaaS tools and APIs. The public repo and official docs position it as infrastructure for reliable tool access at scale, so it fits teams that want reusable MCP connectors without treating every integration as a one-off script or custom OAuth maintenance project.
Superserve
Open-source Firecracker sandboxes for long-running AI agents
Superserve is an open-source sandbox infrastructure layer for AI agents that need durable computers instead of short-lived shells. It runs isolated Firecracker microVMs, supports pause, resume, snapshot, fork, preview URLs, MCP connectivity, SDK/API control, Docker workloads, and self-hosting, while the hosted service adds pay-as-you-go agent sandboxes for teams.
pgvectorscale
DiskANN-powered vector search extension for PostgreSQL
pgvectorscale is an open-source PostgreSQL extension from Timescale that complements pgvector with DiskANN-based approximate vector search. It is useful for teams that want faster embedding retrieval while keeping vectors, filters, and application data inside the Postgres ecosystem instead of adopting a separate hosted vector database.
CLIProxyAPI
Self-hosted proxy API for routing AI CLI accounts into OpenAI-compatible endpoints
CLIProxyAPI is an open-source Go proxy server that wraps Gemini CLI, Claude Code, OpenAI Codex, Grok Build, and related CLI account flows behind OpenAI/Gemini/Claude-compatible API endpoints. Use it carefully: it can touch OAuth sessions, auth files, logs, and provider account policies, so production use needs credential and ToS review.
Freestyle
Sandboxes for coding agents — Linux VMs, Git, and deploys in one box
Freestyle is YC-backed sandbox infrastructure built for AI coding agents, shipping secure Linux VMs with nested virtualization, Git servers, and one-click web deploys. It lets agents run real workloads, branch repos, and deploy apps under short-lived identities while billing only for active compute. Used in production by vly.ai, Rork, and Vibeflow.
GraphBit
Rust-native multi-agent orchestration for production
GraphBit is a Rust-native, multi-agent orchestration framework built for production. It targets the gap between Python-first frameworks like LangGraph and the operational expectations of enterprise systems — predictable memory, low latency, deterministic concurrency, and the ability to embed an agent runtime in services that already run Rust without dragging in a Python interpreter.
VectorChord
High-recall Postgres vector search at billion scale
VectorChord is a Postgres extension from the supervc-stack/VectorChord project that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization to deliver Pinecone-class performance at billion-vector scale while keeping all data inside a single Postgres database — no separate vector store, no two-system sync, no rewrites when the workload grows.
Infinity
AI-native database for hybrid RAG retrieval
Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.
OpenSRE
Open-source toolkit for building AI SRE incident response agents
OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.
Baseten
ML inference platform for production AI models
Baseten is the inference platform for deploying AI models at scale with dedicated and pre-optimized model APIs and performance-optimized infrastructure. Specializes in image generation, transcription, text-to-speech, LLM serving, embeddings, and compound AI workloads. Delivers 75% latency reduction with 415ms cold starts and 3000+ concurrent scaling. Available as managed cloud or self-hosted, trusted by Cursor, Notion, Descript, and Sourcegraph for production inference.
Elkeid
Kernel-space host intrusion detection system
Elkeid is ByteDance's open-source HIDS for hosts, containers, Kubernetes, and serverless workloads. Its kernel-level data collection via Kprobe hooks captures process lineage, privilege escalation attempts, file access patterns, and network connections with minimal overhead. Includes an Agent for telemetry, Detector for rule evaluation, Controller for policy management, and a Dashboard for alerts and investigation.
Meltano
Declarative code-first ELT data integration
Meltano is a declarative, code-first data integration engine with 500+ Singer connectors for building ELT pipelines. It replaces custom API integration code with configuration-driven pipeline definitions that live in version control alongside application code. Integrates with dbt for transformation, supports scheduling and monitoring through a unified CLI, and powers production pipelines at scale.
Concourse
Container-based CI/CD automation system
Concourse is an open-source CI/CD system built on composable primitives: resources for external artifacts, tasks for containerized work units, and jobs for orchestration. All pipelines are declarative YAML with version control, every task runs in an isolated container, and stateless workers enable horizontal scaling. Deployable via BOSH, Helm, Docker Compose, or standalone binary across any infrastructure.
Nexa SDK
Cross-platform on-device AI model runtime
Nexa SDK enables running frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. Includes an OpenAI-compatible API server with chat and function calling support.
Triton Inference Server
NVIDIA's optimized AI model serving platform
Triton Inference Server is NVIDIA's open-source inference serving platform that deploys AI models from TensorRT, PyTorch, ONNX, TensorFlow, OpenVINO, Python, and more across cloud, data center, and edge environments. It supports dynamic batching, model ensembles, concurrent model execution on GPUs and CPUs, and real-time, streaming, and batch inference patterns. Includes Model Analyzer for profiling and Model Navigator for automated optimization.
Logto
Open-source auth infrastructure for modern apps
Logto is an open-source authentication and authorization platform built on OIDC and OAuth 2.1, serving as an alternative to Auth0, Cognito, and Firebase Auth. It provides pre-built sign-in flows with customizable UI, social login, Google One Tap, MFA, enterprise SSO via SAML, and role-based access control. SDKs cover 30+ frameworks including React, Next.js, Vue, Flutter, Go, and Python, with multi-tenancy support for SaaS applications.
MinIO
High-performance S3-compatible object storage
MinIO is a high-performance, S3-compatible object storage server designed for AI, machine learning, and data-intensive workloads. Written in Go, it delivers industry-leading throughput for both read and write operations while maintaining full compatibility with the Amazon S3 API. MinIO includes an embedded web console for bucket management, a command-line client, and supports erasure coding, bitrot protection, and encryption at rest for enterprise-grade data durability.
agentOS
Lightweight OS for running AI agents in-process
agentOS is a portable open-source operating system for AI agents that delivers ~6ms cold starts at 32x lower cost than traditional sandboxes. Powered by WebAssembly and V8 isolates, it runs agents like Claude Code and Codex directly inside your process with granular permissions and host-managed tool access for S3, GitHub, and databases. Available as a simple npm package with no special infrastructure or vendor lock-in required.
Armeria
Versatile microservice framework for any protocol
Armeria is an open-source microservice framework from the creator of Netty at LINE Corporation that supports gRPC, Thrift, REST, and GraphQL on a single server and port. It provides built-in decorators for metrics, distributed tracing, load balancing, authentication, rate limiting, circuit breakers, and automatic retries. The framework integrates seamlessly with Spring Boot, Dropwizard, and Reactive Streams while serving automated API documentation with interactive request testing.