aicoolies logo

Best tools for Self-Hosted Deployment

Deploying and managing applications on self-hosted infrastructure

Showing 24 of 171 tools

Headroom

Context compression for LLM apps and coding agents

Headroom is an Apache-2.0 context compression layer for LLM apps and coding agents. It compresses tool output, logs, files, RAG chunks, and agent history through a local library, proxy, wrapper, or MCP server, with retrieval hooks for bringing originals back when needed. Treat its savings numbers as Headroom-reported benchmarks, not independent aicoolies measurements.

open-sourceOpen SourceTelemetry

Codebase Memory MCP

Codebase knowledge graph MCP server for AI coding agents

Codebase Memory MCP is an MIT-licensed MCP server that turns a repository into a persistent code knowledge graph for AI coding agents. It gives Claude Code, Cursor, Codex-style agents, and other MCP clients structural queries for functions, classes, call chains, routes, and architecture, helping them explore large projects without repeatedly rereading files or relying only on broad search.

open-sourceOpen SourceTelemetry

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

open-sourceOpen Source
BeeAI Framework logo

BeeAI Framework

Python and TypeScript framework for production multi-agent systems

BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.

open-sourceOpen SourceTelemetry

Supabase MCP

MCP server for connecting AI assistants to Supabase projects

Supabase MCP is Supabase's Apache-2.0 server for connecting AI assistants to Supabase projects. It can expose database, configuration, and project-management workflows to MCP clients such as Cursor, Claude, and Windsurf, while the official docs emphasize permission and security review before production use, SQL changes, or high-privilege database access.

open-sourceOpen SourceTelemetry
Klavis AI logo

Klavis AI

MCP integration platform for agent tool use at scale

Klavis AI is an Apache-2.0 MCP integration platform for teams connecting AI agents to external SaaS tools and APIs. The public repo and official docs position it as infrastructure for reliable tool access at scale, so it fits teams that want reusable MCP connectors without treating every integration as a one-off script or custom OAuth maintenance project.

open-sourceOpen SourceTelemetry
Superserve logo

Superserve

Open-source Firecracker sandboxes for long-running AI agents

Superserve is an open-source sandbox infrastructure layer for AI agents that need durable computers instead of short-lived shells. It runs isolated Firecracker microVMs, supports pause, resume, snapshot, fork, preview URLs, MCP connectivity, SDK/API control, Docker workloads, and self-hosting, while the hosted service adds pay-as-you-go agent sandboxes for teams.

open-sourceOpen Source

pgvectorscale

DiskANN-powered vector search extension for PostgreSQL

pgvectorscale is an open-source PostgreSQL extension from Timescale that complements pgvector with DiskANN-based approximate vector search. It is useful for teams that want faster embedding retrieval while keeping vectors, filters, and application data inside the Postgres ecosystem instead of adopting a separate hosted vector database.

open-sourceOpen Source

CLIProxyAPI

Self-hosted proxy API for routing AI CLI accounts into OpenAI-compatible endpoints

CLIProxyAPI is an open-source Go proxy server that wraps Gemini CLI, Claude Code, OpenAI Codex, Grok Build, and related CLI account flows behind OpenAI/Gemini/Claude-compatible API endpoints. Use it carefully: it can touch OAuth sessions, auth files, logs, and provider account policies, so production use needs credential and ToS review.

open-sourceOpen SourceTelemetry
Freestyle logo

Freestyle

Sandboxes for coding agents — Linux VMs, Git, and deploys in one box

Freestyle is YC-backed sandbox infrastructure built for AI coding agents, shipping secure Linux VMs with nested virtualization, Git servers, and one-click web deploys. It lets agents run real workloads, branch repos, and deploy apps under short-lived identities while billing only for active compute. Used in production by vly.ai, Rork, and Vibeflow.

freemium
GraphBit logo

GraphBit

Rust-native multi-agent orchestration for production

GraphBit is a Rust-native, multi-agent orchestration framework built for production. It targets the gap between Python-first frameworks like LangGraph and the operational expectations of enterprise systems — predictable memory, low latency, deterministic concurrency, and the ability to embed an agent runtime in services that already run Rust without dragging in a Python interpreter.

open-sourceOpen Source
VectorChord logo

VectorChord

High-recall Postgres vector search at billion scale

VectorChord is a Postgres extension from the supervc-stack/VectorChord project that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization to deliver Pinecone-class performance at billion-vector scale while keeping all data inside a single Postgres database — no separate vector store, no two-system sync, no rewrites when the workload grows.

open-sourceOpen Source
Infinity logo

Infinity

AI-native database for hybrid RAG retrieval

Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.

open-sourceOpen Source
OpenSRE logo

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.

open-sourceOpen Source
Baseten logo

Baseten

ML inference platform for production AI models

Baseten is the inference platform for deploying AI models at scale with dedicated and pre-optimized model APIs and performance-optimized infrastructure. Specializes in image generation, transcription, text-to-speech, LLM serving, embeddings, and compound AI workloads. Delivers 75% latency reduction with 415ms cold starts and 3000+ concurrent scaling. Available as managed cloud or self-hosted, trusted by Cursor, Notion, Descript, and Sourcegraph for production inference.

api-usage-based
Elkeid logo

Elkeid

Kernel-space host intrusion detection system

Elkeid is ByteDance's open-source HIDS for hosts, containers, Kubernetes, and serverless workloads. Its kernel-level data collection via Kprobe hooks captures process lineage, privilege escalation attempts, file access patterns, and network connections with minimal overhead. Includes an Agent for telemetry, Detector for rule evaluation, Controller for policy management, and a Dashboard for alerts and investigation.

open-sourceOpen Source
Meltano logo

Meltano

Declarative code-first ELT data integration

Meltano is a declarative, code-first data integration engine with 500+ Singer connectors for building ELT pipelines. It replaces custom API integration code with configuration-driven pipeline definitions that live in version control alongside application code. Integrates with dbt for transformation, supports scheduling and monitoring through a unified CLI, and powers production pipelines at scale.

open-sourceOpen Source
Concourse logo

Concourse

Container-based CI/CD automation system

Concourse is an open-source CI/CD system built on composable primitives: resources for external artifacts, tasks for containerized work units, and jobs for orchestration. All pipelines are declarative YAML with version control, every task runs in an isolated container, and stateless workers enable horizontal scaling. Deployable via BOSH, Helm, Docker Compose, or standalone binary across any infrastructure.

open-sourceOpen Source
Nexa SDK logo

Nexa SDK

Cross-platform on-device AI model runtime

Nexa SDK enables running frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. Includes an OpenAI-compatible API server with chat and function calling support.

open-sourceOpen Source
Triton Inference Server logo

Triton Inference Server

NVIDIA's optimized AI model serving platform

Triton Inference Server is NVIDIA's open-source inference serving platform that deploys AI models from TensorRT, PyTorch, ONNX, TensorFlow, OpenVINO, Python, and more across cloud, data center, and edge environments. It supports dynamic batching, model ensembles, concurrent model execution on GPUs and CPUs, and real-time, streaming, and batch inference patterns. Includes Model Analyzer for profiling and Model Navigator for automated optimization.

open-sourceOpen Source
Logto logo

Logto

Open-source auth infrastructure for modern apps

Logto is an open-source authentication and authorization platform built on OIDC and OAuth 2.1, serving as an alternative to Auth0, Cognito, and Firebase Auth. It provides pre-built sign-in flows with customizable UI, social login, Google One Tap, MFA, enterprise SSO via SAML, and role-based access control. SDKs cover 30+ frameworks including React, Next.js, Vue, Flutter, Go, and Python, with multi-tenancy support for SaaS applications.

freemiumOpen Source
MinIO logo

MinIO

High-performance S3-compatible object storage

MinIO is a high-performance, S3-compatible object storage server designed for AI, machine learning, and data-intensive workloads. Written in Go, it delivers industry-leading throughput for both read and write operations while maintaining full compatibility with the Amazon S3 API. MinIO includes an embedded web console for bucket management, a command-line client, and supports erasure coding, bitrot protection, and encryption at rest for enterprise-grade data durability.

freemiumOpen Source
agentOS logo

agentOS

Lightweight OS for running AI agents in-process

agentOS is a portable open-source operating system for AI agents that delivers ~6ms cold starts at 32x lower cost than traditional sandboxes. Powered by WebAssembly and V8 isolates, it runs agents like Claude Code and Codex directly inside your process with granular permissions and host-managed tool access for S3, GitHub, and databases. Available as a simple npm package with no special infrastructure or vendor lock-in required.

open-sourceOpen Source
Armeria logo

Armeria

Versatile microservice framework for any protocol

Armeria is an open-source microservice framework from the creator of Netty at LINE Corporation that supports gRPC, Thrift, REST, and GraphQL on a single server and port. It provides built-in decorators for metrics, distributed tracing, load balancing, authentication, rate limiting, circuit breakers, and automatic retries. The framework integrates seamlessly with Spring Boot, Dropwizard, and Reactive Streams while serving automated API documentation with interactive request testing.

open-sourceOpen Source