aicoolies logo
Sourcebot logo

Sourcebot

Self-hosted code understanding for humans and agents

Share
open-sourceOpen Source
Visit Website →

Sourcebot is a self-hosted code intelligence platform that helps developers and AI agents understand large codebases through intelligent search, navigation, and inline-cited answers. Deployed as a Docker container with MCP server support, it indexes thousands of repositories without source code leaving your infrastructure.

Sourcebot addresses one of the biggest bottlenecks in agentic coding: understanding large, complex codebases. While generic RAG solutions provide surface-level context, Sourcebot delivers deep code comprehension through intelligent search that understands code structure, relationships between modules, and cross-file dependencies. It provides detailed answers grounded with inline citations pointing to exact file locations and line numbers, giving both human developers and AI agents verifiable, trustworthy code context.

The platform deploys as a self-hosted Docker container, which is critical for organizations that cannot send their source code to external services. Once deployed, Sourcebot indexes repositories from GitHub, GitLab, Bitbucket, and local sources, creating a searchable knowledge layer that both human developers and AI coding agents can query. The MCP server integration means agents running in Claude Desktop, Cursor, or other MCP-compatible environments can directly query Sourcebot for code context during their workflows.

With 3,200+ GitHub stars and growing adoption among teams managing large codebases, Sourcebot fills an important gap in the agentic development stack. It serves as the 'knowledge layer' that sits between your repositories and your AI agents, ensuring that autonomous coding operations are grounded in accurate, up-to-date understanding of the actual codebase rather than stale training data or incomplete file snippets.

Pricing

Free and open-source, self-hosted

Platforms

Docker, MCP Server, Web UI, GitHub/GitLab/Bitbucket

Categories

Tags

Use Cases

Alternatives

Serena

LSP-powered semantic coding agent via MCP

Serena is a free, open-source coding agent toolkit that provides IDE-like semantic code retrieval and editing capabilities to any LLM via the Model Context Protocol (MCP). Built on Language Server Protocol (LSP) integration, it enables symbol-level navigation, cross-file refactoring, and relational code understanding across Python, TypeScript, Go, Rust, Java, PHP, and more — without requiring file-level reads or text-based search.

open-sourceOpen Source
Docling logo

Docling

Get your documents ready for gen AI

Docling is an open-source document processing toolkit by IBM Research that converts complex documents into structured formats optimized for generative AI applications. It parses PDF, DOCX, PPTX, XLSX, HTML, images, audio, and LaTeX with advanced PDF understanding including layout analysis, reading order detection, and table structure recognition. Docling exports to Markdown, HTML, JSON, and DocTags, and integrates natively with LangChain, LlamaIndex, and other AI frameworks for RAG workflows.

open-sourceOpen Source

MarkItDown

Convert any file to Markdown for LLM pipelines

MarkItDown is a lightweight Python utility by Microsoft that converts files into clean Markdown optimized for LLM pipelines and text analysis. It supports PDF, Word, Excel, PowerPoint, HTML, images with OCR, audio with transcription, and text formats like CSV, JSON, and XML. The tool preserves document structure including headings, tables, lists, and links while keeping output token-efficient. It offers a CLI, a four-line Python API, Docker support, and a plugin architecture for extensions.

open-sourceOpen Source

Related Tools

Hermes Agent logo

Hermes Agent

Top Pick

Open-source AI agent framework with persistent memory, reusable skills, tools, and messaging gateways

Hermes Agent is an open-source AI agent framework with persistent memory, reusable skills, 40+ tools, cron jobs, and messaging gateways.

open-sourceOpen Source

Safari MCP Server

Apple's Safari-native MCP server for web debugging agents

Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.

freeTelemetry

Headroom

Context compression for LLM apps and coding agents

Headroom is an Apache-2.0 context compression layer for LLM apps and coding agents. It compresses tool output, logs, files, RAG chunks, and agent history through a local library, proxy, wrapper, or MCP server, with retrieval hooks for bringing originals back when needed. Treat its savings numbers as Headroom-reported benchmarks, not independent aicoolies measurements.

open-sourceOpen SourceTelemetry

Codebase Memory MCP

Codebase knowledge graph MCP server for AI coding agents

Codebase Memory MCP is an MIT-licensed MCP server that turns a repository into a persistent code knowledge graph for AI coding agents. It gives Claude Code, Cursor, Codex-style agents, and other MCP clients structural queries for functions, classes, call chains, routes, and architecture, helping them explore large projects without repeatedly rereading files or relying only on broad search.

open-sourceOpen SourceTelemetry

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

open-sourceOpen Source
BeeAI Framework logo

BeeAI Framework

Python and TypeScript framework for production multi-agent systems

BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.

open-sourceOpen SourceTelemetry