Database platforms, management tools, ORMs, and migration utilities
Cloud-native distributed vector search engine built for Kubernetes with automatic indexing and horizontal scaling.
Vald is a highly scalable distributed approximate nearest neighbor (ANN) vector search engine designed for cloud-native, Kubernetes-based architectures. Maintained by LY Corporation and listed in the CNCF Landscape, it uses the NGT algorithm (developed at Yahoo Japan), supports automatic incremental index backup, and targets billion-scale datasets by splitting work across loosely coupled microservice components that are deployed with Helm and scale horizontally on Kubernetes.
Library for efficient similarity search and clustering of dense vectors at billion-scale.
FAISS is Meta AI Research's open-source library for efficient similarity search and clustering of dense vectors. It implements approximate nearest-neighbor algorithms designed to scale to billions of vectors, with compressed index types that let large collections fit in RAM and GPU implementations for the largest workloads. Engineering teams use FAISS as the retrieval primitive underneath custom RAG pipelines, recommendation systems, and large-scale embedding search infrastructure.
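To make the exact-versus-approximate trade-off concrete, here is a minimal pure-Python sketch of exact brute-force k-nearest-neighbor search under squared L2 distance. This is the baseline that approximate indexes are measured against (FAISS itself offers exact search through flat indexes such as `IndexFlatL2`). No FAISS calls are used, and the `knn_l2` name is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def knn_l2(database: Sequence[Sequence[float]], query: Sequence[float],
           k: int) -> List[Tuple[float, int]]:
    """Exact k-NN: return (squared L2 distance, row index) pairs, nearest first."""

    def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Every query touches every stored vector: O(n * d) work, which is the cost
    # that approximate indexes (IVF, HNSW, product quantization) try to avoid.
    return heapq.nsmallest(k, ((sq_l2(v, query), i) for i, v in enumerate(database)))


db = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
print(knn_l2(db, [1.0, 1.0], k=2))  # [(1.0, 1), (2.0, 0)]
```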
Header-only C++ implementation of HNSW for fast approximate nearest-neighbor search.
hnswlib is a header-only C++ library implementing the Hierarchical Navigable Small World (HNSW) graph algorithm for approximate nearest-neighbor search, with Python bindings and a tiny dependency footprint. Originally developed by the nmslib team, it is embedded as the HNSW implementation in a number of vector databases and search products. Engineers use it directly when they want HNSW retrieval without pulling in a heavyweight vector database.
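At the heart of HNSW search is a greedy best-first walk over a proximity graph that keeps the `ef` closest nodes seen so far. The sketch below shows that routine for a single graph layer in plain Python. The adjacency-dict graph format and the `search_layer` name are assumptions for illustration; the hierarchy of layers and graph construction are omitted. In hnswlib, the query-time beam width is set with `Index.set_ef`.

```python
import heapq
from typing import Dict, List, Sequence


def search_layer(graph: Dict[int, List[int]], vectors: Sequence[Sequence[float]],
                 query: Sequence[float], entry: int, ef: int) -> List[int]:
    """Greedy beam search on one proximity-graph layer; returns node ids, nearest first."""

    def dist(node: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(vectors[node], query))

    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    best = [(-d0, entry)]        # max-heap via negation: worst of the ef best on top

    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # nearest remaining candidate is farther than our worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst result

    return [node for _, node in sorted((-nd, node) for nd, node in best)]


vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(search_layer(chain, vectors, [3.2], entry=0, ef=2))  # [3, 4]
```

Starting far away at node 0, the walk moves along the chain toward the query and ends holding the two nearest nodes.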
Embedding-first search and discovery engine for AI-powered product experiences.
Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.
High-recall Postgres vector search at billion scale
VectorChord is a Postgres extension from TensorChord that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization and aims for performance comparable to dedicated services such as Pinecone at billion-vector scale, while keeping all data inside a single Postgres database: no separate vector store, no two-system sync, and no rewrites when the workload grows.
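An IVF index partitions vectors by their nearest centroid and, at query time, scans only the `nprobe` lists whose centroids are closest to the query. The minimal pure-Python sketch below assumes precomputed centroids (a real index would train them, typically with k-means) and leaves out RaBitQ quantization entirely. `build_ivf` and `ivf_search` are hypothetical names, not VectorChord's SQL interface.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors: Sequence[Vector], centroids: Sequence[Vector]) -> Dict[int, List[int]]:
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(vectors: Sequence[Vector], centroids: Sequence[Vector],
               lists: Dict[int, List[int]], query: Vector, k: int, nprobe: int) -> List[int]:
    """Scan only the nprobe lists closest to the query; larger nprobe raises recall and cost."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_l2(vectors[i], query))[:k]


vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.5], [10.0, 10.5]]
lists = build_ivf(vectors, centroids)  # {0: [0, 1], 1: [2, 3]}
print(ivf_search(vectors, centroids, lists, [9.0, 9.0], k=1, nprobe=1))  # [2]
```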
AI-native database for hybrid RAG retrieval
Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.
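Hybrid search needs some way to merge the lexical and semantic result lists before reranking. Reciprocal rank fusion (RRF) is one common choice: each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below implements RRF in plain Python and is not Infinity's API; the constant `k = 60` is the value conventionally used for RRF and is assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked id lists; ids ranked high in many lists float to the top."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]   # e.g. full-text (BM25-style) order
semantic = ["b", "c", "d"]  # e.g. dense-vector similarity order
print(reciprocal_rank_fusion([lexical, semantic]))  # ['b', 'c', 'a', 'd']
```

Because RRF uses only ranks, it needs no calibration between lexical scores and vector distances, which live on different scales.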
Constrained generation that guarantees valid LLM outputs every time
Guidance is Microsoft's structured generation library that enforces output constraints directly within LLM decoding. It supports JSON schemas, regex patterns, grammars, and interleaved generation-and-control flow to guarantee valid outputs from compatible models. It works with local models via llama.cpp and Transformers, as well as remote APIs including OpenAI and Anthropic; the strongest guarantees apply where Guidance controls token-level decoding, as with local backends. For structured data extraction, this removes the need for retry loops and post-processing.
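Constrained decoding works by masking, at every step, any token that would make the output impossible to complete validly. This toy sketch applies that idea to a finite set of allowed answers with greedy decoding. The `toy_scores` callable stands in for a model's next-token scores, and both functions are hypothetical; Guidance itself works from regexes, grammars, and JSON schemas rather than an explicit list of strings.

```python
from typing import Callable, Dict, Sequence


def constrained_greedy_decode(score: Callable[[str], Dict[str, float]],
                              allowed: Sequence[str], max_steps: int = 20) -> str:
    """Greedy decoding that only keeps tokens leading toward one of `allowed`.

    Stops at the first output that exactly matches an allowed string.
    """
    out = ""
    for _ in range(max_steps):
        if out in allowed:
            return out
        scores = score(out)
        # Mask: drop tokens after which `out` is no longer a prefix of any allowed string.
        legal = {tok: s for tok, s in scores.items()
                 if any(a.startswith(out + tok) for a in allowed)}
        if not legal:
            raise ValueError("no token keeps the output valid")
        out += max(legal, key=legal.get)
    raise ValueError("max_steps reached before a complete match")


def toy_scores(prefix: str) -> Dict[str, float]:
    # Hypothetical model that prefers "may" regardless of context.
    return {"ye": 0.1, "s": 0.2, "no": 0.3, "may": 0.9, "be": 0.5}


print(constrained_greedy_decode(toy_scores, ["yes", "no"]))  # no
```

Unconstrained greedy decoding would emit `may` first; the mask limits the first step to `ye` or `no`, and the model's higher score for `no` decides the answer.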
Vector search extension for SQLite that runs anywhere
sqlite-vec is a lightweight vector search extension for SQLite written in pure C with zero dependencies. It brings nearest-neighbor search capabilities directly into SQLite databases, enabling AI applications to store and query embeddings without running a separate vector database. The extension works everywhere SQLite runs including Linux, macOS, Windows, WebAssembly in browsers, and even Raspberry Pi devices. Sponsored by Mozilla Builders, Fly.io, and Turso.
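To show what nearest-neighbor search inside SQLite amounts to, the sketch below uses only Python's built-in `sqlite3` module. It registers a Python L2-distance function and stores embeddings as JSON text. This is a stand-in, not sqlite-vec: the extension provides native C implementations (including its `vec0` virtual table) that avoid per-row Python callbacks, and the `docs` table layout here is hypothetical.

```python
import json
import math
import sqlite3


def l2_distance(a_json: str, b_json: str) -> float:
    """Euclidean distance between two embeddings stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("l2_distance", 2, l2_distance)  # Python callback, not native code
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO docs (title, embedding) VALUES (?, ?)",
    [("cats", json.dumps([0.9, 0.1])),
     ("dogs", json.dumps([0.8, 0.3])),
     ("cars", json.dumps([0.0, 1.0]))],
)

# Exact k-NN as plain SQL: compute the distance per row, sort, keep the top k.
query = json.dumps([1.0, 0.0])
rows = conn.execute(
    "SELECT title, l2_distance(embedding, ?) AS dist FROM docs ORDER BY dist LIMIT 2",
    (query,),
).fetchall()
print([title for title, _ in rows])  # ['cats', 'dogs']
```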
Enterprise RAG framework by Tencent
WeKnora is a Tencent-developed LLM-powered knowledge management and Q&A framework for enterprise document understanding and semantic retrieval. It supports 10+ document formats, including PDF, Word, Excel, and images, and integrates with IM platforms such as WeCom, Feishu, Slack, and Telegram. It offers a Quick Q&A mode built on RAG pipelines and an Intelligent Reasoning mode that uses ReAct agents for complex multi-step reasoning across organizational knowledge bases.
Declarative multimodal AI data infrastructure
Pixeltable is a declarative data infrastructure for multimodal AI that stores video, audio, images, and documents as first-class column types. Define Python computed columns for inference and transformations, and Pixeltable auto-orchestrates execution with incremental updates. Built-in vector search eliminates the need for separate vector databases while supporting RAG and semantic search workflows.
Fast embeddable vector search engine
USearch is a high-performance vector search engine implementing the HNSW algorithm for approximate nearest neighbor queries, with bindings for C++, Python, JavaScript, Rust, Java, Go, and more. It supports user-defined distance metrics, memory-mapped persistence for datasets larger than RAM, and filtered search with predicates. YugabyteDB and ScyllaDB use it as their production vector indexing backend.
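A filtered query with a user-defined metric asks for "the k nearest vectors that also satisfy this predicate, under this distance." The exact brute-force sketch below defines that result in plain Python. It is not USearch's API, and an HNSW engine avoids scanning every vector the way this sketch does. The cosine metric and the `filtered_search` name are assumptions for illustration.

```python
import heapq
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def filtered_search(vectors: Sequence[Vector], query: Vector, k: int,
                    metric: Callable[[Vector, Vector], float],
                    predicate: Callable[[int], bool]) -> List[int]:
    """Exact top-k keys under `metric`, considering only keys that pass `predicate`."""
    scored = ((metric(v, query), key) for key, v in enumerate(vectors) if predicate(key))
    return [key for _, key in heapq.nsmallest(k, scored)]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
print(filtered_search(vectors, [1.0, 0.0], 1, cosine_distance, lambda key: True))          # [0]
print(filtered_search(vectors, [1.0, 0.0], 1, cosine_distance, lambda key: key % 2 == 1))  # [1]
```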
Real-time analytics OLAP database
ClickHouse is an open-source column-oriented database built for real-time analytical queries on massive datasets. Queries read only the columns they reference, and its columnar storage with advanced compression, combined with vectorized query execution using SIMD instructions, makes aggregations and scans especially fast. It can process billions of rows per second, supports SQL with analytical extensions, and scales horizontally for petabyte-scale data warehousing and real-time dashboards.
Open-source MCP server for database access
MCP Toolbox for Databases is an open-source MCP server by Google that connects AI agents to databases through a managed control plane. It handles connection pooling, authentication, and tool distribution, letting developers integrate database tools in under 10 lines of code. Supports PostgreSQL, MySQL, BigQuery, AlloyDB, Snowflake, MongoDB, Redis, ClickHouse, Neo4j, and more with ready-to-use toolsets for Claude Code, Gemini CLI, and other MCP clients.
CI-friendly database documentation generator
tbls is an open-source database documentation tool that automatically generates schema documentation in Markdown, with built-in linting to enforce documentation standards and coverage metrics for tables and columns. It supports 13+ databases including PostgreSQL, MySQL, BigQuery, Snowflake, MongoDB, and ClickHouse. Designed for CI integration with GitHub Actions support, tbls runs schema diff detection and documentation enforcement as part of automated pipelines.
In-process vector database — the SQLite of vector DBs
Zvec is an open-source in-process vector database from Alibaba, positioned as the SQLite of vector search. It runs as an embedded library directly inside applications without requiring external servers, and its authors report 8,000+ QPS with high recall. Zvec supports dense and sparse embeddings, multi-vector queries, and combined semantic plus structured filtering. Built on Alibaba's proven Proxima engine, it provides a lightweight alternative to server-based vector databases for local AI workflows.
In-process analytical SQL database
DuckDB is a high-performance analytical database that runs as an in-process SQL OLAP engine. Unlike traditional client-server databases, DuckDB embeds directly within your application, similar to SQLite but optimized for analytical queries. It supports complex SQL including window functions, CTEs, and nested types while processing columnar data with vectorized execution. DuckDB reads Parquet, CSV, JSON, and Arrow formats natively and integrates with Python and R data science workflows.
Lightweight open-source browser-based database GUI
Outerbase Studio is a lightweight, open-source database GUI that runs directly in your browser. It supports PostgreSQL, MySQL, and SQLite with a modern interface featuring an intelligent query editor with auto-completion, multi-query execution, advanced data editing with staging and preview, and high-performance table rendering for thousands of rows. It is also available as an Electron desktop client for databases that require specialized drivers.
Database CI/CD and DevSecOps platform at scale
Bytebase is an open-source database DevSecOps platform that automates schema migrations, enforces SQL standards with 200+ lint rules, and provides fine-grained access control with dynamic data masking. Teams use it for GitOps-based database change management, SQL review, and compliance across PostgreSQL, MySQL, MongoDB, Snowflake, Oracle, SQL Server, and 20+ other databases. It is available as a self-hosted Docker or Kubernetes deployment, or as a managed cloud service.
Instant database schema diagrams from a single query
ChartDB is an open-source database diagramming tool that turns the output of a single schema query into an instant visual entity-relationship diagram. It supports PostgreSQL, MySQL, SQL Server, MariaDB, SQLite, CockroachDB, and ClickHouse, enabling database engineers to visualize schemas, plan migrations, and generate DDL scripts across different SQL dialects using AI assistance. It is available as a free cloud app at chartdb.io or self-hosted via Docker.
Open-source Airtable alternative with database power
NocoDB is a free, self-hostable open-source platform that turns any database into a smart spreadsheet interface. It offers grid, gallery, form, Kanban, and calendar views with support for rich field types including links, lookups, rollups, and formulas. NocoDB provides role-based access control, REST APIs, workflow automation, and integrations with services like Slack and Discord — making it a powerful Airtable alternative for teams who want full data ownership.
Instant GraphQL and REST APIs on any database
Hasura auto-generates real-time GraphQL and REST APIs directly from your database schema—PostgreSQL, MySQL, SQL Server, MongoDB, and more. It provides fine-grained row-level and column-level access control, event triggers on database changes, remote schema stitching, and real-time subscriptions out of the box. Available as a managed cloud service or self-hosted, Hasura eliminates weeks of boilerplate API development while maintaining full control over authorization logic.
Context retrieval layer for AI agents and RAG
Airweave is an open-source context retrieval platform that connects AI agents and RAG systems to 50+ apps and databases through a unified search interface. It continuously syncs data from sources like Notion, Slack, GitHub, and databases, making it searchable through LLM-friendly APIs. Airweave includes Python and TypeScript SDKs, MCP support, and a CLI for managing data connections.
Google Zanzibar-inspired authorization database
SpiceDB is an open-source authorization database inspired by Google's Zanzibar system, providing relationship-based access control (ReBAC) at scale. It defines permissions through a schema language that models relationships between users, resources, and roles, then evaluates authorization checks in single-digit milliseconds. Used by companies like Netflix and GitHub, SpiceDB handles millions of permission checks per second.
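Zanzibar-style checks work by following relationship tuples: a subject can be a user directly or a set such as "every member of group:eng." The toy evaluator below sketches that traversal in plain Python. It does not use SpiceDB's schema language or API, and the tuple encoding (with `resource#relation` marking a subject set) is an assumption modeled on Zanzibar notation.

```python
from typing import Optional, Set, Tuple

# (resource, relation, subject); a subject like "group:eng#member" is a subject set.
Relationship = Tuple[str, str, str]


def check(rels: Set[Relationship], resource: str, relation: str, user: str,
          seen: Optional[Set[Tuple[str, str]]] = None) -> bool:
    """True if `user` holds `relation` on `resource`, directly or via subject sets."""
    seen = set() if seen is None else seen
    if (resource, relation) in seen:
        return False  # already explored: avoids infinite loops on cyclic data
    seen.add((resource, relation))
    for res, rel, subject in rels:
        if (res, rel) != (resource, relation):
            continue
        if subject == user:
            return True
        if "#" in subject:  # e.g. "group:eng#member": recurse into that set
            sub_resource, sub_relation = subject.split("#", 1)
            if check(rels, sub_resource, sub_relation, user, seen):
                return True
    return False


rels = {
    ("doc:readme", "viewer", "group:eng#member"),
    ("group:eng", "member", "user:alice"),
    ("doc:readme", "viewer", "user:bob"),
}
print(check(rels, "doc:readme", "viewer", "user:alice"))  # True
print(check(rels, "doc:readme", "viewer", "user:carol"))  # False
```

Alice is never granted access to the document directly; the check succeeds because she is a member of a group that is.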
Full-text and vector search engine in under 2KB
Orama is a complete search engine and RAG pipeline that runs in browsers, servers, and edge environments in under 2KB. It provides full-text search, vector search, and hybrid search with built-in faceting, filters, geo-search, and typo tolerance. Orama requires no external dependencies and works entirely client-side for instant search experiences, or server-side with Node.js and Deno for larger datasets.
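Typo tolerance is commonly implemented by accepting index terms within a small edit distance of the query term. The sketch below illustrates that idea with Levenshtein distance in plain Python. It is not Orama's code, and `fuzzy_match` with its `tolerance` parameter is a hypothetical name.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_match(term: str, vocabulary: Iterable[str], tolerance: int = 1) -> List[str]:
    """Return vocabulary words within `tolerance` edits of `term`."""
    return [word for word in vocabulary if levenshtein(term, word) <= tolerance]


print(levenshtein("kitten", "sitting"))                     # 3
print(fuzzy_match("serch", ["search", "church", "serve"]))  # ['search']
```

Real engines usually avoid computing the distance against every indexed term, for example by walking a prefix tree while bounding the edit budget, but the matching rule is the same.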