Vector Databases

Purpose-built vector databases and extensions for similarity search, embeddings storage, and AI/ML retrieval.

Deep Lake

AI data runtime for multimodal datasets and vector search

Deep Lake is an open-source AI data runtime from Activeloop for storing, versioning, and querying multimodal data and embeddings. It fits teams building RAG, training, evaluation, or dataset-heavy agent workflows that need a bridge between vector search, structured metadata, and large image, text, audio, or video collections.

open-source

SeekDB

AI-native state store with hybrid vector and full-text search

SeekDB is an open-source AI-native state store from the OceanBase ecosystem that combines MySQL-compatible data access with hybrid vector and full-text retrieval. It targets agent and AI application teams that need embedded or server deployment, copy-on-write style sandboxes, and searchable state without gluing together several separate storage layers.

open-source

pgvectorscale

DiskANN-powered vector search extension for PostgreSQL

pgvectorscale is an open-source PostgreSQL extension from Timescale that complements pgvector with DiskANN-based approximate vector search. It is useful for teams that want faster embedding retrieval while keeping vectors, filters, and application data inside the Postgres ecosystem instead of adopting a separate hosted vector database.

open-source

Vald

Cloud-native distributed vector search engine built for Kubernetes with automatic indexing and horizontal scaling.

Vald is a highly scalable, distributed approximate nearest neighbor (ANN) vector search engine designed for cloud-native, Kubernetes-based architectures. Maintained by LY Corporation and listed in the CNCF Landscape, it uses the NGT algorithm (developed at Yahoo Japan), supports automatic incremental index backup, and handles billion-scale datasets across loosely coupled microservice components that are deployed with Helm and scale horizontally on Kubernetes.

open-source

FAISS

Library for efficient similarity search and clustering of dense vectors at billion-scale.

FAISS is Meta AI Research's open-source library for efficient similarity search and clustering of dense vectors. It implements approximate nearest-neighbor algorithms designed to scale to billions of vectors, with optimized indexes that fit in RAM and GPU acceleration for the largest workloads. Engineering teams use FAISS as the retrieval primitive underneath custom RAG pipelines, recommendation systems, and large-scale embedding search infrastructure.

free
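Approximate indexes are usually judged against exact search. The sketch below, written in plain Python rather than with FAISS itself, computes exact k-nearest neighbors by scanning every vector and ranking by squared L2 distance, which is the baseline a flat FAISS index such as `IndexFlatL2` provides. The function names and sample data are illustrative assumptions.

```python
import heapq
from typing import List, Sequence, Tuple


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks neighbors the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(database: Sequence[Sequence[float]],
              query: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k (distance, id) pairs closest to `query`, nearest first.

    Every stored vector is scored, so the result is exact but costs
    O(n * d) per query, which is the cost approximate indexes exist to avoid.
    """
    scored = ((l2_squared(vec, query), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    print([idx for _, idx in exact_knn(db, [0.9, 0.1], k=2)])  # prints [1, 0]
```

Recall for an approximate index is typically measured as the fraction of these exact neighbors it returns.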

hnswlib

Header-only C++ implementation of HNSW for fast approximate nearest-neighbor search.

hnswlib is a header-only C++ library implementing the Hierarchical Navigable Small World (HNSW) graph algorithm for approximate nearest-neighbor search, with Python bindings and a tiny dependency footprint. Originally developed by the nmslib team, it is embedded in several vector databases and search products (Chroma, for example, builds on it). Engineers use it directly when they want HNSW retrieval without pulling in a heavyweight vector DB.

free
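HNSW answers queries by best-first traversal of a proximity graph. The following is a simplified, single-layer sketch of that search routine in plain Python: HNSW repeats it on each layer of a hierarchy, and hnswlib also builds the graph, which is omitted here. The `ef` argument plays the role of the candidate-list size that hnswlib exposes through `Index.set_ef`; the hand-built graph and helper names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_search(graph: Dict[int, List[int]], vectors: Sequence[Sequence[float]],
                query: Sequence[float], entry: int, ef: int, k: int) -> List[int]:
    """Best-first search over one proximity-graph layer, keeping `ef` results."""
    d0 = sq_dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # nearest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return [node for _, node in sorted((-nd, n) for nd, n in results)][:k]


if __name__ == "__main__":
    # Hypothetical 1-D data: points 0..9, each linked to its immediate neighbors.
    vectors = [[float(i)] for i in range(10)]
    graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    print(beam_search(graph, vectors, [7.2], entry=0, ef=3, k=2))  # prints [7, 8]
```

Raising `ef` widens the beam and usually improves recall, at the cost of more distance computations per query.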

Marqo

Embedding-first search and discovery engine for AI-powered product experiences.

Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.

freemium

VectorChord

High-recall Postgres vector search at billion scale

VectorChord is a Postgres extension from TensorChord (the tensorchord/VectorChord project) that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization to target what the project describes as Pinecone-class performance at billion-vector scale, while keeping all data inside a single Postgres database: no separate vector store, no two-system sync, and no rewrites when the workload grows.

open-source
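An IVF index partitions vectors by their nearest centroid and, at query time, scans only the few partitions closest to the query. The sketch below shows that probing step in plain Python under simplifying assumptions: the centroids are given (real systems train them, typically with k-means), and full-precision distances stand in for the RaBitQ-compressed codes VectorChord uses. It is not VectorChord's implementation, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vec: Sequence[float], centroids: Sequence[Sequence[float]],
                      n: int) -> List[int]:
    """Indices of the n centroids closest to `vec`."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
    return order[:n]


def build_ivf(vectors, centroids) -> Dict[int, List[int]]:
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = defaultdict(list)
    for vid, vec in enumerate(vectors):
        lists[nearest_centroids(vec, centroids, 1)[0]].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe: int, k: int) -> List[int]:
    """Rank only the vectors stored in the `nprobe` closest inverted lists."""
    probed = nearest_centroids(query, centroids, nprobe)
    candidates = [vid for c in probed for vid in lists.get(c, [])]
    candidates.sort(key=lambda vid: sq_dist(vectors[vid], query))
    return candidates[:k]


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed output of k-means training
    vectors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [10.0, 9.0], [4.0, 5.0]]
    lists = build_ivf(vectors, centroids)
    print(ivf_search([9.5, 9.0], vectors, centroids, lists, nprobe=1, k=3))  # [3, 2]
    print(ivf_search([9.5, 9.0], vectors, centroids, lists, nprobe=2, k=3))  # [3, 2, 4]
```

With `nprobe=1` the vector `[4.0, 5.0]` is never scored because it lives in the other partition; raising `nprobe` trades speed for recall.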

Infinity

AI-native database for hybrid RAG retrieval

Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.

open-source
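Hybrid retrieval needs a way to merge the ranked lists produced by lexical and vector search. Reciprocal rank fusion (RRF) is one common method: each document's score is the sum of 1/(k + rank) over the lists it appears in. This plain-Python sketch is a generic illustration, not Infinity's API; k = 60 is the conventional default from the RRF literature, and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists; score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    lexical = ["a", "b", "c"]   # e.g. BM25 order (hypothetical)
    semantic = ["c", "a", "d"]  # e.g. dense-vector order (hypothetical)
    print(reciprocal_rank_fusion([lexical, semantic]))  # prints ['a', 'c', 'b', 'd']
```

Because RRF uses only ranks, BM25 scores and vector similarities never have to be put on a common scale.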

sqlite-vec

Vector search extension for SQLite that runs anywhere

sqlite-vec is a lightweight vector search extension for SQLite, written in pure C with zero dependencies. It brings nearest-neighbor search directly into SQLite databases, so AI applications can store and query embeddings without running a separate vector database. The extension works wherever SQLite runs, including Linux, macOS, Windows, WebAssembly in browsers, and Raspberry Pi devices. The project is sponsored by Mozilla Builders, Fly.io, and Turso.

free, open-source
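sqlite-vec performs nearest-neighbor search inside SQLite's own query engine. The sketch below imitates that brute-force pattern without loading the extension: it stores float32 vectors as BLOBs and registers a Python distance function (the hypothetical `py_l2`) through the standard-library `sqlite3` module, so a plain `ORDER BY ... LIMIT` returns the nearest rows. The real extension implements its distance functions and storage in C behind its own SQL interface, which this code does not reproduce.

```python
import sqlite3
import struct
from typing import List, Sequence


def to_blob(vec: Sequence[float]) -> bytes:
    """Pack a vector as little-endian float32 bytes for a BLOB column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def py_l2(a: bytes, b: bytes) -> float:
    """L2 distance between two float32 BLOBs (registered as a SQL function)."""
    n = len(a) // 4
    va, vb = struct.unpack(f"<{n}f", a), struct.unpack(f"<{n}f", b)
    return sum((x - y) ** 2 for x, y in zip(va, vb)) ** 0.5


def make_db(vectors: Sequence[Sequence[float]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("py_l2", 2, py_l2)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, embedding BLOB)")
    conn.executemany("INSERT INTO docs (id, embedding) VALUES (?, ?)",
                     [(i, to_blob(v)) for i, v in enumerate(vectors)])
    return conn


def knn(conn: sqlite3.Connection, query: Sequence[float], k: int) -> List[int]:
    """Brute-force k-NN: SQLite scores every row, then sorts and limits."""
    rows = conn.execute("SELECT id FROM docs ORDER BY py_l2(embedding, ?) LIMIT ?",
                        (to_blob(query), k))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
    print(knn(conn, [0.9, 0.1], k=2))  # prints [1, 0]
```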

Pixeltable

Declarative multimodal AI data infrastructure

Pixeltable is a declarative data infrastructure for multimodal AI that stores video, audio, images, and documents as first-class column types. Define Python computed columns for inference and transformations, and Pixeltable auto-orchestrates execution with incremental updates. Built-in vector search eliminates the need for separate vector databases while supporting RAG and semantic search workflows.

open-source

USearch

Fast embeddable vector search engine

USearch is a high-performance vector search engine that implements the HNSW algorithm for approximate nearest neighbor queries, with bindings for C++, Python, JavaScript, Rust, Java, Go, and more. It supports user-defined distance metrics, memory-mapped persistence for datasets larger than RAM, and filtered search with predicates. YugabyteDB and ScyllaDB use it as their production vector indexing backend.

open-source
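Filtered search with a user-defined metric means ranking by a caller-supplied distance function while skipping keys that fail a predicate. This brute-force Python sketch shows only those semantics; an index like USearch applies the predicate during HNSW traversal instead of scanning everything, and the function names here are hypothetical rather than USearch's API.

```python
import heapq
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def manhattan(a: Vector, b: Vector) -> float:
    """An example user-defined metric: L1 (Manhattan) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def filtered_search(vectors: Sequence[Vector], query: Vector, k: int,
                    metric: Metric, predicate: Callable[[int], bool]) -> List[int]:
    """Return the k keys nearest to `query` under `metric`, among keys passing `predicate`."""
    scored = ((metric(vec, query), key)
              for key, vec in enumerate(vectors) if predicate(key))
    return [key for _, key in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    query = [0.9, 0.1]
    print(filtered_search(vectors, query, 2, manhattan, lambda key: True))          # [1, 0]
    print(filtered_search(vectors, query, 2, manhattan, lambda key: key % 2 == 0))  # [0, 2]
```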

Zvec

In-process vector database — the SQLite of vector DBs

Zvec is an open-source in-process vector database from Alibaba, designed to be the SQLite of vector search. It runs as an embedded library directly inside applications without requiring external servers, and the project reports 8,000+ QPS at high recall. Zvec supports dense and sparse embeddings, multi-vector queries, and combined semantic plus structured filtering. Built on Alibaba's Proxima engine, it provides a lightweight alternative to server-based vector databases for local AI workflows.

open-source

SurrealDB

Multi-model database for the AI era — document, graph, vector, and relational in one

SurrealDB is a multi-model database that natively combines document, graph, relational, key-value, and vector storage in a single engine. It eliminates the need for separate databases by handling structured queries, graph traversals, full-text search, and vector similarity in one SQL-like query language called SurrealQL. Built in Rust for performance and safety, it supports real-time subscriptions, row-level permissions, and embedded or distributed deployment modes.

open-source

Ragie

Fully managed RAG-as-a-Service platform for enterprise AI applications

Ragie is a managed retrieval-augmented generation platform that handles document ingestion, indexing, and retrieval so developers can build grounded AI applications without managing vector databases or chunking pipelines. It connects to Google Drive, Notion, Slack, Confluence, and other enterprise data sources with simple APIs for hybrid search and entity extraction.

api-usage-based

turbopuffer

Serverless vector and full-text search on object storage

turbopuffer is a serverless vector and full-text search engine built on object storage, which the vendor positions as roughly 10x cheaper than traditional vector databases. The vendor lists Anthropic, Cursor, Notion, and Atlassian among its production users, and the official site reports 4T+ documents, 10M+ writes/s, and 25k+ queries/s in production systems. The company is funded by Thrive Capital.

paid

LanceDB

Embedded vector database for multimodal AI at petabyte scale

LanceDB is an open-source embedded vector database built on the Lance columnar format for multimodal AI. Its zero-copy architecture delivers near in-memory performance from disk, with support for vector search, full-text search, and SQL. Native SDKs for Python, TypeScript, and Rust integrate with LangChain, LlamaIndex, and DuckDB. The company is backed by a $30M Series A and lists Harvey AI and Runway as users, and the project has 18,000+ GitHub stars.

open-source

Vespa

Hybrid search and ML ranking engine at scale

Vespa is an open-source serving engine (6K+ GitHub stars) for hybrid search that combines vector similarity, BM25 text ranking, and structured filtering in a single query. Originally built at Yahoo for web-scale serving, it handles billions of documents with millisecond latency. It features real-time indexing, ML model serving, and tensor computation, and supports custom ranking models, query federation, and geographic search. Common uses include recommendation systems, personalization, and RAG.

open-source

Milvus

GPU-accelerated open-source vector database

Milvus is an open-source vector database with 33K+ GitHub stars for billion-scale similarity search. Features GPU-accelerated indexing, hybrid search combining vector and scalar filtering, multi-tenancy, partitioning, and horizontal scaling. Supports HNSW, IVF, DiskANN, and GPU index types. SDKs for Python, Java, Go, and Node.js. Zilliz Cloud offers a managed version. A production-grade foundation for RAG pipelines and recommendation systems at enterprise scale.

open-source

pgvector

Vector similarity search for PostgreSQL

pgvector is an open-source PostgreSQL extension with 14K+ GitHub stars adding vector similarity search to your existing Postgres database. Store embeddings alongside relational data, perform exact and approximate nearest neighbor search using L2, inner product, cosine, and L1 metrics. Supports HNSW and IVFFlat indexes for fast similarity queries at scale. Eliminates the need for a separate vector database by bringing vector capabilities into existing PostgreSQL infrastructure.

open-source
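pgvector exposes its metrics as SQL operators: `<->` (L2 distance), `<#>` (negative inner product), `<=>` (cosine distance), and `<+>` (L1 distance). The plain-Python functions below mirror what each operator returns, as an illustration rather than pgvector code; `order_by` is a hypothetical helper standing in for `ORDER BY embedding <op> query`.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Mirrors pgvector's <-> operator (Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neg_inner_product(a: Vector, b: Vector) -> float:
    """Mirrors <#>: the inner product negated, so smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """Mirrors <=>: 1 minus cosine similarity (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def l1(a: Vector, b: Vector) -> float:
    """Mirrors <+> (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def order_by(items: Dict[str, Vector], query: Vector,
             op: Callable[[Vector, Vector], float]) -> List[str]:
    """Like ORDER BY embedding <op> query: ascending, nearest first."""
    return sorted(items, key=lambda name: op(items[name], query))


if __name__ == "__main__":
    items = {"x": [1.0, 0.0], "y": [0.6, 0.8]}
    print(order_by(items, [1.0, 0.0], neg_inner_product))  # prints ['x', 'y']
```

Negating the inner product is what lets every operator sort ascending, so a nearest-neighbor query always reads `ORDER BY ... LIMIT k` regardless of metric.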

Chroma

Open-source embedding database — the AI-native way to store and query embeddings.

Chroma is an open-source embedding database designed for simplicity and developer experience. Runs in-memory, as a Python library, or as a client-server deployment. Popular for prototyping RAG applications, local development, and lightweight vector search. Integrates natively with LangChain, LlamaIndex, and OpenAI.

open-source

Pinecone

Fully managed vector database built for AI applications at production scale.

Pinecone is a leading managed vector database designed for high-performance similarity search at scale. Purpose-built for AI applications including RAG, recommendation systems, and semantic search. Offers managed serverless infrastructure with automatic scaling, filtering, hybrid retrieval, and namespacing. No infrastructure management required.

freemium

Qdrant

High-performance vector database written in Rust for similarity search at scale.

Qdrant is a high-performance vector similarity search engine and database written in Rust. Designed for production-grade AI applications with advanced filtering, payload indexing, and distributed deployment. Supports billion-scale vector collections with sub-second query times. Popular choice for RAG, recommendation systems, and anomaly detection.

freemium, open-source

Weaviate

Open-source vector database for AI-native applications and semantic search.

Weaviate is an open-source vector database purpose-built for AI applications. Supports vector, keyword, and hybrid search with built-in vectorization modules for OpenAI, Cohere, Hugging Face, and more. Used for RAG pipelines, semantic search, recommendation engines, and multimodal search. Written in Go for high performance.

freemium, open-source
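Weaviate's hybrid search blends keyword and vector results through an `alpha` parameter (1 means pure vector search, 0 pure keyword search) and offers both ranked and relative-score fusion. The sketch below follows the relative-score idea: min-max normalize each result set's scores to [0, 1], then take an alpha-weighted sum. It is an illustration in plain Python under stated assumptions, not Weaviate's implementation, and the scores are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores so the best maps to 1.0 and the worst to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(vector_scores: Dict[str, float],
                          keyword_scores: Dict[str, float],
                          alpha: float) -> List[str]:
    """alpha=1.0 is pure vector search; alpha=0.0 is pure keyword search."""
    v, kw = min_max(vector_scores), min_max(keyword_scores)
    # Assumption: a document missing from one result set contributes 0 from it.
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
             for doc in set(v) | set(kw)}
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    vector_scores = {"a": 0.9, "b": 0.8, "c": 0.1}    # similarities (hypothetical)
    keyword_scores = {"b": 12.0, "c": 3.0, "d": 9.0}  # BM25-style scores (hypothetical)
    print(relative_score_fusion(vector_scores, keyword_scores, alpha=0.5))  # ['b', 'a', 'd', 'c']
```

Unlike rank-based fusion, this keeps score magnitudes, so a document that is far ahead in one result set is rewarded more than one that barely leads.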