Airbyte

Open-source ELT platform with 350+ data connectors

Freemium, Open Source

Airbyte is an open-source ELT platform with 350+ pre-built connectors for syncing data from sources such as databases and SaaS APIs to warehouses, lakes, and AI pipelines. It handles incremental syncs, schema evolution, and change data capture, and includes a connector builder for custom integrations. It is used by DoorDash, Replit, and thousands of data teams, and the project has over 15,000 GitHub stars and more than $150M in funding.

Airbyte has established itself as the open-source standard for data movement, providing the connective tissue between operational databases, SaaS applications, and the analytical systems where data creates value. Its catalog of over 350 connectors covers popular sources like PostgreSQL, MySQL, Salesforce, Stripe, HubSpot, Google Analytics, and Slack, along with destinations including Snowflake, BigQuery, Databricks, Redshift, and Pinecone. Each connector handles the complexities of API pagination, rate limiting, authentication, and schema mapping, so data engineers can configure reliable pipelines through a visual UI or Terraform provider without writing custom extraction code.
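To make the pagination and rate-limiting work concrete, here is a minimal Python sketch of what a hand-written extractor would otherwise need to do. It is not Airbyte code: `read_all`, `RateLimited`, and the `fetch_page` callback are hypothetical names chosen for illustration, and the backoff values are arbitrary.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical exception a client would raise on an HTTP 429-style response.
class RateLimited(Exception):
    def __init__(self, retry_after: float = 0.0):
        super().__init__("rate limited")
        self.retry_after = retry_after

# A page is (records, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]

def read_all(
    fetch_page: Callable[[Optional[str]], Page],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Follow cursors until the API reports no next page, backing off on rate limits."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                records, next_cursor = fetch_page(cursor)
                break
            except RateLimited as exc:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                # Wait for the server's hint or an exponential backoff, whichever is longer.
                sleep(max(exc.retry_after, 0.1 * 2 ** attempt))
        yield from records
        if next_cursor is None:
            return
        cursor = next_cursor
```

A pre-built connector packages this kind of loop, along with authentication and schema mapping, so that it is configured rather than rewritten for each source.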

The platform supports multiple sync modes, including full refresh, incremental append, and incremental append with deduplication, plus change data capture (CDC) for database sources. Schema evolution is handled automatically: when source schemas change, Airbyte propagates the changes downstream without manual intervention. For teams with unique data sources, the Connector Builder provides a low-code interface for creating custom connectors using a YAML-based specification, and the CDK (Connector Development Kit) supports Python and Java for full programmatic control. All connectors run in isolated Docker containers, which keeps each connector's credential handling and data processing separate from the others.
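The difference between the incremental modes can be sketched in a few lines of Python. This is a conceptual illustration only, not Airbyte's implementation: the function names, the in-memory `destination` dict, and the `updated_at` cursor field are all assumptions made for the example.

```python
from typing import Any, Optional

Record = dict[str, Any]

def incremental_read(
    source: list[Record], cursor_field: str, state: Optional[Any]
) -> tuple[list[Record], Optional[Any]]:
    """Return only records newer than the saved cursor state, plus the new state."""
    batch = [r for r in source if state is None or r[cursor_field] > state]
    batch.sort(key=lambda r: r[cursor_field])
    new_state = batch[-1][cursor_field] if batch else state
    return batch, new_state

def append_dedup(
    destination: dict[Any, Record], batch: list[Record],
    primary_key: str, cursor_field: str,
) -> None:
    """Keep one row per primary key: the one with the latest cursor value."""
    for record in batch:
        current = destination.get(record[primary_key])
        if current is None or record[cursor_field] >= current[cursor_field]:
            destination[record[primary_key]] = record

# Hypothetical source table with an `updated_at` cursor column.
source = [
    {"id": 1, "email": "a@example.com", "updated_at": 1},
    {"id": 2, "email": "b@example.com", "updated_at": 2},
]
destination: dict[Any, Record] = {}

batch, state = incremental_read(source, "updated_at", None)   # first sync reads everything
append_dedup(destination, batch, "id", "updated_at")

source.append({"id": 1, "email": "a2@example.com", "updated_at": 3})
batch, state = incremental_read(source, "updated_at", state)  # second sync reads one row
append_dedup(destination, batch, "id", "updated_at")
```

A full refresh would reread `source` on every run, and a plain incremental append would keep both versions of the row with `id` 1. The deduplicating variant ends with one row per key. Note that a cursor-based read like this cannot see deleted rows, which is one reason log-based CDC exists for database sources.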

Airbyte's growing relevance to AI teams comes from its role as the data ingestion layer for RAG pipelines and vector databases. Dedicated destination connectors for Pinecone, Weaviate, Qdrant, and Chroma enable teams to sync structured and unstructured data directly into embedding stores for retrieval-augmented generation. The Airbyte Agent Connectors project extends this further by exposing data syncs as callable tools for LLM agents. Airbyte can be self-hosted via Docker Compose or Kubernetes. With over 15,000 GitHub stars and more than $150M in venture funding, it bridges the gap between traditional data engineering and the emerging AI data stack.
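As a rough picture of what has to happen to a synced record before it reaches an embedding store, the Python sketch below splits a record's text fields into overlapping chunks and carries the remaining fields along as metadata. The function names, the character-based chunking, and the document shape are assumptions for illustration; the actual destination connectors define their own chunking and embedding options.

```python
from typing import Any

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def record_to_documents(
    record: dict[str, Any], id_field: str, text_fields: list[str],
    chunk_size: int = 200, overlap: int = 20,
) -> list[dict[str, Any]]:
    """Turn one synced record into chunk documents ready to be embedded and upserted."""
    text = "\n".join(str(record[f]) for f in text_fields if record.get(f))
    # Fields that are not embedded travel along as filterable metadata.
    metadata = {k: v for k, v in record.items() if k not in text_fields}
    return [
        {"id": f"{record[id_field]}-{n}", "text": chunk, "metadata": metadata}
        for n, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]
```

Each resulting document would then be passed to an embedding model and written to the vector store under its `id`, so that re-syncing the same record overwrites its earlier chunks rather than duplicating them.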

Pricing

Free self-hosted (MIT and Elastic licenses); usage-based pricing for Airbyte Cloud

Platforms

Self-hosted on Docker Compose or Kubernetes, or managed on Airbyte Cloud

Related Tools

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

Open Source

Freestyle

Sandboxes for coding agents — Linux VMs, Git, and deploys in one box

Freestyle is YC-backed sandbox infrastructure built for AI coding agents, shipping secure Linux VMs with nested virtualization, Git servers, and one-click web deploys. It lets agents run real workloads, branch repos, and deploy apps under short-lived identities while billing only for active compute. Used in production by vly.ai, Rork, and Vibeflow.

Freemium

VectorChord

High-recall Postgres vector search at billion scale

VectorChord is a Postgres extension from the supervc-stack/VectorChord project that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization to deliver Pinecone-class performance at billion-vector scale while keeping all data inside a single Postgres database — no separate vector store, no two-system sync, no rewrites when the workload grows.

Open Source

Infinity

AI-native database for hybrid RAG retrieval

Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.

Open Source

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud's open-source Python toolkit, currently in public alpha, for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment, and protocol integrations, plus simulation and evaluation workflows for benchmarking agent accuracy before live pager use.

Open Source

sqlite-vec

Vector search extension for SQLite that runs anywhere

sqlite-vec is a lightweight vector search extension for SQLite written in pure C with zero dependencies. It brings nearest-neighbor search capabilities directly into SQLite databases, enabling AI applications to store and query embeddings without running a separate vector database. The extension works wherever SQLite runs, including Linux, macOS, Windows, WebAssembly in browsers, and Raspberry Pi devices. It is sponsored by Mozilla Builders, Fly.io, and Turso.

Free, Open Source