
dlt

Python library for declarative data loading that LLMs can generate

Open Source

dlt (data load tool) is a Python library for building data pipelines with declarative, schema-aware loading. Its API is simple enough that LLMs can generate working pipelines for it. dlt extracts data from APIs, databases, and files, normalizes nested structures, handles schema evolution, and loads the result into warehouses and lakes. It supports 30+ destinations, including BigQuery, Snowflake, DuckDB, and PostgreSQL, and has over 5,200 GitHub stars.

dlt simplifies data pipeline development by providing a Python-native approach to extracting, normalizing, and loading data that handles the tedious parts of data engineering automatically. Developers define data sources as Python generators that yield records, and dlt handles schema inference from the data shape, nested structure flattening into relational tables, incremental loading with state management, and type-appropriate loading into the destination warehouse or database.
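The normalization step described above can be illustrated in plain Python. This is a minimal sketch of the idea only, not dlt's API or implementation: the `users` source, the `normalize` helper, and the `_row_id` / `_parent_id` column names are all hypothetical, and the double-underscore naming for flattened columns and child tables is an assumption made for the illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def users() -> Iterator[Dict[str, Any]]:
    """Hypothetical source: a generator that yields nested records."""
    yield {
        "id": 1,
        "name": "Ada",
        "address": {"city": "London"},
        "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    }
    yield {"id": 2, "name": "Linus", "address": {"city": "Helsinki"}, "orders": []}


def normalize(table: str, records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Flatten nested records into a parent table plus linked child tables."""
    tables: Dict[str, List[Dict[str, Any]]] = {table: []}

    def flatten(rec: Dict[str, Any], prefix: str = "") -> Tuple[Dict[str, Any], Dict[str, list]]:
        row: Dict[str, Any] = {}
        children: Dict[str, list] = {}
        for key, value in rec.items():
            col = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested objects become extra columns on the same row.
                sub_row, sub_children = flatten(value, prefix=f"{col}__")
                row.update(sub_row)
                children.update(sub_children)
            elif isinstance(value, list):
                # Lists are set aside and loaded into a child table.
                children[col] = value
            else:
                row[col] = value
        return row, children

    def add(table_name: str, rec: Dict[str, Any], parent_id: Optional[str] = None) -> None:
        row, children = flatten(rec)
        rows = tables.setdefault(table_name, [])
        row_id = f"{table_name}:{len(rows)}"
        row["_row_id"] = row_id
        if parent_id is not None:
            row["_parent_id"] = parent_id
        rows.append(row)
        for child_name, items in children.items():
            child_table = f"{table_name}__{child_name}"
            tables.setdefault(child_table, [])
            for item in items:
                # Scalars in a list are wrapped so every child row is a dict.
                child = item if isinstance(item, dict) else {"value": item}
                add(child_table, child, parent_id=row_id)

    for rec in records:
        add(table, rec)
    return tables


tables = normalize("users", users())
for name, rows in tables.items():
    print(name, rows)
```

Each order ends up as its own row in a `users__orders` table that points back to its parent through `_parent_id`, which is the kind of relational shape the paragraph above refers to.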

The schema evolution capability is particularly valuable for pipelines consuming APIs where response shapes change over time. dlt detects new fields, changed types, and structural modifications, and automatically evolves the destination schema to accommodate them instead of failing the pipeline. This resilience reduces the maintenance burden of running data pipelines in production environments.
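To make the idea of schema evolution concrete, here is a small plain-Python sketch that compares incoming records against a known schema and records the changes a destination would need. It is an illustration under stated assumptions, not dlt's implementation: `infer_type`, `evolve`, the type names, and the widening rule (integer to double, anything else to text) are all hypothetical, and dlt's own handling of changed types may differ.

```python
from typing import Any, Dict, Iterable, List


def infer_type(value: Any) -> str:
    """Map a Python value to a simplified column type (hypothetical type names)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"


# Assumed widening rule for this sketch: mixing integers and floats yields double.
_WIDENING = {("bigint", "double"): "double", ("double", "bigint"): "double"}


def evolve(schema: Dict[str, str], records: Iterable[Dict[str, Any]]) -> List[str]:
    """Update `schema` in place and return the list of schema changes applied."""
    changes: List[str] = []
    for rec in records:
        for col, value in rec.items():
            if value is None:
                continue  # a null carries no type information
            new_type = infer_type(value)
            old_type = schema.get(col)
            if old_type is None:
                schema[col] = new_type
                changes.append(f"add column {col} {new_type}")
            elif old_type != new_type:
                # Fall back to text when no safe widening is known.
                widened = _WIDENING.get((old_type, new_type), "text")
                if widened != old_type:
                    schema[col] = widened
                    changes.append(f"alter column {col} {old_type} -> {widened}")
    return changes


schema = {"id": "bigint", "amount": "bigint"}
batch = [
    {"id": 1, "amount": 9.5, "currency": "EUR"},  # new field, changed type
    {"id": 2, "amount": 3, "currency": None},
]
changes = evolve(schema, batch)
print(changes)
print(schema)
```

The point of the sketch is that a new field or a changed type becomes a schema change to apply rather than an error that stops the load.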

With over 5,200 GitHub stars, dlt has positioned itself as the data loading library for which LLMs can generate correct code, thanks to its simple, declarative API. The project's pitch is that AI coding assistants produce working dlt pipelines more reliably than equivalent code for complex ETL frameworks, which makes dlt a natural fit for AI-augmented data engineering. The library supports over 30 destinations, including BigQuery, Snowflake, Redshift, DuckDB, PostgreSQL, and filesystem-based data lakes in Parquet and Delta Lake formats.

Pricing

Free and open-source under Apache 2.0

Platforms

Python, 30+ data destinations, any OS

Related Tools

VectorChord

High-recall Postgres vector search at billion scale

VectorChord is a Postgres extension from the supervc-stack/VectorChord project that brings high-recall vector search to PostgreSQL. As the spiritual successor to pgvecto.rs, it combines IVF indexes with RaBitQ quantization to deliver Pinecone-class performance at billion-vector scale while keeping all data inside a single Postgres database — no separate vector store, no two-system sync, no rewrites when the workload grows.

Open Source

Infinity

AI-native database for hybrid RAG retrieval

Infinity is an AI-native database from InfiniFlow that unifies dense vectors, sparse vectors, tensors, and full-text search in a single engine. Built for retrieval-augmented generation (RAG) at scale, it powers hybrid search workflows where lexical matching, semantic similarity, and reranking all happen against one storage layer instead of four loosely coupled services.

Open Source

sqlite-vec

Vector search extension for SQLite that runs anywhere

sqlite-vec is a lightweight vector search extension for SQLite written in pure C with zero dependencies. It brings nearest-neighbor search capabilities directly into SQLite databases, enabling AI applications to store and query embeddings without running a separate vector database. The extension works everywhere SQLite runs including Linux, macOS, Windows, WebAssembly in browsers, and even Raspberry Pi devices. Sponsored by Mozilla Builders, Fly.io, and Turso.

Open Source

WeKnora

Enterprise RAG framework by Tencent

WeKnora is a Tencent-developed, LLM-powered knowledge management and Q&A framework for enterprise document understanding and semantic retrieval. It supports 10+ document formats, including PDF, Word, Excel, and images, and integrates with the IM platforms WeCom, Feishu, Slack, and Telegram. It offers a Quick Q&A mode built on RAG pipelines and an Intelligent Reasoning mode that uses ReAct agents for complex multi-step reasoning across organizational knowledge bases.

Open Source

Pixeltable

Declarative multimodal AI data infrastructure

Pixeltable is a declarative data infrastructure for multimodal AI that stores video, audio, images, and documents as first-class column types. Define Python computed columns for inference and transformations, and Pixeltable auto-orchestrates execution with incremental updates. Built-in vector search eliminates the need for separate vector databases while supporting RAG and semantic search workflows.

Open Source

USearch

Fast embeddable vector search engine

USearch is a high-performance vector search engine implementing HNSW algorithms for approximate nearest neighbor queries across C++, Python, JavaScript, Rust, Java, Go, and more. It supports user-defined distance metrics, memory-mapped persistence for datasets larger than RAM, and filtered search with predicates. Used by YugabyteDB and ScyllaDB as their production vector indexing backend.

Open Source