Pathway

Real-time ETL and RAG engine with Python API and Rust core

Open Source

Pathway is an open-source Python ETL framework with a high-performance Rust engine for stream processing, real-time analytics, and RAG pipelines. It handles batch and streaming data through a unified API, enabling live-updating vector indexes, real-time document processing, and AI agent memory that refreshes as new data arrives. It has 63,000+ GitHub stars and is used in production RAG systems where static vector databases would serve stale context. It is Apache 2.0 licensed, with enterprise cloud options.

Pathway bridges the gap between data engineering and AI agent orchestration by providing a unified framework for real-time data processing. Traditional RAG pipelines suffer from stale context: documents are indexed once, so queries hit outdated embeddings. Pathway solves this with streaming ETL that continuously processes incoming data, updates vector indexes in real time, and serves fresh context to LLM applications without batch reprocessing jobs.
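To make the stale-context problem concrete, here is a minimal plain-Python sketch of an index that is patched per change event rather than rebuilt in batch. It does not use Pathway or its API. Everything in it is a hypothetical stand-in: `embed` is a toy bag-of-words counter in place of a real embedding model, and `LiveIndex`, `apply`, and the `{"op", "id", "text"}` event shape are illustrative names only.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding' (hypothetical): bag-of-words term counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LiveIndex:
    """Hypothetical index patched per change event instead of rebuilt in batch."""

    def __init__(self) -> None:
        # doc id -> (latest text, vector); only the newest version of each doc is kept
        self._docs: dict[str, tuple[str, Counter]] = {}

    def apply(self, event: dict) -> None:
        """Apply one change event: {"op": "upsert", "id", "text"} or {"op": "delete", "id"}."""
        if event["op"] == "upsert":
            self._docs[event["id"]] = (event["text"], embed(event["text"]))
        elif event["op"] == "delete":
            self._docs.pop(event["id"], None)
        else:
            raise ValueError(f"unknown op: {event['op']!r}")

    def query(self, text: str, k: int = 3) -> list[tuple[str, str]]:
        """Return up to k (id, text) pairs with nonzero similarity, best first."""
        q = embed(text)
        scored = [(cosine(q, vec), doc_id, doc_text) for doc_id, (doc_text, vec) in self._docs.items()]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [(doc_id, doc_text) for _, doc_id, doc_text in hits[:k]]


if __name__ == "__main__":
    index = LiveIndex()
    index.apply({"op": "upsert", "id": "pricing", "text": "The Pro plan costs 20 dollars per month"})
    print(index.query("pro plan cost"))
    # A later change event updates the same document in place; no re-indexing job runs.
    index.apply({"op": "upsert", "id": "pricing", "text": "The Pro plan now costs 25 dollars per month"})
    print(index.query("pro plan cost"))
```

Because each upsert replaces the stored vector for that id and each delete removes it, a query issued right after an event already reflects that event; an index built once in batch would keep returning the old text until the next rebuild.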

The architecture features a Python API for developer ergonomics backed by a Rust execution engine for performance. You write transformation logic in familiar pandas-like syntax while Pathway handles parallelism, fault tolerance, and incremental computation under the hood. Built-in connectors support Kafka, S3, PostgreSQL, Google Drive, SharePoint, and dozens of other data sources. The LLM integration layer includes document parsers, embedding generators, and vector index maintenance, forming a complete real-time RAG stack in one framework.
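Pathway performs incremental computation inside its engine. The sketch below is not that engine and uses no Pathway API. It is a toy, plain-Python illustration of the general technique (incremental view maintenance). Each input change is a `(key, amount, diff)` row, where `diff` is `+1` for an insert and `-1` for a retraction. The grouped sum is patched per change instead of being recomputed over the whole table. The function name and row format are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def incremental_sum_by_key(
    changes: Iterable[tuple[str, int, int]],
) -> Iterator[tuple[str, int | None]]:
    """Maintain SUM(amount) GROUP BY key over a stream of (key, amount, diff) rows.

    diff is +1 for an inserted row and -1 for a retracted row. After each change,
    only the affected group's new total is emitted; None means the group is now empty.
    """
    totals: dict[str, int] = {}
    counts: dict[str, int] = {}  # live row count per group, to detect empty groups
    for key, amount, diff in changes:
        counts[key] = counts.get(key, 0) + diff
        totals[key] = totals.get(key, 0) + diff * amount
        if counts[key] == 0:
            # The group's last row was retracted: drop it from the result.
            del counts[key], totals[key]
            yield key, None
        else:
            yield key, totals[key]


if __name__ == "__main__":
    stream = [("eu", 10, +1), ("us", 5, +1), ("eu", 7, +1), ("eu", 10, -1), ("us", 5, -1)]
    for key, total in incremental_sum_by_key(stream):
        print(key, total)
```

For the sample stream this prints the running totals eu 10, us 5, eu 17, eu 7, and finally us None. The work per change is a couple of dictionary updates, independent of how many rows the table already holds. This is the property that lets a streaming pipeline keep results current without periodic full recomputation.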

With 63,000+ GitHub stars, Pathway is one of the most-starred AI data infrastructure projects. It is Apache 2.0 licensed with a managed cloud offering for production deployments. The project is particularly relevant for teams building AI agents that need continuously updated context — competitive intelligence monitoring, real-time document analysis, and live knowledge bases that evolve alongside the data they represent.

Pricing

Free open-source (Apache 2.0); Enterprise cloud available

Platforms

Python library with Rust engine; Docker, Cloud managed

Alternatives

Kestra

Declarative orchestration for data, AI, and infra

Kestra is an open-source orchestration platform that uses declarative YAML to define event-driven and scheduled workflows for data pipelines, infrastructure automation, and AI workloads. With over 1,200 plugins, it connects to databases, cloud services, APIs, and SaaS tools without custom glue code. Kestra reached version 1.0 LTS with agentic AI capabilities, SDKs for Python, TypeScript, Java, and Go, and SOC 2 compliance. Clients include Leroy Merlin, Huawei, Tencent, and Decathlon.

freemium, Open Source

LangFlow

Visual framework for building multi-agent AI apps

LangFlow is an open-source visual framework for building multi-agent AI apps through a drag-and-drop interface. Built on LangChain, it lets developers compose chains, agents, and RAG pipelines by connecting modular components visually. It features real-time interaction, Python customization, one-click deployment, and export to LangChain code, and it supports all major LLM providers, vector stores, and tools. With 146K+ GitHub stars, it bridges visual prototyping and production deployment.

Open Source

Mastra

TypeScript AI agent framework

Mastra is a TypeScript-native framework for building AI agents and workflows, with a strong focus on developer experience. It provides primitives for agents with tool calling, RAG pipelines, workflow orchestration with branching and parallel steps, and integration connectors. TypeScript support is first-class, with type-safe tool definitions, and a local dev server includes a playground UI for testing. It is gaining traction as a LangChain alternative for TypeScript developers building AI apps.

Open Source

Related Tools

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

Open Source

Deep Lake

AI data runtime for multimodal datasets and vector search

Deep Lake is an open-source AI data runtime from Activeloop for storing, versioning, and querying multimodal data and embeddings. It fits teams building RAG, training, evaluation, or dataset-heavy agent workflows that need a bridge between vector search, structured metadata, and large image, text, audio, or video collections.

Open Source

SeekDB

AI-native state store with hybrid vector and full-text search

SeekDB is an open-source AI-native state store from the OceanBase ecosystem that combines MySQL-compatible data access with hybrid vector and full-text retrieval. It targets agent and AI application teams that need embedded or server deployment, copy-on-write style sandboxes, and searchable state without gluing together several separate storage layers.

Open Source

Marqo

Embedding-first search and discovery engine for AI-powered product experiences.

Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.

freemium

Freestyle

Sandboxes for coding agents — Linux VMs, Git, and deploys in one box

Freestyle is YC-backed sandbox infrastructure built for AI coding agents, shipping secure Linux VMs with nested virtualization, Git servers, and one-click web deploys. It lets agents run real workloads, branch repos, and deploy apps under short-lived identities while billing only for active compute. Used in production by vly.ai, Rork, and Vibeflow.

freemium

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is Tracer Cloud’s open-source public-alpha Python toolkit for building AI SRE agents that investigate and respond to production incidents. It ships 60+ tools across observability, databases, incident management, communications, deployment and protocol integrations, plus simulation/evaluation workflows for benchmarking agent accuracy before live pager use.

Open Source
