11 tools tagged
Open-source financial data platform for quants, analysts, and AI agents
OpenBB is an open-source financial data platform that normalizes data from 100+ providers into a unified Python SDK, REST API, and Excel Add-in. It serves as the open-source alternative to Bloomberg Terminal for developers building fintech applications, quantitative research pipelines, and AI-powered financial analysis tools. With over 65,000 GitHub stars and SOC 2 Type II certification, it is one of the most popular open-source developer tools for financial data.
Git for data — version-controlled SQL database with branch, merge, and diff
Dolt is a SQL database that implements Git-style version control directly on your data. Every write creates a commit, and you can branch, merge, diff, and revert tables just like source code. It speaks the MySQL wire protocol so existing MySQL clients, ORMs, and tools work out of the box. Dolt is used for AI training data management, reproducible analytics, collaborative data editing, and agent memory stores.
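To make the commit, branch, and diff vocabulary concrete, here is a minimal in-memory sketch in Python. It is a toy model only: the `VersionedTable` class and its methods are hypothetical names invented for illustration, and nothing here reflects Dolt's storage engine or SQL interface.

```python
import copy


class VersionedTable:
    """Toy model of commit/branch/diff over rows (hypothetical, not Dolt's API)."""

    def __init__(self):
        self.commits = [{}]           # commit id -> snapshot {primary_key: row}
        self.branches = {"main": 0}   # branch name -> commit id at its head

    def write(self, branch, key, row):
        # Every write creates a new commit on top of the branch head.
        snapshot = copy.deepcopy(self.commits[self.branches[branch]])
        snapshot[key] = row
        self.commits.append(snapshot)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def branch(self, new_name, from_branch="main"):
        # A branch is just another pointer to an existing commit.
        self.branches[new_name] = self.branches[from_branch]

    def diff(self, branch_a, branch_b):
        # Map each differing primary key to its (row in a, row in b) pair.
        a = self.commits[self.branches[branch_a]]
        b = self.commits[self.branches[branch_b]]
        keys = sorted(set(a) | set(b))
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


table = VersionedTable()
table.write("main", 1, {"name": "alice"})
table.branch("experiment")
table.write("experiment", 1, {"name": "alicia"})
table.write("experiment", 2, {"name": "bob"})
changes = table.diff("main", "experiment")
print(changes)
```

The `main` branch stays untouched while `experiment` accumulates commits, which is the property that makes reproducible analytics and safe collaborative edits possible.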
Conversational data analysis with natural language queries over databases
PandasAI enables natural-language queries against databases, data lakes, CSVs, and Parquet files using LLMs and RAG pipelines. With 23,400+ GitHub stars, it bridges the gap between database tools and AI by letting developers and analysts interact with data conversationally. It supports SQL databases such as PostgreSQL as well as various file formats.
Reactive Python notebooks that version with git and deploy as apps
Marimo is a reactive Python notebook environment with 20,000+ GitHub stars and $4M seed funding. Unlike Jupyter, marimo notebooks automatically update dependent cells when values change, version cleanly with git as pure Python files, and deploy directly as interactive web applications without conversion steps.
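The reactive behavior described above can be sketched as a small dependency graph in Python. This is an illustration of the general idea, not marimo's implementation: the `ReactiveNotebook` class is hypothetical, dependencies are declared by hand, and the graph is assumed to be acyclic.

```python
class ReactiveNotebook:
    """Toy reactive graph: changing a value re-runs every cell that depends on it."""

    def __init__(self):
        self.values = {}
        self.cells = {}  # cell name -> (function, names of the values it reads)

    def define(self, name, func, deps=()):
        # Register a cell and compute it once from its current inputs.
        self.cells[name] = (func, tuple(deps))
        self._run(name)

    def set_value(self, name, value):
        # Change an input value and propagate the change downstream.
        self.values[name] = value
        self._rerun_dependents(name)

    def _run(self, name):
        func, deps = self.cells[name]
        self.values[name] = func(*(self.values[d] for d in deps))
        self._rerun_dependents(name)

    def _rerun_dependents(self, changed):
        for name, (_, deps) in self.cells.items():
            if changed in deps:
                self._run(name)


nb = ReactiveNotebook()
nb.set_value("x", 2)
nb.define("double", lambda x: x * 2, deps=["x"])
nb.define("report", lambda d: f"double is {d}", deps=["double"])
nb.set_value("x", 5)  # both downstream cells update without being re-run by hand
print(nb.values["report"])
```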
Open-source ELT platform with 350+ data connectors
Airbyte is an open-source ELT platform with 350+ pre-built connectors for syncing data from any source to warehouses, lakes, and AI pipelines. It handles incremental syncs, schema evolution, and change data capture with a connector builder for custom integrations. Used by DoorDash, Replit, and thousands of data teams. Over 15,000 GitHub stars and $150M+ in funding.
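As a rough illustration of what an incremental sync does, the Python sketch below copies only rows newer than a saved cursor. The function name, the `updated_at` cursor field, and the dictionary-based state are all assumptions made for this example; they are not Airbyte's connector interface.

```python
def incremental_sync(source_rows, destination, state, cursor_field="updated_at"):
    """Copy only rows whose cursor value is newer than the one saved in `state`."""
    last_seen = state.get("cursor")
    new_rows = [
        row for row in source_rows
        if last_seen is None or row[cursor_field] > last_seen
    ]
    for row in sorted(new_rows, key=lambda r: r[cursor_field]):
        destination[row["id"]] = row          # upsert by primary key
        state["cursor"] = row[cursor_field]   # advance the cursor as rows land
    return len(new_rows)


source = [
    {"id": 1, "updated_at": "2024-01-01", "plan": "free"},
    {"id": 2, "updated_at": "2024-01-02", "plan": "pro"},
]
destination, state = {}, {}

first_run = incremental_sync(source, destination, state)

# One row changes upstream; the next run should move only that row.
source[0] = {"id": 1, "updated_at": "2024-01-03", "plan": "pro"}
second_run = incremental_sync(source, destination, state)
print(first_run, second_run)
```

ISO-formatted date strings are used so that plain string comparison orders them correctly; a real pipeline would more likely compare timestamps.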
Data and AI observability for enterprise teams
Monte Carlo is a leading data and AI observability platform that uses ML to monitor pipelines, warehouses, and lakes for quality issues. It detects freshness delays, volume anomalies, schema changes, and distribution shifts before they impact analytics. With 500+ deployments, including at Nasdaq, Honeywell, and Roche, it provides automated root cause analysis, field-level lineage, and incident management. Available on AWS and Azure Marketplace.
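A volume anomaly check can be approximated with a simple z-score over recent row counts, as in the Python sketch below. This is a deliberately basic stand-in for the ML-based detection described above; the function name and the threshold of three standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def volume_anomaly(history, today, threshold=3.0):
    """Flag today's row count if it sits more than `threshold` standard
    deviations away from the mean of the historical counts."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # With no historical variation, any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


daily_row_counts = [1000, 1020, 980, 1010, 990]
dropped_load = volume_anomaly(daily_row_counts, 400)    # far below normal
normal_load = volume_anomaly(daily_row_counts, 1005)    # within normal range
print(dropped_load, normal_load)
```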
Code-first agent framework for data analytics tasks
TaskWeaver is Microsoft's open-source code-first agent framework that converts natural language requests into executable Python code for data analytics and workflow automation. Unlike text-based agent frameworks, it preserves rich in-memory data structures like DataFrames across conversation turns, supports custom algorithm plugins as callable functions, and verifies generated code before execution. It includes a Planner for task decomposition and a Code Interpreter for generation and execution.
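The idea of verifying generated code before running it can be sketched with Python's standard `ast` module. The blocklists and function names below are hypothetical and chosen for illustration; they are not TaskWeaver's actual rules or API, and a check this simple is not a security sandbox.

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "sys"}   # hypothetical policy
BLOCKED_CALLS = {"eval", "exec", "open"}        # hypothetical policy


def verify_code(source):
    """Return a list of policy violations; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    problems.append(f"blocked import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                problems.append(f"blocked import: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"blocked call: {node.func.id}")
    return problems


def run_if_safe(source, namespace):
    """Execute the code in a shared namespace only if verification passes."""
    problems = verify_code(source)
    if not problems:
        exec(source, namespace)
    return problems


session = {"data": [1, 2, 3]}   # state that persists across conversation turns
accepted = run_if_safe("total = sum(data)", session)
rejected = run_if_safe("import os\nos.remove('x')", session)
print(accepted, rejected, session["total"])
```

Reusing one `session` dictionary across calls mirrors, in a very small way, how in-memory objects can survive from one turn to the next.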
Declarative orchestration for data, AI, and infra
Kestra is an open-source orchestration platform that uses declarative YAML to define event-driven and scheduled workflows for data pipelines, infrastructure automation, and AI workloads. With over 1,200 plugins, it connects to databases, cloud services, APIs, and SaaS tools without custom glue code. Kestra reached version 1.0 LTS with agentic AI capabilities, SDKs for Python, TypeScript, Java, and Go, and SOC 2 compliance. Clients include Leroy Merlin, Huawei, Tencent, and Decathlon.
ETL for LLMs — preprocess any document format
Unstructured is an open-source ETL library that preprocesses and transforms documents from diverse formats into clean, structured data ready for LLM ingestion and RAG pipelines. It handles PDF, HTML, Word, PowerPoint, and many other file types through partitioning, cleaning, and chunking operations. The library offers connector-based architecture for integrating with various data sources and destinations, making it a key component in document processing workflows for AI applications.
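The partition, clean, and chunk stages can be pictured with a few lines of plain Python. The three functions below are simplified stand-ins written for this example, operating on raw text only; they are not the Unstructured library's API, which works on real document formats.

```python
import re


def partition(text):
    # Split raw text into elements wherever a blank line occurs.
    return [block for block in re.split(r"\n\s*\n", text) if block.strip()]


def clean(element):
    # Collapse runs of whitespace (including wrapped lines) into single spaces.
    return re.sub(r"\s+", " ", element).strip()


def chunk(elements, max_chars=80):
    # Greedily pack whole elements together, starting a new chunk when adding
    # the next element would exceed max_chars. A single oversized element is
    # kept whole rather than split.
    chunks, current = [], ""
    for element in elements:
        candidate = f"{current} {element}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = element
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


raw = "Title\n\nFirst   paragraph\nwith a wrapped line.\n\n\nSecond paragraph."
elements = [clean(e) for e in partition(raw)]
chunks = chunk(elements, max_chars=45)
print(chunks)
```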
Get your documents ready for gen AI
Docling is an open-source document processing toolkit by IBM Research that converts complex documents into structured formats optimized for generative AI applications. It parses PDF, DOCX, PPTX, XLSX, HTML, images, audio, and LaTeX with advanced PDF understanding including layout analysis, reading order detection, and table structure recognition. Docling exports to Markdown, HTML, JSON, and DocTags, and integrates natively with LangChain, LlamaIndex, and other AI frameworks for RAG workflows.
Convert any file to Markdown for LLM pipelines
MarkItDown is a lightweight Python utility by Microsoft that converts files into clean Markdown optimized for LLM pipelines and text analysis. It supports PDF, Word, Excel, PowerPoint, HTML, images with OCR, audio with transcription, and text formats like CSV, JSON, and XML. The tool preserves document structure including headings, tables, lists, and links while keeping output token-efficient. It offers a CLI, a four-line Python API, Docker support, and a plugin architecture for extensions.