# pdf

4 tools tagged

OpenDataLoader PDF

AI-ready PDF parser with benchmark-leading accuracy

OpenDataLoader PDF is a high-performance parser that extracts structured, AI-ready data from PDFs with an industry-leading benchmark accuracy of 0.907. It combines deterministic local processing with an optional AI hybrid mode for complex layouts, and supports OCR across 80+ languages, formula extraction to LaTeX, chart descriptions, and built-in prompt injection filtering. It is available as Python, Node.js, and Java SDKs for integration into RAG pipelines and data preparation workflows.

freemium, Open Source

Dolphin

ByteDance multimodal document image parser

Dolphin is ByteDance's multimodal document parsing model for complex documents in which text, tables, formulas, and figures are intertwined. It follows a two-stage analyze-then-parse approach built on a Swin Transformer vision encoder and an mBART decoder: it first performs layout analysis, then parses the detected elements in parallel using heterogeneous anchor prompts. Dolphin-v2 adds document-type awareness for invoices, papers, and forms.

open-source

Quarkdown

Programmable Markdown typesetting for docs, books, and slides

Quarkdown is a Turing-complete Markdown typesetting system that compiles a single source into print-ready books, academic papers, knowledge bases, or interactive presentations. It extends Markdown with a built-in scripting language featuring functions, variables, and a standard library for full document control. It supports HTML, PDF, and plain text output, with live preview and real-time reloading during authoring.

free

Kreuzberg

Polyglot document intelligence framework with Rust core

Kreuzberg is a polyglot document intelligence framework with a high-performance Rust core that extracts text, metadata, images, and structured data from 91+ file formats. It is available for Python, Ruby, Java, Go, PHP, C#, and TypeScript, as well as through a CLI, a REST API, and an MCP server. It features multiple OCR backends (Tesseract, EasyOCR, PaddleOCR), table extraction with structure preservation, and native async support.

open-source