Docling is an open-source toolkit developed by IBM Research Zurich and now hosted by the LF AI & Data Foundation, part of the Linux Foundation. It converts unstructured documents into structured, machine-readable formats that large language models and other foundation models can consume. With over 56,000 GitHub stars and more than 100 releases, Docling has become one of the most popular open-source document intelligence tools, and developers often cite its output quality relative to other solutions.
The toolkit provides advanced PDF understanding that goes beyond simple OCR: computer vision models analyze page layout, reading order, table structure, code blocks, and formulas, and classify the images on a page. It supports a wide range of input formats, including PDF, DOCX, PPTX, XLSX, HTML, images, audio files, LaTeX, and plain text. Output can be exported as Markdown, HTML, JSON, WebVTT, or DocTags, Docling's own markup format designed to be easy for LLMs to parse. The companion Granite-Docling vision-language model performs end-to-end document conversion in a single pass with just 258 million parameters.
Docling offers a command-line interface and a Python API, and it is lightweight enough to run on a standard laptop, with MLX acceleration on Apple Silicon. It integrates with LangChain, LlamaIndex, and other popular AI frameworks for retrieval-augmented generation and question-answering applications. For enterprise deployments, the Docling OpenShift Operator enables large-scale document ingestion on Kubernetes clusters, using Ray Data for distributed processing. An MCP server component lets AI agents call Docling's conversion capabilities directly within agentic workflows.
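
As a rough illustration of the Python API mentioned above, the sketch below converts one document and collects two of its export formats. It assumes the `docling` package is installed and relies on its `DocumentConverter` class and the `export_to_markdown()` and `export_to_dict()` methods of the converted document. The helper name `convert_document` and its optional `converter` parameter are hypothetical and exist only for this example; they are not part of Docling.

```python
from typing import Any, Dict, Optional


def convert_document(source: str, converter: Optional[Any] = None) -> Dict[str, Any]:
    """Convert one document and return its Markdown and dictionary exports.

    `source` may be a local file path or a URL. `converter` is a hypothetical
    hook for this sketch: pass any object with a compatible `convert()` method,
    or leave it as None to use Docling's default converter.
    """
    if converter is None:
        # Imported lazily so that the helper can also be exercised with a
        # stand-in converter on machines where docling is not installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # convert() returns a result object whose `document` attribute holds the
    # structured representation of the input.
    result = converter.convert(source)
    document = result.document

    return {
        "markdown": document.export_to_markdown(),  # Markdown string
        "json": document.export_to_dict(),  # JSON-serializable dictionary
    }
```

Calling `convert_document("report.pdf")`, where `report.pdf` is a placeholder file name, would run Docling's default pipeline on that file. Model weights may be downloaded on first use, so the first call can take noticeably longer than later ones.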