RAG-Anything builds on the popular LightRAG project to deliver an all-in-one multimodal Retrieval-Augmented Generation system. Traditional RAG pipelines handle text well but struggle with images, tables, equations, and diagrams embedded in real-world documents. RAG-Anything eliminates the need for multiple specialized extraction tools by parsing all content modalities through a unified pipeline that preserves cross-modal relationships and hierarchical document structure.
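The unified-pipeline idea can be sketched as follows. This is a conceptual illustration, not RAG-Anything's actual API: the `ContentBlock` and `parse_document` names are invented here to show how every modality can flow through one parser while keeping the position and hierarchy information that cross-modal retrieval depends on.

```python
from dataclasses import dataclass

# Illustrative sketch (not the library's real data structures): each parsed
# block keeps a modality tag plus positional context, so document order and
# cross-modal relationships survive into downstream indexing.
@dataclass
class ContentBlock:
    modality: str      # "text" | "image" | "table" | "equation"
    content: str       # raw text, image path, table markup, or LaTeX
    page: int          # position within the document
    section: str = ""  # hierarchical context, e.g. the parent heading

def parse_document(raw_blocks: list[dict]) -> list[ContentBlock]:
    """Route every block through one pipeline instead of per-modality tools."""
    return [
        ContentBlock(
            modality=blk.get("type", "text"),
            content=blk["content"],
            page=blk.get("page", 0),
            section=blk.get("section", ""),
        )
        for blk in raw_blocks
    ]

doc = parse_document([
    {"type": "text", "content": "Revenue grew 12%.", "page": 1, "section": "Results"},
    {"type": "table", "content": "| Q | Rev |\n| 1 | 12 |", "page": 1, "section": "Results"},
    {"type": "image", "content": "figures/trend.png", "page": 2, "section": "Results"},
])
```

Because text, tables, and images all become the same block type, a downstream indexer can treat the whole document uniformly while still knowing which extractor or embedding model each block needs.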
At its core, the framework constructs multimodal knowledge graphs that capture entities and relationships across text and visual content. When a document includes images or charts, the VLM-Enhanced Query mode feeds them directly into a vision-language model alongside the textual context, enabling answers that draw on both visual and semantic information. This multi-stage architecture supports context-aware processing where each content type receives format-appropriate extraction before merging into the unified graph.
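A minimal sketch of the multimodal-graph idea, again with invented names rather than the library's real internals: nodes carry a modality tag, so a retrieval step can return a textual entity together with the related image or table nodes that would then be handed to a vision-language model alongside the text context.

```python
from collections import defaultdict

# Conceptual sketch of a multimodal knowledge graph (illustrative API):
# entities are typed by modality, and edges link related content across
# modalities so one lookup surfaces both text and visual neighbors.
class MultimodalGraph:
    def __init__(self):
        self.nodes = {}                # entity name -> {"modality": ...}
        self.edges = defaultdict(set)  # entity name -> related entity names

    def add_entity(self, name: str, modality: str) -> None:
        self.nodes[name] = {"modality": modality}

    def relate(self, a: str, b: str) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def retrieve(self, entity: str) -> list[tuple[str, str]]:
        """Return the entity plus its cross-modal neighbors; visual
        neighbors are candidates for VLM-enhanced answering."""
        hits = [entity, *sorted(self.edges[entity])]
        return [(n, self.nodes[n]["modality"]) for n in hits]

g = MultimodalGraph()
g.add_entity("revenue growth", "text")
g.add_entity("Figure 2: revenue trend", "image")
g.relate("revenue growth", "Figure 2: revenue trend")
```

Querying `g.retrieve("revenue growth")` yields both the text entity and its linked figure, mirroring how a graph-backed retriever can assemble mixed-modality context for a single question.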
Developed by the HKUDS research group at the University of Hong Kong, RAG-Anything is released under an open-source license and is installable via pip. It integrates cleanly with existing LLM orchestration workflows and supports customizable retrieval strategies. The project has rapidly gained community traction, reflecting strong demand for RAG systems that move beyond text-only retrieval to handle the multimodal reality of modern enterprise documents.