---
title: Docling
emoji: 🌍
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Converts your documents into machine-readable formats.
---

Docling is an open-source tool designed to convert various document formats into structured, machine-readable formats. It supports a wide range of formats such as PDF, DOCX, HTML, PPTX, and more. By converting documents into formats like Markdown, HTML, JSON, Text, or Doctags, Docling enables seamless integration with Large Language Models (LLMs) and other machine learning systems for enhanced document processing and understanding.

This functionality is perfect for automating data extraction, content analysis, and enabling efficient processing of textual information for AI-driven applications.

This space serves as a **UI demo** for showcasing how **Docling** works and how it can convert documents into different formats for easy integration into LLMs and machine understanding.

![Docling Demo](https://huggingface.co/spaces/BluescarfAI/docling/raw/main/docling.png)

## Supported Input Formats

- PDF
- DOCX
- PPTX
- HTML
- PNG, JPG, JPEG, TIFF (Image formats)
- WAV, MP3 (Audio formats)

## Supported Output Formats

- Markdown
- HTML
- JSON (Lossless serialization of Docling Document)
- Text (Plain text, i.e., without Markdown markers)
- Doctags (A JSON-like structure preserving the document's original format)

## How Docling Works

1. **Upload your document** in any supported format (PDF, DOCX, PPTX, images, or audio).
2. **Choose your desired output format** (Markdown, HTML, JSON, Text, or Doctags).
3. **Download your converted document** in the selected format for further machine processing or analysis.

👉 [Visit the official Docling GitHub repository here](https://github.com/DS4SD/docling)

## Example Use Cases

- **Chunking**: Split the document into manageable chunks for further processing.
- **Integration with LlamaIndex**: Organize documents in an indexable format for faster retrieval.
- **Machine Understanding**: Convert documents into JSON or Doctags to make them easily consumable by AI systems.

---

For more information, please visit the [official Docling GitHub repository](https://github.com/DS4SD/docling).
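
## Example: Convert and Chunk in Python

The convert-then-chunk workflow described above can also be sketched outside the UI. The snippet below is a minimal illustration, not the code behind this space. It assumes the `docling` Python package is installed for the conversion step and uses its `DocumentConverter` class. `chunk_markdown` is a hypothetical helper written for this example: it splits the Markdown output at headings and caps each chunk at `max_chars` characters.

```python
from typing import List


def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown with Docling.

    The import is local so the chunking helper below can be used
    even when Docling is not installed.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split Markdown into chunks at headings, then cap each chunk's size.

    Hypothetical helper for illustration; it is not part of Docling.
    """
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A line starting with "#" is treated as a heading and opens a new section.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: List[str] = []
    for section in sections:
        # Break oversized sections into pieces of at most max_chars characters.
        for start in range(0, len(section), max_chars):
            piece = section[start:start + max_chars].strip()
            if piece:
                chunks.append(piece)
    return chunks
```

A typical call would be `chunk_markdown(convert_to_markdown("report.pdf"), max_chars=800)`, where `report.pdf` is a placeholder file name. The heading check is deliberately naive: a `#` comment inside a fenced code block would also start a new section, so a chunker that works on the document structure rather than on raw text is likely a better fit for production use.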