title: Docling
emoji: 🌍
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Converts your documents into machine-readable formats.
Docling is an open-source tool that converts documents into structured, machine-readable formats. It accepts a wide range of inputs, including PDF, DOCX, HTML, and PPTX, and exports to Markdown, HTML, JSON, Text, or Doctags. These outputs can be passed directly to Large Language Models (LLMs) and other machine learning systems, which makes Docling useful for automating data extraction, content analysis, and text processing in AI-driven applications.
This space serves as a UI demo for showcasing how Docling works and how it can convert documents into different formats for easy integration into LLMs and machine understanding.
Supported Input Formats
- PDF
- DOCX
- PPTX
- HTML
- PNG, JPG, JPEG, TIFF (Image formats)
- WAV, MP3 (Audio formats)
Supported Output Formats
- Markdown
- HTML
- JSON (Lossless serialization of Docling Document)
- Text (Plain text, i.e., without Markdown markers)
- Doctags (A tag-based markup that preserves the document's structure and layout)
How Docling Works
- Upload your document in any supported format (PDF, DOCX, PPTX, images, or audio).
- Choose your desired output format (Markdown, HTML, JSON, Text, or Doctags).
- Download your converted document in the selected format for further machine processing or analysis.
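Outside this UI, the same three steps can be approximated with the Docling Python package. The following is a minimal sketch, not the code behind this Space: the `EXPORTERS` table, `output_path`, and `convert` are hypothetical helpers introduced for illustration, and the export method names are assumed from the Docling Python API, so check them against the version you have installed.

```python
from pathlib import Path

# Maps the output formats listed above to (export method name, file extension).
# The method names are assumptions based on the Docling Python API; verify
# them against the installed version before relying on this table.
EXPORTERS = {
    "markdown": ("export_to_markdown", ".md"),
    "html": ("export_to_html", ".html"),
    "json": ("export_to_dict", ".json"),
    "text": ("export_to_text", ".txt"),
    "doctags": ("export_to_doctags", ".doctags"),
}


def output_path(source: str, output_format: str) -> Path:
    """Return the path the converted document would be written to."""
    _, extension = EXPORTERS[output_format.lower()]
    return Path(source).with_suffix(extension)


def convert(source: str, output_format: str = "markdown") -> Path:
    """Convert `source` with Docling and write the result next to it."""
    import json

    # Imported lazily so the helpers above work without Docling installed.
    from docling.document_converter import DocumentConverter  # pip install docling

    method_name, _ = EXPORTERS[output_format.lower()]
    document = DocumentConverter().convert(source).document
    exported = getattr(document, method_name)()
    if not isinstance(exported, str):
        # The JSON export is assumed to return a dictionary; serialize it.
        exported = json.dumps(exported, ensure_ascii=False, indent=2)
    target = output_path(source, output_format)
    target.write_text(exported, encoding="utf-8")
    return target
```

Under these assumptions, calling `convert("report.pdf", "json")` would write the result to `report.json` in the same folder as the input.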
👉 Visit the official Docling GitHub repository.
Example Use Cases
- Chunking: Split the document into manageable chunks for further processing.
- Integration with LlamaIndex: Organize documents in an indexable format for faster retrieval.
- Machine Understanding: Convert documents into JSON or Doctags to make them easily consumable by AI systems.
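To make the chunking use case concrete, here is a small sketch that splits Markdown output into chunks at headings, with a size cap. It is a plain-Python stand-in for illustration only, not Docling's own chunking implementation; `chunk_markdown` and its `max_chars` parameter are hypothetical names.

```python
def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown into chunks.

    A new chunk starts at every heading line, or when adding the next
    line would push the current chunk past `max_chars`.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in markdown.splitlines():
        starts_section = line.startswith("#")
        too_long = size + len(line) + 1 > max_chars
        if current and (starts_section or too_long):
            chunks.append("\n".join(current).strip())
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only blank lines.
    return [chunk for chunk in chunks if chunk]


sample = "# Title\nIntro text.\n\n## Section\nBody text."
for chunk in chunk_markdown(sample):
    print(repr(chunk))
```

This sketch treats any line beginning with `#` as a heading, so a comment line inside a fenced code block would also start a new chunk. A structure-aware chunker avoids that by working on the parsed document rather than on raw text.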
For more information, please visit the official Docling GitHub repository.
