metadata
title: DocAI
emoji: π
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
license: apache-2.0
DocAI - Document Intelligence Demo
A Hugging Face Space showcasing document parsing and chart intelligence using Docling, Granite Vision, and Chart2CSV.
Features
- Document Parsing - Extract text & figures from PDFs using Docling
- Figure Detection - Automatically identify and extract figures from documents
- Chart Q&A - Ask questions about charts using Granite Vision
- Chart-to-CSV - Convert charts to structured data (CSV tables)
Pipeline
Upload PDF
↓
Parse with Docling (extract text + figures)
↓
Select a figure/chart
↓
Ask questions OR Extract to CSV
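The flow above can be sketched as a single function that wires the stages together. This is only an illustration of the control flow, not the code in app.py: `ParsedDocument`, `run_pipeline`, and the `parse`, `ask`, and `to_csv` callables are hypothetical names standing in for the Docling, Granite Vision, and Chart2CSV steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ParsedDocument:
    """Hypothetical container for what the parsing step returns."""
    text: str
    figures: List[str] = field(default_factory=list)  # figure identifiers or paths


def run_pipeline(
    pdf_path: str,
    parse: Callable[[str], ParsedDocument],
    figure_index: int,
    ask: Callable[[str, str], str],
    to_csv: Callable[[str], str],
    question: Optional[str] = None,
) -> str:
    """Parse the PDF, select one figure, then answer a question or extract CSV."""
    doc = parse(pdf_path)
    if not doc.figures:
        raise ValueError("no figures were detected in the document")
    if not 0 <= figure_index < len(doc.figures):
        raise IndexError(f"figure_index {figure_index} is out of range")
    figure = doc.figures[figure_index]
    # The last pipeline step branches: Q&A if a question is given, CSV otherwise.
    if question is not None:
        return ask(figure, question)
    return to_csv(figure)
```

Passing the stages in as callables keeps the sketch independent of any model-loading code, so each stage can be swapped for a stub while testing the UI.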
Getting Started
Local Testing
# Install dependencies
pip install -r requirements.txt
# Run app
python app.py
Then open http://localhost:7860
Deploy to HF Spaces
# Commit and push (already set up as HF Space)
git add -A
git commit -m "Update DocAI demo"
git push
Architecture
DocAI/
├── app.py                     # Main Gradio interface
├── requirements.txt           # Dependencies
├── src/
│   ├── ui_state.py            # State management & caching
│   ├── pdf_io.py              # PDF rendering (PyMuPDF)
│   ├── docling_parse.py       # Document parsing
│   ├── crops.py               # Figure extraction
│   ├── infer_vision_qa.py     # Granite Vision Q&A
│   ├── infer_chart2csv.py     # Chart2CSV extraction
│   └── utils.py               # Utilities
└── README.md
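The layout lists ui_state.py as handling state management and caching. One common way to cache parse results is to key them on a hash of the uploaded file, so re-uploading the same PDF skips the parse. The sketch below assumes that approach. `ParseCache` and its methods are hypothetical and may not match what ui_state.py actually does.

```python
import hashlib
from typing import Any, Callable, Dict


class ParseCache:
    """In-memory cache keyed by the SHA-256 digest of the uploaded file's bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(data: bytes) -> str:
        # Identical file contents always map to the same key, whatever the filename.
        return hashlib.sha256(data).hexdigest()

    def get_or_compute(self, data: bytes, compute: Callable[[bytes], Any]) -> Any:
        """Return the cached result for these bytes, computing it on first use."""
        key = self.key_for(data)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(data)
        self._store[key] = result
        return result
```

Keying on content rather than filename means a renamed copy of the same PDF still hits the cache, at the cost of hashing the file once per upload.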
Models Used
- Docling: ibm-granite/granite-docling-258M - Document understanding
- Granite Vision: ibm-granite/granite-vision-3.3-2b - Image Q&A
- Chart2CSV: ibm-granite/granite-vision-3.3-2b-chart2csv-preview - Table extraction
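Keeping the three model identifiers in one lookup table avoids scattering the strings across modules. The sketch below is only a suggestion: the identifiers are the ones listed above, while the task keys and the `model_id_for` helper are hypothetical names.

```python
# Hub identifiers as listed in this README; the dictionary keys are illustrative.
MODEL_IDS = {
    "docling": "ibm-granite/granite-docling-258M",
    "vision_qa": "ibm-granite/granite-vision-3.3-2b",
    "chart2csv": "ibm-granite/granite-vision-3.3-2b-chart2csv-preview",
}


def model_id_for(task: str) -> str:
    """Look up the Hub identifier for a task, failing loudly on unknown names."""
    try:
        return MODEL_IDS[task]
    except KeyError:
        expected = ", ".join(sorted(MODEL_IDS))
        raise ValueError(f"unknown task {task!r}; expected one of: {expected}") from None
```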
Configuration
Hugging Face Hub
The space configuration is defined in README.md (YAML frontmatter):
- SDK: Gradio 6.6.0
- App File: app.py
- License: Apache 2.0
Environment
No secrets required. All models are public on Hugging Face Hub.
Usage Examples
Chart Q&A
"What is the trend in this chart?" → Granite Vision analyzes and responds
Chart2CSV
Upload a bar chart → Get structured CSV data with axis labels and values
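Once the chart has been converted, the CSV text can be loaded into rows with Python's standard `csv` module. The sketch below assumes the model returns plain CSV text with a header row. `parse_chart_csv` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from typing import Dict, List


def parse_chart_csv(csv_text: str) -> List[Dict[str, str]]:
    """Turn CSV text into a list of row dicts keyed by the header row."""
    # skipinitialspace tolerates output such as "Q1, 10" with a space after the comma.
    reader = csv.DictReader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    return [dict(row) for row in reader]
```

Values come back as strings, so numeric columns still need an explicit `float(...)` conversion before plotting or aggregation.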
Known Limitations
- PDF parsing depends on Docling model accuracy
- Figure detection may miss complex multi-panel layouts
- Q&A quality depends on image clarity and model capabilities
- Large PDFs (100+ pages) may be slow on CPU
License
Apache 2.0 - See LICENSE file
Credits
Built with: