---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---
# 📚 Smart PDF Chapter Splitter
Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.
This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.
---
## ✨ Features
- 📖 Splits PDFs into individual chapter files
- ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)
- 🚀 Extremely fast (local processing)
- 🧼 Safe filenames (cross-platform)
- 📂 Batch-ready and automation-friendly
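
Chapter titles often contain characters that are not valid in filenames (`:`, `/`, `?`, and so on). The sketch below shows one possible way to turn a bookmark title into a cross-platform filename. The function name `safe_filename`, the `max_len` default, and the `untitled` fallback are illustrative assumptions, not necessarily what this Space's `app.py` does.

```python
import re
import unicodedata

# Device names that Windows refuses as file names (hypothetical helper set).
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def safe_filename(title: str, max_len: int = 100) -> str:
    """Return a filename stem that is valid on Windows, macOS and Linux."""
    name = unicodedata.normalize("NFKC", title)
    # Collapse tabs, newlines and repeated spaces into single spaces.
    name = re.sub(r"\s+", " ", name)
    # Replace path separators, Windows-illegal characters and control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Windows rejects names that end in a dot or a space.
    name = name.strip(" .")[:max_len].rstrip(" .")
    if not name:
        return "untitled"
    if name.upper() in _RESERVED:
        return f"_{name}"
    return name


print(safe_filename("Chapter 1: What/Why?"))  # Chapter 1_ What_Why_
```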
---
## 🧠 How It Works
Most modern PDFs contain an internal **Table of Contents (bookmarks)**.
This Space:
1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF
> ✅ Deterministic
> ❌ No OCR
> ❌ No AI hallucinations
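
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Space's actual `app.py`: the names `chapter_ranges` and `split_pdf` and the numbered output filenames are hypothetical. It assumes that a chapter runs from its top-level bookmark to the page before the next top-level bookmark. The PyMuPDF calls used (`fitz.open`, `Document.get_toc`, `Document.insert_pdf`, `Document.save`) are part of the library; `get_toc()` returns `[level, title, page, ...]` entries with 1-based page numbers.

```python
import os
from typing import List, Sequence, Tuple


def chapter_ranges(toc: Sequence[Sequence], page_count: int) -> List[Tuple[str, int, int]]:
    """Turn outline entries into (title, first_page, last_page), 0-based inclusive."""
    # Step 2: keep only top-level entries that point at a real page.
    tops = [(title, page - 1) for level, title, page, *_ in toc
            if level == 1 and 1 <= page <= page_count]
    ranges = []
    for i, (title, start) in enumerate(tops):
        # Step 3: a chapter ends on the page before the next chapter starts,
        # or on the last page of the document.
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count - 1
        ranges.append((title, start, max(start, end)))
    return ranges


def split_pdf(path: str, out_dir: str) -> None:
    """Export each top-level chapter of `path` as its own PDF (needs PyMuPDF)."""
    import fitz  # PyMuPDF, imported lazily so chapter_ranges() works without it

    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(path)
    # Step 1: read the outline.
    ranges = chapter_ranges(doc.get_toc(), doc.page_count)
    for n, (_title, start, end) in enumerate(ranges, 1):
        chapter = fitz.open()  # new, empty PDF
        # Step 4: copy the page range into its own file.
        chapter.insert_pdf(doc, from_page=start, to_page=end)
        chapter.save(os.path.join(out_dir, f"{n:02d}.pdf"))
        chapter.close()
    doc.close()


# Page-range logic on a hand-written outline for a 12-page document:
toc = [[1, "Intro", 1], [2, "Sub", 2], [1, "Ch 1", 5], [1, "Ch 2", 9]]
print(chapter_ranges(toc, 12))
# [('Intro', 0, 3), ('Ch 1', 4, 7), ('Ch 2', 8, 11)]
```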
---
## 📊 Accuracy Expectations
| PDF Type | Accuracy |
|-------|---------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |
---
## 🏗️ Ideal Use Cases
- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM & documentation pipelines
- 📂 Large PDF libraries
---
## 🚫 Limitations
This tool **requires bookmarks**.
If your PDF:
- Is scanned
- Has no outline
- Has broken TOC metadata
➡️ You will need **OCR or AI-based structure detection** (not included here).
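
To find out up front whether a PDF falls into one of these cases, a pre-flight check along the following lines may help. It is a sketch, not part of this Space: the function name `outline_problem` and the reason strings are hypothetical, and the specific checks are only one reasonable reading of "broken TOC metadata". The input is the `[level, title, page, ...]` list that PyMuPDF's `Document.get_toc()` returns.

```python
from typing import Optional, Sequence


def outline_problem(toc: Sequence[Sequence], page_count: int) -> Optional[str]:
    """Return None if the outline looks usable, else a short reason string."""
    if not toc:
        # Typical for scanned PDFs and exports without bookmarks.
        return "no outline"
    tops = [page for level, _title, page, *_ in toc if level == 1]
    if not tops:
        return "no top-level entries"
    if any(not 1 <= page <= page_count for page in tops):
        return "entry points outside the document"
    if tops != sorted(tops):
        return "entries out of page order"
    return None


# With PyMuPDF installed, the inputs would come from the document itself:
#   import fitz
#   doc = fitz.open("book.pdf")
#   print(outline_problem(doc.get_toc(), doc.page_count))
print(outline_problem([], 10))                            # no outline
print(outline_problem([[1, "A", 1], [1, "B", 5]], 10))    # None
```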
---
## 🛠️ Tech Stack
- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference