metadata
title: Smart PDF Chapter Splitter
emoji: π
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files '
Smart PDF Chapter Splitter
Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: fast, local, and deterministic.
This tool uses PDF bookmarks (Table of Contents) to extract chapters with near-perfect accuracy for professionally published documents.
Features
- Splits PDFs into individual chapter files
- Uses embedded bookmarks (no AI, no guesswork)
- Extremely fast (local processing)
- Safe filenames (cross-platform)
- Batch-ready and automation-friendly
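As an illustration of the "safe filenames" point, here is a minimal sketch of how a bookmark title could be turned into a name that is valid on Windows, macOS, and Linux. The function name `safe_filename`, the `NN - title.pdf` pattern, and the 80-character cap are assumptions made for this example, not necessarily what `app.py` does.

```python
import re
import unicodedata


def safe_filename(title, index, max_len=80):
    """Build a cross-platform file name from a chapter title (hypothetical helper)."""
    # Normalize look-alike Unicode forms so equal titles give equal names.
    text = unicodedata.normalize("NFKC", title)
    # Replace characters that Windows forbids, plus ASCII control characters.
    text = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", text)
    # Collapse whitespace runs and drop leading/trailing spaces and dots.
    text = re.sub(r"\s+", " ", text).strip(" .")
    # Cap the length, then fall back to a generic name if nothing is left.
    text = text[:max_len].rstrip(" .") or "chapter"
    # The numeric prefix keeps files sorted in reading order.
    return f"{index:02d} - {text}.pdf"
```

For example, `safe_filename("Chapter 1: What/Why?", 1)` gives `01 - Chapter 1 What Why.pdf`.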
How It Works
Most modern PDFs contain an internal Table of Contents (bookmarks).
This Space:
- Reads the PDF outline
- Identifies top-level chapters
- Calculates page ranges
- Exports each chapter as its own PDF
- Deterministic
- No OCR
- No AI hallucinations
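The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Space's actual code: `chapter_ranges` and `split_pdf` are hypothetical names, the output naming is a placeholder, and the bookmark list is assumed to have the shape PyMuPDF's `Document.get_toc()` returns (`[level, title, page]` with 1-based page numbers).

```python
import os


def chapter_ranges(toc, page_count):
    """Turn a bookmark list into (title, first_page, last_page) tuples.

    `toc` holds [level, title, page] entries with 1-based page numbers.
    Returned pages are 0-based and inclusive.
    """
    # Keep only top-level bookmarks that point at a real page.
    tops = [(e[1], e[2] - 1) for e in toc if e[0] == 1 and 1 <= e[2] <= page_count]
    ranges = []
    for i, (title, start) in enumerate(tops):
        # A chapter ends on the page before the next chapter starts;
        # the last chapter runs to the end of the document.
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count - 1
        # Two bookmarks on the same page still yield a one-page chapter.
        ranges.append((title, start, max(start, end)))
    return ranges


def split_pdf(path, out_dir):
    """Write one PDF per top-level bookmark (file naming is a placeholder)."""
    import fitz  # PyMuPDF, imported here so chapter_ranges has no dependencies

    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(path)
    chapters = chapter_ranges(doc.get_toc(), doc.page_count)
    for n, (_title, start, end) in enumerate(chapters, 1):
        part = fitz.open()  # new, empty PDF
        part.insert_pdf(doc, from_page=start, to_page=end)
        part.save(os.path.join(out_dir, f"chapter_{n:02d}.pdf"))
        part.close()
    doc.close()
```

Keeping the range calculation separate from the file I/O means it can be checked on a plain list of bookmarks without opening a PDF.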
Accuracy Expectations
| PDF Type | Accuracy |
|---|---|
| Digital-first published books | ★★★★★ (~100%) |
| Technical manuals | ★★★★★ |
| Semi-digital PDFs | ★★★★ |
| Scanned PDFs (no bookmarks) | Not supported |
Ideal Use Cases
- Published books (Springer, O'Reilly, Wiley, Packt…)
- Engineering manuals
- Technical specifications
- PLM & documentation pipelines
- Large PDF libraries
Limitations
This tool requires bookmarks.
If your PDF:
- Is scanned
- Has no outline
- Has broken TOC metadata
In any of these cases, you will need OCR or AI-based structure detection (not included here).
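A pre-flight check for these cases might look like the sketch below. The helper name `outline_problem` and its messages are hypothetical; the input is assumed to be a bookmark list in the `[level, title, page]` form PyMuPDF's `Document.get_toc()` returns, where an entry without a valid target is assumed to carry a page number outside `1..page_count` (for example `-1`).

```python
from typing import Optional


def outline_problem(toc, page_count) -> Optional[str]:
    """Return a short reason why the PDF cannot be split, or None if it can."""
    if not toc:
        # Typical for scanned PDFs and for files exported without bookmarks.
        return "no outline"
    # Top-level entries are the chapter candidates.
    tops = [entry for entry in toc if entry[0] == 1]
    if not tops:
        return "no top-level entries"
    # Broken TOC metadata: bookmarks exist but none points at a real page.
    if not any(1 <= entry[2] <= page_count for entry in tops):
        return "no top-level entry points at a valid page"
    return None
```

Running a check like this before splitting lets a batch job skip unsupported files with a clear message instead of producing empty output.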
Tech Stack
- Python
- PyMuPDF (fitz)
- Local execution (no cloud dependency)
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference