---
title: Smart PDF Chapter Splitter
emoji: π
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---
# Smart PDF Chapter Splitter
Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.

This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.

---
## ✨ Features
- Splits PDFs into individual chapter files
- Uses **embedded bookmarks** (no AI, no guesswork)
- Extremely fast (local processing)
- Safe filenames (cross-platform)
- Batch-ready and automation-friendly
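
As an illustration of what "safe filenames" can mean in practice, here is a minimal sketch that turns a bookmark title into a name that should be valid on Windows, macOS, and Linux. The helper name `safe_filename`, the zero-padded prefix, and the 80-character limit are assumptions made for this example, not necessarily what the Space's `app.py` does.

```python
import re
import unicodedata

# Device names that Windows reserves, whatever the file extension.
_WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def safe_filename(title: str, index: int, max_length: int = 80) -> str:
    """Turn a bookmark title into a portable file name (hypothetical helper)."""
    # Normalize Unicode so look-alike characters collapse to one form.
    name = unicodedata.normalize("NFKC", title)
    # Replace characters that Windows forbids, plus ASCII control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Fall back to a generic name for empty or reserved results.
    if not name or name.upper() in _WINDOWS_RESERVED:
        name = "chapter"
    # Truncate, then re-trim in case the cut left a trailing space or dot.
    name = name[:max_length].rstrip(" .")
    # A zero-padded prefix keeps chapters in order in file listings.
    return f"{index:02d}_{name}.pdf"


if __name__ == "__main__":
    print(safe_filename("Chapter 1: Intro/Overview", 1))
```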
---
## 🧠 How It Works
Most modern PDFs contain an internal **Table of Contents (bookmarks)**.

This Space:
1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF
> ✅ Deterministic
> ❌ No OCR
> ❌ No AI hallucinations
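
The four steps above can be sketched roughly as follows. This is an illustrative outline, not the Space's actual `app.py`: the function names `chapter_ranges` and `split_pdf` and the `chapter_NN.pdf` naming are assumptions. It relies on PyMuPDF's `Document.get_toc()`, which returns `[level, title, page]` entries with 1-based page numbers, and `Document.insert_pdf()`, which takes 0-based page indices.

```python
from pathlib import Path
from typing import List, Tuple


def chapter_ranges(toc: List[list], page_count: int) -> List[Tuple[str, int, int]]:
    """Return (title, first_page, last_page) per top-level entry, 1-based inclusive."""
    # Keep only level-1 bookmarks that point at a real page.
    top = [
        (title, page)
        for level, title, page, *_ in toc
        if level == 1 and 1 <= page <= page_count
    ]
    top.sort(key=lambda entry: entry[1])
    ranges = []
    for i, (title, start) in enumerate(top):
        # A chapter ends on the page before the next chapter starts;
        # the last chapter runs to the end of the document.
        end = top[i + 1][1] - 1 if i + 1 < len(top) else page_count
        # Guard against two chapters starting on the same page.
        ranges.append((title, start, max(start, end)))
    return ranges


def split_pdf(src_path: str, out_dir: str) -> List[str]:
    """Write one PDF per top-level bookmark and return the written paths."""
    import fitz  # PyMuPDF; imported here so chapter_ranges has no dependencies

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(src_path) as doc:
        toc = doc.get_toc()  # step 1: read the outline
        ranges = chapter_ranges(toc, doc.page_count)  # steps 2 and 3
        for n, (_title, first, last) in enumerate(ranges, start=1):
            with fitz.open() as part:  # new, empty PDF
                # step 4: copy the page range (insert_pdf is 0-based)
                part.insert_pdf(doc, from_page=first - 1, to_page=last - 1)
                target = out / f"chapter_{n:02d}.pdf"  # hypothetical naming
                part.save(str(target))
                written.append(str(target))
    return written
```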
---
## Accuracy Expectations
| PDF Type | Accuracy |
|-------|---------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |
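
Whether a PDF falls into the unsupported row comes down to whether it carries a usable outline, which can be checked before splitting. The sketch below assumes the `[level, title, page]` list format returned by PyMuPDF's `Document.get_toc()`. The function name `outline_status`, its labels, and the criteria for "broken" are illustrative assumptions, not the Space's actual rules.

```python
from typing import List


def outline_status(toc: List[list], page_count: int) -> str:
    """Classify an outline as 'ok', 'missing', or 'broken' (illustrative labels)."""
    # No bookmarks at all: typical for scanned PDFs.
    if not toc:
        return "missing"
    top_pages = [page for level, _title, page, *_ in toc if level == 1]
    # Bookmarks exist but none is a top-level chapter.
    if not top_pages:
        return "broken"
    # A top-level entry points outside the document (PyMuPDF reports -1
    # for entries without a page target).
    if any(page < 1 or page > page_count for page in top_pages):
        return "broken"
    # Assumption: chapters that go backwards in page order indicate bad metadata.
    if top_pages != sorted(top_pages):
        return "broken"
    return "ok"


if __name__ == "__main__":
    print(outline_status([[1, "Intro", 1], [1, "Methods", 5]], 12))
```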
---
## Ideal Use Cases
- Published books (Springer, O’Reilly, Wiley, Packt…)
- Engineering manuals
- Technical specifications
- PLM & documentation pipelines
- Large PDF libraries
---
## 🚫 Limitations
This tool **requires bookmarks**.

If your PDF:
- Is scanned
- Has no outline
- Has broken TOC metadata

➡️ You will need **OCR or AI-based structure detection** (not included here).

---
## 🛠️ Tech Stack
- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference