---
title: Smart PDF Chapter Splitter
emoji: π
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---

# Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.

This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.

---

## ✨ Features

- Splits PDFs into individual chapter files
- Uses **embedded bookmarks** (no AI, no guesswork)
- Extremely fast (local processing)
- Safe filenames (cross-platform)
- Batch-ready and automation-friendly

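
The "safe filenames" feature can be illustrated with a minimal sketch. The function name `safe_filename`, the numeric prefix, and the 80-character limit are assumptions made for this example and are not taken from the Space's `app.py`.

```python
import re
import unicodedata


def safe_filename(title: str, index: int, max_len: int = 80) -> str:
    """Turn a bookmark title into a portable file name (hypothetical helper)."""
    # Decompose accented characters, then drop whatever is not plain ASCII.
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters outside [A-Za-z0-9._-] into one underscore.
    text = re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("._-")
    # Cap the length and fall back to a generic name if nothing is left.
    text = text[:max_len].rstrip("._-") or "chapter"
    # The zero-padded prefix keeps chapters sorted in file browsers.
    return f"{index:02d}_{text}.pdf"


if __name__ == "__main__":
    print(safe_filename("Chapter 1: Intro/Overview", 1))
```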

---

## 🧠 How It Works

Most modern PDFs contain an internal **Table of Contents (bookmarks)**.

This Space:

1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF

> ✅ Deterministic
>
> ✅ No OCR
>
> ✅ No AI hallucinations

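
Steps 2 and 3 above can be sketched in plain Python. This is an illustration under stated assumptions, not the Space's actual code: it assumes the outline is a list of `[level, title, page]` entries with 1-based page numbers (the shape PyMuPDF's `Document.get_toc()` returns by default), and the names `Chapter` and `chapter_ranges` are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Chapter(NamedTuple):
    title: str
    first_page: int  # 0-based, inclusive
    last_page: int   # 0-based, inclusive


def chapter_ranges(toc: Sequence[Sequence], page_count: int) -> List[Chapter]:
    """Derive per-chapter page ranges from the top-level outline entries."""
    # Step 2: keep level-1 entries whose target page lies inside the document,
    # converting the 1-based outline page to a 0-based index.
    starts = [
        (entry[1].strip(), entry[2] - 1)
        for entry in toc
        if entry[0] == 1 and 1 <= entry[2] <= page_count
    ]
    starts.sort(key=lambda item: item[1])

    chapters = []
    for i, (title, first) in enumerate(starts):
        # Step 3: a chapter ends on the page before the next chapter starts;
        # the final chapter runs to the last page of the document.
        next_start = starts[i + 1][1] if i + 1 < len(starts) else page_count
        # max() keeps the range valid when two chapters start on the same page.
        last = max(first, next_start - 1)
        chapters.append(Chapter(title, first, last))
    return chapters


if __name__ == "__main__":
    outline = [
        [1, "Preface", 1],
        [1, "Chapter 1", 5],
        [2, "1.1 Basics", 6],  # nested entry, ignored for splitting
        [1, "Chapter 2", 20],
    ]
    for chapter in chapter_ranges(outline, page_count=30):
        print(chapter)
```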

---

## Accuracy Expectations

| PDF Type | Accuracy |
|----------|----------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## Ideal Use Cases

- Published books (Springer, O'Reilly, Wiley, Packt…)
- Engineering manuals
- Technical specifications
- PLM & documentation pipelines
- Large PDF libraries

---

## 🚫 Limitations

This tool **requires bookmarks**.

If your PDF:

- Is scanned
- Has no outline
- Has broken TOC metadata

➡️ You will need **OCR or AI-based structure detection** (not included here).

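
A pre-flight check for the cases listed above might look like the sketch below. It assumes the same `[level, title, page]` outline shape with 1-based pages; the function name `outline_problems` and the specific checks are illustrative guesses at what "broken TOC metadata" could mean in practice, not the Space's own validation.

```python
from typing import List, Sequence


def outline_problems(toc: Sequence[Sequence], page_count: int) -> List[str]:
    """List reasons why an outline cannot drive a chapter split (empty if usable)."""
    if not toc:
        # Typical for scanned PDFs and for files exported without bookmarks.
        return ["no outline entries"]

    problems = []
    pages = [entry[2] for entry in toc if entry[0] == 1]
    if not pages:
        problems.append("no top-level entries")
    if any(not 1 <= page <= page_count for page in pages):
        # Covers unresolved destinations as well as pages past the end.
        problems.append("entry points outside the document")
    if pages != sorted(pages):
        problems.append("top-level entries are out of page order")
    return problems


if __name__ == "__main__":
    print(outline_problems([], page_count=10))
    print(outline_problems([[1, "A", 1], [1, "B", 5]], page_count=10))
```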

---

## 🛠️ Tech Stack

- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)

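
For orientation, here is a minimal sketch of how the export step could use PyMuPDF. The function name `export_chapters` and the `(file_name, first_page, last_page)` tuple format are assumptions for this example. The calls used (`fitz.open`, `Document.insert_pdf`, `Document.save`) are part of PyMuPDF, but the Space's `app.py` may organize the work differently.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def export_chapters(
    pdf_path: str,
    out_dir: str,
    ranges: Iterable[Tuple[str, int, int]],
) -> List[Path]:
    """Write one PDF per (file_name, first_page, last_page) range, 0-based inclusive."""
    import fitz  # PyMuPDF; imported here so the module loads without it installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    src = fitz.open(pdf_path)
    try:
        for file_name, first, last in ranges:
            chapter = fitz.open()  # calling open() with no arguments creates an empty PDF
            # Copy the inclusive page range from the source document.
            chapter.insert_pdf(src, from_page=first, to_page=last)
            target = out / file_name
            chapter.save(str(target))
            chapter.close()
            written.append(target)
    finally:
        src.close()
    return written
```

Calling `export_chapters("book.pdf", "chapters", [("01_Preface.pdf", 0, 3)])` would write the first four pages of a hypothetical `book.pdf` to `chapters/01_Preface.pdf`.
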
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |