---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---

# 📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.

This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.

---

## ✨ Features

- 📖 Splits PDFs into individual chapter files
- ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)
- 🚀 Extremely fast (local processing)
- 🧼 Safe filenames (cross-platform)
- 📂 Batch-ready and automation-friendly

---

## 🧠 How It Works

Most modern PDFs contain an internal **Table of Contents (bookmarks)**. This Space:

1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF

> ✅ Deterministic
>
> ❌ No OCR
>
> ❌ No AI hallucinations

---

## 📊 Accuracy Expectations

| PDF Type | Accuracy |
|----------|----------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## 🏗️ Ideal Use Cases

- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM & documentation pipelines
- 📂 Large PDF libraries

---

## 🚫 Limitations

This tool **requires bookmarks**. If your PDF:

- Is scanned
- Has no outline
- Has broken TOC metadata

➡️ You will need **OCR or AI-based structure detection** (not included here).

---

## 🛠️ Tech Stack

- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
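The four steps listed under "How It Works" can be sketched roughly as follows. This is a minimal illustration, not this Space's actual `app.py`. It assumes PyMuPDF's `Document.get_toc()`, which returns `[level, title, page]` entries with 1-based page numbers. The helper names `chapter_ranges`, `safe_filename`, and `split_pdf` are hypothetical. It also assumes that each chapter ends on the page before the next top-level bookmark and that the last chapter runs to the end of the document.

```python
import re
from pathlib import Path
from typing import List, Sequence, Tuple


def chapter_ranges(toc: Sequence[Sequence], page_count: int) -> List[Tuple[str, int, int]]:
    """Map a PyMuPDF-style TOC to (title, first_page, last_page) per top-level chapter.

    `toc` entries are [level, title, page] with 1-based pages, as returned by
    Document.get_toc(). Returned pages are 0-based and inclusive.
    """
    # Steps 2 and 3: keep level-1 bookmarks that point at a real page
    # (unresolved targets are skipped), converting to 0-based page indices.
    tops = [(title, page - 1) for level, title, page in toc if level == 1 and page >= 1]
    tops.sort(key=lambda item: item[1])  # stable sort: keeps outline order on ties

    ranges = []
    for i, (title, start) in enumerate(tops):
        # Assumption: a chapter ends just before the next top-level bookmark;
        # the last chapter runs to the final page of the document.
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count - 1
        ranges.append((title, start, max(start, end)))  # never an empty range
    return ranges


def safe_filename(index: int, title: str, max_len: int = 80) -> str:
    """Build a cross-platform filename such as '03_Chapter 3.pdf'."""
    # Replace characters rejected by Windows, macOS, or Linux filesystems.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Windows disallows trailing dots and spaces, so trim them after truncating.
    cleaned = cleaned[:max_len].strip(" .") or "untitled"
    return f"{index:02d}_{cleaned}.pdf"


def split_pdf(pdf_path: str, out_dir: str) -> List[Path]:
    """Steps 1 and 4: read the outline and write one PDF per chapter."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        toc = doc.get_toc(simple=True)
        if not toc:
            raise ValueError("No bookmarks found; OCR or structure detection is required.")
        for i, (title, start, end) in enumerate(chapter_ranges(toc, doc.page_count), start=1):
            chapter = fitz.open()  # new, empty PDF
            chapter.insert_pdf(doc, from_page=start, to_page=end)
            target = out / safe_filename(i, title)
            chapter.save(str(target))
            chapter.close()
            written.append(target)
    return written
```

For example, `split_pdf("book.pdf", "chapters")` (both paths hypothetical) would write files such as `01_Introduction.pdf` into `chapters/` and return their paths. Keeping the page-range logic in a pure function means it can be checked without opening a real PDF.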