
metadata

title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'

📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files — fast, local, and deterministic.

This tool reads the PDF's embedded bookmarks (its outline, or Table of Contents) to find chapter boundaries, which gives near-perfect accuracy on professionally published documents.

✨ Features

📖 Splits PDFs into individual chapter files
⚙️ Uses embedded bookmarks (no AI, no guesswork)
🚀 Extremely fast (local processing)
🧼 Safe filenames (cross-platform)
📂 Batch-ready and automation-friendly

🧠 How It Works

Most digitally published PDFs carry an embedded outline (bookmarks) that acts as an internal Table of Contents.

This Space:

Reads the PDF outline
Identifies top-level chapters
Calculates page ranges
Exports each chapter as its own PDF
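
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Space's actual `app.py`: the helper names `chapter_ranges` and `split_by_chapter` are hypothetical, output files are simply numbered, and a chapter is assumed to end on the page before the next top-level bookmark. The PyMuPDF calls used (`fitz.open`, `Document.get_toc`, `Document.page_count`, `Document.insert_pdf`, `Document.save`) are part of its documented API.

```python
from pathlib import Path


def chapter_ranges(toc, page_count):
    """Return (title, first_page, last_page) per top-level entry, 0-based inclusive.

    `toc` is a list of [level, title, page] entries with 1-based pages,
    the shape returned by PyMuPDF's Document.get_toc().
    """
    # Keep top-level entries that point at a real page (page can be -1).
    tops = [(title, page - 1) for level, title, page in toc
            if level == 1 and 1 <= page <= page_count]
    ranges = []
    for i, (title, start) in enumerate(tops):
        # A chapter ends on the page before the next chapter starts;
        # the last chapter runs to the end of the document.
        next_start = tops[i + 1][1] if i + 1 < len(tops) else page_count
        ranges.append((title, start, max(start, next_start - 1)))
    return ranges


def split_by_chapter(pdf_path, out_dir):
    """Hypothetical helper: write one numbered PDF per top-level bookmark."""
    import fitz  # PyMuPDF; imported lazily so chapter_ranges works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        ranges = chapter_ranges(doc.get_toc(), doc.page_count)
        for n, (_title, start, end) in enumerate(ranges, start=1):
            chapter = fitz.open()  # new, empty PDF
            chapter.insert_pdf(doc, from_page=start, to_page=end)
            target = out_dir / f"{n:02d}.pdf"
            chapter.save(str(target))
            chapter.close()
            written.append(target)
    return written
```

Because the page ranges depend only on the outline and the page count, the same input always produces the same set of files.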

✅ Deterministic
❌ No OCR
❌ No AI hallucinations

📊 Accuracy Expectations

PDF Type	Accuracy
Digital-first published books	⭐⭐⭐⭐⭐ (~100%)
Technical manuals	⭐⭐⭐⭐⭐
Semi-digital PDFs	⭐⭐⭐⭐
Scanned PDFs (no bookmarks)	❌ Not supported

🏗️ Ideal Use Cases

📚 Published books (Springer, O’Reilly, Wiley, Packt…)
⚙️ Engineering manuals
🧾 Technical specifications
🏭 PLM & documentation pipelines
📂 Large PDF libraries

🚫 Limitations

This tool requires bookmarks.

If your PDF:

Is scanned
Has no outline
Has broken TOC metadata

➡️ You will need OCR or AI-based structure detection (not included here).
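
Before splitting, it can help to check whether a PDF falls into one of the cases above. The sketch below is one possible heuristic, not part of this Space: `outline_problem` is a hypothetical name, and the specific checks for "broken" metadata (entries outside the document, entries out of page order) are assumptions. It expects the `[level, title, page]` list that PyMuPDF's `Document.get_toc()` returns, with 1-based page numbers.

```python
def outline_problem(toc, page_count):
    """Return a short reason the outline is unusable, or None if it looks fine."""
    if not toc:
        # Typical for scanned PDFs and files exported without bookmarks.
        return "no outline"
    top_pages = [entry[2] for entry in toc if entry[0] == 1]
    if not top_pages:
        return "no top-level entries"
    if any(page < 1 or page > page_count for page in top_pages):
        return "entry points outside the document"
    if top_pages != sorted(top_pages):
        return "entries out of page order"
    return None
```

With PyMuPDF installed, a call might look like `outline_problem(doc.get_toc(), doc.page_count)` on an opened document; a non-`None` result suggests the file needs OCR or another structure-detection approach first.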

🛠️ Tech Stack

Python
PyMuPDF (fitz)
Local execution (no cloud dependency)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference