Hugging Face Space: AhmedBou / Smart_PDF_Chapter_Splitter


---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---

# 📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: fast, local, and deterministic.

This tool uses the PDF's embedded bookmarks (its outline, which most viewers display as the table of contents) to locate chapters, which gives near-perfect accuracy on professionally published documents.

---

## ✨ Features

- 📖 Splits PDFs into individual chapter files
- ⚙️ Uses embedded bookmarks (no AI, no guesswork)
- 🚀 Extremely fast (local processing)
- 🧼 Safe filenames (cross-platform)
- 📂 Batch-ready and automation-friendly
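The README does not say how filenames are made safe. One common approach, sketched here under that assumption, is to sanitize each chapter title before using it as a filename: collapse whitespace, replace the characters Windows forbids (which also covers `/` on Linux and macOS), trim leading and trailing dots and spaces, cap the length, and prefix reserved Windows device names such as `CON`. The `safe_filename` helper and its `max_len` default are hypothetical, not taken from the Space's code.

```python
import re
import unicodedata

# Characters Windows rejects in filenames ("/" is also the Unix path separator),
# plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves even when followed by an extension (e.g. "con.pdf").
_RESERVED = {"CON", "PRN", "AUX", "NUL"}
_RESERVED |= {f"COM{i}" for i in range(1, 10)} | {f"LPT{i}" for i in range(1, 10)}


def safe_filename(title: str, max_len: int = 100) -> str:
    """Turn a chapter title into a filename stem usable on Windows, macOS and Linux."""
    # Composed Unicode form, so visually identical titles map to identical names.
    name = unicodedata.normalize("NFC", title)
    # Tabs, newlines and runs of spaces become a single space.
    name = re.sub(r"\s+", " ", name)
    name = _FORBIDDEN.sub("_", name)
    # Windows disallows trailing dots/spaces; a leading dot hides the file on Unix.
    name = name.strip(" .")
    name = name[:max_len].rstrip(" .")
    # Never return an empty name or a reserved device name.
    if not name or name.split(".")[0].upper() in _RESERVED:
        name = "_" + name
    return name
```

For example, `safe_filename('Chapter 3: I/O & "Streams"')` returns `'Chapter 3_ I_O & _Streams_'`.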

---

## 🧠 How It Works

Most modern PDFs contain an internal Table of Contents (bookmarks).

This Space:

1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF

> - ✅ Deterministic
> - ❌ No OCR
> - ❌ No AI hallucinations
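The four steps above can be sketched with PyMuPDF roughly as follows. This is an illustration, not the Space's `app.py`: the function names, the numbered output files (`01.pdf`, `02.pdf`, and so on), and the decision to skip any pages before the first top-level bookmark are assumptions made for the example.

```python
import os
from typing import List, Sequence, Tuple


def chapter_ranges(toc: Sequence[Sequence], page_count: int) -> List[Tuple[str, int, int]]:
    """Steps 2-3: return (title, first_page, last_page) for each top-level chapter.

    `toc` uses the [level, title, page, ...] layout of PyMuPDF's
    Document.get_toc(), where `page` is 1-based. The returned pages are
    0-based and inclusive, which is what Document.insert_pdf() expects.
    """
    # Keep level-1 bookmarks that point at a page inside the document.
    starts = [(title, page - 1) for level, title, page, *_ in toc
              if level == 1 and 1 <= page <= page_count]
    # Sort by page; the sort is stable, so entries on the same page keep outline order.
    starts.sort(key=lambda entry: entry[1])

    ranges = []
    for i, (title, first) in enumerate(starts):
        # A chapter ends just before the next one starts; the last one runs to the end.
        next_start = starts[i + 1][1] if i + 1 < len(starts) else page_count
        ranges.append((title, first, max(first, next_start - 1)))
    return ranges


def split_by_chapters(pdf_path: str, out_dir: str) -> List[str]:
    """Steps 1 and 4: read the outline, then write each chapter to its own PDF."""
    import fitz  # PyMuPDF; imported here so chapter_ranges() works without it

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with fitz.open(pdf_path) as doc:
        chapters = chapter_ranges(doc.get_toc(), doc.page_count)
        for n, (_title, first, last) in enumerate(chapters, start=1):
            part = fitz.open()  # new, empty PDF
            part.insert_pdf(doc, from_page=first, to_page=last)
            path = os.path.join(out_dir, f"{n:02d}.pdf")
            part.save(path)
            part.close()
            paths.append(path)
    return paths
```

Keeping the page-range arithmetic in `chapter_ranges` separate from the file I/O means the boundary logic can be checked on a plain list: `chapter_ranges([[1, "Preface", 1], [1, "Chapter 1", 3], [1, "Chapter 2", 10]], 15)` gives `[("Preface", 0, 1), ("Chapter 1", 2, 8), ("Chapter 2", 9, 14)]`.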

---

## 📊 Accuracy Expectations

| PDF type | Expected accuracy |
|----------|-------------------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## 🏗️ Ideal Use Cases

- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM & documentation pipelines
- 📂 Large PDF libraries

---

## 🚫 Limitations

This tool requires the PDF to contain bookmarks.

If your PDF:

- is scanned,
- has no outline, or
- has broken TOC metadata,

➡️ you will need OCR or AI-based structure detection, which is not included here.
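Before splitting, it can help to check whether a PDF falls into one of these cases. The hypothetical `outline_problems` function below inspects an outline in the `[level, title, page]` form returned by PyMuPDF's `Document.get_toc()` and reports the symptoms it can detect: no bookmarks at all, no top-level entries, entries pointing outside the document, and top-level entries out of page order. It cannot tell why an outline is missing, so a scanned document is only one possible cause.

```python
from typing import List, Sequence


def outline_problems(toc: Sequence[Sequence], page_count: int) -> List[str]:
    """Return reasons an outline looks unusable for chapter splitting ([] = looks fine).

    `toc` uses the [level, title, page, ...] layout of PyMuPDF's
    Document.get_toc(), with 1-based page numbers.
    """
    if not toc:
        # Typical of scanned PDFs, but also of digital PDFs exported without bookmarks.
        return ["no outline (bookmarks) found"]

    problems = []
    top = [(title, page) for level, title, page, *_ in toc if level == 1]
    if not top:
        problems.append("outline has no top-level entries")

    # Broken TOC metadata, case 1: bookmark targets outside the document.
    for title, page in top:
        if not 1 <= page <= page_count:
            problems.append(f"{title!r} points to page {page}, outside 1..{page_count}")

    # Broken TOC metadata, case 2: chapters listed out of page order.
    pages = [page for _, page in top]
    if pages != sorted(pages):
        problems.append("top-level bookmarks are not in page order")
    return problems
```

With PyMuPDF, the inputs would come from `doc.get_toc()` and `doc.page_count`. An empty result means the outline looks usable, not that it is correct: a bookmark can point at the wrong page without failing any of these checks.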

---

## 🛠️ Tech Stack

- Python
- PyMuPDF (`fitz`)
- Gradio (web interface)
- Local execution (no cloud dependency)

