---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---

# 📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files — **fast, local, and deterministic**.

This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.

---

## ✨ Features

- 📖 Splits PDFs into individual chapter files  
- ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)  
- 🚀 Extremely fast (local processing)  
- 🧼 Safe filenames (cross-platform)  
- 📂 Batch-ready and automation-friendly  
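
As an illustration of the "safe filenames" point, here is a minimal sketch of how a bookmark title could be turned into a filename that is valid on Windows, macOS, and Linux. The function name `safe_filename`, the `chapter` fallback, and the 80-character limit are assumptions made for this example; the Space's own `app.py` may use different rules.

```python
import re

# Characters that are invalid in Windows filenames, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names that Windows reserves regardless of extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def safe_filename(title: str, index: int, max_len: int = 80) -> str:
    """Build a cross-platform filename such as '01_Introduction.pdf'.

    Hypothetical helper: `index` is the chapter's position in the book and
    `max_len` caps the length of the title part only.
    """
    name = _INVALID.sub("_", title)
    # Collapse whitespace runs, then drop leading/trailing spaces and dots
    # (Windows does not allow names that end in a space or a dot).
    name = re.sub(r"\s+", " ", name).strip(" .")
    if not name:
        name = "chapter"
    elif name.upper() in _RESERVED:
        name = f"chapter_{name}"
    name = name[:max_len].rstrip(" .")
    return f"{index:02d}_{name}.pdf"


print(safe_filename("Chapter 1: Intro/Overview?", 1))
```

Prefixing the chapter index keeps the output files in reading order when sorted by name, and it also keeps two chapters with the same title from overwriting each other.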

---

## 🧠 How It Works

Most digitally published PDFs embed an outline (**bookmarks**) that mirrors the book's **Table of Contents** and records the page on which each entry starts.

This Space:
1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF

> ✅ Deterministic  
> ❌ No OCR  
> ❌ No AI hallucinations  
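
The four steps above can be sketched roughly as follows. This is an illustrative outline, not the Space's actual `app.py`: the names `chapter_ranges` and `split_pdf` are hypothetical, and the sketch assumes the outline has the shape returned by PyMuPDF's `Document.get_toc()`, a list of `[level, title, page]` entries with 1-based page numbers.

```python
import os


def chapter_ranges(toc, page_count):
    """Return (title, first_page, last_page) for each top-level entry.

    `toc` is assumed to be a list of [level, title, page, ...] entries with
    1-based pages. A chapter ends one page before the next chapter starts;
    the last chapter runs to the end of the document.
    """
    # Keep only level-1 entries that point at a real page.
    tops = [(title, page) for level, title, page, *_ in toc
            if level == 1 and 1 <= page <= page_count]
    ranges = []
    for i, (title, start) in enumerate(tops):
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count
        end = max(end, start)  # two chapters may start on the same page
        ranges.append((title, start, end))
    return ranges


def split_pdf(src_path, out_dir):
    """Write one PDF per top-level bookmark (requires PyMuPDF)."""
    import fitz  # PyMuPDF, imported lazily so chapter_ranges works without it

    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(src_path)
    ranges = chapter_ranges(doc.get_toc(), doc.page_count)
    for n, (title, start, end) in enumerate(ranges, start=1):
        out = fitz.open()  # new, empty PDF
        # insert_pdf uses 0-based page indices, the outline uses 1-based ones.
        out.insert_pdf(doc, from_page=start - 1, to_page=end - 1)
        out.save(os.path.join(out_dir, f"{n:02d}.pdf"))
        out.close()
    doc.close()


toc = [[1, "Intro", 1], [2, "Background", 2], [1, "Methods", 5], [1, "Results", 9]]
print(chapter_ranges(toc, page_count=12))
```

Because the page ranges are derived purely from the outline, the same input PDF always yields the same set of output files.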

---

## 📊 Accuracy Expectations

| PDF Type | Accuracy |
|-------|---------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## 🏗️ Ideal Use Cases

- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM & documentation pipelines
- 📂 Large PDF libraries

---

## 🚫 Limitations

This tool **requires bookmarks**.

If your PDF:
- Is scanned (image-only pages, no embedded outline)
- Has no outline
- Has broken TOC metadata  

➡️ You will need **OCR or AI-based structure detection** (not included here).
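
Before splitting, it can help to check whether a PDF's outline is usable at all. The sketch below is one possible pre-flight check, not part of this Space: `outline_problems` is a hypothetical helper, its checks are only examples of what "broken TOC metadata" might mean, and it assumes the outline is a list of `[level, title, page]` entries as returned by PyMuPDF's `Document.get_toc()`.

```python
def outline_problems(toc, page_count):
    """Return a list of human-readable problems; an empty list means OK."""
    if not toc:
        return ["no outline (bookmarks) found"]

    problems = []
    tops = [(title, page) for level, title, page, *_ in toc if level == 1]
    if not tops:
        problems.append("no top-level entries")

    # Each chapter bookmark should point at a page that exists.
    for title, page in tops:
        if not 1 <= page <= page_count:
            problems.append(f"{title!r} points to invalid page {page}")

    # Chapters should appear in reading order.
    valid_pages = [page for _, page in tops if 1 <= page <= page_count]
    if valid_pages != sorted(valid_pages):
        problems.append("top-level entries are not in page order")
    return problems


print(outline_problems([], page_count=10))
print(outline_problems([[1, "A", 5], [1, "B", 2]], page_count=10))
```

With PyMuPDF installed, the inputs would come from `doc = fitz.open(path)` followed by `outline_problems(doc.get_toc(), doc.page_count)`.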

---

## 🛠️ Tech Stack

- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference