---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'
---

# 📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.

This tool uses the **PDF bookmarks (Table of Contents)** embedded in the file to find chapter boundaries. Split points are only as accurate as those bookmarks, which in professionally published documents are typically **near-perfect**.

---

## ✨ Features

- 📖 Splits PDFs into individual chapter files
- ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)
- 🚀 Extremely fast (local processing)
- 🧼 Safe filenames (cross-platform)
- 📂 Batch-ready and automation-friendly
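
The "safe filenames" feature can be illustrated with a small sanitizer. This is a sketch of one common approach, not the Space's own code: the helper name `safe_filename`, the zero-padded `NN_` prefix, and the 80-character cap are assumptions. It collapses whitespace, replaces characters that Windows forbids in filenames (`<>:"/\|?*` and control characters; `/` is also the path separator on macOS and Linux), and trims trailing dots and spaces.

```python
import re
import unicodedata

# Characters Windows forbids in filenames, plus ASCII control characters.
# "/" is also the path separator on macOS and Linux.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, index: int, max_len: int = 80) -> str:
    """Build a cross-platform PDF filename from a bookmark title (hypothetical helper)."""
    # NFC normalization makes composed and decomposed accents produce the same name.
    name = unicodedata.normalize("NFC", title)
    # Collapse tabs, newlines and repeated spaces into single spaces.
    name = re.sub(r"\s+", " ", name).strip()
    # Replace every forbidden character with an underscore.
    name = _INVALID_CHARS.sub("_", name)
    # Cap the length, then trim trailing dots and spaces for tidier names.
    name = name[:max_len].rstrip(". ") or "untitled"
    # The zero-padded index keeps files sorted in reading order (up to 99 chapters)
    # and means the stem can never be a reserved Windows name such as "CON".
    return f"{index:02d}_{name}.pdf"
```

For example, a bookmark titled `Chapter 1: Intro/Overview?` at position 1 becomes `01_Chapter 1_ Intro_Overview_.pdf`.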

---

## 🧠 How It Works

Most modern PDFs contain an internal **Table of Contents (bookmarks)**.

This Space:
1. Reads the PDF outline
2. Identifies top-level chapters
3. Calculates page ranges
4. Exports each chapter as its own PDF

> ✅ Deterministic  
> ❌ No OCR  
> ❌ No AI hallucinations  
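
The four steps above can be sketched in Python with PyMuPDF, which this Space lists in its tech stack. This is a minimal illustration, not the Space's actual `app.py`: the function names `chapter_ranges` and `split_by_chapters`, the `chapter_NN.pdf` output names, and the rule that a chapter ends on the page before the next chapter starts are assumptions. `Document.get_toc()` reports 1-based page numbers, while `Document.insert_pdf()` takes 0-based, inclusive `from_page`/`to_page` bounds, so the page-range logic is kept in a pure function that can be checked without opening a PDF.

```python
import os


def chapter_ranges(toc: list, page_count: int) -> list[tuple[str, int, int]]:
    """Map an outline to (title, first_page, last_page) per top-level chapter.

    `toc` is a list of [level, title, page] entries with 1-based page numbers,
    the format PyMuPDF's Document.get_toc() returns. Output pages are 0-based
    and inclusive, ready for Document.insert_pdf().
    """
    # Step 2: keep level-1 bookmarks that point at a page inside the document.
    chapters = [(title, page - 1) for level, title, page in toc
                if level == 1 and 1 <= page <= page_count]
    # Bookmarks are normally in reading order; sort by start page to be safe.
    chapters.sort(key=lambda chapter: chapter[1])
    ranges = []
    for i, (title, start) in enumerate(chapters):
        # Step 3: a chapter ends on the page before the next one starts;
        # the last chapter runs to the final page.
        next_start = chapters[i + 1][1] if i + 1 < len(chapters) else page_count
        # If two chapters start on the same page, each still gets that page.
        end = max(start, next_start - 1)
        ranges.append((title, start, end))
    return ranges


def split_by_chapters(pdf_path: str, out_dir: str) -> list[str]:
    """Write each top-level chapter of `pdf_path` to `out_dir`; return the paths."""
    import fitz  # PyMuPDF; imported lazily so chapter_ranges() runs without it

    os.makedirs(out_dir, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        # Step 1: read the outline (bookmarks).
        ranges = chapter_ranges(doc.get_toc(), doc.page_count)
        for n, (_, start, end) in enumerate(ranges, start=1):
            # Step 4: copy the page range into a new, empty PDF and save it.
            with fitz.open() as part:
                part.insert_pdf(doc, from_page=start, to_page=end)
                path = os.path.join(out_dir, f"chapter_{n:02d}.pdf")
                part.save(path)
            written.append(path)
    return written
```

Called as `split_by_chapters("book.pdf", "chapters")` (both paths hypothetical), it would write `chapters/chapter_01.pdf`, `chapters/chapter_02.pdf`, and so on. Pages before the first top-level bookmark, such as a cover or preface, are not exported in this sketch.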

---

## 📊 Accuracy Expectations

| PDF Type | Accuracy |
|----------|----------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## 🏗️ Ideal Use Cases

- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM & documentation pipelines
- 📂 Large PDF libraries

---

## 🚫 Limitations

This tool **requires bookmarks**.

If your PDF:
- Is scanned
- Has no outline
- Has broken TOC metadata  

➑️ You will need **OCR or AI-based structure detection** (not included here).

---

## 🛠️ Tech Stack

- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference