metadata
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files'

📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: fast, local, and deterministic.

This tool uses PDF bookmarks (Table of Contents) to extract chapters with near-perfect accuracy for professionally published documents.


✨ Features

  • 📖 Splits PDFs into individual chapter files
  • ⚙️ Uses embedded bookmarks (no AI, no guesswork)
  • 🚀 Extremely fast (local processing)
  • 🧼 Safe filenames (cross-platform)
  • 📂 Batch-ready and automation-friendly
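
As an illustration of what "safe filenames" can mean in practice, here is a minimal sketch of a title sanitizer. The helper name `safe_filename`, the `max_len` default, and the exact replacement rules are assumptions made for this example; they are not taken from this Space's `app.py`.

```python
import re
import unicodedata

# Device names that Windows reserves, whatever the extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def safe_filename(title, max_len=100):
    """Return a filesystem-friendly stem derived from a bookmark title.

    Hypothetical helper: the rules below are one reasonable choice,
    not necessarily what the Space itself does.
    """
    name = unicodedata.normalize("NFKC", title)
    # Replace characters that are illegal on Windows, plus control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Collapse whitespace, then trim leading/trailing dots and spaces.
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Truncate, and re-trim in case the cut left a trailing dot or space.
    name = name[:max_len].rstrip(" .")
    if not name:
        return "untitled"
    if name.upper() in _RESERVED:
        return f"_{name}"
    return name
```

For example, `safe_filename("Chapter 1: What/Why?")` yields `Chapter 1_ What_Why_`, which is valid on Windows, macOS, and Linux.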

🧠 How It Works

Most modern PDFs contain an internal Table of Contents (bookmarks).

This Space:

  1. Reads the PDF outline
  2. Identifies top-level chapters
  3. Calculates page ranges
  4. Exports each chapter as its own PDF

✅ Deterministic
❌ No OCR
❌ No AI hallucinations
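
The four steps above can be sketched with PyMuPDF roughly as follows. This is an illustrative outline, not the code in `app.py`: the function names `chapter_ranges` and `split_pdf`, the numbered output filenames, and the assumption that bookmarks appear in page order are all choices made for this example.

```python
from pathlib import Path


def chapter_ranges(toc, page_count):
    """Turn a PyMuPDF-style TOC into (title, first_page, last_page) tuples.

    `toc` is a list of [level, title, page] entries with 1-based pages,
    the shape returned by `Document.get_toc()`. Returned pages are
    1-based and inclusive. Assumes entries are in page order.
    """
    # Step 2: keep only top-level entries that point at a real page.
    tops = [(title, page) for level, title, page, *_ in toc
            if level == 1 and 1 <= page <= page_count]
    ranges = []
    for i, (title, start) in enumerate(tops):
        # Step 3: a chapter ends on the page before the next one starts;
        # the last chapter runs to the end of the document.
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count
        ranges.append((title, start, max(start, end)))
    return ranges


def split_pdf(pdf_path, out_dir):
    """Write one PDF per top-level bookmark (needs `pip install pymupdf`)."""
    import fitz  # PyMuPDF; imported here so chapter_ranges works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with fitz.open(pdf_path) as doc:
        toc = doc.get_toc()  # Step 1: read the outline
        chapters = chapter_ranges(toc, doc.page_count)
        for n, (_title, first, last) in enumerate(chapters, start=1):
            chapter = fitz.open()  # new, empty PDF
            # Step 4: insert_pdf takes 0-based page indices.
            chapter.insert_pdf(doc, from_page=first - 1, to_page=last - 1)
            chapter.save(str(out_dir / f"{n:02d}.pdf"))
            chapter.close()
```

Because the page ranges come straight from the outline, the same input always produces the same output files. Keeping the range calculation in a pure function also makes it easy to test without a PDF on hand.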


📊 Accuracy Expectations

| PDF Type | Accuracy |
|---|---|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

πŸ—οΈ Ideal Use Cases

  • 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
  • ⚙️ Engineering manuals
  • 🧾 Technical specifications
  • 🏭 PLM & documentation pipelines
  • 📂 Large PDF libraries

🚫 Limitations

This tool requires bookmarks.

If your PDF:

  • Is scanned
  • Has no outline
  • Has broken TOC metadata

➡️ You will need OCR or AI-based structure detection (not included here).
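
To find out up front whether a PDF falls into one of these cases, a quick check on the outline can help. The sketch below is an assumption about how such a check might look: `outline_problem` is a hypothetical helper, and its notion of "broken" (entries pointing outside the document, or out of page order) is only one possible definition. In PyMuPDF, bookmarks whose target cannot be resolved may be reported with a page of -1, which the range check catches.

```python
def outline_problem(toc, page_count):
    """Return a short reason the outline is unusable, or None if it looks fine.

    `toc` has the shape returned by PyMuPDF's `Document.get_toc()`:
    a list of [level, title, page] entries with 1-based pages.
    """
    if not toc:
        # Typical for scanned PDFs and exports without bookmarks.
        return "no outline"
    top_pages = [page for level, _title, page, *_ in toc if level == 1]
    if not top_pages:
        return "no top-level entries"
    if any(not 1 <= page <= page_count for page in top_pages):
        return "entry points outside the document"
    if top_pages != sorted(top_pages):
        return "entries out of page order"
    return None
```

With PyMuPDF installed, this could be called as `outline_problem(doc.get_toc(), doc.page_count)` on an opened document before attempting a split.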


πŸ› οΈ Tech Stack

  • Python
  • PyMuPDF (fitz)
  • Local execution (no cloud dependency)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference