---
tags:
- pdf
- document-processing
- pdf-manipulation
- python
- cli
- automation
language:
- en
license: mit
library_name: pdf-manipulator
pipeline_tag: other
---
# PDF Manipulator
![Python](https://img.shields.io/badge/Python-3.9%2B-blue?style=flat-square&logo=python)
![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
![Version](https://img.shields.io/badge/Version-1.0.0-orange?style=flat-square)
![Maintained](https://img.shields.io/badge/Maintained-Yes-brightgreen?style=flat-square)
![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-blueviolet?style=flat-square)
![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey?style=flat-square)
A single-file command-line toolkit for PDF page manipulation. Merge, split, remove, rotate, crop, watermark, encrypt, number, reorder, and batch-process PDF files from one CLI.
**Author:** algorembrant
---
## Features
| Feature | Command | Description |
|----------------------|------------------|-----------------------------------------------------|
| Merge PDFs | `merge` | Combine multiple PDFs into one, or interleave pages |
| Split PDF | `split` | Split into individual pages or page ranges |
| Remove Pages | `remove` | Remove one or more pages by number or range |
| Extract Pages | `extract` | Extract specific pages into a new PDF |
| Reorder Pages | `reorder` | Rearrange pages in any custom order |
| Rotate Pages | `rotate` | Rotate pages by 90, 180, or 270 degrees |
| Reverse Pages | `reverse` | Reverse the page order |
| Duplicate Pages | `duplicate` | Duplicate specific pages N times |
| Insert Blank Page | `insert-blank` | Insert blank page before or after a position |
| Insert PDF Pages | `insert` | Insert pages from another PDF at a position |
| Replace Pages | `replace` | Replace pages with pages from another PDF |
| Crop Pages | `crop` | Crop pages to a custom bounding box |
| Scale / Resize | `scale` | Scale pages by factor or resize to A4/letter |
| Watermark | `watermark` | Add text or PDF watermark to all pages |
| Stamp / Overlay | `stamp` | Overlay a stamp PDF on pages |
| Page Numbers | `number` | Add page numbers at any position |
| Encrypt | `encrypt` | Password-protect a PDF |
| Decrypt | `decrypt` | Remove password from a PDF |
| Metadata | `metadata` | View or edit PDF title, author, subject, keywords |
| Bookmarks | `bookmarks` | List or add bookmark/outline entries |
| Extract Text | `text` | Extract plain text from pages |
| Info / Inspect | `info` | Display page count, dimensions, and metadata |
| N-Up Layout | `nup` | Arrange multiple pages per sheet (2x1, 2x2, etc.) |
| Compress | `compress` | Losslessly compress PDF streams |
| Batch Remove | `batch-remove` | Remove pages from all PDFs in a directory |
| Batch Merge | `batch-merge` | Merge all PDFs in a directory into one |
| Batch Split | `batch-split` | Split all PDFs in a directory into pages |
---
## Requirements
- Python 3.9 or newer
- System dependency: **Poppler** (required only for the `nup` command)
---
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/algorembrant/pdf-manipulator.git
cd pdf-manipulator
```
### 2. Install Python Dependencies
```bash
pip install -r requirements.txt
```
### 3. Install Poppler (Required for N-Up Layout)
The `nup` command uses `pdf2image`, which requires Poppler to be installed on your system.
| Platform | Install Command |
|--------------|-------------------------------------------------------|
| Ubuntu/Debian| `sudo apt-get install -y poppler-utils` |
| macOS | `brew install poppler` |
| Windows | Download from https://github.com/oschwartz10612/poppler-windows/releases and add `bin/` to your PATH |
If you do not need the `nup` command, Poppler is not required.
---
## Usage
### Page Range Syntax
| Syntax | Meaning |
|----------|-------------------------------------|
| `3` | Page 3 only |
| `1,3,5` | Pages 1, 3, and 5 |
| `2-5` | Pages 2 through 5 inclusive |
| `1,3-5,7`| Pages 1, 3, 4, 5, and 7 |
Pages are always 1-indexed (first page = 1).
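
The syntax above can be expanded into an explicit page list with a few lines of Python. The sketch below is illustrative only: `parse_page_ranges` is a hypothetical helper, not necessarily how `pdf_manipulator.py` parses its `--pages` argument, and its handling of stray commas and reversed ranges is an assumption.

```python
def parse_page_ranges(spec: str) -> list[int]:
    """Expand a spec such as "1,3-5,7" into 1-indexed page numbers.

    Hypothetical helper for illustration; the tool's own parser may differ.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (an assumption)
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"reversed range: {part!r}")
            pages.extend(range(start, end + 1))  # inclusive on both ends
        else:
            pages.append(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("pages are 1-indexed; values must be >= 1")
    return pages


print(parse_page_ranges("1,3-5,7"))  # prints [1, 3, 4, 5, 7]
```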
---
### Step-by-Step Guide
#### Merge PDFs
```bash
# Merge two or more PDFs in order
python pdf_manipulator.py merge -i file1.pdf file2.pdf file3.pdf -o merged.pdf
# Interleave pages (page 1 from file1, page 1 from file2, page 2 from file1, ...)
python pdf_manipulator.py merge -i file1.pdf file2.pdf -o interleaved.pdf --interleave
```
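
The interleaved order described in the comment above can be illustrated without touching any PDF. This is a minimal sketch, assuming that once a shorter file runs out of pages the remaining pages of the longer file follow in order; this README does not say how the tool treats files of unequal length, and `interleave_order` is a hypothetical name.

```python
from itertools import zip_longest


def interleave_order(page_counts: list[int]) -> list[tuple[int, int]]:
    """Return (file_index, page_number) pairs in interleaved order.

    Both values are 1-indexed. Files that run out of pages are skipped
    (an assumption; the tool may behave differently).
    """
    per_file = [
        [(file_idx, page) for page in range(1, count + 1)]
        for file_idx, count in enumerate(page_counts, start=1)
    ]
    order: list[tuple[int, int]] = []
    for round_of_pages in zip_longest(*per_file):
        order.extend(entry for entry in round_of_pages if entry is not None)
    return order


# file1.pdf has 3 pages, file2.pdf has 2 pages
print(interleave_order([3, 2]))
# prints [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3)]
```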
#### Split PDF
```bash
# Split into individual pages (saved to a directory)
python pdf_manipulator.py split -i input.pdf -o ./split_pages
# Extract a range of pages into a single file
python pdf_manipulator.py split -i input.pdf -o ./split_pages --range 1-5
python pdf_manipulator.py split -i input.pdf -o ./split_pages --range 2,4,6
```
#### Remove Pages
```bash
# Remove page 3
python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 3
# Remove multiple pages
python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3,5
# Remove a range of pages
python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 2-5
# Remove mixed selection
python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3-5,7
```
#### Extract Pages
```bash
# Extract pages 1-3 into a new PDF
python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 1-3
# Extract specific pages
python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 2,4,6
```
#### Reorder Pages
```bash
# Place page 3 first, then page 1, then page 2, then page 4
python pdf_manipulator.py reorder -i input.pdf -o output.pdf --order 3,1,2,4
```
#### Rotate Pages
```bash
# Rotate all pages 90 degrees clockwise
python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 90
# Rotate only pages 1 and 3 by 180 degrees
python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 180 --pages 1,3
# Rotate pages 2-4 by 270 degrees
python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 270 --pages 2-4
```
#### Reverse Page Order
```bash
python pdf_manipulator.py reverse -i input.pdf -o output.pdf
```
#### Duplicate Pages
```bash
# Duplicate page 2 so it appears 3 times in a row
python pdf_manipulator.py duplicate -i input.pdf -o output.pdf --pages 2 --times 3
```
#### Insert Blank Pages
```bash
# Insert a blank page after page 2
python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --after 2
# Insert a blank page before page 1
python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --before 1
```
#### Insert Pages from Another PDF
```bash
# Insert all pages from extra.pdf after page 3 of base.pdf
python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --after 3
# Insert before page 2
python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --before 2
```
#### Replace Pages
```bash
# Replace page 2 of base.pdf with page 1 of new.pdf
python pdf_manipulator.py replace -i base.pdf --replace-file new.pdf -o output.pdf --pages 2 --replace-pages 1
```
#### Crop Pages
```bash
# Crop all pages (coordinates in PDF points: left,bottom,right,top)
python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700"
# Crop only pages 1-3
python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700" --pages 1-3
```
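
A PDF point is 1/72 of an inch, so a box measured in millimetres has to be converted before it is passed to `--box`. The helper below is a hypothetical convenience, not part of the tool. It only builds the `left,bottom,right,top` string used above, rounded to whole points to match the integer form in the example; whether the tool accepts fractional values is not stated here.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def box_argument(left_mm: float, bottom_mm: float,
                 right_mm: float, top_mm: float) -> str:
    """Build a "left,bottom,right,top" string in whole points.

    Hypothetical helper; the argument order follows the README comment.
    """
    points = [mm_to_points(v) for v in (left_mm, bottom_mm, right_mm, top_mm)]
    if points[0] >= points[2] or points[1] >= points[3]:
        raise ValueError("box must satisfy left < right and bottom < top")
    return ",".join(str(round(p)) for p in points)


# A 20 mm margin on an A4 page (210 mm x 297 mm)
print(box_argument(20, 20, 190, 277))  # prints 57,57,539,785
```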
#### Scale / Resize Pages
```bash
# Scale all pages to 50% of original size
python pdf_manipulator.py scale -i input.pdf -o output.pdf --factor 0.5
# Resize all pages to A4
python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size A4
# Resize to US Letter
python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size letter
```
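
To relate `--factor` to `--to-size`, the sketch below computes the uniform factor that fits a page inside a target size. It assumes the aspect ratio is preserved, which this README does not confirm for `--to-size`; `PAGE_SIZES` and `fit_factor` are hypothetical names, and the dimensions are the standard sizes in PDF points.

```python
# Standard page sizes in PDF points (1 point = 1/72 inch).
PAGE_SIZES = {
    "A4": (595.28, 841.89),     # 210 mm x 297 mm
    "letter": (612.0, 792.0),   # 8.5 in x 11 in
}


def fit_factor(width: float, height: float, target: str) -> float:
    """Largest uniform scale factor that keeps the page inside the target.

    Assumes aspect ratio is preserved (an assumption, not documented behavior).
    """
    target_w, target_h = PAGE_SIZES[target]
    return min(target_w / width, target_h / height)


# Scaling a US Letter page to fit on A4 is limited by the width.
print(round(fit_factor(612.0, 792.0, "A4"), 4))  # prints 0.9727
```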
#### Add Watermark
```bash
# Add a text watermark with defaults (red diagonal, 15% opacity)
python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "CONFIDENTIAL"
# Custom opacity and angle
python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "DRAFT" --opacity 0.3 --angle 45
# Use a PDF file as watermark
python pdf_manipulator.py watermark -i input.pdf -o output.pdf --watermark-pdf wm.pdf
```
#### Stamp / Overlay
```bash
# Overlay stamp.pdf on all pages
python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf
# Overlay on page 1 only
python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf --pages 1
```
#### Add Page Numbers
```bash
# Add page numbers at bottom center (default)
python pdf_manipulator.py number -i input.pdf -o output.pdf
# Custom position and starting number
python pdf_manipulator.py number -i input.pdf -o output.pdf --position bottom-right --start 1
# Custom format string
python pdf_manipulator.py number -i input.pdf -o output.pdf --position top-right --format "Page {n}"
```
Available positions: `bottom-center`, `bottom-left`, `bottom-right`, `top-center`, `top-left`, `top-right`
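
The `{n}` placeholder in `--format` reads like a Python format field. Here is a minimal sketch of how the labels could be generated, assuming `{n}` is substituted with `str.format` and that numbering increases by one per page from `--start`; the tool's actual substitution logic is not shown in this README, and `page_labels` is a hypothetical name.

```python
def page_labels(page_count: int, start: int = 1, fmt: str = "{n}") -> list[str]:
    """Return one label per page, substituting {n} with the running number."""
    return [fmt.format(n=start + offset) for offset in range(page_count)]


print(page_labels(3, start=1, fmt="Page {n}"))
# prints ['Page 1', 'Page 2', 'Page 3']
```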
#### Encrypt / Decrypt
```bash
# Encrypt with a user password
python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypassword
# Encrypt with both user and owner password
python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypassword --owner-pass ownerpassword
# Decrypt / remove password
python pdf_manipulator.py decrypt -i encrypted.pdf -o decrypted.pdf --password mypassword
```
#### Metadata
```bash
# View metadata
python pdf_manipulator.py metadata -i input.pdf
# Set metadata fields
python pdf_manipulator.py metadata -i input.pdf -o output.pdf \
--set-title "Annual Report 2024" \
--set-author "algorembrant" \
--set-subject "Finance" \
--set-keywords "annual,report,finance"
```
#### Bookmarks / Outline
```bash
# List all bookmarks
python pdf_manipulator.py bookmarks -i input.pdf
# Add bookmarks
python pdf_manipulator.py bookmarks -i input.pdf -o output.pdf \
--add "Introduction:1,Chapter 1:3,Chapter 2:8"
```
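
The `--add` value is a comma-separated list of `Title:page` pairs. The sketch below shows one way such a string could be split, assuming the page number follows the last colon in each entry; `parse_bookmarks` is a hypothetical helper and may not match the tool's parser. Note that with this format a title cannot itself contain a comma.

```python
def parse_bookmarks(spec: str) -> list[tuple[str, int]]:
    """Split "Title:page,Title:page" into (title, 1-indexed page) pairs."""
    bookmarks: list[tuple[str, int]] = []
    for entry in spec.split(","):
        # rsplit keeps colons inside the title, e.g. "Part 1: Intro:4"
        title, page_text = entry.rsplit(":", 1)
        page = int(page_text)
        if page < 1:
            raise ValueError(f"page must be >= 1 in entry {entry!r}")
        bookmarks.append((title.strip(), page))
    return bookmarks


print(parse_bookmarks("Introduction:1,Chapter 1:3,Chapter 2:8"))
# prints [('Introduction', 1), ('Chapter 1', 3), ('Chapter 2', 8)]
```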
#### Extract Text
```bash
# Print text from all pages
python pdf_manipulator.py text -i input.pdf
# Extract text from pages 1-3 and save to file
python pdf_manipulator.py text -i input.pdf --pages 1-3 -o extracted.txt
```
#### PDF Info
```bash
python pdf_manipulator.py info -i input.pdf
```
#### N-Up Layout (Requires Poppler)
```bash
# 2 pages side-by-side on one sheet
python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x1
# 4 pages in a 2x2 grid on one sheet
python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x2
```
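
The number of output sheets follows directly from the layout. The sketch below assumes the layout string means columns x rows, which is consistent with `2x1` placing two pages side by side, and that a partly filled final sheet is still emitted; `sheets_needed` is a hypothetical name.

```python
import math


def sheets_needed(page_count: int, layout: str) -> int:
    """Sheets required to place page_count pages in a "COLSxROWS" grid."""
    cols_text, rows_text = layout.lower().split("x")
    per_sheet = int(cols_text) * int(rows_text)
    if per_sheet < 1:
        raise ValueError(f"invalid layout: {layout!r}")
    return math.ceil(page_count / per_sheet)


print(sheets_needed(10, "2x1"))  # prints 5
print(sheets_needed(10, "2x2"))  # prints 3
```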
#### Compress
```bash
python pdf_manipulator.py compress -i input.pdf -o output.pdf
```
#### Batch Operations
```bash
# Remove page 1 (e.g. cover page) from all PDFs in a directory
python pdf_manipulator.py batch-remove --dir ./pdfs --pages 1 --suffix _no_cover
# Merge all PDFs in a directory into one
python pdf_manipulator.py batch-merge --dir ./pdfs -o merged_all.pdf
# Split all PDFs in a directory into individual pages
python pdf_manipulator.py batch-split --dir ./pdfs --out-dir ./split_output
```
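
When each file needs a different page selection, the single-file `remove` command can be driven from a short script instead. The sketch below only builds and prints the command lines using flags documented above; the file names, the `trimmed` output directory, and the `_trimmed` naming scheme are hypothetical.

```python
import sys
from pathlib import Path


def build_remove_command(pdf: Path, pages: str, out_dir: Path) -> list[str]:
    """Build the argument list for one `remove` call (documented flags only)."""
    output = out_dir / f"{pdf.stem}_trimmed.pdf"  # hypothetical naming scheme
    return [
        sys.executable, "pdf_manipulator.py", "remove",
        "-i", str(pdf), "-o", str(output), "--pages", pages,
    ]


# Hypothetical per-file page selections.
jobs = {"report.pdf": "1", "slides.pdf": "1,3-5"}
commands = [
    build_remove_command(Path("pdfs") / name, pages, Path("trimmed"))
    for name, pages in jobs.items()
]
for command in commands:
    print(" ".join(command))
    # To execute a command: subprocess.run(command, check=True)
```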
---
## Notes
- All page numbers are 1-indexed (first page is page 1).
- The `nup` command requires Poppler to be installed on your system.
- For encrypted PDFs, run `decrypt` with `--password` first and operate on the decrypted copy. Other commands may not accept `--password` unless password support has been added to them.
- Output directories are created automatically if they do not exist.
---
## License
MIT License. See [LICENSE](LICENSE) for details.
---
## Author
**algorembrant**