Upload 10 files


Files changed (10)

.gitignore +31 -0
LICENSE +21 -0
README.md +89 -0
STACKS.md +57 -0
all_page.py +70 -0
dump/README.md +62 -0
dump/pdf_to_image.py +38 -0
dump/requirements.txt +1 -0
requirements.txt +5 -0
single_page.py +75 -0

.gitignore ADDED

	@@ -0,0 +1,31 @@

+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Environment Variables
+.env.local
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+# Output directories
+output_images/
+*.png
+*.jpg
+*.jpeg
+# OS generated files
+.DS_Store
+Thumbs.db
+PDF/*
+output/*

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2026 Rembrant Oyangoren Albeos
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,89 @@

+# PDF to Image (Python)
+[![License](https://img.shields.io/github/license/not-algorembrant/pdf-to-image-python)](https://github.com/not-algorembrant/pdf-to-image-python)
+[![Repo Size](https://img.shields.io/github/repo-size/not-algorembrant/pdf-to-image-python)](https://github.com/not-algorembrant/pdf-to-image-python)
+[![Last Commit](https://img.shields.io/github/last-commit/not-algorembrant/pdf-to-image-python)](https://github.com/not-algorembrant/pdf-to-image-python)
+[![Python](https://img.shields.io/badge/python-3.14+-blue.svg)](https://www.python.org/)
+[![Markdown](https://img.shields.io/badge/markdown-%23000000.svg?style=flat&logo=markdown&logoColor=white)](https://en.wikipedia.org/wiki/Markdown)
+Convert PDF files into high-resolution page images.
+This project was inspired by and serves as a Python alternative to the PHP package [spatie/pdf-to-image](https://github.com/spatie/pdf-to-image).
+## System Overview
+```mermaid
+graph TD
+    A[PDF Input] --> B{Process Type}
+    B -->|Single| C[single_page.py]
+    B -->|Batch| D[all_page.py]
+    C --> E[First Page Export]
+    D --> F[Full Document Export]
+    E --> G[Output Folder]
+    F --> G
+```
+## Project Structure
+```text
+pdf-to-image-python/
+├── .gitignore          # Git ignore rules
+├── PDF/                # Input PDF directory
+├── output/             # Generated images directory (auto-created)
+├── LICENSE             # Project license
+├── README.md           # Main documentation
+├── STACKS.md           # Technical stack audit
+├── all_page.py         # Full PDF conversion script
+├── requirements.txt    # Dependency list
+└── single_page.py      # First page conversion script
+```
+## Requirements
+- Python 3.14+
+- PyMuPDF library
+## Setup Instructions
+Make sure your environment is ready before running the tool:
+1. Create a virtual environment:
+   ```bash
+   python -m venv venv
+   ```
+2. Activate the virtual environment:
+   ```bash
+   # On Windows (on macOS/Linux, run: source venv/bin/activate)
+   venv\Scripts\activate
+   ```
+3. Install the dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Usage
+Basic usage. Both scripts accept either a single PDF file or a folder of PDFs:
+```bash
+# Convert all pages of a PDF (provide folder or file)
+python all_page.py "PDF_folder" --dpi 300 --format png
+# Convert only the first page (Cover) for quick previews
+python single_page.py "PDF_folder" --dpi 300 --format png
+```
+## Citation
+If you use this tool in your research or project, please cite it as follows:
+```bibtex
+@misc{pdf_to_image_2026,
+  author = {Rembrant Oyangoren Albeos},
+  title = {PDF to Image Python Utility},
+  year = {2026},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/not-algorembrant/*}},
+}
+```

STACKS.md ADDED

	@@ -0,0 +1,57 @@

+## Description
+This project, `pdf-to-image-python`, is a utility that converts PDF documents into images while preserving each page's native proportions. It uses the PyMuPDF (`fitz`) engine to render pages and provides both batch conversion of entire documents and first-page extraction (intended for cover pages). The scripts compute a zoom factor of `dpi / 72` from the target DPI and apply it uniformly to both axes, so the output size follows the source page size instead of being forced to fixed dimensions.
+## System Overview
+```mermaid
+graph TD
+    A[PDF Input] --> B{Process Type}
+    B -->|Single| C[single_page.py]
+    B -->|Batch| D[all_page.py]
+    C --> E[First Page Export]
+    D --> F[Full Document Export]
+    E --> G[Output Folder]
+    F --> G
+```
+## Project Structure
+```text
+pdf-to-image-python/
+├── .gitignore          # Git ignore rules
+├── PDF/                # Input PDF directory
+├── output/             # Generated images directory (auto-created)
+├── LICENSE             # Project license
+├── README.md           # Main documentation
+├── STACKS.md           # Technical stack audit
+├── all_page.py         # Full PDF conversion script
+├── requirements.txt    # Dependency list
+└── single_page.py      # First page conversion script
+```
+## Tech Stack
+Audit of project files (excluding environment and cache):
+| File Type | Count | Size (KB) |
+| :--- | :--- | :--- |
+| PDF (.pdf) | 16 | 15152 |
+| PNG (.png) | 17 | 1952 |
+| Python (.py) | 3 | 9.8 |
+| Markdown (.md) | 3 | 4.3 |
+| Text (.txt) | 2 | 0.1 |
+| License | 1 | 1.1 |
+**Total Files**: 42
+## Dependencies
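+- **Python**:
+  - `PyMuPDF` (imported as `fitz`): Core PDF rendering and processing. This is the only third-party package.
+  - `argparse` (standard library): Command-line argument parsing.
+  - `os` (standard library): File system operations.
+  - `glob` (standard library): Filename pattern matching.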
+## Applications
+- Google Antigravity
+- Google Gemini Pro
+- Visual Studio Code
+- Windows PowerShell
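
Review note on the DPI scaling described in STACKS.md: the arithmetic can be checked without PyMuPDF. The sketch below is a minimal illustration, assuming page sizes are given in PDF points (1/72 inch); `pixel_size` is a hypothetical helper that is not part of this commit, and the `round` call is an assumption, so PyMuPDF's own rounding may differ by a pixel on page sizes that do not divide evenly.

```python
def pixel_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Estimate the rendered size in pixels of a page measured in PDF points."""
    # 72 points make one inch, so the scale factor is dpi / 72 on both axes.
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 points (8.5 x 11 inches).
letter_300 = pixel_size(612, 792, dpi=300)
letter_150 = pixel_size(612, 792, dpi=150)
print(letter_300, letter_150)
```

Because the same factor is applied to width and height, the aspect ratio of the output matches the source page at any DPI.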

all_page.py ADDED

	@@ -0,0 +1,70 @@

+import fitz  # PyMuPDF
+import argparse
+import os
+import glob
+def convert_pdf_to_images(pdf_path, output_dir, dpi=300, image_format="png"):
+    """
+    Convert a PDF to a series of images, perfectly maintaining native page proportions.
+    """
+    # Create a dedicated subfolder for each PDF's images to keep things organized
+    pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
+    pdf_output_dir = os.path.join(output_dir, pdf_name)
+    os.makedirs(pdf_output_dir, exist_ok=True)
+    try:
+        # The context manager closes the document even if rendering fails
+        with fitz.open(pdf_path) as doc:
+            print(f"Processing '{pdf_name}' | Total pages: {len(doc)}")
+            # PDF page sizes are measured in points (1/72 inch), so zoom 1.0 renders at 72 DPI.
+            # Scale both axes by dpi / 72 to reach the requested DPI.
+            zoom = dpi / 72.0
+            mat = fitz.Matrix(zoom, zoom)  # Uniform scaling keeps the aspect ratio of any page size
+            for page_num in range(len(doc)):
+                page = doc.load_page(page_num)
+                # Render through the scaling matrix; alpha=False omits the transparency channel
+                pix = page.get_pixmap(matrix=mat, alpha=False)
+                output_file = os.path.join(pdf_output_dir, f"page_{page_num + 1:02d}.{image_format}")
+                pix.save(output_file)
+                print(f"  -> Saved page {page_num + 1} (Dynamic Resolution: {pix.width}x{pix.height})")
+    except Exception as e:
+        print(f"Error processing {pdf_path}: {e}")
+def process_all_pdfs(input_path, output_dir, dpi=300, image_format="png"):
+    """Determines if the input is a single file or a directory of files."""
+    if os.path.isfile(input_path):
+        convert_pdf_to_images(input_path, output_dir, dpi, image_format)
+    elif os.path.isdir(input_path):
+        # glob returns matches in arbitrary order; sort for a predictable batch order
+        pdf_files = sorted(glob.glob(os.path.join(input_path, "*.pdf")))
+        if not pdf_files:
+            print(f"No PDF files found in directory: {input_path}")
+            return
+        print(f"Found {len(pdf_files)} PDF(s). Starting batch conversion...\n")
+        for pdf in pdf_files:
+            convert_pdf_to_images(pdf, output_dir, dpi, image_format)
+            print("-" * 40)
+        print("All batch conversions complete!")
+    else:
+        print("Invalid input path. Please provide a valid PDF file or folder.")
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Convert PDFs to auto-adjusting high-quality images.")
+    parser.add_argument("input_path", help="Path to a single PDF file OR a folder containing PDFs")
+    parser.add_argument("--output", "-o", default="output", help="Output directory (default: 'output')")
+    parser.add_argument("--dpi", type=int, default=300, help="Output image DPI (default: 300)")
+    parser.add_argument("--format", "-f", default="png", help="Output image format (default: png)")
+    args = parser.parse_args()
+    # Ensure base output directory exists
+    os.makedirs(args.output, exist_ok=True)
+    process_all_pdfs(args.input_path, args.output, args.dpi, args.format)
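
Review note on the folder scan in `process_all_pdfs`: `glob.glob(..., "*.pdf")` is case-sensitive on case-sensitive filesystems, so a file named `REPORT.PDF` would be skipped on Linux. The sketch below shows one possible alternative using `pathlib`; `find_pdfs` is a hypothetical helper that is not part of this commit, and the temporary directory exists only to demonstrate the behavior.

```python
import tempfile
from pathlib import Path


def find_pdfs(folder: str) -> list[Path]:
    """Return PDF files directly inside `folder`, matched case-insensitively and sorted by name."""
    return sorted(
        (p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name.lower(),
    )


# Demonstration with empty placeholder files (no real PDFs are needed to test the matching).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.PDF", "a.pdf", "notes.txt"):
        Path(tmp, name).touch()
    found = [p.name for p in find_pdfs(tmp)]

print(found)
```

Sorting by lowercased name keeps the batch order stable across platforms, which makes console logs easier to compare between runs.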

dump/README.md ADDED

	@@ -0,0 +1,62 @@

+# PDF to Image (Python)
+![License](https://img.shields.io/github/license/unban-algorembrant/pdf-to-image-python)
+![Repo Size](https://img.shields.io/github/repo-size/unban-algorembrant/pdf-to-image-python)
+![Last Commit](https://img.shields.io/github/last-commit/unban-algorembrant/pdf-to-image-python)
+![Python 3.14+](https://img.shields.io/badge/python-3.14+-blue.svg)
+This is a Python-based utility that converts PDF files into rendered images at high resolution.
+It renders each page at its native size, using 300 DPI by default (a US Letter page, 8.5 x 11 inches, becomes 2550 x 3300 pixels).
+This project was inspired by and serves as a Python alternative to the PHP package [spatie/pdf-to-image](https://github.com/spatie/pdf-to-image).
+## Requirements
+- Python 3.14+
+- PyMuPDF library
+## Setup Instructions
+Make sure your environment is ready before running the tool:
+1. Create a virtual environment:
+   ```bash
+   python -m venv venv
+   ```
+2. Activate the virtual environment:
+   ```bash
+   # On Windows (on macOS/Linux, run: source venv/bin/activate)
+   venv\Scripts\activate
+   ```
+3. Install the dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Usage
+Basic usage to convert a PDF:
+```bash
+python pdf_to_image.py "path/to/your/document.pdf" --output "output_folder"
+```
+## Citation
+If you use this tool in your research or project, please cite it as follows:
+```bibtex
+@misc{pdf_to_image_2026,
+  author = {Rembrant Oyangoren Albeos},
+  title = {PDF to Image Python Utility},
+  year = {2026},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/unban-algorembrant/pdf-to-image-python}},
+  note = {cite: {https://github.com/unban-algorembrant/*}}
+}
+```
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

dump/pdf_to_image.py ADDED

	@@ -0,0 +1,38 @@

+import fitz  # PyMuPDF
+import argparse
+import os
+def convert_pdf_to_images(pdf_path, output_dir, dpi=300, image_format="png"):
+    """
+    Convert a PDF to a series of images.
+    """
+    os.makedirs(output_dir, exist_ok=True)
+    try:
+        # The context manager closes the document even if rendering fails
+        with fitz.open(pdf_path) as doc:
+            print(f"Opened {pdf_path}. Total pages: {len(doc)}")
+            for page_num in range(len(doc)):
+                page = doc.load_page(page_num)
+                # alpha=False ensures a white background instead of transparent
+                pix = page.get_pixmap(dpi=dpi, alpha=False)
+                output_file = os.path.join(output_dir, f"page_{page_num + 1:02d}.{image_format}")
+                pix.save(output_file)
+                print(f"Saved {output_file}")
+        print("Conversion complete.")
+    except Exception as e:
+        print(f"Error processing PDF: {e}")
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Convert a PDF to images.")
+    parser.add_argument("pdf_path", help="Path to the input PDF file")
+    parser.add_argument("--output", "-o", default="output", help="Output directory")
+    parser.add_argument("--dpi", type=int, default=300, help="Output image DPI (default: 300)")
+    parser.add_argument("--format", "-f", default="png", help="Output image format (default: png)")
+    args = parser.parse_args()
+    convert_pdf_to_images(args.pdf_path, args.output, args.dpi, args.format)
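
Review note on the output names: the `page_{n:02d}` pattern used here and in `all_page.py` pads to two digits, so name-ordered listings stay in page order only up to 99 pages (`page_100.png` sorts before `page_11.png`). A minimal sketch of padding based on the page count follows; `page_filename` is a hypothetical helper, not part of this commit.

```python
def page_filename(page_number: int, total_pages: int, image_format: str = "png") -> str:
    """Build a file name whose zero padding is wide enough for the last page number."""
    width = max(2, len(str(total_pages)))  # keep the existing two-digit minimum
    return f"page_{page_number:0{width}d}.{image_format}"


short_doc = page_filename(7, 12)
long_doc = page_filename(7, 250)
print(short_doc, long_doc)
```

With this scheme, sorting the file names alphabetically gives the same order as sorting by page number, which matters for tools that consume the images as a sequence.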

dump/requirements.txt ADDED

	@@ -0,0 +1 @@


+PyMuPDF

requirements.txt ADDED

	@@ -0,0 +1,5 @@

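+# PyMuPDF is the only third-party dependency; it is imported as `fitz`.
+PyMuPDF
+# Do not list `fitz` here: the PyPI package of that name is unrelated and
+# conflicts with PyMuPDF. `argparse`, `os`, and `glob` are standard-library
+# modules and must not be installed with pip.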

single_page.py ADDED

	@@ -0,0 +1,75 @@

+import fitz  # PyMuPDF
+import argparse
+import os
+import glob
+def convert_pdf_to_images(pdf_path, output_dir, dpi=300, image_format="png"):
+    """
+    Convert the FIRST PAGE of a PDF to an image, perfectly maintaining native page proportions.
+    """
+    # Create a dedicated subfolder for each PDF's images to keep things organized
+    pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
+    pdf_output_dir = os.path.join(output_dir, pdf_name)
+    os.makedirs(pdf_output_dir, exist_ok=True)
+    try:
+        # The context manager closes the document on every exit path, including the early return
+        with fitz.open(pdf_path) as doc:
+            print(f"Processing '{pdf_name}' | Total pages: {len(doc)} (Extracting page 1 only)")
+            # Prevent errors if an empty PDF is somehow loaded
+            if len(doc) == 0:
+                print(f"  -> Skipping '{pdf_name}': PDF has no pages.")
+                return
+            # PDF page sizes are measured in points (1/72 inch), so zoom 1.0 renders at 72 DPI.
+            # Scale both axes by dpi / 72 to reach the requested DPI.
+            zoom = dpi / 72.0
+            mat = fitz.Matrix(zoom, zoom)  # Uniform scaling keeps the aspect ratio of any page size
+            # Load ONLY the first page (Index 0)
+            page = doc.load_page(0)
+            # Render through the scaling matrix; alpha=False omits the transparency channel
+            pix = page.get_pixmap(matrix=mat, alpha=False)
+            output_file = os.path.join(pdf_output_dir, f"cover_page.{image_format}")
+            pix.save(output_file)
+            print(f"  -> Saved first page (Dynamic Resolution: {pix.width}x{pix.height})")
+    except Exception as e:
+        print(f"Error processing {pdf_path}: {e}")
+def process_all_pdfs(input_path, output_dir, dpi=300, image_format="png"):
+    """Determines if the input is a single file or a directory of files."""
+    if os.path.isfile(input_path):
+        convert_pdf_to_images(input_path, output_dir, dpi, image_format)
+    elif os.path.isdir(input_path):
+        # glob returns matches in arbitrary order; sort for a predictable batch order
+        pdf_files = sorted(glob.glob(os.path.join(input_path, "*.pdf")))
+        if not pdf_files:
+            print(f"No PDF files found in directory: {input_path}")
+            return
+        print(f"Found {len(pdf_files)} PDF(s). Starting batch conversion of first pages...\n")
+        for pdf in pdf_files:
+            convert_pdf_to_images(pdf, output_dir, dpi, image_format)
+            print("-" * 40)
+        print("All batch conversions complete!")
+    else:
+        print("Invalid input path. Please provide a valid PDF file or folder.")
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Convert the first page of PDFs to auto-adjusting high-quality images.")
+    parser.add_argument("input_path", help="Path to a single PDF file OR a folder containing PDFs")
+    parser.add_argument("--output", "-o", default="output", help="Output directory (default: 'output')")
+    parser.add_argument("--dpi", type=int, default=300, help="Output image DPI (default: 300)")
+    parser.add_argument("--format", "-f", default="png", help="Output image format (default: png)")
+    args = parser.parse_args()
+    # Ensure base output directory exists
+    os.makedirs(args.output, exist_ok=True)
+    process_all_pdfs(args.input_path, args.output, args.dpi, args.format)
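
Review note on the command-line interface shared by `single_page.py` and `all_page.py`: `--dpi` accepts zero or negative values and `--format` accepts any string, so mistakes surface only when rendering or saving fails. The sketch below shows one way to reject them at parse time. `positive_int`, `build_parser`, and `SUPPORTED_FORMATS` are hypothetical names, and the format tuple is an assumption that should be adjusted to what the installed PyMuPDF version can write.

```python
import argparse

# Assumption: formats the installed PyMuPDF can save; not taken from this commit.
SUPPORTED_FORMATS = ("png", "jpg", "jpeg")


def positive_int(value: str) -> int:
    """argparse type that accepts only integers greater than zero."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert PDFs to images.")
    parser.add_argument("input_path", help="Path to a single PDF file OR a folder containing PDFs")
    parser.add_argument("--output", "-o", default="output", help="Output directory (default: 'output')")
    parser.add_argument("--dpi", type=positive_int, default=300, help="Output image DPI (default: 300)")
    parser.add_argument("--format", "-f", default="png", choices=SUPPORTED_FORMATS,
                        help="Output image format (default: png)")
    return parser


args = build_parser().parse_args(["PDF", "--dpi", "150", "-f", "jpg"])
print(args.input_path, args.output, args.dpi, args.format)
```

With `choices` and a validating `type`, argparse prints a usage error and exits before any PDF is opened, instead of reporting the same failure once per file.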