metadata
title: PDF Incorporate
emoji: 📦
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.3.0
python_version: 3.12
app_file: app.py
pinned: false
license: mit
📦 PDF Incorporate
Transform your PDF documents into data containers by incorporating files directly within them. PDFs aren't just static documents—they can carry datasets, supplementary files, and supporting materials alongside your content.
Overview
PDF Incorporate is a web-based application that allows you to embed files as attachments within PDF documents. Built with Gradio and powered by Parxy, it provides an interface for managing PDF attachments.
Why Embed Files in PDFs?
- 📊 Research & Reports: Attach raw datasets, analysis scripts, or supplementary tables to academic papers and technical reports
- 📈 Business Documents: Include spreadsheets, financial data, or supporting evidence within proposals and presentations
- 📝 Documentation: Bundle configuration files, code samples, or reference materials with technical documentation
- 🔗 Data Provenance: Keep source data and processed documents together for complete traceability
- ✉️ Simplified Sharing: Send one file instead of managing multiple attachments—everything travels together
Features
- Upload PDFs: Support for PDF files up to 25 MB
- View Existing Attachments: Automatically detects and displays files already embedded in uploaded PDFs
- Add New Attachments: Embed files (up to 10 MB each) with custom descriptions
- Download Attachments: Extract and download individual files from PDFs
- Session Isolation: Multiple users can work simultaneously without interference
- Self-Contained Output: Creates a new PDF with all attachments embedded
Installation
Prerequisites
- Python 3.9 or higher (the hosted Space is pinned to Python 3.12)
- uv package manager (recommended) or pip
Setup
- Clone the repository:
    git clone https://huggingface.co/spaces/oneofftech/pdf-incorporate
    cd pdf-incorporate
- Install dependencies using uv:
    uv sync
  Or using pip:
    pip install -r requirements.txt
- (Optional) Create a .env file for environment variables if needed.
Usage
Running Locally
Start the application:
uv run python app.py
Or with standard Python:
python app.py
The application will launch in your default web browser at http://127.0.0.1:7860
Using the Application
- Upload a PDF: Select a PDF file (max 25 MB) to work with
- View Existing Attachments: Any files already embedded in the PDF will be displayed
- Add New Attachments:
  - Select a file to attach (max 10 MB)
  - Provide a description
  - Click "Add Attachment"
- Download Attachments: Select an attachment from the dropdown and click download to extract it
- Process PDF: Click "Incorporate Attachments" to create a new PDF with all attachments embedded
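The size limits mentioned in the steps above could be enforced with a check along the following lines. This is a minimal sketch, not the code in app.py: the names `check_size` and `check_upload` are hypothetical, and it assumes that "MB" means 2**20 bytes and that uploads arrive as file paths on disk.

```python
import os

MB = 1024 * 1024  # assumption: "MB" in this README is read as 2**20 bytes
MAX_PDF_BYTES = 25 * MB         # limit for the uploaded PDF
MAX_ATTACHMENT_BYTES = 10 * MB  # limit for each attached file


def check_size(path: str, limit_bytes: int) -> None:
    """Raise ValueError if the file at `path` is larger than `limit_bytes`."""
    size = os.path.getsize(path)
    if size > limit_bytes:
        name = os.path.basename(path)
        raise ValueError(
            f"{name} is {size / MB:.1f} MB; the limit is {limit_bytes / MB:.1f} MB"
        )


def check_upload(pdf_path: str, attachment_paths=()) -> None:
    """Validate a PDF and its attachments before any processing starts."""
    if not pdf_path.lower().endswith(".pdf"):
        raise ValueError("expected a .pdf file")
    check_size(pdf_path, MAX_PDF_BYTES)
    for attachment in attachment_paths:
        check_size(attachment, MAX_ATTACHMENT_BYTES)
```

Calling `check_upload("report.pdf", ["data.csv"])` returns silently when both files are within their limits and raises `ValueError` with a readable message otherwise, which a UI handler could surface to the user.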
Technical Details
Dependencies
- Gradio: Web interface framework
- Parxy: Gateway to process PDF documents using various services
- python-dotenv: Environment variable management
Architecture
- Session Management: Gradio's State management ensures user isolation
- PDF Processing: PdfService from Parxy handles all PDF operations
- File Size Limits:
  - PDF uploads: 25 MB maximum
  - Attachments: 10 MB maximum per file
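To illustrate the Session Management item above, here is a hedged sketch of how per-user state can be kept with `gr.State`. It is not taken from app.py: `SessionState`, `queue_attachment`, and `build_demo` are hypothetical names, and the UI is reduced to a single button. Gradio is imported inside `build_demo` so the handler can be exercised without starting a server.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SessionState:
    """Hypothetical per-session record: the uploaded PDF and queued attachments."""
    pdf_path: Optional[str] = None
    # Each entry is (file path, description).
    pending: List[Tuple[str, str]] = field(default_factory=list)


def queue_attachment(state: SessionState, file_path: str, description: str) -> SessionState:
    """Return a new state with one more queued attachment; the input is not mutated."""
    return SessionState(state.pdf_path, state.pending + [(file_path, description)])


def build_demo():
    """Wire the handler into a small Gradio UI (gradio is imported lazily)."""
    import gradio as gr

    with gr.Blocks() as demo:
        # gr.State holds a separate copy of this value for every browser session.
        state = gr.State(SessionState())
        attachment = gr.File(label="File to attach", type="filepath")
        description = gr.Textbox(label="Description")
        add_button = gr.Button("Add Attachment")
        add_button.click(
            queue_attachment,
            inputs=[state, attachment, description],
            outputs=state,
        )
    return demo
```

Running `build_demo().launch()` would start the sketch locally. Because the state object is passed into and returned from the handler rather than stored in a module-level variable, two users adding attachments at the same time do not see each other's queues.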
Development
Project Structure
pdf-incorporate/
├── app.py # Main application file
├── pyproject.toml # Project dependencies
├── uv.lock # Locked dependencies
├── README.md # This file
└── .env # Environment variables (optional)
Key Functions
- add_attachment_to_pdf(): Embeds a single file into a PDF
- remove_attachment_from_pdf(): Removes an attachment from a PDF
- list_pdf_attachments(): Lists all files embedded in a PDF
- extract_and_download_attachment(): Extracts a file from a PDF for download
Credits
Brought to you by OneOffTech
Built using Parxy.
License
MIT License - See LICENSE file for details