Commit d6bbfc5 by Alessio Vertemati (1 parent: f531a74)
Add proper readme
README.md ADDED
@@ -0,0 +1,138 @@
---
title: PDF Incorporate
emoji: 📦
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.3.0
python_version: 3.12
app_file: app.py
pinned: false
license: mit
---

# 📦 PDF Incorporate

Transform your PDF documents into data containers by incorporating files directly within them. PDFs aren't just static documents: they can carry datasets, supplementary files, and supporting materials alongside your content.

## Overview

PDF Incorporate is a web application for embedding files as attachments in PDF documents. Built with [Gradio](https://gradio.app/) and powered by [Parxy](https://github.com/OneOffTech/parxy), it provides an interface for viewing, adding, and downloading PDF attachments.

## Why Embed Files in PDFs?

- **📊 Research & Reports**: Attach raw datasets, analysis scripts, or supplementary tables to academic papers and technical reports
- **📈 Business Documents**: Include spreadsheets, financial data, or supporting evidence within proposals and presentations
- **📝 Documentation**: Bundle configuration files, code samples, or reference materials with technical documentation
- **🔗 Data Provenance**: Keep source data and processed documents together for complete traceability
- **✉️ Simplified Sharing**: Send one file instead of managing multiple attachments, so everything travels together

## Features

- **Upload PDFs**: Support for PDF files up to 25 MB
- **View Existing Attachments**: Automatically detects and displays files already embedded in uploaded PDFs
- **Add New Attachments**: Embed files (up to 10 MB each) with custom descriptions
- **Download Attachments**: Extract and download individual files from PDFs
- **Session Isolation**: Multiple users can work simultaneously without interference
- **Self-Contained Output**: Creates a new PDF with all attachments embedded

## Installation

### Prerequisites

- Python 3.10 or higher (required by Gradio 6; the Space itself is configured for Python 3.12)
- [uv](https://github.com/astral-sh/uv) package manager (recommended) or pip

### Setup

1. Clone the repository:
   ```bash
   git clone https://huggingface.co/spaces/oneofftech/pdf-incorporate
   cd pdf-incorporate
   ```

2. Install dependencies using uv:
   ```bash
   uv sync
   ```

   Or using pip:
   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Create a `.env` file for environment variables if needed.

## Usage

### Running Locally

Start the application:
```bash
uv run python app.py
```

Or with standard Python:
```bash
python app.py
```

The application starts a local server, by default at `http://127.0.0.1:7860`. Open that address in your web browser if it does not open automatically.

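To check from a script whether the local server is up, a small probe of that address is enough. The sketch below is not part of `app.py`: `is_listening` is a hypothetical helper, and its defaults are simply the address quoted above.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "up" if is_listening() else "not reachable"
    print(f"Server on 127.0.0.1:7860 is {state}")
```

A successful probe only shows that some process is listening on the port, not that it is this application.
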
### Using the Application

1. **Upload a PDF**: Select a PDF file (max 25 MB) to work with
2. **View Existing Attachments**: Any files already embedded in the PDF will be displayed
3. **Add New Attachments**:
   - Select a file to attach (max 10 MB)
   - Provide a description
   - Click "Add Attachment"
4. **Download Attachments**: Select an attachment from the dropdown and click download to extract it
5. **Process PDF**: Click "Incorporate Attachments" to create a new PDF with all attachments embedded

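The steps above amount to a small per-session workflow: note what is already embedded, queue new files with their descriptions, then write everything out together. The following is a plain-Python model of that flow, not the application's code. `Session`, `PendingAttachment`, and their methods are hypothetical names, and no real PDF is read or written.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAttachment:
    filename: str
    description: str
    data: bytes


@dataclass
class Session:
    """State for one user's visit (hypothetical model)."""

    pdf_name: str
    # Names of files found in the PDF on upload (step 2)
    existing: list[str] = field(default_factory=list)
    # Files queued by the user but not yet written (step 3)
    pending: list[PendingAttachment] = field(default_factory=list)

    def add_attachment(self, filename: str, description: str, data: bytes) -> None:
        """Step 3: queue a file together with its description."""
        self.pending.append(PendingAttachment(filename, description, data))

    def incorporate(self) -> list[str]:
        """Step 5: return the names the output PDF would contain."""
        names = self.existing + [p.filename for p in self.pending]
        self.existing = names
        self.pending = []
        return names


if __name__ == "__main__":
    session = Session("report.pdf", existing=["appendix.csv"])
    session.add_attachment("raw-data.csv", "Raw measurements", b"a,b\n1,2\n")
    print(session.incorporate())
```

Because each `Session` owns its own lists, two users working at the same time never see each other's queued files.
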
## Technical Details

### Dependencies

- **Gradio**: Web interface framework
- **Parxy**: Gateway for processing PDF documents through various services
- **python-dotenv**: Environment variable management

### Architecture

- **Session Management**: Gradio's per-session `State` keeps each user's data isolated
- **PDF Processing**: `PdfService` from Parxy handles all PDF operations
- **File Size Limits**:
  - PDF uploads: 25 MB maximum
  - Attachments: 10 MB maximum per file

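The limits listed above can be enforced with a size check before any PDF processing starts. This is a minimal sketch, assuming binary megabytes (this document only says "MB") and using hypothetical names throughout. How `app.py` actually performs the check is not shown here.

```python
import os

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
MAX_PDF_BYTES = 25 * MB
MAX_ATTACHMENT_BYTES = 10 * MB


def check_size(size_bytes: int, limit_bytes: int, label: str) -> None:
    """Raise ValueError if size_bytes exceeds limit_bytes."""
    if size_bytes > limit_bytes:
        raise ValueError(
            f"{label} is {size_bytes / MB:.1f} MB; the limit is {limit_bytes // MB} MB"
        )


def check_pdf_upload(path: str) -> None:
    """Validate an uploaded PDF against the 25 MB limit."""
    check_size(os.path.getsize(path), MAX_PDF_BYTES, "PDF")


def check_attachment(path: str) -> None:
    """Validate a single attachment against the 10 MB limit."""
    check_size(os.path.getsize(path), MAX_ATTACHMENT_BYTES, "Attachment")
```

Checking the size on disk first avoids loading an oversized file into memory only to reject it.
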
## Development

### Project Structure

```
pdf-incorporate/
├── app.py              # Main application file
├── pyproject.toml      # Project dependencies
├── uv.lock             # Locked dependencies
├── README.md           # This file
└── .env                # Environment variables (optional)
```

### Key Functions

- `add_attachment_to_pdf()`: Embeds a single file into a PDF
- `remove_attachment_from_pdf()`: Removes an attachment from a PDF
- `list_pdf_attachments()`: Lists all files embedded in a PDF
- `extract_and_download_attachment()`: Extracts a file from a PDF for download

## Credits

Brought to you by [OneOffTech](https://oneofftech.xyz)

Built using [Parxy](https://github.com/OneOffTech/parxy).

## License

MIT License - See LICENSE file for details