Alessio Vertemati committed on
Commit d6bbfc5 · 1 Parent(s): f531a74

Add proper readme

  1. README.md +138 -0
README.md ADDED
@@ -0,0 +1,138 @@
+ ---
+ title: PDF Incorporate
+ emoji: 📦
+ colorFrom: blue
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 6.3.0
+ python_version: 3.12
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 📦 PDF Incorporate
+
+ Transform your PDF documents into data containers by incorporating files directly within them. PDFs aren't just static documents: they can carry datasets, supplementary files, and supporting materials alongside your content.
+
+ ## Overview
+
+ PDF Incorporate is a web-based application that lets you embed files as attachments within PDF documents. Built with [Gradio](https://gradio.app/) and powered by [Parxy](https://github.com/OneOffTech/parxy), it provides a browser interface for viewing, adding, and downloading PDF attachments.
+
+ ## Why Embed Files in PDFs?
+
+ - **📊 Research & Reports**: Attach raw datasets, analysis scripts, or supplementary tables to academic papers and technical reports
+ - **📈 Business Documents**: Include spreadsheets, financial data, or supporting evidence within proposals and presentations
+ - **📝 Documentation**: Bundle configuration files, code samples, or reference materials with technical documentation
+ - **🔗 Data Provenance**: Keep source data and processed documents together for complete traceability
+ - **✉️ Simplified Sharing**: Send one file instead of managing multiple attachments, so everything travels together
+
+ ## Features
+
+ - **Upload PDFs**: Support for PDF files up to 25 MB
+ - **View Existing Attachments**: Automatically detects and displays files already embedded in uploaded PDFs
+ - **Add New Attachments**: Embed files (up to 10 MB each) with custom descriptions
+ - **Download Attachments**: Extract and download individual files from PDFs
+ - **Session Isolation**: Multiple users can work simultaneously without interference
+ - **Self-Contained Output**: Creates a new PDF with all attachments embedded
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.9 or higher
+ - [uv](https://github.com/astral-sh/uv) package manager (recommended) or pip
+
+ ### Setup
+
+ 1. Clone the repository:
+    ```bash
+    git clone https://huggingface.co/spaces/oneofftech/pdf-incorporate
+    cd pdf-incorporate
+    ```
+
+ 2. Install dependencies using uv:
+    ```bash
+    uv sync
+    ```
+
+    Or using pip:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. (Optional) Create a `.env` file for environment variables if needed.
+
+ ## Usage
+
+ ### Running Locally
+
+ Start the application:
+ ```bash
+ uv run python app.py
+ ```
+
+ Or with standard Python:
+ ```bash
+ python app.py
+ ```
+
+ The application will launch in your default web browser at `http://127.0.0.1:7860`.
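
The address above matches Gradio's defaults. As a rough sketch, and assuming `app.py` calls `launch()` without a hard-coded host or port (the README does not say either way), Gradio reads the `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables, so the expected URL can be worked out as follows. The helper name `expected_local_url` is hypothetical and not part of this project.

```python
import os
from typing import Mapping, Optional

# Gradio's documented defaults when nothing else is configured.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 7860


def expected_local_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the URL the app is expected to serve on.

    Hypothetical helper: it only mirrors the environment variables that
    Gradio consults; it does not start or inspect a running server.
    """
    env = os.environ if env is None else env
    host = env.get("GRADIO_SERVER_NAME", DEFAULT_HOST)
    port = int(env.get("GRADIO_SERVER_PORT", DEFAULT_PORT))
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(expected_local_url())
```

Under the same assumption, running `GRADIO_SERVER_PORT=8080 uv run python app.py` should serve the app on port 8080. When no port is configured and 7860 is already in use, Gradio may pick the next free port instead, so check the URL printed in the terminal.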
+
+ ### Using the Application
+
+ 1. **Upload a PDF**: Select a PDF file (max 25 MB) to work with
+ 2. **View Existing Attachments**: Any files already embedded in the PDF will be displayed
+ 3. **Add New Attachments**:
+    - Select a file to attach (max 10 MB)
+    - Provide a description
+    - Click "Add Attachment"
+ 4. **Download Attachments**: Select an attachment from the dropdown and click download to extract it
+ 5. **Process PDF**: Click "Incorporate Attachments" to create a new PDF with all attachments embedded
+
+ ## Technical Details
+
+ ### Dependencies
+
+ - **Gradio**: Web interface framework
+ - **Parxy**: Gateway to process PDF documents using various services
+ - **python-dotenv**: Environment variable management
+
+ ### Architecture
+
+ - **Session Management**: Gradio's State management ensures user isolation
+ - **PDF Processing**: PdfService from Parxy handles all PDF operations
+ - **File Size Limits**:
+   - PDF uploads: 25 MB maximum
+   - Attachments: 10 MB maximum per file
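
The session isolation and size limits listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `app.py`: the names `SessionState`, `PendingAttachment`, `check_size`, `set_pdf`, and `queue_attachment` are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and plain Python objects stand in for Gradio's per-session State. Handlers return a new state object instead of mutating shared data, which is the pattern that keeps one user's attachments out of another user's session.

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024                 # assumption: binary megabytes
MAX_PDF_BYTES = 25 * MB          # PDF upload limit stated in the README
MAX_ATTACHMENT_BYTES = 10 * MB   # per-attachment limit stated in the README


@dataclass
class PendingAttachment:
    """A file queued for embedding (hypothetical structure)."""
    filename: str
    description: str
    size_bytes: int


@dataclass
class SessionState:
    """Hypothetical per-session state: one instance per browser session."""
    pdf_name: str = ""
    pending: List[PendingAttachment] = field(default_factory=list)


def check_size(size_bytes: int, limit_bytes: int, label: str) -> None:
    """Raise ValueError when a file is larger than its limit."""
    if size_bytes > limit_bytes:
        raise ValueError(
            f"{label} is {size_bytes / MB:.1f} MB; the limit is {limit_bytes // MB} MB"
        )


def set_pdf(state: SessionState, pdf_name: str, size_bytes: int) -> SessionState:
    """Start working on a new PDF; any previously queued files are dropped."""
    check_size(size_bytes, MAX_PDF_BYTES, "PDF")
    return SessionState(pdf_name=pdf_name, pending=[])


def queue_attachment(
    state: SessionState, filename: str, description: str, size_bytes: int
) -> SessionState:
    """Return a new state with one more queued attachment; the input is untouched."""
    check_size(size_bytes, MAX_ATTACHMENT_BYTES, "Attachment")
    item = PendingAttachment(filename, description, size_bytes)
    return SessionState(pdf_name=state.pdf_name, pending=state.pending + [item])


if __name__ == "__main__":
    alice = set_pdf(SessionState(), "report.pdf", 5 * MB)
    alice = queue_attachment(alice, "data.csv", "Raw dataset", 2 * MB)
    bob = SessionState()  # a second session starts empty
    print(len(alice.pending), len(bob.pending))
```

In a Gradio app, a state object like this would typically be held in a State component and passed into and out of each event handler, so every session works on its own copy.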
+
+ ## Development
+
+ ### Project Structure
+
+ ```
+ pdf-incorporate/
+ ├── app.py           # Main application file
+ ├── pyproject.toml   # Project dependencies
+ ├── uv.lock          # Locked dependencies
+ ├── README.md        # This file
+ └── .env             # Environment variables (optional)
+ ```
+
+ ### Key Functions
+
+ - `add_attachment_to_pdf()`: Embeds a single file into a PDF
+ - `remove_attachment_from_pdf()`: Removes an attachment from a PDF
+ - `list_pdf_attachments()`: Lists all files embedded in a PDF
+ - `extract_and_download_attachment()`: Extracts a file from a PDF for download
+
+ ## Credits
+
+ Brought to you by [OneOffTech](https://oneofftech.xyz).
+
+ Built using [Parxy](https://github.com/OneOffTech/parxy).
+
+ ## License
+
+ MIT License. See the LICENSE file for details.
+