---
title: HTML to PDF Converter
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
health_check:
  path: /health
---

# HTML to PDF Converter API 📄

Convert HTML files to PDF with automatic image embedding and page break management. Perfect for generating reports, presentations, and documents from HTML.

## 🚀 Quick Start

### Basic Conversion (HTML only)

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@your_file.html" \
  -o output.pdf
```

### With Images

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@report.html" \
  -F "images=@image1.png" \
  -F "images=@image2.jpg" \
  -F "images=@logo.svg" \
  -o output.pdf
```

### Custom Aspect Ratio

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@presentation.html" \
  -F "aspect_ratio=16:9" \
  -F "auto_detect=false" \
  -o slides.pdf
```

## 📋 API Endpoints

### `POST /convert`

Convert HTML file to PDF with optional images.

**Parameters:**
- `html_file` (required): HTML file to convert
- `images` (optional): Image files referenced in HTML (can upload multiple)
- `aspect_ratio` (optional): `16:9`, `1:1`, or `9:16`
- `auto_detect` (optional): Auto-detect aspect ratio from HTML (default: `true`)

**Response:**
- PDF file (application/pdf)
- Headers include metadata: aspect ratio, number of normalized image paths, PDF size

### `POST /convert-string`

Convert HTML string to PDF (for HTML without external images).

**Parameters:**
- `html_content` (required): HTML content as string
- `aspect_ratio` (optional): `16:9`, `1:1`, or `9:16`
- `auto_detect` (optional): Auto-detect aspect ratio (default: `true`)

**Example:**

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert-string \
  -F "html_content=<html><body><h1>Hello World</h1></body></html>" \
  -o output.pdf
```
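For callers without curl, the same request can be built with the Python standard library alone. This is a minimal sketch: the helper name `build_convert_string_request` is hypothetical, the form fields are sent as multipart data to mirror what `curl -F` does, and `auto_detect=false` is added whenever a ratio is given, following the troubleshooting advice in this README.

```python
import urllib.request
import uuid

BASE_URL = "https://abdallalswaiti-htmlpdfs.hf.space"


def build_convert_string_request(html, aspect_ratio=None, base_url=BASE_URL):
    """Build (but do not send) a multipart POST for /convert-string."""
    fields = {"html_content": html}
    if aspect_ratio is not None:
        # Turn off auto-detection so the explicit ratio is the one applied.
        fields["aspect_ratio"] = aspect_ratio
        fields["auto_detect"] = "false"

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")

    return urllib.request.Request(
        f"{base_url}/convert-string",
        data="".join(parts).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def convert_string(html, out_path, aspect_ratio=None):
    """Send the request and write the returned PDF bytes to out_path."""
    req = build_convert_string_request(html, aspect_ratio)
    with urllib.request.urlopen(req, timeout=60) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

Calling `convert_string("<h1>Hello World</h1>", "output.pdf", aspect_ratio="1:1")` would then save the result locally.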

### `GET /health`

Health check endpoint.

```bash
curl https://abdallalswaiti-htmlpdfs.hf.space/health
```

## 🎨 Features

### ✅ Automatic Image Path Normalization

The API automatically converts complex image paths to simple filenames:

**Before:**
```html
<img src="../../../assets/images/logo.png">
<img src="images/photo.jpg">
```

**After (automatically):**
```html
<img src="logo.png">
<img src="photo.jpg">
```

Just upload your images with the `images` parameter, and they'll work!
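The rewrite shown above can be approximated in a few lines of Python. This is an illustrative sketch, not the Space's actual implementation: the function name `normalize_image_paths` and the regular expressions are assumptions, and it only handles `<img src>` attributes and CSS `url(...)` references.

```python
import posixpath
import re

# <img ... src="..."> attributes and CSS url(...) references.
_IMG_SRC_RE = re.compile(r'''(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)
_CSS_URL_RE = re.compile(r'''url\(\s*(["']?)(.*?)\1\s*\)''', re.IGNORECASE)
# Absolute URLs, data URIs and protocol-relative URLs are left untouched.
_REMOTE_RE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def normalize_image_paths(html: str) -> tuple[str, int]:
    """Return (rewritten_html, number_of_paths_changed)."""
    changed = 0

    def to_filename(path: str) -> str:
        nonlocal changed
        if _REMOTE_RE.match(path):
            return path
        name = posixpath.basename(path)
        if name and name != path:
            changed += 1
            return name
        return path

    html = _IMG_SRC_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{to_filename(m.group(3))}{m.group(2)}",
        html,
    )
    html = _CSS_URL_RE.sub(
        lambda m: f"url({m.group(1)}{to_filename(m.group(2))}{m.group(1)})",
        html,
    )
    return html, changed
```

The returned count plays a role similar to the `X-Path-Replacements` response header, though the service may count replacements differently.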

### ✅ Aspect Ratio Detection

The API automatically detects aspect ratio from:
- HTML `<meta name="viewport">` tags
- CSS `aspect-ratio` properties
- Keywords like "presentation", "slide"

Supported ratios:
- **16:9** - Landscape (presentations, slides) → A4 Landscape
- **9:16** - Portrait (reports, documents) → A4 Portrait
- **1:1** - Square (social media posts) → 210mm × 210mm
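To make the detection signals concrete, here is a rough heuristic in Python. It is a sketch under stated assumptions, not the service's real logic: the function name `detect_aspect_ratio`, the order in which signals are checked, the `orientation=` viewport hint (taken from the HTML samples in this README), and the `9:16` fallback are all guesses.

```python
import re

SUPPORTED_RATIOS = ("16:9", "1:1", "9:16")


def detect_aspect_ratio(html: str, default: str = "9:16") -> str:
    """Guess an aspect ratio from hints in the HTML (illustrative only)."""
    text = html.lower()

    # 1. An explicit CSS property such as "aspect-ratio: 16 / 9".
    match = re.search(r"aspect-ratio\s*:\s*(\d+)\s*/\s*(\d+)", text)
    if match:
        ratio = f"{int(match.group(1))}:{int(match.group(2))}"
        if ratio in SUPPORTED_RATIOS:
            return ratio

    # 2. An orientation hint inside the viewport meta tag.
    viewport = re.search(r"<meta[^>]*name=[\"']viewport[\"'][^>]*>", text)
    if viewport:
        if "orientation=landscape" in viewport.group(0):
            return "16:9"
        if "orientation=portrait" in viewport.group(0):
            return "9:16"

    # 3. Keywords that suggest a slide deck.
    if re.search(r"\b(?:presentation|slide)s?\b", text):
        return "16:9"

    # Assumed fallback: treat everything else as a portrait document.
    return default
```

If the heuristic on the server picks the wrong ratio for a document, passing `aspect_ratio` with `auto_detect=false` bypasses detection entirely.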

### ✅ Automatic Page Breaks

The API intelligently handles page breaks:
- Elements with classes: `.page`, `.slide`, `section.page`
- Top-level `<section>`, `<article>`, `<div>` elements
- Prevents breaking inside: headings, images, tables, code blocks

### ✅ Color Preservation

All colors, backgrounds, and gradients are preserved in the PDF with `print-color-adjust: exact`.

## 💡 Usage Examples

### Example 1: Simple Report

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@report.html" \
  -o report.pdf
```

### Example 2: Presentation with Images

```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@slides.html" \
  -F "images=@chart1.png" \
  -F "images=@chart2.png" \
  -F "images=@logo.svg" \
  -F "aspect_ratio=16:9" \
  -o presentation.pdf
```

### Example 3: Multiple Images from Directory

```bash
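args=()  # one -F flag per image; an array keeps filenames with spaces intact
for img in images/*.png images/*.jpg; do [ -e "$img" ] && args+=(-F "images=@$img"); done
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@document.html" \
  "${args[@]}" \
  -o document.pdf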
```

### Example 4: Python Script

```python
import requests

# Optional parameters
data = {
    'aspect_ratio': '9:16',
    'auto_detect': 'false'
}

# Open the HTML file and images; the context manager closes them after the upload
with open('report.html', 'rb') as html, \
     open('image1.png', 'rb') as img1, \
     open('image2.jpg', 'rb') as img2:
    # A list of tuples lets the 'images' field repeat, one entry per file
    files = [
        ('html_file', html),
        ('images', img1),
        ('images', img2),
    ]

    # Make request
    response = requests.post(
        'https://abdallalswaiti-htmlpdfs.hf.space/convert',
        files=files,
        data=data,
    )

# Save PDF
if response.status_code == 200:
    with open('output.pdf', 'wb') as f:
        f.write(response.content)
    print("PDF generated successfully!")
else:
    print(f"Error: {response.status_code}")
    print(response.text)
```

### Example 5: JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch');

async function convertToPDF() {
    const form = new FormData();
    
    // Add HTML file
    form.append('html_file', fs.createReadStream('report.html'));
    
    // Add images
    form.append('images', fs.createReadStream('image1.png'));
    form.append('images', fs.createReadStream('image2.jpg'));
    
    // Optional parameters
    form.append('aspect_ratio', '9:16');
    
    const response = await fetch(
        'https://abdallalswaiti-htmlpdfs.hf.space/convert',
        {
            method: 'POST',
            body: form
        }
    );
    
    if (response.ok) {
        const buffer = await response.arrayBuffer();
        fs.writeFileSync('output.pdf', Buffer.from(buffer));
        console.log('PDF generated successfully!');
    } else {
        console.error('Error:', await response.text());
    }
}

convertToPDF();
```

## 📝 HTML Best Practices

### For Multi-Page Documents

Use page classes to control page breaks:

```html
<div class="page">
    <h1>Page 1</h1>
    <p>Content here...</p>
</div>

<div class="page">
    <h1>Page 2</h1>
    <p>More content...</p>
</div>
```

### For Presentations (16:9)

```html
<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=landscape">
    <style>
        .slide {
            width: 100vw;
            height: 100vh;
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
        }
    </style>
</head>
<body>
    <div class="slide">
        <h1>Slide 1</h1>
        <img src="chart.png" alt="Chart">
    </div>
    
    <div class="slide">
        <h1>Slide 2</h1>
        <img src="graph.png" alt="Graph">
    </div>
</body>
</html>
```

### For Reports (9:16)

```html
<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=portrait">
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
        }
        .page {
            min-height: 100vh;
        }
    </style>
</head>
<body>
    <section class="page">
        <h1>Annual Report 2024</h1>
        <img src="logo.png" alt="Logo" style="width: 200px;">
        <p>Report content...</p>
    </section>
</body>
</html>
```

## 🎯 Image Handling

### Supported Formats
- PNG, JPG/JPEG
- GIF, SVG
- WebP, BMP

### Image Path Examples

Your HTML can have **any** of these formats:
```html
<!-- All of these work! -->
<img src="logo.png">
<img src="images/logo.png">
<img src="../../../assets/images/logo.png">
<img src="./photos/image.jpg">

<!-- CSS backgrounds too -->
<div style="background-image: url('bg.jpg')"></div>
<div style="background-image: url('../images/bg.jpg')"></div>
```

Just upload the images:
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@index.html" \
  -F "images=@logo.png" \
  -F "images=@bg.jpg" \
  -o output.pdf
```

The API automatically:
1. Extracts filenames from paths
2. Normalizes all references to simple filenames
3. Saves images to the same directory as HTML
4. Generates PDF with all images embedded

## 🔧 Troubleshooting

### Images Not Showing
- Ensure image filenames match exactly (case-sensitive)
- Upload ALL images referenced in your HTML
- Check the `X-Path-Replacements` response header to confirm that image paths were normalized (the API does this automatically)
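
A quick client-side check before uploading can catch missing or mis-cased files. The sketch below is an assumption-laden helper, not part of the API: `referenced_images` and `missing_uploads` are hypothetical names, and the regular expressions only cover `<img src>` attributes and CSS `url(...)` references.

```python
import posixpath
import re

_REMOTE_RE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def referenced_images(html: str) -> set[str]:
    """Collect the filenames of local images referenced by the HTML."""
    refs = re.findall(r'''<img\b[^>]*?\bsrc\s*=\s*["'](.*?)["']''', html, re.IGNORECASE)
    refs += re.findall(r'''url\(\s*["']?(.*?)["']?\s*\)''', html, re.IGNORECASE)
    return {
        posixpath.basename(ref)
        for ref in refs
        if ref and not _REMOTE_RE.match(ref)
    }


def missing_uploads(html: str, upload_paths: list[str]) -> list[str]:
    """Return referenced filenames with no exact (case-sensitive) match."""
    uploaded = {posixpath.basename(path) for path in upload_paths}
    return sorted(referenced_images(html) - uploaded)
```

An empty list from `missing_uploads(html, paths)` suggests every referenced image has a matching upload; any names it returns are worth checking for typos or case differences.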

### Wrong Aspect Ratio
- Set `auto_detect=false` and specify `aspect_ratio` manually
- Check the HTML for viewport meta tags, CSS `aspect-ratio` properties, or keywords such as "slide" that auto-detection may be picking up

### Page Breaks in Wrong Places
- Add `class="no-page-break"` to elements that should stay together
- Use `class="page-break"` to force breaks at specific points

### PDF Too Large
- Optimize images before uploading (compress, resize)
- Use appropriate image formats (WebP for photos, PNG for graphics)

## 📊 Response Headers

The API includes useful metadata in response headers:

- `X-Aspect-Ratio`: Detected or specified aspect ratio
- `X-Path-Replacements`: Number of image paths normalized
- `X-PDF-Size`: Size of generated PDF in bytes

**Example:**
```bash
# -D - prints the response headers; -o /dev/null discards the PDF body
curl -s -D - -o /dev/null -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@test.html"

# Response headers:
# X-Aspect-Ratio: 9:16
# X-Path-Replacements: 3
# X-PDF-Size: 245678
```
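
In a script, these headers can be turned into typed values. A minimal sketch, assuming the three header names listed above; the helper name `parse_pdf_metadata` and the returned keys are illustrative.

```python
from typing import Mapping, Optional


def parse_pdf_metadata(headers: Mapping[str, str]) -> dict:
    """Extract the X-* metadata headers into a plain dict."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value)
        return None

    return {
        "aspect_ratio": lowered.get("x-aspect-ratio"),
        "path_replacements": as_int("x-path-replacements"),
        "pdf_size_bytes": as_int("x-pdf-size"),
    }
```

With the `requests` example earlier in this README, `parse_pdf_metadata(response.headers)` would work directly, since `response.headers` behaves like a mapping.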

## 🛠️ Technical Details

- **Engine**: Puppeteer (Chromium-based)
- **Backend**: FastAPI (Python)
- **Max Timeout**: 60 seconds per conversion
- **Page Sizes**:
  - 16:9 → A4 Landscape (297mm × 210mm)
  - 9:16 → A4 Portrait (210mm × 297mm)
  - 1:1 → Square (210mm × 210mm)
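
The mapping above is small enough to express as a lookup table. The sketch below is illustrative rather than the service's code: `PAGE_SIZES_MM` and `pdf_page_options` are hypothetical names, and the returned dict is shaped like the `width`/`height` string options that Puppeteer's `page.pdf()` accepts.

```python
# Page dimensions in millimetres as (width, height), taken from the list above.
PAGE_SIZES_MM = {
    "16:9": (297, 210),  # A4 landscape
    "9:16": (210, 297),  # A4 portrait
    "1:1": (210, 210),   # square
}


def pdf_page_options(aspect_ratio: str) -> dict:
    """Map a supported aspect ratio to width/height strings with units."""
    try:
        width, height = PAGE_SIZES_MM[aspect_ratio]
    except KeyError:
        supported = ", ".join(PAGE_SIZES_MM)
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; expected one of: {supported}"
        ) from None
    return {"width": f"{width}mm", "height": f"{height}mm"}
```

Note that these are paper sizes chosen by orientation, so a `16:9` request yields a 297:210 page rather than an exact 16:9 frame.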

## 📄 License

This API is provided as-is for public use on Hugging Face Spaces.

## 🤝 Support

For issues or questions, please visit the [Space discussion page](https://huggingface.co/spaces/abdallalswaiti/htmlpdfs/discussions).

---

**Made with ❤️ using FastAPI and Puppeteer**