---
title: HTML to PDF Converter
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
health_check:
  path: /health
---
# HTML to PDF Converter API 📄
Convert HTML files to PDF with automatic image embedding and page break management. Perfect for generating reports, presentations, and documents from HTML.
## 🚀 Quick Start
### Basic Conversion (HTML only)
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@your_file.html" \
-o output.pdf
```
### With Images
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@report.html" \
-F "images=@image1.png" \
-F "images=@image2.jpg" \
-F "images=@logo.svg" \
-o output.pdf
```
### Custom Aspect Ratio
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@presentation.html" \
-F "aspect_ratio=16:9" \
-F "auto_detect=false" \
-o slides.pdf
```
## 📋 API Endpoints
### `POST /convert`
Convert HTML file to PDF with optional images.
**Parameters:**
- `html_file` (required): HTML file to convert
- `images` (optional): Image files referenced in HTML (can upload multiple)
- `aspect_ratio` (optional): `16:9`, `1:1`, or `9:16`
- `auto_detect` (optional): Auto-detect aspect ratio from HTML (default: `true`)
**Response:**
- PDF file (application/pdf)
- Headers include metadata: aspect ratio, image count, PDF size
### `POST /convert-string`
Convert HTML string to PDF (for HTML without external images).
**Parameters:**
- `html_content` (required): HTML content as string
- `aspect_ratio` (optional): `16:9`, `1:1`, or `9:16`
- `auto_detect` (optional): Auto-detect aspect ratio (default: `true`)
**Example:**
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert-string \
-F "html_content=<html><body><h1>Hello World</h1></body></html>" \
-o output.pdf
```
### `GET /health`
Health check endpoint.
```bash
curl https://abdallalswaiti-htmlpdfs.hf.space/health
```
## 🎨 Features
### ✅ Automatic Image Path Normalization
The API automatically converts complex image paths to simple filenames:
**Before:**
```html
<img src="../../../assets/images/logo.png">
<img src="images/photo.jpg">
```
**After (automatically):**
```html
<img src="logo.png">
<img src="photo.jpg">
```
Just upload your images with the `images` parameter, and they'll work!
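
To make the rewrite concrete, here is a minimal Python sketch of this kind of normalization. It is not the service's actual code: the regular expressions, the `normalize_image_paths` name, and the choice to leave remote URLs and data URIs untouched are all assumptions made for illustration.

```python
from __future__ import annotations

import posixpath
import re

# Illustrative patterns (assumptions, not the service's real parser):
# src="..." / src='...' attributes and CSS url(...) references.
_SRC_ATTR = re.compile(r"""(\bsrc\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)
_CSS_URL = re.compile(r"""(url\(\s*)(["']?)(.*?)\2(\s*\))""", re.IGNORECASE)
_REMOTE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def normalize_image_paths(html: str) -> tuple[str, int]:
    """Rewrite local references to bare filenames.

    Returns the rewritten HTML and the number of references changed.
    """
    count = 0

    def basename(path: str) -> str:
        nonlocal count
        if _REMOTE.match(path):
            return path  # remote URL or data URI: leave as-is
        name = posixpath.basename(path)
        if not name or name == path:
            return path  # already a bare filename (or a directory path)
        count += 1
        return name

    html = _SRC_ATTR.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{basename(m.group(3))}{m.group(2)}",
        html,
    )
    html = _CSS_URL.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{basename(m.group(3))}{m.group(2)}{m.group(4)}",
        html,
    )
    return html, count
```

Running a comparable function locally before uploading can help you predict which filenames the uploaded images need to have.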
### ✅ Aspect Ratio Detection
The API automatically detects aspect ratio from:
- HTML `<meta name="viewport">` tags
- CSS `aspect-ratio` properties
- Keywords like "presentation", "slide"
Supported ratios:
- **16:9** - Landscape (presentations, slides) → A4 Landscape
- **9:16** - Portrait (reports, documents) → A4 Portrait
- **1:1** - Square (social media posts) → 210mm × 210mm
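
As a rough illustration of how the three signals listed above might be combined, here is a hedged Python sketch. The order of the checks, the `orientation=` viewport hint, the regular expressions, and the `detect_aspect_ratio` name are assumptions; the service's real heuristics may differ.

```python
import re
from typing import Optional

SUPPORTED_RATIOS = {"16:9", "1:1", "9:16"}


def detect_aspect_ratio(html: str) -> Optional[str]:
    """Guess '16:9', '1:1', or '9:16' from HTML, or return None if unsure."""
    text = html.lower()

    # 1. Viewport meta tag carrying an orientation hint (assumed convention).
    viewport = re.search(r'<meta[^>]*name=["\']viewport["\'][^>]*>', text)
    if viewport:
        if "orientation=landscape" in viewport.group(0):
            return "16:9"
        if "orientation=portrait" in viewport.group(0):
            return "9:16"

    # 2. CSS declarations such as `aspect-ratio: 16 / 9`.
    css = re.search(r"aspect-ratio\s*:\s*(\d+)\s*/\s*(\d+)", text)
    if css:
        ratio = f"{int(css.group(1))}:{int(css.group(2))}"
        if ratio in SUPPORTED_RATIOS:
            return ratio

    # 3. Keywords that suggest a slide deck.
    if re.search(r"\b(presentation|slide)s?\b", text):
        return "16:9"

    return None  # no signal found; the caller decides on a default
```

If a guess like this turns out wrong for your document, passing `auto_detect=false` together with an explicit `aspect_ratio` bypasses detection entirely.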
### ✅ Automatic Page Breaks
The API intelligently handles page breaks:
- Elements with classes: `.page`, `.slide`, `section.page`
- Top-level `<section>`, `<article>`, `<div>` elements
- Prevents breaking inside: headings, images, tables, code blocks
### ✅ Color Preservation
All colors, backgrounds, and gradients are preserved in the PDF with `print-color-adjust: exact`.
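
The page-break and color behavior described above corresponds to standard print CSS. The sketch below shows one way such rules could be injected into a document before rendering. The exact selectors, the `PRINT_CSS` constant, and the `inject_print_css` helper are illustrative assumptions, not the stylesheet the service actually uses.

```python
PRINT_CSS = """
<style>
  /* Keep backgrounds, colors, and gradients when printing. */
  * { -webkit-print-color-adjust: exact; print-color-adjust: exact; }
  /* Start a new page after each page-like container. */
  .page, .slide, section.page { break-after: page; page-break-after: always; }
  .page:last-child, .slide:last-child { break-after: auto; page-break-after: auto; }
  /* Avoid splitting these elements across pages. */
  h1, h2, h3, h4, h5, h6, img, table, pre {
    break-inside: avoid;
    page-break-inside: avoid;
  }
</style>
"""


def inject_print_css(html: str) -> str:
    """Insert PRINT_CSS just before </head>, or prepend it if there is no head."""
    index = html.lower().find("</head>")
    if index == -1:
        return PRINT_CSS + html
    return html[:index] + PRINT_CSS + html[index:]
```

Adding similar rules to your own HTML lets you preview the layout with a browser's print preview before sending the file to the API.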
## 💡 Usage Examples
### Example 1: Simple Report
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@report.html" \
-o report.pdf
```
### Example 2: Presentation with Images
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@slides.html" \
-F "images=@chart1.png" \
-F "images=@chart2.png" \
-F "images=@logo.svg" \
-F "aspect_ratio=16:9" \
-o presentation.pdf
```
### Example 3: Multiple Images from Directory
```bash
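args=()  # one -F pair per image; an array keeps filenames with spaces intact
for img in images/*.png images/*.jpg; do [ -e "$img" ] && args+=(-F "images=@$img"); done
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@document.html" \
"${args[@]}" \
-o document.pdf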
```
### Example 4: Python Script
```python
import requests

URL = 'https://abdallalswaiti-htmlpdfs.hf.space/convert'

# Optional parameters
data = {
    'aspect_ratio': '9:16',
    'auto_detect': 'false',
}

# Open the HTML file and images; the context manager closes them afterwards
with open('report.html', 'rb') as html, \
        open('image1.png', 'rb') as img1, \
        open('image2.jpg', 'rb') as img2:
    # A list of tuples lets the 'images' field appear more than once
    files = [
        ('html_file', html),
        ('images', img1),
        ('images', img2),
    ]
    # Make request
    response = requests.post(URL, files=files, data=data)

# Save PDF
if response.status_code == 200:
    with open('output.pdf', 'wb') as f:
        f.write(response.content)
    print("PDF generated successfully!")
else:
    print(f"Error: {response.status_code}")
    print(response.text)
```
### Example 5: JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require('node-fetch');

async function convertToPDF() {
  const form = new FormData();

  // Add HTML file
  form.append('html_file', fs.createReadStream('report.html'));

  // Add images
  form.append('images', fs.createReadStream('image1.png'));
  form.append('images', fs.createReadStream('image2.jpg'));

  // Optional parameters
  form.append('aspect_ratio', '9:16');

  const response = await fetch(
    'https://abdallalswaiti-htmlpdfs.hf.space/convert',
    {
      method: 'POST',
      body: form
    }
  );

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync('output.pdf', Buffer.from(buffer));
    console.log('PDF generated successfully!');
  } else {
    console.error('Error:', await response.text());
  }
}

convertToPDF();
```
## 📝 HTML Best Practices
### For Multi-Page Documents
Use page classes to control page breaks:
```html
<div class="page">
  <h1>Page 1</h1>
  <p>Content here...</p>
</div>
<div class="page">
  <h1>Page 2</h1>
  <p>More content...</p>
</div>
```
### For Presentations (16:9)
```html
<!DOCTYPE html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=landscape">
  <style>
    .slide {
      width: 100vw;
      height: 100vh;
      display: flex;
      flex-direction: column;
      justify-content: center;
      align-items: center;
    }
  </style>
</head>
<body>
  <div class="slide">
    <h1>Slide 1</h1>
    <img src="chart.png" alt="Chart">
  </div>
  <div class="slide">
    <h1>Slide 2</h1>
    <img src="graph.png" alt="Graph">
  </div>
</body>
</html>
```
### For Reports (9:16)
```html
<!DOCTYPE html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=portrait">
  <style>
    body {
      font-family: Arial, sans-serif;
      padding: 20px;
    }
    .page {
      min-height: 100vh;
    }
  </style>
</head>
<body>
  <section class="page">
    <h1>Annual Report 2024</h1>
    <img src="logo.png" alt="Logo" style="width: 200px;">
    <p>Report content...</p>
  </section>
</body>
</html>
```
## 🎯 Image Handling
### Supported Formats
- PNG, JPG/JPEG
- GIF, SVG
- WebP, BMP
### Image Path Examples
Your HTML can have **any** of these formats:
```html
<!-- All of these work! -->
<img src="logo.png">
<img src="images/logo.png">
<img src="../../../assets/images/logo.png">
<img src="./photos/image.jpg">
<!-- CSS backgrounds too -->
<div style="background-image: url('bg.jpg')"></div>
<div style="background-image: url('../images/bg.jpg')"></div>
```
Just upload the images:
```bash
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@index.html" \
-F "images=@logo.png" \
-F "images=@bg.jpg" \
-o output.pdf
```
The API automatically:
1. Extracts filenames from paths
2. Normalizes all references to simple filenames
3. Saves images to the same directory as HTML
4. Generates PDF with all images embedded
## 🔧 Troubleshooting
### Images Not Showing
- Ensure image filenames match exactly (case-sensitive)
- Upload ALL images referenced in your HTML
- Check the `X-Path-Replacements` response header to confirm that image paths were normalized (the API does this automatically)
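
A quick local check can catch missing or mis-cased uploads before you call the API. The following is a minimal sketch that runs on your own machine. The `referenced_images` and `missing_images` helpers and the regular expression are hypothetical, and the sketch ignores filenames that contain spaces.

```python
from __future__ import annotations

import posixpath
import re
from pathlib import Path

# Captures the target of src="..." attributes and CSS url(...) references.
_REF = re.compile(r"""(?:\bsrc\s*=\s*["']|url\(\s*["']?)([^"')\s>]+)""", re.IGNORECASE)
_REMOTE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)
_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".bmp"}


def referenced_images(html: str) -> set[str]:
    """Return the bare filenames of local images referenced in the HTML."""
    names = set()
    for ref in _REF.findall(html):
        if _REMOTE.match(ref):
            continue  # remote URL or data URI: nothing to upload
        name = posixpath.basename(ref.split("?")[0].split("#")[0])
        if Path(name).suffix.lower() in _IMAGE_EXTS:
            names.add(name)
    return names


def missing_images(html: str, upload_paths: list[str]) -> set[str]:
    """Return referenced filenames with no exact (case-sensitive) upload match."""
    uploaded = {Path(p).name for p in upload_paths}
    return referenced_images(html) - uploaded
```

For example, `missing_images(open("report.html").read(), ["logo.png"])` returns the filenames you still need to pass with the `images` parameter.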
### Wrong Aspect Ratio
- Set `auto_detect=false` and specify `aspect_ratio` manually
- Check your HTML for viewport meta tags or CSS `aspect-ratio` properties that auto-detection might pick up
### Page Breaks in Wrong Places
- Add `class="no-page-break"` to elements that should stay together
- Use `class="page-break"` to force breaks at specific points
### PDF Too Large
- Optimize images before uploading (compress, resize)
- Use appropriate image formats (WebP for photos, PNG for graphics)
## 📊 Response Headers
The API includes useful metadata in response headers:
- `X-Aspect-Ratio`: Detected or specified aspect ratio
- `X-Path-Replacements`: Number of image paths normalized
- `X-PDF-Size`: Size of generated PDF in bytes
**Example:**
```bash
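# -D - prints the response headers to stdout; -I would conflict with the -F form upload
curl -s -D - -o output.pdf -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
-F "html_file=@test.html"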
# Response headers:
# X-Aspect-Ratio: 9:16
# X-Path-Replacements: 3
# X-PDF-Size: 245678
```
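
In a client script, these headers can be read from the response object. Below is a small sketch that converts the three documented headers into typed values. The `parse_conversion_headers` name and the returned dictionary keys are illustrative choices, not part of the API.

```python
from typing import Mapping, Optional


def parse_conversion_headers(headers: Mapping[str, str]) -> dict:
    """Extract the documented X-* metadata headers into typed values."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value)
        return None  # header missing or not a plain number

    return {
        "aspect_ratio": lowered.get("x-aspect-ratio"),
        "path_replacements": as_int("x-path-replacements"),
        "pdf_size_bytes": as_int("x-pdf-size"),
    }
```

With the `requests` library, this would be called as `parse_conversion_headers(response.headers)` after a successful conversion.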
## 🛠️ Technical Details
- **Engine**: Puppeteer (Chromium-based)
- **Backend**: FastAPI (Python)
- **Max Timeout**: 60 seconds per conversion
- **Page Sizes**:
  - 16:9 → A4 Landscape (297mm × 210mm)
  - 9:16 → A4 Portrait (210mm × 297mm)
  - 1:1 → Square (210mm × 210mm)
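
The page-size mapping above can be written as a small lookup table. The sketch below uses hypothetical names (`PAGE_SIZES_MM`, `page_size_mm`, `page_size_inches`); only the millimetre values come from the list above. Note that A4 has a width-to-height ratio of about 1.41:1, so the `16:9` and `9:16` labels select an orientation rather than an exact 16:9 page.

```python
from __future__ import annotations

# Page sizes in millimetres as (width, height), taken from the list above.
PAGE_SIZES_MM = {
    "16:9": (297, 210),  # A4 landscape
    "9:16": (210, 297),  # A4 portrait
    "1:1": (210, 210),   # square
}

MM_PER_INCH = 25.4


def page_size_mm(aspect_ratio: str) -> tuple[int, int]:
    """Return (width, height) in millimetres for a supported aspect ratio."""
    try:
        return PAGE_SIZES_MM[aspect_ratio]
    except KeyError:
        supported = ", ".join(sorted(PAGE_SIZES_MM))
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; expected one of: {supported}"
        ) from None


def page_size_inches(aspect_ratio: str) -> tuple[float, float]:
    """Return (width, height) in inches, rounded to two decimals."""
    width, height = page_size_mm(aspect_ratio)
    return (round(width / MM_PER_INCH, 2), round(height / MM_PER_INCH, 2))
```

Knowing the physical page size is useful when sizing fixed-width elements in your HTML so they fit the page without scaling.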
## 📄 License
This API is provided as-is for public use on Hugging Face Spaces.
## 🀝 Support
For issues or questions, please visit the [Space discussion page](https://huggingface.co/spaces/abdallalswaiti/htmlpdfs/discussions).
---
**Made with ❤️ using FastAPI and Puppeteer**