metadata
title: HTML to PDF Converter
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
health_check:
  path: /health

HTML to PDF Converter API 📄

Convert HTML files to PDF with automatic image embedding and page break management. Perfect for generating reports, presentations, and documents from HTML.

🚀 Quick Start

Basic Conversion (HTML only)

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@your_file.html" \
  -o output.pdf

With Images

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@report.html" \
  -F "images=@image1.png" \
  -F "images=@image2.jpg" \
  -F "images=@logo.svg" \
  -o output.pdf

Custom Aspect Ratio

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@presentation.html" \
  -F "aspect_ratio=16:9" \
  -F "auto_detect=false" \
  -o slides.pdf

📋 API Endpoints

POST /convert

Convert an HTML file to PDF, with optional image uploads.

Parameters:

  • html_file (required): HTML file to convert
  • images (optional): Image files referenced in HTML (can upload multiple)
  • aspect_ratio (optional): 16:9, 1:1, or 9:16
  • auto_detect (optional): Auto-detect aspect ratio from HTML (default: true)

Response:

  • PDF file (application/pdf)
  • Response headers carry metadata: aspect ratio, number of normalized image paths, and PDF size (see Response Headers below)

POST /convert-string

Convert an HTML string to PDF (for HTML without external images).

Parameters:

  • html_content (required): HTML content as string
  • aspect_ratio (optional): 16:9, 1:1, or 9:16
  • auto_detect (optional): Auto-detect aspect ratio (default: true)

Example:

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert-string \
  -F "html_content=<html><body><h1>Hello World</h1></body></html>" \
  -o output.pdf
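
If you'd rather not shell out to curl, the same request can be built with the Python standard library alone. This is a minimal sketch, assuming /convert-string reads ordinary multipart form fields exactly as `curl -F` sends them; `encode_multipart` and `convert_string` are hypothetical helper names, not part of the API.

```python
import urllib.request
import uuid


def encode_multipart(fields, files=()):
    """Encode form fields and files as multipart/form-data (what curl -F sends).

    fields: iterable of (name, str) pairs
    files:  iterable of (name, filename, bytes) triples
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, filename, content in files:
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                  "Content-Type: application/octet-stream\r\n\r\n")
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert_string(html_content, aspect_ratio=None,
                   url="https://abdallalswaiti-htmlpdfs.hf.space/convert-string"):
    """POST an HTML string and return the PDF bytes (raises HTTPError on 4xx/5xx)."""
    fields = [("html_content", html_content)]
    if aspect_ratio is not None:
        # Pin the ratio explicitly instead of relying on auto-detection
        fields += [("aspect_ratio", aspect_ratio), ("auto_detect", "false")]
    body, content_type = encode_multipart(fields)
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    # Conversions may take up to 60 s on the server, so leave some headroom
    with urllib.request.urlopen(request, timeout=90) as response:
        return response.read()
```

For example, `pdf = convert_string("<h1>Hello World</h1>", aspect_ratio="1:1")` returns the PDF bytes, which you can write to a file. The optional `files` argument takes `(field, filename, bytes)` triples, so the same encoder could in principle build a /convert request with `html_file` and repeated `images` parts.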

GET /health

Health check endpoint.

curl https://abdallalswaiti-htmlpdfs.hf.space/health
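
Hugging Face Spaces may be paused or asleep when idle, so the first request after a quiet period can be slow or fail while the container starts. Polling /health is one way to wait for it. This is a sketch only: it assumes a healthy service answers with HTTP 200 (the response body is not documented here), and `wait_until_healthy` and its timeout defaults are hypothetical. The `opener` parameter is injectable so the loop can be tested offline.

```python
import time
import urllib.request


def wait_until_healthy(url="https://abdallalswaiti-htmlpdfs.hf.space/health",
                       timeout=120.0, interval=5.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers HTTP 200; return False once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # URLError/HTTPError subclass OSError: Space still starting, or network hiccup
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```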

🎨 Features

✅ Automatic Image Path Normalization

The API automatically converts complex image paths to simple filenames:

Before:

<img src="../../../assets/images/logo.png">
<img src="images/photo.jpg">

After (automatically):

<img src="logo.png">
<img src="photo.jpg">
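
Conceptually, the rewrite keeps only the last path component of each local reference. The sketch below is one way to do that with regular expressions over `src` attributes and CSS `url(...)` values; it is not the Space's actual code. The regexes, the choice to leave `http(s):`, `data:` and protocol-relative URLs alone, and the function name are all assumptions. It also returns a count of rewritten references, analogous to what the X-Path-Replacements header reports.

```python
import re

# src="..." / src='...' attributes, and CSS url(...) with optional quotes
_SRC_ATTR = re.compile(r"""(\bsrc\s*=\s*)(["'])([^"']+)\2""", re.IGNORECASE)
_CSS_URL = re.compile(r"""(url\(\s*)(["']?)([^"')\s]+)\2(\s*\))""", re.IGNORECASE)
# Absolute URLs (http:, https:, data:, ...) and protocol-relative //host paths
_REMOTE = re.compile(r"(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def normalize_image_paths(html):
    """Rewrite local image references to bare filenames; return (html, count)."""
    count = 0

    def simplify(ref):
        nonlocal count
        if _REMOTE.match(ref):
            return ref  # leave remote and data: URLs untouched
        # Drop any ?query or #fragment, then keep the last path component
        name = ref.split("?")[0].split("#")[0].rsplit("/", 1)[-1]
        if name and name != ref:
            count += 1
            return name
        return ref

    html = _SRC_ATTR.sub(
        lambda m: m.group(1) + m.group(2) + simplify(m.group(3)) + m.group(2), html
    )
    html = _CSS_URL.sub(
        lambda m: m.group(1) + m.group(2) + simplify(m.group(3)) + m.group(2) + m.group(4),
        html,
    )
    return html, count
```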

Just upload your images with the images parameter, and they'll work!

✅ Aspect Ratio Detection

The API automatically detects aspect ratio from:

  • HTML <meta name="viewport"> tags
  • CSS aspect-ratio properties
  • Keywords like "presentation", "slide"
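
A rough heuristic along those lines might look like this. The order of checks, the keyword list, the `orientation=` viewport hint (taken from the example templates later in this README) and the 9:16 fallback are assumptions made for illustration; the API's real detection rules may differ.

```python
import re


def detect_aspect_ratio(html, default="9:16"):
    """Guess "16:9", "1:1" or "9:16" from an HTML string (hypothetical heuristic)."""
    text = html.lower()
    # 1. Viewport meta tag with an orientation hint
    viewport = re.search(r"""<meta[^>]+name=["']viewport["'][^>]*>""", text)
    if viewport:
        if "orientation=landscape" in viewport.group(0):
            return "16:9"
        if "orientation=portrait" in viewport.group(0):
            return "9:16"
    # 2. First CSS aspect-ratio: W / H declaration
    css = re.search(r"aspect-ratio\s*:\s*(\d+)\s*/\s*(\d+)", text)
    if css:
        width, height = int(css.group(1)), int(css.group(2))
        if width > height:
            return "16:9"
        if width == height:
            return "1:1"
        return "9:16"
    # 3. Presentation-style keywords anywhere in the markup
    if re.search(r"\b(presentation|slide)s?\b", text):
        return "16:9"
    return default
```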

Supported ratios:

  • 16:9 - Landscape (presentations, slides) → A4 Landscape
  • 9:16 - Portrait (reports, documents) → A4 Portrait
  • 1:1 - Square (social media posts) → 210mm × 210mm

✅ Automatic Page Breaks

The API intelligently handles page breaks:

  • Elements with classes: .page, .slide, section.page
  • Top-level <section>, <article>, <div> elements
  • Prevents breaking inside: headings, images, tables, code blocks
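
In CSS terms, rules like these usually come down to `break-after` and `break-inside` inside an `@media print` block. The helper below generates such a stylesheet for the class-based selectors listed above. It is a hypothetical illustration: the top-level `<section>`/`<article>`/`<div>` handling is left out, and the exact CSS the API injects is not documented. You can still paste its output into your own `<style>` tag.

```python
# Selectors taken from the list above; the exact CSS the API injects is not documented.
PAGE_SELECTORS = (".page", ".slide", "section.page")
KEEP_TOGETHER = ("h1", "h2", "h3", "h4", "h5", "h6", "img", "table", "pre")


def page_break_css(page_selectors=PAGE_SELECTORS, keep_together=KEEP_TOGETHER):
    """Return an @media print block: one page per page element, no splits inside blocks."""
    pages = ", ".join(page_selectors)
    # Don't force a break after an element that is its parent's last child
    last = ", ".join(s + ":last-child" for s in page_selectors)
    blocks = ", ".join(keep_together)
    return (
        "@media print {\n"
        f"  {pages} {{ break-after: page; }}\n"
        f"  {last} {{ break-after: auto; }}\n"
        f"  {blocks} {{ break-inside: avoid; }}\n"
        "}\n"
    )
```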

✅ Color Preservation

All colors, backgrounds, and gradients are preserved in the PDF with print-color-adjust: exact.

💡 Usage Examples

Example 1: Simple Report

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@report.html" \
  -o report.pdf

Example 2: Presentation with Images

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@slides.html" \
  -F "images=@chart1.png" \
  -F "images=@chart2.png" \
  -F "images=@logo.svg" \
  -F "aspect_ratio=16:9" \
  -o presentation.pdf

Example 3: Multiple Images from Directory

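# Bash: one -F per image; skips unmatched globs and handles spaces in names
args=()
for img in images/*.png images/*.jpg; do [ -e "$img" ] && args+=(-F "images=@$img"); done
curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@document.html" \
  "${args[@]}" \
  -o document.pdf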

Example 4: Python Script

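import requests

url = 'https://abdallalswaiti-htmlpdfs.hf.space/convert'

# Optional parameters (sent as ordinary form fields)
data = {
    'aspect_ratio': '9:16',
    'auto_detect': 'false',
}

# Open the HTML file and images; the with-block closes them after the upload
with open('report.html', 'rb') as html_file, \
     open('image1.png', 'rb') as image1, \
     open('image2.jpg', 'rb') as image2:
    # A list of (field, file) tuples lets the 'images' field repeat;
    # requests uses each file's basename as the uploaded filename
    files = [
        ('html_file', html_file),
        ('images', image1),
        ('images', image2),
    ]
    # Conversions can take up to 60 s server-side, so allow some headroom
    response = requests.post(url, files=files, data=data, timeout=90)

# Save PDF
if response.status_code == 200:
    with open('output.pdf', 'wb') as f:
        f.write(response.content)
    print("PDF generated successfully!")
else:
    print(f"Error: {response.status_code}")
    print(response.text)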

Example 5: JavaScript/Node.js

const FormData = require('form-data');
const fs = require('fs');
const fetch = require('node-fetch'); // node-fetch v2: v3 is ESM-only and cannot be require()d

async function convertToPDF() {
    const form = new FormData();
    
    // Add HTML file
    form.append('html_file', fs.createReadStream('report.html'));
    
    // Add images
    form.append('images', fs.createReadStream('image1.png'));
    form.append('images', fs.createReadStream('image2.jpg'));
    
    // Optional parameters
    form.append('aspect_ratio', '9:16');
    
    const response = await fetch(
        'https://abdallalswaiti-htmlpdfs.hf.space/convert',
        {
            method: 'POST',
            body: form
        }
    );
    
    if (response.ok) {
        const buffer = await response.arrayBuffer();
        fs.writeFileSync('output.pdf', Buffer.from(buffer));
        console.log('PDF generated successfully!');
    } else {
        console.error('Error:', await response.text());
    }
}

convertToPDF().catch(console.error);

πŸ“ HTML Best Practices

For Multi-Page Documents

Use page classes to control page breaks:

<div class="page">
    <h1>Page 1</h1>
    <p>Content here...</p>
</div>

<div class="page">
    <h1>Page 2</h1>
    <p>More content...</p>
</div>
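
When the report itself comes out of a script, the `.page` wrappers can be generated rather than hand-written. `build_paged_html` is a hypothetical helper, not part of the API; it escapes each heading and paragraph with `html.escape` and emits one `<div class="page">` per entry.

```python
import html


def build_paged_html(pages, title="Report"):
    """Return an HTML document with one <div class="page"> per (heading, text) pair."""
    sections = "\n".join(
        '<div class="page">\n'
        f"    <h1>{html.escape(heading)}</h1>\n"
        f"    <p>{html.escape(text)}</p>\n"
        "</div>"
        for heading, text in pages
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{sections}\n"
        "</body>\n</html>\n"
    )
```

The returned string can be saved as `report.html` and uploaded as `html_file`, or sent directly as `html_content` to /convert-string when no images are involved.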

For Presentations (16:9)

<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=landscape">
    <style>
        .slide {
            width: 100vw;
            height: 100vh;
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
        }
    </style>
</head>
<body>
    <div class="slide">
        <h1>Slide 1</h1>
        <img src="chart.png" alt="Chart">
    </div>
    
    <div class="slide">
        <h1>Slide 2</h1>
        <img src="graph.png" alt="Graph">
    </div>
</body>
</html>

For Reports (9:16)

<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0, orientation=portrait">
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
        }
        .page {
            min-height: 100vh;
        }
    </style>
</head>
<body>
    <section class="page">
        <h1>Annual Report 2024</h1>
        <img src="logo.png" alt="Logo" style="width: 200px;">
        <p>Report content...</p>
    </section>
</body>
</html>

🎯 Image Handling

Supported Formats

  • PNG, JPG/JPEG
  • GIF, SVG
  • WebP, BMP
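
To gather every uploadable image from a folder (a Python counterpart to Example 3), you could filter by the extensions listed above. `collect_images` is a hypothetical helper; matching extensions case-insensitively is a choice made here, and the filenames you upload must still match the names in your HTML exactly.

```python
from pathlib import Path

# Extensions from the supported-formats list above (compared case-insensitively)
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".bmp"}


def collect_images(directory):
    """Return sorted paths of regular files in `directory` with a supported extension."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

With `requests`, the result maps onto the repeated `images` field, e.g. `[("images", p.open("rb")) for p in collect_images("images")]`; remember to close those files after the request.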

Image Path Examples

Your HTML can have any of these formats:

<!-- All of these work! -->
<img src="logo.png">
<img src="images/logo.png">
<img src="../../../assets/images/logo.png">
<img src="photos/image.jpg">

<!-- CSS backgrounds too -->
<div style="background-image: url('bg.jpg')"></div>
<div style="background-image: url('../images/bg.jpg')"></div>

Just upload the images:

curl -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@index.html" \
  -F "images=@logo.png" \
  -F "images=@bg.jpg" \
  -o output.pdf

The API automatically:

  1. Extracts filenames from paths
  2. Normalizes all references to simple filenames
  3. Saves images to the same directory as HTML
  4. Generates PDF with all images embedded
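
Step 3 is what makes bare filenames resolve: once the HTML and its images share a directory, a headless browser loading `index.html` from disk finds `logo.png` right next to it. The sketch below shows one way a server could stage an upload like that. It is an assumption about the internals, not the Space's code; `stage_upload`, the `index.html` name and the temp-dir prefix are hypothetical. It strips directory components from uploaded names so that `../` cannot escape the folder.

```python
import tempfile
from pathlib import Path


def stage_upload(html_text, images, workdir=None):
    """Write index.html plus uploaded images into one directory; return the HTML path.

    images: iterable of (filename, bytes) pairs as received from the client.
    """
    root = Path(workdir) if workdir else Path(tempfile.mkdtemp(prefix="html2pdf-"))
    index = root / "index.html"
    index.write_text(html_text, encoding="utf-8")
    for filename, data in images:
        # Keep only the final path component so "../x.png" or "a\\b\\x.png" land in root
        name = Path(filename.replace("\\", "/")).name
        if name in ("", ".", ".."):
            continue  # nothing usable left after stripping directories
        (root / name).write_bytes(data)
    return index
```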

🔧 Troubleshooting

Images Not Showing

  • Ensure image filenames match exactly (case-sensitive)
  • Upload ALL images referenced in your HTML
  • Check the X-Path-Replacements response header to confirm that your image references were rewritten to bare filenames
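
Before uploading, you can list which referenced images are missing from your upload set. The check below is a hypothetical preflight helper, not an API feature: it scans `src` attributes and CSS `url(...)` values, reduces each local reference to its filename (case-sensitive, mirroring the first tip above) and skips remote and `data:` URLs.

```python
import re

# Local references in src="..." attributes and CSS url(...) values
_REF = re.compile(r"""(?:\bsrc\s*=\s*["']|url\(\s*["']?)([^"')\s]+)""", re.IGNORECASE)
# Absolute (http:, https:, data:, ...) or protocol-relative URLs need no upload
_REMOTE = re.compile(r"(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def missing_images(html, uploaded_names):
    """Return sorted filenames referenced in `html` but absent from `uploaded_names`.

    Comparison is case-sensitive, so "Logo.png" and "logo.png" count as different files.
    """
    referenced = set()
    for ref in _REF.findall(html):
        if _REMOTE.match(ref):
            continue
        name = ref.split("?")[0].split("#")[0].rsplit("/", 1)[-1]
        if name:
            referenced.add(name)
    return sorted(referenced - set(uploaded_names))
```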

Wrong Aspect Ratio

  • Set auto_detect=false and specify aspect_ratio manually
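  • Check your HTML for viewport meta tags, CSS aspect-ratio properties, or keywords such as "presentation" and "slide" that can steer auto-detection toward a different ratio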

Page Breaks in Wrong Places

  • Add class="no-page-break" to elements that should stay together
  • Use class="page-break" to force breaks at specific points

PDF Too Large

  • Optimize images before uploading (compress, resize)
  • Use appropriate image formats (WebP for photos, PNG for graphics)

📊 Response Headers

The API includes useful metadata in response headers:

  • X-Aspect-Ratio: Detected or specified aspect ratio
  • X-Path-Replacements: Number of image paths normalized
  • X-PDF-Size: Size of generated PDF in bytes

Example:

curl -s -D - -o output.pdf -X POST https://abdallalswaiti-htmlpdfs.hf.space/convert \
  -F "html_file=@test.html"

# Response headers (printed by -D -; the PDF body is saved to output.pdf):
# X-Aspect-Ratio: 9:16
# X-Path-Replacements: 3
# X-PDF-Size: 245678
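
In Python, the same metadata can be read from `response.headers` after a `requests.post(...)` call. `pdf_metadata` is a hypothetical helper that lowercases header names (HTTP header names are case-insensitive) and converts the numeric headers to integers, returning `None` for any header the server did not send.

```python
def pdf_metadata(headers):
    """Extract the documented X-* headers from any mapping with .items()."""
    lowered = {k.lower(): v for k, v in headers.items()}
    replacements = lowered.get("x-path-replacements")
    size = lowered.get("x-pdf-size")
    return {
        "aspect_ratio": lowered.get("x-aspect-ratio"),
        "path_replacements": int(replacements) if replacements is not None else None,
        "pdf_size_bytes": int(size) if size is not None else None,
    }
```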

πŸ› οΈ Technical Details

  • Engine: Puppeteer (Chromium-based)
  • Backend: FastAPI (Python)
  • Max Timeout: 60 seconds per conversion
  • Page Sizes:
    • 16:9 → A4 Landscape (297mm × 210mm)
    • 9:16 → A4 Portrait (210mm × 297mm)
    • 1:1 → Square (210mm × 210mm)
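
If you need these dimensions elsewhere, for example in a CSS `@page` `size` declaration when previewing locally, the mapping is small enough to keep as a table. The values below are copied from the list above; `page_size` and the inch conversion (1 in = 25.4 mm) are additions for illustration.

```python
# (width, height) in millimetres, copied from the page-size list above
PAGE_SIZES_MM = {
    "16:9": (297, 210),  # A4 landscape
    "9:16": (210, 297),  # A4 portrait
    "1:1": (210, 210),   # square
}
MM_PER_INCH = 25.4


def page_size(aspect_ratio, unit="mm"):
    """Return (width, height) for a supported ratio, in mm or inches (2 decimals)."""
    try:
        width, height = PAGE_SIZES_MM[aspect_ratio]
    except KeyError:
        raise ValueError(
            f"unsupported aspect_ratio {aspect_ratio!r}; expected one of {sorted(PAGE_SIZES_MM)}"
        ) from None
    if unit == "mm":
        return width, height
    if unit == "in":
        return round(width / MM_PER_INCH, 2), round(height / MM_PER_INCH, 2)
    raise ValueError(f"unsupported unit {unit!r}")
```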

📄 License

This API is provided as-is for public use on Hugging Face Spaces.

🤝 Support

For issues or questions, please visit the Space discussion page.


Made with ❤️ using FastAPI and Puppeteer