Usage Guide for Enhanced DOCX to PDF Converter

This guide explains how to use the enhanced DOCX to PDF converter, a FastAPI service that replaces the original Gradio-based version.

Getting Started

Prerequisites

  • Docker and Docker Compose installed
  • At least 4GB of available RAM
  • Internet connection for initial setup

Quick Start

  1. Clone or download this repository
  2. Navigate to the project directory
  3. Run the service:
    docker-compose up --build
    
  4. Access the API at http://localhost:8000
  5. View API documentation at http://localhost:8000/docs

API Endpoints

Convert Single DOCX File

POST /convert

Converts a single DOCX file to PDF.

Using Multipart File Upload:

curl -X POST "http://localhost:8000/convert" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.docx"

Using Base64 Content:

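# First encode your file to base64 (tr strips the line wraps that GNU base64 adds)
BASE64_CONTENT=$(base64 < document.docx | tr -d '\n')

# Then send the request; --data-urlencode escapes the "+", "/" and "=" in base64 output,
# which a form-encoded body would otherwise corrupt ("+" decodes to a space)
curl -X POST "http://localhost:8000/convert" \
  -H "accept: application/json" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "file_content=$BASE64_CONTENT" \
  -d "filename=document.docx"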

Response:

{
  "success": true,
  "pdf_url": "/download/abc123/document.pdf",
  "message": "Conversion successful"
}
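
The base64 variant can also be sketched in Python with the standard library alone. The helper names `build_convert_request` and `convert` are hypothetical, and the field names are taken from the curl example above. `urlencode` percent-encodes the `+`, `/` and `=` characters that base64 output contains, which a form-encoded body requires.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the service runs with the Quick Start defaults
BASE_URL = "http://localhost:8000"


def build_convert_request(docx_bytes: bytes, filename: str,
                          base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a form-encoded POST /convert request."""
    form = {
        "file_content": base64.b64encode(docx_bytes).decode("ascii"),
        "filename": filename,
    }
    return Request(
        f"{base_url}/convert",
        data=urlencode(form).encode("ascii"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def convert(docx_bytes: bytes, filename: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urlopen(build_convert_request(docx_bytes, filename, base_url)) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the encoding step easy to inspect without a running service.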

Batch Convert Multiple DOCX Files

POST /convert/batch

Converts multiple DOCX files in a single request.

curl -X POST "http://localhost:8000/convert/batch" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "file_content": "base64_encoded_content_1",
        "filename": "document1.docx"
      },
      {
        "file_content": "base64_encoded_content_2",
        "filename": "document2.docx"
      }
    ]
  }'

Response:

[
  {
    "success": true,
    "pdf_url": "/download/abc123/document1.pdf",
    "message": "Conversion successful"
  },
  {
    "success": false,
    "error": "Error description"
  }
]
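
A minimal Python sketch of preparing the batch body and reading the response follows. The function names are hypothetical. Because the failed entry in the sample response carries no filename, the sketch pairs results with inputs by position, which assumes the service returns results in request order.

```python
import base64
import json


def build_batch_payload(files: dict[str, bytes]) -> str:
    """Return the JSON body for POST /convert/batch from {filename: raw bytes}."""
    items = [
        {"file_content": base64.b64encode(data).decode("ascii"), "filename": name}
        for name, data in files.items()
    ]
    return json.dumps({"files": items})


def split_batch_results(filenames: list[str], results: list[dict]):
    """Pair each result with its input filename; separate successes from failures.

    Assumption: results arrive in the same order as the submitted files.
    """
    converted, failed = {}, {}
    for name, result in zip(filenames, results):
        if result.get("success"):
            converted[name] = result["pdf_url"]
        else:
            failed[name] = result.get("error", "unknown error")
    return converted, failed
```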

Download Converted PDF

GET /download/{temp_id}/{filename}

Downloads a converted PDF file.

curl -X GET "http://localhost:8000/download/abc123/document.pdf" \
  -o document.pdf
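
The `pdf_url` returned by the conversion endpoints is a path, so a client has to join it with the service address before downloading. A short Python sketch, with hypothetical helper names and the default address assumed:

```python
import shutil
from urllib.parse import urljoin
from urllib.request import urlopen


def absolute_pdf_url(base_url: str, pdf_url: str) -> str:
    """Join the service address with the path returned in "pdf_url"."""
    return urljoin(base_url, pdf_url)


def download_pdf(base_url: str, pdf_url: str, dest_path: str) -> None:
    """Stream the converted PDF to dest_path without loading it all into memory."""
    with urlopen(absolute_pdf_url(base_url, pdf_url)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```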

Health Check

GET /health

Checks if the service is running.

curl -X GET "http://localhost:8000/health"

Response:

{
  "status": "healthy",
  "version": "2.0.0"
}
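
Scripts that start the container and convert files straight away may want to wait for the health check first. The sketch below polls the endpoint until it reports `healthy`. The function names, attempt count, and delay are illustrative assumptions, and the fetch function is injectable so the loop can be exercised without a running service.

```python
import json
import time
from urllib.error import URLError
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """Call GET /health and return the decoded JSON body."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(fetch=fetch_health, attempts: int = 30,
                       delay: float = 2.0) -> bool:
    """Poll until the service reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (URLError, OSError, ValueError):
            pass  # service not reachable yet, or the body was not JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```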

Browser Integration

The API sends CORS headers (allowed origins are set by CORS_ORIGINS, * by default), so web applications can call the service directly from the browser using the Fetch API or XMLHttpRequest.

Example JavaScript Integration:

// Convert a file, then either open the PDF in a new tab or download it
const API_BASE = 'http://localhost:8000';

async function convertDocxToPdf(file, { download = false } = {}) {
  const formData = new FormData();
  formData.append('file', file);

  try {
    // Do not set Content-Type manually; the browser adds the multipart boundary
    const response = await fetch(`${API_BASE}/convert`, {
      method: 'POST',
      body: formData
    });

    const result = await response.json();

    if (!result.success) {
      console.error('Conversion failed:', result.error);
      return;
    }

    const pdfUrl = API_BASE + result.pdf_url;
    if (download) {
      // Download directly (browsers may ignore the download attribute for cross-origin URLs)
      const link = document.createElement('a');
      link.href = pdfUrl;
      link.download = 'converted.pdf';
      link.click();
    } else {
      // Open PDF in new tab
      window.open(pdfUrl, '_blank');
    }
  } catch (error) {
    // Covers network failures and responses that are not valid JSON
    console.error('Request failed:', error);
  }
}

Configuration

The service can be configured using environment variables:

| Variable | Description | Default |
|---|---|---|
| PORT | Application port | 8000 |
| MAX_FILE_SIZE | Maximum file size in bytes | 52428800 (50MB) |
| MAX_CONVERSION_TIME | Conversion timeout in seconds | 120 |
| TEMP_DIR | Temporary directory for conversions | /tmp/conversions |
| CORS_ORIGINS | CORS allowed origins | * |

Example with custom configuration:

PORT=8080 MAX_FILE_SIZE=104857600 docker-compose up
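
How the service reads these variables is not shown in this guide. The sketch below is one way such a loader could look in Python, using the documented names and defaults. `Settings` and `load_settings` are hypothetical names, and treating CORS_ORIGINS as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    max_file_size: int = 52428800       # bytes (50 MB)
    max_conversion_time: int = 120      # seconds
    temp_dir: str = "/tmp/conversions"
    cors_origins: tuple = ("*",)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    d = Settings()
    origins = env.get("CORS_ORIGINS", "")
    return Settings(
        port=int(env.get("PORT", d.port)),
        max_file_size=int(env.get("MAX_FILE_SIZE", d.max_file_size)),
        max_conversion_time=int(env.get("MAX_CONVERSION_TIME", d.max_conversion_time)),
        temp_dir=env.get("TEMP_DIR", d.temp_dir),
        # Assumption: multiple origins are separated by commas
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip())
        or d.cors_origins,
    )
```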

File Handling

Supported File Types

  • DOCX (Microsoft Word documents)

File Size Limits

  • Default maximum: 50MB
  • Configurable via MAX_FILE_SIZE environment variable
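
A client can catch the most common rejections before uploading. The sketch below is a hypothetical pre-flight check, not the service's own validation. It relies on the fact that a DOCX file is a ZIP container, and it assumes the default size limit.

```python
import io
import zipfile

# Assumption: the server uses the default MAX_FILE_SIZE
MAX_FILE_SIZE = 52428800


def check_upload(filename: str, data: bytes,
                 max_size: int = MAX_FILE_SIZE) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if not filename.lower().endswith(".docx"):
        problems.append("filename does not end in .docx")
    if len(data) > max_size:
        problems.append(f"{len(data)} bytes exceeds the {max_size}-byte limit")
    # Checks only the ZIP container, not whether Word can open the document
    if not zipfile.is_zipfile(io.BytesIO(data)):
        problems.append("content is not a ZIP container, so it cannot be a DOCX")
    return problems
```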

Storage

  • Converted files are stored temporarily in the conversions directory
  • This directory is mounted as a Docker volume for persistence
  • Files are automatically cleaned up when the container is restarted

Error Handling

The API provides detailed error messages for troubleshooting:

  • 400 Bad Request: Invalid input parameters
  • 413 Payload Too Large: File exceeds size limits
  • 500 Internal Server Error: Conversion failed

Example error response:

{
  "success": false,
  "error": "File too large"
}
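
On the client side, the status code and the `error` field can be combined into a single message. The following Python sketch shows one way to do that. `explain_error` and the hint texts are illustrative and not part of the service.

```python
import json

# Hints for the status codes documented above
ERROR_HINTS = {
    400: "invalid input: check the field names and that the file is a DOCX",
    413: "file too large: reduce the size or raise MAX_FILE_SIZE on the server",
    500: "conversion failed: inspect the service logs",
}


def explain_error(status: int, body: str) -> str:
    """Combine the HTTP status with the "error" field of the JSON body, if any."""
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        detail = ""  # body was not JSON, or not a JSON object
    hint = ERROR_HINTS.get(status, "unexpected status")
    message = f"HTTP {status}: {hint}"
    return f"{message} (server said: {detail})" if detail else message
```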

Performance Considerations

Batch Processing

For converting multiple files, use the batch endpoint to reduce overhead:

curl -X POST "http://localhost:8000/convert/batch" \
  -H "Content-Type: application/json" \
  -d '{"files": [...]}'

Resource Usage

  • Each conversion uses a separate LibreOffice instance
  • Monitor memory usage for large files
  • Consider scaling the service for high-volume usage
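
Since each conversion starts its own LibreOffice instance, one way to bound memory use on the client side is to split a large job into several smaller batch requests. The helper below is a generic sketch. The name `chunked` and any particular batch size are assumptions, since this guide documents no batch limit.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each chunk can then be sent as its own POST /convert/batch request, one after another.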

Troubleshooting

Common Issues

  1. Service won't start:

    • Ensure Docker and Docker Compose are installed
    • Check that port 8000 is not in use
    • Verify sufficient system resources
  2. Conversion fails:

    • Check that the DOCX file is valid
    • Verify file size is within limits
    • Review logs with docker-compose logs
  3. Download fails:

    • Ensure the file hasn't been cleaned up
    • Check the download URL is correct

Viewing Logs

docker-compose logs -f docx-to-pdf-enhanced

Testing

Run the test suite:

docker-compose run --rm docx-to-pdf-enhanced python3 -m pytest tests/

Deployment

See DEPLOYMENT_ENHANCED.md for detailed deployment instructions for production environments.

Security

  • Files are validated for type and size
  • Only DOCX files are accepted
  • Restrict CORS_ORIGINS to trusted origins in production; the default * allows any origin
  • Run containers with minimal privileges

This enhanced version is designed to convert DOCX files to PDF reliably, with support for Arabic text and preservation of the original formatting.