# Usage Guide for Enhanced DOCX to PDF Converter
This guide explains how to use the enhanced DOCX to PDF converter, a FastAPI service that replaces the original Gradio-based version.
## Getting Started
### Prerequisites
- Docker and Docker Compose installed
- At least 4GB of available RAM
- Internet connection for initial setup
### Quick Start
1. Clone or download this repository
2. Navigate to the project directory
3. Run the service:
```bash
docker-compose up --build
```
4. Access the API at `http://localhost:8000`
5. View API documentation at `http://localhost:8000/docs`
## API Endpoints
### Convert Single DOCX File
**POST** `/convert`
Converts a single DOCX file to PDF.
#### Using Multipart File Upload:
```bash
curl -X POST "http://localhost:8000/convert" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.docx"
```
#### Using Base64 Content:
```bash
# First encode your file to Base64 on a single line (no line wrapping)
BASE64_CONTENT=$(base64 < document.docx | tr -d '\n')
# Then send the request; --data-urlencode keeps '+', '/' and '=' intact
curl -X POST "http://localhost:8000/convert" \
  -H "accept: application/json" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "file_content=$BASE64_CONTENT" \
  -d "filename=document.docx"
```
#### Response:
```json
{
  "success": true,
  "pdf_url": "/download/abc123/document.pdf",
  "message": "Conversion successful"
}
```
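The Base64 variant of the request can also be prepared from Python. The following is a minimal sketch using only the standard library. It assumes the form field names `file_content` and `filename` from the curl example; `build_convert_form` and `build_convert_request` are hypothetical helper names, not part of the service.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_convert_form(data: bytes, filename: str) -> bytes:
    """Return a URL-encoded form body carrying the file as Base64."""
    encoded = base64.b64encode(data).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=', so the Base64 text
    # survives form decoding on the server side.
    return urlencode({"file_content": encoded, "filename": filename}).encode("ascii")


def build_convert_request(data: bytes, filename: str,
                          base_url: str = "http://localhost:8000") -> Request:
    """Prepare (but do not send) the POST request for /convert."""
    return Request(
        f"{base_url}/convert",
        data=build_convert_form(data, filename),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response body.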
### Batch Convert Multiple DOCX Files
**POST** `/convert/batch`
Converts multiple DOCX files in a single request.
```bash
curl -X POST "http://localhost:8000/convert/batch" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "file_content": "base64_encoded_content_1",
        "filename": "document1.docx"
      },
      {
        "file_content": "base64_encoded_content_2",
        "filename": "document2.docx"
      }
    ]
  }'
```
#### Response:
```json
[
  {
    "success": true,
    "pdf_url": "/download/abc123/document1.pdf",
    "message": "Conversion successful"
  },
  {
    "success": false,
    "error": "Error description"
  }
]
```
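Building the batch body by hand is error-prone, so a small helper may be useful. The sketch below builds the JSON payload shown above and sorts a batch response into successes and failures. It assumes that results are returned in the same order as the submitted files, which this guide does not state explicitly; both function names are hypothetical.

```python
import base64
import json


def build_batch_payload(files: dict[str, bytes]) -> str:
    """Serialize {filename: raw bytes} into the JSON body for /convert/batch."""
    return json.dumps({
        "files": [
            {"file_content": base64.b64encode(data).decode("ascii"), "filename": name}
            for name, data in files.items()
        ]
    })


def split_batch_results(filenames: list[str], results: list[dict]):
    """Pair each result with its input filename.

    Assumption: the service returns results in request order.
    """
    succeeded, failed = {}, {}
    for name, result in zip(filenames, results):
        if result.get("success"):
            succeeded[name] = result["pdf_url"]
        else:
            failed[name] = result.get("error", "unknown error")
    return succeeded, failed
```

Reporting failures per file lets a caller retry only the documents that did not convert.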
### Download Converted PDF
**GET** `/download/{temp_id}/{filename}`
Downloads a converted PDF file.
```bash
curl -X GET "http://localhost:8000/download/abc123/document.pdf" \
-o document.pdf
```
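Because `pdf_url` in the conversion response is a path rather than a full URL, a client has to combine it with the service address before downloading. A minimal standard-library sketch follows; the helper names and the 30-second timeout are illustrative choices, not values defined by the service.

```python
import shutil
from urllib.parse import urljoin
from urllib.request import urlopen


def absolute_pdf_url(base_url: str, pdf_url: str) -> str:
    """Resolve the relative pdf_url from a response against the service address."""
    return urljoin(base_url, pdf_url)


def download_pdf(base_url: str, pdf_url: str, dest_path: str,
                 timeout: float = 30.0) -> None:
    """Stream the converted PDF to dest_path without loading it all into memory."""
    url = absolute_pdf_url(base_url, pdf_url)
    with urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```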
### Health Check
**GET** `/health`
Checks if the service is running.
```bash
curl -X GET "http://localhost:8000/health"
```
Response:
```json
{
  "status": "healthy",
  "version": "2.0.0"
}
```
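The health endpoint is handy for waiting until the container is ready before sending conversions, for example in a script that runs right after `docker-compose up`. The sketch below polls `/health` a fixed number of times. The retry count and delay are arbitrary illustrative values, and both function names are hypothetical.

```python
import json
import time
from urllib.request import urlopen


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with status "healthy"."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as response:
            data = json.load(response)
    except (OSError, ValueError):
        # OSError covers connection errors and HTTP errors (URLError subclasses it);
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def wait_until_healthy(probe=check_health, attempts: int = 10,
                       delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing `probe` and `sleep` as parameters keeps the retry logic testable without a running service.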
## Browser Integration
The API includes full CORS support for direct browser integration. You can use the Fetch API or XMLHttpRequest to communicate directly with the service from web applications.
### Example JavaScript Integration:
```javascript
// Convert a file, then open or download the resulting PDF
async function convertDocxToPdf(file) {
  const formData = new FormData();
  formData.append('file', file);
  try {
    const response = await fetch('http://localhost:8000/convert', {
      method: 'POST',
      body: formData
    });
    const result = await response.json();
    if (result.success) {
      // Option 1: open the PDF in a new tab
      window.open('http://localhost:8000' + result.pdf_url, '_blank');
      // Option 2: trigger a download instead (keep only the option you need).
      // Note: browsers honor the `download` attribute only for same-origin URLs.
      const link = document.createElement('a');
      link.href = 'http://localhost:8000' + result.pdf_url;
      link.download = 'converted.pdf';
      link.click();
    } else {
      console.error('Conversion failed:', result.error);
    }
  } catch (error) {
    console.error('Network error:', error);
  }
}
```
## Configuration
The service can be configured using environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | Application port | 8000 |
| `MAX_FILE_SIZE` | Maximum file size in bytes | 52428800 (50MB) |
| `MAX_CONVERSION_TIME` | Conversion timeout in seconds | 120 |
| `TEMP_DIR` | Temporary directory for conversions | /tmp/conversions |
| `CORS_ORIGINS` | CORS allowed origins | * |
### Example with custom configuration:
```bash
PORT=8080 MAX_FILE_SIZE=104857600 docker-compose up
```
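To illustrate how the variables and defaults in the table fit together, the sketch below reads them into a typed settings object. This is not the service's actual configuration code. `Settings` and `load_settings` are hypothetical names, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int = 8000
    max_file_size: int = 52428800  # 50 MB, in bytes
    max_conversion_time: int = 120  # seconds
    temp_dir: str = "/tmp/conversions"
    cors_origins: tuple = ("*",)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    origins = env.get("CORS_ORIGINS")
    return Settings(
        port=int(env.get("PORT", defaults.port)),
        max_file_size=int(env.get("MAX_FILE_SIZE", defaults.max_file_size)),
        max_conversion_time=int(env.get("MAX_CONVERSION_TIME", defaults.max_conversion_time)),
        temp_dir=env.get("TEMP_DIR", defaults.temp_dir),
        # Assumption: multiple origins are separated by commas.
        cors_origins=(
            tuple(o.strip() for o in origins.split(",") if o.strip())
            if origins else defaults.cors_origins
        ),
    )
```

With `PORT=8080 MAX_FILE_SIZE=104857600` set, `load_settings()` would yield port 8080 and a 100 MB size limit while leaving the other values at their defaults.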
## File Handling
### Supported File Types
- DOCX (Microsoft Word documents)
### File Size Limits
- Default maximum: 50MB
- Configurable via `MAX_FILE_SIZE` environment variable
### Storage
- Converted files are stored temporarily in the `conversions` directory
- This directory is mounted as a Docker volume for persistence
- Files are automatically cleaned up when the container is restarted
## Error Handling
The API reports failures with standard HTTP status codes and a JSON body describing the error:
- `400 Bad Request`: Invalid input parameters
- `413 Payload Too Large`: File exceeds size limits
- `500 Internal Server Error`: Conversion failed
Example error response:
```json
{
  "success": false,
  "error": "File too large"
}
```
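A client can combine the status code with the `error` field to produce an actionable message. The sketch below assumes error bodies follow the shape shown above. The hint texts and the `explain_error` name are illustrative, not part of the API.

```python
# Suggested next steps for the status codes listed in this guide.
HINTS = {
    400: "check the request parameters",
    413: "reduce the file size or raise MAX_FILE_SIZE",
    500: "conversion failed on the server; review the service logs",
}


def explain_error(status_code: int, body: dict) -> str:
    """Format a failed response as 'HTTP <code>: <error> - <hint>'."""
    detail = body.get("error", "no error message provided")
    hint = HINTS.get(status_code, "unexpected status code")
    return f"HTTP {status_code}: {detail} - {hint}"
```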
## Performance Considerations
### Batch Processing
For converting multiple files, use the batch endpoint to reduce overhead:
```bash
curl -X POST "http://localhost:8000/convert/batch" \
-H "Content-Type: application/json" \
-d '{"files": [...]}'
```
### Resource Usage
- Each conversion uses a separate LibreOffice instance
- Monitor memory usage for large files
- Consider scaling the service for high-volume usage
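Since each conversion starts its own LibreOffice instance, one way to keep memory use bounded on the client side is to split a large job into several smaller batch requests. The helper below is a generic sketch; the chunk size is something to tune for your own workload, not a value recommended by the service.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each chunk can then be sent as one `/convert/batch` request, waiting for the response before sending the next.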
## Troubleshooting
### Common Issues
1. **Service won't start**:
   - Ensure Docker and Docker Compose are installed
   - Check that port 8000 is not in use
   - Verify sufficient system resources
2. **Conversion fails**:
   - Check that the DOCX file is valid
   - Verify file size is within limits
   - Review logs with `docker-compose logs`
3. **Download fails**:
   - Ensure the file hasn't been cleaned up
   - Check the download URL is correct
### Viewing Logs
```bash
docker-compose logs -f docx-to-pdf-enhanced
```
## Testing
Run the test suite:
```bash
docker-compose run --rm docx-to-pdf-enhanced python3 -m pytest tests/
```
## Deployment
See [DEPLOYMENT_ENHANCED.md](DEPLOYMENT_ENHANCED.md) for detailed deployment instructions for production environments.
## Security
- Files are validated for type and size
- Only DOCX files are accepted
- CORS can be configured for production use
- Run containers with minimal privileges
The enhanced version is designed to convert DOCX files to PDF reliably, with support for Arabic-language documents and preservation of the original formatting.