---
title: Document Conversion API
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---
# 📄 Document Conversion API - Word to PDF
A free, self-hosted document conversion service built on LibreOffice. Deploy it on the free CPU tier of Hugging Face Spaces to convert documents without API keys or per-conversion fees.
## ✨ Features
- **100% FREE** - No API keys, no limits, no credit card
- **High Quality** - Uses LibreOffice for professional PDF conversion
- **Fast** - Converts documents in seconds
- **Self-Hosted** - Complete control and privacy
- **Multiple Formats** - Supports DOCX, DOC, ODT, RTF, TXT → PDF
## 🚀 Quick Deploy to Hugging Face Spaces
### Step 1: Create a New Space
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Fill in:
   - **Space name**: `nextools-doc-converter` (or your choice)
   - **License**: Apache 2.0
   - **Select the SDK**: **Docker**
   - **Space hardware**: CPU basic (FREE)
   - **Visibility**: Public
### Step 2: Upload Files
Upload these 3 files to your Space:
1. `Dockerfile`
2. `app.py`
3. `requirements.txt`
### Step 3: Wait for Build
- Hugging Face will automatically build your Docker container
- Takes about 5-10 minutes (first time only)
- Watch the logs for "Application startup complete"
### Step 4: Get Your API URL
Your API will be available at:
```
https://YOUR-USERNAME-nextools-doc-converter.hf.space
```
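The URL above follows the pattern `https://<username>-<space-name>.hf.space`. As a minimal sketch, the hypothetical helper `space_url` below derives it in Python. The normalization (lowercasing and turning characters that are not valid in a hostname into hyphens) is an assumption, so compare the result with the direct URL shown on your Space's page.

```python
import re


def space_url(username: str, space_name: str) -> str:
    """Build the direct URL of a Space from its owner and name."""
    # Hostnames are case-insensitive, so lowercasing is safe.
    subdomain = f"{username}-{space_name}".lower()
    # Assumption: anything outside [a-z0-9-] is replaced by a hyphen.
    subdomain = re.sub(r"[^a-z0-9-]", "-", subdomain)
    return f"https://{subdomain}.hf.space"


print(space_url("YOUR-USERNAME", "nextools-doc-converter"))
```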
### Step 5: Add to Your Vercel .env.local
```bash
# Document Conversion API
DOC_CONVERSION_API_URL=https://YOUR-USERNAME-nextools-doc-converter.hf.space
```
## 📡 API Usage
### Convert Document to PDF
**Endpoint:** `POST /convert`
**cURL Example:**
```bash
curl -X POST \
  https://YOUR-USERNAME-nextools-doc-converter.hf.space/convert \
  -F "file=@document.docx" \
  --output converted.pdf
```
**JavaScript Example:**
```javascript
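// `file` is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', file);

const response = await fetch('https://YOUR-API-URL/convert', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Conversion failed with status ${response.status}`);
}
const pdfBlob = await response.blob();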
```
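**Python Example (sketch):**

The same request can be made from Python with only the standard library. This is a minimal sketch, not part of the service: `build_multipart` and `convert_to_pdf` are hypothetical helper names, and the only things taken from this README are the `/convert` path and the `file` form field.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_to_pdf(api_url: str, source: Path, target: Path, timeout: float = 120.0) -> Path:
    """POST `source` to /convert and write the returned bytes to `target`."""
    body, content_type = build_multipart("file", source.name, source.read_bytes())
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        target.write_bytes(response.read())
    return target
```

Call it as `convert_to_pdf("https://YOUR-API-URL", Path("document.docx"), Path("converted.pdf"))`. The 120-second default timeout is an arbitrary choice for illustration.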
### Health Check
**Endpoint:** `GET /health`
```bash
curl https://YOUR-API-URL/health
```
**Response:**
```json
{
  "status": "healthy",
  "libreoffice": true,
  "message": "Service is running"
}
```
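A client can use this response to wait until a freshly built or restarted Space is ready. The sketch below is one way to do that in Python; `is_ready` and `wait_until_ready` are hypothetical names, and the only assumption about the payload is the `status` and `libreoffice` fields shown above.

```python
import json
import time
from typing import Callable


def is_ready(payload: str) -> bool:
    """True when the /health body reports a healthy service with LibreOffice available."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False  # e.g. an HTML "building" page instead of JSON
    if not isinstance(data, dict):
        return False
    return data.get("status") == "healthy" and data.get("libreoffice") is True


def wait_until_ready(
    fetch: Callable[[], str],
    attempts: int = 10,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `fetch` (which returns the /health body) until it reports ready."""
    for attempt in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            pass  # connection refused or timed out; the container may still be starting
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

`fetch` is injected so the polling logic can be tested without a network. In real use it could be `lambda: urllib.request.urlopen("https://YOUR-API-URL/health", timeout=10).read().decode()`.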
## 🔧 Test Locally (Optional)
### Using Docker:
```bash
# Build
docker build -t doc-converter .
# Run
docker run -p 7860:7860 doc-converter
# Test
curl -X POST http://localhost:7860/convert \
  -F "file=@test.docx" \
  --output converted.pdf
```
### Using Python (requires LibreOffice installed):
```bash
# Install LibreOffice first:
# Ubuntu/Debian: sudo apt install libreoffice
# Mac: brew install libreoffice
# Windows: Download from libreoffice.org
# Install dependencies
pip install -r requirements.txt
# Run
python app.py
# Test
curl -X POST http://localhost:7860/convert \
  -F "file=@test.docx" \
  --output converted.pdf
```
## 📊 Supported Formats
### Input Formats:
- `.docx` - Microsoft Word (2007+)
- `.doc` - Microsoft Word (97-2003)
- `.odt` - OpenDocument Text
- `.rtf` - Rich Text Format
- `.txt` - Plain Text
### Output Format:
- `.pdf` - PDF (Portable Document Format)
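Checking the extension on the client avoids uploading a file the service would reject. A minimal sketch based on the list above; `is_supported` and `output_name` are hypothetical helpers, and the server may apply its own, stricter checks.

```python
from pathlib import Path

# Input extensions listed in this README.
SUPPORTED_INPUT_EXTENSIONS = frozenset({".docx", ".doc", ".odt", ".rtf", ".txt"})


def is_supported(filename: str) -> bool:
    """Case-insensitive check of the file extension against the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


def output_name(filename: str) -> str:
    """Name of the resulting PDF, e.g. 'report.docx' becomes 'report.pdf'."""
    if not is_supported(filename):
        raise ValueError(f"Unsupported input format: {filename!r}")
    return Path(filename).with_suffix(".pdf").name
```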
## 🎯 Why Hugging Face Spaces?
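1. **Free tier** - CPU basic hardware needs no billing setup or credit card
2. **No per-conversion quota** - The service itself sets no conversion limit; throughput depends on the free hardware
3. **Managed hosting** - Hugging Face builds and runs the container; free Spaces can go to sleep after a period of inactivity, so treat uptime as best effort
4. **HTTPS endpoint** - Served from a public `*.hf.space` URL
5. **Easy Deploy** - Just upload files
6. **Upgradeable** - Switch to paid hardware if the free container cannot keep up with traffic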
## 🔒 Security & Privacy
- Files are processed in memory
- Automatic cleanup after conversion
- No data is stored or logged
- CORS enabled for your domains
- SSL/HTTPS encryption
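The actual `app.py` is not reproduced in this README. As an illustration only, the sketch below shows one way automatic cleanup can work: LibreOffice's command-line converter reads and writes files, so the upload is staged in a temporary directory that is deleted as soon as the PDF bytes have been read. `build_command` and `convert_bytes` are hypothetical names, and the sketch assumes the `soffice` binary is on `PATH`.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(input_path: Path, out_dir: Path) -> list[str]:
    """Headless LibreOffice invocation that writes <stem>.pdf into out_dir."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(input_path),
    ]


def convert_bytes(data: bytes, filename: str, timeout: float = 120.0) -> bytes:
    """Convert an uploaded document to PDF without leaving files behind."""
    with tempfile.TemporaryDirectory() as workdir:
        # Keep only the base name so a client-supplied path cannot escape workdir.
        source = Path(workdir) / Path(filename).name
        source.write_bytes(data)
        subprocess.run(
            build_command(source, Path(workdir)),
            check=True,           # raise CalledProcessError if LibreOffice fails
            timeout=timeout,      # raise TimeoutExpired on a stuck conversion
            capture_output=True,
        )
        # Read the result before the directory is removed on exit.
        return source.with_suffix(".pdf").read_bytes()
```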
## 🐛 Troubleshooting
### Build Failed?
- Check Dockerfile syntax
- Ensure all files are uploaded
- Wait for LibreOffice installation to complete
### Conversion Failed?
- Check file format is supported
- Verify file is not corrupted
- Check logs in Hugging Face dashboard
### Timeout?
- Large files (>10MB) may take longer
- Consider increasing timeout in Dockerfile
- Split large documents
## 📝 Notes
- **First conversion** may take 5-10 seconds (LibreOffice startup)
- **Subsequent conversions** are much faster (~1-2 seconds)
- **Maximum file size**: 50MB (configurable)
- **Concurrent requests**: Supported with workers
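Clients can account for the size limit and the slower first conversion with a size check and a retry wrapper. This is a sketch under stated assumptions: `check_size` and `with_retries` are hypothetical helpers, the 50 MB limit is read as binary megabytes, and the retry count and delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# 50 MB default from the notes above; adjust if the server is configured differently.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_size(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Fail fast before uploading a file the service would refuse."""
    if size_bytes > limit:
        raise ValueError(f"File is {size_bytes} bytes; the limit is {limit} bytes")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on OSError with exponential backoff.

    Timeouts and connection errors are OSError subclasses. So is
    urllib.error.HTTPError, so narrow the except clause if 4xx
    responses should not be retried.
    """
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

For example, `with_retries(lambda: upload(path))` gives a cold LibreOffice start a second and third chance, where `upload` stands for whatever function sends the file.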
## 🔗 Integration with NexTools
Update your `app/api/pdf-convert/route.ts`:
```typescript
// Use Hugging Face API for Word to PDF
async function wordToPdf(fileBuffer: Buffer) {
  const apiUrl = process.env.DOC_CONVERSION_API_URL;
  if (!apiUrl) {
    throw new Error('DOC_CONVERSION_API_URL not configured');
  }

  const formData = new FormData();
  formData.append('file', new Blob([fileBuffer]), 'document.docx');

  const response = await fetch(`${apiUrl}/convert`, {
    method: 'POST',
    body: formData,
  });
  if (!response.ok) {
    throw new Error('Conversion failed');
  }

  const pdfBuffer = Buffer.from(await response.arrayBuffer());
  return {
    content: pdfBuffer.toString('base64'),
    mimeType: 'application/pdf',
    fileName: 'converted.pdf',
    fileType: 'PDF',
    pages: 1, // Calculate if needed
  };
}
```
## 📞 Support
- **Issues**: Report on GitHub
- **Questions**: Ask in Hugging Face discussions
- **Updates**: Watch this repository
## 📜 License
Apache 2.0 License - Free for commercial and personal use
---
Made with ❤️ for NexTools - Your All-in-One SaaS Platform