optimize: Reduce Docker image size - remove GUI deps, add aggressive cleanup
- .gitignore +13 -0
- DEPLOY_GUIDE.md +173 -0
- Dockerfile +20 -9
- test.txt +1 -0
.gitignore
ADDED
@@ -0,0 +1,13 @@
```
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
venv/
env/
*.log
.DS_Store
*.pdf
*.docx
*.doc
temp/
```
DEPLOY_GUIDE.md
ADDED
@@ -0,0 +1,173 @@
# Deploy Word to PDF Converter to Hugging Face

## Quick 5-Minute Setup

### Step 1: Create Hugging Face Space

1. Go to **https://huggingface.co/spaces**
2. Click **"Create new Space"**
3. Fill in:
   - **Space name**: `nextools-doc-converter`
   - **License**: Apache 2.0
   - **SDK**: Select **Docker** ⚠️ IMPORTANT!
   - **Space hardware**: CPU basic (FREE)
   - **Visibility**: Public

4. Click **Create Space**

### Step 2: Upload Files

In your Space repository, upload these 4 files from `python-doc-convert/`:

1. ✅ `Dockerfile`
2. ✅ `app.py`
3. ✅ `requirements.txt`
4. ✅ `README.md` (optional)

**How to Upload:**
- Click the **"Files and versions"** tab
- Click **"Add file"** → **"Upload files"**
- Drag and drop all 4 files
- Click **"Commit changes to main"**

### Step 3: Wait for Build (~5-10 min)

- Hugging Face will automatically build your Docker container
- Click **"Logs"** to watch progress
- Wait for: ✅ **"Application startup complete"**

### Step 4: Test Your API

Your API URL will be:
```
https://YOUR-USERNAME-nextools-doc-converter.hf.space
```

Test it:
```bash
# Health check
curl https://YOUR-USERNAME-nextools-doc-converter.hf.space/health

# Convert a document
curl -X POST https://YOUR-USERNAME-nextools-doc-converter.hf.space/convert \
  -F "file=@test.docx" \
  --output converted.pdf
```
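The two curl calls above can also be scripted. This is a minimal sketch using only the Python standard library (Python because `app.py` is Python). It assumes, from the curl example and the checklist below, that `GET /health` returns JSON such as `{"status": "healthy"}` and that `POST /convert` accepts a multipart field named `file` and replies with the PDF bytes. The helper names (`endpoint`, `encode_multipart`, `health`, `convert`) are hypothetical, and `BASE_URL` is the guide's placeholder.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Placeholder from the guide; replace with your own Space URL.
BASE_URL = "https://YOUR-USERNAME-nextools-doc-converter.hf.space"


def endpoint(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with one file part, like `curl -F`."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def health(base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """GET /health and decode the JSON reply."""
    with urllib.request.urlopen(endpoint(base_url, "/health"), timeout=timeout) as resp:
        return json.load(resp)


def convert(src: Path, dest: Path, base_url: str = BASE_URL, timeout: float = 120.0) -> Path:
    """POST the document as the "file" field to /convert and save the reply to dest."""
    body, content_type = encode_multipart("file", src.name, src.read_bytes())
    req = urllib.request.Request(
        endpoint(base_url, "/convert"),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Usage, once the Space is awake: call `health()`, then `convert(Path("test.docx"), Path("converted.pdf"))`. `urlopen` raises `HTTPError` for non-2xx replies, so a building or sleeping Space (503) surfaces as an exception rather than as a broken PDF on disk.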

### Step 5: Add to Your Vercel Project

Update `.env.local`:
```bash
DOC_CONVERSION_API_URL=https://YOUR-USERNAME-nextools-doc-converter.hf.space
```

**Important:** Also add this to your **Vercel Dashboard**:
1. Go to your project on Vercel
2. Settings → Environment Variables
3. Add: `DOC_CONVERSION_API_URL` = `https://YOUR-USERNAME-nextools-doc-converter.hf.space`
4. Redeploy your site

### Step 6: Test on Your Site

1. Go to your NexTools site
2. Navigate to the "Word to PDF" tool
3. Upload a `.docx` file
4. Click Convert
5. Download your PDF!

---

## 🎯 Why This Works on Vercel

### The Problem:
- **Vercel** = Serverless (no system tools like LibreOffice)
- **Local** = Your computer has LibreOffice installed
- **Result** = Works locally, fails on Vercel ❌

### The Solution:
- **Hugging Face Space** = Full Docker container with LibreOffice
- **Free tier** = No cost on CPU basic hardware (the Space sleeps when idle)
- **Your Vercel Site** → API call to HF Space
- **Result** = Works everywhere! ✅

---

## 🔥 Benefits

- ✅ **Free** - No per-conversion API costs on CPU basic hardware
- ✅ **No API quota** - No third-party conversion quota; in practice the free CPU basic hardware is the limit
- ✅ **Professional Quality** - Real LibreOffice conversion
- ✅ **Fast** - Roughly 2 seconds per typical document once the Space is awake
- ✅ **Reliable** - A self-contained Docker image with no third-party conversion service; free Spaces still sleep when idle and come with no uptime guarantee
- ✅ **Scalable** - Run several Spaces behind a fallback list (see Pro Tips); a single free Space does not auto-scale
- ✅ **Dedicated** - Your own instance (with Public visibility, anyone who knows the URL can call the API)

---

## Troubleshooting

### Build Failed?
- Make sure you selected **Docker** as the SDK (not Gradio or Streamlit)
- Check that all 3 required files are uploaded
- Wait 5-10 minutes for the first build

### 503 Service Unavailable?
- The Space is still building: check the Logs tab
- The Space went to sleep: the first request wakes it up (about a 30-second delay)
- Make a test request to wake it up
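Because a sleeping or still-building Space answers 503 until it is ready, a caller can poll `/health` before sending real work. A minimal sketch, assuming a 200 response means ready; the 12 attempts and 5-second delay are illustrative values sized to the roughly 30-second wake-up mentioned above, and the `fetch`/`sleep` parameters exist only so the loop can be tested without a network. Both function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 30.0) -> int:
    """GET url and return the HTTP status code, mapping network failures to 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 503 while building or waking
        return err.code
    except OSError:  # DNS failure, refused connection, timeout
        return 0


def wait_until_awake(
    url: str,
    attempts: int = 12,
    delay: float = 5.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:  # no pointless sleep after the last try
            sleep(delay)
    return False
```

For example, `wait_until_awake("https://YOUR-USERNAME-nextools-doc-converter.hf.space/health")` returns `True` once the Space responds, or `False` after about a minute of 503s.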

### Conversion Failed?
- Check that the file format is `.docx`, `.doc`, `.odt`, `.rtf`, or `.txt`
- Keep the file size under 50 MB
- Test the Space directly with curl first (see Step 4) to rule out problems in your site
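The same limits can be checked before uploading, which turns an opaque server error into a readable message. This sketch hard-codes the extensions and the 50 MB ceiling listed above; whether `app.py` enforces exactly these values (and whether "50MB" means 50 MiB) is an assumption, and `preflight_error` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Values copied from the troubleshooting list; app.py's real limits may differ.
ALLOWED_EXTENSIONS = {".docx", ".doc", ".odt", ".rtf", ".txt"}
MAX_BYTES = 50 * 1024 * 1024  # "< 50MB", read here as 50 MiB


def preflight_error(filename: str, size: int) -> Optional[str]:
    """Return why an upload would likely be rejected, or None if it looks fine."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported format {ext or '(none)'}; expected one of {sorted(ALLOWED_EXTENSIONS)}"
    if size >= MAX_BYTES:
        return f"file is {size / 1024 / 1024:.1f} MB; keep it under 50 MB"
    if size == 0:
        return "file is empty"
    return None
```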

### API URL Not Working?
- Copy the exact URL from your Space (it should end with `.hf.space`)
- Don't add `/convert` to the environment variable; set just the base URL
- Check that it's accessible in a browser
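These URL rules are easy to check automatically, for example in a build script. A hedged sketch: `check_api_base_url` is a hypothetical helper that flags a missing value, a non-HTTPS scheme, a host that does not end in `.hf.space` (such as the `huggingface.co/spaces/...` page URL), and any path like `/convert` appended to the base URL.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def check_api_base_url(value: Optional[str]) -> list[str]:
    """List problems with a DOC_CONVERSION_API_URL value (empty list = looks OK)."""
    if not value:
        return ["DOC_CONVERSION_API_URL is not set"]
    problems = []
    parsed = urlparse(value.strip())
    if parsed.scheme != "https":
        problems.append("use the https:// URL of the Space")
    if not parsed.hostname or not parsed.hostname.endswith(".hf.space"):
        problems.append("host should end with .hf.space (not huggingface.co/spaces/...)")
    if parsed.path.rstrip("/"):
        problems.append(f"remove the path {parsed.path!r}; set only the base URL")
    return problems


if __name__ == "__main__":
    # Print a warning for each problem found in the current environment.
    for problem in check_api_base_url(os.environ.get("DOC_CONVERSION_API_URL")):
        print("warning:", problem)
```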

---

## Your Setup Now

```
User uploads .docx
   ↓
Your NexTools Site (Vercel)
   ↓
Next.js API Route
   ↓
Hugging Face Space (LibreOffice) ← FREE!
   ↓
Convert to PDF
   ↓
Return to User
```

**Everything works on Vercel now!**

---

## 💡 Pro Tips

1. **Multiple Spaces**: Create 2-3 Spaces for redundancy
2. **Fallback URLs**: Add all Space URLs to your environment variables and have the API route try them in order
3. **Monitor**: Check the HF dashboard for usage stats
4. **Updates**: Push a change to `app.py` and the Space rebuilds the Docker image automatically
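Tips 1 and 2 amount to trying several Spaces in order and returning the first success. The site's real API route is a Next.js route, so this Python version only illustrates the logic. The comma-separated `DOC_CONVERSION_API_URLS` variable is a hypothetical name, since the guide itself defines only the single `DOC_CONVERSION_API_URL`; `space_urls` and `with_fallback` are hypothetical helpers.

```python
import os
from typing import Callable, List, TypeVar

T = TypeVar("T")


def space_urls(raw: str) -> List[str]:
    """Split a comma-separated list of Space base URLs, dropping blanks and trailing slashes."""
    return [part.strip().rstrip("/") for part in raw.split(",") if part.strip()]


def with_fallback(urls: List[str], call: Callable[[str], T]) -> T:
    """Run call(base_url) against each Space in order and return the first success."""
    if not urls:
        raise ValueError("no Space URLs configured")
    errors = []
    for url in urls:
        try:
            return call(url)
        except Exception as exc:  # e.g. an HTTPError 503 from a sleeping Space
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all Spaces failed: " + "; ".join(errors))


# Hypothetical variable, e.g. "https://a.hf.space,https://b.hf.space".
URLS = space_urls(os.environ.get("DOC_CONVERSION_API_URLS", ""))
```

Ordering the list with the most recently used Space first makes it more likely that the first attempt hits an awake instance.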

---

## Success Checklist

- [ ] Created a Hugging Face Space with the Docker SDK
- [ ] Uploaded all 3 required files (Dockerfile, app.py, requirements.txt)
- [ ] Space built successfully (check logs)
- [ ] Health check returns `{"status": "healthy"}`
- [ ] Test conversion works with curl
- [ ] Added `DOC_CONVERSION_API_URL` to the Vercel environment variables
- [ ] Redeployed the Vercel site
- [ ] Tested on the live site: Word to PDF works!

---

**Need Help?** Check the full [README.md](./README.md) in the `python-doc-convert` folder.
Dockerfile
CHANGED
@@ -1,24 +1,35 @@
```diff
 # Hugging Face Spaces Dockerfile for Document Conversion
-#
+# Optimized for minimal size and fast startup
 
 FROM python:3.10-slim
 
-# Install
-RUN apt-get update && apt-get install -y \
-
-    libreoffice-writer \
-    libreoffice-calc \
-    libreoffice-impress \
+# Install ONLY essential packages (no GUI, no unnecessary libraries)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    # LibreOffice headless only (no GUI components)
+    libreoffice-writer-nogui \
+    libreoffice-calc-nogui \
+    libreoffice-impress-nogui \
+    # Tesseract OCR (English only)
     tesseract-ocr \
     tesseract-ocr-eng \
+    # Minimal Java runtime for LibreOffice
     default-jre-headless \
+    # OpenCV system dependencies (minimal)
     libgl1 \
     libglib2.0-0 \
     libsm6 \
     libxext6 \
-
+    libxrender1 \
+    # Aggressive cleanup to reduce image size
     && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
+    && rm -rf /var/lib/apt/lists/* \
+    && rm -rf /tmp/* /var/tmp/* \
+    # Remove LibreOffice bloat (galleries, extra fonts, docs)
+    && rm -rf /usr/lib/libreoffice/share/gallery \
+    && rm -rf /usr/share/fonts/truetype/liberation \
+    && rm -rf /usr/share/doc \
+    && rm -rf /usr/share/man \
+    && rm -rf /usr/share/locale
 
 # Set working directory
 WORKDIR /app
```
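This commit does not show `app.py`, so how it calls LibreOffice is not visible here. Services like this commonly shell out to LibreOffice's headless converter; the sketch below shows that pattern in Python, assuming the `soffice` binary installed by the `-nogui` packages is on `PATH` (some setups call it `libreoffice`). The function names are hypothetical. Passing a throwaway profile via `-env:UserInstallation` is a common workaround that keeps two simultaneous conversions from clashing over LibreOffice's locked user profile. One caveat worth checking after this change: the cleanup step deletes `/usr/share/fonts/truetype/liberation`, and Liberation fonts are the usual metric-compatible stand-ins for Times New Roman, Arial and Courier New, so documents using those fonts may reflow in the PDF if no substitute remains.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def soffice_command(src: Path, outdir: Path, profile: Path, binary: str = "soffice") -> List[str]:
    """Build a headless LibreOffice command that writes <stem>.pdf into outdir."""
    return [
        binary,
        f"-env:UserInstallation={profile.as_uri()}",  # private, throwaway user profile
        "--headless",
        "--convert-to",
        "pdf",
        "--outdir",
        str(outdir),
        str(src),
    ]


def convert_to_pdf(src: Path, timeout: float = 120.0) -> bytes:
    """Convert one document with LibreOffice in a scratch directory and return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        outdir = work / "out"
        outdir.mkdir()
        cmd = soffice_command(src.resolve(), outdir, work / "profile")
        # check=True raises CalledProcessError (with captured stderr) on a non-zero exit.
        subprocess.run(cmd, check=True, capture_output=True, timeout=timeout)
        pdf = outdir / (src.stem + ".pdf")
        if not pdf.exists():
            raise RuntimeError(f"LibreOffice produced no PDF for {src.name}")
        return pdf.read_bytes()
```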
test.txt
ADDED
@@ -0,0 +1 @@
```
Test Document
```