fokan committed on
Commit
61b4298
·
1 Parent(s): fddf9e7
DEPLOYMENT_ENHANCED.md CHANGED
@@ -1,330 +1,176 @@
1
- # Enhanced Deployment Guide
2
 
3
- ## Prerequisites
4
 
5
- - Ubuntu 20.04+ or compatible Linux distribution
6
- - Docker Engine 20.10+
7
  - Docker Compose 1.29+
8
- - Minimum 4GB RAM (8GB recommended)
9
- - Minimum 20GB free disk space
10
 
11
- ## System Package Installation
12
 
13
- ### Core Dependencies
14
 
15
- ```bash
16
- # Update package repositories
17
- sudo apt-get update
18
-
19
- # Install system dependencies
20
- sudo apt-get install -y \
21
- python3 \
22
- python3-pip \
23
- libreoffice \
24
- libreoffice-writer \
25
- libreoffice-l10n-ar \
26
- fonts-liberation \
27
- fonts-liberation2 \
28
- fonts-dejavu \
29
- fonts-dejavu-core \
30
- fonts-dejavu-extra \
31
- fonts-croscore \
32
- fonts-noto-core \
33
- fonts-noto-ui-core \
34
- fonts-noto-mono \
35
- fonts-noto-color-emoji \
36
- fonts-opensymbol \
37
- fonts-freefont-ttf \
38
- fontconfig \
39
- wget \
40
- curl \
41
- unzip \
42
- locales
43
- ```
44
-
45
- ### Locale Configuration
46
-
47
- ```bash
48
- # Generate Arabic locale
49
- sudo locale-gen ar_SA.UTF-8
50
-
51
- # Update locale settings
52
- sudo update-locale LANG=ar_SA.UTF-8 LC_ALL=ar_SA.UTF-8
53
- ```
54
-
55
- ## Font Configuration
56
-
57
- ### Enhanced Arabic Font Support
58
-
59
- The application includes automatic font installation for optimal Arabic text rendering:
60
-
61
- ```bash
62
- # Make font installation script executable
63
- chmod +x arabic_fonts_setup.sh
64
-
65
- # Run font installation
66
- ./arabic_fonts_setup.sh
67
- ```
68
 
69
- This script installs:
70
- - **Amiri**: Professional Arabic font with excellent readability
71
- - **Scheherazade New**: Unicode-compliant Arabic font with extensive character support
72
 
73
- ### Font Cache Update
 
 
 
74
 
75
- ```bash
76
- # Update system font cache
77
- sudo fc-cache -fv
 
78
 
79
- # Verify font installation
80
- fc-list | grep -i "amiri\|scheherazade"
81
- ```
82
 
83
- ## Python Environment Setup
 
 
 
 
 
 
 
 
84
 
85
- ### Virtual Environment (Recommended)
 
 
 
86
 
87
- ```bash
88
- # Create virtual environment
89
- python3 -m venv venv
 
90
 
91
- # Activate virtual environment
92
- source venv/bin/activate
93
 
94
- # Install Python dependencies
95
- pip install -r requirements.txt
96
- ```
97
 
98
- ### Direct Installation
 
 
 
 
 
 
 
99
 
 
100
  ```bash
101
- # Install Python packages
102
- pip3 install --no-cache-dir -r requirements.txt
103
- ```
104
-
105
- ## Docker Deployment
106
-
107
- ### Docker Compose Configuration
108
-
109
- The provided `docker-compose.yml` includes optimized settings:
110
-
111
- ```yaml
112
- version: '3.8'
113
-
114
- services:
115
- docx-to-pdf-arabic:
116
- build: .
117
- container_name: docx-pdf-converter-arabic
118
- ports:
119
- - "7860:7860"
120
- environment:
121
- - LANG=ar_SA.UTF-8
122
- - LC_ALL=ar_SA.UTF-8
123
- - PYTHONUNBUFFERED=1
124
- - TEMP_DIR=/tmp/conversions
125
- - STATIC_DIR=/app/static
126
- volumes:
127
- - ./static:/app/static
128
- - ./conversions:/tmp/conversions
129
- restart: unless-stopped
130
- healthcheck:
131
- test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
132
- interval: 30s
133
- timeout: 10s
134
- retries: 3
135
- start_period: 40s
136
  ```
137
 
138
- ### Building and Running
139
-
140
- ```bash
141
- # Build and start services
142
- docker-compose up -d --build
143
-
144
- # View logs
145
- docker-compose logs -f
146
 
147
- # Stop services
148
- docker-compose down
 
 
 
 
149
  ```
150
 
151
- ## Configuration Options
152
-
153
- ### Environment Variables
154
 
155
- | Variable | Default | Description |
156
- |----------|---------|-------------|
157
- | `STATIC_DIR` | `/app/static` | Directory for storing converted PDFs |
158
- | `TEMP_DIR` | `/tmp/conversions` | Temporary directory for processing |
159
- | `MAX_FILE_SIZE` | `52428800` | Maximum file size in bytes (50MB) |
160
- | `MAX_CONVERSION_TIME` | `120` | Conversion timeout in seconds |
161
- | `CORS_ORIGINS` | `*` | Allowed CORS origins |
162
- | `LANG` | `ar_SA.UTF-8` | System language |
163
- | `LC_ALL` | `ar_SA.UTF-8` | Locale settings |
164
 
165
- ### Volume Mounts
166
 
167
- For persistent storage:
168
- - `/app/static`: Converted PDF files
169
- - `/tmp/conversions`: Temporary processing files
 
 
 
170
 
171
- ## Health Monitoring
172
 
173
- ### Built-in Health Check
174
 
175
- ```bash
176
- # Check application health
177
- curl -f http://localhost:7860/health
178
 
179
- # Expected response
180
- {"status": "healthy", "version": "2.0.0"}
181
- ```
182
 
183
- ### Docker Health Status
 
 
 
184
 
185
- ```bash
186
- # Check container health
187
- docker inspect --format='{{json .State.Health}}' docx-pdf-converter-arabic
188
- ```
189
-
190
- ## Performance Optimization
191
 
192
- ### Resource Allocation
 
 
193
 
194
- For production deployments:
195
 
196
- ```yaml
197
- deploy:
198
- resources:
199
- limits:
200
- memory: 2G
201
- cpus: '1.0'
202
- reservations:
203
- memory: 1G
204
- cpus: '0.5'
205
- ```
206
 
207
- ### Conversion Optimization
 
 
208
 
209
- The application automatically optimizes conversions:
210
- - Parallel processing for batch operations
211
- - Memory-efficient temporary file handling
212
- - Automatic cleanup of temporary files
213
- - Timeout management for long-running conversions
214
 
215
- ## Security Considerations
 
 
216
 
217
- ### File Validation
 
 
218
 
219
- - MIME type verification
220
- - File extension validation
221
- - Size limit enforcement
222
- - Path traversal prevention
223
-
224
- ### Container Security
225
 
 
226
  ```bash
227
- # Run with limited capabilities
228
- docker run --cap-drop=ALL --read-only --tmpfs /tmp docx-pdf-converter
229
  ```
230
 
231
- ## Backup and Recovery
232
-
233
- ### Data Backup
234
-
235
- ```bash
236
- # Backup converted files
237
- tar -czf static-backup-$(date +%Y%m%d).tar.gz static/
238
-
239
- # Backup configuration
240
- tar -czf config-backup-$(date +%Y%m%d).tar.gz docker-compose.yml .env
241
- ```
242
 
243
- ### Data Recovery
 
 
244
 
245
- ```bash
246
- # Restore converted files
247
- tar -xzf static-backup-*.tar.gz
248
 
249
- # Restore configuration
250
- tar -xzf config-backup-*.tar.gz
251
- ```
252
 
253
- ## Troubleshooting
 
 
254
 
255
- ### Common Issues
256
 
257
- 1. **Font Rendering Problems**
258
  ```bash
259
- # Reinstall fonts
260
- ./arabic_fonts_setup.sh
261
- sudo fc-cache -fv
262
  ```
263
 
264
- 2. **LibreOffice Failures**
265
  ```bash
266
- # Check LibreOffice installation
267
- libreoffice --version
268
-
269
- # Test conversion
270
- libreoffice --headless --convert-to pdf --outdir /tmp test.docx
271
  ```
272
 
273
- 3. **Permission Errors**
274
  ```bash
275
- # Fix directory permissions
276
- sudo chmod 777 static/
277
- sudo chmod 777 conversions/
278
- ```
279
-
280
- ### Log Analysis
281
-
282
- ```bash
283
- # View application logs
284
- docker-compose logs docx-to-pdf-arabic
285
-
286
- # Filter error messages
287
- docker-compose logs docx-to-pdf-arabic | grep -i error
288
- ```
289
-
290
- ## Maintenance
291
-
292
- ### Regular Updates
293
-
294
- ```bash
295
- # Update system packages
296
- sudo apt-get update && sudo apt-get upgrade
297
-
298
- # Update Docker images
299
- docker-compose pull
300
- docker-compose up -d --build
301
- ```
302
-
303
- ### Font Updates
304
-
305
- ```bash
306
- # Update Arabic fonts
307
- ./arabic_fonts_setup.sh
308
- sudo fc-cache -fv
309
- ```
310
-
311
- ## Scaling Options
312
-
313
- ### Horizontal Scaling
314
-
315
- For high-volume deployments:
316
-
317
- ```yaml
318
- version: '3.8'
319
-
320
- services:
321
- docx-to-pdf-arabic:
322
- build: .
323
- deploy:
324
- replicas: 3
325
- # ... other configuration
326
- ```
327
-
328
- ### Load Balancing
329
-
330
- Use nginx or similar for load distribution across multiple instances.
 
1
+ # Deployment Guide for Enhanced DOCX to PDF Converter
2
 
3
+ ## System Requirements
4
 
5
+ - Docker 20.10+
 
6
  - Docker Compose 1.29+
7
+ - 4GB+ RAM recommended
8
+ - 2+ CPU cores recommended
9
 
10
+ ## Deployment Options
11
 
12
+ ### 1. Docker Deployment (Recommended)
13
 
14
+ 1. **Build and run with Docker Compose:**
15
+ ```bash
16
+ docker-compose up --build -d
17
+ ```
18
 
19
+ 2. **Access the service:**
20
+ - API: http://localhost:8000
21
+ - API Documentation: http://localhost:8000/docs
22
 
23
+ 3. **View logs:**
24
+ ```bash
25
+ docker-compose logs -f
26
+ ```
27
 
28
+ 4. **Stop the service:**
29
+ ```bash
30
+ docker-compose down
31
+ ```
32
 
33
+ ### 2. Manual Deployment
 
 
34
 
35
+ 1. **Install system dependencies:**
36
+ ```bash
37
+ # Ubuntu/Debian
38
+ sudo apt-get update
39
+ sudo apt-get install -y python3 python3-pip libreoffice libreoffice-writer
40
+
41
+ # Install Arabic fonts
42
+ sudo apt-get install -y fonts-noto-core fonts-noto-kufi-arabic fonts-amiri fonts-scheherazade-new
43
+ ```
44
 
45
+ 2. **Install Python dependencies:**
46
+ ```bash
47
+ pip3 install -r requirements.txt
48
+ ```
49
 
50
+ 3. **Run the application:**
51
+ ```bash
52
+ python3 src/api/app.py
53
+ ```
54
 
55
+ ## Configuration
 
56
 
57
+ ### Environment Variables
 
 
58
 
59
+ | Variable | Description | Default |
60
+ |----------|-------------|---------|
61
+ | `PORT` | Application port | 8000 |
62
+ | `MAX_FILE_SIZE` | Maximum file size in bytes | 52428800 (50MB) |
63
+ | `MAX_CONVERSION_TIME` | Conversion timeout in seconds | 120 |
64
+ | `TEMP_DIR` | Temporary directory for conversions | /tmp/conversions |
65
+ | `CORS_ORIGINS` | CORS allowed origins | * |
66
+ | `CORS_CREDENTIALS` | CORS credentials support | true |
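As a rough illustration of how the variables in this table could be consumed, the sketch below reads them from the environment and falls back to the documented defaults. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    port: int
    max_file_size: int
    max_conversion_time: int
    temp_dir: str
    cors_origins: list
    cors_credentials: bool


def load_settings(env=None):
    """Build Settings from a mapping, using the defaults listed in the table."""
    env = os.environ if env is None else env
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "*").split(",") if o.strip()]
    return Settings(
        port=int(env.get("PORT", "8000")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "52428800")),
        max_conversion_time=int(env.get("MAX_CONVERSION_TIME", "120")),
        temp_dir=env.get("TEMP_DIR", "/tmp/conversions"),
        cors_origins=origins,
        cors_credentials=env.get("CORS_CREDENTIALS", "true").lower() == "true",
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests.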
67
 
68
+ ### Example with custom configuration:
69
  ```bash
70
+ PORT=8080 MAX_FILE_SIZE=104857600 docker-compose up
71
  ```
72
 
73
+ ## Health Checks
 
 
 
 
 
 
 
74
 
75
+ The service provides a health check endpoint at `/health` which returns:
76
+ ```json
77
+ {
78
+ "status": "healthy",
79
+ "version": "2.0.0"
80
+ }
81
  ```
82
 
83
+ Docker health checks are configured in the docker-compose.yml file.
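A deployment script may want to wait for the `/health` endpoint before sending traffic. The sketch below polls the endpoint until it reports `"status": "healthy"`. The function names, retry count, and delay are illustrative assumptions; only the endpoint path and the response shape come from the text above.

```python
import json
import time
import urllib.request


def fetch_health(url, timeout=5.0):
    """Fetch the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(url, attempts=10, delay=3.0, fetch=fetch_health, sleep=time.sleep):
    """Poll `url` until it reports a healthy status or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            payload = fetch(url)
            if payload.get("status") == "healthy":
                return payload
        except (OSError, ValueError):
            # Connection errors and malformed JSON both count as "not ready yet".
            pass
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"{url} did not report healthy after {attempts} attempts")
```

For example, `wait_until_healthy("http://localhost:8000/health")` would block until the service answers, assuming it listens on the default `PORT`.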
 
 
84
 
85
+ ## Scaling
 
 
 
 
 
 
 
 
86
 
87
+ For high-traffic environments:
88
 
89
+ 1. **Increase worker count in Docker:**
90
+ ```yaml
91
+ # In docker-compose.yml
92
+ environment:
93
+ - WORKERS=8
94
+ ```
95
 
96
+ 2. **Use a reverse proxy like NGINX for load balancing**
97
 
98
+ 3. **Consider using Kubernetes for orchestration**
99
 
100
+ ## Monitoring
 
 
101
 
102
+ The application logs to stdout/stderr and includes:
 
 
103
 
104
+ - Request logging
105
+ - Conversion success/failure tracking
106
+ - Error details
107
+ - Performance metrics
108
 
109
+ ## Backup and Recovery
 
 
 
 
 
110
 
111
+ - Converted files are stored in the `conversions` directory
112
+ - This directory is mounted as a volume in Docker
113
+ - Regularly backup this directory for persistence
114
 
115
+ ## Troubleshooting
116
 
117
+ ### Common Issues
 
 
 
 
 
 
 
 
 
118
 
119
+ 1. **LibreOffice not found:**
120
+ - Ensure LibreOffice is installed in the container/host
121
+ - Check PATH environment variable
122
 
123
+ 2. **Font issues with Arabic text:**
124
+ - Verify Arabic fonts are installed
125
+ - Check font cache with `fc-list | grep -i arabic`
 
 
126
 
127
+ 3. **Large file timeouts:**
128
+ - Increase `MAX_CONVERSION_TIME` environment variable
129
+ - Consider preprocessing large documents
130
 
131
+ 4. **Memory issues:**
132
+ - Allocate more RAM to Docker/container
133
+ - Monitor memory usage with `docker stats`
134
 
135
+ ### Logs
 
 
 
 
 
136
 
137
+ View application logs:
138
  ```bash
139
+ docker-compose logs docx-to-pdf-enhanced
 
140
  ```
141
 
142
+ ## Security Considerations
 
 
 
 
 
 
 
 
 
 
143
 
144
+ 1. **File Validation:**
145
+ - Files are validated for type and size
146
+ - Only DOCX files are accepted
147
 
148
+ 2. **Resource Limits:**
149
+ - File size limits prevent abuse
150
+ - Conversion timeouts prevent resource exhaustion
151
 
152
+ 3. **Container Security:**
153
+ - Run with minimal privileges
154
+ - Keep base images updated
155
 
156
+ 4. **CORS Configuration:**
157
+ - Configure `CORS_ORIGINS` appropriately for production
158
+ - Don't use "*" in production environments
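The file validation described in item 1 could look roughly like the sketch below. It is not the project's actual implementation: `validate_docx` is a hypothetical helper that checks the extension, the size limit, and that the payload is a ZIP container holding `word/document.xml`, which is how DOCX files are structured.

```python
import io
import zipfile


def validate_docx(filename, data, max_file_size=52428800):
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if not filename.lower().endswith(".docx"):
        problems.append("extension is not .docx")
    if len(data) > max_file_size:
        problems.append("file exceeds MAX_FILE_SIZE")
    # A DOCX file is a ZIP archive that contains word/document.xml.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            if "word/document.xml" not in archive.namelist():
                problems.append("archive has no word/document.xml")
    except zipfile.BadZipFile:
        problems.append("not a ZIP container")
    return problems
```

Returning a list of problems rather than raising on the first one lets an API report every reason for rejection in a single response.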
159
 
160
+ ## Updating the Application
161
 
162
+ 1. **Pull latest changes:**
163
  ```bash
164
+ git pull origin main
 
 
165
  ```
166
 
167
+ 2. **Rebuild and restart:**
168
  ```bash
169
+ docker-compose down
170
+ docker-compose up --build -d
 
 
 
171
  ```
172
 
173
+ 3. **Verify the update:**
174
  ```bash
175
+ curl http://localhost:8000/health
176
+ ```
DEPLOYMENT_GUIDE.md CHANGED
@@ -1,216 +1,217 @@
1
- # Deployment Guide
2
-
3
- ## System Requirements
4
-
5
- - Ubuntu 20.04 or later (recommended)
6
- - Docker and Docker Compose
7
- - Minimum 2GB RAM
8
- - Minimum 20GB disk space
9
-
10
- ## Installation Steps
11
-
12
- ### 1. Install System Dependencies
13
-
14
- ```bash
15
- # Update package list
16
- sudo apt-get update
17
-
18
- # Install core system dependencies
19
- sudo apt-get install -y \
20
- python3 \
21
- python3-pip \
22
- libreoffice \
23
- libreoffice-writer \
24
- libreoffice-l10n-ar \
25
- fonts-liberation \
26
- fonts-liberation2 \
27
- fonts-dejavu \
28
- fonts-dejavu-core \
29
- fonts-dejavu-extra \
30
- fonts-croscore \
31
- fonts-noto-core \
32
- fonts-noto-ui-core \
33
- fonts-noto-mono \
34
- fonts-noto-color-emoji \
35
- fonts-opensymbol \
36
- fonts-freefont-ttf \
37
- fontconfig \
38
- wget \
39
- curl \
40
- unzip \
41
- locales
42
- ```
43
-
44
- ### 2. Generate Arabic Locale
45
-
46
- ```bash
47
- sudo locale-gen ar_SA.UTF-8
48
- ```
49
-
50
- ### 3. Install Python Dependencies
51
-
52
- ```bash
53
- pip3 install -r requirements.txt
54
  ```
55
 
56
- ### 4. Setup Arabic Fonts
 
 
 
57
 
58
- The application includes a script to automatically download and install additional Arabic fonts:
59
 
 
60
  ```bash
61
- chmod +x arabic_fonts_setup.sh
62
- ./arabic_fonts_setup.sh
63
- ```
64
-
65
- This script will install:
66
- - Amiri font (excellent for Traditional Arabic)
67
- - Scheherazade New font (designed for Arabic script)
68
 
69
- ### 5. Update Font Cache
 
70
 
71
- ```bash
72
- fc-cache -fv
73
  ```
74
 
75
- ### 6. Verify Installation
 
 
 
76
 
77
- Check that Arabic fonts are properly installed:
78
 
79
- ```bash
80
- fc-list | grep -i "arabic\|amiri\|scheherazade"
81
- ```
82
 
83
- ## Docker Deployment
 
 
 
 
 
 
 
 
84
 
85
- ### Using Docker Compose (Recommended)
 
 
 
86
 
87
- 1. Build and start the containers:
 
 
88
 
89
- ```bash
90
- docker-compose up -d
91
- ```
92
 
93
- 2. Access the application at `http://localhost:7860`
 
 
 
 
94
 
95
- ### Manual Docker Build
 
 
 
 
96
 
97
- 1. Build the Docker image:
98
 
 
99
  ```bash
100
- docker build -t docx-pdf-converter .
101
  ```
102
 
103
- 2. Run the container:
104
-
105
  ```bash
106
- docker run -p 7860:7860 docx-pdf-converter
107
  ```
108
 
109
- ## Configuration
110
-
111
- ### Environment Variables
112
-
113
- - `STATIC_DIR` - Directory for storing converted PDFs (default: /app/static)
114
- - `TEMP_DIR` - Temporary directory for processing (default: /tmp/conversions)
115
- - `MAX_FILE_SIZE` - Maximum file size in bytes (default: 52428800)
116
- - `MAX_CONVERSION_TIME` - Conversion timeout in seconds (default: 120)
117
-
118
- ### Volume Mounts
119
-
120
- For persistent storage of converted files:
121
-
122
- ```yaml
123
- volumes:
124
- - ./static:/app/static
125
- - ./conversions:/tmp/conversions
126
- ```
127
-
128
- ## Troubleshooting
129
-
130
- ### Common Issues
131
-
132
- 1. **Font rendering issues**: Ensure all Arabic fonts are properly installed and font cache is updated
133
- 2. **LibreOffice startup failures**: Check that LibreOffice is properly installed and configured
134
- 3. **Permission errors**: Ensure directories have proper write permissions
135
-
136
- ### Checking Font Installation
137
-
138
  ```bash
139
- # List all available Arabic fonts
140
- fc-list :lang=ar
141
-
142
- # Check specific font
143
- fc-match "Amiri"
144
  ```
145
 
146
- ### LibreOffice Validation
147
 
 
 
148
  ```bash
149
- # Check LibreOffice version
150
  libreoffice --version
151
 
152
- # Test headless conversion
153
- libreoffice --headless --convert-to pdf --outdir /tmp test.docx
 
154
  ```
155
 
156
- ## Performance Tuning
157
-
158
- ### Resource Limits
 
 
159
 
160
- For production deployments, consider setting resource limits:
 
161
 
162
- ```yaml
163
- deploy:
164
- resources:
165
- limits:
166
- memory: 1G
167
- cpus: '0.5'
168
  ```
169
 
170
- ### Health Checks
171
-
172
- The Docker image includes built-in health checks:
 
 
173
 
174
- ```yaml
175
- healthcheck:
176
- test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
177
- interval: 30s
178
- timeout: 10s
179
- retries: 3
180
  ```
181
 
182
- ## Updating the Application
 
 
 
 
183
 
184
- 1. Pull the latest code:
185
-
186
- ```bash
187
- git pull origin main
188
- ```
189
 
190
- 2. Rebuild Docker images:
 
 
 
191
 
 
192
  ```bash
193
- docker-compose build
194
- ```
195
 
196
- 3. Restart services:
 
197
 
198
- ```bash
199
- docker-compose up -d
200
  ```
201
 
202
- ## Backup and Recovery
203
 
204
- ### Backup Converted Files
 
 
 
 
205
 
206
- ```bash
207
- # Backup static directory containing converted PDFs
208
- tar -czf static-backup-$(date +%Y%m%d).tar.gz static/
209
- ```
 
210
 
211
- ### Restore Backup
212
 
213
- ```bash
214
- # Restore static directory
215
- tar -xzf static-backup-*.tar.gz
216
- ```
 
 
1
+ # 🚀 Deployment Guide - DOCX to PDF Converter for Arabic
2
+
3
+ ## 📋 Deployment Options
4
+
5
+ ### 1. 🌐 Hugging Face Spaces (Recommended)
6
+
7
+ #### Steps:
8
+ 1. **Create a new Space:**
9
+ - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
10
+ - Click "Create new Space"
11
+ - Choose "Gradio" as the SDK
12
+ - Choose a name for the Space
13
+
14
+ 2. **Upload the files:**
15
+ ```bash
16
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
17
+ cd YOUR_SPACE_NAME
18
+
19
+ # Copy the required files
20
+ cp /path/to/your/project/app.py .
21
+ cp /path/to/your/project/requirements.txt .
22
+ cp /path/to/your/project/packages.txt .
23
+ cp /path/to/your/project/README.md .
24
+
25
+ # Push the changes
26
+ git add .
27
+ git commit -m "Add Arabic DOCX to PDF converter"
28
+ git push
29
+ ```
30
+
31
+ 3. **Verify the deployment:**
32
+ - Wait for the Space to build (5-10 minutes)
33
+ - Check the logs to confirm the Arabic fonts were installed
34
+ - Test the conversion with a simple Arabic file
35
+
36
+ #### Advantages:
37
+ - ✅ Free and available 24/7
38
+ - ✅ Automatic dependency installation
39
+ - ✅ Ready-made web interface
40
+ - ✅ Easy sharing via a link
41
+
42
+ ### 2. 🐳 Docker (for running locally)
43
+
44
+ #### Steps:
45
+ ```bash
46
+ # Build the image
47
+ docker build -t docx-pdf-arabic .
48
+
49
+ # Run the container
50
+ docker run -p 7860:7860 docx-pdf-arabic
51
+
52
+ # Or use docker-compose
53
+ docker-compose up -d
54
  ```
55
 
56
+ #### Advantages:
57
+ - ✅ Isolated, stable environment
58
+ - ✅ Easy deployment to different servers
59
+ - ✅ Full control over the environment
60
 
61
+ ### 3. 🖥️ Direct Local Execution
62
 
63
+ #### Steps:
64
  ```bash
65
+ # Install system dependencies (Ubuntu/Debian)
66
+ sudo apt-get update
67
+ sudo apt-get install libreoffice libreoffice-writer \
68
+ fonts-liberation fonts-dejavu fonts-noto fontconfig
 
 
 
69
 
70
+ # Install Python dependencies
71
+ pip install -r requirements.txt
72
 
73
+ # Run the application
74
+ python run_local.py
75
  ```
76
 
77
+ #### Advantages:
78
+ - ✅ Faster performance
79
+ - ✅ Full control over the system
80
+ - ✅ Easy development and testing
81
 
82
+ ## 🔧 Optimization Settings
83
 
84
+ ### For Hugging Face Spaces:
 
 
85
 
86
+ 1. **Optimize packages.txt:**
87
+ ```
88
+ libreoffice
89
+ libreoffice-writer
90
+ libreoffice-l10n-ar
91
+ fonts-noto-naskh
92
+ fonts-amiri
93
+ fontconfig
94
+ ```
95
 
96
+ 2. **Optimize requirements.txt:**
97
+ ```
98
+ gradio==4.20.0
99
+ ```
100
 
101
+ 3. **README.md settings:**
102
+ - Make sure the YAML frontmatter is present and valid
103
+ - Set sdk_version to the correct version
104
 
105
+ ### For dedicated servers:
 
 
106
 
107
+ 1. **Memory tuning:**
108
+ ```bash
109
+ export JAVA_OPTS="-Xmx2g"
110
+ export SAL_DISABLE_OPENCL=1
111
+ ```
112
 
113
+ 2. **Font tuning:**
114
+ ```bash
115
+ fc-cache -fv
116
+ fc-list | grep -i arabic
117
+ ```
118
 
119
+ ## 🧪 Testing the Deployment
120
 
121
+ ### 1. Basic test:
122
  ```bash
123
+ python test_conversion.py
124
  ```
125
 
126
+ ### 2. Arabic font test:
 
127
  ```bash
128
+ fc-list | grep -i "amiri\|noto.*arabic"
129
  ```
130
 
131
+ ### 3. LibreOffice test:
132
  ```bash
133
+ libreoffice --headless --convert-to pdf test.docx
 
 
 
 
134
  ```
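The headless conversion command above can also be driven from Python with a timeout, which is roughly what a converter service has to do. This is a minimal sketch under stated assumptions: `build_convert_command` and `convert` are hypothetical helper names, the `libreoffice` binary is assumed to be on `PATH`, and the 120-second default mirrors the documented `MAX_CONVERSION_TIME`.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir):
    """Return the argument list for a headless DOCX to PDF conversion."""
    return [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert(docx_path, out_dir, timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    # check=True raises CalledProcessError on a non-zero exit code;
    # timeout raises TimeoutExpired if LibreOffice hangs.
    subprocess.run(build_convert_command(docx_path, out_dir), check=True, timeout=timeout)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or Arabic characters.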
135
 
136
+ ## 🔍 Deployment Troubleshooting
137
 
138
+ ### Problem: LibreOffice does not work
139
+ **Solution:**
140
  ```bash
141
+ # Check the installation
142
  libreoffice --version
143
 
144
+ # Reinstall
145
+ sudo apt-get remove --purge libreoffice*
146
+ sudo apt-get install libreoffice libreoffice-writer
147
  ```
148
 
149
+ ### Problem: Arabic fonts are missing
150
+ **Solution:**
151
+ ```bash
152
+ # Install additional fonts
153
+ sudo apt-get install fonts-noto-naskh fonts-amiri
154
 
155
+ # Refresh the font cache
156
+ sudo fc-cache -fv
157
 
158
+ # Verify
159
+ fc-list | grep -i arabic
 
 
 
 
160
  ```
161
 
162
+ ### Problem: memory errors
163
+ **Solution:**
164
+ ```bash
165
+ # Raise the memory limit
166
+ export JAVA_OPTS="-Xmx4g"
167
 
168
+ # Disable OpenCL
169
+ export SAL_DISABLE_OPENCL=1
 
 
 
 
170
  ```
171
 
172
+ ### Problem: slow conversion
173
+ **Solution:**
174
+ - Reduce the size of the input files
175
+ - Use a server with higher specifications
176
+ - Enable caching
177
 
178
+ ## 📊 Performance Monitoring
 
 
 
 
179
 
180
+ ### Key metrics:
181
+ - Conversion time (should be < 30 seconds for typical files)
182
+ - Memory usage (should be < 2GB)
183
+ - Conversion success rate (should be > 95%)
184
 
185
+ ### Monitoring tools:
186
  ```bash
187
+ # Monitor memory
188
+ htop
189
 
190
+ # Monitor processes
191
+ ps aux | grep libreoffice
192
 
193
+ # Monitor logs
194
+ tail -f /var/log/syslog
195
  ```
196
 
197
+ ## 🔒 Security
198
 
199
+ ### Security settings:
200
+ 1. Limit the size of uploaded files (< 50MB)
201
+ 2. Clean up temporary files automatically
202
+ 3. Set a timeout for operations
203
+ 4. Prevent execution of malicious code embedded in files
204
 
205
+ ### Best practices:
206
+ - Always use HTTPS
207
+ - Enable rate limiting
208
+ - Monitor resource usage
209
+ - Keep backups of your configuration
210
 
211
+ ## 📞 Support
212
 
213
+ If you run into deployment problems:
214
+ 1. Check the logs first
215
+ 2. Make sure all dependencies are installed
216
+ 3. Test in a local environment first
217
+ 4. Review the troubleshooting guide above
DOCKER_TROUBLESHOOTING.md DELETED
@@ -1,201 +0,0 @@
1
- # Docker Build Troubleshooting Guide
2
-
3
- This document provides solutions for common issues encountered when building the Docker image for the Enhanced DOCX to PDF Converter.
4
-
5
- ## Common Build Errors and Solutions
6
-
7
- ### 1. Package Not Found Errors
8
-
9
- **Error Message:**
10
- ```
11
- E: Package 'libreoffice-help-ar' has no installation candidate
12
- E: Unable to locate package fonts-noto-naskh
13
- E: Unable to locate package fonts-noto-kufi-arabic
14
- E: Unable to locate package fonts-amiri
15
- E: Unable to locate package fonts-scheherazade-new
16
- ```
17
-
18
- **Solution:**
19
- These packages are not available in the Ubuntu 22.04 repository. The Dockerfile has been updated to:
20
- 1. Remove unavailable packages
21
- 2. Install Arabic fonts manually via the `install_arabic_fonts.sh` script
22
-
23
- ### 2. Font Installation Failures
24
-
25
- **Error Message:**
26
- ```
27
- Failed to download <font-name>
28
- ```
29
-
30
- **Solution:**
31
- The font installation script includes error handling with `|| true` to continue even if some fonts fail to download. This ensures the build process continues and the application remains functional with the available fonts.
32
-
33
- ### 3. Network Timeout During Font Downloads
34
-
35
- **Error Message:**
36
- ```
37
- wget: unable to resolve host address
38
- curl: (6) Could not resolve host
39
- ```
40
-
41
- **Solution:**
42
- The font installation script includes:
43
- - Timeout settings (`--timeout=30`)
44
- - Retry attempts (`--tries=2` or `--retry 2`)
45
- - Fallback to alternative download methods (curl if wget fails)
46
-
47
- ### 4. Permission Denied Errors
48
-
49
- **Error Message:**
50
- ```
51
- chmod: cannot access 'install_arabic_fonts.sh': Permission denied
52
- ```
53
-
54
- **Solution:**
55
- Ensure the script has execute permissions:
56
- ```dockerfile
57
- RUN chmod +x install_arabic_fonts.sh && \
58
- ./install_arabic_fonts.sh || true
59
- ```
60
-
61
- ### 5. Font Cache Update Failures
62
-
63
- **Error Message:**
64
- ```
65
- fc-cache: command not found
66
- ```
67
-
68
- **Solution:**
69
- Ensure `fontconfig` package is installed:
70
- ```dockerfile
71
- RUN apt-get update && apt-get install -y \
72
- fontconfig \
73
- # other packages...
74
- ```
75
-
76
- ## Dockerfile Best Practices Implemented
77
-
78
- ### 1. Minimal Base Image
79
- Using `ubuntu:22.04` for stability and security.
80
-
81
- ### 2. Efficient Package Installation
82
- Combining multiple `apt-get install` commands to reduce layers:
83
- ```dockerfile
84
- RUN apt-get update && apt-get install -y \
85
- package1 \
86
- package2 \
87
- package3 \
88
- && rm -rf /var/lib/apt/lists/*
89
- ```
90
-
91
- ### 3. Proper Cleanup
92
- Removing apt cache after installation to reduce image size:
93
- ```dockerfile
94
- && rm -rf /var/lib/apt/lists/*
95
- ```
96
-
97
- ### 4. Error Handling
98
- Using `|| true` to prevent build failures from non-critical steps:
99
- ```dockerfile
100
- RUN ./install_arabic_fonts.sh || true
101
- ```
102
-
103
- ### 5. Correct Working Directory
104
- Setting working directory early in the Dockerfile:
105
- ```dockerfile
106
- WORKDIR /app
107
- ```
108
-
109
- ## Manual Font Installation Process
110
-
111
- The `install_arabic_fonts.sh` script performs the following steps:
112
-
113
- 1. Creates font directory: `/usr/share/fonts/truetype/arabic`
114
- 2. Downloads Arabic fonts from reliable sources:
115
- - Amiri font
116
- - Scheherazade New font
117
- - Noto Sans Arabic font
118
- - Noto Naskh Arabic font
119
- 3. Extracts and installs font files
120
- 4. Updates font cache with `fc-cache -fv`
121
-
122
- ## Testing Docker Build Locally
123
-
124
- To test the Docker build locally:
125
-
126
- ```bash
127
- docker build -t docx-pdf-converter .
128
- ```
129
-
130
- To test with no cache (recommended for troubleshooting):
131
- ```bash
132
- docker build --no-cache -t docx-pdf-converter .
133
- ```
134
-
135
- ## Hugging Face Spaces Specific Considerations
136
-
137
- ### 1. Build Time Limits
138
- Hugging Face Spaces have build time limits. To optimize:
139
- - Use multi-stage builds if needed
140
- - Minimize the number of layers
141
- - Cache dependencies effectively
142
-
143
- ### 2. Network Restrictions
144
- Hugging Face build environments may have network restrictions:
145
- - Use HTTPS for all downloads
146
- - Include fallback mechanisms
147
- - Set appropriate timeouts
148
-
149
- ### 3. Disk Space Limitations
150
- Monitor image size:
151
- - Remove unnecessary files after installation
152
- - Use `.dockerignore` to exclude unnecessary files
153
- - Consider using smaller base images if needed
154
-
155
- ## Debugging Build Issues
156
-
157
- ### 1. Enable Verbose Output
158
- Add `set -x` to shell scripts for debugging:
159
- ```bash
160
- #!/bin/bash
161
- set -x # Enable verbose output
162
- ```
163
-
164
- ### 2. Test Individual Commands
165
- Run commands interactively in a container:
166
- ```bash
167
- docker run -it ubuntu:22.04 /bin/bash
168
- ```
169
-
170
- ### 3. Check Available Packages
171
- List available packages in the build environment:
172
- ```bash
173
- apt-cache search <package-name>
174
- apt list --upgradable
175
- ```
176
-
177
- ## Alternative Solutions
178
-
179
- ### 1. Using Different Base Images
180
- If Ubuntu continues to have issues, consider:
181
- - `debian:stable-slim`
182
- - `alpine:latest` (with proper package mapping)
183
-
184
- ### 2. Pre-downloading Fonts
185
- Include font files directly in the repository to avoid network dependencies during build.
186
-
187
- ### 3. Using Font Packages from Different Repositories
188
- Add additional package repositories if needed:
189
- ```dockerfile
190
- RUN echo "deb http://ppa.launchpad.net/libreoffice/ppa/ubuntu jammy main" > /etc/apt/sources.list.d/libreoffice.list
191
- ```
192
-
193
- ## Contact Support
194
-
195
- If you continue to experience issues:
196
- 1. Check the Hugging Face community forums
197
- 2. Review the build logs carefully
198
- 3. Test the Dockerfile locally first
199
- 4. Contact Hugging Face support with detailed error information
200
-
201
- This troubleshooting guide should help resolve most common Docker build issues for the Enhanced DOCX to PDF Converter.
Dockerfile CHANGED
@@ -1,4 +1,4 @@
1
- # Dockerfile for DOCX to PDF Converter with Enhanced Arabic Support
2
  FROM ubuntu:22.04
3
 
4
  # Set environment variables for Arabic support
@@ -6,15 +6,15 @@ ENV DEBIAN_FRONTEND=noninteractive
6
  ENV LANG=ar_SA.UTF-8
7
  ENV LC_ALL=ar_SA.UTF-8
8
  ENV PYTHONUNBUFFERED=1
9
- ENV STATIC_DIR=/app/static
10
 
11
- # Install system dependencies including available Arabic fonts
12
  RUN apt-get update && apt-get install -y \
13
  python3 \
14
  python3-pip \
15
  libreoffice \
16
  libreoffice-writer \
17
  libreoffice-l10n-ar \
 
18
  fonts-liberation \
19
  fonts-liberation2 \
20
  fonts-dejavu \
@@ -25,8 +25,12 @@ RUN apt-get update && apt-get install -y \
25
  fonts-noto-ui-core \
26
  fonts-noto-mono \
27
  fonts-noto-color-emoji \
 
 
28
  fonts-opensymbol \
29
  fonts-freefont-ttf \
 
 
30
  fontconfig \
31
  wget \
32
  curl \
@@ -40,33 +44,28 @@ RUN locale-gen ar_SA.UTF-8
40
  # Set working directory
41
  WORKDIR /app
42
 
43
- # Create necessary directories
44
- RUN mkdir -p /tmp/libreoffice_conversion && \
45
- mkdir -p /app/static && \
46
- chmod 777 /app/static
47
-
48
  # Copy requirements and install Python dependencies
49
  COPY requirements.txt .
50
  RUN pip3 install --no-cache-dir -r requirements.txt
51
 
52
  # Copy application files
53
- COPY app.py .
54
- COPY arabic_fonts_setup.sh .
55
- COPY libreoffice_arabic_config.xml .
56
-
57
- # Setup additional Arabic fonts
58
- RUN chmod +x arabic_fonts_setup.sh && \
59
- ./arabic_fonts_setup.sh || true
60
 
61
- # Update font cache
 
 
62
  RUN fc-cache -fv
63
 
 
 
 
64
  # Expose port
65
  EXPOSE 7860
66
 
67
  # Health check
68
  HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
69
- CMD curl -f http://localhost:7860/ || exit 1
70
 
71
  # Run the application
72
- CMD ["python3", "app.py"]
 
1
+ # Dockerfile for Enhanced DOCX to PDF Converter
2
  FROM ubuntu:22.04
3
 
4
  # Set environment variables for Arabic support
 
6
  ENV LANG=ar_SA.UTF-8
7
  ENV LC_ALL=ar_SA.UTF-8
8
  ENV PYTHONUNBUFFERED=1
 
9
 
10
+ # Install system dependencies including Arabic fonts and Python
11
  RUN apt-get update && apt-get install -y \
12
  python3 \
13
  python3-pip \
14
  libreoffice \
15
  libreoffice-writer \
16
  libreoffice-l10n-ar \
17
+ libreoffice-help-ar \
18
  fonts-liberation \
19
  fonts-liberation2 \
20
  fonts-dejavu \
 
25
  fonts-noto-ui-core \
26
  fonts-noto-mono \
27
  fonts-noto-color-emoji \
28
+ fonts-noto-naskh \
29
+ fonts-noto-kufi-arabic \
30
  fonts-opensymbol \
31
  fonts-freefont-ttf \
32
+ fonts-amiri \
33
+ fonts-scheherazade-new \
34
  fontconfig \
35
  wget \
36
  curl \
 
44
  # Set working directory
45
  WORKDIR /app
46
 
47
  # Copy requirements and install Python dependencies
48
  COPY requirements.txt .
49
  RUN pip3 install --no-cache-dir -r requirements.txt
50
 
51
  # Copy application files
52
+ COPY src/ src/
53
+ COPY arial.ttf .
54
 
55
+ # Setup font configuration
56
+ RUN mkdir -p /usr/share/fonts/truetype/local-arial
57
+ RUN cp arial.ttf /usr/share/fonts/truetype/local-arial/
58
  RUN fc-cache -fv
59
 
60
+ # Create necessary directories
61
+ RUN mkdir -p /tmp/conversions
62
+
63
  # Expose port
64
  EXPOSE 7860
65
 
66
  # Health check
67
  HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
68
+ CMD curl -f http://localhost:7860/health || exit 1
69
 
70
  # Run the application
71
+ CMD ["python3", "src/api/app.py"]
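The new Dockerfile installs the Arabic font packages directly and registers a local `arial.ttf` before running `fc-cache -fv`. One way to confirm inside the container that the expected families are visible to fontconfig is to parse the output of `fc-list : family`. This is a minimal sketch, not part of the commit: the helper names and the `REQUIRED_FAMILIES` list are assumptions, and the family names are what these packages usually register.

```python
import subprocess

# Assumed family names for fonts-amiri, fonts-scheherazade-new and the copied arial.ttf.
REQUIRED_FAMILIES = ["Amiri", "Scheherazade New", "Arial"]


def parse_families(fc_list_output: str) -> set[str]:
    """Collect family names from `fc-list : family` output.

    Each output line may hold several comma-separated names for one font.
    """
    families: set[str] = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def missing_families(fc_list_output: str, required: list[str]) -> list[str]:
    """Return the required families that do not appear in the fc-list output."""
    installed = {name.lower() for name in parse_families(fc_list_output)}
    return [name for name in required if name.lower() not in installed]


def check_installed_fonts(required: list[str] = REQUIRED_FAMILIES) -> list[str]:
    """Run fc-list (must be on PATH) and report families that were not found."""
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return missing_families(result.stdout, required)
```

Calling `check_installed_fonts()` after the image is built should return an empty list if every assumed family was registered; a non-empty list names the families to investigate.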
HUGGINGFACE_DEPLOYMENT.md CHANGED
@@ -67,12 +67,12 @@ The Space is configured through the README.md file:
67
 
68
  ```yaml
69
  ---
70
- title: Enhanced DOCX to PDF Converter with Arabic Support
71
  emoji: 📄
72
  colorFrom: blue
73
  colorTo: purple
74
  sdk: docker
75
- app_file: app.py
76
  pinned: false
77
  ---
78
  ```
@@ -84,7 +84,7 @@ pinned: false
84
  - **colorFrom**: Gradient start color
85
  - **colorTo**: Gradient end color
86
  - **sdk**: Must be "docker" for this application
87
- - **app_file**: Must point to "app.py" (the main application file)
88
  - **pinned**: Whether to pin the Space to your profile
89
 
90
  ## API Usage
@@ -114,13 +114,6 @@ curl -X POST "https://your-username-your-space-name.hf.space/convert/batch" \
114
  }'
115
  ```
116
 
117
- ### Access Converted PDFs
118
- Converted PDFs are stored in a static directory and can be accessed directly:
119
- ```bash
120
- # Access PDF directly via static URL
121
- https://your-username-your-space-name.hf.space/static/uuid_filename.pdf
122
- ```
123
-
124
  ### Health Check
125
  ```bash
126
  curl "https://your-username-your-space-name.hf.space/health"
@@ -132,10 +125,9 @@ curl "https://your-username-your-space-name.hf.space/health"
132
 
133
  You can set environment variables in your Space settings:
134
 
135
- - `STATIC_DIR`: Directory for storing converted PDFs (default: /app/static)
136
- - `TEMP_DIR`: Temporary directory for conversions (default: /tmp/conversions)
137
  - `MAX_FILE_SIZE`: Maximum file size in bytes (default: 52428800)
138
  - `MAX_CONVERSION_TIME`: Conversion timeout in seconds (default: 120)
 
139
 
140
  ### Hardware Upgrade
141
 
@@ -163,20 +155,6 @@ For processing larger files or handling more concurrent requests, consider upgra
163
  - Check file size limits
164
  - Review application logs for conversion errors
165
 
166
- ### Docker Build Issues
167
-
168
- If you encounter package installation errors during the Docker build:
169
-
170
- 1. **Package Not Found Errors**: The Dockerfile has been updated to remove unavailable packages:
171
- - Removed: `libreoffice-help-ar`, `fonts-noto-naskh`, `fonts-noto-kufi-arabic`, `fonts-amiri`, `fonts-scheherazade-new`
172
- - These fonts are now installed manually via the `arabic_fonts_setup.sh` script.
173
-
174
- 2. **Font Installation Failures**: The font installation script includes error handling to continue even if some fonts fail to download.
175
-
176
- 3. **Network Timeout**: The script includes timeout settings and retry attempts for font downloads.
177
-
178
- See [DOCKER_TROUBLESHOOTING.md](DOCKER_TROUBLESHOOTING.md) for detailed troubleshooting steps.
179
-
180
  ### Logs and Monitoring
181
 
182
  Monitor your Space through:
 
67
 
68
  ```yaml
69
  ---
70
+ title: Enhanced DOCX to PDF Converter
71
  emoji: 📄
72
  colorFrom: blue
73
  colorTo: purple
74
  sdk: docker
75
+ app_file: Dockerfile
76
  pinned: false
77
  ---
78
  ```
 
84
  - **colorFrom**: Gradient start color
85
  - **colorTo**: Gradient end color
86
  - **sdk**: Must be "docker" for this application
87
+ - **app_file**: Must point to "Dockerfile"
88
  - **pinned**: Whether to pin the Space to your profile
89
 
90
  ## API Usage
 
114
  }'
115
  ```
116
 
117
  ### Health Check
118
  ```bash
119
  curl "https://your-username-your-space-name.hf.space/health"
 
125
 
126
  You can set environment variables in your Space settings:
127
 
 
 
128
  - `MAX_FILE_SIZE`: Maximum file size in bytes (default: 52428800)
129
  - `MAX_CONVERSION_TIME`: Conversion timeout in seconds (default: 120)
130
+ - `TEMP_DIR`: Temporary directory for conversions (default: /tmp/conversions)
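The three variables listed above could be read at startup roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, and the project's own `Config` class in `src/utils/config.py` may be organised differently. Only the variable names and defaults come from the documentation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    max_file_size: int        # bytes
    max_conversion_time: int  # seconds
    temp_dir: str


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        max_file_size=int(env.get("MAX_FILE_SIZE", "52428800")),
        max_conversion_time=int(env.get("MAX_CONVERSION_TIME", "120")),
        temp_dir=env.get("TEMP_DIR", "/tmp/conversions"),
    )
```

The default `MAX_FILE_SIZE` of 52428800 bytes is 50 × 1024 × 1024, which matches the 50MB limit mentioned in the README.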
131
 
132
  ### Hardware Upgrade
133
 
 
155
  - Check file size limits
156
  - Review application logs for conversion errors
157
 
158
  ### Logs and Monitoring
159
 
160
  Monitor your Space through:
README.md CHANGED
@@ -1,104 +1,129 @@
1
  ---
2
- title: Enhanced DOCX to PDF Converter with Arabic Support
3
  emoji: 📄
4
- colorFrom: blue
5
- colorTo: purple
6
- sdk: docker
 
7
  app_file: app.py
8
  pinned: false
9
  ---
10
 
11
- # Enhanced DOCX to PDF Converter with Arabic Support
12
 
13
- This enhanced version of the DOCX to PDF converter provides professional API capabilities with improved Arabic language support and better file handling.
14
 
15
- ## Features
16
 
17
- - Perfect DOCX to PDF conversion with formatting preservation
18
- - Enhanced Arabic RTL text support
19
- - Professional FastAPI-based RESTful API
20
- - Static file serving for converted PDFs
21
- - Direct URL access to converted PDFs
22
- - ✅ Inline PDF viewing in browser
23
- - ✅ Multi-file batch processing
24
- - ✅ Base64 encoded file support
25
- - ✅ Comprehensive error handling
26
- - ✅ Docker containerization support
27
- - ✅ Health monitoring endpoints
28
- - ✅ CORS support for web integration
29
 
30
- ## API Endpoints
 
 
 
 
31
 
32
- - `POST /convert` - Convert a single DOCX file to PDF
33
- - `POST /convert/batch` - Convert multiple DOCX files to PDF
34
- - `GET /static/{filename}` - Access converted PDF files directly
35
- - `GET /health` - Application health check
36
- - `GET /docs` - Interactive API documentation
37
 
38
- ## How It Works
39
 
40
- 1. Upload a DOCX file via the API
41
- 2. The file is converted to PDF using LibreOffice
42
- 3. The converted PDF is stored in a static directory
43
- 4. A direct URL to the PDF is returned
44
- 5. The PDF can be accessed directly via the URL or opened in the browser
 
 
45
 
46
- ## Static File Serving
47
 
48
- Converted PDF files are stored in a static directory and served directly via URLs:
49
- - Files are stored in `/app/static` directory
50
- - Access via `https://your-domain/static/{filename}`
51
- - PDFs open inline in the browser by default
 
 
 
52
 
53
- ## Usage
54
 
55
- ### Web Interface
 
 
56
 
57
- Use the provided HTML interface to test the converter:
58
- 1. Open `test_interface.html` in your browser
59
- 2. Select a DOCX file
60
- 3. Click "Convert to PDF"
61
- 4. Click "Open PDF in Browser" to view the converted file
62
 
63
- ### API Usage
 
 
 
 
 
 
 
 
 
64
 
65
- ```bash
66
- # Single file conversion
67
- curl -X POST -F "file=@document.docx" https://your-domain/convert
68
 
69
- # Response will include a direct URL to the PDF:
70
- # {
71
- # "success": true,
72
- # "pdf_url": "/static/uuid_filename.pdf",
73
- # "message": "Conversion successful"
74
- # }
75
 
76
- # Access the PDF directly at: https://your-domain/static/uuid_filename.pdf
77
- ```
78
 
79
- ## Deployment
 
 
 
 
 
80
 
81
- ### Docker Deployment
82
 
83
  ```bash
84
- docker-compose up -d
 
85
  ```
86
 
87
- ### Environment Variables
 
 
88
 
89
- - `STATIC_DIR` - Directory for storing converted PDFs (default: /app/static)
90
- - `TEMP_DIR` - Temporary directory for processing (default: /tmp/conversions)
91
- - `MAX_FILE_SIZE` - Maximum file size in bytes (default: 52428800)
92
- - `MAX_CONVERSION_TIME` - Conversion timeout in seconds (default: 120)
93
 
94
- ## Arabic Language Support
 
 
 
 
95
 
96
- This converter includes enhanced support for Arabic text:
97
- - Proper RTL text handling
98
- - Arabic font installation and configuration
99
- - Font substitution rules for optimal rendering
100
- - Support for complex Arabic script features
101
 
102
- ## License
 
 
103
 
104
- This project is licensed under the MIT License.
 
1
  ---
2
+ title: محول DOCX إلى PDF المتقدم - دقة 99%+ للعربية
3
  emoji: 📄
4
+ colorFrom: gray
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: "4.20.0"
8
  app_file: app.py
9
  pinned: false
10
  ---
11
 
12
+ # 🚀 محول DOCX إلى PDF المتقدم - دقة 99%+ للتنسيق العربي
13
 
14
+ محول من الجيل الجديد مع **تقنيات متقدمة لضمان دقة 99%+ في التنسيق العربي** - يتضمن معالجة مسبقة ذكية، مراقبة لاحقة، وتقارير جودة شاملة.
15
 
16
+ ## 🎯 التقنيات المتقدمة الجديدة
17
 
18
+ ### 🔧 معالجة DOCX مسبقة ذكية
19
+ - **كشف المشاكل تلقائياً**: يحدد TextBoxes، SmartArt، والأشكال المعقدة
20
+ - **إزالة العناصر المشكلة**: يحول العناصر المشكلة إلى تنسيقات متوافقة
21
+ - **تحسين بنية الجداول**: يصلح الجداول المتداخلة ومشاكل دمج الخلايا
22
+ - **حماية Placeholders**: يضمن بقاء {{name}}, {{date}} في مواضعها الدقيقة
 
 
 
 
 
 
 
23
 
24
+ ### ⚙️ إعدادات LibreOffice محسنة
25
+ - **70+ معامل تصدير PDF**: تكوين JSON محسن لأقصى جودة
26
+ - **بدون ضغط**: يحافظ على جودة الصور والنصوص الأصلية
27
+ - **تضمين الخطوط**: جميع الخطوط مضمنة للعرض المتسق
28
+ - **إعدادات RTL متخصصة**: تكوين خاص لاتجاه النص العربي
29
 
30
+ ### 🔍 مراقبة لاحقة بـ PyMuPDF
31
+ - **تحقق من موضع العناصر**: يؤكد أن كل عنصر في الموضع الصحيح
32
+ - **تحقق من الأحرف العربية**: يتحقق من دقة عرض النص RTL
33
+ - **فحص بنية الجداول**: يضمن الحفاظ على تخطيط الجداول
34
+ - **تتبع Placeholders**: يراقب موضع المحتوى الديناميكي
35
 
36
+ ## الميزات المحسنة للعربية
37
 
38
+ - **🔤 تميز الخطوط**: توافق كامل مع الخطوط العربية (Traditional Arabic→Amiri، Arabic Typesetting→Noto Naskh، Simplified Arabic→Noto Naskh)
39
+ - **📊 كمال الجداول**: يحافظ على المساحة الدقيقة للخلايا والحدود والمحاذاة وتنسيق النص
40
+ - **🖼️ أقصى جودة للصور**: الحفاظ على 600 DPI بدون ضغط مدمر
41
+ - **🌍 دعم العربية RTL**: عرض مثالي للنص من اليمين إلى اليسار مع خطوط Amiri و Noto
42
+ - **🔍 التحقق من الجودة**: تحليل فوري للمستند والتحقق من التحويل
43
+ - **🛠️ تشخيص متقدم**: تحليل شامل للأخطاء مع إرشادات استكشاف الأخطاء المحددة
44
+ - **⚡ أداء محسن**: تكوين LibreOffice محسن للمستندات المعقدة العربية
45
 
46
+ ## 🛠️ حلول المشاكل الشائعة
47
 
48
+ **تم حل المشاكل التالية:**
49
+ - تراكب النصوص العربية وعدم وجود فراغات كافية
50
+ - فقدان المحاذاة اليمنى (Right-to-Left) في النص العربي
51
+ - استبدال الخطوط الأصلية بخطوط غير داعمة للعربية
52
+ - ❌ تشوه الجداول أو اختفاء البنية التنظيمية للوثيقة
53
+ - ❌ تغيير مواقع قوالب التعبئة الديناميكية (مثل {{name}}, {{date}})
54
+ - ❌ حجم الصفحة أو الهامش غير مناسب للطباعة بشكل مرتب (A4)
55
 
56
+ ## 🚀 Usage
57
 
58
+ 1. Upload your `.docx` file
59
+ 2. Wait for conversion to complete
60
+ 3. Download the generated PDF
61
 
62
+ ## 🔧 Technical Excellence
 
 
 
 
63
 
64
+ - **Backend**: Enhanced LibreOffice with maximum quality PDF export settings
65
+ - **Frontend**: Advanced Gradio interface with real-time validation feedback
66
+ - **Font System**: Comprehensive font packages including:
67
+ - Liberation fonts (Arial/Times/Courier/Calibri/Cambria compatible)
68
+ - Croscore fonts (Arimo/Tinos/Cousine for additional compatibility)
69
+ - DejaVu and Noto fonts for international support
70
+ - Advanced fontconfig with Microsoft font substitution rules
71
+ - **Quality Assurance**: Document structure analysis and PDF validation
72
+ - **Error Handling**: Intelligent error analysis with specific troubleshooting guidance
73
+ - **Environment**: Optimized for Hugging Face Spaces with all dependencies pre-configured
74
 
75
+ ## 📋 Comprehensive Support
 
 
76
 
77
+ - ✅ **Complex Documents**: Tables, images, mixed fonts, multi-page layouts
78
+ - ✅ **Microsoft Compatibility**: Perfect handling of Calibri, Cambria, Arial, Times New Roman
79
+ - ✅ **International Text**: Arabic RTL, Unicode, special characters
80
+ - ✅ **Large Files**: Documents up to 50MB with unlimited complexity
81
+ - ✅ **Quality Validation**: Real-time analysis ensures perfect conversion results
 
82
 
83
+ ## 🎯 Critical Success Metrics
 
84
 
85
+ ✅ **Page Count**: DOCX pages = PDF pages (EXACTLY)
86
+ ✅ **Table Text**: Same size, weight, and position
87
+ ✅ **Images**: No quality loss, exact positioning
88
+ ✅ **Fonts**: Consistent rendering, no size changes
89
+ ✅ **Layout**: Zero pixel shifts or reflowing
90
+ ✅ **File Size**: Reasonable output without bloat
91
 
92
+ ## 🏗️ Local Development
93
 
94
  ```bash
95
+ # Install comprehensive system dependencies (Ubuntu/Debian)
96
+ sudo apt-get update
97
+ sudo apt-get install libreoffice libreoffice-writer \
98
+ fonts-liberation fonts-liberation2 fonts-dejavu fonts-croscore \
99
+ fonts-noto-core fonts-opensymbol fontconfig
100
+
101
+ # Update font cache
102
+ sudo fc-cache -fv
103
+
104
+ # Install Python dependencies
105
+ pip install -r requirements.txt
106
+
107
+ # Run the app with enhanced formatting preservation
108
+ python app.py
109
  ```
110
 
111
+ For Hugging Face Spaces deployment, all system dependencies are automatically installed via the enhanced `packages.txt`.
112
+
113
+ ## 🚀 Implementation Standards
114
 
115
+ This converter implements the requirements from `bb.txt` with absolute precision:
 
 
 
116
 
117
+ - **Enhanced Font Packages**: Complete Microsoft-compatible font ecosystem
118
+ - **Optimized LibreOffice Command**: Quality:100, font embedding, layout preservation
119
+ - **Advanced Configuration**: Custom registrymodifications.xcu with font substitution rules
120
+ - **Environment Excellence**: Proper LANG, fontconfig, and LibreOffice user profile setup
121
+ - **Quality Assurance**: Document analysis, PDF validation, and comprehensive error handling
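For orientation, a plain headless LibreOffice conversion can be driven from Python as sketched below. This is not the project's converter: the function names are hypothetical, the `soffice` binary is assumed to be on `PATH`, and the quality, font-embedding and layout options mentioned above are left out because the README does not spell them out. The 120-second default follows the documented `MAX_CONVERSION_TIME`.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path: str, out_dir: str) -> list[str]:
    """Build a basic headless DOCX-to-PDF command line for LibreOffice."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", out_dir,
        docx_path,
    ]


def run_headless_conversion(docx_path: str, out_dir: str, timeout_s: int = 120) -> Path:
    """Run the conversion and return the expected PDF path.

    Raises subprocess.TimeoutExpired if LibreOffice exceeds timeout_s and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_convert_command(docx_path, out_dir),
        check=True,
        timeout=timeout_s,
        capture_output=True,
    )
    # LibreOffice names the output after the input file's stem.
    pdf_path = Path(out_dir) / (Path(docx_path).stem + ".pdf")
    if not pdf_path.exists():
        raise RuntimeError(f"LibreOffice finished but {pdf_path} was not created")
    return pdf_path
```

Keeping the command builder separate from the call that runs it makes the argument list easy to inspect or extend with export filter options without touching the error handling.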
122
 
123
+ ## 🎯 Final Goal Achievement
 
 
 
 
124
 
125
+ Creates DOCX to PDF conversions so accurate that users cannot tell the difference between the original DOCX and the converted PDF when viewed side by side. **Zero tolerance for formatting deviations.**
126
+
127
+ ---
128
 
129
+ **Built for Hugging Face Spaces** | Enterprise-Grade • Pixel-Perfect • Uncompromising Quality
UPDATE_HF_SPACE.md DELETED
@@ -1,125 +0,0 @@
1
- # Updating Your Hugging Face Space
2
-
3
- This document provides instructions for updating your deployed Hugging Face Space with the latest fixes.
4
-
5
- ## Prerequisites
6
-
7
- 1. Your Hugging Face Space is already deployed
8
- 2. You have write access to the Space repository
9
- 3. Git is installed on your local machine
10
-
11
- ## Update Steps
12
-
13
- ### 1. Clone Your Space Repository
14
-
15
- If you haven't already cloned your Space repository:
16
-
17
- ```bash
18
- git clone https://huggingface.co/spaces/your-username/your-space-name
19
- cd your-space-name
20
- ```
21
-
22
- If you already have a local clone, make sure it's up to date:
23
-
24
- ```bash
25
- cd your-space-name
26
- git pull
27
- ```
28
-
29
- ### 2. Update Files
30
-
31
- Copy the updated files from this project to your Space repository:
32
-
33
- ```bash
34
- # From this project directory, copy all files to your Space repository
35
- cp -r /path/to/enhanced-docx-to-pdf/* /path/to/your/space/repository/
36
- ```
37
-
38
- Alternatively, you can selectively copy the updated files:
39
-
40
- ```bash
41
- # Copy the updated main application file
42
- cp src/api/main.py /path/to/your/space/repository/src/api/main.py
43
-
44
- # Copy any other updated files as needed
45
- ```
46
-
47
- ### 3. Commit and Push Changes
48
-
49
- Add, commit, and push the changes to your Space repository:
50
-
51
- ```bash
52
- cd /path/to/your/space/repository
53
- git add .
54
- git commit -m "Fix root endpoint and improve web interface"
55
- git push
56
- ```
57
-
58
- ### 4. Monitor the Build
59
-
60
- 1. Go to your Space page on Hugging Face
61
- 2. Click on the "Logs" tab to monitor the build process
62
- 3. Wait for the build to complete successfully
63
-
64
- ### 5. Verify the Update
65
-
66
- Once the build completes:
67
-
68
- 1. Visit your Space URL: `https://your-username-your-space-name.hf.space`
69
- 2. You should now see the web interface instead of a 404 error
70
- 3. Test the file conversion functionality
71
- 4. Check the API documentation at `/docs`
72
-
73
- ## What's Fixed
74
-
75
- The update includes:
76
-
77
- 1. **Root Endpoint Fix**: The application now properly serves the web interface at the root path
78
- 2. **Improved Web Interface**: Enhanced user interface with better styling
79
- 3. **Better Error Handling**: More robust error handling for file conversions
80
- 4. **Docker Build Fixes**: Resolved issues with Arabic font installation
81
-
82
- ## Troubleshooting
83
-
84
- ### If the Build Fails
85
-
86
- 1. Check the build logs for specific error messages
87
- 2. Ensure all required files are included in the commit
88
- 3. Verify that the Dockerfile syntax is correct
89
-
90
- ### If the Application Still Shows 404
91
-
92
- 1. Confirm that the `templates/index.html` file is present
93
- 2. Check that the root endpoint handler is in `src/api/main.py`
94
- 3. Verify the application logs for any startup errors
95
-
96
- ### If File Conversion Fails
97
-
98
- 1. Check the application logs for conversion errors
99
- 2. Ensure the input file is a valid DOCX document
100
- 3. Verify file size limits are not exceeded
101
-
102
- ## Rollback (If Needed)
103
-
104
- If you need to rollback to the previous version:
105
-
106
- 1. Find the previous commit hash:
107
- ```bash
108
- git log --oneline
109
- ```
110
-
111
- 2. Reset to the previous commit:
112
- ```bash
113
- git reset --hard <previous-commit-hash>
114
- git push --force
115
- ```
116
-
117
- ## Support
118
-
119
- If you continue to experience issues:
120
-
121
- 1. Check the Hugging Face community forums
122
- 2. Review the application logs carefully
123
- 3. Contact the maintainers with detailed error information
124
-
125
- This update should resolve the 404 error and provide a better user experience for your DOCX to PDF conversion Space.
docker-compose.yml CHANGED
@@ -1,9 +1,9 @@
1
  version: '3.8'
2
 
3
  services:
4
- docx-to-pdf-arabic:
5
  build: .
6
- container_name: docx-pdf-converter-arabic
7
  ports:
8
  - "7860:7860"
9
  environment:
@@ -11,15 +11,12 @@ services:
11
  - LC_ALL=ar_SA.UTF-8
12
  - PYTHONUNBUFFERED=1
13
  - TEMP_DIR=/tmp/conversions
14
- - STATIC_DIR=/app/static
15
  volumes:
16
- # Optional: Mount local directories for testing
17
- - ./test_files:/app/test_files:ro
18
- - ./test_results:/app/test_results
19
- - ./static:/app/static
20
  restart: unless-stopped
21
  healthcheck:
22
- test: ["CMD", "curl", "-f", "http://localhost:7860/"]
23
  interval: 30s
24
  timeout: 10s
25
  retries: 3
 
1
  version: '3.8'
2
 
3
  services:
4
+ docx-to-pdf-enhanced:
5
  build: .
6
+ container_name: docx-pdf-converter-enhanced
7
  ports:
8
  - "7860:7860"
9
  environment:
 
11
  - LC_ALL=ar_SA.UTF-8
12
  - PYTHONUNBUFFERED=1
13
  - TEMP_DIR=/tmp/conversions
 
14
  volumes:
15
+ # Mount for persistent storage of conversions
16
+ - ./conversions:/tmp/conversions
 
 
17
  restart: unless-stopped
18
  healthcheck:
19
+ test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
20
  interval: 30s
21
  timeout: 10s
22
  retries: 3
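Both the Dockerfile and the compose file now probe `/health` instead of `/`. The same check can be approximated from a deployment script in Python. This is a rough sketch: the function names are hypothetical, and the defaults simply reuse the interval, timeout and retry values from the compose file.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx, refused connections and timeouts all count as unhealthy.
        return False


def wait_until_healthy(
    url: str,
    retries: int = 3,
    interval_s: float = 30.0,
    timeout_s: float = 10.0,
    probe=is_healthy,
) -> bool:
    """Probe the URL up to `retries` times, sleeping `interval_s` between attempts."""
    for attempt in range(retries):
        if probe(url, timeout_s):
            return True
        if attempt < retries - 1:
            time.sleep(interval_s)
    return False
```

For a local container, `wait_until_healthy("http://localhost:7860/health")` returns `True` as soon as one probe succeeds. The `probe` parameter exists only so the retry loop can be exercised without a running server.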
install_arabic_fonts.sh DELETED
@@ -1,82 +0,0 @@
1
- #!/bin/bash
2
-
3
- # Script to install Arabic fonts manually
4
- set -e
5
-
6
- echo "Installing Arabic fonts manually..."
7
-
8
- # Create fonts directory
9
- mkdir -p /usr/share/fonts/truetype/arabic
10
-
11
- # Function to download and install font
12
- download_font() {
13
- local url=$1
14
- local filename=$2
15
- echo "Downloading $filename..."
16
-
17
- # Try to download with wget
18
- if command -v wget >/dev/null 2>&1; then
19
- if wget --timeout=30 --tries=2 -q "$url" -O "/tmp/$filename"; then
20
- install_font_file "/tmp/$filename"
21
- rm -f "/tmp/$filename"
22
- return 0
23
- fi
24
- fi
25
-
26
- # Try to download with curl if wget failed
27
- if command -v curl >/dev/null 2>&1; then
28
- if curl --max-time 30 --retry 2 -s -L "$url" -o "/tmp/$filename"; then
29
- install_font_file "/tmp/$filename"
30
- rm -f "/tmp/$filename"
31
- return 0
32
- fi
33
- fi
34
-
35
- echo "Failed to download $filename"
36
- return 1
37
- }
38
-
39
- # Function to install font file
40
- install_font_file() {
41
- local filepath=$1
42
-
43
- if [[ "$filepath" == *.zip ]]; then
44
- # Extract zip file
45
- if command -v unzip >/dev/null 2>&1; then
46
- cd /tmp
47
- if unzip -q "$filepath"; then
48
- # Find and copy TTF files
49
- find . -name "*.ttf" -exec cp {} /usr/share/fonts/truetype/arabic/ \; 2>/dev/null || true
50
- # Cleanup
51
- rm -rf *.zip */ 2>/dev/null || true
52
- echo "Installed fonts from zip file"
53
- else
54
- echo "Failed to extract zip file"
55
- fi
56
- else
57
- echo "unzip not available"
58
- fi
59
- else
60
- # Copy TTF file directly
61
- if cp "$filepath" /usr/share/fonts/truetype/arabic/ 2>/dev/null; then
62
- echo "Installed font file"
63
- else
64
- echo "Failed to copy font file"
65
- fi
66
- fi
67
- }
68
-
69
- # Download and install various Arabic fonts
70
- # Continue even if some downloads fail
71
- set +e
72
- download_font "https://github.com/aliftype/amiri/releases/download/0.117/Amiri-0.117.zip" "Amiri-0.117.zip" || true
73
- download_font "https://github.com/silnrsi/font-scheherazade/releases/download/v3.300/ScheherazadeNew-3.300.zip" "ScheherazadeNew-3.300.zip" || true
74
- download_font "https://github.com/notofonts/notofonts.github.io/raw/main/fonts/NotoSansArabic/hinted/ttf/NotoSansArabic-Regular.ttf" "NotoSansArabic-Regular.ttf" || true
75
- download_font "https://github.com/notofonts/notofonts.github.io/raw/main/fonts/NotoNaskhArabic/hinted/ttf/NotoNaskhArabic-Regular.ttf" "NotoNaskhArabic-Regular.ttf" || true
76
- set -e
77
-
78
- # Update font cache
79
- echo "Updating font cache..."
80
- fc-cache -fv || echo "Warning: Failed to update font cache"
81
-
82
- echo "Arabic fonts installation completed!"
simple_test.html DELETED
@@ -1,225 +0,0 @@
1
- <!DOCTYPE html>
2
- <html lang="ar">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>اختبار تحويل DOCX إلى PDF</title>
7
- <style>
8
- body {
9
- font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
10
- max-width: 800px;
11
- margin: 0 auto;
12
- padding: 20px;
13
- background-color: #f5f7fa;
14
- direction: rtl;
15
- text-align: right;
16
- }
17
-
18
- .container {
19
- background: white;
20
- border-radius: 10px;
21
- padding: 30px;
22
- box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
23
- }
24
-
25
- h1 {
26
- color: #2c3e50;
27
- text-align: center;
28
- }
29
-
30
- .form-group {
31
- margin-bottom: 20px;
32
- }
33
-
34
- label {
35
- display: block;
36
- margin-bottom: 8px;
37
- font-weight: bold;
38
- color: #34495e;
39
- }
40
-
41
- input[type="file"] {
42
- width: 100%;
43
- padding: 12px;
44
- border: 2px dashed #bdc3c7;
45
- border-radius: 5px;
46
- background-color: #ecf0f1;
47
- }
48
-
49
- button {
50
- background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
51
- color: white;
52
- border: none;
53
- padding: 12px 24px;
54
- border-radius: 5px;
55
- cursor: pointer;
56
- font-size: 16px;
57
- font-weight: bold;
58
- width: 100%;
59
- transition: transform 0.2s;
60
- }
61
-
62
- button:hover {
63
- transform: translateY(-2px);
64
- }
65
-
66
- button:disabled {
67
- background: #bdc3c7;
68
- cursor: not-allowed;
69
- transform: none;
70
- }
71
-
72
- .result {
73
- margin-top: 20px;
74
- padding: 15px;
75
- border-radius: 5px;
76
- display: none;
77
- }
78
-
79
- .success {
80
- background-color: #d4edda;
81
- color: #155724;
82
- border: 1px solid #c3e6cb;
83
- }
84
-
85
- .error {
86
- background-color: #f8d7da;
87
- color: #721c24;
88
- border: 1px solid #f5c6cb;
89
- }
90
-
91
- .loading {
92
- text-align: center;
93
- display: none;
94
- }
95
-
96
- .spinner {
97
- border: 4px solid #f3f3f3;
98
- border-top: 4px solid #667eea;
99
- border-radius: 50%;
100
- width: 30px;
101
- height: 30px;
102
- animation: spin 1s linear infinite;
103
- margin: 0 auto 10px;
104
- }
105
-
106
- @keyframes spin {
107
- 0% { transform: rotate(0deg); }
108
- 100% { transform: rotate(360deg); }
109
- }
110
-
111
- .instructions {
112
- background: #e3f2fd;
113
- padding: 15px;
114
- border-radius: 5px;
115
- margin-top: 20px;
116
- }
117
- </style>
118
- </head>
119
- <body>
120
- <div class="container">
121
- <h1>اختبار تحويل DOCX إلى PDF</h1>
122
-
123
- <div class="form-group">
124
- <label for="docxFile">اختر ملف DOCX:</label>
125
- <input type="file" id="docxFile" accept=".docx" required>
126
- </div>
127
-
128
- <button id="convertBtn" disabled>تحويل إلى PDF</button>
129
-
130
- <div class="loading" id="loading">
131
- <div class="spinner"></div>
132
- <p>جاري التحويل... يرجى الانتظار</p>
133
- </div>
134
-
135
- <div class="result success" id="successResult">
136
- <h3>تم التحويل بنجاح!</h3>
137
- <p>يمكنك تنزيل ملف PDF المحول:</p>
138
- <a id="downloadLink" href="#" target="_blank" style="display: inline-block; margin-top: 10px; padding: 10px 20px; background: #27ae60; color: white; text-decoration: none; border-radius: 5px;">تنزيل PDF</a>
139
- </div>
140
-
141
- <div class="result error" id="errorResult">
142
- <h3>حدث خطأ</h3>
143
- <p id="errorMessage"></p>
144
- </div>
145
-
146
- <div class="instructions">
147
- <h3>كيفية الاستخدام:</h3>
148
- <ol>
149
- <li>اختر ملف DOCX باستخدام الزر أعلاه</li>
150
- <li>انقر على زر "تحويل إلى PDF"</li>
151
- <li>انتظر حتى يكتمل التحويل</li>
152
- <li>انقر على "تنزيل PDF" للحصول على ملفك المحول</li>
153
- </ol>
154
- <p><strong>ملاحظة:</strong> هذا الواجهة تتصل مباشرة بمساحتك على Hugging Face Space.</p>
155
- </div>
156
- </div>
157
-
158
- <script>
159
- document.getElementById('docxFile').addEventListener('change', function(e) {
160
- const file = e.target.files[0];
161
- const convertBtn = document.getElementById('convertBtn');
162
-
163
- if (file && file.name.endsWith('.docx')) {
164
- convertBtn.disabled = false;
165
- } else {
166
- convertBtn.disabled = true;
167
- alert('الرجاء اختيار ملف DOCX فقط');
168
- }
169
- });
170
-
171
- document.getElementById('convertBtn').addEventListener('click', async function() {
172
- const fileInput = document.getElementById('docxFile');
173
- const convertBtn = document.getElementById('convertBtn');
174
- const loading = document.getElementById('loading');
175
- const successResult = document.getElementById('successResult');
176
- const errorResult = document.getElementById('errorResult');
177
- const errorMessage = document.getElementById('errorMessage');
178
- const downloadLink = document.getElementById('downloadLink');
179
-
180
- // Reset UI
181
- successResult.style.display = 'none';
182
- errorResult.style.display = 'none';
183
-
184
- const file = fileInput.files[0];
185
- if (!file) {
186
- alert('الرجاء اختيار ملف أولاً');
187
- return;
188
- }
189
-
190
- // Show loading
191
- convertBtn.disabled = true;
192
- loading.style.display = 'block';
193
-
194
- try {
195
- // Create FormData
196
- const formData = new FormData();
197
- formData.append('file', file);
198
-
199
- // Send request to your Hugging Face Space
200
- const response = await fetch('https://fokan-pdf-4.hf.space/convert', {
201
- method: 'POST',
202
- body: formData
203
- });
204
-
205
- const result = await response.json();
206
-
207
- if (result.success) {
208
- // Show success
209
- loading.style.display = 'none';
210
- successResult.style.display = 'block';
211
- downloadLink.href = 'https://fokan-pdf-4.hf.space' + result.pdf_url;
212
- } else {
213
- throw new Error(result.error || 'فشل التحويل');
214
- }
215
- } catch (error) {
216
- loading.style.display = 'none';
217
- errorResult.style.display = 'block';
218
- errorMessage.textContent = error.message || 'حدث خطأ أثناء التحويل';
219
- } finally {
220
- convertBtn.disabled = false;
221
- }
222
- });
223
- </script>
224
- </body>
225
- </html>
src/api/main.py CHANGED
@@ -16,13 +16,8 @@ from fastapi import FastAPI, File, UploadFile, Form, HTTPException, BackgroundTa
16
  from fastapi.responses import FileResponse, JSONResponse
17
  from fastapi.middleware.cors import CORSMiddleware
18
  from fastapi.staticfiles import StaticFiles
19
- from fastapi.responses import HTMLResponse
20
  from pydantic import BaseModel
21
 
22
- # Set environment variables for LibreOffice before importing other modules
23
- os.environ['HOME'] = '/tmp'
24
- os.environ['USERPROFILE'] = '/tmp'
25
-
26
  # Import utility modules
27
  from src.utils.config import Config
28
  from src.utils.file_handler import FileHandler
@@ -54,47 +49,18 @@ app.add_middleware(
54
  allow_headers=Config.CORS_HEADERS,
55
  )
56
 
57
- # Create static directory if it doesn't exist
58
- os.makedirs(Config.STATIC_DIR, exist_ok=True)
59
-
60
- # Mount static files
61
- app.mount("/static", StaticFiles(directory=Config.STATIC_DIR), name="static")
62
 
63
  # Serve index.html at root if it exists
64
  if os.path.exists("templates/index.html"):
 
 
65
  @app.get("/", response_class=HTMLResponse)
66
  async def read_index():
67
  with open("templates/index.html", "r", encoding="utf-8") as f:
68
  return f.read()
69
- else:
70
- @app.get("/", response_class=HTMLResponse)
71
- async def read_index():
72
- return """
73
- <!DOCTYPE html>
74
- <html>
75
- <head>
76
- <title>Enhanced DOCX to PDF Converter</title>
77
- <style>
78
- body { font-family: Arial, sans-serif; margin: 40px; }
79
- .container { max-width: 800px; margin: 0 auto; }
80
- h1 { color: #333; }
81
- .info { background: #f5f5f5; padding: 20px; border-radius: 5px; }
82
- a { color: #007bff; text-decoration: none; }
83
- a:hover { text-decoration: underline; }
84
- </style>
85
- </head>
86
- <body>
87
- <div class="container">
88
- <h1>Enhanced DOCX to PDF Converter</h1>
89
- <div class="info">
90
- <p>The API is running successfully!</p>
91
- <p><a href="/docs">View API Documentation</a></p>
92
- <p><a href="/health">Health Check</a></p>
93
- </div>
94
- </div>
95
- </body>
96
- </html>
97
- """
98
 
99
  # Request/Response Models
100
  class ConversionRequest(BaseModel):
@@ -118,28 +84,12 @@ async def startup_event():
118
  """Initialize application on startup"""
119
  logger.info("Starting Enhanced DOCX to PDF Converter...")
120
 
121
- # Set environment variables for LibreOffice
122
- os.environ['HOME'] = '/tmp'
123
- os.environ['USERPROFILE'] = '/tmp'
124
-
125
  # Validate LibreOffice installation
126
  if not converter.validate_libreoffice():
127
  logger.warning("LibreOffice validation failed - conversions may not work")
128
 
129
  # Create temp directory if it doesn't exist
130
- try:
131
- os.makedirs(Config.TEMP_DIR, exist_ok=True)
132
- os.chmod(Config.TEMP_DIR, 0o777)
133
- logger.info(f"Ensured temp directory exists: {Config.TEMP_DIR}")
134
- except Exception as e:
135
- logger.error(f"Failed to create temp directory {Config.TEMP_DIR}: {e}")
136
-
137
- # Create static directory if it doesn't exist
138
- try:
139
- os.makedirs(Config.STATIC_DIR, exist_ok=True)
140
- logger.info(f"Ensured static directory exists: {Config.STATIC_DIR}")
141
- except Exception as e:
142
- logger.error(f"Failed to create static directory {Config.STATIC_DIR}: {e}")
143
 
144
  logger.info("Application started successfully")
145
 
@@ -209,16 +159,8 @@ async def convert_docx(
209
  if not converter.convert_docx_to_pdf(input_path, output_path):
210
  raise HTTPException(status_code=500, detail="Conversion failed")
211
 
212
- # Generate a unique filename for the static directory
213
- unique_filename = f"{uuid.uuid4()}_{output_filename}"
214
- static_file_path = os.path.join(Config.STATIC_DIR, unique_filename)
215
-
216
- # Move the converted PDF to the static directory
217
- import shutil
218
- shutil.move(output_path, static_file_path)
219
-
220
- # Return success response with direct URL to the PDF
221
- pdf_url = f"/static/{unique_filename}"
222
  return ConversionResponse(
223
  success=True,
224
  pdf_url=pdf_url,
@@ -231,18 +173,12 @@ async def convert_docx(
231
  logger.error(f"Conversion error: {e}")
232
  raise HTTPException(status_code=500, detail=f"Conversion failed: {str(e)}")
233
  finally:
234
- # Cleanup temporary directory
235
- if temp_dir and os.path.exists(temp_dir):
236
- import shutil
237
- try:
238
- shutil.rmtree(temp_dir)
239
- logger.info(f"Cleaned up temporary directory: {temp_dir}")
240
- except Exception as e:
241
- logger.error(f"Failed to cleanup directory {temp_dir}: {e}")
242
 
243
  @app.get("/download/{temp_id}/{filename}")
244
  async def download_pdf(temp_id: str, filename: str):
245
- """Download converted PDF file with inline content disposition"""
246
  try:
247
  file_path = f"{Config.TEMP_DIR}/{temp_id}/{filename}"
248
 
@@ -252,8 +188,7 @@ async def download_pdf(temp_id: str, filename: str):
252
  return FileResponse(
253
  path=file_path,
254
  filename=filename,
255
- media_type='application/pdf',
256
- headers={"Content-Disposition": "inline"}
257
  )
258
  except HTTPException:
259
  raise
@@ -299,16 +234,7 @@ async def batch_convert(request: BatchConversionRequest):
299
 
300
  # Perform conversion
301
  if converter.convert_docx_to_pdf(input_path, output_path):
302
- # Generate a unique filename for the static directory
303
- unique_filename = f"{uuid.uuid4()}_{output_filename}"
304
- static_file_path = os.path.join(Config.STATIC_DIR, unique_filename)
305
-
306
- # Move the converted PDF to the static directory
307
- import shutil
308
- shutil.move(output_path, static_file_path)
309
-
310
- # Return success response with direct URL to the PDF
311
- pdf_url = f"/static/{unique_filename}"
312
  results.append(ConversionResponse(
313
  success=True,
314
  pdf_url=pdf_url,
@@ -326,14 +252,5 @@ async def batch_convert(request: BatchConversionRequest):
326
  success=False,
327
  error=str(e)
328
  ))
329
- finally:
330
- # Cleanup temporary directory
331
- if 'temp_dir' in locals() and os.path.exists(temp_dir):
332
- import shutil
333
- try:
334
- shutil.rmtree(temp_dir)
335
- logger.info(f"Cleaned up temporary directory: {temp_dir}")
336
- except Exception as cleanup_e:
337
- logger.error(f"Failed to cleanup directory {temp_dir}: {cleanup_e}")
338
 
339
  return results
 
16
  from fastapi.responses import FileResponse, JSONResponse
17
  from fastapi.middleware.cors import CORSMiddleware
18
  from fastapi.staticfiles import StaticFiles
 
19
  from pydantic import BaseModel
20
 
 
 
 
 
21
  # Import utility modules
22
  from src.utils.config import Config
23
  from src.utils.file_handler import FileHandler
 
49
  allow_headers=Config.CORS_HEADERS,
50
  )
51
 
52
+ # Mount static files if templates directory exists
53
+ if os.path.exists("templates"):
54
+ app.mount("/static", StaticFiles(directory="templates"), name="static")
 
 
55
 
56
  # Serve index.html at root if it exists
57
  if os.path.exists("templates/index.html"):
58
+ from fastapi.responses import HTMLResponse
59
+
60
  @app.get("/", response_class=HTMLResponse)
61
  async def read_index():
62
  with open("templates/index.html", "r", encoding="utf-8") as f:
63
  return f.read()
 
64
 
65
  # Request/Response Models
66
  class ConversionRequest(BaseModel):
 
84
  """Initialize application on startup"""
85
  logger.info("Starting Enhanced DOCX to PDF Converter...")
86
 
 
 
 
 
87
  # Validate LibreOffice installation
88
  if not converter.validate_libreoffice():
89
  logger.warning("LibreOffice validation failed - conversions may not work")
90
 
91
  # Create temp directory if it doesn't exist
92
+ os.makedirs(Config.TEMP_DIR, exist_ok=True)
 
 
 
 
 
 
 
 
 
 
 
 
93
 
94
  logger.info("Application started successfully")
95
 
 
159
  if not converter.convert_docx_to_pdf(input_path, output_path):
160
  raise HTTPException(status_code=500, detail="Conversion failed")
161
 
162
+ # Return success response
163
+ pdf_url = f"/download/{os.path.basename(temp_dir)}/{output_filename}"
 
 
 
 
 
 
 
 
164
  return ConversionResponse(
165
  success=True,
166
  pdf_url=pdf_url,
 
173
  logger.error(f"Conversion error: {e}")
174
  raise HTTPException(status_code=500, detail=f"Conversion failed: {str(e)}")
175
  finally:
176
+ # Cleanup will be handled by download endpoint or background task
177
+ pass
 
 
 
 
 
 
178
 
179
  @app.get("/download/{temp_id}/{filename}")
180
  async def download_pdf(temp_id: str, filename: str):
181
+ """Download converted PDF file"""
182
  try:
183
  file_path = f"{Config.TEMP_DIR}/{temp_id}/{filename}"
184
 
 
188
  return FileResponse(
189
  path=file_path,
190
  filename=filename,
191
+ media_type='application/pdf'
 
192
  )
193
  except HTTPException:
194
  raise
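
Review note: the `download_pdf` handler builds `file_path` by interpolating the request's `temp_id` and `filename` directly under `Config.TEMP_DIR`. The following is a minimal sketch, not part of this commit, of how the resolved path could be confined to the temp root before it is served. The helper name `resolve_download_path` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_download_path(base_dir: str, temp_id: str, filename: str) -> Optional[Path]:
    """Return the resolved file path if it stays inside base_dir, else None.

    Hypothetical helper for illustration; not part of this commit.
    """
    base = Path(base_dir).resolve()
    candidate = (base / temp_id / filename).resolve()
    # Reject anything that escapes the base directory after resolution,
    # for example a temp_id of ".." or an absolute filename.
    if base not in candidate.parents:
        return None
    return candidate
```

A handler could treat a `None` result the same way as a missing file and answer with a 404, assuming that matches the existing error handling.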
 
234
 
235
  # Perform conversion
236
  if converter.convert_docx_to_pdf(input_path, output_path):
237
+ pdf_url = f"/download/{os.path.basename(temp_dir)}/{output_filename}"
 
 
 
 
 
 
 
 
 
238
  results.append(ConversionResponse(
239
  success=True,
240
  pdf_url=pdf_url,
 
252
  success=False,
253
  error=str(e)
254
  ))
 
 
 
 
 
 
 
 
 
255
 
256
  return results
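
Review note: after this change the `finally` block in `convert_docx` is a bare `pass`, and its comment defers cleanup to "download endpoint or background task", neither of which appears in this diff. The sketch below shows one way such a cleanup could look, assuming a hypothetical helper `remove_stale_dirs` and an age threshold chosen by the operator. It is an illustration under those assumptions, not the author's implementation.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_dirs(base_dir: str, max_age_seconds: float,
                      now: Optional[float] = None) -> List[str]:
    """Delete subdirectories of base_dir not modified within max_age_seconds.

    Hypothetical helper; returns the sorted names of the removed directories.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    base = Path(base_dir)
    if not base.is_dir():
        return removed
    for entry in base.iterdir():
        # Skip plain files and symlinks; only real conversion directories count.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

Something like `remove_stale_dirs(Config.TEMP_DIR, 3600)` could be run periodically or scheduled after a download completes. The one-hour threshold is only an example value.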
src/utils/config.py CHANGED
@@ -18,12 +18,8 @@ class Config:
18
 
19
  # Conversion settings
20
  MAX_CONVERSION_TIME = int(os.environ.get("MAX_CONVERSION_TIME", 120)) # 2 minutes
21
- # Use /tmp/conversions as it's more likely to be writable in containerized environments
22
  TEMP_DIR = os.environ.get("TEMP_DIR", "/tmp/conversions")
23
 
24
- # Static files directory for storing converted PDFs
25
- STATIC_DIR = os.environ.get("STATIC_DIR", "/app/static")
26
-
27
  # API settings
28
  API_TITLE = "Enhanced DOCX to PDF Converter"
29
  API_DESCRIPTION = "Professional API for converting DOCX files to PDF with perfect formatting preservation"
 
18
 
19
  # Conversion settings
20
  MAX_CONVERSION_TIME = int(os.environ.get("MAX_CONVERSION_TIME", 120)) # 2 minutes
 
21
  TEMP_DIR = os.environ.get("TEMP_DIR", "/tmp/conversions")
22
 
 
 
 
23
  # API settings
24
  API_TITLE = "Enhanced DOCX to PDF Converter"
25
  API_DESCRIPTION = "Professional API for converting DOCX files to PDF with perfect formatting preservation"
src/utils/converter.py CHANGED
@@ -25,27 +25,12 @@ class DocumentConverter:
25
  logger.error(f"Input file does not exist: {input_path}")
26
  return False
27
 
28
- # Get output directory
29
- output_dir = os.path.dirname(output_path)
30
-
31
- # Ensure output directory exists
32
- os.makedirs(output_dir, exist_ok=True)
33
-
34
- # Set environment variables for LibreOffice to avoid user installation issues
35
- env = os.environ.copy()
36
- env['HOME'] = '/tmp'
37
- env['USERPROFILE'] = '/tmp'
38
-
39
  # Use LibreOffice headless mode for conversion
40
  cmd = [
41
  "libreoffice",
42
  "--headless",
43
- "--norestore",
44
- "--nofirststartwizard",
45
- "--nologo",
46
- "--nolockcheck",
47
  "--convert-to", "pdf",
48
- "--outdir", output_dir,
49
  input_path
50
  ]
51
 
@@ -55,21 +40,16 @@ class DocumentConverter:
55
  cmd,
56
  capture_output=True,
57
  text=True,
58
- timeout=self.max_conversion_time,
59
- env=env
60
  )
61
 
62
  if result.returncode != 0:
63
- logger.error(f"Conversion failed with return code {result.returncode}: {result.stderr}")
64
  return False
65
 
66
  # Check if PDF was created
67
  if not os.path.exists(output_path):
68
  logger.error("PDF file was not created")
69
- # List files in output directory for debugging
70
- if os.path.exists(output_dir):
71
- files = os.listdir(output_dir)
72
- logger.info(f"Files in output directory: {files}")
73
  return False
74
 
75
  logger.info(f"Successfully converted {input_path} to {output_path}")
@@ -93,17 +73,11 @@ class DocumentConverter:
93
  def validate_libreoffice(self) -> bool:
94
  """Validate LibreOffice installation"""
95
  try:
96
- # Set environment variables for LibreOffice
97
- env = os.environ.copy()
98
- env['HOME'] = '/tmp'
99
- env['USERPROFILE'] = '/tmp'
100
-
101
  result = subprocess.run(
102
  ["libreoffice", "--version"],
103
  capture_output=True,
104
  text=True,
105
- timeout=10,
106
- env=env
107
  )
108
  if result.returncode != 0:
109
  logger.error("LibreOffice not found or not working")
 
25
  logger.error(f"Input file does not exist: {input_path}")
26
  return False
27
 
 
 
 
 
 
 
 
 
 
 
 
28
  # Use LibreOffice headless mode for conversion
29
  cmd = [
30
  "libreoffice",
31
  "--headless",
 
 
 
 
32
  "--convert-to", "pdf",
33
+ "--outdir", os.path.dirname(output_path),
34
  input_path
35
  ]
36
 
 
40
  cmd,
41
  capture_output=True,
42
  text=True,
43
+ timeout=self.max_conversion_time
 
44
  )
45
 
46
  if result.returncode != 0:
47
+ logger.error(f"Conversion failed: {result.stderr}")
48
  return False
49
 
50
  # Check if PDF was created
51
  if not os.path.exists(output_path):
52
  logger.error("PDF file was not created")
 
 
 
 
53
  return False
54
 
55
  logger.info(f"Successfully converted {input_path} to {output_path}")
 
73
  def validate_libreoffice(self) -> bool:
74
  """Validate LibreOffice installation"""
75
  try:
 
 
 
 
 
76
  result = subprocess.run(
77
  ["libreoffice", "--version"],
78
  capture_output=True,
79
  text=True,
80
+ timeout=10
 
81
  )
82
  if result.returncode != 0:
83
  logger.error("LibreOffice not found or not working")
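
Review note: the conversion command now passes `os.path.dirname(output_path)` as `--outdir`, and the later `os.path.exists(output_path)` check can only succeed if `output_path` matches the name LibreOffice chooses. To my understanding LibreOffice writes `<input stem>.pdf` into the output directory. The sketch below makes that relationship explicit; both function names are hypothetical and not part of this commit.

```python
import os
from typing import List


def build_convert_command(input_path: str, output_dir: str) -> List[str]:
    """Build the argument list for a headless DOCX-to-PDF conversion."""
    return [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", output_dir,
        input_path,
    ]


def expected_pdf_path(input_path: str, output_dir: str) -> str:
    """Path where the PDF is expected, assuming output is named after the input stem."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, stem + ".pdf")
```

Comparing `expected_pdf_path(input_path, output_dir)` with the caller's `output_path` before running the command would surface a filename mismatch early instead of as a generic "PDF file was not created" error.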
src/utils/file_handler.py CHANGED
@@ -16,37 +16,18 @@ class FileHandler:
16
  """Handle file operations for the converter"""
17
 
18
  def __init__(self, base_temp_dir: str = "/tmp/conversions"):
19
- # Use /tmp as fallback since it's more likely to be writable in containerized environments
20
  self.base_temp_dir = base_temp_dir
21
- try:
22
- os.makedirs(self.base_temp_dir, exist_ok=True)
23
- # Ensure the directory is writable
24
- os.chmod(self.base_temp_dir, 0o777)
25
- except Exception as e:
26
- logger.error(f"Failed to create base temp directory {self.base_temp_dir}: {e}")
27
- # Fallback to system temp directory
28
- self.base_temp_dir = tempfile.gettempdir()
29
- logger.info(f"Falling back to system temp directory: {self.base_temp_dir}")
30
 
31
  def create_temp_directory(self) -> str:
32
  """Create a temporary directory for file processing"""
33
  try:
34
  temp_dir = tempfile.mkdtemp(dir=self.base_temp_dir)
35
  logger.info(f"Created temporary directory: {temp_dir}")
36
- # Ensure the directory is writable
37
- os.chmod(temp_dir, 0o777)
38
  return temp_dir
39
  except Exception as e:
40
  logger.error(f"Failed to create temporary directory: {e}")
41
- # Try fallback to system temp directory
42
- try:
43
- temp_dir = tempfile.mkdtemp()
44
- os.chmod(temp_dir, 0o777)
45
- logger.info(f"Created temporary directory in fallback location: {temp_dir}")
46
- return temp_dir
47
- except Exception as fallback_e:
48
- logger.error(f"Fallback also failed: {fallback_e}")
49
- raise
50
 
51
  def save_uploaded_file(self, temp_dir: str, filename: str, content: bytes) -> str:
52
  """Save uploaded file to temporary directory"""
 
16
  """Handle file operations for the converter"""
17
 
18
  def __init__(self, base_temp_dir: str = "/tmp/conversions"):
 
19
  self.base_temp_dir = base_temp_dir
20
+ os.makedirs(self.base_temp_dir, exist_ok=True)
 
 
 
 
 
 
 
 
21
 
22
  def create_temp_directory(self) -> str:
23
  """Create a temporary directory for file processing"""
24
  try:
25
  temp_dir = tempfile.mkdtemp(dir=self.base_temp_dir)
26
  logger.info(f"Created temporary directory: {temp_dir}")
 
 
27
  return temp_dir
28
  except Exception as e:
29
  logger.error(f"Failed to create temporary directory: {e}")
30
+ raise
 
 
 
 
 
 
 
 
31
 
32
  def save_uploaded_file(self, temp_dir: str, filename: str, content: bytes) -> str:
33
  """Save uploaded file to temporary directory"""
static/.gitkeep DELETED
File without changes
test_interface.html DELETED
@@ -1,287 +0,0 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>DOCX to PDF Converter Test</title>
7
- <script src="https://cdn.tailwindcss.com"></script>
8
- <style>
9
- @import url('https://fonts.googleapis.com/css2?family=Tajawal:wght@300;400;500;700&display=swap');
10
-
11
- body {
12
- font-family: 'Tajawal', sans-serif;
13
- background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
14
- min-height: 100vh;
15
- }
16
-
17
- .arabic-text {
18
- font-family: 'Tajawal', sans-serif;
19
- direction: rtl;
20
- text-align: right;
21
- }
22
-
23
- .glow {
24
- box-shadow: 0 0 20px rgba(255, 255, 255, 0.2);
25
- }
26
-
27
- .pulse {
28
- animation: pulse 2s infinite;
29
- }
30
-
31
- @keyframes pulse {
32
- 0% { box-shadow: 0 0 0 0 rgba(255, 255, 255, 0.4); }
33
- 70% { box-shadow: 0 0 0 10px rgba(255, 255, 255, 0); }
34
- 100% { box-shadow: 0 0 0 0 rgba(255, 255, 255, 0); }
35
- }
36
- </style>
37
- </head>
38
- <body class="bg-gradient-to-br from-blue-50 to-purple-50">
39
- <div class="container mx-auto px-4 py-8 max-w-4xl">
40
- <!-- Header -->
41
- <div class="text-center mb-10">
42
- <h1 class="text-4xl font-bold text-white mb-2">DOCX to PDF Converter</h1>
43
- <p class="text-blue-100 text-lg">تحويل مستندات Word إلى PDF بسهولة</p>
44
- </div>
45
-
46
- <!-- Main Card -->
47
- <div class="bg-white rounded-2xl shadow-xl overflow-hidden glow">
48
- <div class="p-8">
49
- <!-- Upload Section -->
50
- <div class="mb-8">
51
- <h2 class="text-2xl font-bold text-gray-800 mb-4">رفع ملف DOCX</h2>
52
- <p class="text-gray-600 mb-6">اختر ملف DOCX لتحويله إلى PDF</p>
53
-
54
- <div class="border-2 border-dashed border-gray-300 rounded-xl p-8 text-center transition-all hover:border-blue-400 hover:bg-blue-50 cursor-pointer" id="dropZone">
55
- <div class="flex flex-col items-center justify-center">
56
- <svg class="w-12 h-12 text-gray-400 mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
57
- <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12"></path>
58
- </svg>
59
- <p class="text-gray-600 mb-2">اسحب الملف هنا أو انقر للاختيار</p>
60
- <p class="text-gray-400 text-sm">يدعم فقط ملفات DOCX</p>
61
- <input type="file" id="fileInput" class="hidden" accept=".docx">
62
- </div>
63
- </div>
64
-
65
- <div class="mt-4 text-center">
66
- <button id="selectFileBtn" class="bg-blue-500 hover:bg-blue-600 text-white font-medium py-3 px-6 rounded-lg transition-all transform hover:scale-105">
67
- اختيار ملف
68
- </button>
69
- </div>
70
- </div>
71
-
72
- <!-- File Info -->
73
- <div id="fileInfo" class="hidden mb-6 p-4 bg-blue-50 rounded-lg border border-blue-200">
74
- <div class="flex items-center">
75
- <svg class="w-5 h-5 text-blue-500 mr-2" fill="none" stroke="currentColor" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
76
- <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"></path>
77
- </svg>
78
- <span id="fileName" class="text-blue-800 font-medium"></span>
79
- </div>
80
- </div>
81
-
82
- <!-- Convert Button -->
83
- <div class="text-center mb-8">
84
- <button id="convertBtn" class="bg-gradient-to-r from-purple-500 to-indigo-600 hover:from-purple-600 hover:to-indigo-700 text-white font-bold py-4 px-8 rounded-xl text-lg shadow-lg transition-all transform hover:scale-105 disabled:opacity-50 disabled:cursor-not-allowed disabled:transform-none" disabled>
85
- تحويل إلى PDF
86
- </button>
87
- </div>
88
-
89
- <!-- Status Section -->
90
- <div id="statusSection" class="hidden">
91
- <div class="flex items-center justify-center mb-4">
92
- <div id="spinner" class="hidden animate-spin rounded-full h-8 w-8 border-b-2 border-purple-600 mr-3"></div>
93
- <h3 id="statusTitle" class="text-xl font-bold text-gray-800"></h3>
94
- </div>
95
- <p id="statusMessage" class="text-center text-gray-600"></p>
96
-
97
- <!-- Result Section -->
98
- <div id="resultSection" class="hidden mt-6 p-6 bg-green-50 rounded-xl border border-green-200">
99
- <div class="text-center">
100
- <svg class="w-12 h-12 text-green-500 mx-auto mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
101
- <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z"></path>
102
- </svg>
103
- <h4 class="text-xl font-bold text-green-800 mb-2">تم التحويل بنجاح!</h4>
104
- <p class="text-green-600 mb-4">يمكنك الآن فتح ملف PDF المحول</p>
105
- <button id="openPdfBtn" class="inline-block bg-green-500 hover:bg-green-600 text-white font-medium py-3 px-6 rounded-lg transition-all transform hover:scale-105">
106
- فتح PDF في المتصفح
107
- </button>
108
- </div>
109
- </div>
110
-
111
- <!-- Error Section -->
112
- <div id="errorSection" class="hidden mt-6 p-6 bg-red-50 rounded-xl border border-red-200">
113
- <div class="text-center">
114
- <svg class="w-12 h-12 text-red-500 mx-auto mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
115
- <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"></path>
116
- </svg>
117
- <h4 class="text-xl font-bold text-red-800 mb-2">حدث خطأ</h4>
118
- <p id="errorMessage" class="text-red-600 mb-4"></p>
119
- <button id="retryBtn" class="bg-red-500 hover:bg-red-600 text-white font-medium py-2 px-4 rounded-lg">
120
- المحاولة مرة أخرى
121
- </button>
122
- </div>
123
- </div>
124
- </div>
125
- </div>
126
- </div>
127
-
128
- <!-- Instructions -->
129
- <div class="mt-8 bg-white rounded-2xl shadow-lg p-6">
130
- <h3 class="text-xl font-bold text-gray-800 mb-4">كيفية الاستخدام</h3>
131
- <ol class="list-decimal list-inside space-y-2 text-gray-600">
132
- <li>انقر على زر "اختيار ملف" أو اسحب ملف DOCX إلى المنطقة المخصصة</li>
133
- <li>تأكد من أن الملف بصيغة DOCX</li>
134
- <li>انقر على زر "تحويل إلى PDF"</li>
135
- <li>انتظر حتى يكتمل التحويل</li>
136
- <li>انقر على "فتح PDF في المتصفح" لعرض ملفك المحول</li>
137
- </ol>
138
- </div>
139
- </div>
140
-
141
- <script>
142
- // DOM Elements
143
- const dropZone = document.getElementById('dropZone');
144
- const fileInput = document.getElementById('fileInput');
145
- const selectFileBtn = document.getElementById('selectFileBtn');
146
- const fileInfo = document.getElementById('fileInfo');
147
- const fileName = document.getElementById('fileName');
148
- const convertBtn = document.getElementById('convertBtn');
149
- const statusSection = document.getElementById('statusSection');
150
- const statusTitle = document.getElementById('statusTitle');
151
- const statusMessage = document.getElementById('statusMessage');
152
- const spinner = document.getElementById('spinner');
153
- const resultSection = document.getElementById('resultSection');
154
- const errorSection = document.getElementById('errorSection');
155
- const errorMessage = document.getElementById('errorMessage');
156
- const openPdfBtn = document.getElementById('openPdfBtn');
157
- const retryBtn = document.getElementById('retryBtn');
158
-
159
- // Hugging Face Space URL
160
- const SPACE_URL = 'https://fokan-pdf-4.hf.space';
161
-
162
- // Current PDF URL
163
- let currentPdfUrl = '';
164
-
165
- // Event Listeners
166
- selectFileBtn.addEventListener('click', () => fileInput.click());
167
- fileInput.addEventListener('change', handleFileSelect);
168
- dropZone.addEventListener('dragover', handleDragOver);
169
- dropZone.addEventListener('drop', handleDrop);
170
- convertBtn.addEventListener('click', convertFile);
171
- openPdfBtn.addEventListener('click', openPdfInNewTab);
172
- retryBtn.addEventListener('click', resetInterface);
173
-
174
- // Functions
175
- function handleFileSelect(e) {
176
- const file = e.target.files[0];
177
- if (file && file.name.endsWith('.docx')) {
178
- displayFileInfo(file);
179
- } else {
180
- alert('الرجاء اختيار ملف DOCX فقط');
181
- }
182
- }
183
-
184
- function handleDragOver(e) {
185
- e.preventDefault();
186
- e.stopPropagation();
187
- dropZone.classList.add('border-blue-400', 'bg-blue-50');
188
- }
189
-
190
- function handleDrop(e) {
191
- e.preventDefault();
192
- e.stopPropagation();
193
- dropZone.classList.remove('border-blue-400', 'bg-blue-50');
194
-
195
- const file = e.dataTransfer.files[0];
196
- if (file && file.name.endsWith('.docx')) {
197
- fileInput.files = e.dataTransfer.files;
198
- displayFileInfo(file);
199
- } else {
200
- alert('الرجاء سحب ملف DOCX فقط');
201
- }
202
- }
203
-
204
- function displayFileInfo(file) {
205
- fileName.textContent = file.name;
206
- fileInfo.classList.remove('hidden');
207
- convertBtn.disabled = false;
208
- convertBtn.classList.remove('disabled:opacity-50', 'disabled:cursor-not-allowed');
209
- }
210
-
211
- async function convertFile() {
212
- const file = fileInput.files[0];
213
- if (!file) return;
214
-
215
- // Show loading state
216
- showStatus('جاري التحويل...', 'يرجى الانتظار بينما نحول ملفك إلى PDF');
217
- spinner.classList.remove('hidden');
218
-
219
- try {
220
- // Prepare form data
221
- const formData = new FormData();
222
- formData.append('file', file);
223
-
224
- // Send request to Hugging Face Space
225
- const response = await fetch(`${SPACE_URL}/convert`, {
226
- method: 'POST',
227
- body: formData
228
- });
229
-
230
- if (!response.ok) {
231
- throw new Error(`خطأ في الطلب: ${response.status}`);
232
- }
233
-
234
- const result = await response.json();
235
-
236
- if (result.success) {
237
- // Show success
238
- showResult(result.pdf_url);
239
- } else {
240
- throw new Error(result.error || 'فشل التحويل');
241
- }
242
- } catch (error) {
243
- showError(error.message || 'حدث خطأ غير متوقع');
244
- }
245
- }
246
-
247
- function showStatus(title, message) {
248
- statusTitle.textContent = title;
249
- statusMessage.textContent = message;
250
- statusSection.classList.remove('hidden');
251
- resultSection.classList.add('hidden');
252
- errorSection.classList.add('hidden');
253
- }
254
-
255
- function showResult(pdfUrl) {
256
- spinner.classList.add('hidden');
257
- statusTitle.textContent = 'اكتمل التحويل!';
258
- statusMessage.textContent = 'ملفك جاهز للفتح';
259
- resultSection.classList.remove('hidden');
260
-
261
- // Store the PDF URL for opening in new tab
262
- currentPdfUrl = SPACE_URL + pdfUrl;
263
- }
264
-
265
- function openPdfInNewTab() {
266
- // Open PDF in new tab using window.open with "_blank" target
267
- window.open(currentPdfUrl, "_blank");
268
- }
269
-
270
- function showError(message) {
271
- spinner.classList.add('hidden');
272
- statusTitle.textContent = 'فشل التحويل';
273
- errorMessage.textContent = message;
274
- errorSection.classList.remove('hidden');
275
- }
276
-
277
- function resetInterface() {
278
- statusSection.classList.add('hidden');
279
- fileInfo.classList.add('hidden');
280
- convertBtn.disabled = true;
281
- convertBtn.classList.add('disabled:opacity-50', 'disabled:cursor-not-allowed');
282
- fileInput.value = '';
283
- currentPdfUrl = '';
284
- }
285
- </script>
286
- </body>
287
- </html>
 
test_root_endpoint.py DELETED
@@ -1,50 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Test script to verify the root endpoint is working correctly
4
- """
5
-
6
- import os
7
- import sys
8
-
9
- def test_root_endpoint():
10
- """Test that the root endpoint is properly configured"""
11
- print("Testing root endpoint configuration...")
12
-
13
- # Check if templates directory exists
14
- if not os.path.exists("templates"):
15
- print("❌ templates directory not found")
16
- return False
17
- print("✅ templates directory found")
18
-
19
- # Check if index.html exists
20
- if not os.path.exists("templates/index.html"):
21
- print("❌ templates/index.html not found")
22
- return False
23
- print("✅ templates/index.html found")
24
-
25
- # Check if main.py has the root endpoint handler
26
- if not os.path.exists("src/api/main.py"):
27
- print("❌ src/api/main.py not found")
28
- return False
29
-
30
- with open("src/api/main.py", "r", encoding="utf-8") as f:
31
- content = f.read()
32
-
33
- # Check for root endpoint handler
34
- if "async def read_index():" in content and 'app.get("/",' in content:
35
- print("✅ Root endpoint handler found in main.py")
36
- else:
37
- print("❌ Root endpoint handler not found in main.py")
38
- return False
39
-
40
- print("\n✅ Root endpoint configuration is correct!")
41
- print("\nWhen the application is running, you should be able to access:")
42
- print("- The web interface at http://localhost:7860/")
43
- print("- API documentation at http://localhost:7860/docs")
44
- print("- Health check at http://localhost:7860/health")
45
-
46
- return True
47
-
48
- if __name__ == "__main__":
49
- success = test_root_endpoint()
50
- sys.exit(0 if success else 1)
 
test_static_serving.py DELETED
@@ -1,29 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Test script to verify static file serving functionality
4
- """
5
-
6
- import os
7
- import requests
8
- import time
9
-
10
- def test_static_file_serving():
11
- """Test that static files are served correctly"""
12
- # Test URL for the Hugging Face Space
13
- base_url = "https://fokan-pdf-4.hf.space"
14
-
15
- # First, let's check if the static endpoint is accessible
16
- try:
17
- response = requests.get(f"{base_url}/static/")
18
- print(f"Static directory access: {response.status_code}")
19
-
20
- if response.status_code == 200:
21
- print("✅ Static file serving is working")
22
- else:
23
- print("❌ Static file serving may not be working properly")
24
-
25
- except Exception as e:
26
- print(f"❌ Error testing static file serving: {e}")
27
-
28
- if __name__ == "__main__":
29
- test_static_file_serving()
 
validate_dockerfile.py DELETED
@@ -1,72 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Script to validate Dockerfile syntax and content
4
- """
5
-
6
- import os
7
- import re
8
-
9
- def validate_dockerfile():
10
- """Validate Dockerfile content"""
11
- dockerfile_path = "Dockerfile"
12
-
13
- if not os.path.exists(dockerfile_path):
14
- print("❌ Dockerfile not found")
15
- return False
16
-
17
- with open(dockerfile_path, "r") as f:
18
- content = f.read()
19
-
20
- print("🔍 Validating Dockerfile...")
21
-
22
- # Check for required sections
23
- required_patterns = [
24
- r"FROM ubuntu:22.04",
25
- r"WORKDIR /app",
26
- r"COPY requirements.txt",
27
- r"pip3 install",
28
- r"COPY app.py",
29
- r"EXPOSE 7860",
30
- r"CMD \["
31
- ]
32
-
33
- for pattern in required_patterns:
34
- if not re.search(pattern, content):
35
- print(f"❌ Missing required pattern: {pattern}")
36
- return False
37
- print(f"✅ Found required pattern: {pattern}")
38
-
39
- # Check for removed packages
40
- removed_packages = [
41
- r"libreoffice-help-ar",
42
- r"fonts-noto-naskh",
43
- r"fonts-noto-kufi-arabic",
44
- r"fonts-amiri",
45
- r"fonts-scheherazade-new"
46
- ]
47
-
48
- for package in removed_packages:
49
- if re.search(package, content):
50
- print(f"❌ Found removed package: {package}")
51
- return False
52
- print(f"✅ Confirmed removal of package: {package}")
53
-
54
- # Check for font installation script
55
- if "arabic_fonts_setup.sh" in content:
56
- print("✅ Found Arabic font installation script")
57
- else:
58
- print("❌ Missing Arabic font installation script")
59
- return False
60
-
61
- # Check for proper error handling
62
- if "|| true" in content:
63
- print("✅ Found error handling with || true")
64
- else:
65
- print("⚠️ No error handling found (may be OK)")
66
-
67
- print("\n✅ Dockerfile validation passed!")
68
- return True
69
-
70
- if __name__ == "__main__":
71
- success = validate_dockerfile()
72
- exit(0 if success else 1)
 
verify_minimal.py DELETED
@@ -1,63 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Verification script for minimal setup
4
- """
5
-
6
- import os
7
- import sys
8
-
9
- def verify_minimal_setup():
10
- """Verify that all required files for minimal setup exist"""
11
- print("Verifying minimal setup...")
12
-
13
- # Essential directories
14
- required_dirs = [
15
- "src",
16
- "src/api",
17
- "src/utils",
18
- "templates",
19
- "conversions"
20
- ]
21
-
22
- # Essential files
23
- required_files = [
24
- "src/api/main.py",
25
- "src/api/app.py",
26
- "src/utils/config.py",
27
- "src/utils/converter.py",
28
- "src/utils/file_handler.py",
29
- "templates/index.html",
30
- "Dockerfile",
31
- "docker-compose.yml",
32
- "requirements.txt",
33
- "install_arabic_fonts.sh",
34
- "arial.ttf",
35
- "README.md"
36
- ]
37
-
38
- # Check directories
39
- for dir_path in required_dirs:
40
- if not os.path.exists(dir_path):
41
- print(f"❌ Missing directory: {dir_path}")
42
- return False
43
- print(f"✅ Found directory: {dir_path}")
44
-
45
- # Check files
46
- for file_path in required_files:
47
- if not os.path.exists(file_path):
48
- print(f"❌ Missing file: {file_path}")
49
- return False
50
- print(f"✅ Found file: {file_path}")
51
-
52
- print("\n✅ Minimal setup verification passed!")
53
- print("\nThis setup includes only the essential files needed to run the application:")
54
- print("- Core application files (FastAPI + utilities)")
55
- print("- Docker configuration")
56
- print("- Frontend interface")
57
- print("- Required assets and documentation")
58
-
59
- return True
60
-
61
- if __name__ == "__main__":
62
- success = verify_minimal_setup()
63
- sys.exit(0 if success else 1)