Upload 12 files
Browse files- ARABIC_USAGE_GUIDE.md +137 -0
- DEPLOYMENT_GUIDE.md +217 -0
- Dockerfile +74 -0
- README.md +27 -17
- app.py +321 -46
- arabic_fonts_setup.sh +41 -0
- docker-compose.yml +23 -0
- libreoffice_arabic_config.xml +108 -0
- packages.txt +8 -0
- run_local.py +113 -0
- test_conversion.py +122 -0
ARABIC_USAGE_GUIDE.md
ADDED
|
@@ -0,0 +1,137 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 📄 دليل الاستخدام - محول DOCX إلى PDF للعربية
|
| 2 |
+
|
| 3 |
+
## 🎯 نظرة عامة
|
| 4 |
+
|
| 5 |
+
هذا المحول مصمم خصيصاً لحل المشاكل الشائعة في تحويل المستندات العربية من Word إلى PDF مع الحفاظ الكامل على التنسيق.
|
| 6 |
+
|
| 7 |
+
## ✅ المشاكل التي تم حلها
|
| 8 |
+
|
| 9 |
+
### 1. ❌ تراكب النصوص العربية
|
| 10 |
+
**المشكلة:** النصوص العربية تتداخل أو تفقد المسافات الصحيحة
|
| 11 |
+
**الحل:**
|
| 12 |
+
- تحسين إعدادات الخطوط العربية
|
| 13 |
+
- ضبط المسافات والتباعد بدقة
|
| 14 |
+
- استخدام خطوط Amiri و Noto Naskh Arabic المحسنة
|
| 15 |
+
|
| 16 |
+
### 2. ❌ فقدان المحاذاة اليمنى (RTL)
|
| 17 |
+
**المشكلة:** النص العربي يظهر من اليسار لليمين بدلاً من اليمين لليسار
|
| 18 |
+
**الحل:**
|
| 19 |
+
- تفعيل دعم CTL (Complex Text Layout)
|
| 20 |
+
- إعداد اتجاه النص الافتراضي إلى RTL
|
| 21 |
+
- تحسين إعدادات اللغة العربية
|
| 22 |
+
|
| 23 |
+
### 3. ❌ استبدال الخطوط العربية
|
| 24 |
+
**المشكلة:** الخطوط العربية الأصلية تُستبدل بخطوط لا تدعم العربية
|
| 25 |
+
**الحل:**
|
| 26 |
+
- تثبيت خطوط عربية عالية الجودة (Amiri, Noto Naskh, Scheherazade)
|
| 27 |
+
- إعداد قواعد استبدال الخطوط المحسنة
|
| 28 |
+
- تضمين الخطوط في ملف PDF النهائي
|
| 29 |
+
|
| 30 |
+
### 4. ❌ تشوه الجداول
|
| 31 |
+
**المشكلة:** الجداول تفقد تنسيقها أو تتشوه أثناء التحويل
|
| 32 |
+
**الحل:**
|
| 33 |
+
- إعدادات خاصة للجداول مع الحفاظ على الأبعاد
|
| 34 |
+
- منع التغييرات التلقائية في الخط العريض
|
| 35 |
+
- الحفاظ على حدود الخلايا والمحاذاة
|
| 36 |
+
|
| 37 |
+
### 5. ❌ تغيير مواقع قوالب التعبئة
|
| 38 |
+
**المشكلة:** قوالب مثل {{name}} و {{date}} تتحرك من مواقعها
|
| 39 |
+
**الحل:**
|
| 40 |
+
- تعطيل الاستبدال التلقائي للنصوص
|
| 41 |
+
- الحفاظ على المواقع الدقيقة للعناصر
|
| 42 |
+
- منع إعادة التدفق التلقائي للنص
|
| 43 |
+
|
| 44 |
+
### 6. ❌ حجم الصفحة غير مناسب للطباعة
|
| 45 |
+
**المشكلة:** ملف PDF لا يطبع بشكل صحيح على ورق A4
|
| 46 |
+
**الحل:**
|
| 47 |
+
- ضبط أبعاد الصفحة بدقة لورق A4
|
| 48 |
+
- تحسين الهوامش للطباعة المثلى
|
| 49 |
+
- ضمان التوافق مع معايير الطباعة
|
| 50 |
+
|
| 51 |
+
## 🚀 كيفية الاستخدام
|
| 52 |
+
|
| 53 |
+
### 1. الاستخدام عبر الواجهة
|
| 54 |
+
1. افتح الرابط في المتصفح
|
| 55 |
+
2. اضغط على "Upload DOCX File"
|
| 56 |
+
3. اختر ملف Word العربي
|
| 57 |
+
4. انتظر التحويل (قد يستغرق دقائق للملفات المعقدة)
|
| 58 |
+
5. حمل ملف PDF المحول
|
| 59 |
+
|
| 60 |
+
### 2. الاستخدام المحلي
|
| 61 |
+
```bash
|
| 62 |
+
# تثبيت التبعيات
|
| 63 |
+
pip install -r requirements.txt
|
| 64 |
+
|
| 65 |
+
# تشغيل التطبيق
|
| 66 |
+
python app.py
|
| 67 |
+
|
| 68 |
+
# اختبار التحويل
|
| 69 |
+
python test_conversion.py
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
## 📋 نصائح للحصول على أفضل النتائج
|
| 73 |
+
|
| 74 |
+
### ✅ إعداد ملف Word الأصلي
|
| 75 |
+
- استخدم خطوط عربية معيارية (Traditional Arabic, Arabic Typesetting)
|
| 76 |
+
- تأكد من ضبط اتجاه النص إلى RTL
|
| 77 |
+
- تجنب الخطوط النادرة أو المخصصة
|
| 78 |
+
- احفظ الملف بصيغة .docx (ليس .doc)
|
| 79 |
+
|
| 80 |
+
### ✅ للجداول
|
| 81 |
+
- استخدم جداول بسيطة بدون دمج معقد للخلايا
|
| 82 |
+
- تجنب الجداول المتداخلة
|
| 83 |
+
- اضبط عرض الأعمدة بوضوح
|
| 84 |
+
- استخدم حدود واضحة للجداول
|
| 85 |
+
|
| 86 |
+
### ✅ للصور
|
| 87 |
+
- استخدم صور بدقة عالية (300 DPI أو أكثر)
|
| 88 |
+
- تجنب الصور المضغوطة بشدة
|
| 89 |
+
- اضبط حجم الصور في Word قبل التحويل
|
| 90 |
+
|
| 91 |
+
### ✅ للنصوص المختلطة (عربي/إنجليزي)
|
| 92 |
+
- اضبط اتجاه كل فقرة حسب اللغة
|
| 93 |
+
- استخدم خطوط تدعم كلا اللغتين
|
| 94 |
+
- تجنب الخلط في نفس السطر إذا أمكن
|
| 95 |
+
|
| 96 |
+
## 🔧 استكشاف الأخطاء وإصلاحها
|
| 97 |
+
|
| 98 |
+
### مشكلة: النص العربي يظهر مقطع أو مشوه
|
| 99 |
+
**الحل:**
|
| 100 |
+
- تأكد من أن الملف محفوظ بترميز UTF-8
|
| 101 |
+
- جرب خط عربي مختلف في Word
|
| 102 |
+
- تأكد من تفعيل دعم اللغات المعقدة في Word
|
| 103 |
+
|
| 104 |
+
### مشكلة: الجداول تظهر مشوهة
|
| 105 |
+
**الحل:**
|
| 106 |
+
- بسط تصميم الجدول
|
| 107 |
+
- تجنب دمج الخلايا المعقد
|
| 108 |
+
- اضبط عرض الجدول ليناسب الصفحة
|
| 109 |
+
|
| 110 |
+
### مشكلة: حجم الملف كبير جداً
|
| 111 |
+
**الحل:**
|
| 112 |
+
- ضغط الصور في Word قبل التحويل
|
| 113 |
+
- تجنب الصور عالية الدقة غير الضرورية
|
| 114 |
+
- استخدم تنسيقات صور محسنة (JPEG بدلاً من PNG للصور)
|
| 115 |
+
|
| 116 |
+
### مشكلة: التحويل يستغرق وقت طويل
|
| 117 |
+
**الحل:**
|
| 118 |
+
- قسم الم��تند الكبير إلى أجزاء أصغر
|
| 119 |
+
- أزل العناصر غير الضرورية
|
| 120 |
+
- تأكد من استقرار اتصال الإنترنت
|
| 121 |
+
|
| 122 |
+
## 📞 الدعم الفني
|
| 123 |
+
|
| 124 |
+
إذا واجهت مشاكل لم تُحل بالطرق أعلاه:
|
| 125 |
+
1. تأكد من أن ملف Word يفتح بشكل صحيح في Microsoft Word
|
| 126 |
+
2. جرب تحويل ملف أبسط أولاً للتأكد من عمل النظام
|
| 127 |
+
3. تحقق من حجم الملف (يُفضل أقل من 50 ميجابايت)
|
| 128 |
+
4. تأكد من أن الملف ليس محمي بكلمة مرور
|
| 129 |
+
|
| 130 |
+
## 🎯 أمثلة ناجحة
|
| 131 |
+
|
| 132 |
+
هذا المحول تم اختباره بنجاح مع:
|
| 133 |
+
- ✅ تقارير عربية معقدة مع جداول
|
| 134 |
+
- ✅ رسائل رسمية بالعربية
|
| 135 |
+
- ✅ مستندات أكاديمية مختلطة (عربي/إنجليزي)
|
| 136 |
+
- ✅ نماذج تعبئة بقوالب ديناميكية
|
| 137 |
+
- ✅ مستندات بصور وجداول معقدة
|
DEPLOYMENT_GUIDE.md
ADDED
|
@@ -0,0 +1,217 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🚀 دليل النشر - محول DOCX إلى PDF للعربية
|
| 2 |
+
|
| 3 |
+
## 📋 خيارات النشر
|
| 4 |
+
|
| 5 |
+
### 1. 🌐 Hugging Face Spaces (الموصى به)
|
| 6 |
+
|
| 7 |
+
#### الخطوات:
|
| 8 |
+
1. **إنشاء Space جديد:**
|
| 9 |
+
- اذهب إلى [Hugging Face Spaces](https://huggingface.co/spaces)
|
| 10 |
+
- اضغط "Create new Space"
|
| 11 |
+
- اختر "Gradio" كـ SDK
|
| 12 |
+
- اختر اسم للـ Space
|
| 13 |
+
|
| 14 |
+
2. **رفع الملفات:**
|
| 15 |
+
```bash
|
| 16 |
+
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
|
| 17 |
+
cd YOUR_SPACE_NAME
|
| 18 |
+
|
| 19 |
+
# نسخ الملفات المطلوبة
|
| 20 |
+
cp /path/to/your/project/app.py .
|
| 21 |
+
cp /path/to/your/project/requirements.txt .
|
| 22 |
+
cp /path/to/your/project/packages.txt .
|
| 23 |
+
cp /path/to/your/project/README.md .
|
| 24 |
+
|
| 25 |
+
# رفع التغييرات
|
| 26 |
+
git add .
|
| 27 |
+
git commit -m "Add Arabic DOCX to PDF converter"
|
| 28 |
+
git push
|
| 29 |
+
```
|
| 30 |
+
|
| 31 |
+
3. **التحقق من النشر:**
|
| 32 |
+
- انتظر بناء الـ Space (5-10 دقائق)
|
| 33 |
+
- تحقق من السجلات للتأكد من تثبيت الخطوط العربية
|
| 34 |
+
- اختبر التحويل بملف عربي بسيط
|
| 35 |
+
|
| 36 |
+
#### المزايا:
|
| 37 |
+
- ✅ مجاني ومتاح 24/7
|
| 38 |
+
- ✅ تثبيت تلقائي للتبعيات
|
| 39 |
+
- ✅ واجهة ويب جاهزة
|
| 40 |
+
- ✅ مشاركة سهلة عبر الرابط
|
| 41 |
+
|
| 42 |
+
### 2. 🐳 Docker (للتشغيل المحلي)
|
| 43 |
+
|
| 44 |
+
#### الخطوات:
|
| 45 |
+
```bash
|
| 46 |
+
# بناء الصورة
|
| 47 |
+
docker build -t docx-pdf-arabic .
|
| 48 |
+
|
| 49 |
+
# تشغيل الحاوية
|
| 50 |
+
docker run -p 7860:7860 docx-pdf-arabic
|
| 51 |
+
|
| 52 |
+
# أو استخدام docker-compose
|
| 53 |
+
docker-compose up -d
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
#### المزايا:
|
| 57 |
+
- ✅ بيئة معزولة ومستقرة
|
| 58 |
+
- ✅ سهولة النشر على خوادم مختلفة
|
| 59 |
+
- ✅ تحكم كامل في البيئة
|
| 60 |
+
|
| 61 |
+
### 3. 🖥️ التشغيل المحلي المباشر
|
| 62 |
+
|
| 63 |
+
#### الخطوات:
|
| 64 |
+
```bash
|
| 65 |
+
# تثبيت التبعيات النظام (Ubuntu/Debian)
|
| 66 |
+
sudo apt-get update
|
| 67 |
+
sudo apt-get install libreoffice libreoffice-writer \
|
| 68 |
+
fonts-liberation fonts-dejavu fonts-noto fontconfig
|
| 69 |
+
|
| 70 |
+
# تثبيت التبعيات Python
|
| 71 |
+
pip install -r requirements.txt
|
| 72 |
+
|
| 73 |
+
# تشغيل التطبيق
|
| 74 |
+
python run_local.py
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
#### المزايا:
|
| 78 |
+
- ✅ أداء أسرع
|
| 79 |
+
- ✅ تحكم كامل في النظام
|
| 80 |
+
- ✅ سهولة التطوير والاختبار
|
| 81 |
+
|
| 82 |
+
## 🔧 إعدادات التحسين
|
| 83 |
+
|
| 84 |
+
### لـ Hugging Face Spaces:
|
| 85 |
+
|
| 86 |
+
1. **تحسين packages.txt:**
|
| 87 |
+
```
|
| 88 |
+
libreoffice
|
| 89 |
+
libreoffice-writer
|
| 90 |
+
libreoffice-l10n-ar
|
| 91 |
+
fonts-noto-naskh
|
| 92 |
+
fonts-amiri
|
| 93 |
+
fontconfig
|
| 94 |
+
```
|
| 95 |
+
|
| 96 |
+
2. **تحسين requirements.txt:**
|
| 97 |
+
```
|
| 98 |
+
gradio==4.20.0
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
3. **إعدادات README.md:**
|
| 102 |
+
- تأكد من وجود YAML frontmatter صحيح
|
| 103 |
+
- اضبط sdk_version على النسخة الصحيحة
|
| 104 |
+
|
| 105 |
+
### للخوادم المخصصة:
|
| 106 |
+
|
| 107 |
+
1. **تحسين الذاكرة:**
|
| 108 |
+
```bash
|
| 109 |
+
export JAVA_OPTS="-Xmx2g"
|
| 110 |
+
export SAL_DISABLE_OPENCL=1
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
2. **تحسين الخطوط:**
|
| 114 |
+
```bash
|
| 115 |
+
fc-cache -fv
|
| 116 |
+
fc-list | grep -i arabic
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
## 🧪 اختبار النشر
|
| 120 |
+
|
| 121 |
+
### 1. اختبار أساسي:
|
| 122 |
+
```bash
|
| 123 |
+
python test_conversion.py
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
### 2. اختبار الخطوط العربية:
|
| 127 |
+
```bash
|
| 128 |
+
fc-list | grep -i "amiri\|noto.*arabic"
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
+
### 3. اختبار LibreOffice:
|
| 132 |
+
```bash
|
| 133 |
+
libreoffice --headless --convert-to pdf test.docx
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
## 🔍 استكشاف أخطاء النشر
|
| 137 |
+
|
| 138 |
+
### مشكلة: LibreOffice لا يعمل
|
| 139 |
+
**الحل:**
|
| 140 |
+
```bash
|
| 141 |
+
# تحقق من التثبيت
|
| 142 |
+
libreoffice --version
|
| 143 |
+
|
| 144 |
+
# إعادة تثبيت
|
| 145 |
+
sudo apt-get remove --purge libreoffice*
|
| 146 |
+
sudo apt-get install libreoffice libreoffice-writer
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
### مشكلة: الخطوط العربية مفقودة
|
| 150 |
+
**الحل:**
|
| 151 |
+
```bash
|
| 152 |
+
# تثبيت خطوط إضافية
|
| 153 |
+
sudo apt-get install fonts-noto-naskh fonts-amiri
|
| 154 |
+
|
| 155 |
+
# تحديث cache
|
| 156 |
+
sudo fc-cache -fv
|
| 157 |
+
|
| 158 |
+
# التحقق
|
| 159 |
+
fc-list | grep -i arabic
|
| 160 |
+
```
|
| 161 |
+
|
| 162 |
+
### مشكلة: أخطاء الذاكرة
|
| 163 |
+
**الحل:**
|
| 164 |
+
```bash
|
| 165 |
+
# زيادة حد الذاكرة
|
| 166 |
+
export JAVA_OPTS="-Xmx4g"
|
| 167 |
+
|
| 168 |
+
# تعطيل OpenCL
|
| 169 |
+
export SAL_DISABLE_OPENCL=1
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
### مشكلة: بطء التحويل
|
| 173 |
+
**الحل:**
|
| 174 |
+
- قلل حجم الملفات المدخلة
|
| 175 |
+
- استخدم خادم بمواصفات أعلى
|
| 176 |
+
- فعل التخزين المؤقت
|
| 177 |
+
|
| 178 |
+
## 📊 مراقبة الأداء
|
| 179 |
+
|
| 180 |
+
### مؤشرات مهمة:
|
| 181 |
+
- وقت التحويل (يجب أن يكون < 30 ثانية للملفات العادية)
|
| 182 |
+
- استخدام الذاكرة (يجب أن يكون < 2GB)
|
| 183 |
+
- معدل نجاح التحويل (يجب أن يكون > 95%)
|
| 184 |
+
|
| 185 |
+
### أدوات المراقبة:
|
| 186 |
+
```bash
|
| 187 |
+
# مراقبة الذاكرة
|
| 188 |
+
htop
|
| 189 |
+
|
| 190 |
+
# مراقبة العمليات
|
| 191 |
+
ps aux | grep libreoffice
|
| 192 |
+
|
| 193 |
+
# مراقبة السجلات
|
| 194 |
+
tail -f /var/log/syslog
|
| 195 |
+
```
|
| 196 |
+
|
| 197 |
+
## 🔒 الأمان
|
| 198 |
+
|
| 199 |
+
### إعدادات الأمان:
|
| 200 |
+
1. تحديد حجم الملفات المرفوعة (< 50MB)
|
| 201 |
+
2. تنظيف الملفات المؤقتة تلقائياً
|
| 202 |
+
3. تحديد وقت انتهاء للعمليات (timeout)
|
| 203 |
+
4. منع تنفيذ الكود الض��ر في الملفات
|
| 204 |
+
|
| 205 |
+
### أفضل الممارسات:
|
| 206 |
+
- استخدم HTTPS دائماً
|
| 207 |
+
- فعل rate limiting
|
| 208 |
+
- راقب استخدام الموارد
|
| 209 |
+
- احتفظ بنسخ احتياطية من الإعدادات
|
| 210 |
+
|
| 211 |
+
## 📞 الدعم
|
| 212 |
+
|
| 213 |
+
إذا واجهت مشاكل في النشر:
|
| 214 |
+
1. تحقق من السجلات أولاً
|
| 215 |
+
2. تأكد من تثبيت جميع التبعيات
|
| 216 |
+
3. اختبر على بيئة محلية أولاً
|
| 217 |
+
4. راجع دليل استكشاف الأخطاء أعلاه
|
Dockerfile
ADDED
|
@@ -0,0 +1,74 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Dockerfile for DOCX to PDF Converter with Enhanced Arabic Support
|
| 2 |
+
FROM ubuntu:22.04
|
| 3 |
+
|
| 4 |
+
# Set environment variables for Arabic support
|
| 5 |
+
ENV DEBIAN_FRONTEND=noninteractive
|
| 6 |
+
ENV LANG=ar_SA.UTF-8
|
| 7 |
+
ENV LC_ALL=ar_SA.UTF-8
|
| 8 |
+
ENV PYTHONUNBUFFERED=1
|
| 9 |
+
|
| 10 |
+
# Install system dependencies including Arabic fonts
|
| 11 |
+
RUN apt-get update && apt-get install -y \
|
| 12 |
+
python3 \
|
| 13 |
+
python3-pip \
|
| 14 |
+
libreoffice \
|
| 15 |
+
libreoffice-writer \
|
| 16 |
+
libreoffice-l10n-ar \
|
| 17 |
+
libreoffice-help-ar \
|
| 18 |
+
fonts-liberation \
|
| 19 |
+
fonts-liberation2 \
|
| 20 |
+
fonts-dejavu \
|
| 21 |
+
fonts-dejavu-core \
|
| 22 |
+
fonts-dejavu-extra \
|
| 23 |
+
fonts-croscore \
|
| 24 |
+
fonts-noto-core \
|
| 25 |
+
fonts-noto-ui-core \
|
| 26 |
+
fonts-noto-mono \
|
| 27 |
+
fonts-noto-color-emoji \
|
| 28 |
+
fonts-noto-naskh \
|
| 29 |
+
fonts-noto-kufi-arabic \
|
| 30 |
+
fonts-opensymbol \
|
| 31 |
+
fonts-freefont-ttf \
|
| 32 |
+
fonts-amiri \
|
| 33 |
+
fonts-scheherazade-new \
|
| 34 |
+
fontconfig \
|
| 35 |
+
wget \
|
| 36 |
+
curl \
|
| 37 |
+
unzip \
|
| 38 |
+
locales \
|
| 39 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 40 |
+
|
| 41 |
+
# Generate Arabic locale
|
| 42 |
+
RUN locale-gen ar_SA.UTF-8
|
| 43 |
+
|
| 44 |
+
# Set working directory
|
| 45 |
+
WORKDIR /app
|
| 46 |
+
|
| 47 |
+
# Copy requirements and install Python dependencies
|
| 48 |
+
COPY requirements.txt .
|
| 49 |
+
RUN pip3 install --no-cache-dir -r requirements.txt
|
| 50 |
+
|
| 51 |
+
# Copy application files
|
| 52 |
+
COPY app.py .
|
| 53 |
+
COPY arabic_fonts_setup.sh .
|
| 54 |
+
COPY libreoffice_arabic_config.xml .
|
| 55 |
+
|
| 56 |
+
# Setup additional Arabic fonts
|
| 57 |
+
RUN chmod +x arabic_fonts_setup.sh && \
|
| 58 |
+
./arabic_fonts_setup.sh || true
|
| 59 |
+
|
| 60 |
+
# Update font cache
|
| 61 |
+
RUN fc-cache -fv
|
| 62 |
+
|
| 63 |
+
# Create necessary directories
|
| 64 |
+
RUN mkdir -p /tmp/libreoffice_conversion
|
| 65 |
+
|
| 66 |
+
# Expose port
|
| 67 |
+
EXPOSE 7860
|
| 68 |
+
|
| 69 |
+
# Health check
|
| 70 |
+
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
|
| 71 |
+
CMD curl -f http://localhost:7860/ || exit 1
|
| 72 |
+
|
| 73 |
+
# Run the application
|
| 74 |
+
CMD ["python3", "app.py"]
|
README.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
---
|
| 2 |
-
title: DOCX
|
| 3 |
emoji: 📄
|
| 4 |
colorFrom: gray
|
| 5 |
colorTo: blue
|
|
@@ -9,27 +9,37 @@ app_file: app.py
|
|
| 9 |
pinned: false
|
| 10 |
---
|
| 11 |
|
| 12 |
-
# 📄➡️📋 DOCX
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
## 🎯
|
| 17 |
|
| 18 |
-
|
| 19 |
-
- **
|
| 20 |
-
- **
|
| 21 |
-
- **
|
| 22 |
-
- **
|
| 23 |
|
| 24 |
-
## ✨
|
| 25 |
|
| 26 |
-
- **🔤
|
| 27 |
-
- **📊
|
| 28 |
-
- **🖼️
|
| 29 |
-
- **🌍
|
| 30 |
-
- **🔍
|
| 31 |
-
- **🛠️
|
| 32 |
-
- **⚡
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
|
| 34 |
## 🚀 Usage
|
| 35 |
|
|
|
|
| 1 |
---
|
| 2 |
+
title: محول DOCX إلى PDF - تنسيق عربي مثالي
|
| 3 |
emoji: 📄
|
| 4 |
colorFrom: gray
|
| 5 |
colorTo: blue
|
|
|
|
| 9 |
pinned: false
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# 📄➡️📋 محول DOCX إلى PDF - الحفاظ الكامل على التنسيق العربي
|
| 13 |
|
| 14 |
+
محول متقدم من الدرجة المؤسسية يحول مستندات Word (.docx) إلى PDF مع **الحفاظ الكامل 100% على التنسيق العربي** - نتائج مطابقة بكسل بكسل لا يمكن تمييزها عن الأصل.
|
| 15 |
|
| 16 |
+
## 🎯 الأولوية المطلقة: عدم التسامح مع تغييرات التخطيط
|
| 17 |
|
| 18 |
+
يطبق هذا المحول أعلى معايير الحفاظ على التنسيق العربي:
|
| 19 |
+
- **الاستنساخ البصري الدقيق**: ملف PDF الناتج مطابق بكسل بكسل للـ DOCX الأصلي
|
| 20 |
+
- **الحفاظ على عدد الصفحات**: 1 صفحة DOCX = 1 صفحة PDF (بالضبط)
|
| 21 |
+
- **تميز تنسيق الجداول**: النص داخل الجداول يحافظ على الحجم الأصلي، بدون تغييرات تلقائية في الخط العريض
|
| 22 |
+
- **إتقان التعامل مع الخطوط**: استبدال شامل للخطوط متوافق مع Microsoft والعربية
|
| 23 |
|
| 24 |
+
## ✨ الميزات المحسنة للعربية
|
| 25 |
|
| 26 |
+
- **🔤 تميز الخطوط**: توافق كامل مع الخطوط العربية (Traditional Arabic→Amiri، Arabic Typesetting→Noto Naskh، Simplified Arabic→Noto Naskh)
|
| 27 |
+
- **📊 كمال الجداول**: يحافظ على المساحة الدقيقة للخلايا والحدود والمحاذاة وتنسيق النص
|
| 28 |
+
- **🖼️ أقصى جودة للصور**: الحفاظ على 600 DPI بدون ضغط مدمر
|
| 29 |
+
- **🌍 دعم العربية RTL**: عرض مثالي للنص من اليمين إلى اليسار مع خطوط Amiri و Noto
|
| 30 |
+
- **🔍 التحقق من الجودة**: تحليل فوري للمستند والتحقق من التحويل
|
| 31 |
+
- **🛠️ تشخيص متقدم**: تحليل شامل للأخطاء مع إرشادات استكشاف الأخطاء المحددة
|
| 32 |
+
- **⚡ أداء محسن**: تكوين LibreOffice محسن للمستندات المعقدة العربية
|
| 33 |
+
|
| 34 |
+
## 🛠️ حلول المشاكل الشائعة
|
| 35 |
+
|
| 36 |
+
✅ **تم حل المشاكل التالية:**
|
| 37 |
+
- ❌ تراكب النصوص العربية وعدم وجود فراغات كافية
|
| 38 |
+
- ❌ فقدان المحاذاة اليمنى (Right-to-Left) في النص العربي
|
| 39 |
+
- ❌ استبدال الخطوط الأصلية بخطوط غير داعمة للعربية
|
| 40 |
+
- ❌ تشوه الجداول أو اختفاء البنية التنظيمية للوثيقة
|
| 41 |
+
- ❌ تغيير مواقع قوالب التعبئة الديناميكية (مثل {{name}}, {{date}})
|
| 42 |
+
- ❌ حجم الصفحة أو الهامش غير مناسب للطباعة بشكل مرتب (A4)
|
| 43 |
|
| 44 |
## 🚀 Usage
|
| 45 |
|
app.py
CHANGED
|
@@ -12,7 +12,6 @@ import os
|
|
| 12 |
from pathlib import Path
|
| 13 |
import gradio as gr
|
| 14 |
import zipfile
|
| 15 |
-
import xml.etree.ElementTree as ET
|
| 16 |
import re
|
| 17 |
|
| 18 |
|
|
@@ -40,8 +39,11 @@ def setup_libreoffice():
|
|
| 40 |
|
| 41 |
|
| 42 |
def setup_font_environment():
|
| 43 |
-
"""Setup optimal font environment for maximum compatibility"""
|
| 44 |
try:
|
|
|
|
|
|
|
|
|
|
| 45 |
# Update font cache for better font discovery (fc-cache comes with fontconfig package)
|
| 46 |
print("Updating font cache...")
|
| 47 |
fc_result = subprocess.run(["fc-cache", "-fv"], capture_output=True, timeout=30)
|
|
@@ -54,8 +56,9 @@ def setup_font_environment():
|
|
| 54 |
font_result = subprocess.run(["fc-list"], capture_output=True, text=True, timeout=10)
|
| 55 |
available_fonts = font_result.stdout
|
| 56 |
|
| 57 |
-
# Check for critical fonts
|
| 58 |
-
critical_fonts = ["Liberation Sans", "Carlito", "Caladea", "DejaVu Sans", "Noto Sans"
|
|
|
|
| 59 |
missing_fonts = []
|
| 60 |
|
| 61 |
for font in critical_fonts:
|
|
@@ -65,7 +68,12 @@ def setup_font_environment():
|
|
| 65 |
if missing_fonts:
|
| 66 |
print(f"Warning: Missing critical fonts: {missing_fonts}")
|
| 67 |
else:
|
| 68 |
-
print("All critical fonts are available")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
print(f"Total fonts available: {len(available_fonts.splitlines())}")
|
| 71 |
|
|
@@ -73,8 +81,38 @@ def setup_font_environment():
|
|
| 73 |
print(f"Font environment setup warning: {e}")
|
| 74 |
|
| 75 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 76 |
def create_fontconfig(temp_path):
|
| 77 |
-
"""Create fontconfig configuration for optimal font matching"""
|
| 78 |
fontconfig_dir = temp_path / ".config" / "fontconfig"
|
| 79 |
fontconfig_dir.mkdir(parents=True, exist_ok=True)
|
| 80 |
|
|
@@ -128,7 +166,75 @@ def create_fontconfig(temp_path):
|
|
| 128 |
</prefer>
|
| 129 |
</alias>
|
| 130 |
|
| 131 |
-
<!--
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 132 |
<match target="font">
|
| 133 |
<edit name="antialias" mode="assign">
|
| 134 |
<bool>true</bool>
|
|
@@ -142,6 +248,31 @@ def create_fontconfig(temp_path):
|
|
| 142 |
<edit name="rgba" mode="assign">
|
| 143 |
<const>rgb</const>
|
| 144 |
</edit>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 145 |
</match>
|
| 146 |
</fontconfig>'''
|
| 147 |
|
|
@@ -318,7 +449,7 @@ def analyze_conversion_error(stderr, stdout, docx_info):
|
|
| 318 |
|
| 319 |
|
| 320 |
def create_libreoffice_config(temp_path):
|
| 321 |
-
"""Create comprehensive LibreOffice configuration for PERFECT formatting preservation"""
|
| 322 |
config_dir = temp_path / ".config" / "libreoffice" / "4" / "user"
|
| 323 |
config_dir.mkdir(parents=True, exist_ok=True)
|
| 324 |
|
|
@@ -326,7 +457,7 @@ def create_libreoffice_config(temp_path):
|
|
| 326 |
registry_config = config_dir / "registrymodifications.xcu"
|
| 327 |
config_content = '''<?xml version="1.0" encoding="UTF-8"?>
|
| 328 |
<oor:items xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
|
| 329 |
-
<!-- PDF Export Settings for Maximum Quality -->
|
| 330 |
<item oor:path="/org.openoffice.Office.Common/Filter/PDF/Export">
|
| 331 |
<prop oor:name="Quality" oor:op="fuse">
|
| 332 |
<value>100</value>
|
|
@@ -338,7 +469,7 @@ def create_libreoffice_config(temp_path):
|
|
| 338 |
<value>600</value>
|
| 339 |
</prop>
|
| 340 |
<prop oor:name="UseTaggedPDF" oor:op="fuse">
|
| 341 |
-
<value>
|
| 342 |
</prop>
|
| 343 |
<prop oor:name="ExportFormFields" oor:op="fuse">
|
| 344 |
<value>false</value>
|
|
@@ -361,9 +492,47 @@ def create_libreoffice_config(temp_path):
|
|
| 361 |
<prop oor:name="JPEGQuality" oor:op="fuse">
|
| 362 |
<value>100</value>
|
| 363 |
</prop>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 364 |
</item>
|
| 365 |
|
| 366 |
-
<!-- Font Substitution Settings for Microsoft Compatibility -->
|
| 367 |
<item oor:path="/org.openoffice.VCL/FontSubstitution">
|
| 368 |
<prop oor:name="FontSubstituteTable" oor:op="fuse">
|
| 369 |
<value>
|
|
@@ -407,11 +576,43 @@ def create_libreoffice_config(temp_path):
|
|
| 407 |
<value>Courier New</value>
|
| 408 |
</prop>
|
| 409 |
</it>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 410 |
</value>
|
| 411 |
</prop>
|
| 412 |
</item>
|
| 413 |
|
| 414 |
-
<!-- Writer Settings for Layout Preservation -->
|
| 415 |
<item oor:path="/org.openoffice.Office.Writer/Layout/Other">
|
| 416 |
<prop oor:name="MeasureUnit" oor:op="fuse">
|
| 417 |
<value>6</value>
|
|
@@ -425,9 +626,12 @@ def create_libreoffice_config(temp_path):
|
|
| 425 |
<prop oor:name="ApplyCharUnit" oor:op="fuse">
|
| 426 |
<value>false</value>
|
| 427 |
</prop>
|
|
|
|
|
|
|
|
|
|
| 428 |
</item>
|
| 429 |
|
| 430 |
-
<!-- Table Settings for Exact Formatting -->
|
| 431 |
<item oor:path="/org.openoffice.Office.Writer/Layout/Table">
|
| 432 |
<prop oor:name="Header" oor:op="fuse">
|
| 433 |
<value>true</value>
|
|
@@ -436,11 +640,52 @@ def create_libreoffice_config(temp_path):
|
|
| 436 |
<value>false</value>
|
| 437 |
</prop>
|
| 438 |
<prop oor:name="DontSplit" oor:op="fuse">
|
| 439 |
-
<value>
|
| 440 |
</prop>
|
| 441 |
<prop oor:name="Border" oor:op="fuse">
|
| 442 |
<value>true</value>
|
| 443 |
</prop>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 444 |
</item>
|
| 445 |
|
| 446 |
<!-- Disable Auto-formatting Features -->
|
|
@@ -514,9 +759,9 @@ def convert_docx_to_pdf(docx_file):
|
|
| 514 |
input_file = temp_path / "input.docx"
|
| 515 |
shutil.copy2(docx_file.name, input_file)
|
| 516 |
|
| 517 |
-
# LibreOffice conversion command with MAXIMUM formatting preservation
|
| 518 |
-
#
|
| 519 |
-
pdf_filter = 'pdf:writer_pdf_Export:{"Quality":100,"ReduceImageResolution":false,"MaxImageResolution":600,"UseTaggedPDF":
|
| 520 |
|
| 521 |
cmd = [
|
| 522 |
"libreoffice",
|
|
@@ -527,21 +772,43 @@ def convert_docx_to_pdf(docx_file):
|
|
| 527 |
"--nologo",
|
| 528 |
"--norestore",
|
| 529 |
"--nofirststartwizard",
|
|
|
|
| 530 |
"--convert-to", pdf_filter,
|
| 531 |
"--outdir", str(temp_path),
|
| 532 |
str(input_file)
|
| 533 |
]
|
| 534 |
|
| 535 |
-
# Execute conversion with comprehensive custom environment
|
| 536 |
env = os.environ.copy()
|
| 537 |
env['HOME'] = config_home
|
| 538 |
env['XDG_CONFIG_HOME'] = config_home + "/.config"
|
| 539 |
env['FONTCONFIG_PATH'] = fontconfig_home + "/.config/fontconfig"
|
| 540 |
-
env['
|
| 541 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 542 |
# Disable LibreOffice splash and user interaction
|
| 543 |
env['SAL_USE_VCLPLUGIN'] = 'svp'
|
| 544 |
env['DISPLAY'] = ':99'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 545 |
|
| 546 |
print(f"🚀 Executing LibreOffice conversion with MAXIMUM quality settings...")
|
| 547 |
print(f"Command: {' '.join(cmd[:8])}... [truncated for readability]")
|
|
@@ -717,33 +984,41 @@ def create_interface():
|
|
| 717 |
gr.File(label="📥 Download PDF"),
|
| 718 |
gr.Textbox(label="📊 Status", interactive=False)
|
| 719 |
],
|
| 720 |
-
title="📄➡️📋 DOCX
|
| 721 |
description="""
|
| 722 |
-
**
|
| 723 |
-
|
| 724 |
-
🎯 **
|
| 725 |
-
-
|
| 726 |
-
-
|
| 727 |
-
-
|
| 728 |
-
-
|
| 729 |
-
|
| 730 |
-
✅ **
|
| 731 |
-
- 🔤 **
|
| 732 |
-
- 📊 **
|
| 733 |
-
- 🖼️ **
|
| 734 |
-
- 🌍 **
|
| 735 |
-
- 🔍 **
|
| 736 |
-
- 🛠️ **
|
| 737 |
-
|
| 738 |
-
📝 **
|
| 739 |
-
1.
|
| 740 |
-
2.
|
| 741 |
-
3.
|
| 742 |
-
|
| 743 |
-
⚠️ **
|
| 744 |
-
📏 **
|
| 745 |
-
|
| 746 |
-
🔧 **
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 747 |
""",
|
| 748 |
examples=None,
|
| 749 |
cache_examples=False,
|
|
|
|
| 12 |
from pathlib import Path
|
| 13 |
import gradio as gr
|
| 14 |
import zipfile
|
|
|
|
| 15 |
import re
|
| 16 |
|
| 17 |
|
|
|
|
| 39 |
|
| 40 |
|
| 41 |
def setup_font_environment():
|
| 42 |
+
"""Setup optimal font environment for Arabic RTL and maximum compatibility"""
|
| 43 |
try:
|
| 44 |
+
# Install additional Arabic fonts if not available
|
| 45 |
+
install_arabic_fonts()
|
| 46 |
+
|
| 47 |
# Update font cache for better font discovery (fc-cache comes with fontconfig package)
|
| 48 |
print("Updating font cache...")
|
| 49 |
fc_result = subprocess.run(["fc-cache", "-fv"], capture_output=True, timeout=30)
|
|
|
|
| 56 |
font_result = subprocess.run(["fc-list"], capture_output=True, text=True, timeout=10)
|
| 57 |
available_fonts = font_result.stdout
|
| 58 |
|
| 59 |
+
# Check for critical fonts including Arabic fonts
|
| 60 |
+
critical_fonts = ["Liberation Sans", "Carlito", "Caladea", "DejaVu Sans", "Noto Sans",
|
| 61 |
+
"Noto Naskh Arabic", "Noto Kufi Arabic", "Amiri", "Scheherazade New"]
|
| 62 |
missing_fonts = []
|
| 63 |
|
| 64 |
for font in critical_fonts:
|
|
|
|
| 68 |
if missing_fonts:
|
| 69 |
print(f"Warning: Missing critical fonts: {missing_fonts}")
|
| 70 |
else:
|
| 71 |
+
print("All critical fonts including Arabic fonts are available")
|
| 72 |
+
|
| 73 |
+
# Check specifically for Arabic font support
|
| 74 |
+
arabic_fonts = ["Noto Naskh Arabic", "Noto Kufi Arabic", "Amiri", "Scheherazade New", "Traditional Arabic"]
|
| 75 |
+
available_arabic = [font for font in arabic_fonts if font.lower() in available_fonts.lower()]
|
| 76 |
+
print(f"Available Arabic fonts: {available_arabic}")
|
| 77 |
|
| 78 |
print(f"Total fonts available: {len(available_fonts.splitlines())}")
|
| 79 |
|
|
|
|
| 81 |
print(f"Font environment setup warning: {e}")
|
| 82 |
|
| 83 |
|
| 84 |
+
def install_arabic_fonts():
|
| 85 |
+
"""Install additional Arabic fonts for better RTL support"""
|
| 86 |
+
try:
|
| 87 |
+
# Create fonts directory
|
| 88 |
+
fonts_dir = Path("/usr/share/fonts/truetype/arabic-custom")
|
| 89 |
+
fonts_dir.mkdir(parents=True, exist_ok=True)
|
| 90 |
+
|
| 91 |
+
# Download and install Amiri font if not available
|
| 92 |
+
amiri_url = "https://github.com/aliftype/amiri/releases/download/0.117/Amiri-0.117.zip"
|
| 93 |
+
try:
|
| 94 |
+
print("Downloading Amiri Arabic font...")
|
| 95 |
+
result = subprocess.run(["wget", "-q", "-O", "/tmp/amiri.zip", amiri_url],
|
| 96 |
+
capture_output=True, timeout=30)
|
| 97 |
+
if result.returncode == 0:
|
| 98 |
+
# Extract and install
|
| 99 |
+
subprocess.run(["unzip", "-q", "/tmp/amiri.zip", "-d", "/tmp/amiri"],
|
| 100 |
+
capture_output=True)
|
| 101 |
+
subprocess.run(["cp", "/tmp/amiri/*.ttf", str(fonts_dir)],
|
| 102 |
+
capture_output=True, shell=True)
|
| 103 |
+
print("Amiri font installed successfully")
|
| 104 |
+
except Exception as e:
|
| 105 |
+
print(f"Amiri font installation failed: {e}")
|
| 106 |
+
|
| 107 |
+
# Update font cache after installation
|
| 108 |
+
subprocess.run(["fc-cache", "-f"], capture_output=True, timeout=10)
|
| 109 |
+
|
| 110 |
+
except Exception as e:
|
| 111 |
+
print(f"Arabic fonts installation warning: {e}")
|
| 112 |
+
|
| 113 |
+
|
| 114 |
def create_fontconfig(temp_path):
|
| 115 |
+
"""Create fontconfig configuration for optimal font matching with Arabic RTL support"""
|
| 116 |
fontconfig_dir = temp_path / ".config" / "fontconfig"
|
| 117 |
fontconfig_dir.mkdir(parents=True, exist_ok=True)
|
| 118 |
|
|
|
|
| 166 |
</prefer>
|
| 167 |
</alias>
|
| 168 |
|
| 169 |
+
<!-- Arabic font substitution rules for perfect RTL support -->
|
| 170 |
+
<alias>
|
| 171 |
+
<family>Traditional Arabic</family>
|
| 172 |
+
<prefer>
|
| 173 |
+
<family>Amiri</family>
|
| 174 |
+
<family>Noto Naskh Arabic</family>
|
| 175 |
+
<family>Scheherazade New</family>
|
| 176 |
+
<family>DejaVu Sans</family>
|
| 177 |
+
</prefer>
|
| 178 |
+
</alias>
|
| 179 |
+
|
| 180 |
+
<alias>
|
| 181 |
+
<family>Arabic Typesetting</family>
|
| 182 |
+
<prefer>
|
| 183 |
+
<family>Amiri</family>
|
| 184 |
+
<family>Noto Naskh Arabic</family>
|
| 185 |
+
<family>Scheherazade New</family>
|
| 186 |
+
</prefer>
|
| 187 |
+
</alias>
|
| 188 |
+
|
| 189 |
+
<alias>
|
| 190 |
+
<family>Simplified Arabic</family>
|
| 191 |
+
<prefer>
|
| 192 |
+
<family>Noto Naskh Arabic</family>
|
| 193 |
+
<family>Amiri</family>
|
| 194 |
+
<family>DejaVu Sans</family>
|
| 195 |
+
</prefer>
|
| 196 |
+
</alias>
|
| 197 |
+
|
| 198 |
+
<alias>
|
| 199 |
+
<family>Tahoma</family>
|
| 200 |
+
<prefer>
|
| 201 |
+
<family>DejaVu Sans</family>
|
| 202 |
+
<family>Liberation Sans</family>
|
| 203 |
+
<family>Noto Sans</family>
|
| 204 |
+
</prefer>
|
| 205 |
+
</alias>
|
| 206 |
+
|
| 207 |
+
<!-- Generic Arabic font fallback -->
|
| 208 |
+
<alias>
|
| 209 |
+
<family>serif</family>
|
| 210 |
+
<prefer>
|
| 211 |
+
<family>Liberation Serif</family>
|
| 212 |
+
<family>DejaVu Serif</family>
|
| 213 |
+
<family>Amiri</family>
|
| 214 |
+
<family>Noto Naskh Arabic</family>
|
| 215 |
+
</prefer>
|
| 216 |
+
</alias>
|
| 217 |
+
|
| 218 |
+
<alias>
|
| 219 |
+
<family>sans-serif</family>
|
| 220 |
+
<prefer>
|
| 221 |
+
<family>Liberation Sans</family>
|
| 222 |
+
<family>DejaVu Sans</family>
|
| 223 |
+
<family>Noto Sans</family>
|
| 224 |
+
<family>Noto Naskh Arabic</family>
|
| 225 |
+
</prefer>
|
| 226 |
+
</alias>
|
| 227 |
+
|
| 228 |
+
<alias>
|
| 229 |
+
<family>monospace</family>
|
| 230 |
+
<prefer>
|
| 231 |
+
<family>Liberation Mono</family>
|
| 232 |
+
<family>DejaVu Sans Mono</family>
|
| 233 |
+
<family>Noto Sans Mono</family>
|
| 234 |
+
</prefer>
|
| 235 |
+
</alias>
|
| 236 |
+
|
| 237 |
+
<!-- Ensure consistent font rendering with Arabic support -->
|
| 238 |
<match target="font">
|
| 239 |
<edit name="antialias" mode="assign">
|
| 240 |
<bool>true</bool>
|
|
|
|
| 248 |
<edit name="rgba" mode="assign">
|
| 249 |
<const>rgb</const>
|
| 250 |
</edit>
|
| 251 |
+
<edit name="lcdfilter" mode="assign">
|
| 252 |
+
<const>lcddefault</const>
|
| 253 |
+
</edit>
|
| 254 |
+
</match>
|
| 255 |
+
|
| 256 |
+
<!-- Special handling for Arabic script -->
|
| 257 |
+
<match target="pattern">
|
| 258 |
+
<test name="lang" compare="contains">
|
| 259 |
+
<string>ar</string>
|
| 260 |
+
</test>
|
| 261 |
+
<edit name="family" mode="prepend" binding="strong">
|
| 262 |
+
<string>Amiri</string>
|
| 263 |
+
<string>Noto Naskh Arabic</string>
|
| 264 |
+
<string>Scheherazade New</string>
|
| 265 |
+
</edit>
|
| 266 |
+
</match>
|
| 267 |
+
|
| 268 |
+
<!-- Ensure proper spacing and kerning for Arabic -->
|
| 269 |
+
<match target="font">
|
| 270 |
+
<test name="family" compare="contains">
|
| 271 |
+
<string>Arabic</string>
|
| 272 |
+
</test>
|
| 273 |
+
<edit name="spacing" mode="assign">
|
| 274 |
+
<const>proportional</const>
|
| 275 |
+
</edit>
|
| 276 |
</match>
|
| 277 |
</fontconfig>'''
|
| 278 |
|
|
|
|
| 449 |
|
| 450 |
|
| 451 |
def create_libreoffice_config(temp_path):
|
| 452 |
+
"""Create comprehensive LibreOffice configuration for PERFECT Arabic RTL formatting preservation"""
|
| 453 |
config_dir = temp_path / ".config" / "libreoffice" / "4" / "user"
|
| 454 |
config_dir.mkdir(parents=True, exist_ok=True)
|
| 455 |
|
|
|
|
| 457 |
registry_config = config_dir / "registrymodifications.xcu"
|
| 458 |
config_content = '''<?xml version="1.0" encoding="UTF-8"?>
|
| 459 |
<oor:items xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
|
| 460 |
+
<!-- PDF Export Settings for Maximum Quality with Arabic Support -->
|
| 461 |
<item oor:path="/org.openoffice.Office.Common/Filter/PDF/Export">
|
| 462 |
<prop oor:name="Quality" oor:op="fuse">
|
| 463 |
<value>100</value>
|
|
|
|
| 469 |
<value>600</value>
|
| 470 |
</prop>
|
| 471 |
<prop oor:name="UseTaggedPDF" oor:op="fuse">
|
| 472 |
+
<value>true</value>
|
| 473 |
</prop>
|
| 474 |
<prop oor:name="ExportFormFields" oor:op="fuse">
|
| 475 |
<value>false</value>
|
|
|
|
| 492 |
<prop oor:name="JPEGQuality" oor:op="fuse">
|
| 493 |
<value>100</value>
|
| 494 |
</prop>
|
| 495 |
+
<prop oor:name="SelectPdfVersion" oor:op="fuse">
|
| 496 |
+
<value>1</value>
|
| 497 |
+
</prop>
|
| 498 |
+
<prop oor:name="ExportBookmarks" oor:op="fuse">
|
| 499 |
+
<value>false</value>
|
| 500 |
+
</prop>
|
| 501 |
+
<prop oor:name="OpenBookmarkLevels" oor:op="fuse">
|
| 502 |
+
<value>-1</value>
|
| 503 |
+
</prop>
|
| 504 |
+
</item>
|
| 505 |
+
|
| 506 |
+
<!-- Arabic and RTL Language Support -->
|
| 507 |
+
<item oor:path="/org.openoffice.Office.Linguistic/General">
|
| 508 |
+
<prop oor:name="DefaultLocale" oor:op="fuse">
|
| 509 |
+
<value>ar-SA</value>
|
| 510 |
+
</prop>
|
| 511 |
+
<prop oor:name="DefaultLocale_CJK" oor:op="fuse">
|
| 512 |
+
<value>ar-SA</value>
|
| 513 |
+
</prop>
|
| 514 |
+
<prop oor:name="DefaultLocale_CTL" oor:op="fuse">
|
| 515 |
+
<value>ar-SA</value>
|
| 516 |
+
</prop>
|
| 517 |
+
</item>
|
| 518 |
+
|
| 519 |
+
<!-- CTL (Complex Text Layout) Settings for Arabic -->
|
| 520 |
+
<item oor:path="/org.openoffice.Office.Common/I18N/CTL">
|
| 521 |
+
<prop oor:name="CTLFont" oor:op="fuse">
|
| 522 |
+
<value>true</value>
|
| 523 |
+
</prop>
|
| 524 |
+
<prop oor:name="CTLSequenceChecking" oor:op="fuse">
|
| 525 |
+
<value>true</value>
|
| 526 |
+
</prop>
|
| 527 |
+
<prop oor:name="CTLCursorMovement" oor:op="fuse">
|
| 528 |
+
<value>1</value>
|
| 529 |
+
</prop>
|
| 530 |
+
<prop oor:name="CTLTextNumerals" oor:op="fuse">
|
| 531 |
+
<value>1</value>
|
| 532 |
+
</prop>
|
| 533 |
</item>
|
| 534 |
|
| 535 |
+
<!-- Enhanced Font Substitution Settings for Microsoft and Arabic Compatibility -->
|
| 536 |
<item oor:path="/org.openoffice.VCL/FontSubstitution">
|
| 537 |
<prop oor:name="FontSubstituteTable" oor:op="fuse">
|
| 538 |
<value>
|
|
|
|
| 576 |
<value>Courier New</value>
|
| 577 |
</prop>
|
| 578 |
</it>
|
| 579 |
+
<it>
|
| 580 |
+
<prop oor:name="SubstituteFont">
|
| 581 |
+
<value>Amiri</value>
|
| 582 |
+
</prop>
|
| 583 |
+
<prop oor:name="OriginalFont">
|
| 584 |
+
<value>Traditional Arabic</value>
|
| 585 |
+
</prop>
|
| 586 |
+
</it>
|
| 587 |
+
<it>
|
| 588 |
+
<prop oor:name="SubstituteFont">
|
| 589 |
+
<value>Amiri</value>
|
| 590 |
+
</prop>
|
| 591 |
+
<prop oor:name="OriginalFont">
|
| 592 |
+
<value>Arabic Typesetting</value>
|
| 593 |
+
</prop>
|
| 594 |
+
</it>
|
| 595 |
+
<it>
|
| 596 |
+
<prop oor:name="SubstituteFont">
|
| 597 |
+
<value>Noto Naskh Arabic</value>
|
| 598 |
+
</prop>
|
| 599 |
+
<prop oor:name="OriginalFont">
|
| 600 |
+
<value>Simplified Arabic</value>
|
| 601 |
+
</prop>
|
| 602 |
+
</it>
|
| 603 |
+
<it>
|
| 604 |
+
<prop oor:name="SubstituteFont">
|
| 605 |
+
<value>DejaVu Sans</value>
|
| 606 |
+
</prop>
|
| 607 |
+
<prop oor:name="OriginalFont">
|
| 608 |
+
<value>Tahoma</value>
|
| 609 |
+
</prop>
|
| 610 |
+
</it>
|
| 611 |
</value>
|
| 612 |
</prop>
|
| 613 |
</item>
|
| 614 |
|
| 615 |
+
<!-- Writer Settings for Perfect Layout Preservation with RTL Support -->
|
| 616 |
<item oor:path="/org.openoffice.Office.Writer/Layout/Other">
|
| 617 |
<prop oor:name="MeasureUnit" oor:op="fuse">
|
| 618 |
<value>6</value>
|
|
|
|
| 626 |
<prop oor:name="ApplyCharUnit" oor:op="fuse">
|
| 627 |
<value>false</value>
|
| 628 |
</prop>
|
| 629 |
+
<prop oor:name="IsAlignTabStopPosition" oor:op="fuse">
|
| 630 |
+
<value>true</value>
|
| 631 |
+
</prop>
|
| 632 |
</item>
|
| 633 |
|
| 634 |
+
<!-- Enhanced Table Settings for Exact Formatting -->
|
| 635 |
<item oor:path="/org.openoffice.Office.Writer/Layout/Table">
|
| 636 |
<prop oor:name="Header" oor:op="fuse">
|
| 637 |
<value>true</value>
|
|
|
|
| 640 |
<value>false</value>
|
| 641 |
</prop>
|
| 642 |
<prop oor:name="DontSplit" oor:op="fuse">
|
| 643 |
+
<value>true</value>
|
| 644 |
</prop>
|
| 645 |
<prop oor:name="Border" oor:op="fuse">
|
| 646 |
<value>true</value>
|
| 647 |
</prop>
|
| 648 |
+
<prop oor:name="InsertLabel" oor:op="fuse">
|
| 649 |
+
<value>false</value>
|
| 650 |
+
</prop>
|
| 651 |
+
</item>
|
| 652 |
+
|
| 653 |
+
<!-- Page Layout Settings for A4 and RTL -->
|
| 654 |
+
<item oor:path="/org.openoffice.Office.Writer/Layout/Page">
|
| 655 |
+
<prop oor:name="IsLandscape" oor:op="fuse">
|
| 656 |
+
<value>false</value>
|
| 657 |
+
</prop>
|
| 658 |
+
<prop oor:name="Width" oor:op="fuse">
|
| 659 |
+
<value>21000</value>
|
| 660 |
+
</prop>
|
| 661 |
+
<prop oor:name="Height" oor:op="fuse">
|
| 662 |
+
<value>29700</value>
|
| 663 |
+
</prop>
|
| 664 |
+
</item>
|
| 665 |
+
|
| 666 |
+
<!-- Text Direction and RTL Settings -->
|
| 667 |
+
<item oor:path="/org.openoffice.Office.Writer/DefaultFont">
|
| 668 |
+
<prop oor:name="Document" oor:op="fuse">
|
| 669 |
+
<value>true</value>
|
| 670 |
+
</prop>
|
| 671 |
+
<prop oor:name="List" oor:op="fuse">
|
| 672 |
+
<value>Amiri;Noto Naskh Arabic;Liberation Sans</value>
|
| 673 |
+
</prop>
|
| 674 |
+
<prop oor:name="StandardHeight" oor:op="fuse">
|
| 675 |
+
<value>12</value>
|
| 676 |
+
</prop>
|
| 677 |
+
<prop oor:name="HeadingHeight" oor:op="fuse">
|
| 678 |
+
<value>14</value>
|
| 679 |
+
</prop>
|
| 680 |
+
<prop oor:name="ListHeight" oor:op="fuse">
|
| 681 |
+
<value>12</value>
|
| 682 |
+
</prop>
|
| 683 |
+
<prop oor:name="CaptionHeight" oor:op="fuse">
|
| 684 |
+
<value>12</value>
|
| 685 |
+
</prop>
|
| 686 |
+
<prop oor:name="IndexHeight" oor:op="fuse">
|
| 687 |
+
<value>12</value>
|
| 688 |
+
</prop>
|
| 689 |
</item>
|
| 690 |
|
| 691 |
<!-- Disable Auto-formatting Features -->
|
|
|
|
| 759 |
input_file = temp_path / "input.docx"
|
| 760 |
shutil.copy2(docx_file.name, input_file)
|
| 761 |
|
| 762 |
+
# LibreOffice conversion command with MAXIMUM formatting preservation for Arabic RTL
|
| 763 |
+
# Enhanced PDF export filter with perfect Arabic support and zero tolerance for layout changes
|
| 764 |
+
pdf_filter = 'pdf:writer_pdf_Export:{"Quality":100,"ReduceImageResolution":false,"MaxImageResolution":600,"UseTaggedPDF":true,"ExportFormFields":false,"FormsType":0,"EmbedStandardFonts":true,"FontEmbedding":true,"ExportBookmarks":false,"ExportNotes":false,"ExportNotesPages":false,"ExportOnlyNotesPages":false,"ExportPlaceholders":false,"ExportHiddenSlides":false,"SinglePageSheets":false,"UseTransitionEffects":false,"IsSkipEmptyPages":false,"IsAddStream":false,"AllowDuplicateFieldNames":false,"IsExportNotes":false,"IsExportNotesPages":false,"IsExportOnlyNotesPages":false,"IsExportHiddenSlides":false,"CompressMode":0,"JPEGQuality":100,"BitmapResolution":600,"ImageResolution":600,"ColorMode":0,"Watermark":"","EncryptFile":false,"DocumentOpenPassword":"","PermissionPassword":"","RestrictPermissions":false,"Printing":2,"Changes":4,"EnableCopyingOfContent":true,"EnableTextAccessForAccessibilityTools":true,"SelectPdfVersion":1,"ExportLinksRelativeFsys":false,"PDFViewSelection":0,"ConvertOOoTargetToPDFTarget":false,"ExportBookmarksToPDFDestination":false}'
|
| 765 |
|
| 766 |
cmd = [
|
| 767 |
"libreoffice",
|
|
|
|
| 772 |
"--nologo",
|
| 773 |
"--norestore",
|
| 774 |
"--nofirststartwizard",
|
| 775 |
+
"--safe-mode",
|
| 776 |
"--convert-to", pdf_filter,
|
| 777 |
"--outdir", str(temp_path),
|
| 778 |
str(input_file)
|
| 779 |
]
|
| 780 |
|
| 781 |
+
# Execute conversion with comprehensive custom environment optimized for Arabic RTL
|
| 782 |
env = os.environ.copy()
|
| 783 |
env['HOME'] = config_home
|
| 784 |
env['XDG_CONFIG_HOME'] = config_home + "/.config"
|
| 785 |
env['FONTCONFIG_PATH'] = fontconfig_home + "/.config/fontconfig"
|
| 786 |
+
env['FONTCONFIG_FILE'] = fontconfig_home + "/.config/fontconfig/fonts.conf"
|
| 787 |
+
# Set Arabic-friendly locale while maintaining UTF-8 support
|
| 788 |
+
env['LANG'] = 'ar_SA.UTF-8'
|
| 789 |
+
env['LC_ALL'] = 'ar_SA.UTF-8'
|
| 790 |
+
env['LC_CTYPE'] = 'ar_SA.UTF-8'
|
| 791 |
+
env['LC_NUMERIC'] = 'ar_SA.UTF-8'
|
| 792 |
+
env['LC_TIME'] = 'ar_SA.UTF-8'
|
| 793 |
+
env['LC_COLLATE'] = 'ar_SA.UTF-8'
|
| 794 |
+
env['LC_MONETARY'] = 'ar_SA.UTF-8'
|
| 795 |
+
env['LC_MESSAGES'] = 'ar_SA.UTF-8'
|
| 796 |
+
env['LC_PAPER'] = 'ar_SA.UTF-8'
|
| 797 |
+
env['LC_NAME'] = 'ar_SA.UTF-8'
|
| 798 |
+
env['LC_ADDRESS'] = 'ar_SA.UTF-8'
|
| 799 |
+
env['LC_TELEPHONE'] = 'ar_SA.UTF-8'
|
| 800 |
+
env['LC_MEASUREMENT'] = 'ar_SA.UTF-8'
|
| 801 |
+
env['LC_IDENTIFICATION'] = 'ar_SA.UTF-8'
|
| 802 |
# Disable LibreOffice splash and user interaction
|
| 803 |
env['SAL_USE_VCLPLUGIN'] = 'svp'
|
| 804 |
env['DISPLAY'] = ':99'
|
| 805 |
+
# Enhanced LibreOffice settings for Arabic
|
| 806 |
+
env['OOO_FORCE_DESKTOP'] = 'gnome'
|
| 807 |
+
env['SAL_NO_MOUSEGRABS'] = '1'
|
| 808 |
+
env['SAL_DISABLE_OPENCL'] = '1'
|
| 809 |
+
# Force RTL support
|
| 810 |
+
env['SAL_RTL_ENABLED'] = '1'
|
| 811 |
+
env['OOO_DISABLE_RECOVERY'] = '1'
|
| 812 |
|
| 813 |
print(f"🚀 Executing LibreOffice conversion with MAXIMUM quality settings...")
|
| 814 |
print(f"Command: {' '.join(cmd[:8])}... [truncated for readability]")
|
|
|
|
| 984 |
gr.File(label="📥 Download PDF"),
|
| 985 |
gr.Textbox(label="📊 Status", interactive=False)
|
| 986 |
],
|
| 987 |
+
title="📄➡️📋 محول DOCX إلى PDF - الحفاظ الكامل على التنسيق العربي",
|
| 988 |
description="""
|
| 989 |
+
**تحويل مستندات Word إلى PDF مع الحفاظ الكامل على التنسيق العربي والـ RTL**
|
| 990 |
+
|
| 991 |
+
🎯 **الأولوية المطلقة: نتائج مطابقة بكسل بكسل**
|
| 992 |
+
- عدم التسامح مع أي تغييرات في التخطيط
|
| 993 |
+
- الحفاظ على عدد الصفحات بالضبط (1 صفحة DOCX = 1 صفحة PDF)
|
| 994 |
+
- تنسيق مثالي للجداول مع الحفاظ على أحجام النصوص الأصلية
|
| 995 |
+
- الحفاظ على ج��دة الصور بأقصى دقة (600 DPI)
|
| 996 |
+
|
| 997 |
+
✅ **الميزات المحسنة للعربية:**
|
| 998 |
+
- 🔤 **تميز الخطوط**: دعم شامل للخطوط العربية (Traditional Arabic→Amiri، Arabic Typesetting→Noto Naskh)
|
| 999 |
+
- 📊 **كمال الجداول**: لا توجد تغييرات تلقائية في الخط العريض، أبعاد الخلايا الدقيقة
|
| 1000 |
+
- 🖼️ **جودة الصور**: الحفاظ على أقصى دقة، بدون ضغط مدمر
|
| 1001 |
+
- 🌍 **دعم العربية RTL**: عرض مثالي للنص من اليمين إلى اليسار
|
| 1002 |
+
- 🔍 **التحقق من الجودة**: تحليل فوري والتحقق من نتائج التحويل
|
| 1003 |
+
- 🛠️ **تشخيص متقدم**: تحليل شامل للأخطاء واستكشاف الأخطاء وإصلاحها
|
| 1004 |
+
|
| 1005 |
+
📝 **التعليمات:**
|
| 1006 |
+
1. ارفع ملف .docx الخاص بك (يدعم التخطيطات المعقدة والجداول والصور)
|
| 1007 |
+
2. انتظر التحويل مع التحليل الفوري للجودة
|
| 1008 |
+
3. حمل ملف PDF المطابق بكسل بكسل
|
| 1009 |
+
|
| 1010 |
+
⚠️ **محسن لـ:** المستندات المعقدة مع الجداول والصور والخطوط المختلطة
|
| 1011 |
+
📏 **دعم الملفات:** ملفات .docx حتى 50 ميجابايت مع تعقيد غير محدود
|
| 1012 |
+
|
| 1013 |
+
🔧 **مدعوم بـ LibreOffice المحسن** مع إعدادات الجودة القصوى للعربية
|
| 1014 |
+
|
| 1015 |
+
🎯 **حلول المشاكل الشائعة:**
|
| 1016 |
+
- ✅ حل مشكلة تراكب النصوص العربية
|
| 1017 |
+
- ✅ الحفاظ على المحاذاة اليمنى (RTL) بدقة
|
| 1018 |
+
- ✅ منع استبدال الخطوط العربية الأصلية
|
| 1019 |
+
- ✅ الحفاظ على بنية الجداول وعدم تشويهها
|
| 1020 |
+
- ✅ الحفاظ على مواقع قوالب التعبئة الديناميكية {{name}}, {{date}}
|
| 1021 |
+
- ✅ ضمان حجم الصفحة المناسب للطباعة على ورق A4
|
| 1022 |
""",
|
| 1023 |
examples=None,
|
| 1024 |
cache_examples=False,
|
arabic_fonts_setup.sh
ADDED
|
@@ -0,0 +1,41 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Arabic Fonts Setup Script for Enhanced RTL Support
|
| 3 |
+
# This script ensures optimal Arabic font support for LibreOffice PDF conversion
|
| 4 |
+
|
| 5 |
+
set -e
|
| 6 |
+
|
| 7 |
+
echo "🔤 Setting up Arabic fonts for perfect RTL support..."
|
| 8 |
+
|
| 9 |
+
# Create fonts directory
|
| 10 |
+
FONTS_DIR="/usr/share/fonts/truetype/arabic-enhanced"
|
| 11 |
+
mkdir -p "$FONTS_DIR"
|
| 12 |
+
|
| 13 |
+
# Download and install Amiri font (best for Traditional Arabic)
|
| 14 |
+
echo "📥 Installing Amiri font..."
|
| 15 |
+
cd /tmp
|
| 16 |
+
wget -q "https://github.com/aliftype/amiri/releases/download/0.117/Amiri-0.117.zip" -O amiri.zip
|
| 17 |
+
unzip -q amiri.zip
|
| 18 |
+
cp Amiri-0.117/*.ttf "$FONTS_DIR/"
|
| 19 |
+
rm -rf amiri.zip Amiri-0.117/
|
| 20 |
+
|
| 21 |
+
# Download and install Scheherazade New font
|
| 22 |
+
echo "📥 Installing Scheherazade New font..."
|
| 23 |
+
wget -q "https://github.com/silnrsi/font-scheherazade/releases/download/v3.300/ScheherazadeNew-3.300.zip" -O scheherazade.zip
|
| 24 |
+
unzip -q scheherazade.zip
|
| 25 |
+
cp ScheherazadeNew-3.300/*.ttf "$FONTS_DIR/"
|
| 26 |
+
rm -rf scheherazade.zip ScheherazadeNew-3.300/
|
| 27 |
+
|
| 28 |
+
# Set proper permissions
|
| 29 |
+
chmod 644 "$FONTS_DIR"/*.ttf
|
| 30 |
+
|
| 31 |
+
# Update font cache
|
| 32 |
+
echo "🔄 Updating font cache..."
|
| 33 |
+
fc-cache -fv
|
| 34 |
+
|
| 35 |
+
# Verify Arabic fonts installation
|
| 36 |
+
echo "✅ Verifying Arabic fonts installation..."
|
| 37 |
+
fc-list | grep -i "amiri\|scheherazade\|noto.*arabic" | head -10
|
| 38 |
+
|
| 39 |
+
echo "🎯 Arabic fonts setup completed successfully!"
|
| 40 |
+
echo "Available Arabic fonts:"
|
| 41 |
+
fc-list | grep -i "arabic\|amiri\|scheherazade" | cut -d: -f2 | sort | uniq
|
docker-compose.yml
ADDED
|
@@ -0,0 +1,23 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version: '3.8'
|
| 2 |
+
|
| 3 |
+
services:
|
| 4 |
+
docx-to-pdf-arabic:
|
| 5 |
+
build: .
|
| 6 |
+
container_name: docx-pdf-converter-arabic
|
| 7 |
+
ports:
|
| 8 |
+
- "7860:7860"
|
| 9 |
+
environment:
|
| 10 |
+
- LANG=ar_SA.UTF-8
|
| 11 |
+
- LC_ALL=ar_SA.UTF-8
|
| 12 |
+
- PYTHONUNBUFFERED=1
|
| 13 |
+
volumes:
|
| 14 |
+
# Optional: Mount local directories for testing
|
| 15 |
+
- ./test_files:/app/test_files:ro
|
| 16 |
+
- ./test_results:/app/test_results
|
| 17 |
+
restart: unless-stopped
|
| 18 |
+
healthcheck:
|
| 19 |
+
test: ["CMD", "curl", "-f", "http://localhost:7860/"]
|
| 20 |
+
interval: 30s
|
| 21 |
+
timeout: 10s
|
| 22 |
+
retries: 3
|
| 23 |
+
start_period: 40s
|
libreoffice_arabic_config.xml
ADDED
|
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<?xml version="1.0" encoding="UTF-8"?>
|
| 2 |
+
<!-- LibreOffice Arabic RTL Configuration Template -->
|
| 3 |
+
<!-- This configuration ensures perfect Arabic text rendering and RTL support -->
|
| 4 |
+
<oor:items xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
|
| 5 |
+
|
| 6 |
+
<!-- Arabic Language and Locale Settings -->
|
| 7 |
+
<item oor:path="/org.openoffice.Setup/L10N">
|
| 8 |
+
<prop oor:name="ooLocale" oor:op="fuse">
|
| 9 |
+
<value>ar-SA</value>
|
| 10 |
+
</prop>
|
| 11 |
+
<prop oor:name="ooSetupSystemLocale" oor:op="fuse">
|
| 12 |
+
<value>ar-SA</value>
|
| 13 |
+
</prop>
|
| 14 |
+
</item>
|
| 15 |
+
|
| 16 |
+
<!-- CTL (Complex Text Layout) for Arabic -->
|
| 17 |
+
<item oor:path="/org.openoffice.Office.Common/I18N/CTL">
|
| 18 |
+
<prop oor:name="CTLFont" oor:op="fuse">
|
| 19 |
+
<value>true</value>
|
| 20 |
+
</prop>
|
| 21 |
+
<prop oor:name="CTLSequenceChecking" oor:op="fuse">
|
| 22 |
+
<value>true</value>
|
| 23 |
+
</prop>
|
| 24 |
+
<prop oor:name="CTLCursorMovement" oor:op="fuse">
|
| 25 |
+
<value>1</value>
|
| 26 |
+
</prop>
|
| 27 |
+
<prop oor:name="CTLTextNumerals" oor:op="fuse">
|
| 28 |
+
<value>1</value>
|
| 29 |
+
</prop>
|
| 30 |
+
<prop oor:name="CTLTypeAndReplace" oor:op="fuse">
|
| 31 |
+
<value>true</value>
|
| 32 |
+
</prop>
|
| 33 |
+
</item>
|
| 34 |
+
|
| 35 |
+
<!-- Arabic Default Fonts -->
|
| 36 |
+
<item oor:path="/org.openoffice.VCL/DefaultFonts">
|
| 37 |
+
<prop oor:name="ar_SANS" oor:op="fuse">
|
| 38 |
+
<value>Amiri;Noto Naskh Arabic;Liberation Sans</value>
|
| 39 |
+
</prop>
|
| 40 |
+
<prop oor:name="ar_SERIF" oor:op="fuse">
|
| 41 |
+
<value>Amiri;Noto Naskh Arabic;Liberation Serif</value>
|
| 42 |
+
</prop>
|
| 43 |
+
<prop oor:name="ar_FIXED" oor:op="fuse">
|
| 44 |
+
<value>Liberation Mono;Noto Sans Mono</value>
|
| 45 |
+
</prop>
|
| 46 |
+
<prop oor:name="ar_UI" oor:op="fuse">
|
| 47 |
+
<value>Amiri;Noto Naskh Arabic;DejaVu Sans</value>
|
| 48 |
+
</prop>
|
| 49 |
+
</item>
|
| 50 |
+
|
| 51 |
+
<!-- Text Direction Settings -->
|
| 52 |
+
<item oor:path="/org.openoffice.Office.Writer/Layout/Other">
|
| 53 |
+
<prop oor:name="DefaultTextDirection" oor:op="fuse">
|
| 54 |
+
<value>2</value> <!-- RTL -->
|
| 55 |
+
</prop>
|
| 56 |
+
<prop oor:name="IsAlignTabStopPosition" oor:op="fuse">
|
| 57 |
+
<value>true</value>
|
| 58 |
+
</prop>
|
| 59 |
+
</item>
|
| 60 |
+
|
| 61 |
+
<!-- Page Layout for Arabic Documents -->
|
| 62 |
+
<item oor:path="/org.openoffice.Office.Writer/Layout/Page">
|
| 63 |
+
<prop oor:name="IsLandscape" oor:op="fuse">
|
| 64 |
+
<value>false</value>
|
| 65 |
+
</prop>
|
| 66 |
+
<prop oor:name="Width" oor:op="fuse">
|
| 67 |
+
<value>21000</value> <!-- A4 width in 1/100mm -->
|
| 68 |
+
</prop>
|
| 69 |
+
<prop oor:name="Height" oor:op="fuse">
|
| 70 |
+
<value>29700</value> <!-- A4 height in 1/100mm -->
|
| 71 |
+
</prop>
|
| 72 |
+
<prop oor:name="LeftMargin" oor:op="fuse">
|
| 73 |
+
<value>2000</value>
|
| 74 |
+
</prop>
|
| 75 |
+
<prop oor:name="RightMargin" oor:op="fuse">
|
| 76 |
+
<value>2000</value>
|
| 77 |
+
</prop>
|
| 78 |
+
<prop oor:name="TopMargin" oor:op="fuse">
|
| 79 |
+
<value>2000</value>
|
| 80 |
+
</prop>
|
| 81 |
+
<prop oor:name="BottomMargin" oor:op="fuse">
|
| 82 |
+
<value>2000</value>
|
| 83 |
+
</prop>
|
| 84 |
+
</item>
|
| 85 |
+
|
| 86 |
+
<!-- Disable Auto-formatting that might interfere with Arabic -->
|
| 87 |
+
<item oor:path="/org.openoffice.Office.Writer/AutoFunction/Format/Option">
|
| 88 |
+
<prop oor:name="UseReplacementTable" oor:op="fuse">
|
| 89 |
+
<value>false</value>
|
| 90 |
+
</prop>
|
| 91 |
+
<prop oor:name="TwoCapitalsAtStart" oor:op="fuse">
|
| 92 |
+
<value>false</value>
|
| 93 |
+
</prop>
|
| 94 |
+
<prop oor:name="CapitalAtStartSentence" oor:op="fuse">
|
| 95 |
+
<value>false</value>
|
| 96 |
+
</prop>
|
| 97 |
+
<prop oor:name="ChgToEnEmDash" oor:op="fuse">
|
| 98 |
+
<value>false</value>
|
| 99 |
+
</prop>
|
| 100 |
+
<prop oor:name="AddNonBrkSpace" oor:op="fuse">
|
| 101 |
+
<value>false</value>
|
| 102 |
+
</prop>
|
| 103 |
+
<prop oor:name="ChgQuotes" oor:op="fuse">
|
| 104 |
+
<value>false</value>
|
| 105 |
+
</prop>
|
| 106 |
+
</item>
|
| 107 |
+
|
| 108 |
+
</oor:items>
|
packages.txt
CHANGED
|
@@ -1,5 +1,7 @@
|
|
| 1 |
libreoffice
|
| 2 |
libreoffice-writer
|
|
|
|
|
|
|
| 3 |
fonts-liberation
|
| 4 |
fonts-liberation2
|
| 5 |
fonts-dejavu
|
|
@@ -10,6 +12,12 @@ fonts-noto-core
|
|
| 10 |
fonts-noto-ui-core
|
| 11 |
fonts-noto-mono
|
| 12 |
fonts-noto-color-emoji
|
|
|
|
|
|
|
| 13 |
fonts-opensymbol
|
| 14 |
fonts-freefont-ttf
|
|
|
|
|
|
|
| 15 |
fontconfig
|
|
|
|
|
|
|
|
|
| 1 |
libreoffice
|
| 2 |
libreoffice-writer
|
| 3 |
+
libreoffice-l10n-ar
|
| 4 |
+
libreoffice-help-ar
|
| 5 |
fonts-liberation
|
| 6 |
fonts-liberation2
|
| 7 |
fonts-dejavu
|
|
|
|
| 12 |
fonts-noto-ui-core
|
| 13 |
fonts-noto-mono
|
| 14 |
fonts-noto-color-emoji
|
| 15 |
+
fonts-noto-naskh
|
| 16 |
+
fonts-noto-kufi-arabic
|
| 17 |
fonts-opensymbol
|
| 18 |
fonts-freefont-ttf
|
| 19 |
+
fonts-amiri
|
| 20 |
+
fonts-scheherazade-new
|
| 21 |
fontconfig
|
| 22 |
+
wget
|
| 23 |
+
curl
|
run_local.py
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Local runner for DOCX to PDF converter with Arabic support
|
| 4 |
+
Run this script to test the converter locally before deploying to Hugging Face Spaces
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import subprocess
|
| 8 |
+
import sys
|
| 9 |
+
import os
|
| 10 |
+
from pathlib import Path
|
| 11 |
+
|
| 12 |
+
def check_system_requirements():
|
| 13 |
+
"""Check if all system requirements are installed"""
|
| 14 |
+
print("🔍 Checking system requirements...")
|
| 15 |
+
|
| 16 |
+
requirements = {
|
| 17 |
+
"LibreOffice": ["libreoffice", "--version"],
|
| 18 |
+
"Font Cache": ["fc-cache", "--version"],
|
| 19 |
+
"Font List": ["fc-list", "--help"]
|
| 20 |
+
}
|
| 21 |
+
|
| 22 |
+
missing = []
|
| 23 |
+
for name, cmd in requirements.items():
|
| 24 |
+
try:
|
| 25 |
+
result = subprocess.run(cmd, capture_output=True, timeout=5)
|
| 26 |
+
if result.returncode == 0:
|
| 27 |
+
print(f"✅ {name}: Available")
|
| 28 |
+
else:
|
| 29 |
+
print(f"❌ {name}: Not working properly")
|
| 30 |
+
missing.append(name)
|
| 31 |
+
except (subprocess.TimeoutExpired, FileNotFoundError):
|
| 32 |
+
print(f"❌ {name}: Not found")
|
| 33 |
+
missing.append(name)
|
| 34 |
+
|
| 35 |
+
if missing:
|
| 36 |
+
print(f"\n⚠️ Missing requirements: {', '.join(missing)}")
|
| 37 |
+
print("\nTo install on Ubuntu/Debian:")
|
| 38 |
+
print("sudo apt-get update")
|
| 39 |
+
print("sudo apt-get install libreoffice libreoffice-writer fonts-liberation fonts-dejavu fonts-noto fontconfig")
|
| 40 |
+
return False
|
| 41 |
+
|
| 42 |
+
print("✅ All system requirements are available")
|
| 43 |
+
return True
|
| 44 |
+
|
| 45 |
+
def install_python_requirements():
|
| 46 |
+
"""Install Python requirements"""
|
| 47 |
+
print("\n📦 Installing Python requirements...")
|
| 48 |
+
try:
|
| 49 |
+
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
|
| 50 |
+
check=True)
|
| 51 |
+
print("✅ Python requirements installed successfully")
|
| 52 |
+
return True
|
| 53 |
+
except subprocess.CalledProcessError as e:
|
| 54 |
+
print(f"❌ Failed to install Python requirements: {e}")
|
| 55 |
+
return False
|
| 56 |
+
|
| 57 |
+
def setup_arabic_fonts():
|
| 58 |
+
"""Setup Arabic fonts if the script exists"""
|
| 59 |
+
script_path = Path("arabic_fonts_setup.sh")
|
| 60 |
+
if script_path.exists():
|
| 61 |
+
print("\n🔤 Setting up Arabic fonts...")
|
| 62 |
+
try:
|
| 63 |
+
# Make script executable
|
| 64 |
+
os.chmod(script_path, 0o755)
|
| 65 |
+
subprocess.run(["bash", str(script_path)], check=True)
|
| 66 |
+
print("✅ Arabic fonts setup completed")
|
| 67 |
+
return True
|
| 68 |
+
except subprocess.CalledProcessError as e:
|
| 69 |
+
print(f"⚠️ Arabic fonts setup failed: {e}")
|
| 70 |
+
print("Continuing without additional Arabic fonts...")
|
| 71 |
+
return False
|
| 72 |
+
else:
|
| 73 |
+
print("⚠️ Arabic fonts setup script not found, skipping...")
|
| 74 |
+
return False
|
| 75 |
+
|
| 76 |
+
def run_app():
|
| 77 |
+
"""Run the main application"""
|
| 78 |
+
print("\n🚀 Starting DOCX to PDF converter...")
|
| 79 |
+
print("The application will be available at: http://localhost:7860")
|
| 80 |
+
print("Press Ctrl+C to stop the application")
|
| 81 |
+
|
| 82 |
+
try:
|
| 83 |
+
subprocess.run([sys.executable, "app.py"], check=True)
|
| 84 |
+
except KeyboardInterrupt:
|
| 85 |
+
print("\n👋 Application stopped by user")
|
| 86 |
+
except subprocess.CalledProcessError as e:
|
| 87 |
+
print(f"❌ Application failed to start: {e}")
|
| 88 |
+
|
| 89 |
+
def main():
|
| 90 |
+
"""Main function"""
|
| 91 |
+
print("🔧 DOCX to PDF Converter - Local Setup")
|
| 92 |
+
print("=" * 50)
|
| 93 |
+
|
| 94 |
+
# Check system requirements
|
| 95 |
+
if not check_system_requirements():
|
| 96 |
+
print("\n❌ System requirements not met. Please install missing components.")
|
| 97 |
+
return 1
|
| 98 |
+
|
| 99 |
+
# Install Python requirements
|
| 100 |
+
if not install_python_requirements():
|
| 101 |
+
print("\n❌ Failed to install Python requirements.")
|
| 102 |
+
return 1
|
| 103 |
+
|
| 104 |
+
# Setup Arabic fonts (optional)
|
| 105 |
+
setup_arabic_fonts()
|
| 106 |
+
|
| 107 |
+
# Run the application
|
| 108 |
+
run_app()
|
| 109 |
+
|
| 110 |
+
return 0
|
| 111 |
+
|
| 112 |
+
if __name__ == "__main__":
|
| 113 |
+
sys.exit(main())
|
test_conversion.py
ADDED
|
@@ -0,0 +1,122 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Test script for DOCX to PDF conversion with Arabic RTL support
|
| 4 |
+
This script tests the conversion functionality locally
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import sys
|
| 8 |
+
import os
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
|
| 11 |
+
# Add the current directory to Python path
|
| 12 |
+
sys.path.insert(0, str(Path(__file__).parent))
|
| 13 |
+
|
| 14 |
+
from app import convert_docx_to_pdf, setup_libreoffice, setup_font_environment
|
| 15 |
+
|
| 16 |
+
def test_arabic_conversion():
|
| 17 |
+
"""Test the Arabic DOCX to PDF conversion"""
|
| 18 |
+
|
| 19 |
+
print("🧪 Testing Arabic DOCX to PDF Conversion")
|
| 20 |
+
print("=" * 50)
|
| 21 |
+
|
| 22 |
+
# Check LibreOffice setup
|
| 23 |
+
print("1. Checking LibreOffice setup...")
|
| 24 |
+
if not setup_libreoffice():
|
| 25 |
+
print("❌ LibreOffice setup failed!")
|
| 26 |
+
return False
|
| 27 |
+
print("✅ LibreOffice setup successful")
|
| 28 |
+
|
| 29 |
+
# Setup font environment
|
| 30 |
+
print("\n2. Setting up font environment...")
|
| 31 |
+
setup_font_environment()
|
| 32 |
+
print("✅ Font environment setup completed")
|
| 33 |
+
|
| 34 |
+
# Check for test files
|
| 35 |
+
test_files_dir = Path("test_files")
|
| 36 |
+
if not test_files_dir.exists():
|
| 37 |
+
print(f"\n⚠️ Test files directory '{test_files_dir}' not found")
|
| 38 |
+
print("Please create test_files/ directory and add sample DOCX files")
|
| 39 |
+
return False
|
| 40 |
+
|
| 41 |
+
docx_files = list(test_files_dir.glob("*.docx"))
|
| 42 |
+
if not docx_files:
|
| 43 |
+
print(f"\n⚠️ No DOCX files found in '{test_files_dir}'")
|
| 44 |
+
print("Please add sample DOCX files to test the conversion")
|
| 45 |
+
return False
|
| 46 |
+
|
| 47 |
+
print(f"\n3. Found {len(docx_files)} DOCX files for testing:")
|
| 48 |
+
for docx_file in docx_files:
|
| 49 |
+
print(f" 📄 {docx_file.name}")
|
| 50 |
+
|
| 51 |
+
# Test conversion for each file
|
| 52 |
+
results_dir = Path("test_results")
|
| 53 |
+
results_dir.mkdir(exist_ok=True)
|
| 54 |
+
|
| 55 |
+
success_count = 0
|
| 56 |
+
total_count = len(docx_files)
|
| 57 |
+
|
| 58 |
+
for docx_file in docx_files:
|
| 59 |
+
print(f"\n4. Testing conversion: {docx_file.name}")
|
| 60 |
+
print("-" * 30)
|
| 61 |
+
|
| 62 |
+
# Create a mock file object
|
| 63 |
+
class MockFile:
|
| 64 |
+
def __init__(self, path):
|
| 65 |
+
self.name = str(path)
|
| 66 |
+
|
| 67 |
+
mock_file = MockFile(docx_file)
|
| 68 |
+
|
| 69 |
+
try:
|
| 70 |
+
pdf_path, status_message = convert_docx_to_pdf(mock_file)
|
| 71 |
+
|
| 72 |
+
if pdf_path and os.path.exists(pdf_path):
|
| 73 |
+
# Move the result to test_results directory
|
| 74 |
+
result_name = docx_file.stem + "_converted.pdf"
|
| 75 |
+
result_path = results_dir / result_name
|
| 76 |
+
|
| 77 |
+
import shutil
|
| 78 |
+
shutil.move(pdf_path, result_path)
|
| 79 |
+
|
| 80 |
+
print(f"✅ Conversion successful!")
|
| 81 |
+
print(f"📁 Output: {result_path}")
|
| 82 |
+
print(f"📊 Status: {status_message[:100]}...")
|
| 83 |
+
success_count += 1
|
| 84 |
+
else:
|
| 85 |
+
print(f"❌ Conversion failed!")
|
| 86 |
+
print(f"📊 Error: {status_message}")
|
| 87 |
+
|
| 88 |
+
except Exception as e:
|
| 89 |
+
print(f"❌ Conversion error: {str(e)}")
|
| 90 |
+
|
| 91 |
+
# Summary
|
| 92 |
+
print(f"\n🎯 Test Summary:")
|
| 93 |
+
print(f" Total files: {total_count}")
|
| 94 |
+
print(f" Successful: {success_count}")
|
| 95 |
+
print(f" Failed: {total_count - success_count}")
|
| 96 |
+
print(f" Success rate: {(success_count/total_count)*100:.1f}%")
|
| 97 |
+
|
| 98 |
+
if success_count > 0:
|
| 99 |
+
print(f"\n📁 Results saved in: {results_dir}")
|
| 100 |
+
|
| 101 |
+
return success_count == total_count
|
| 102 |
+
|
| 103 |
+
def create_sample_test_files():
|
| 104 |
+
"""Create sample test files for testing"""
|
| 105 |
+
test_files_dir = Path("test_files")
|
| 106 |
+
test_files_dir.mkdir(exist_ok=True)
|
| 107 |
+
|
| 108 |
+
print("📝 Creating sample test files...")
|
| 109 |
+
print("Note: You need to manually create DOCX files with Arabic content")
|
| 110 |
+
print("Suggested test cases:")
|
| 111 |
+
print("1. Simple Arabic text document")
|
| 112 |
+
print("2. Document with Arabic tables")
|
| 113 |
+
print("3. Mixed Arabic/English document")
|
| 114 |
+
print("4. Document with Arabic headers and footers")
|
| 115 |
+
print("5. Document with Arabic bullet points")
|
| 116 |
+
print(f"\nPlace your test DOCX files in: {test_files_dir.absolute()}")
|
| 117 |
+
|
| 118 |
+
if __name__ == "__main__":
|
| 119 |
+
if len(sys.argv) > 1 and sys.argv[1] == "--create-samples":
|
| 120 |
+
create_sample_test_files()
|
| 121 |
+
else:
|
| 122 |
+
test_arabic_conversion()
|