| # Recommended Free Arabic PDF To Audio Stack
|
|
|
| This is the compact decision report generated from the current research watchlist.
|
|
|
| ## Use Now
|
|
|
| | Layer | Recommendation | Why |
|
| | --- | --- | --- |
|
| | Embedded PDFs | PyMuPDF text extraction first | It is free, fast, and avoids OCR errors when the PDF already contains usable Arabic text. |
|
| | Scanned PDFs | `OCR_ENGINE=tesseract OCR_RENDER_ZOOM=2 TESSERACT_PSM=4` | It produced the most readable text on the 5-page Arabic OCR benchmark while staying much faster than the comparison modes. |
|
| | Default voice | SILMA TTS | Arabic-focused Fusha/MSA voice with normalization and tashkeel options. |
|
| | Download/storage | Worker-local retained audio files | Free by default and avoids Vercel's 4.5 MB function payload limit; Hugging Face free CPU disk is 50 GB but non-persistent, so downloads are short-lived. |
|
| | Hosted shape | Vercel shell plus Docker worker via `WORKER_BASE_URL` | Vercel serves the lightweight web front end, while the worker handles large PDFs, OCR, and TTS on free Hugging Face CPU Space hardware when the job size is reasonable. |
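
The routing in the table above can be sketched as follows. This is a minimal illustration and not the project's worker code: the function names, the `min_arabic_chars` threshold of 50, and the `ara` language code passed to Tesseract are assumptions. Only the three environment variable names and their values come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSettings:
    engine: str
    render_zoom: float
    psm: int


def load_ocr_settings(env=None) -> OcrSettings:
    """Read the OCR settings named in the table, defaulting to its values."""
    env = os.environ if env is None else env
    return OcrSettings(
        engine=env.get("OCR_ENGINE", "tesseract"),
        render_zoom=float(env.get("OCR_RENDER_ZOOM", "2")),
        psm=int(env.get("TESSERACT_PSM", "4")),
    )


def has_usable_arabic_text(text: str, min_arabic_chars: int = 50) -> bool:
    """Hypothetical heuristic: count characters in the Arabic Unicode block."""
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    return arabic >= min_arabic_chars


def choose_layer(embedded_text: str) -> str:
    """Prefer embedded text; fall back to OCR when too little Arabic is found."""
    return "embedded" if has_usable_arabic_text(embedded_text) else "ocr"


def tesseract_command(image_path: str, settings: OcrSettings) -> list:
    """Build a Tesseract CLI call that prints recognized text to stdout.

    `-l ara` (Arabic traineddata) is an assumption; PSM 4 treats the page
    as a single column of text of variable sizes.
    """
    return ["tesseract", image_path, "stdout", "-l", "ara", "--psm", str(settings.psm)]
```

A worker could call `choose_layer` on the text extracted from each page and only render and OCR the pages that come back as `"ocr"`, which keeps OCR errors out of PDFs that already carry usable text.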
|
|
|
| ## Install First On A Stronger Worker
|
|
|
| | Candidate | Type | Why | Next Step |
|
| | --- | --- | --- | --- |
|
| | QARI-OCR 0.4 | ocr | Directly trained for Arabic OCR on Islamic books and Arabic manuscripts. | Install the sidecar or build the worker with INSTALL_QARI_OCR=1, then benchmark against arabic-max/arabic/arabic-qwen-ocr/katib-ocr/paddleocr/tesseract on the 5-page sample. |
|
| | PaddleOCR-VL-1.6 | ocr | Fresh Apache-2.0 PaddleOCR document parser release with a June 2026 paper signal; the model card claims SOTA document parsing/text performance and the license file is Apache-2.0, but Arabic-book quality still needs same-page scoring. | Build with INSTALL_PADDLEOCR_VL=1 only if the smaller Arabic OCR stack is not clean enough, then benchmark the same 5-page Arabic sample before any full-book run. |
|
| | KATIB 0.8B | ocr | Fine-tuned specifically for Arabic OCR, including printed and handwritten text, while being much smaller than QARI 4B. | Install the sidecar or build the worker with INSTALL_KATIB_OCR=1, then benchmark against arabic-max/arabic/arabic-qwen-ocr/qari-ocr/paddleocr/tesseract on the 5-page sample. |
|
| | Arabic-GLM-OCR-v2 | ocr | Recent Arabic OCR model card claims strong Arabic document extraction and noise reduction; it is wired as an optional sidecar so it can be scored against QARI/KATIB/Arabic-Qwen/Baseer on the target book pages. | Install the sidecar or build the worker with INSTALL_ARABIC_GLM_OCR=1, then benchmark it on the same 5-page sample before any full-book run. |
|
| | Arabic-Qwen3.5-OCR-v4 | ocr | Recent Arabic OCR model card claims Arabic printed, handwritten, classical, and diacritic handling in a smaller 0.9B model. | Install the sidecar or build the worker with INSTALL_ARABIC_QWEN_OCR=1, then benchmark against arabic-max/arabic/katib-ocr/qari-ocr/paddleocr/tesseract on the 5-page sample. |
|
| | Tawkeed OCR | ocr | Arabic-first OCR model forked from QARI-OCR v0.3 and fine-tuned for Arabic documents, handwriting, and scene text; useful to test when QARI 0.4 is too heavy or when edge-style deployment matters. | Install the sidecar or build the worker with INSTALL_TAWKEED_OCR=1, then benchmark against QARI 0.4, KATIB, Arabic-Qwen, Baseer, and Tesseract on the same 5-page sample. |
|
| | Baseer OCR V1.0 | ocr | Arabic-specific VLM OCR for complex legal documents, multi-column layouts, stamps, tables, and handwritten/printed Arabic. | Install the sidecar or build the worker with INSTALL_BASEER_OCR=1, then benchmark against arabic-max/arabic/arabic-qwen-ocr/katib-ocr/qari-ocr/paddleocr/tesseract on the 5-page sample. |
|
| | Habibi-TTS MSA | tts | Arabic-specific 2026 TTS family worth comparing against SILMA on MSA passages. | Install the optional sidecar and listen against the same cleaned OCR sample. |
|
| | Supertonic 3 | tts | Supertonic 3 supports Arabic, runs locally with ONNX on CPU, and is much smaller than GPU-class multilingual voices, making it a practical free benchmark voice for long-book workers. | Install the sidecar with scripts/setup_supertonic.ps1 or build with INSTALL_SUPERTONIC=1, then benchmark it against SILMA/Habibi on the same cleaned Arabic text. |
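
As a rough illustration of how the `INSTALL_*` flags in the table could be inspected, the sketch below maps each flag to its candidate and lists the ones switched on. The flag names come from the table; the helper name `enabled_sidecars` and the idea of reading the flags from the environment at runtime are assumptions, since the report does not say how the build consumes them.

```python
import os

# Flag names are copied from the table; the values are the candidate names.
SIDECAR_FLAGS = {
    "INSTALL_QARI_OCR": "QARI-OCR 0.4",
    "INSTALL_PADDLEOCR_VL": "PaddleOCR-VL-1.6",
    "INSTALL_KATIB_OCR": "KATIB 0.8B",
    "INSTALL_ARABIC_GLM_OCR": "Arabic-GLM-OCR-v2",
    "INSTALL_ARABIC_QWEN_OCR": "Arabic-Qwen3.5-OCR-v4",
    "INSTALL_TAWKEED_OCR": "Tawkeed OCR",
    "INSTALL_BASEER_OCR": "Baseer OCR V1.0",
    "INSTALL_SUPERTONIC": "Supertonic 3",
}


def enabled_sidecars(env=None) -> list:
    """Return candidates whose flag is set to "1", in table order."""
    env = os.environ if env is None else env
    return [
        name
        for flag, name in SIDECAR_FLAGS.items()
        if env.get(flag, "0").strip() == "1"
    ]
```

Listing the enabled candidates before a benchmark run makes it easy to confirm that every engine being compared was actually installed on the worker.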
|
|
|
| ## Benchmark Before Promoting
|
|
|
| These are promising free/open candidates, but they should not replace the default stack until they win on the same 5-page Arabic sample and the same cleaned TTS text.
|
|
|
| ### OCR
|
|
|
| | Candidate | License | Benchmark Step Before Promotion |
|
| | --- | --- | --- |
|
| | QARI-OCR 0.4 GGUF | Apache-2.0 via QARI 0.4 model card; confirm GGUF packaging metadata before production | Benchmark the GGUF package externally on the same exported Arabic pages against the wired QARI sidecar, KATIB, Arabic-Qwen, Baseer, PaddleOCR, and Tesseract before considering a llama.cpp-style worker path. |
|
| | oi-OCR | Apache-2.0 | Export the same selected Arabic page images and compare its Markdown/text output against QARI/KATIB/Arabic-Qwen/PaddleOCR/Tesseract before considering any wiring. |
|
| | NuExtract3 | Apache-2.0 | Use document-to-Markdown/content mode on the exported page images and score the resulting Arabic text against QARI/KATIB/Arabic-Qwen/Baseer/PaddleOCR/Tesseract before promotion. |
|
| | Qianfan-OCR | Apache-2.0 | Benchmark externally only if QARI/KATIB/Arabic-Qwen/Baseer/PaddleOCR are not clean enough; score it on the same exported Arabic book pages before considering any worker wiring. |
|
| | Chandra OCR 2 | Apache-2.0 code; modified OpenRAIL-M model weights | Benchmark externally on the same exported page images for hard layouts, tables, forms, or mixed-language pages; keep QARI/KATIB/Arabic-Qwen/Baseer first for Arabic books unless Chandra wins same-page scoring and its license and runtime fit the deployment. |
|
| | dots.ocr | MIT | Run externally on the same exported Arabic page images and score the resulting text against QARI/KATIB/Arabic-Qwen/Baseer/PaddleOCR/Tesseract before considering any worker wiring. |
|
| | olmOCR Arabic LoRA v2 | Apache-2.0 adapter; confirm base model license/runtime before production | Run externally on the same exported full-page manuscript images and compare against Ketaba, QARI, HAFITH/Glimpse line workflows, Kraken/eScriptorium, and the wired Arabic OCR baseline before considering any sidecar work. |
|
| | Arabic Large Nougat | GPL-3.0 | Run externally on the same exported Arabic book page images and compare Markdown/text output against QARI, KATIB, Arabic-Qwen, Baseer, PaddleOCR, Tesseract, and the other external OCR benchmarks before considering any separate license-aware workflow. |
|
| | DocTR Arabic FAST/PARSEQ | Apache-2.0 detector; recognition card lacks clear metadata, confirm before production | Benchmark externally on the same exported Arabic page images and promote only if the recognition model license is confirmed and it beats PaddleOCR/Tesseract/EasyOCR on book text ordering and word preservation. |
|
| | Kraken/eScriptorium Arabic script | Apache-2.0 engine; model license depends on selected Kraken model | Export the same selected page images, run Kraken/eScriptorium with an Arabic-script recognition model or line-cropped workflow, then score the resulting text against the wired Arabic OCR stack before considering any sidecar work. |
|
| | Kairawan/Qalamus manuscript OCR | free web service; engine/package license not established | Use only as an external comparison when the source PDF is manuscript-like; do not wire it into the app unless a reusable open engine, API terms, privacy story, and same-page scoring beat QARI/KATIB/Kraken/HAFITH on the selected sample. |
|
| | GLM-OCR Arabic/French documents | check model card/base license before production use | Benchmark externally for administrative/form-like Arabic PDFs and compare against Arabic-GLM-OCR-v2, QARI, KATIB, Baseer, PaddleOCR, and Tesseract before wiring. |
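
The same-page scoring that these rows call for could be approximated with character and word error rates against a hand-corrected reference transcription. The report does not specify its scoring method, so the metric choice, the function names, and the existence of reference transcriptions are all assumptions in this sketch.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def score_page(reference_text: str, ocr_text: str) -> dict:
    """Character error rate (CER) and word error rate (WER) for one page."""
    return {
        "cer": error_rate(reference_text, ocr_text),
        "wer": error_rate(reference_text.split(), ocr_text.split()),
    }


def rank_engines(reference_pages, outputs_by_engine) -> list:
    """Return (engine, mean CER) pairs sorted best first."""
    ranking = []
    for engine, pages in outputs_by_engine.items():
        cers = [score_page(ref, out)["cer"] for ref, out in zip(reference_pages, pages)]
        if not cers:
            continue  # engine produced no pages to score
        ranking.append((engine, sum(cers) / len(cers)))
    return sorted(ranking, key=lambda pair: pair[1])
```

Because word order affects WER, it also gives a rough signal for the reading-order and word-preservation problems mentioned in the table, though a human read-through of the pages remains the final check.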
|
|
|
| ### TTS
|
|
|
| | Candidate | License | Benchmark Step Before Promotion |
|
| | --- | --- | --- |
|
| | Mishkala Tashkeel | Apache-2.0 | Benchmark on the same cleaned speech sample before wiring. Promote only if listening tests improve pronunciation without changing meaning or adding distracting/incorrect harakat. |
|
| | Tashkeel-350M | Apache-2.0 | Export the same cleaned Arabic TTS sample, create a Tashkeel-350M diacritized copy, synthesize plain/Mishkala/Tashkeel-350M with the same voice, and score meaning preservation plus long-listen comfort. |
|
| | Mushkil | Apache-2.0 | Export the same cleaned Arabic TTS sample, create a Mushkil-diacritized copy, synthesize plain/Mishkala/Tashkeel-350M/Mushkil with the same voice, and score meaning preservation plus long-listen comfort. |
|
| | Thaka KSAA-2026 speech diacritization | CC BY 4.0 paper; implementation/model license not established | Track for released code/weights or a permissive checkpoint. Until then, keep website preprocessing limited to same-sample Mishkala/Tashkeel-350M/Mushkil listening tests and meaning-preservation scoring. |
|
| | 3arab-TTS 500M | Apache-2.0 | Export the same cleaned Arabic text used for SILMA/Habibi, then compare base and VoiceDesign variants for audiobook comfort, stability, and long-form pacing. |
|
| | KaniTTS Arabic | model card says Apache-2.0, but Hugging Face metadata reports lfm1.0; confirm before production | Export the same cleaned Arabic sample used for SILMA/Habibi, then benchmark naturalness, skipped words, pacing, runtime, and license fit before considering app wiring. |
|
| | Emirati VITS Male | Apache-2.0 | Benchmark only when the target PDF benefits from Emirati/Gulf pronunciation; keep SILMA/Habibi ahead for MSA books unless listening tests say otherwise. |
|
| | VoxCPM2 | Apache-2.0 | Benchmark externally with the same cleaned Arabic sample before deciding whether it is worth integrating. |
|
| | Voxtral TTS | CC-BY-NC-4.0 | Benchmark only as a personal/non-commercial strong-worker comparison using the same cleaned Arabic sample; do not wire it as the default public/free website voice. |
|
| | OmniVoice | Apache-2.0 | Export the same cleaned Arabic text used for SILMA/Habibi and compare Arabic naturalness, speed, and setup complexity before wiring it into the app. |
|
| | OmniVoice Arabic LoRA | Apache-2.0 | Benchmark only after the base OmniVoice command is working, using the exact same cleaned Arabic sample and reference audio. |
|
| | Arabic-text-to-speech OmniVoice | Apache-2.0 | Export the same cleaned Arabic sample used for SILMA/Habibi and compare naturalness, skipped words, repetition, runtime, and setup complexity before any app wiring. |
|
| | Lahgtna OmniVoice v2 | license not declared on model card | Benchmark externally only when dialect pronunciation matters, confirm licensing before production, and keep SILMA/Habibi ahead for MSA books until listening tests prove otherwise. |
|
| | TADA multilingual TTS | Llama 3.2 license | Export the same cleaned Arabic sample and benchmark with language='ar' only if the Llama 3.2 license is acceptable; keep SILMA/Habibi ahead for the permissive default. |
|
| | Lahgtna Chatterbox | MIT | Export the same cleaned Arabic text and listen for repetition/stability before considering app wiring. |
|
| | NAMAA-Saudi-TTS | MIT | Benchmark only when Saudi dialect pronunciation fits the target PDF; keep SILMA/Habibi first for MSA books and compare against Saudi Arabic Qwen3-TTS and Emirati voices before wiring. |
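
For the diacritization rows above, one cheap automated check before any listening test is to confirm that a diacritized copy differs from the plain text only in harakat. The sketch below is an assumption about how such a check might look. It catches changed, dropped, or added letters, but it cannot tell whether the added harakat are correct, so it complements the listening tests and does not replace them.

```python
import unicodedata

# Fathatan through sukun (U+064B to U+0652) plus superscript alef (U+0670).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_harakat(text: str) -> str:
    """Remove Arabic short-vowel and related marks, keeping the letters."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def _normalize(text: str) -> str:
    """NFC-normalize, drop harakat, and collapse whitespace."""
    stripped = strip_harakat(unicodedata.normalize("NFC", text))
    return " ".join(stripped.split())


def letters_preserved(plain: str, diacritized: str) -> bool:
    """True when both texts share the same letters once harakat are ignored."""
    return _normalize(plain) == _normalize(diacritized)
```

Running `letters_preserved` on each sentence of the cleaned sample and its Mishkala, Tashkeel-350M, or Mushkil copy would flag any output that rewrote words before time is spent synthesizing and listening to it.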
|
|
|
| ### Current Voice Priority
|
|
|
| Use SILMA first as the practical free Arabic audiobook voice. On a stronger worker, benchmark Habibi MSA and OmniVoice next. Keep KaniTTS benchmark-only until the `lfm1.0` Hugging Face license metadata is reconciled with the model-card Apache-2.0 text.
|
|
|
| ## Promotion Rule
|
|
|
| Promote a model only when all of these are true:
|
|
|
| 1. It is free for the intended personal/family use.
|
| 2. Its license is acceptable for the deployment.
|
| 3. It beats the current stack on the same selected Arabic pages or same cleaned Arabic voice sample.
|
| 4. It preserves Arabic reading order, words, and pronunciation better than the default.
|
| 5. Its runtime is acceptable for the target worker.
|
| 6. The generated JSON score passes `scripts/model_promotion_gate.py` after human review.
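
The six conditions above amount to an all-or-nothing gate. The sketch below shows that shape only. It is not the contents of the project's promotion gate script, and every JSON field name in it is hypothetical.

```python
import json

# Hypothetical field names, one per condition in the list above.
REQUIRED_CHECKS = (
    "free_for_intended_use",
    "license_acceptable",
    "beats_current_stack",
    "preserves_order_words_pronunciation",
    "runtime_acceptable",
    "human_reviewed",
)


def promotion_blockers(score: dict) -> list:
    """Return the checks that are missing or not strictly True."""
    return [name for name in REQUIRED_CHECKS if score.get(name) is not True]


def can_promote(score_json: str) -> bool:
    """Parse a JSON score document and require every check to pass."""
    return not promotion_blockers(json.loads(score_json))
```

Treating a missing field as a failure keeps an incomplete score file from promoting a model by accident, and the list returned by `promotion_blockers` shows the reviewer exactly which condition still needs work.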
|
|
|
| Current practical default: PyMuPDF -> `tesseract@2x-psm4` OCR -> SILMA TTS -> downloadable worker audio.
|
|