Source Evidence For The Free Arabic PDF-To-Audio Stack
Last checked: June 7, 2026.
This file records why each source matters to the current recommendation. It is intentionally short so it can be checked quickly before deployment.
See docs/huggingface-model-metadata.md for the latest tracked Hugging Face model ID, license, and last-modified snapshot used by this recommendation.
OCR Sources
| Source | Link | Evidence Used |
|---|---|---|
| EasyOCR | https://github.com/JaidedAI/EasyOCR | Free local OCR engine with Arabic support; useful fallback for older scans and layouts. |
| PaddleOCR PP-OCRv5 multilingual recognition | https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.en.md | Documents Arabic-script recognition model support, including arabic_PP-OCRv5_mobile_rec. |
| PaddleOCR OCR pipeline | https://www.paddleocr.ai/main/en/version3.x/pipeline_usage/OCR.html | Confirms the current PaddleOCR OCR pipeline interface used by the sidecar. |
| PaddleOCR latest docs | https://www.paddleocr.ai/latest/en/index.html | Tracks current PaddleOCR releases and document parser direction. |
| PP-OCRv5 paper | https://arxiv.org/abs/2603.24373 | Supports the lightweight OCR default choice before heavy VLM OCR. |
| QARI-OCR 0.4 model | https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct | Arabic-specific OCR VLM trained for Islamic books and Arabic manuscripts, Apache-2.0, used as the optional strong-worker Arabic OCR upgrade; current model card shows no hosted inference provider, so it needs a local/worker runtime. |
| QARI-OCR 0.4 GGUF | https://huggingface.co/marwan-osama/Qari-OCR-0.4.0-VL-4B-Instruct-GGUF | Newer GGUF packaging signal for QARI-OCR 0.4; keep benchmark-only until it matches the wired QARI sidecar on the same Arabic pages and its packaging/license metadata is confirmed. |
| QARI-OCR v0.3 lighter model | https://huggingface.co/NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct | Smaller configurable QARI fallback for workers that cannot handle the 4B model. |
| QARI-OCR paper | https://arxiv.org/abs/2506.02295 | Research background for QARI's Arabic OCR/document understanding role. |
| KATIB 0.8B Arabic OCR model | https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1 | Apache-2.0 Arabic OCR VLM fine-tuned for Arabic printed and handwritten text; wired as the smaller optional Arabic-trained OCR sidecar. |
| Ketaba-OCR LoRA | https://huggingface.co/HassanB4/Ketaba-OCR-LoRA | Apache-2.0 Arabic manuscript OCR LoRA benchmark candidate; not wired because it needs a separate base VLM plus adapter setup, but worth testing when QARI/KATIB/PaddleOCR struggle. |
| Qari-OCR-LoRA | https://huggingface.co/HassanB4/Qari-OCR-LoRA | Apache-2.0 experimental QARI-family manuscript OCR LoRA from the NakbaNLP 2026 Arabic manuscript task; keep as a secondary external benchmark after Ketaba because the model card says Ketaba was the primary winning submission. |
| Tawkeed OCR | https://huggingface.co/tawkeed-sa/tawkeed-ocr | Apache-2.0 Arabic-first OCR model forked from QARI-OCR v0.3 and fine-tuned for Arabic documents, handwriting, and scene text; wired as an optional sidecar for short-sample benchmarks when QARI-OCR 0.4 is too heavy or edge-style Arabic OCR matters. |
| PaddleOCR-VL-1.6 model | https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.6 | Apache-2.0 0.9B document parser model; useful as a strong-worker OCR/layout benchmark after the smaller Arabic-specific stack is tested. |
| PaddleOCR-VL-1.6 paper | https://arxiv.org/abs/2606.03264 | June 2026 research background for the current PaddleOCR-VL-1.6 document parsing upgrade. |
| oi-OCR | https://huggingface.co/oi-uae/oi-OCR | Apache-2.0 English/Arabic PDF document parser with April 2026 ParseBench claims; keep as an external structured-document benchmark, not the Arabic-book default. |
| Falcon-OCR | https://huggingface.co/tiiuae/Falcon-OCR | Apache-2.0 compact 300M document OCR VLM; useful watchlist candidate for strong-worker Arabic benchmarks, not default until it beats the Arabic-specific stack. |
| Falcon Perception paper | https://arxiv.org/abs/2603.27365 | Research background for Falcon-OCR's early-fusion document OCR design and public benchmark claims. |
| Baseer OCR model | https://huggingface.co/AbdoTarek/Baseer-OCR-V1.0 | Apache-2.0 Arabic document OCR VLM fine-tuned from Qwen2-VL-2B for complex Arabic legal documents, multi-column layouts, stamps, tables, and handwritten/printed Arabic; wired as an optional strong-worker sidecar. |
| Baseer OCR paper | https://arxiv.org/abs/2509.18174 | Research background for Baseer as a future Arabic OCR comparison. |
| Arabic-GLM-OCR-v2 | https://huggingface.co/sherif1313/Arabic-GLM-OCR-v2 | New Apache-2.0 Arabic OCR VLM; wired as an optional sidecar and still benchmarked on short samples before full-book use because claims need independent scoring on the actual book pages. |
| Arabic-Qwen3.5-OCR-v4 | https://huggingface.co/sherif1313/Arabic-Qwen3.5-OCR-v4 | Recent Apache-2.0 0.9B Arabic OCR VLM for printed, handwritten, classical, and diacritic-heavy Arabic; wired as an optional sidecar and benchmarked before full-book use. |
| aNS Qwen3-VL Arabic OCR v3 | https://huggingface.co/aNS2024/qwen3-vl-arabic-ocr-v3 | Fresh Qwen3-VL-2B Arabic OCR fine-tune; the public card shows no hosted inference provider and sparse OCR/license evidence, so keep it external until it beats QARI/KATIB/Arabic-Qwen/Baseer on the same selected Arabic pages. |
| Waraqon v3 Arabic OCR HTML Qari | https://huggingface.co/FatimahEmadEldin/Waraqon-v3-Arabic-OCR-HTML-Qari | Apache-2.0 Qari-family Arabic OCR fine-tune for HTML/structured output; keep external until normalized readable text beats QARI/KATIB/Arabic-Qwen/Baseer on the same pages because audiobook generation needs faithful continuous Arabic text more than raw markup. |
| DeepSeek-OCR-2 | https://huggingface.co/deepseek-ai/DeepSeek-OCR-2 | Official Apache-2.0 3B DeepSeek OCR successor with 2026 paper/model-card references and public document OCR benchmark results; keep external because it is not Arabic-specific and needs GPU/large-worker inference. |
| DeepSeek Arabic OCR v6 | https://huggingface.co/melsiddieg/deepseek_ocr_arabic_v6 | Apache-2.0 Arabic-labeled DeepSeek-OCR fine-tune; newer than the v4/v5 Arabic fine-tunes, but keep benchmark-only because the card has sparse Arabic-book evidence and no inference provider deployment. |
| Loay Arabic-OCR-DeepSeek-OCR-2 | https://huggingface.co/loay/Arabic-OCR-DeepSeek-OCR-2 | Apache-2.0 merged DeepSeek-OCR-2 Arabic fine-tune for high-precision OCR and structural layout analysis; keep benchmark-only until it beats QARI/KATIB/Arabic-Qwen/Baseer on the same selected Arabic book pages. |
| Arabic-English handwritten OCR Qwen3-VL | https://huggingface.co/sherif1313/Arabic-English-handwritten-OCR-Qwen3-VL-4B | Apache-2.0 Qwen3-VL-4B handwritten Arabic/English OCR watchlist model; keep external because the card says it is research-oriented and not deployed by inference providers. |
| Arabic-English handwritten OCR v3 | https://huggingface.co/sherif1313/Arabic-English-handwritten-OCR-v3 | Apache-2.0 Qwen2.5-VL 3B-class Arabic/English handwritten OCR watchlist model; keep external for handwriting/manuscript-heavy pages because it is large and not deployed by inference providers. |
| Arabic handwritten OCR 4-bit Qwen2.5-VL | https://huggingface.co/sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v3 | Apache-2.0 4-bit Arabic handwritten OCR checkpoint with about 2.44GB of model assets; keep external as a lighter handwriting/manuscript benchmark when the full handwritten model is too heavy. |
| NAKBA Arabic manuscript line OCR baseline | https://huggingface.co/U4RASD/ar-ms-baseline | NAKBA NLP 2026 Arabic manuscript understanding baseline fine-tuned from Qwen3-VL-8B for line-image transcription; keep external and line-level only until license fit and page-cropping workflow are proven. |
| HAFITH | https://huggingface.co/mdnaseif/hafith | Apache-2.0 historical Arabic manuscript recognition model with Arabic-native tokenization and 5.10% CER claims; keep external and line-level only because the model card says it requires pre-segmented text lines. |
| Glimpse RTL OCR | https://huggingface.co/surfiniaburger/unsloth_finetune_ocr_arabic | Apache-2.0 Arabic/Persian RTL text-line OCR model with 6.97% CER claims on unseen RTL text lines; keep external and line-level only until page-cropping workflow and same-book accuracy are proven. |
| olmOCR Arabic LoRA v2 | https://huggingface.co/hastyle/olmOCR-arabic-lora-v2 | Apache-2.0 Arabic manuscript OCR LoRA for full-page manuscript images; keep external/heavy because it needs the 7B olmOCR base and base license/runtime confirmation. |
| Arabic OCR Qwen2.5-VL GGUF | https://huggingface.co/mo1998/arabic-ocr-qwen2.5-vl | QariOCR-v0.3-trained Arabic/English OCR fine-tune on a Qwen2.5-VL 7B GGUF/Unsloth path; keep external with license confirmation because it is large and not inference-provider deployed. |
| Qwen3-VL Persian/Arabic line OCR | https://huggingface.co/mohajesmaeili/Qwen3-VL-2B-Persian-Arabic-Ocr-v1.0 | Apache-2.0 Qwen3-VL 2B Persian/Arabic OCR watchlist model; keep external unless pages are cropped into text lines because the model card says it was trained on individual lines and is not designed for full-page OCR. |
| DIMI Arabic OCR v2 | https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR-V2 | Apache-2.0 Arabic OCR LoRA fine-tuned from Qwen2.5-VL-7B; strong external benchmark candidate for printed Arabic and diacritics-heavy pages, but too heavy for the normal free worker default. |
| Loay Arabic-OCR-Qwen2.5-VL-7B | https://huggingface.co/loay/Arabic-OCR-Qwen2.5-VL-7B-Vision | Arabic OCR VLM fine-tuned from Qwen2.5-VL-7B for Arabic text in images; keep as a high-capacity external benchmark because 7B-class runtime is too heavy for the normal free family-worker default. |
| AtlasOCR | https://huggingface.co/atlasia/AtlasOCR | First open-source Darija/Moroccan Arabic OCR model; useful only for Darija-specific PDFs and needs license confirmation before production use. |
| NuExtract3 | https://huggingface.co/numind/NuExtract3 | Apache-2.0 4B multilingual document understanding model for OCR, document-to-Markdown, tables, forms, invoices, contracts, and multi-page PDFs; benchmark externally for complex layouts but keep QARI/KATIB/Arabic-Qwen first for Arabic books. |
| Qianfan-OCR | https://huggingface.co/baidu/Qianfan-OCR | Apache-2.0 5B multilingual document-intelligence OCR/VLM; benchmark externally only on large workers because it is not Arabic-book-specific and is too heavy for the default free family-worker path. |
| Chandra OCR 2 | https://github.com/datalab-to/chandra | Recent 4B multilingual OCR/layout model supporting Arabic among 90+ languages, strong for structured Markdown/HTML/JSON extraction, tables, forms, and handwriting; code is Apache-2.0 but weights are modified OpenRAIL-M, so keep it external and benchmark-only. |
| dots.ocr | https://huggingface.co/rednote-hilab/dots.ocr | MIT compact multilingual document parser that unifies layout detection and content recognition, with reading-order, table, and formula support; benchmark externally for difficult Arabic layouts but keep Arabic-trained OCR first for books. |
| Arabic Large Nougat | https://huggingface.co/MohamedRashad/arabic-large-nougat | GPL-3.0 Arabic book-page OCR-to-Markdown model; useful as an external benchmark for structured Arabic book text, but keep out of the default public hosted worker because of GPL licensing and hallucination/context-length caveats. |
| DocTR Arabic FAST/PARSEQ | https://huggingface.co/madskills/doctr-fast_base-arabic and https://huggingface.co/madskills/doctr-parseq-arabic | Apache-2.0 Arabic FAST detector paired with an Arabic PARSEQ recognizer; keep benchmark-only until recognizer licensing and same-page book accuracy are confirmed. |
| Kraken/eScriptorium Arabic script | https://kraken.re/main/index.html and https://escriptorium.eu/about | Free/open-source ATR/OCR workflow for historical and non-Latin scripts. Benchmark externally for historical Arabic print or manuscript-like PDFs with an Arabic-script Kraken model, but keep it out of the default family-worker path because model choice, line segmentation, and per-book training can matter. |
| Kairawan/Qalamus manuscript OCR | https://kairawan.org/ | Free 2026 Arabic and Islamic manuscript transcription service signal; useful as an external manuscript benchmark only because the reusable engine/package license, privacy/API terms, and worker integration path are not established. |
| GLM-OCR Arabic/French documents | https://huggingface.co/maloukafer/GLM-OCR-finetuned-documents | Recent GLM-OCR LoRA fine-tune for Arabic/French administrative and scanned documents; useful for forms or newspapers, not the Arabic-book default. |
| mimoha Arabic OCR | https://huggingface.co/mimoha/ocr | Apache-2.0 Arabic OCR card that says it extracts Arabic text from images, but the public card is sparse, so keep it low-priority and external. |
| Raqim post-OCR correction | https://www.sciencedirect.com/science/article/pii/S187705092600058X | 2026 Arabic OCR correction research using dictionary and LLM correction; useful to track, but not wired because automatic correction can alter exact book or religious wording before TTS. |
| Arabic Legal Documents OCR 1.0 | https://huggingface.co/bakrianoo/arabic-legal-documents-ocr-1.0 | Recent Gemma-licensed Arabic legal/scanned-document OCR VLM; benchmark externally only for legal or form-like PDFs because it is domain-specific and not permissive enough for the default family audiobook stack. |
| Surya | https://github.com/datalab-to/surya | Heavy OCR/layout path to test only on strong workers. |
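
Many rows above keep a model benchmark-only until it beats the wired stack on the same pages, and the HAFITH and Glimpse RTL OCR rows cite CER claims. The sketch below shows one way such a same-page comparison could be scored with character error rate, assuming a hand-corrected reference text exists for every sample page. The function names and the NFC/whitespace normalization are illustrative assumptions, not the project's actual benchmark script.

```python
import unicodedata


def normalize_arabic(text: str) -> str:
    """NFC-normalize and collapse whitespace so layout differences are not scored."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1), rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = normalize_arabic(reference)
    hyp = normalize_arabic(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def rank_engines(reference_pages, outputs_by_engine):
    """Return (engine, mean CER) pairs sorted best first.

    Assumes every engine produced exactly one output per reference page.
    """
    scores = {}
    for engine, pages in outputs_by_engine.items():
        if not reference_pages or len(pages) != len(reference_pages):
            raise ValueError(f"{engine}: expected {len(reference_pages)} pages")
        per_page = [cer(r, h) for r, h in zip(reference_pages, pages)]
        scores[engine] = sum(per_page) / len(per_page)
    return sorted(scores.items(), key=lambda item: item[1])
```

Scoring every candidate against the same reference pages keeps the comparison fair; published CER figures on other datasets are not interchangeable with scores on the actual book.
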
TTS Sources
| Source | Link | Evidence Used |
|---|---|---|
| SILMA TTS | https://huggingface.co/silma-ai/silma-tts | Best free permissive local Arabic voice baseline in this project: Arabic/English, Fusha/MSA, 150M parameters, Arabic normalization/tashkeel support, MIT code, Apache-2.0 model weights; current model card shows no hosted inference provider, so production-quality audio needs the worker path and its runtime. |
| SILMA open source Arabic TTS models | https://silma.ai/open-source-arabic-tts-models | Official SILMA page confirming the open-source Arabic/English TTS model, Modern Standard Arabic support, 150M size, accepts text with or without tashkeel, voice cloning, and reported short-text latency. |
| SILMA Hugging Face launch article | https://huggingface.co/blog/silma-ai/opensource-arabic-english-text-to-speech-model | Primary launch article describing SILMA TTS v1 as a 150M Arabic/English model released under Apache-2.0, with Arabic text handling, chunking, normalization, and robustness improvements. |
| SILMA Arabic TTS benchmark | https://silma.ai/arabic-tts-benchmark | Confirms Arabic TTS quality still needs side-by-side listening because standard automatic metrics miss Arabic naturalness details. |
| Habibi-TTS | https://github.com/SWivid/Habibi-TTS | Optional MSA voice comparison path; specialized MSA model is Apache-2.0, while unified/SAU/UAE variants are non-commercial. |
| Habibi-TTS paper | https://arxiv.org/abs/2601.13802 | 2026 open-source Arabic TTS research source for the multi-dialect Habibi family and benchmark. |
| Mishkala Tashkeel | https://huggingface.co/flokymind/mishkala | Apache-2.0 lightweight Arabic diacritization model; track as a pronunciation-preprocessor benchmark, not a default, because automatic tashkeel can change perceived meaning or sound distracting if wrong. |
| Tashkeel-350M | https://huggingface.co/Etherll/Tashkeel-350M | Apache-2.0 350M Arabic diacritization model; benchmark beside Mishkala on the same cleaned TTS sample because better pronunciation must be proven by listening and meaning preservation, not assumed from model size. |
| Mushkil | https://huggingface.co/riotu-lab/mushkil | Apache-2.0 AraT5V2 Arabic diacritization model; keep as another pronunciation-preprocessor benchmark beside Mishkala and Tashkeel-350M because automatic harakat can help pronunciation but must preserve meaning and listening comfort. |
| Thaka KSAA-2026 speech diacritization | https://arxiv.org/abs/2605.25928 and https://www.codabench.org/competitions/11859/ | Late-May 2026 KSAA shared-task winning paper for Arabic speech/text diacritization; track as a research signal only because it describes a CATT-Whisper ensemble and benchmark result, not a simple permissive model to deploy in the PDF-to-audio worker. |
| 3arab-TTS 500M | https://huggingface.co/sherif1313/3arab-TTS-500M-v1 | Apache-2.0 Arabic-only 500M text-to-speech model, updated in late May 2026; benchmark externally against SILMA/Habibi because it is new and not yet proven for long audiobook passages. |
| 3arab-TTS-500M-v1-VoiceDesign | https://huggingface.co/sherif1313/3arab-TTS-500M-v1-VoiceDesign | Apache-2.0 VoiceDesign variant updated June 2026, with selectable voice styles; use the same cleaned Arabic sample for manual listening tests before app wiring. |
| KaniTTS Arabic | https://huggingface.co/nineninesix/kani-tts-400m-ar | Arabic-only 400M TTS model with high-speed claims; Hugging Face metadata currently reports lfm1.0 even though the page text describes Apache-style licensing, so benchmark externally and confirm license fit before app wiring. |
| Emirati VITS Male | https://huggingface.co/vadimbelsky/emirati-vits-male-1.0 | Apache-2.0 bilingual Emirati Arabic/English VITS voice; useful for Gulf dialect comparison, but keep it benchmark-only for MSA books unless same-text listening tests beat SILMA/Habibi. |
| VoxCPM2 | https://huggingface.co/openbmb/VoxCPM2 | Apache-2.0 multilingual TTS with Arabic among 30 supported languages, 2B parameters, and 48 kHz output; track as a strong-worker voice benchmark candidate. |
| VoxCPM paper | https://arxiv.org/abs/2509.24650 | Research background for VoxCPM/VoxCPM2 tokenizer-free multilingual TTS and open Apache-2.0 release. |
| Voxtral TTS | https://huggingface.co/mistralai/Voxtral-4B-TTS-2603 | Open-weight Mistral TTS model with Arabic among 9 supported languages, but the model card lists cc-by-nc-4.0 and 4B GPU-oriented deployment, so keep it personal/non-commercial and external. |
| Voxtral TTS paper | https://arxiv.org/abs/2603.25551 | Research background for Voxtral TTS quality and multilingual voice-cloning claims. |
| MOSS-TTS-Nano | https://github.com/OpenMOSS/MOSS-TTS-Nano | Apache-2.0 multilingual 0.1B TTS model with Arabic support, packaged CLI, and ONNX CPU path; track as a CPU-friendly voice benchmark candidate before wiring into the app. |
| Supertonic 3 | https://huggingface.co/Supertone/supertonic-3 | OpenRAIL-licensed 99M on-device TTS model with Arabic support and ONNX CPU inference; wired as an optional local benchmark voice, not the Arabic-first default. |
| Kyutai Pocket TTS | https://kyutai.org/tts | Current official page is attractive for CPU real-time TTS, but its listed Pocket TTS languages are English, French, German, Spanish, Portuguese, and Italian, not Arabic, so it is excluded from the Arabic voice candidate list until Arabic support appears. |
| OmniVoice | https://huggingface.co/k2-fsa/OmniVoice | Apache-2.0 0.6B zero-shot TTS with 646-language coverage, Arabic included, high current usage, and published 2026 OmniVoice evidence; benchmark as the priority permissive strong-worker voice after SILMA/Habibi. |
| OmniVoice Arabic LoRA | https://huggingface.co/vivooglobal/omnivoice-lora-ar | Apache-2.0 Arabic LoRA adapter for OmniVoice intended to improve Arabic zero-shot voice cloning; benchmark after base OmniVoice works. |
| Arabic-text-to-speech OmniVoice | https://huggingface.co/bilalRHCH/Arabic-text-to-speech | Apache-2.0 Arabic-labeled OmniVoice packaging with 646-language OmniVoice support and a demo Space signal; keep as a same-sample strong-worker benchmark until it proves long-form MSA audiobook quality against SILMA/Habibi. |
| Lahgtna OmniVoice v2 | https://huggingface.co/oddadmix/lahgtna-omnivoice-v2 | New Arabic-dialect OmniVoice fine-tune with broad dialect tags and diacritics support; benchmark externally for dialectal content and confirm licensing before production wiring. |
| TADA multilingual TTS | https://huggingface.co/HumeAI/tada-3b-ml | Free/open-weight multilingual TTS model under the Llama 3.2 license, with Arabic aligner support and text-acoustic alignment to reduce off-script speech; benchmark externally only after checking license fit because it is a 3B-class strong-worker option, not the practical default. |
| Lahgtna Chatterbox | https://huggingface.co/oddadmix/lahgtna-chatterbox-v1 | MIT Arabic dialect TTS benchmark candidate based on Chatterbox; useful for dialectal speech tests, but the model card notes repetition can occur. |
| NAMAA-Saudi-TTS | https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS | MIT Saudi Arabic Chatterbox Multilingual fine-tune; benchmark only for Saudi/Gulf dialect fit because the card says it targets everyday Saudi speech rather than MSA books. |
| NAMAA-Saudi-TTS-V2 | https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS-V2 | Newer Najdi/Saudi Habibi/F5-TTS voice-cloning fine-tune; do not use as the free default because it is CC-BY-NC-SA-4.0, dialect-specific, reference-audio based, and not deployed by an inference provider. |
| NAMAA-Egyptian-TTS | https://huggingface.co/NAMAA-Space/NAMAA-Egyptian-TTS | MIT Egyptian Arabic Chatterbox Multilingual fine-tune with local/hosted inference examples and live demo; benchmark only for Egyptian/dialectal text because it targets everyday Egyptian speech, not MSA books, and the card notes number/pronunciation limitations. |
| Saudi Chatterbox fine-tune | https://huggingface.co/FatimahEmadEldin/saudi-tts-chatterbox-finetuned | Apache-2.0 Saudi Arabic Chatterbox Multilingual fine-tune; compare externally with NAMAA-Saudi-TTS and Saudi Qwen3-TTS for Gulf dialect material. |
| Saudi TTS | https://huggingface.co/AhmedEladl/saudi-tts | Apache-2.0 high-quality Saudi Arabic dialect TTS candidate; benchmark externally with the same cleaned sample because it is dialect-specific and not a proven MSA audiobook voice. |
| Egyptian Arabic Chatterbox | https://huggingface.co/AliAbdallah/egyptian-arabic-tts-chatterbox | Apache-2.0 Egyptian Arabic Chatterbox fine-tune with 120 hours of clean Egyptian Arabic data; benchmark only for Egyptian/dialectal text because it is single-speaker, GPU-oriented, and may not perform well on non-Egyptian Arabic. |
| NileTTS-XTTS | https://huggingface.co/KickItLikeShika/NileTTS-XTTS | Apache-2.0 Egyptian Arabic XTTS fine-tune from the 2026 NileTTS paper; benchmark only for Egyptian/dialectal content because it is optimized for Egyptian Arabic rather than MSA books. |
| Arabic XTTS-v2 Egyptian fine-tune | https://huggingface.co/Moeeldouma/arabic-tts-xtts-v2 | Recent Arabic XTTS-v2 improvement project with Egyptian speaker fine-tuning and documented same-text comparisons; benchmark only for dialectal content because the XTTS-v2 base uses the Coqui Public Model License and the setup is not the permissive default path. |
| NileTTS paper | https://arxiv.org/abs/2602.15675 | Research source for the NileTTS dataset/model; reports 38 hours of Egyptian Arabic speech and open resources, making it useful as a dialect benchmark but not a general MSA default. |
| Chatterbox-Multilingual | https://github.com/resemble-ai/chatterbox | MIT multilingual TTS/voice-cloning candidate that lists Arabic support; benchmark externally on the same cleaned Arabic sample before wiring. |
| Chatterbox Arabic fine-tune | https://huggingface.co/juliardi/chatterbox-multilingual-finetuned-arabic | MIT Arabic-focused Chatterbox adapter claiming improved Arabic pronunciation, diacritics, MSA support, and common dialect support; benchmark on the same cleaned MSA book passage before considering app wiring. |
| Chatterbox-Multilingual ONNX | https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX | MIT ONNX packaging for Chatterbox-Multilingual with Arabic support; useful as a CPU/ONNX voice benchmark before wiring. |
| tts-arabic-onnx | https://huggingface.co/nipponjo/tts-arabic-onnx | Arabic-only ONNX FastPitch/MixerTTS package with speaker, pace, vocoder, and vowelizer options; benchmark as a compact CPU candidate but confirm overall model/repo licensing before production. |
| Spark-TTS Arabic | https://huggingface.co/azeddinShr/Spark-TTS-Arabic-Complete | Apache-2.0 Spark-TTS Arabic fine-tune on ClArTTS; promising for Classical/MSA tests but requires the Spark-TTS repo, reference workflow, and diacritized input. |
| Sofelia-TTS | https://huggingface.co/hamdallah/Sofelia-TTS | Apache-2.0 Palestinian Arabic TTS/voice-cloning model; useful for dialect tests, not a default MSA audiobook voice. |
| Arabic-F5-TTS-v2 | https://huggingface.co/IbrahimSalah/Arabic-F5-TTS-v2 | Arabic MSA voice candidate that is not a default because it is non-commercial and requires fully diacritized text. |
| Qwen3-TTS 0.6B Base | https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base | Apache-2.0 TTS family, but current official released language list does not include Arabic, so do not promote for this Arabic reader yet. |
| Qwen3-TTS 1.7B Base | https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base | Apache-2.0 larger base model in the same family; still not promoted by itself for Arabic because the official released language list excludes Arabic. |
| Egyptian Arabic Qwen3-TTS | https://huggingface.co/itshamdi404/Egy_Arabic_Qwen3-TTS-12Hz-1.7B-Base | Apache-2.0 Qwen3-TTS fine-tune for Egyptian Arabic; benchmark externally only for Egyptian/dialectal content because it is 1.7B, not inference-provider deployed, and not the MSA default for books. |
| Saudi Arabic Qwen3-TTS | https://huggingface.co/vadimbelsky/qwen3-TTS-KSA | Apache-2.0 Qwen3-TTS fine-tune for Saudi/KSA Arabic; benchmark externally for Saudi/Gulf dialect fit, not as the MSA default. |
| Emirati Qwen3.5-TTS | https://huggingface.co/vadimbelsky/qwen3.5-TTS-Emirati | Apache-2.0 Qwen3-TTS-family fine-tune for Emirati Arabic; benchmark beside Emirati VITS Male when Gulf pronunciation matters. |
| Qwen3-TTS technical report | https://arxiv.org/abs/2601.15621 | Confirms the Qwen3-TTS release and Apache-2.0 status, while the official model card remains the source for released language coverage. |
| MMS Arabic TTS | https://huggingface.co/facebook/mms-tts-ara | Useful hosted fallback for testing, but non-commercial licensing keeps it out of the permissive local default. |
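
The diacritization rows above (Mishkala, Tashkeel-350M, Mushkil) stress that automatic tashkeel must preserve meaning. One cheap guard is to confirm that a diacritizer only added or changed harakat and left the base letters alone. The sketch below assumes the harakat set is U+064B to U+0652 plus the dagger alif U+0670; the function names are hypothetical. It is a necessary check only and does not replace the listening tests described above, since wrong harakat on unchanged letters can still shift meaning.

```python
import re

# Assumed harakat set: fathatan..sukun (U+064B-U+0652) plus dagger alif (U+0670).
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Remove the assumed harakat set, leaving base letters and spacing."""
    return HARAKAT.sub("", text)


def preserves_base_text(original: str, diacritized: str) -> bool:
    """True when the two strings are identical once harakat are removed."""
    return strip_harakat(original) == strip_harakat(diacritized)


def diacritic_density(text: str) -> float:
    """Harakat per base letter; a rough signal of how fully a text is vocalized."""
    letters = [ch for ch in strip_harakat(text) if ch.isalpha()]
    marks = HARAKAT.findall(text)
    return len(marks) / len(letters) if letters else 0.0
```

A failed `preserves_base_text` check is a reason to discard the diacritized output and send the original text to TTS instead.
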
Hosting Sources
| Source | Link | Evidence Used |
|---|---|---|
| Vercel FastAPI deployment | https://vercel.com/docs/frameworks/backend/fastapi | Documents how the FastAPI website shell is deployed on Vercel. |
| Vercel environment variables | https://vercel.com/docs/environment-variables | Documents how the deployment's required configuration is supplied through environment variables. |
| Vercel Functions limits | https://vercel.com/docs/functions/limitations/ | Python/Node functions are useful for the shell but have finite memory/duration and a 4.5 MB request/response body limit, so 100 MB+ PDFs and generated audio must bypass Vercel Functions. |
| Vercel Blob usage and pricing | https://vercel.com/docs/vercel-blob/usage-and-pricing | Optional permanent hosted audio storage has a Hobby free allowance, but audio downloads consume storage, operation, transfer, and edge-request quota, so it is not the default free path. |
| Hugging Face Docker Spaces | https://huggingface.co/docs/hub/main/en/spaces-sdks-docker | Documents the free Docker-based packaging path for the worker. |
| Hugging Face Spaces overview | https://huggingface.co/docs/hub/main/spaces-overview | CPU Basic Spaces are currently free with 2 vCPU, 16 GB RAM, and 50 GB non-persistent disk by default; good for demos/small jobs, but cold starts and ephemeral storage mean audio should be treated as short-lived. |
| Hugging Face Hub storage limits | https://huggingface.co/docs/hub/main/storage-limits | Confirms generous public Hub storage but a bounded private free tier; useful context for not treating generated private audiobook files as unlimited archival storage. |
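
The Vercel Functions row above implies a routing decision: anything near the 4.5 MB body limit has to go straight to the worker. A minimal sketch of that decision follows. The 4,000,000-byte threshold is a deliberately conservative assumption that leaves headroom for multipart overhead, and the route labels are hypothetical names, not identifiers from this project.

```python
# Assumption: stay well under the documented 4.5 MB body limit so that
# multipart boundaries and headers cannot push a request over it.
SAFE_FUNCTION_BODY_BYTES = 4_000_000


def choose_upload_route(size_bytes: int, safe_limit: int = SAFE_FUNCTION_BODY_BYTES) -> str:
    """Pick where a PDF upload or audio download should travel.

    Returns "vercel_function" for small payloads and "direct_to_worker"
    for anything that could exceed the function body limit.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "vercel_function" if size_bytes <= safe_limit else "direct_to_worker"
```

With this rule a 100 MB scanned book and any generated audiobook file both bypass the function, while small metadata requests can still pass through the shell.
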
Practical Conclusion
The current best free practical process is:
- Use PyMuPDF embedded text first.
- Use maximum Arabic OCR for scanned pages.
- Benchmark a representative 5-page sample before a full book.
- Keep QARI-OCR, QARI-OCR 0.4 GGUF, Tawkeed OCR, KATIB, Arabic-Qwen3.5-OCR-v4, aNS Qwen3-VL Arabic OCR v3, Waraqon v3 Arabic OCR HTML Qari, Arabic-GLM-OCR-v2, DeepSeek-OCR-2, DeepSeek Arabic OCR v6, Loay Arabic-OCR-DeepSeek-OCR-2, Arabic-English handwritten OCR Qwen3-VL, Arabic-English handwritten OCR v3, Arabic handwritten OCR 4-bit Qwen2.5-VL, NAKBA Arabic manuscript line OCR baseline, HAFITH, Glimpse RTL OCR, olmOCR Arabic LoRA v2, Arabic OCR Qwen2.5-VL GGUF, Baseer, Ketaba-OCR, Qari-OCR-LoRA, DIMI Arabic OCR v2, Loay Arabic-OCR-Qwen2.5-VL-7B, Arabic Legal Documents OCR 1.0, PaddleOCR-VL, oi-OCR, NuExtract3, Qianfan-OCR, Chandra OCR 2, dots.ocr, Arabic Large Nougat, DocTR Arabic FAST/PARSEQ, Kraken/eScriptorium Arabic script, Kairawan/Qalamus manuscript OCR, GLM-OCR Arabic/French documents, mimoha Arabic OCR, Falcon-OCR, AtlasOCR, and Surya optional for strong workers, external services, or short benchmarks.
- Use SILMA as the first local voice to test.
- Compare Mishkala, Tashkeel-350M, Mushkil, Habibi MSA, 3arab-TTS 500M, KaniTTS Arabic, Emirati VITS Male, Supertonic 3, MOSS-TTS-Nano, OmniVoice/Arabic LoRA, Arabic-text-to-speech OmniVoice, Lahgtna OmniVoice v2, Lahgtna Chatterbox, NAMAA-Saudi-TTS, NAMAA-Egyptian-TTS, Saudi Chatterbox fine-tune, Saudi TTS, Egyptian Arabic Chatterbox, NileTTS-XTTS, Arabic XTTS-v2 Egyptian fine-tune, Chatterbox-Multilingual, Chatterbox Arabic fine-tune, Chatterbox-Multilingual ONNX, tts-arabic-onnx, Spark-TTS Arabic, Sofelia-TTS, Egyptian Arabic Qwen3-TTS, Saudi Arabic Qwen3-TTS, Emirati Qwen3.5-TTS, VoxCPM2, Voxtral TTS, and eSpeak NG when pronunciation or runtime needs change. Of these, keep Arabic XTTS-v2 Egyptian fine-tune and Voxtral TTS, along with NAMAA-Saudi-TTS-V2 and Arabic-F5-TTS-v2, personal/license-review only until licensing is verified. Track Thaka KSAA-2026 speech diacritization as research only until code/weights are released. Keep Kyutai Pocket TTS and base Qwen3-TTS excluded until Arabic support is verified.
- Store generated audio on the worker as short-lived downloadable files by default; use Vercel Blob or object storage only when permanent hosted links are worth the free-tier quota tradeoff.
- Verify a deployed worker with both embedded-text and scanned-OCR smoke tests, usable extracted text, OCR-path proof, and real audio file signatures before treating the hosted system as complete.
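
The first three steps in the list above can be sketched as two small helpers: one decides per page whether embedded text is usable or OCR is needed, and one picks a spread-out page sample for the benchmark. The 30-letter threshold, the function names, and the assumption that the book is predominantly Arabic are all illustrative. Page texts could come from PyMuPDF, for example `[page.get_text() for page in fitz.open(path)]`, but the helpers themselves only need plain strings.

```python
def arabic_letter_count(text: str) -> int:
    """Count letters in the basic Arabic Unicode block (U+0600-U+06FF)."""
    return sum(1 for ch in text if "\u0600" <= ch <= "\u06FF" and ch.isalpha())


def needs_ocr(page_text: str, min_arabic_letters: int = 30) -> bool:
    """Treat a page as scanned when its embedded text has too few Arabic letters.

    The threshold is a hypothetical default and should be tuned on real pages.
    """
    return arabic_letter_count(page_text) < min_arabic_letters


def plan_pages(page_texts):
    """Label each page 'embedded' or 'ocr' using the embedded-text-first rule."""
    return ["ocr" if needs_ocr(text) else "embedded" for text in page_texts]


def sample_pages(total_pages: int, k: int = 5):
    """Pick up to k zero-based page indices spread evenly across the book."""
    if total_pages <= 0 or k <= 0:
        return []
    if total_pages <= k:
        return list(range(total_pages))
    if k == 1:
        return [total_pages // 2]
    # Integer arithmetic keeps indices distinct and includes first and last page.
    return [(i * (total_pages - 1)) // (k - 1) for i in range(k)]
```

Evenly spaced indices are only a starting point; a representative sample may also need hand-picked pages with tables, footnotes, or heavy diacritics.
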
Run `python scripts\prove_local_readiness.py --refresh-research` to save a local readiness report before deployment. Run `python scripts\prove_live_deployment.py` after the hosted worker is live.
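
The "real audio file signatures" check in the list above can be illustrated with a magic-byte test on the first bytes of a downloaded file. This is a sketch under stated assumptions, not the logic of the project's proof scripts: the function name is hypothetical, and only WAV, Ogg, FLAC, and MP3 headers are recognized.

```python
from typing import Optional


def audio_kind(header: bytes) -> Optional[str]:
    """Identify a container from its leading bytes, or return None.

    Pass at least the first 12 bytes of the file.
    """
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 file that starts with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (11 set bits)
    return None
```

A `None` result catches the common failure where a hosted worker returns an HTML or JSON error body with an audio file name, which a status-code check alone would miss.
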