Source Evidence For The Free Arabic PDF-To-Audio Stack

Last checked: June 7, 2026.

This file records why each source matters to the current recommendation. Each entry is kept to a single line so the list can be checked quickly before deployment.

See `docs/huggingface-model-metadata.md` for the latest tracked Hugging Face model IDs, licenses, and last-modified snapshots used by this recommendation.

OCR Sources

Source	Link	Evidence Used
EasyOCR	https://github.com/JaidedAI/EasyOCR	Free local OCR engine with Arabic support; useful fallback for older scans and layouts.
PaddleOCR PP-OCRv5 multilingual recognition	https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.en.md	Documents Arabic-script recognition model support, including `arabic_PP-OCRv5_mobile_rec`.
PaddleOCR OCR pipeline	https://www.paddleocr.ai/main/en/version3.x/pipeline_usage/OCR.html	Confirms the current PaddleOCR OCR pipeline interface used by the sidecar.
PaddleOCR latest docs	https://www.paddleocr.ai/latest/en/index.html	Tracks current PaddleOCR releases and document parser direction.
PP-OCRv5 paper	https://arxiv.org/abs/2603.24373	Supports keeping lightweight PP-OCRv5 as the default OCR before trying heavier VLM OCR.
QARI-OCR 0.4 model	https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct	Arabic-specific OCR VLM trained for Islamic books and Arabic manuscripts, Apache-2.0, used as the optional strong-worker Arabic OCR upgrade; current model card shows no hosted inference provider, so it needs a local/worker runtime.
QARI-OCR 0.4 GGUF	https://huggingface.co/marwan-osama/Qari-OCR-0.4.0-VL-4B-Instruct-GGUF	Newer GGUF packaging signal for QARI-OCR 0.4; keep benchmark-only until it matches the wired QARI sidecar on the same Arabic pages and its packaging/license metadata is confirmed.
QARI-OCR v0.3 lighter model	https://huggingface.co/NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct	Smaller configurable QARI fallback for workers that cannot handle the 4B model.
QARI-OCR paper	https://arxiv.org/abs/2506.02295	Research background for QARI's Arabic OCR/document understanding role.
KATIB 0.8B Arabic OCR model	https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1	Apache-2.0 Arabic OCR VLM fine-tuned for Arabic printed and handwritten text; wired as the smaller optional Arabic-trained OCR sidecar.
Ketaba-OCR LoRA	https://huggingface.co/HassanB4/Ketaba-OCR-LoRA	Apache-2.0 Arabic manuscript OCR LoRA benchmark candidate; not wired because it needs a separate base VLM plus adapter setup, but worth testing when QARI/KATIB/PaddleOCR struggle.
Qari-OCR-LoRA	https://huggingface.co/HassanB4/Qari-OCR-LoRA	Apache-2.0 experimental QARI-family manuscript OCR LoRA from the NakbaNLP 2026 Arabic manuscript task; keep as a secondary external benchmark after Ketaba because the model card says Ketaba was the primary winning submission.
Tawkeed OCR	https://huggingface.co/tawkeed-sa/tawkeed-ocr	Apache-2.0 Arabic-first OCR model forked from QARI-OCR v0.3 and fine-tuned for Arabic documents, handwriting, and scene text; wired as an optional sidecar for short-sample benchmarks when QARI-OCR 0.4 is too heavy or edge-style Arabic OCR matters.
PaddleOCR-VL-1.6 model	https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.6	Apache-2.0 0.9B document parser model; useful as a strong-worker OCR/layout benchmark after the smaller Arabic-specific stack is tested.
PaddleOCR-VL-1.6 paper	https://arxiv.org/abs/2606.03264	June 2026 research background for the current PaddleOCR-VL-1.6 document parsing upgrade.
oi-OCR	https://huggingface.co/oi-uae/oi-OCR	Apache-2.0 English/Arabic PDF document parser with April 2026 ParseBench claims; keep as an external structured-document benchmark, not the Arabic-book default.
Falcon-OCR	https://huggingface.co/tiiuae/Falcon-OCR	Apache-2.0 compact 300M document OCR VLM; useful watchlist candidate for strong-worker Arabic benchmarks, not default until it beats the Arabic-specific stack.
Falcon Perception paper	https://arxiv.org/abs/2603.27365	Research background for Falcon-OCR's early-fusion document OCR design and public benchmark claims.
Baseer OCR model	https://huggingface.co/AbdoTarek/Baseer-OCR-V1.0	Apache-2.0 Arabic document OCR VLM fine-tuned from Qwen2-VL-2B for complex Arabic legal documents, multi-column layouts, stamps, tables, and handwritten/printed Arabic; wired as an optional strong-worker sidecar.
Baseer OCR paper	https://arxiv.org/abs/2509.18174	Research background for Baseer as a future Arabic OCR comparison.
Arabic-GLM-OCR-v2	https://huggingface.co/sherif1313/Arabic-GLM-OCR-v2	New Apache-2.0 Arabic OCR VLM; wired as an optional sidecar and still benchmarked on short samples before full-book use because claims need independent scoring on the actual book pages.
Arabic-Qwen3.5-OCR-v4	https://huggingface.co/sherif1313/Arabic-Qwen3.5-OCR-v4	Recent Apache-2.0 0.9B Arabic OCR VLM for printed, handwritten, classical, and diacritic-heavy Arabic; wired as an optional sidecar and benchmarked before full-book use.
aNS Qwen3-VL Arabic OCR v3	https://huggingface.co/aNS2024/qwen3-vl-arabic-ocr-v3	Fresh Qwen3-VL-2B Arabic OCR fine-tune; the public card shows no hosted inference provider and sparse OCR/license evidence, so keep it external until it beats QARI/KATIB/Arabic-Qwen/Baseer on the same selected Arabic pages.
Waraqon v3 Arabic OCR HTML Qari	https://huggingface.co/FatimahEmadEldin/Waraqon-v3-Arabic-OCR-HTML-Qari	Apache-2.0 Qari-family Arabic OCR fine-tune for HTML/structured output; keep external until normalized readable text beats QARI/KATIB/Arabic-Qwen/Baseer on the same pages because audiobook generation needs faithful continuous Arabic text more than raw markup.
DeepSeek-OCR-2	https://huggingface.co/deepseek-ai/DeepSeek-OCR-2	Official Apache-2.0 3B DeepSeek OCR successor with 2026 paper/model-card references and public document OCR benchmark results; keep external because it is not Arabic-specific and needs GPU/large-worker inference.
DeepSeek Arabic OCR v6	https://huggingface.co/melsiddieg/deepseek_ocr_arabic_v6	Apache-2.0 Arabic-labeled DeepSeek-OCR fine-tune; newer than the v4/v5 Arabic fine-tunes, but keep benchmark-only because the card has sparse Arabic-book evidence and no inference provider deployment.
Loay Arabic-OCR-DeepSeek-OCR-2	https://huggingface.co/loay/Arabic-OCR-DeepSeek-OCR-2	Apache-2.0 merged DeepSeek-OCR-2 Arabic fine-tune for high-precision OCR and structural layout analysis; keep benchmark-only until it beats QARI/KATIB/Arabic-Qwen/Baseer on the same selected Arabic book pages.
Arabic-English handwritten OCR Qwen3-VL	https://huggingface.co/sherif1313/Arabic-English-handwritten-OCR-Qwen3-VL-4B	Apache-2.0 Qwen3-VL-4B handwritten Arabic/English OCR watchlist model; keep external because the card says it is research-oriented and not deployed by inference providers.
Arabic-English handwritten OCR v3	https://huggingface.co/sherif1313/Arabic-English-handwritten-OCR-v3	Apache-2.0 Qwen2.5-VL 3B-class Arabic/English handwritten OCR watchlist model; keep external for handwriting/manuscript-heavy pages because it is large and not deployed by inference providers.
Arabic handwritten OCR 4-bit Qwen2.5-VL	https://huggingface.co/sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v3	Apache-2.0 4-bit Arabic handwritten OCR checkpoint with about 2.44 GB of model assets; keep external as a lighter handwriting/manuscript benchmark when the full handwritten model is too heavy.
NAKBA Arabic manuscript line OCR baseline	https://huggingface.co/U4RASD/ar-ms-baseline	NAKBA NLP 2026 Arabic manuscript understanding baseline fine-tuned from Qwen3-VL-8B for line-image transcription; keep external and line-level only until license fit and page-cropping workflow are proven.
HAFITH	https://huggingface.co/mdnaseif/hafith	Apache-2.0 historical Arabic manuscript recognition model with Arabic-native tokenization and 5.10% CER claims; keep external and line-level only because the model card says it requires pre-segmented text lines.
Glimpse RTL OCR	https://huggingface.co/surfiniaburger/unsloth_finetune_ocr_arabic	Apache-2.0 Arabic/Persian RTL text-line OCR model with 6.97% CER claims on unseen RTL text lines; keep external and line-level only until page-cropping workflow and same-book accuracy are proven.
olmOCR Arabic LoRA v2	https://huggingface.co/hastyle/olmOCR-arabic-lora-v2	Apache-2.0 Arabic manuscript OCR LoRA for full-page manuscript images; keep external/heavy because it needs the 7B olmOCR base and base license/runtime confirmation.
Arabic OCR Qwen2.5-VL GGUF	https://huggingface.co/mo1998/arabic-ocr-qwen2.5-vl	QariOCR-v0.3-trained Arabic/English OCR fine-tune on a Qwen2.5-VL 7B GGUF/Unsloth path; keep external with license confirmation because it is large and not inference-provider deployed.
Qwen3-VL Persian/Arabic line OCR	https://huggingface.co/mohajesmaeili/Qwen3-VL-2B-Persian-Arabic-Ocr-v1.0	Apache-2.0 Qwen3-VL 2B Persian/Arabic OCR watchlist model; keep external unless pages are cropped into text lines because the model card says it was trained on individual lines and is not designed for full-page OCR.
DIMI Arabic OCR v2	https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR-V2	Apache-2.0 Arabic OCR LoRA fine-tuned from Qwen2.5-VL-7B; strong external benchmark candidate for printed Arabic and diacritics-heavy pages, but too heavy for the normal free worker default.
Loay Arabic-OCR-Qwen2.5-VL-7B	https://huggingface.co/loay/Arabic-OCR-Qwen2.5-VL-7B-Vision	Arabic OCR VLM fine-tuned from Qwen2.5-VL-7B for Arabic text in images; keep as a high-capacity external benchmark because 7B-class runtime is too heavy for the normal free family-worker default.
AtlasOCR	https://huggingface.co/atlasia/AtlasOCR	First open-source Darija/Moroccan Arabic OCR model; useful only for Darija-specific PDFs and needs license confirmation before production use.
NuExtract3	https://huggingface.co/numind/NuExtract3	Apache-2.0 4B multilingual document understanding model for OCR, document-to-Markdown, tables, forms, invoices, contracts, and multi-page PDFs; benchmark externally for complex layouts but keep QARI/KATIB/Arabic-Qwen first for Arabic books.
Qianfan-OCR	https://huggingface.co/baidu/Qianfan-OCR	Apache-2.0 5B multilingual document-intelligence OCR/VLM; benchmark externally only on large workers because it is not Arabic-book-specific and is too heavy for the default free family-worker path.
Chandra OCR 2	https://github.com/datalab-to/chandra	Recent 4B multilingual OCR/layout model supporting Arabic among 90+ languages, strong for structured Markdown/HTML/JSON extraction, tables, forms, and handwriting; code is Apache-2.0 but weights are modified OpenRAIL-M, so keep it external and benchmark-only.
dots.ocr	https://huggingface.co/rednote-hilab/dots.ocr	MIT compact multilingual document parser that unifies layout detection and content recognition, with reading-order, table, and formula support; benchmark externally for difficult Arabic layouts but keep Arabic-trained OCR first for books.
Arabic Large Nougat	https://huggingface.co/MohamedRashad/arabic-large-nougat	GPL-3.0 Arabic book-page OCR-to-Markdown model; useful as an external benchmark for structured Arabic book text, but keep out of the default public hosted worker because of GPL licensing and hallucination/context-length caveats.
DocTR Arabic FAST/PARSEQ	https://huggingface.co/madskills/doctr-fast_base-arabic and https://huggingface.co/madskills/doctr-parseq-arabic	Apache-2.0 Arabic FAST detector paired with an Arabic PARSEQ recognizer; keep benchmark-only until recognizer licensing and same-page book accuracy are confirmed.
Kraken/eScriptorium Arabic script	https://kraken.re/main/index.html and https://escriptorium.eu/about	Free/open-source ATR/OCR workflow for historical and non-Latin scripts. Benchmark externally for historical Arabic print or manuscript-like PDFs with an Arabic-script Kraken model, but keep it out of the default family-worker path because model choice, line segmentation, and per-book training can matter.
Kairawan/Qalamus manuscript OCR	https://kairawan.org/	Free 2026 Arabic and Islamic manuscript transcription service signal; useful as an external manuscript benchmark only because the reusable engine/package license, privacy/API terms, and worker integration path are not established.
GLM-OCR Arabic/French documents	https://huggingface.co/maloukafer/GLM-OCR-finetuned-documents	Recent GLM-OCR LoRA fine-tune for Arabic/French administrative and scanned documents; useful for forms or newspapers, not the Arabic-book default.
mimoha Arabic OCR	https://huggingface.co/mimoha/ocr	Apache-2.0 Arabic OCR card that says it extracts Arabic text from images, but the public card is sparse, so keep it low-priority and external.
Raqim post-OCR correction	https://www.sciencedirect.com/science/article/pii/S187705092600058X	2026 Arabic OCR correction research using dictionary and LLM correction; useful to track, but not wired because automatic correction can alter exact book or religious wording before TTS.
Arabic Legal Documents OCR 1.0	https://huggingface.co/bakrianoo/arabic-legal-documents-ocr-1.0	Recent Gemma-licensed Arabic legal/scanned-document OCR VLM; benchmark externally only for legal or form-like PDFs because it is domain-specific and not permissive enough for the default family audiobook stack.
Surya	https://github.com/datalab-to/surya	Heavy OCR/layout path to test only on strong workers.
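Many entries above are gated on beating the Arabic-specific stack on the same selected Arabic pages, and HAFITH and Glimpse report their claims as character error rate (CER). The sketch below is a minimal scoring approach. It assumes you have a hand-checked reference transcription for every benchmark page and computes corpus-level CER per model from plain Levenshtein distance. The normalization choices (NFC, tatweel removal, whitespace collapsing, optional diacritic stripping) and all function names are assumptions, not the project's own benchmark code.

```python
import re
import unicodedata

# Tanween, short vowels, shadda, sukun (U+064B to U+0652) and superscript alef (U+0670).
HARAKAT_RE = re.compile("[\u064B-\u0652\u0670]")

def normalize(text: str, keep_diacritics: bool = True) -> str:
    """Remove layout noise so scores reflect recognition errors, not line breaks."""
    text = unicodedata.normalize("NFC", text).replace("\u0640", "")  # drop tatweel
    if not keep_diacritics:
        text = HARAKAT_RE.sub("", text)
    return " ".join(text.split())  # collapse spaces and line breaks

def levenshtein(a: str, b: str) -> int:
    """Character edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def corpus_cer(references: list[str], hypotheses: list[str], keep_diacritics: bool = True) -> float:
    """Total edits divided by total reference characters over all benchmark pages."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference page")
    refs = [normalize(r, keep_diacritics) for r in references]
    hyps = [normalize(h, keep_diacritics) for h in hypotheses]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference pages are empty")
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / total_chars

def rank_models(references: list[str], outputs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Lowest CER first; outputs maps a model name to its per-page OCR text."""
    scores = {name: corpus_cer(references, pages) for name, pages in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

If you also score with `keep_diacritics=False`, you can see whether a model's lead comes from the letters themselves or only from harakat.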

TTS Sources

Source	Link	Evidence Used
SILMA TTS	https://huggingface.co/silma-ai/silma-tts	Best free, permissively licensed local Arabic voice baseline in this project: Arabic/English, Fusha/MSA, 150M parameters, Arabic normalization and tashkeel support, MIT code, and Apache-2.0 model weights. The current model card shows no hosted inference provider, so production use needs the self-hosted worker runtime.
SILMA open source Arabic TTS models	https://silma.ai/open-source-arabic-tts-models	Official SILMA page confirming the open-source Arabic/English TTS model, Modern Standard Arabic support, 150M parameter size, input with or without tashkeel, voice cloning, and reported short-text latency.
SILMA Hugging Face launch article	https://huggingface.co/blog/silma-ai/opensource-arabic-english-text-to-speech-model	Primary launch article describing SILMA TTS v1 as a 150M Arabic/English model released under Apache-2.0, with Arabic text handling, chunking, normalization, and robustness improvements.
SILMA Arabic TTS benchmark	https://silma.ai/arabic-tts-benchmark	Confirms Arabic TTS quality still needs side-by-side listening because standard automatic metrics miss Arabic naturalness details.
Habibi-TTS	https://github.com/SWivid/Habibi-TTS	Optional MSA voice comparison path; specialized MSA model is Apache-2.0, while unified/SAU/UAE variants are non-commercial.
Habibi-TTS paper	https://arxiv.org/abs/2601.13802	2026 open-source Arabic TTS research source for the multi-dialect Habibi family and benchmark.
Mishkala Tashkeel	https://huggingface.co/flokymind/mishkala	Apache-2.0 lightweight Arabic diacritization model; track as a pronunciation-preprocessor benchmark, not a default, because automatic tashkeel can change perceived meaning or sound distracting if wrong.
Tashkeel-350M	https://huggingface.co/Etherll/Tashkeel-350M	Apache-2.0 350M Arabic diacritization model; benchmark beside Mishkala on the same cleaned TTS sample because better pronunciation must be proven by listening and meaning preservation, not assumed from model size.
Mushkil	https://huggingface.co/riotu-lab/mushkil	Apache-2.0 AraT5V2 Arabic diacritization model; keep as another pronunciation-preprocessor benchmark beside Mishkala and Tashkeel-350M because automatic harakat can help pronunciation but must preserve meaning and listening comfort.
Thaka KSAA-2026 speech diacritization	https://arxiv.org/abs/2605.25928 and https://www.codabench.org/competitions/11859/	Late-May 2026 KSAA shared-task winning paper for Arabic speech/text diacritization; track as a research signal only because it describes a CATT-Whisper ensemble and benchmark result, not a simple permissive model to deploy in the PDF-to-audio worker.
3arab-TTS 500M	https://huggingface.co/sherif1313/3arab-TTS-500M-v1	Apache-2.0 Arabic-only 500M text-to-speech model, updated in late May 2026; benchmark externally against SILMA/Habibi because it is new and not yet proven for long audiobook passages.
3arab-TTS-500M-v1-VoiceDesign	https://huggingface.co/sherif1313/3arab-TTS-500M-v1-VoiceDesign	Apache-2.0 VoiceDesign variant updated June 2026, with selectable voice styles; use the same cleaned Arabic sample for manual listening tests before app wiring.
KaniTTS Arabic	https://huggingface.co/nineninesix/kani-tts-400m-ar	Arabic-only 400M TTS model with high-speed claims; Hugging Face metadata currently reports `lfm1.0` even though the page text describes Apache-style licensing, so benchmark externally and confirm license fit before app wiring.
Emirati VITS Male	https://huggingface.co/vadimbelsky/emirati-vits-male-1.0	Apache-2.0 bilingual Emirati Arabic/English VITS voice; useful for Gulf dialect comparison, but keep it benchmark-only for MSA books unless same-text listening tests beat SILMA/Habibi.
VoxCPM2	https://huggingface.co/openbmb/VoxCPM2	Apache-2.0 multilingual TTS with Arabic among 30 supported languages, 2B parameters, and 48 kHz output; track as a strong-worker voice benchmark candidate.
VoxCPM paper	https://arxiv.org/abs/2509.24650	Research background for VoxCPM/VoxCPM2 tokenizer-free multilingual TTS and open Apache-2.0 release.
Voxtral TTS	https://huggingface.co/mistralai/Voxtral-4B-TTS-2603	Open-weight Mistral TTS model with Arabic among 9 supported languages, but the model card lists `cc-by-nc-4.0` and 4B GPU-oriented deployment, so keep it personal/non-commercial and external.
Voxtral TTS paper	https://arxiv.org/abs/2603.25551	Research background for Voxtral TTS quality and multilingual voice-cloning claims.
MOSS-TTS-Nano	https://github.com/OpenMOSS/MOSS-TTS-Nano	Apache-2.0 multilingual 0.1B TTS model with Arabic support, packaged CLI, and ONNX CPU path; track as a CPU-friendly voice benchmark candidate before wiring into the app.
Supertonic 3	https://huggingface.co/Supertone/supertonic-3	OpenRAIL-licensed 99M-parameter on-device TTS model with Arabic support and ONNX CPU inference; wired as an optional local benchmark voice, not the Arabic-first default.
Kyutai Pocket TTS	https://kyutai.org/tts	Current official page is attractive for CPU real-time TTS, but its listed Pocket TTS languages are English, French, German, Spanish, Portuguese, and Italian, not Arabic, so it is excluded from the Arabic voice candidate list until Arabic support appears.
OmniVoice	https://huggingface.co/k2-fsa/OmniVoice	Apache-2.0 0.6B zero-shot TTS with 646-language coverage, Arabic included, high current usage, and published 2026 OmniVoice evidence; benchmark as the priority permissive strong-worker voice after SILMA/Habibi.
OmniVoice Arabic LoRA	https://huggingface.co/vivooglobal/omnivoice-lora-ar	Apache-2.0 Arabic LoRA adapter for OmniVoice intended to improve Arabic zero-shot voice cloning; benchmark after base OmniVoice works.
Arabic-text-to-speech OmniVoice	https://huggingface.co/bilalRHCH/Arabic-text-to-speech	Apache-2.0 Arabic-labeled OmniVoice packaging with 646-language OmniVoice support and a demo Space signal; keep as a same-sample strong-worker benchmark until it proves long-form MSA audiobook quality against SILMA/Habibi.
Lahgtna OmniVoice v2	https://huggingface.co/oddadmix/lahgtna-omnivoice-v2	New Arabic-dialect OmniVoice fine-tune with broad dialect tags and diacritics support; benchmark externally for dialectal content and confirm licensing before production wiring.
TADA multilingual TTS	https://huggingface.co/HumeAI/tada-3b-ml	Free/open-weight multilingual TTS model under the Llama 3.2 license, with Arabic aligner support and text-acoustic alignment to reduce off-script speech; benchmark externally only after checking license fit because it is a 3B-class strong-worker option, not the practical default.
Lahgtna Chatterbox	https://huggingface.co/oddadmix/lahgtna-chatterbox-v1	MIT Arabic dialect TTS benchmark candidate based on Chatterbox; useful for dialectal speech tests, but the model card notes repetition can occur.
NAMAA-Saudi-TTS	https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS	MIT Saudi Arabic Chatterbox Multilingual fine-tune; benchmark only for Saudi/Gulf dialect fit because the card says it targets everyday Saudi speech rather than MSA books.
NAMAA-Saudi-TTS-V2	https://huggingface.co/NAMAA-Space/NAMAA-Saudi-TTS-V2	Newer Najdi/Saudi Habibi/F5-TTS voice-cloning fine-tune; do not use as the free default because it is CC-BY-NC-SA-4.0, dialect-specific, reference-audio based, and not deployed by an inference provider.
NAMAA-Egyptian-TTS	https://huggingface.co/NAMAA-Space/NAMAA-Egyptian-TTS	MIT Egyptian Arabic Chatterbox Multilingual fine-tune with local/hosted inference examples and live demo; benchmark only for Egyptian/dialectal text because it targets everyday Egyptian speech, not MSA books, and the card notes number/pronunciation limitations.
Saudi Chatterbox fine-tune	https://huggingface.co/FatimahEmadEldin/saudi-tts-chatterbox-finetuned	Apache-2.0 Saudi Arabic Chatterbox Multilingual fine-tune; compare externally with NAMAA-Saudi-TTS and Saudi Qwen3-TTS for Gulf dialect material.
Saudi TTS	https://huggingface.co/AhmedEladl/saudi-tts	Apache-2.0 high-quality Saudi Arabic dialect TTS candidate; benchmark externally with the same cleaned sample because it is dialect-specific and not a proven MSA audiobook voice.
Egyptian Arabic Chatterbox	https://huggingface.co/AliAbdallah/egyptian-arabic-tts-chatterbox	Apache-2.0 Egyptian Arabic Chatterbox fine-tune with 120 hours of clean Egyptian Arabic data; benchmark only for Egyptian/dialectal text because it is single-speaker, GPU-oriented, and may not perform well on non-Egyptian Arabic.
NileTTS-XTTS	https://huggingface.co/KickItLikeShika/NileTTS-XTTS	Apache-2.0 Egyptian Arabic XTTS fine-tune from the 2026 NileTTS paper; benchmark only for Egyptian/dialectal content because it is optimized for Egyptian Arabic rather than MSA books.
Arabic XTTS-v2 Egyptian fine-tune	https://huggingface.co/Moeeldouma/arabic-tts-xtts-v2	Recent Arabic XTTS-v2 improvement project with Egyptian speaker fine-tuning and documented same-text comparisons; benchmark only for dialectal content because the XTTS-v2 base uses the Coqui Public Model License and the setup is not the permissive default path.
NileTTS paper	https://arxiv.org/abs/2602.15675	Research source for the NileTTS dataset/model; reports 38 hours of Egyptian Arabic speech and open resources, making it useful as a dialect benchmark but not a general MSA default.
Chatterbox-Multilingual	https://github.com/resemble-ai/chatterbox	MIT multilingual TTS/voice-cloning candidate that lists Arabic support; benchmark externally on the same cleaned Arabic sample before wiring.
Chatterbox Arabic fine-tune	https://huggingface.co/juliardi/chatterbox-multilingual-finetuned-arabic	MIT Arabic-focused Chatterbox adapter claiming improved Arabic pronunciation, diacritics, MSA support, and common dialect support; benchmark on the same cleaned MSA book passage before considering app wiring.
Chatterbox-Multilingual ONNX	https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX	MIT ONNX packaging for Chatterbox-Multilingual with Arabic support; useful as a CPU/ONNX voice benchmark before wiring.
tts-arabic-onnx	https://huggingface.co/nipponjo/tts-arabic-onnx	Arabic-only ONNX FastPitch/MixerTTS package with speaker, pace, vocoder, and vowelizer options; benchmark as a compact CPU candidate but confirm overall model/repo licensing before production.
Spark-TTS Arabic	https://huggingface.co/azeddinShr/Spark-TTS-Arabic-Complete	Apache-2.0 Spark-TTS Arabic fine-tune on ClArTTS; promising for Classical/MSA tests but requires the Spark-TTS repo, reference workflow, and diacritized input.
Sofelia-TTS	https://huggingface.co/hamdallah/Sofelia-TTS	Apache-2.0 Palestinian Arabic TTS/voice-cloning model; useful for dialect tests, not a default MSA audiobook voice.
Arabic-F5-TTS-v2	https://huggingface.co/IbrahimSalah/Arabic-F5-TTS-v2	Arabic MSA voice candidate that is not a default because it is non-commercial and requires fully diacritized text.
Qwen3-TTS 0.6B Base	https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base	Apache-2.0 TTS family, but current official released language list does not include Arabic, so do not promote for this Arabic reader yet.
Qwen3-TTS 1.7B Base	https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base	Apache-2.0 larger base model in the same family; still not promoted by itself for Arabic because the official released language list excludes Arabic.
Egyptian Arabic Qwen3-TTS	https://huggingface.co/itshamdi404/Egy_Arabic_Qwen3-TTS-12Hz-1.7B-Base	Apache-2.0 Qwen3-TTS fine-tune for Egyptian Arabic; benchmark externally only for Egyptian/dialectal content because it is 1.7B, not inference-provider deployed, and not the MSA default for books.
Saudi Arabic Qwen3-TTS	https://huggingface.co/vadimbelsky/qwen3-TTS-KSA	Apache-2.0 Qwen3-TTS fine-tune for Saudi/KSA Arabic; benchmark externally for Saudi/Gulf dialect fit, not as the MSA default.
Emirati Qwen3.5-TTS	https://huggingface.co/vadimbelsky/qwen3.5-TTS-Emirati	Apache-2.0 Qwen3-TTS-family fine-tune for Emirati Arabic; benchmark beside Emirati VITS Male when Gulf pronunciation matters.
Qwen3-TTS technical report	https://arxiv.org/abs/2601.15621	Confirms the Qwen3-TTS release and Apache-2.0 status, while the official model card remains the source for released language coverage.
MMS Arabic TTS	https://huggingface.co/facebook/mms-tts-ara	Useful hosted fallback for testing, but non-commercial licensing keeps it out of the permissive local default.
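The tashkeel entries above (Mishkala, Tashkeel-350M, Mushkil) share one caveat: automatic harakat must not change the underlying text. A cheap check to run before any listening is to strip the diacritics from the diacritizer's output and confirm that the remaining letters match the input exactly. A coverage ratio then shows how much of the text was actually vowelized. This minimal sketch assumes the diacritizer returns plain Unicode Arabic, and the function names are hypothetical. It catches added, dropped, or swapped letters but not wrong vowel choices, which still need listening tests.

```python
import re
import unicodedata

# Tanween, short vowels, shadda, sukun (U+064B to U+0652) and superscript alef (U+0670).
HARAKAT_RE = re.compile("[\u064B-\u0652\u0670]")
# Base Arabic letters (hamza to ya), skipping tatweel.
LETTER_RE = re.compile("[\u0621-\u063A\u0641-\u064A]")
# A base letter immediately followed by at least one diacritic.
MARKED_RE = re.compile("[\u0621-\u063A\u0641-\u064A](?=[\u064B-\u0652\u0670])")

def skeleton(text: str) -> str:
    """Letters only: NFC-normalize, drop diacritics and tatweel, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = HARAKAT_RE.sub("", text).replace("\u0640", "")
    return " ".join(text.split())

def letters_preserved(original: str, diacritized: str) -> bool:
    """True when the diacritizer added marks but did not add, drop, or swap letters."""
    return skeleton(original) == skeleton(diacritized)

def diacritic_coverage(text: str) -> float:
    """Share of Arabic letters carrying at least one diacritic (0.0 when there are none)."""
    text = unicodedata.normalize("NFC", text)
    letters = len(LETTER_RE.findall(text))
    return len(MARKED_RE.findall(text)) / letters if letters else 0.0
```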

Hosting Sources

Source	Link	Evidence Used
Vercel FastAPI deployment	https://vercel.com/docs/frameworks/backend/fastapi	Deployment pattern for the FastAPI website shell on Vercel.
Vercel environment variables	https://vercel.com/docs/environment-variables	How the shell's required deployment configuration is set.
Vercel Functions limits	https://vercel.com/docs/functions/limitations/	Python/Node functions are useful for the shell but have finite memory/duration and a 4.5 MB request/response body limit, so 100 MB+ PDFs and generated audio must bypass Vercel Functions.
Vercel Blob usage and pricing	https://vercel.com/docs/vercel-blob/usage-and-pricing	Optional permanent hosted audio storage has a Hobby free allowance, but audio downloads consume storage, operation, transfer, and edge-request quota, so it is not the default free path.
Hugging Face Docker Spaces	https://huggingface.co/docs/hub/main/en/spaces-sdks-docker	Free Docker worker packaging path.
Hugging Face Spaces overview	https://huggingface.co/docs/hub/main/spaces-overview	CPU Basic Spaces are currently free with 2 vCPU, 16 GB RAM, and 50 GB non-persistent disk by default; good for demos/small jobs, but cold starts and ephemeral storage mean audio should be treated as short-lived.
Hugging Face Hub storage limits	https://huggingface.co/docs/hub/main/storage-limits	Confirms generous public Hub storage but a bounded private free tier; useful context for not treating generated private audiobook files as unlimited archival storage.
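The 4.5 MB Vercel Functions body limit is easy to exceed by accident, especially when a client base64-encodes a file into JSON. The routing sketch below decides where an upload should go. It assumes the limit means 4,500,000 bytes and leaves a 10% safety margin. Both numbers are assumptions, so check them against the current Vercel limits page. The route labels are hypothetical.

```python
# Assumption: "4.5 MB" read as decimal megabytes; verify against current Vercel docs.
VERCEL_BODY_LIMIT_BYTES = 4_500_000

def wire_size(raw_bytes: int, base64_encoded: bool = False) -> int:
    """Bytes actually sent in the request body; base64 turns every 3 bytes into 4."""
    if raw_bytes < 0:
        raise ValueError("size cannot be negative")
    return 4 * ((raw_bytes + 2) // 3) if base64_encoded else raw_bytes

def upload_route(raw_bytes: int, base64_encoded: bool = False, headroom: float = 0.9) -> str:
    """Send only small payloads through the Vercel shell; everything else goes to the worker."""
    if wire_size(raw_bytes, base64_encoded) <= VERCEL_BODY_LIMIT_BYTES * headroom:
        return "vercel-function"
    return "direct-to-worker"
```

The same arithmetic applies to generated audio. At a hypothetical 24 kHz, 16-bit mono WAV (48,000 bytes per second), about 94 seconds of speech already reaches 4.5 MB, so audiobook downloads should be served by the worker or object storage.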

Practical Conclusion

The current best free practical process is:

Use PyMuPDF embedded text first.
Use maximum Arabic OCR for scanned pages.
Benchmark a representative 5-page sample before a full book.
Keep the following OCR candidates optional, for strong workers, external services, or short benchmarks only: QARI-OCR, QARI-OCR 0.4 GGUF, Tawkeed OCR, KATIB, Arabic-Qwen3.5-OCR-v4, aNS Qwen3-VL Arabic OCR v3, Waraqon v3 Arabic OCR HTML Qari, Arabic-GLM-OCR-v2, DeepSeek-OCR-2, DeepSeek Arabic OCR v6, Loay Arabic-OCR-DeepSeek-OCR-2, Arabic-English handwritten OCR Qwen3-VL, Arabic-English handwritten OCR v3, Arabic handwritten OCR 4-bit Qwen2.5-VL, NAKBA Arabic manuscript line OCR baseline, HAFITH, Glimpse RTL OCR, olmOCR Arabic LoRA v2, Arabic OCR Qwen2.5-VL GGUF, Baseer, Ketaba-OCR, Qari-OCR-LoRA, DIMI Arabic OCR v2, Loay Arabic-OCR-Qwen2.5-VL-7B, Arabic Legal Documents OCR 1.0, PaddleOCR-VL, oi-OCR, NuExtract3, Qianfan-OCR, Chandra OCR 2, dots.ocr, Arabic Large Nougat, DocTR Arabic FAST/PARSEQ, Kraken/eScriptorium Arabic script, Kairawan/Qalamus manuscript OCR, GLM-OCR Arabic/French documents, mimoha Arabic OCR, Falcon-OCR, AtlasOCR, and Surya.
Use SILMA as the first local voice to test.
When pronunciation or runtime needs change, compare the diacritization preprocessors Mishkala, Tashkeel-350M, and Mushkil on the same cleaned sample, and compare the alternative voices Habibi MSA, 3arab-TTS 500M, KaniTTS Arabic, Emirati VITS Male, Supertonic 3, MOSS-TTS-Nano, OmniVoice/Arabic LoRA, Arabic-text-to-speech OmniVoice, Lahgtna OmniVoice v2, Lahgtna Chatterbox, NAMAA-Saudi-TTS, NAMAA-Egyptian-TTS, Saudi Chatterbox fine-tune, Saudi TTS, Egyptian Arabic Chatterbox, NileTTS-XTTS, Chatterbox-Multilingual, Chatterbox Arabic fine-tune, Chatterbox-Multilingual ONNX, tts-arabic-onnx, Spark-TTS Arabic, Sofelia-TTS, Egyptian Arabic Qwen3-TTS, Saudi Arabic Qwen3-TTS, Emirati Qwen3.5-TTS, VoxCPM2, and eSpeak NG.
Track Thaka KSAA-2026 speech diacritization as research only until code and weights are released.
Keep NAMAA-Saudi-TTS-V2, Arabic-F5-TTS-v2, Arabic XTTS-v2 Egyptian fine-tune, Voxtral TTS, TADA multilingual TTS, and MMS Arabic TTS personal or license-review only, because their licenses are non-commercial or otherwise non-permissive; exclude Kyutai Pocket TTS and base Qwen3-TTS until Arabic support is verified.
Store generated audio on the worker as short-lived downloadable files by default; use Vercel Blob or object storage only when permanent hosted links are worth the free-tier quota tradeoff.
Verify a deployed worker with both embedded-text and scanned-OCR smoke tests, usable extracted text, OCR-path proof, and real audio file signatures before treating the hosted system as complete.
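The first three steps can be sketched with PyMuPDF. The code samples evenly spaced pages, reads each page's embedded text layer, and sends pages with too few Arabic letters to OCR. This is a minimal sketch that assumes PyMuPDF is installed and imported as `fitz`. The 20-letter threshold, the evenly spaced sampling, and the route labels are hypothetical choices, not the worker's actual rules. Before counting, NFKC normalization folds Arabic presentation forms back into ordinary letters, because some PDFs embed those forms instead of base letters.

```python
import re
import unicodedata

# Base Arabic letters (hamza to ya), skipping tatweel, digits, punctuation and harakat.
ARABIC_LETTER_RE = re.compile("[\u0621-\u063A\u0641-\u064A]")

def needs_ocr(page_text: str, min_arabic_letters: int = 20) -> bool:
    """True when the embedded text layer has too few Arabic letters to trust."""
    # NFKC folds Arabic presentation forms and ligatures into base letters.
    folded = unicodedata.normalize("NFKC", page_text)
    return len(ARABIC_LETTER_RE.findall(folded)) < min_arabic_letters

def sample_pages(page_count: int, k: int = 5) -> list[int]:
    """Up to k evenly spaced 0-based page indices, including first and last when k >= 2."""
    if page_count <= 0 or k <= 0:
        return []
    if page_count <= k:
        return list(range(page_count))
    if k == 1:
        return [0]
    step = (page_count - 1) / (k - 1)
    return [round(i * step) for i in range(k)]

def route_sample(pdf_path: str, k: int = 5) -> dict[int, str]:
    """Map each sampled page index to 'embedded-text' or 'ocr' (hypothetical labels)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it
    routes: dict[int, str] = {}
    with fitz.open(pdf_path) as doc:
        for index in sample_pages(doc.page_count, k):
            text = doc[index].get_text("text")
            routes[index] = "ocr" if needs_ocr(text) else "embedded-text"
    return routes
```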

Run `python scripts\prove_local_readiness.py --refresh-research` to save a local readiness report before deployment, and run `python scripts\prove_live_deployment.py` after the hosted worker is live.
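The deployment proof calls for real audio file signatures, not just a successful HTTP response. A failed job can return an HTML or JSON error page that is saved under an audio filename. A minimal check reads the first bytes of each downloaded file and compares them with standard container magic numbers. This sketch assumes the worker emits WAV, MP3, OGG, or FLAC. The function names and the 1 KiB minimum size are hypothetical and are not taken from `prove_live_deployment.py`.

```python
from pathlib import Path
from typing import Optional

def audio_format(header: bytes) -> Optional[str]:
    """Identify common audio containers from their leading magic bytes."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 with a leading ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mpeg-audio"  # 11-bit frame sync: untagged MP3 (ADTS AAC also matches)
    return None

def check_audio_file(path: str, min_bytes: int = 1024) -> str:
    """Return the detected format, or raise ValueError if the file is not plausible audio."""
    file = Path(path)
    size = file.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{file.name}: only {size} bytes, likely an error payload")
    with file.open("rb") as handle:
        kind = audio_format(handle.read(12))
    if kind is None:
        raise ValueError(f"{file.name}: no known audio signature (HTML or JSON error page?)")
    return kind
```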