# Voxtral Quran — AI Quran Assistant

Fine-tune Voxtral Mini 3B for Quran ASR & verse identification, by Ahmed Haytham.

An end-to-end Quranic AI assistant that listens to Quran recitation and tells you exactly which verse it is, with full context.
## ⚠️ Status: Under Construction

This model is in the early stages of training and under active development. Progress is currently limited by insufficient GPU resources; further training and improvements are planned once more compute becomes available.
## Features

- **Arabic Quran ASR** — transcribe Quran recitation with diacritics awareness
- **Verse Identification** — identify Surah & Ayah from audio
- **Rich Context** — English translation, Juz number, Meccan/Medinan classification
- **Interactive Demo** — Gradio web interface with upload/record support
## Architecture

```
Audio Input
    ↓
Voxtral ASR (LoRA fine-tuned)
    ↓
Arabic Transcription
    ↓
Fuzzy Matcher (diacritics-aware)
    ↓
Quran Database (6,236 verses)
    ↓
Rich Response (Surah, Ayah, Translation, Juz, Type)
```
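The diacritics-aware fuzzy matching step can be sketched with the standard library alone. The names below (`VERSES`, `strip_diacritics`, `match_verse`) are illustrative and not the actual API of `quran_metadata.py`, which loads all 6,236 verses from `data/quran_data.json`:

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical two-verse mini-database for illustration only.
VERSES = {
    (112, 1): "قُلْ هُوَ ٱللَّهُ أَحَدٌ",
    (112, 2): "ٱللَّهُ ٱلصَّمَدُ",
}

def strip_diacritics(text: str) -> str:
    """Drop Arabic diacritics (combining marks) so matching tolerates
    transcripts that omit or misplace vowel signs."""
    return "".join(
        c for c in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(c)
    )

def match_verse(transcript: str):
    """Return ((surah, ayah), similarity) for the closest verse."""
    query = strip_diacritics(transcript)
    best, best_score = None, 0.0
    for key, verse in VERSES.items():
        score = SequenceMatcher(None, query, strip_diacritics(verse)).ratio()
        if score > best_score:
            best, best_score = key, score
    return best, best_score
```

A production matcher would likely also normalize hamza/alef variants and search all verses with a faster index, but the principle is the same.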
## Quick Start

### Option A: Kaggle Notebook (Recommended)

1. Upload `kaggle_notebook.py` to Kaggle
2. Enable a T4 GPU (free tier)
3. Run all cells (~3-4 hours total)
### Option B: Local / Cloud

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Download Quran metadata
python fetch_quran_data.py

# 3. Train ASR (Phase 1)
python train_quran_asr.py --max_samples 5000

# 4. Evaluate
python inference.py --eval --model ./voxtral-quran-asr

# 5. Launch demo
python gradio_demo.py --model ./voxtral-quran-asr
```
## Project Structure

```
vox_quran/
├── train_quran_asr.py    # Phase 1: LoRA ASR fine-tuning
├── train_quran_qa.py     # Phase 2: Instruct-mode QA fine-tuning
├── build_qa_dataset.py   # Build QA training data
├── inference.py          # Inference pipeline + evaluation
├── quran_metadata.py     # Quran database + fuzzy matching
├── fetch_quran_data.py   # Download Quran text/translations
├── gradio_demo.py        # Interactive Gradio demo
├── kaggle_notebook.py    # Self-contained Kaggle notebook
├── requirements.txt      # Dependencies
├── data/
│   └── quran_data.json   # Quran text + metadata (6,236 verses)
└── README.md
```
## Training Details

| Parameter | Value |
|---|---|
| Base Model | Voxtral Mini 3B (`mistralai/Voxtral-Mini-3B-2507`) |
| Method | LoRA (r=16, α=32) |
| Dataset | EveryAyah (`tarteel-ai/everyayah`) |
| Samples | 5K (scales to 127K+) |
| GPU | T4 16 GB (Kaggle free tier) |
| Epochs | 3 |
| Audio Encoder | Frozen |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
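Under the table's settings, the adapter setup might look like the following `peft` sketch. The `lora_dropout` value and `task_type` are assumptions (the table does not state them), and the actual wiring lives in `train_quran_asr.py`:

```python
from peft import LoraConfig

# LoRA config matching the table above; freezing the audio encoder
# and attaching the adapter via get_peft_model happen elsewhere.
lora_config = LoraConfig(
    r=16,                     # rank, per the table
    lora_alpha=32,            # α, per the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,        # assumed value, not stated in the table
    task_type="CAUSAL_LM",    # assumed task type
)
```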
## Two-Phase Training

### Phase 1: ASR Fine-tuning

- Pure transcription: Arabic audio → Arabic text
- Uses `apply_transcription_request(language="ar")`
- Expected WER: 5-15% on Quran recitation
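For reference, WER (word error rate) is the word-level edit distance divided by the reference length. The evaluation in `inference.py` presumably uses a library such as `jiwer`; a minimal self-contained version of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

A WER of 0.10 means roughly one word in ten is substituted, inserted, or deleted relative to the reference transcript.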
### Phase 2: QA Fine-tuning (Optional)

- Instruct mode: audio + question → structured response
- Uses `apply_chat_template` with conversation format
- Model outputs Surah, Ayah, and translation directly
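A Phase 2 training sample might look like the sketch below. The exact schema emitted by `build_qa_dataset.py` is an assumption here, modeled on the generic transformers multimodal chat format:

```python
# Hypothetical QA training sample in conversation format.
# Field names follow the generic chat-template schema and are
# not verified against build_qa_dataset.py.
sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "audio", "path": "recitation.wav"},
                {"type": "text", "text": "Which verse is this?"},
            ],
        },
        {
            "role": "assistant",
            "content": 'Surah Al-Ikhlas (112), Ayah 1 — "Say, He is Allah, One" (Juz 30, Meccan).',
        },
    ]
}
```

Pairs like this are what `apply_chat_template` serializes into the model's instruct prompt format during Phase 2 training.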
## Example Output

Input: audio recitation of "قُلْ هُوَ ٱللَّهُ أَحَدٌ"

| Field | Value |
|---|---|
| Surah | Al-Ikhlas — #112 |
| Ayah | 1 |
| English | "Say, He is Allah, One" |
| Juz | 30 |
| Type | Meccan |
| Confidence | 98% |
## Resources

- Model: Voxtral Mini 3B
- Dataset: EveryAyah
- Reference: Finetune-Voxtral-ASR
- Docs: Voxtral HuggingFace Docs

## License

MIT — Built with ❤️ for the Ummah