# Voxtral Quran — AI Quran Assistant

Fine-tune Voxtral Mini 3B for Quran ASR & verse identification, by Ahmed Haytham.

An end-to-end Quranic AI assistant that listens to Quran recitation and tells you exactly which verse it is, with full context.
## ⚠️ Status: Under Construction

This model is in the early stages of training and under active development. Progress is currently limited by insufficient GPU resources; further training and improvements are planned once more compute becomes available.
## Features

- **Arabic Quran ASR** — transcribe Quran recitation with diacritics awareness
- **Verse Identification** — identify Surah & Ayah from audio
- **Rich Context** — English translation, Juz number, Meccan/Medinan classification
- **Interactive Demo** — Gradio web interface with upload/record support
## Architecture

```
Audio Input
    ↓
Voxtral ASR (LoRA fine-tuned)
    ↓
Arabic Transcription
    ↓
Fuzzy Matcher (diacritics-aware)
    ↓
Quran Database (6,236 verses)
    ↓
Rich Response (Surah, Ayah, Translation, Juz, Type)
```
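The diacritics-aware fuzzy matching step can be sketched with the standard library alone. The names below (`VERSES`, `strip_diacritics`, `match_verse`) are illustrative and not the actual API of `quran_metadata.py`, which loads all 6,236 verses from `data/quran_data.json`:

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical two-verse mini-database for illustration only.
VERSES = {
    (112, 1): "قُلْ هُوَ ٱللَّهُ أَحَدٌ",
    (112, 2): "ٱللَّهُ ٱلصَّمَدُ",
}

def strip_diacritics(text: str) -> str:
    """Drop Arabic diacritics (combining marks) so matching tolerates
    transcripts that omit or misplace vowel signs."""
    return "".join(
        c for c in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(c)
    )

def match_verse(transcript: str):
    """Return ((surah, ayah), similarity) for the closest verse."""
    query = strip_diacritics(transcript)
    best, best_score = None, 0.0
    for key, verse in VERSES.items():
        score = SequenceMatcher(None, query, strip_diacritics(verse)).ratio()
        if score > best_score:
            best, best_score = key, score
    return best, best_score
```

A production matcher would likely also normalize hamza/alef variants and search all verses with a faster index, but the principle is the same.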
## Quick Start

### Option A: Kaggle Notebook (Recommended)

1. Upload `kaggle_notebook.py` to Kaggle
2. Enable a T4 GPU (free tier)
3. Run all cells (~3-4 hours total)
### Option B: Local / Cloud

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Download Quran metadata
python fetch_quran_data.py

# 3. Train ASR (Phase 1)
python train_quran_asr.py --max_samples 5000

# 4. Evaluate
python inference.py --eval --model ./voxtral-quran-asr

# 5. Launch demo
python gradio_demo.py --model ./voxtral-quran-asr
```
## Project Structure

```
vox_quran/
├── train_quran_asr.py    # Phase 1: LoRA ASR fine-tuning
├── train_quran_qa.py     # Phase 2: Instruct-mode QA fine-tuning
├── build_qa_dataset.py   # Build QA training data
├── inference.py          # Inference pipeline + evaluation
├── quran_metadata.py     # Quran database + fuzzy matching
├── fetch_quran_data.py   # Download Quran text/translations
├── gradio_demo.py        # Interactive Gradio demo
├── kaggle_notebook.py    # Self-contained Kaggle notebook
├── requirements.txt      # Dependencies
├── data/
│   └── quran_data.json   # Quran text + metadata (6,236 verses)
└── README.md
```
## Training Details

| Parameter | Value |
|---|---|
| Base Model | Voxtral Mini 3B (`mistralai/Voxtral-Mini-3B-2507`) |
| Method | LoRA (r=16, α=32) |
| Dataset | EveryAyah (`tarteel-ai/everyayah`) |
| Samples | 5K (scales to 127K+) |
| GPU | T4 16 GB (Kaggle free tier) |
| Epochs | 3 |
| Audio Encoder | Frozen |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
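Under the table's settings, the adapter setup might look like the following `peft` sketch. The `lora_dropout` value and `task_type` are assumptions (the table does not state them), and the actual wiring lives in `train_quran_asr.py`:

```python
from peft import LoraConfig

# LoRA config matching the table above; freezing the audio encoder
# and attaching the adapter via get_peft_model happen elsewhere.
lora_config = LoraConfig(
    r=16,                     # rank, per the table
    lora_alpha=32,            # α, per the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,        # assumed value, not stated in the table
    task_type="CAUSAL_LM",    # assumed task type
)
```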
## Two-Phase Training

### Phase 1: ASR Fine-tuning

- Pure transcription: Arabic audio → Arabic text
- Uses `apply_transcription_request(language="ar")`
- Expected WER: 5-15% on Quran recitation
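For reference, WER (word error rate) is the word-level edit distance divided by the reference length. The evaluation in `inference.py` presumably uses a library such as `jiwer`; a minimal self-contained version of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

A WER of 0.10 means roughly one word in ten is substituted, inserted, or deleted relative to the reference transcript.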
### Phase 2: QA Fine-tuning (Optional)

- Instruct mode: audio + question → structured response
- Uses `apply_chat_template` with conversation format
- Model outputs Surah, Ayah, and translation directly
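A Phase 2 training sample might look like the sketch below. The exact schema emitted by `build_qa_dataset.py` is an assumption here, modeled on the generic transformers multimodal chat format:

```python
# Hypothetical QA training sample in conversation format.
# Field names follow the generic chat-template schema and are
# not verified against build_qa_dataset.py.
sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "audio", "path": "recitation.wav"},
                {"type": "text", "text": "Which verse is this?"},
            ],
        },
        {
            "role": "assistant",
            "content": 'Surah Al-Ikhlas (112), Ayah 1 — "Say, He is Allah, One" (Juz 30, Meccan).',
        },
    ]
}
```

Pairs like this are what `apply_chat_template` serializes into the model's instruct prompt format during Phase 2 training.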
## Example Output

Input: audio recitation of "قُلْ هُوَ ٱللَّهُ أَحَدٌ"

| Field | Value |
|---|---|
| Surah | Al-Ikhlas — #112 |
| Ayah | 1 |
| English | "Say, He is Allah, One" |
| Juz | 30 |
| Type | Meccan |
| Confidence | 98% |
## Resources

- Model: Voxtral Mini 3B
- Dataset: EveryAyah
- Reference: Finetune-Voxtral-ASR
- Docs: Voxtral HuggingFace Docs

## License

MIT — Built with ❤️ for the Ummah