YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

πŸ•Œ Voxtral Quran β€” AI Quran Assistant

Fine-tune Voxtral Mini 3B for Quran ASR & Verse Identification by Ahmed Haytham

An end-to-end Quranic AI assistant that listens to Quran recitation and tells you exactly which verse it is, with full context.

Status: Under construction

⚠️ Status: Under Construction

English:
This model is under active development. Training is currently constrained by limited GPU resources.

Note: The model is still in early stages of training. Progress is currently limited due to insufficient GPU resources. Further training and improvements are planned once more compute becomes available.

Ψ§Ω„ΨΉΨ±Ψ¨ΩŠΨ©:
Ω‡Ψ°Ψ§ Ψ§Ω„Ω†Ω…ΩˆΨ°Ψ¬ Ω„Ψ§ ΩŠΨ²Ψ§Ω„ Ω‚ΩŠΨ― Ψ§Ω„Ψͺطوير. Ψ§Ω„Ψͺدريب Ψ§Ω„Ψ­Ψ§Ω„ΩŠ Ω…Ψ­Ψ―ΩˆΨ― Ψ¨Ψ³Ψ¨Ψ¨ Ω†Ω‚Ψ΅ Ω…ΩˆΨ§Ψ±Ψ― Ψ§Ω„Ω€GPU.

✨ Features

  • Arabic Quran ASR β€” Transcribe Quran recitation with diacritics awareness
  • Verse Identification β€” Identify Surah & Ayah from audio
  • Rich Context β€” English translation, Juz number, Meccan/Medinan classification
  • Interactive Demo β€” Gradio web interface with upload/record support

πŸ—οΈ Architecture

Audio Input
    ↓
Voxtral ASR (LoRA fine-tuned)
    ↓
Arabic Transcription
    ↓
Fuzzy Matcher (diacritics-aware)
    ↓
Quran Database (6,236 verses)
    ↓
Rich Response (Surah, Ayah, Translation, Juz, Type)

πŸš€ Quick Start

Option A: Kaggle Notebook (Recommended)

  1. Upload kaggle_notebook.py to Kaggle
  2. Enable T4 GPU (free tier)
  3. Run all cells (~3-4 hours total)

Option B: Local / Cloud

# 1. Install
pip install -r requirements.txt

# 2. Download Quran metadata
python fetch_quran_data.py

# 3. Train ASR (Phase 1)
python train_quran_asr.py --max_samples 5000

# 4. Evaluate
python inference.py --eval --model ./voxtral-quran-asr

# 5. Launch demo
python gradio_demo.py --model ./voxtral-quran-asr

πŸ“ Project Structure

vox_quran/
β”œβ”€β”€ train_quran_asr.py      # Phase 1: LoRA ASR fine-tuning
β”œβ”€β”€ train_quran_qa.py        # Phase 2: Instruct-mode QA fine-tuning
β”œβ”€β”€ build_qa_dataset.py      # Build QA training data
β”œβ”€β”€ inference.py             # Inference pipeline + evaluation
β”œβ”€β”€ quran_metadata.py        # Quran database + fuzzy matching
β”œβ”€β”€ fetch_quran_data.py      # Download Quran text/translations
β”œβ”€β”€ gradio_demo.py           # Interactive Gradio demo
β”œβ”€β”€ kaggle_notebook.py       # Self-contained Kaggle notebook
β”œβ”€β”€ requirements.txt         # Dependencies
β”œβ”€β”€ data/
β”‚   └── quran_data.json      # Quran text + metadata (6,236 verses)
└── README.md

πŸ”§ Training Details

Parameter Value
Base Model Voxtral Mini 3B (mistralai/Voxtral-Mini-3B-2507)
Method LoRA (r=16, Ξ±=32)
Dataset EveryAyah (tarteel-ai/everyayah)
Samples 5K (scale to 127K+)
GPU T4 16GB (Kaggle free)
Epochs 3
Audio Encoder Frozen
Target Modules q_proj, k_proj, v_proj, o_proj

πŸ“Š Two-Phase Training

Phase 1: ASR Fine-tuning

  • Pure transcription: Arabic audio β†’ Arabic text
  • Uses apply_transcription_request(language="ar")
  • Expected WER: 5-15% on Quran recitation

Phase 2: QA Fine-tuning (Optional)

  • Instruct mode: audio + question β†’ structured response
  • Uses apply_chat_template with conversation format
  • Model outputs Surah, Ayah, translation directly

🎯 Example Output

Input: Audio recitation of "قُلْ Ω‡ΩΩˆΩŽ Ω±Ω„Ω„ΩŽΩ‘Ω‡Ω أَحَدٌ"

Field Value
Surah Al-Ikhlas β€” #112
Ayah 1
English "Say, He is Allah, One"
Juz 30
Type Meccan
Confidence 98%

πŸ“š Resources

πŸ“œ License

MIT β€” Built with ❀️ for the Ummah

image Ω„Ψ§Ψ²Ψ§Ω„ ΨͺΨ­Ψͺ Ψ§Ω„Ψ§Ω†Ψ΄Ψ§Ψ‘

Downloads last month
-
Safetensors
Model size
5B params
Tensor type
F32
Β·
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support