# 🪶 Katib ASR: State-of-the-Art Pashto Speech Recognition

> *Listening to the voices that the AI boom forgot.*

Katib ASR is the most capable open-source Automatic Speech Recognition (ASR) model for the Pashto language (پښتو). Built on top of Whisper Large v3 and fine-tuned on the largest curated Pashto speech corpus assembled to date, Katib ASR brings real-time, highly accurate speech-to-text capabilities to millions of Pashto speakers.

---

## 🩸 The Story Behind Katib

Building state-of-the-art AI usually takes massive corporate research labs, entire teams of engineers, and unlimited compute. **Katib ASR was built entirely solo.**

The generative AI revolution is moving fast, but regional languages are being left behind. While developing voice-activated AI agents for medical clinics in Pakistan, the bottleneck became painfully clear: there was no reliable, high-fidelity transcription for Pashto.

Training an ASR model for a low-resource language is a massive grind. It meant hunting down scarce, fragmented audio datasets, writing custom text normalizers to fix broken Arabic-script transcriptions, and squeezing every bit of A100 GPU compute out of the training runs so the model could handle the complex phonetics of native Pashto speech.

Katib ASR is the result of that struggle — a dedicated, open-source model designed to give Pashto speakers a voice in the digital age.

---

## 🏆 Model Architecture & Performance

This is not a generic multilingual model. Katib ASR is a **dedicated, purpose-built Pashto ASR system** — the only model of its kind at this scale.
| Feature | Detail |
|---|---|
| 🧠 Base Model | Whisper Large v3 (1.55B parameters) |
| 🗣️ Language | Pashto (پښتو) — Afghan & Pakistani dialects |
| ⚡ Hardware | NVIDIA A100 80GB |
| 🔢 WER | **28.23%** — best published result for open Pashto ASR |

### Evaluation Results

Evaluated on a held-out Pashto test set not seen during training:

| Metric | Score |
|---|---|
| Word Error Rate (WER) | **28.23%** |
| Evaluation Loss | 0.3011 |

> 💡 **For context:** The base `whisper-large-v3` model — with no Pashto fine-tuning — produces largely garbled or Arabic-language output on Pashto audio. Katib ASR delivers coherent, structured transcriptions where the base model fails entirely.

---

## 📚 Datasets & Text Normalization

Katib ASR was trained on a multi-source, multi-dialect Pashto speech corpus carefully assembled and preprocessed from:

- Common Voice Pashto 24
- FLEURS Pashto
- A custom curated corpus of in-house Pashto recordings

### Custom Pashto Text Normalization

A key contribution of this model is a dedicated **Pashto text normalization pipeline**, applied consistently to both training labels and inference output. It resolves script-variant inconsistencies across sources:

- Arabic Kaf (ك) → Pashto Kaf (ک)
- ݢ / گ → Pashto Gaf (ګ)
- Arabic Yey / Alef Maqsura variants → Pashto Yey (ی)
- All non-Arabic-script noise and punctuation removed

This ensures the model produces clean, standardized Pashto script regardless of the source audio's original transcription conventions.
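The mappings above can be sketched as a small normalization function. This is a minimal illustration of the idea, not the model's actual pipeline: the `normalize_pashto` name, the exact punctuation list, and the Unicode ranges used to filter non-Arabic-script noise are all assumptions.

```python
import re

# Unify script variants: map Arabic/ambiguous codepoints to standard Pashto letters.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # Arabic Kaf (ك)   -> Pashto Kaf (ک)
    "\u06AF": "\u06AB",  # Gaf (گ)          -> Pashto Gaf (ګ)
    "\u0762": "\u06AB",  # ݢ                -> Pashto Gaf (ګ)
    "\u064A": "\u06CC",  # Arabic Yeh (ي)   -> Pashto Yey (ی)
    "\u0649": "\u06CC",  # Alef Maqsura (ى) -> Pashto Yey (ی)
})

# Arabic-script punctuation to drop (comma, semicolon, question mark, full stop).
_ARABIC_PUNCT = re.compile(r"[\u060C\u061B\u061F\u06D4]")

# Everything outside the core Arabic blocks and whitespace counts as noise.
_NON_ARABIC = re.compile(r"[^\u0600-\u06FF\u0750-\u077F\s]")

def normalize_pashto(text: str) -> str:
    text = text.translate(CHAR_MAP)     # unify script variants
    text = _ARABIC_PUNCT.sub("", text)  # strip Arabic punctuation
    text = _NON_ARABIC.sub("", text)    # strip Latin noise, digits, symbols
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both training labels and decoded hypotheses keeps the WER computation consistent, so the metric measures recognition quality rather than orthographic disagreement between sources.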
---

## 🚀 Quick Start

### Using the Pipeline (Recommended)

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="uzair0/Katib-ASR",
    torch_dtype="auto",
    device="cuda",
    chunk_length_s=30,
)

result = asr("pashto_audio.wav")
print(result["text"])
# Example output: "زه غواړم چی ښار ته لاړ کړم"
```

### Direct Model Loading

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("uzair0/Katib-ASR")
model = WhisperForConditionalGeneration.from_pretrained(
    "uzair0/Katib-ASR", torch_dtype=torch.bfloat16
).to("cuda")

model.generation_config.language = "pashto"
model.generation_config.task = "transcribe"
model.generation_config.forced_decoder_ids = None
model.generation_config.suppress_tokens = []

# Transcribe a 16 kHz mono waveform (e.g. loaded with librosa or torchaudio)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to("cuda", dtype=torch.bfloat16)
predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

---

## ⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Base model | whisper-large-v3 |
| Precision | bfloat16 + TF32 |
| Effective batch size | 128 (64 × 2 gradient accumulation) |
| Learning rate | 1e-5 (linear schedule) |
| Warmup steps | 92 |
| Epochs | 3 |
| Optimizer | AdamW (fused) |
| Gradient checkpointing | ✅ Enabled |

---

## 👨‍💻 Author & Citation

Built from the ground up by **Muhammad Uzair** at the University of Peshawar. If you use Katib ASR in your research or applications, please consider citing it:

```bibtex
@misc{katibasr2026,
  title     = {Katib ASR: State-of-the-Art Pashto Automatic Speech Recognition},
  author    = {Muhammad Uzair},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/uzair0/Katib-ASR}
}
```

---

*Built with ❤️ for the Pashto-speaking world.*