---
language: ur
tags:
- urdu
- language-model
- n-gram
- kenlm
---

# Urdu 5-gram Language Model

This is a 5-gram language model trained on Urdu text, intended for use during ASR decoding.

## Model Details

- **Language**: Urdu (ur)
- **Model Type**: 5-gram KenLM
- **Training Data**: Combined Urdu ASR datasets
- **Use Case**: Beam search decoding for Urdu ASR

## Files

- `urdu_5gram.bin`: Binary n-gram model (KenLM format)
- `config.json`: Model configuration

## Usage

```python
from pyctcdecode import build_ctcdecoder

# Load vocabulary (from your processor).
# The entries below are placeholders: replace them with your tokenizer's
# tokens, in the same order as the acoustic model's output indices.
vocab = ["", "", "", "", "|", ...]  # Your vocab here

# Build decoder
decoder = build_ctcdecoder(
    vocab,
    kenlm_model_path='urdu_5gram.bin',
    alpha=0.5,  # language model weight
    beta=1.5,   # word insertion bonus
)

# Decode one utterance; logits is a NumPy array of shape (time, len(vocab))
# text = decoder.decode(logits)
```

## Training Details

- N-gram order: 5
- Pruning: Minimal (0 0 0 1)
- Backend: KenLM

## Citation

If you use this model, please cite the original datasets used for training.
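
## Note on `alpha` and `beta`

The `alpha` and `beta` values passed to `build_ctcdecoder` in the usage snippet weight the language model score and a per-word bonus during beam search. The following is a simplified sketch of that kind of shallow-fusion scoring, not pyctcdecode's internal code. The function `fused_score` and the hypothesis numbers are hypothetical, and the sketch assumes the language model score is a log10 probability (as KenLM reports) that is converted to natural log before weighting.

```python
import math


def fused_score(acoustic_logp: float, lm_log10p: float, num_words: int,
                alpha: float = 0.5, beta: float = 1.5) -> float:
    """Combine acoustic and LM scores for one beam hypothesis (illustrative only)."""
    # Assumption: the LM score is in log10, so convert it to natural log.
    lm_logp = lm_log10p * math.log(10)
    # alpha scales the LM contribution; beta adds a fixed bonus per word,
    # which offsets the LM's tendency to favour shorter outputs.
    return acoustic_logp + alpha * lm_logp + beta * num_words


# Hypothetical hypotheses: (label, acoustic log-prob, LM log10-prob, word count)
hyps = [
    ("hyp_a", -4.0, -6.0, 3),
    ("hyp_b", -3.5, -9.0, 3),
]

# Pick the hypothesis with the highest combined score.
best = max(hyps, key=lambda h: fused_score(h[1], h[2], h[3]))
print(best[0])
```

In this toy case the acoustically weaker hypothesis can still win if the language model prefers it strongly enough. Raising `alpha` makes the decoder trust the n-gram model more, and raising `beta` favours hypotheses with more words; both are usually tuned on a held-out set.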