Urdu 5-gram Language Model
This is a 5-gram language model trained on Urdu text for ASR decoding.
Model Details
- Language: Urdu (ur)
- Model Type: 5-gram KenLM
- Training Data: Combined Urdu ASR datasets
- Use Case: Beam search decoding for Urdu ASR
Files
- urdu_5gram.bin: Binary n-gram model (KenLM format)
- config.json: Model configuration
Usage
from pyctcdecode import build_ctcdecoder
import json

# Load vocabulary (from your processor), in the same order as the acoustic model's output labels
vocab = ["<pad>", "<s>", "</s>", "<unk>", "|", ...]  # Your vocab here

# Build decoder
decoder = build_ctcdecoder(
    vocab,
    kenlm_model_path='urdu_5gram.bin',
    alpha=0.5,  # language model weight
    beta=1.5,   # word insertion bonus
)
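To give a feel for what `alpha` and `beta` do, here is a minimal sketch of shallow-fusion scoring, assuming the decoder adds `alpha` times the LM log-probability plus a flat `beta` bonus for each word. The function `fused_score` and all the numbers are hypothetical illustrations, not part of pyctcdecode, and the library's exact scoring (for example its handling of unknown words) may differ.

```python
import math

LN10 = math.log(10.0)  # KenLM reports log10 probabilities; convert to natural log


def fused_score(acoustic_logp, lm_log10_probs, alpha=0.5, beta=1.5):
    """Combine an acoustic log-probability with per-word LM scores.

    Hypothetical helper: each word contributes alpha * ln P_lm(word) + beta.
    """
    lm_total = sum(alpha * lp * LN10 + beta for lp in lm_log10_probs)
    return acoustic_logp + lm_total


# Two made-up two-word hypotheses: B is slightly better acoustically,
# but A is far more likely under the language model.
hyp_a = fused_score(-4.0, [-1.0, -2.0])
hyp_b = fused_score(-3.5, [-3.0, -3.5])

best = max([("A", hyp_a), ("B", hyp_b)], key=lambda pair: pair[1])
print(best[0])
```

Raising `alpha` makes the decoder trust the language model more; raising `beta` favours hypotheses with more words. Both values are usually tuned on a held-out set.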
Training Details
- N-gram order: 5
- Pruning: Minimal (thresholds 0 0 0 1; in KenLM's `lmplz --prune` notation, this drops singleton n-grams of order 4 and above)
- Backend: KenLM
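The settings above would correspond to KenLM commands along the following lines. This is a sketch under assumptions rather than the exact recipe used for this model: the corpus and ARPA file names are hypothetical, and the helper functions only assemble the command lines for KenLM's `lmplz` and `build_binary` tools without running them.

```python
import shlex


def lmplz_command(corpus, arpa, order=5, prune=(0, 0, 0, 1)):
    """Build the argument list for estimating an ARPA model with KenLM's lmplz."""
    return [
        "lmplz",
        "-o", str(order),
        "--prune", *[str(p) for p in prune],
        "--text", corpus,
        "--arpa", arpa,
    ]


def build_binary_command(arpa, binary):
    """Build the argument list for converting an ARPA file to KenLM binary format."""
    return ["build_binary", arpa, binary]


# Hypothetical file names; only urdu_5gram.bin appears in this repository.
train_cmd = lmplz_command("urdu_corpus.txt", "urdu_5gram.arpa")
binarize_cmd = build_binary_command("urdu_5gram.arpa", "urdu_5gram.bin")

print(shlex.join(train_cmd))
print(shlex.join(binarize_cmd))
```

With KenLM installed, each list can be passed to `subprocess.run(cmd, check=True)`.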
Citation
If you use this model, please cite the original datasets used for training.