abidanoaman/urdu-language-model


Urdu 5-gram Language Model

This is a 5-gram language model trained on Urdu text for ASR decoding.

Model Details

Language: Urdu (ur)
Model Type: 5-gram KenLM
Training Data: Combined Urdu ASR datasets
Use Case: Beam search decoding for Urdu ASR
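
In beam search decoding, an n-gram model of this kind is typically used to rescore candidate transcripts alongside the acoustic model. The sketch below shows one common form of that combination (shallow fusion). The function name `fused_score` and the example numbers are hypothetical, and the exact formula inside a given decoder may differ.

```python
import math

def fused_score(acoustic_logp, lm_log10p, num_words, alpha=0.5, beta=1.5):
    """Combine acoustic and language model scores for one beam candidate.

    acoustic_logp: natural-log score from the CTC acoustic model
    lm_log10p:     log10 probability from the n-gram model (KenLM reports log10)
    num_words:     number of words in the candidate
    """
    lm_logp = lm_log10p * math.log(10)  # convert log10 to natural log
    return acoustic_logp + alpha * lm_logp + beta * num_words

# Hypothetical scores for two candidates, not produced by this model
candidates = {
    "hypothesis_a": fused_score(-4.0, -6.0, 3),
    "hypothesis_b": fused_score(-3.5, -9.0, 3),
}
best = max(candidates, key=candidates.get)
```

A larger `alpha` gives the language model more say over the acoustic evidence, while `beta` offsets the tendency of the language model term to favor shorter outputs.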

Files

urdu_5gram.bin: Binary n-gram model (KenLM format)
config.json: Model configuration

Usage

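from pyctcdecode import build_ctcdecoder

# Load the vocabulary from your processor, in the same order as the
# acoustic model's output logits
vocab = ["<pad>", "<s>", "</s>", "<unk>", "|", ...] # Your vocab here

# Build a beam search decoder backed by the KenLM model
decoder = build_ctcdecoder(
    vocab,
    kenlm_model_path='urdu_5gram.bin',
    alpha=0.5,  # language model weight
    beta=1.5,   # word insertion bonus
)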
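
The `vocab` list has to follow the order of the acoustic model's output logits. Below is a minimal sketch of building it from a token-to-id mapping, assuming your processor's tokenizer exposes one (as a Hugging Face tokenizer's `get_vocab()` does). The helper names `labels_from_vocab` and `build_urdu_decoder` and the toy mapping are hypothetical.

```python
def labels_from_vocab(vocab_dict):
    """Return tokens sorted by their integer id, i.e. in logit order."""
    return [token for token, _ in sorted(vocab_dict.items(), key=lambda kv: kv[1])]

# Toy mapping for illustration only; use your tokenizer's real mapping.
toy_vocab = {"|": 4, "<pad>": 0, "<unk>": 3, "<s>": 1, "</s>": 2, "ا": 5, "ب": 6}
labels = labels_from_vocab(toy_vocab)

def build_urdu_decoder(labels, lm_path="urdu_5gram.bin"):
    """Build the decoder; requires pyctcdecode and kenlm to be installed."""
    from pyctcdecode import build_ctcdecoder  # imported lazily
    return build_ctcdecoder(labels, kenlm_model_path=lm_path, alpha=0.5, beta=1.5)

# With `logits` as a NumPy array of shape (time_steps, len(labels)):
# decoder = build_urdu_decoder(labels)
# text = decoder.decode(logits)
```

Sorting by id matters because the decoder maps each logit column to the label at the same index. A list in a different order will still decode, but to the wrong characters.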

Training Details

N-gram order: 5
Pruning: Minimal (per-order count thresholds 0 0 0 1)
Backend: KenLM
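
To illustrate what these pruning settings imply, here is a toy sketch. It assumes the four values are per-order count thresholds in the style of KenLM's `lmplz --prune` option, where n-grams with a count at or below the threshold are dropped and the last value is reused for any remaining orders. It works on raw counts only and is not KenLM's implementation, which estimates the model with modified Kneser-Ney smoothing. The function names and the toy token sequence are hypothetical.

```python
from collections import Counter

def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def prune(counts, thresholds):
    """Drop n-grams whose count is <= the threshold for their order.

    If fewer thresholds than orders are given, the last one is reused.
    """
    pruned = {}
    for n, ngrams in counts.items():
        t = thresholds[min(n, len(thresholds)) - 1]
        pruned[n] = Counter({g: c for g, c in ngrams.items() if c > t})
    return pruned

# Toy token sequence standing in for training text
tokens = "a b c d e a b c d e f".split()
kept = prune(count_ngrams(tokens, 5), [0, 0, 0, 1])
```

Under this reading, unigrams through trigrams are kept in full, while 4-grams and 5-grams seen only once are removed. This shrinks the binary file at little cost to the most frequent contexts.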

Citation

If you use this model, please cite the original datasets used for training.
