---
license: apache-2.0
language:
- bn
metrics:
- wer
- cer
base_model:
- ai4bharat/indicwav2vec_v1_bengali
pipeline_tag: automatic-speech-recognition
---

🚨 BRDialect 🚨 BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects

📝 Paper, 🖥️ Github

**BRDialect** is an ASR system trained on ten regional dialects of Bangladesh using the Ben10 dataset from Bengali.AI.

## Load the BRDialect ASR System

**Prerequisite**
```
!pip install -U transformers
!pip install https://github.com/kpu/kenlm/archive/master.zip
!pip install pyctcdecode
```

**Log in to HuggingFace**
```python
from huggingface_hub import login

login("TOKEN")
```

**Load base model and BRDialect**
```python
## BRDialect
from huggingface_hub import hf_hub_download

# Download the 5-gram KenLM language model and the fine-tuned acoustic model weights.
kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
```

```python
from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
import torch
import numpy as np
import pyctcdecode
import librosa

# Load the base model, then overwrite its weights with the BRDialect checkpoint.
base_model_id = "ai4bharat/indicwav2vec_v1_bengali"
processor = AutoProcessor.from_pretrained(base_model_id)
model = AutoModelForCTC.from_pretrained(base_model_id)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load(state_dict_path, map_location="cpu")["model"])

# pyctcdecode expects the labels ordered by token id.
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab_dict = {k: v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}

# Build a beam-search CTC decoder backed by the KenLM language model.
decoder = pyctcdecode.build_ctcdecoder(
    list(sorted_vocab_dict.keys()),
    str(kenlm_model_path)
)
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder
)

model.freeze_feature_encoder()
model.eval()
```

## Transcription Generation

```python
sampling_rate = 16000
path = "AUDIO_PATH"

# Load the audio as mono and resample it to 16 kHz.
frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
inputs = processor(
    frame,
    sampling_rate=sampling_rate,
    return_tensors="pt",
    padding=False
)

# Run the acoustic model without tracking gradients.
with torch.no_grad():
    logits = model(inputs.input_values.to("cpu")).logits

# Decode the frame-level logits with LM-guided beam search.
np_logits = logits.squeeze(0).cpu().numpy()
result = processor_with_lm.decode(np_logits, beam_width=256)
text = result.text
print(f"Transcription={text}")
```

## Citation

```
@article{hasan2025banglatalk,
  title={BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects},
  author={Hasan, Jakir and Dipta, Shubhashis Roy},
  journal={arXiv preprint arXiv:2510.06188},
  year={2025}
}

@inproceedings{javed2022towards,
  title={Towards Building ASR Systems for the Next Billion Users},
  author={Javed, Tahir and Doddapaneni, Sumanth and Raman, Abhigyan and Bhogale, Kaushal Santosh and Ramesh, Gowtham and Kunchukuttan, Anoop and Kumar, Pratyush and Khapra, Mitesh M},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={10813--10821},
  year={2022}
}
```
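
## Scoring a Transcription (Sketch)

The model card metadata lists WER and CER as metrics. As a rough illustration of how a transcription produced by the steps in this card could be scored against a reference, the sketch below computes both with a plain Levenshtein distance. The helper names (`edit_distance`, `wer`, `cer`) and the sample sentences are hypothetical and are not part of the BRDialect code. The evaluation protocol used in the paper (for example, any text normalization) may differ, and CER here counts Unicode code points, which is an assumption for Bengali text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only.
reference = "আমি ভাত খাই"
hypothesis = "আমি ভাত খাইছি"
print(f"WER={wer(reference, hypothesis):.3f}")
print(f"CER={cer(reference, hypothesis):.3f}")
```

In practice, `hypothesis` would be the `text` value returned by the decoding step, and the scores would be aggregated over a whole test set rather than a single sentence.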