---
license: apache-2.0
language:
- bn
metrics:
- wer
- cer
base_model:
- ai4bharat/indicwav2vec_v1_bengali
pipeline_tag: automatic-speech-recognition
---
<div align="center">
<h1>🚨 BRDialect 🚨
BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects </h1>
📝 <a href="https://arxiv.org/abs/2510.06188"><b>Paper</b></a>, 🖥️ <a href="https://github.com/Jak57/BanglaTalk"><b>Github</b></a>
</div>
**BRDialect** is an automatic speech recognition (ASR) system trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.
## Load the BRDialect ASR System
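**Prerequisites**<br>
```
# Run in a notebook cell; drop the leading "!" when installing from a shell.
!pip install -U transformers
!pip install https://github.com/kpu/kenlm/archive/master.zip
!pip install pyctcdecode
# torch and librosa are imported below; install them if your environment lacks them.
!pip install torch librosa
```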
**Log in to HuggingFace**<br>
```python
from huggingface_hub import login
login("TOKEN")
```
**Load base model and BRDialect**<br>
```python
# Download the BRDialect 5-gram KenLM language model and model checkpoint from the Hub
from huggingface_hub import hf_hub_download
kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
```
```python
from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
import torch
import numpy as np
import pyctcdecode
import librosa

# Load the base model, then overwrite its weights with the BRDialect checkpoint.
base_model_id = "ai4bharat/indicwav2vec_v1_bengali"
processor = AutoProcessor.from_pretrained(base_model_id)
model = AutoModelForCTC.from_pretrained(base_model_id)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load(state_dict_path, map_location="cpu")["model"])

# Build the decoder labels in token-id order so they line up with the logits.
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab_dict = {k: v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}
decoder = pyctcdecode.build_ctcdecoder(
    list(sorted_vocab_dict.keys()),
    str(kenlm_model_path)
)
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder
)

model.freeze_feature_encoder()
model.eval()
```
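The loading code sorts the tokenizer vocabulary by token id before passing it to `pyctcdecode.build_ctcdecoder`. The reason, as far as the decoder's interface goes, is that label `i` in the list is matched to column `i` of the logits matrix. The sketch below illustrates the ordering step on a toy vocabulary; the tokens, ids, and the helper name `labels_in_id_order` are hypothetical and are not taken from the model.

```python
# Toy vocabulary standing in for processor.tokenizer.get_vocab().
# The tokens and ids are illustrative only, not the model's real vocabulary.
toy_vocab = {"ক": 3, "<pad>": 0, "|": 2, "<unk>": 1, "খ": 4}


def labels_in_id_order(vocab):
    """Return the tokens of `vocab` ordered by ascending integer id."""
    return [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]


labels = labels_in_id_order(toy_vocab)

# Position i in `labels` should hold the token whose id is i, so that it
# lines up with column i of the CTC logits.
assert all(toy_vocab[token] == i for i, token in enumerate(labels))
print(labels)
```

If the labels were passed in arbitrary dictionary order, the decoder would still run but would map logit columns to the wrong characters.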
## Transcription Generation
```python
sampling_rate = 16000
path = "AUDIO_PATH"  # replace with the path to your audio file
# Load the audio as mono and resample it to 16 kHz.
frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
inputs = processor(
    frame,
    sampling_rate=sampling_rate,
    return_tensors="pt",
    padding=False
)
with torch.no_grad():
    logits = model(inputs.input_values.to("cpu")).logits
# Drop the batch dimension and decode with the KenLM-backed beam search.
np_logits = logits.squeeze(0).cpu().numpy()
result = processor_with_lm.decode(np_logits, beam_width=256)
text = result.text
print(f"Transcription={text}")
```
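This model card lists WER and CER as its metrics. The following is a minimal sketch of how a transcription could be scored against a reference using a plain Levenshtein distance. It is not the authors' evaluation script: the function names `edit_distance`, `wer`, and `cer` and the example sentences are assumptions made for illustration, and this `cer` counts Unicode code points rather than grapheme clusters, which matters for Bengali text with combining vowel signs.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over Unicode code points (not graphemes)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only.
reference = "আমি ভাত খাই"
hypothesis = "আমি ভাত খাব"
print(f"WER={wer(reference, hypothesis):.3f}, CER={cer(reference, hypothesis):.3f}")
```

In practice, `hypothesis` would be the `text` produced by the transcription step, and scores are usually aggregated over a whole test set rather than reported per utterance.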
## Citation
```
@article{hasan2025banglatalk,
  title={BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects},
  author={Hasan, Jakir and Dipta, Shubhashis Roy},
  journal={arXiv preprint arXiv:2510.06188},
  year={2025}
}
@inproceedings{javed2022towards,
  title={Towards Building {ASR} Systems for the Next Billion Users},
  author={Javed, Tahir and Doddapaneni, Sumanth and Raman, Abhigyan and Bhogale, Kaushal Santosh and Ramesh, Gowtham and Kunchukuttan, Anoop and Kumar, Pratyush and Khapra, Mitesh M},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={10813--10821},
  year={2022}
}
```