Jakir057 committed on
Commit 32b485a · verified · 1 Parent(s): 3474e4d

Update README.md

Files changed (1): README.md +74 -2
README.md CHANGED
@@ -18,12 +18,13 @@ BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects </
 📝 <a href="https://arxiv.org/abs/2510.06188"><b>Paper</b></a>, 🖥️ <a href="https://github.com/Jak57/BanglaTalk"><b>Github</b></a>
 </div>
 
+ **BRDialect** is an ASR system trained on ten regional dialects of Bangladesh using the <a href="https://www.kaggle.com/competitions/ben10">Ben10</a> dataset from Bengali.AI.
 <!-- APT-Eval is the first and largest dataset to evaluate the AI-text detectors behavior for AI-polished texts.
 It contains almost **15K** text samples, polished by 5 different LLMs, for 6 different domains, with 2 major polishing types. All of these samples initially came from purely human written texts.
 It not only includes AI-polished texts, but also includes fine-grained involvement of AI/LLM.
- It is designed to push the boundary of AI-text detectors, for the scenarios where human uses LLM to minimally polish their own written texts.
+ It is designed to push the boundary of AI-text detectors, for the scenarios where human uses LLM to minimally polish their own written texts. -->
 
- The overview of our dataset is given below --
+ <!-- The overview of our dataset is given below --
 
 | **Polish Type** | **GPT-4o** | **Llama3.1-70B** | **Llama3-8B** | **Llama2-7B** | **DeepSeek-V3** | **Total** |
 |-----------------|------------|------------------|---------------|---------------|-----------------|-----------|
@@ -32,6 +33,77 @@ The overview of our dataset is given below --
 | **Percentage-based** | 2072 | 2048 | 1977 | 1282 | 2078 | 7379 |
 | **Total** | 3224 | 3133 | 3102 | 2026 | 3219 | **15004** | -->
 
+ ## Load the model
+ 
+ **Prerequisite**<br>
+ ```
+ !pip install -U transformers
+ !pip install https://github.com/kpu/kenlm/archive/master.zip
+ !pip install pyctcdecode
+ ```
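+ 
+ Installing `kenlm` from the GitHub archive builds it from source, so a quick import check can optionally confirm the environment is ready:
+ ```
+ # Verify that the three dependencies import cleanly.
+ import kenlm
+ import pyctcdecode
+ import transformers
+ print(transformers.__version__)
+ ```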
+ 
+ **Log in to HuggingFace**<br>
+ ```
+ from huggingface_hub import login
+ login("TOKEN")
+ ```
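+ 
+ In a notebook, `notebook_login()` from `huggingface_hub` can be used instead, which prompts for the token rather than hard-coding it:
+ ```
+ # Alternative: interactive login prompt (avoids putting the token in code).
+ from huggingface_hub import notebook_login
+ notebook_login()
+ ```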
+ 
+ **Load base model and BRDialect**<br>
+ ```
+ ## BRDialect
+ from huggingface_hub import hf_hub_download
+ 
+ # Download the 5-gram KenLM language model and the fine-tuned wav2vec2 weights.
+ kenlm_model_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/5gram_kenlm.arpa")
+ state_dict_path = hf_hub_download(repo_id="Jakir057/BRDialect", filename="BRDialect/wav2vec2_bangla_regional_dialect.pth")
+ ```
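+ 
+ Both calls download the files into the local Hugging Face cache and return their filesystem paths, which can be printed to confirm the download:
+ ```
+ # The returned values are plain local paths into the HF cache.
+ print(kenlm_model_path)
+ print(state_dict_path)
+ ```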
+ ```
+ from transformers import AutoProcessor, AutoModelForCTC, Wav2Vec2ProcessorWithLM
+ import torch
+ import numpy as np
+ import pyctcdecode
+ import librosa
+ 
+ # Load the Bengali wav2vec2 base model, then overwrite its weights with the
+ # fine-tuned BRDialect checkpoint (map_location keeps the load CPU-safe).
+ base_model_id = "ai4bharat/indicwav2vec_v1_bengali"
+ processor = AutoProcessor.from_pretrained(base_model_id)
+ model = AutoModelForCTC.from_pretrained(base_model_id)
+ model.load_state_dict(torch.load(state_dict_path, map_location="cpu")["model"])
+ 
+ # Build a CTC beam-search decoder over the tokenizer vocabulary (sorted by
+ # token id), rescored by the 5-gram KenLM language model.
+ vocab_dict = processor.tokenizer.get_vocab()
+ sorted_vocab_dict = {k: v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}
+ decoder = pyctcdecode.build_ctcdecoder(
+     list(sorted_vocab_dict.keys()),
+     str(kenlm_model_path)
+ )
+ processor_with_lm = Wav2Vec2ProcessorWithLM(
+     feature_extractor=processor.feature_extractor,
+     tokenizer=processor.tokenizer,
+     decoder=decoder
+ )
+ model.freeze_feature_encoder()
+ model.eval()
+ ```
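+ 
+ The snippet above keeps everything on the CPU. If a CUDA device is available, the model can optionally be moved to it (with the `.to("cpu")` call in the transcription step changed accordingly), e.g.:
+ ```
+ # Optional: run inference on GPU when available.
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+ ```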
+ 
+ ## Transcription Generation
+ ```
+ # Load the audio as a 16 kHz mono waveform (the model's expected input format).
+ sampling_rate = 16000
+ path = "AUDIO_PATH"
+ frame, sr = librosa.load(path, sr=sampling_rate, mono=True)
+ 
+ inputs = processor(
+     frame,
+     sampling_rate=sampling_rate,
+     return_tensors="pt",
+     padding=False
+ )
+ 
+ # Forward pass: CTC logits of shape (batch, time, vocab).
+ with torch.no_grad():
+     logits = model(inputs.input_values.to("cpu")).logits
+ 
+ # Beam-search decode the logits with KenLM rescoring.
+ np_logits = logits.squeeze(0).cpu().numpy()
+ result = processor_with_lm.decode(np_logits, beam_width=256)
+ text = result.text
+ print(f"Transcription={text}")
+ ```
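+ 
+ As a quick sanity check, the same logits can also be decoded greedily without the KenLM language model; the LM-rescored output above is normally more accurate on dialectal speech:
+ ```
+ # Greedy (argmax) CTC decoding, bypassing the language model.
+ pred_ids = torch.argmax(logits, dim=-1)
+ greedy_text = processor.batch_decode(pred_ids)[0]
+ print(f"Greedy transcription={greedy_text}")
+ ```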
 
 <!-- ## Load the dataset