Jakir057 committed · verified
Commit 3474e4d · Parent(s): 59d57f0

Update README.md

Files changed (1): README.md +73 -1
README.md CHANGED
@@ -8,4 +8,76 @@ metrics:
  base_model:
  - ai4bharat/indicwav2vec_v1_bengali
  pipeline_tag: automatic-speech-recognition
- ---
+ ---
+
+ <div align="center">
+ <h1>🚨 BRDialect 🚨
+
+ BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects</h1>
+ 📝 <a href="https://arxiv.org/abs/2510.06188"><b>Paper</b></a>, 🖥️ <a href="https://github.com/Jak57/BanglaTalk"><b>GitHub</b></a>
+ </div>
+
+ ## Citation
+
+ ```
+ @article{hasan2025banglatalk,
+   title={BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects},
+   author={Hasan, Jakir and Dipta, Shubhashis Roy},
+   journal={arXiv preprint arXiv:2510.06188},
+   year={2025}
+ }
+ ```
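
The card's metadata declares an `automatic-speech-recognition` pipeline with `metrics` in its front matter. ASR systems of this kind are conventionally scored by word error rate (WER), i.e. the word-level Levenshtein distance divided by the reference length. The following is a minimal, dependency-free sketch of that computation, not code from the BanglaTalk repository:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-array Levenshtein distance over word sequences.
    # d[j] holds the distance between ref[:i] and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev = old d[j-1] (diagonal cell)
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,         # deletion
                      d[j - 1] + 1,     # insertion
                      prev + (r != h))  # substitution (free if words match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # → 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution out of three words
```

The same routine applies unchanged to Bengali text, since it operates on whitespace-separated tokens rather than characters.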