cobrayyxx committed on
Commit c5cd884 · verified · 1 Parent(s): c809477

Update README.md

Files changed (1):
  1. README.md (+70 −6)
---
base_model: openai/whisper-small
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: whisper-small-be2en
  results: []
datasets:
- kreasof-ai/bigc-bem-eng
- kreasof-ai/bemba-speech-csikasote
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# whisper-small-be2en

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the [Big-C](https://huggingface.co/datasets/kreasof-ai/bem-eng-bigc) and [BembaSpeech](https://huggingface.co/datasets/kreasof-ai/bemba-speech-csikasote) datasets.
It achieves the following results on the evaluation set:
- Loss: 0.0323
- Bleu: 47.49

## Model description

This model is an automatic speech recognition (transcription) model for Bemba audio.

## Intended uses

This model was used for the Bemba-to-English speech translation task as part of the IWSLT 2025 Low-Resource Track.
 
## Training and evaluation data

This model was trained on the `train+dev` split of the BembaSpeech dataset and the `train+val` split of the Big-C dataset. For evaluation, it used the `test` splits of both Big-C and BembaSpeech.
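As an illustrative sketch of that split composition (not the actual training script; in practice the corpora would be loaded with the Hugging Face `datasets` library and merged with `concatenate_datasets`), with toy lists standing in for the audio examples:

```python
# Toy stand-ins for the two corpora; a real script would build these with
# datasets.load_dataset(...) and merge them with datasets.concatenate_datasets(...).
bembaspeech = {"train": ["bs-train"], "dev": ["bs-dev"], "test": ["bs-test"]}
bigc = {"train": ["bigc-train"], "val": ["bigc-val"], "test": ["bigc-test"]}

# Training pool: BembaSpeech train+dev plus Big-C train+val.
train_data = bembaspeech["train"] + bembaspeech["dev"] + bigc["train"] + bigc["val"]

# Evaluation pool: the held-out test splits of both corpora.
eval_data = bigc["test"] + bembaspeech["test"]

print(len(train_data), len(eval_data))  # 4 2
```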

## Training procedure

| 0.0229 | 2.0 | 12410 | 0.0312 | 46.92 | 88.33 | 38.6426 |
| 0.0318 | 3.0 | 18615 | 0.0323 | 47.49 | 88.36 | 38.0952 |

### Model evaluation

Performance was evaluated with WER on the `test` split of the Big-C dataset.

| Model     | WER (%) |
| --------- | ------- |
| Baseline  | 157.50  |
| Finetuned | 35.64   |
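For reference, WER is the word-level edit distance between hypothesis and reference divided by the number of reference words, so it can exceed 100% when the hypothesis is much longer than the reference, as with the baseline above. A minimal sketch of the metric (the actual evaluation presumably used a standard implementation such as `jiwer` or Hugging Face `evaluate`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("umwana aleisa ku ng'anda", "umwana aleisa ku nganda"))  # one substitution in four words -> 0.25
```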

### Framework versions

- Pytorch 2.5.1+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0

## Citation

```bibtex
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

@inproceedings{sikasote-etal-2023-big,
  title     = "{BIG}-{C}: a Multimodal Multi-Purpose Dataset for {B}emba",
  author    = "Sikasote, Claytone and Mukonde, Eunice and Alam, Md Mahfuz Ibn and Anastasopoulos, Antonios",
  editor    = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month     = jul,
  year      = "2023",
  address   = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2023.acl-long.115",
  doi       = "10.18653/v1/2023.acl-long.115",
  pages     = "2062--2078"
}

@inproceedings{sikasote-anastasopoulos:2022:LREC,
  author    = {Sikasote, Claytone and Anastasopoulos, Antonios},
  title     = {BembaSpeech: A Speech Recognition Corpus for the Bemba Language},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {7277--7283},
  url       = {https://aclanthology.org/2022.lrec-1.790}
}
```

## Contact

This model was trained by [Hazim](https://huggingface.co/cobrayyxx).

## Acknowledgments

Huge thanks to [Yasmin Moslem](https://huggingface.co/ymoslem) for her supervision, and to [Habibullah Akbar](https://huggingface.co/ChavyvAkvar), founder of Kreasof-AI, for his leadership and support.