diarray committed on
Commit
eb9e638
·
verified ·
1 Parent(s): a2fe85c

Push model using huggingface_hub.

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +145 -0
  3. soloba-tdt-0.6b-v1.5.nemo +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
36
+ soloba-tdt-0.6b-v1.5.nemo filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,145 @@
1
+ ---
2
+ language:
3
+   - bm
4
+ library_name: nemo
5
+ datasets:
6
+   - RobotsMali/kunkado
7
+
8
+ thumbnail: null
9
+ tags:
10
+   - automatic-speech-recognition
11
+   - speech
12
+   - audio
13
+   - Transducer
14
+   - FastConformer
15
+   - Conformer
16
+   - pytorch
17
+   - Bambara
18
+   - NeMo
19
+ license: cc-by-4.0
20
+ base_model: RobotsMali/soloba-tdt-0.6b-v0.5
21
+ model-index:
22
+   - name: soloba-tdt-0.6b-v1.5
23
+     results:
24
+       - task:
25
+           name: Automatic Speech Recognition
26
+           type: automatic-speech-recognition
27
+         dataset:
28
+           name: Kunkado
29
+           type: RobotsMali/kunkado
30
+           split: test
31
+           args:
32
+             language: bm
33
+         metrics:
34
+           - name: Test WER
35
+             type: wer
36
+             value: 39.7866505648225
37
+           - name: Test CER
38
+             type: cer
39
+             value: 23.216155838453484
40
+       - task:
41
+           name: Automatic Speech Recognition
42
+           type: automatic-speech-recognition
43
+         dataset:
44
+           name: Nyana Eval
45
+           type: RobotsMali/nyana-eval
46
+           split: test
47
+           args:
48
+             language: bm
49
+         metrics:
50
+           - name: Test WER
51
+             type: wer
52
+             value: XX.XXX
53
+           - name: Test CER
54
+             type: cer
55
+             value: YY.YYY
56
+
57
+ metrics:
58
+   - wer
59
+   - cer
60
+ pipeline_tag: automatic-speech-recognition
61
+ ---
62
+
63
+ # Soloba-TDT-600M Series
64
+
65
+ <style>
66
+ img {
67
+ display: inline;
68
+ }
69
+ </style>
70
+
71
+ [![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer--TDT-blue#model-badge)](#model-architecture)
72
+ | [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
73
+ | [![Language](https://img.shields.io/badge/Language-bm-orange#model-badge)](#datasets)
74
+
75
+ `soloba-tdt-0.6b-v1.5` is a fine-tuned version of [`RobotsMali/soloba-tdt-0.6b-v0.5`](https://huggingface.co/RobotsMali/soloba-tdt-0.6b-v0.5) on RobotsMali/kunkado. This model does not consistently produce capitalization or punctuation, and it cannot produce acoustic event tags like those found in Kunkado's transcriptions. It was fine-tuned using **NVIDIA NeMo**.
76
+
77
+ ## **🚨 Important Note**
78
+ This model, along with its associated resources, is part of an **ongoing research effort**; improvements and refinements are expected in future versions. Users should be aware that:
79
+
80
+ - **The model may not generalize very well across all speaking conditions and dialects.**
81
+ - **Community feedback is welcome, and contributions are encouraged to refine the model further.**
82
+
83
+ ## NVIDIA NeMo: Training
84
+
85
+ To fine-tune or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest version of PyTorch.
86
+
87
+ ```bash
88
+ pip install "nemo-toolkit[asr]"
89
+ ```
90
+
91
+ ## How to Use This Model
92
+
93
+ Note that this model has been released primarily for research purposes.
94
+
95
+ ### Load Model with NeMo
96
+ ```python
97
+ import nemo.collections.asr as nemo_asr
98
+ asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-tdt-0.6b-v1.5")
99
+ ```
100
+
101
+ ### Transcribe Audio
102
+ ```python
103
+ asr_model.eval()
104
+ # Assuming you have a test audio file named sample_audio.wav
105
+ asr_model.transcribe(['sample_audio.wav'])
106
+ ```
107
+
108
+ ### Input
109
+
110
+ This model accepts any **mono-channel audio (wav files)** as input and resamples it to a *16 kHz sample rate* before performing the forward pass.
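The input requirements above can be checked before transcription. The following is a minimal sketch using only the Python standard library; the helper name `describe_wav` and the returned keys are illustrative assumptions, not part of NeMo or this repository, and it only handles PCM WAV files that the `wave` module can read.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # rate the model card says audio is resampled to


def describe_wav(path):
    """Report channel count and sample rate of a PCM WAV file.

    Hypothetical helper for illustration; not a NeMo API.
    """
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        # The card asks for mono-channel audio.
        "is_mono": channels == 1,
        # Audio at another rate is expected to be resampled to 16 kHz.
        "needs_resampling": sample_rate != TARGET_SAMPLE_RATE,
    }
```

A file for which `is_mono` is false would need to be downmixed to a single channel before being passed to the model.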
111
+
112
+ ### Output
113
+
114
+ This model returns each transcription as a hypothesis object whose `text` attribute contains the transcription string for the given speech sample (NeMo >= 2.3).
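As a hedged illustration of reading that output, the sketch below pulls the string out of one result. The helper `hypothesis_text` is hypothetical, and the fallback for plain strings is an assumption about older NeMo versions rather than something this card states.

```python
def hypothesis_text(result):
    """Return the transcription string from a single transcribe() result.

    Hypothetical helper: accepts either an object exposing a `text`
    attribute (as described above) or, as an assumption about older
    NeMo versions, a plain string.
    """
    if isinstance(result, str):
        return result
    text = getattr(result, "text", None)
    if text is None:
        raise TypeError(f"Cannot read a transcription from {type(result).__name__}")
    return text
```

With a loaded model, this could be used as `[hypothesis_text(h) for h in asr_model.transcribe(['sample_audio.wav'])]`.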
115
+
116
+ ## Model Architecture
117
+
118
+ This model uses a FastConformer encoder and an autoregressive Token-and-Duration Transducer (TDT) decoder, a variant of RNN-T that jointly learns to predict a token and its duration. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You can find more details on FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
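To illustrate how predicting a duration alongside each token changes decoding, here is a toy greedy loop. It is a simplified sketch, not NeMo's implementation: `predict` is a hypothetical callable standing in for the joint network, `BLANK` is an arbitrary blank id chosen for this example, and the rule that forces progress on a zero duration is an assumption made to guarantee termination.

```python
BLANK = -1  # hypothetical blank id used only in this sketch


def tdt_greedy_decode(num_frames, predict, max_symbols_per_frame=10):
    """Toy greedy decoding with (token, duration) predictions.

    `predict(t, tokens)` must return a (token, duration) pair for encoder
    frame `t` given the tokens emitted so far. Instead of always advancing
    one frame, the decoder jumps ahead by the predicted duration.
    """
    tokens = []
    t = 0
    emitted_at_frame = 0
    while t < num_frames:
        token, duration = predict(t, tokens)
        if token != BLANK:
            tokens.append(token)
            emitted_at_frame += 1
        # Assumption: force progress when a blank carries duration 0 or
        # when too many symbols were emitted without leaving the frame.
        if duration == 0 and (token == BLANK or emitted_at_frame >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            emitted_at_frame = 0
        t += duration
    return tokens
```

A duration greater than one lets the decoder skip frames entirely, which is where the inference speed-up over a frame-by-frame transducer comes from.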
119
+
120
+
121
+ ## Training
122
+
123
+ The NeMo toolkit was used to fine-tune this model for **40,000 steps**, starting from the `RobotsMali/soloba-tdt-0.6b-v0.5` model, with a batch size of 32. The fine-tuning code and configurations can be found at [RobotsMali-AI/bambara-asr](https://github.com/RobotsMali-AI/bambara-asr/).
124
+
125
+ The tokenizer for this model was trained on the text transcripts of the train set of RobotsMali/kunkado using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
126
+
127
+ ## Dataset
128
+ This model was fine-tuned on the human-reviewed subset of the [kunkado](https://huggingface.co/datasets/RobotsMali/kunkado) dataset; this subset consists of **~40 hours of transcribed Bambara speech data**. The text was normalized with the [bambara-normalizer](https://pypi.org/project/bambara-normalizer/) prior to training: numbers were normalized, and punctuation and tags were removed.
129
+
130
+
131
+ ## Performance
132
+
133
+ We report the Word Error Rate (WER) and Character Error Rate (CER) for this model:
134
+
135
+ | Benchmark | Decoding | WER (%) &darr; | CER (%) &darr; |
136
+ |---------------|----------|-----------------|-----------------|
137
+ | Kunkado       | TDT      | 39.79           | 23.22           |
138
+ | Nyana Eval    | TDT      | XX.XX           | YY.YY           |
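For readers unfamiliar with these metrics, the sketch below computes WER and CER as pooled Levenshtein distance divided by total reference length. It is an illustrative assumption about how such scores are typically computed, not the evaluation script used for the numbers above; `edit_distance` and `error_rate` are hypothetical names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Return the pooled error rate in percent; unit is "word" (WER) or "char" (CER)."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    if total == 0:
        raise ValueError("References contain no units to score against")
    return 100.0 * errors / total
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.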
139
+
140
+ ## License
141
+ This model is released under the **CC-BY-4.0** license. By using this model, you agree to the terms of the license.
142
+
143
+ ---
144
+
145
+ Feel free to open a discussion on Hugging Face or [file an issue](https://github.com/RobotsMali-AI/bambara-asr/issues) on GitHub for help or contributions.
soloba-tdt-0.6b-v1.5.nemo ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f308039c2d3fb526ffa3f0bf633e0ab542c59be7d626885e793443d07a01da5
3
+ size 2469580800