devasheeshG committed on
Commit
06130d7
·
1 Parent(s): a78724f
Files changed (1)
  1. README.md +22 -21
README.md CHANGED
@@ -5,14 +5,14 @@ tags:
5
  - pytorch
6
  - audio
7
  - speech
8
- - automatic-speech-recognition
9
  - whisper
10
  - wav2vec2
11
 
12
  model-index:
13
  - name: whisper_medium_fp16_transformers
14
  results:
15
- - task:
16
  type: automatic-speech-recognition
17
  name: Automatic Speech Recognition
18
  dataset:
@@ -44,7 +44,7 @@ model-index:
44
  name: Test CER
45
  description: Character Error Rate
46
 
47
- - task:
48
  type: automatic-speech-recognition
49
  name: Automatic Speech Recognition
50
  dataset:
@@ -75,8 +75,8 @@ model-index:
75
  value: 0
76
  name: Test CER
77
  description: Character Error Rate
78
-
79
- - task:
80
  type: automatic-speech-recognition
81
  name: Automatic Speech Recognition
82
  dataset:
@@ -88,23 +88,23 @@ model-index:
88
  language: hi
89
  metrics:
90
  - type: wer
91
- value: 0
92
  name: Test WER
93
  description: Word Error Rate
94
  - type: mer
95
- value: 0
96
  name: Test MER
97
  description: Match Error Rate
98
  - type: wil
99
- value: 0
100
  name: Test WIL
101
  description: Word Information Lost
102
  - type: wip
103
- value: 0
104
  name: Test WIP
105
  description: Word Information Preserved
106
  - type: cer
107
- value: 0
108
  name: Test CER
109
  description: Character Error Rate
110
 
@@ -144,7 +144,7 @@ language:
144
  - da
145
  - hu
146
  - ta
147
- - 'no'
148
  - th
149
  - ur
150
  - hr
@@ -215,6 +215,7 @@ language:
215
  - jw
216
  - su
217
  ---
 
218
  ## Versions:
219
 
220
  - CUDA: 12.1
@@ -242,9 +243,9 @@ language:
242
  | M1 (CPU) | - | - | N/A | N/A |
243
  | M1 (GPU -> 'mps') | - | - | N/A | N/A |
244
 
245
-
246
  - **NOTE: TensorCores are efficient in mixed-precision calculations**
247
  - **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)**
 
248
  - Punctuation: True
249
 
250
  ## Model Error Benchmarks:
@@ -257,16 +258,16 @@ language:
257
 
258
  ### Hindi (test.tsv) [Common Voice 14.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_14_0)
259
 
260
- **Test done on RTX 3060 on 2557 Samples**
261
 
262
- | | WER | MER | WIL | WIP | CER |
263
- | ----------------------- | --- | --- | --- | --- | --- |
264
- | Original_Model (54 min) | - | - | - | - | - |
265
- | This_Model (38 min) | - | - | - | - | - |
266
 
267
  ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-clean)
268
 
269
- **Test done on RTX 3060 on ___ Samples**
270
 
271
  | | WER | MER | WIL | WIP | CER |
272
  | -------------- | --- | --- | --- | --- | --- |
@@ -275,7 +276,7 @@ language:
275
 
276
  ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-other)
277
 
278
- **Test done on RTX 3060 on ___ Samples**
279
 
280
  | | WER | MER | WIL | WIP | CER |
281
  | -------------- | --- | --- | --- | --- | --- |
@@ -290,7 +291,7 @@ language:
290
 
291
  ## Usage
292
 
293
- A file ``__init__.py`` is contained inside this repo which contains all the code to use this model.
294
 
295
  Firstly, clone this repo and place all the files inside a folder.
296
 
@@ -312,7 +313,7 @@ from whisper_large_v2_fp16_transformers import Model
312
  # Initialise the model
313
  model = Model(
314
  model_name_or_path='whisper_large_v2_fp16_transformers',
315
- cuda_visible_device="0",
316
  device='cuda',
317
  )
318
  ```
 
5
  - pytorch
6
  - audio
7
  - speech
8
+ - automatic-speech-recognition
9
  - whisper
10
  - wav2vec2
11
 
12
  model-index:
13
  - name: whisper_medium_fp16_transformers
14
  results:
15
+ - task:
16
  type: automatic-speech-recognition
17
  name: Automatic Speech Recognition
18
  dataset:
 
44
  name: Test CER
45
  description: Character Error Rate
46
 
47
+ - task:
48
  type: automatic-speech-recognition
49
  name: Automatic Speech Recognition
50
  dataset:
 
75
  value: 0
76
  name: Test CER
77
  description: Character Error Rate
78
+
79
+ - task:
80
  type: automatic-speech-recognition
81
  name: Automatic Speech Recognition
82
  dataset:
 
88
  language: hi
89
  metrics:
90
  - type: wer
91
+ value: 44.64
92
  name: Test WER
93
  description: Word Error Rate
94
  - type: mer
95
+ value: 41.69
96
  name: Test MER
97
  description: Match Error Rate
98
  - type: wil
99
+ value: 59.53
100
  name: Test WIL
101
  description: Word Information Lost
102
  - type: wip
103
+ value: 40.46
104
  name: Test WIP
105
  description: Word Information Preserved
106
  - type: cer
107
+ value: 16.80
108
  name: Test CER
109
  description: Character Error Rate
110
 
 
144
  - da
145
  - hu
146
  - ta
147
+ - "no"
148
  - th
149
  - ur
150
  - hr
 
215
  - jw
216
  - su
217
  ---
218
+
219
  ## Versions:
220
 
221
  - CUDA: 12.1
 
243
  | M1 (CPU) | - | - | N/A | N/A |
244
  | M1 (GPU -> 'mps') | - | - | N/A | N/A |
245
 
 
246
  - **NOTE: TensorCores are efficient in mixed-precision calculations**
247
  - **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)**
248
+
249
  - Punctuation: True
250
 
251
  ## Model Error Benchmarks:
 
258
 
259
  ### Hindi (test.tsv) [Common Voice 14.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_14_0)
260
 
261
+ **Test done on RTX 3060 on 1000 Samples**
262
 
263
+ | | WER | MER | WIL | WIP | CER |
264
+ | ----------------------- | ----- | ----- | ----- | ----- | ----- |
265
+ | Original_Model (30 min) | 43.99 | 41.65 | 59.47 | 40.52 | 16.23 |
266
+ | This_Model (20 min) | 44.64 | 41.69 | 59.53 | 40.46 | 16.80 |
267
 
268
  ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-clean)
269
 
270
+ **Test done on RTX 3060 on \_\_\_ Samples**
271
 
272
  | | WER | MER | WIL | WIP | CER |
273
  | -------------- | --- | --- | --- | --- | --- |
 
276
 
277
  ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-other)
278
 
279
+ **Test done on RTX 3060 on \_\_\_ Samples**
280
 
281
  | | WER | MER | WIL | WIP | CER |
282
  | -------------- | --- | --- | --- | --- | --- |
 
291
 
292
  ## Usage
293
 
294
+ A file `__init__.py` is contained inside this repo which contains all the code to use this model.
295
 
296
  Firstly, clone this repo and place all the files inside a folder.
297
 
 
313
  # Initialise the model
314
  model = Model(
315
  model_name_or_path='whisper_large_v2_fp16_transformers',
316
+ cuda_visible_device="0",
317
  device='cuda',
318
  )
319
  ```
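
The benchmark tables in this README report WER, MER, WIL, WIP and CER. The README does not say how these were computed, so the following is only a minimal sketch using the standard textbook definitions over a minimum-edit alignment. The names `align_counts` and `error_metrics` are hypothetical helpers for illustration and are not part of this repo. CER is assumed to be the same error rate applied to characters instead of words.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    hits: int
    subs: int
    dels: int
    ins: int


def align_counts(ref, hyp):
    """Count hits, substitutions, deletions and insertions on one
    minimum-edit (Levenshtein) alignment of two token sequences."""
    # Each cell holds (cost, hits, subs, dels, ins) for ref[:i] vs hyp[:j].
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            c, h, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)       # match
            else:
                diag = (c + 1, h, s + 1, d, n)   # substitution
            c, h, s, d, n = prev[j]
            dele = (c + 1, h, s, d + 1, n)       # deletion
            c, h, s, d, n = cur[j - 1]
            inse = (c + 1, h, s, d, n + 1)       # insertion
            # Ties are broken in the order: diagonal, deletion, insertion.
            cur.append(min(diag, dele, inse, key=lambda t: t[0]))
        prev = cur
    _, h, s, d, n = prev[-1]
    return Counts(h, s, d, n)


def error_metrics(ref, hyp):
    """Return WER, MER, WIP and WIL as fractions (multiply by 100 for %).
    Assumes a non-empty reference."""
    k = align_counts(ref, hyp)
    n_ref = k.hits + k.subs + k.dels
    n_hyp = k.hits + k.subs + k.ins
    errors = k.subs + k.dels + k.ins
    wip = (k.hits / n_ref) * (k.hits / n_hyp) if n_hyp else 0.0
    return {
        "wer": errors / n_ref,
        "mer": errors / (k.hits + errors),
        "wip": wip,
        "wil": 1.0 - wip,
    }


# Word-level metrics: pass lists of words.
words = error_metrics("the cat sat on the mat".split(),
                      "the cat sit on mat".split())
# Character-level error rate (CER): pass lists of characters.
cer = error_metrics(list("abc"), list("abd"))["wer"]
```

Under these definitions WIL and WIP always sum to 1, which matches the Hindi rows above (59.53 + 40.46 ≈ 100 after rounding).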