devasheeshG/whisper_large_v2_fp16_transformers

@@ -5,14 +5,14 @@ tags:
   - pytorch
   - audio
   - speech
-  - automatic-speech-recognition
   - whisper
   - wav2vec2
 model-index:
   - name: whisper_medium_fp16_transformers
     results:
-      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
@@ -44,7 +44,7 @@ model-index:
             name: Test CER
             description: Character Error Rate
-      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
@@ -75,8 +75,8 @@ model-index:
             value: 0
             name: Test CER
             description: Character Error Rate
-      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
@@ -88,23 +88,23 @@ model-index:
             language: hi
         metrics:
           - type: wer
-            value: 0
             name: Test WER
             description: Word Error Rate
           - type: mer
-            value: 0
             name: Test MER
             description: Match Error Rate
           - type: wil
-            value: 0
             name: Test WIL
             description: Word Information Lost
           - type: wip
-            value: 0
             name: Test WIP
             description: Word Information Preserved
           - type: cer
-            value: 0
             name: Test CER
             description: Character Error Rate
@@ -144,7 +144,7 @@ language:
   - da
   - hu
   - ta
-  - 'no'
   - th
   - ur
   - hr
@@ -215,6 +215,7 @@ language:
   - jw
   - su
 ---
 ## Versions:
 - CUDA: 12.1
@@ -242,9 +243,9 @@ language:
   | M1 (CPU)          | -                  | -       | N/A       | N/A         |
   | M1 (GPU -> 'mps') | -                  | -       | N/A       | N/A         |
   - **NOTE: TensorCores are efficient in mixed-precision calculations**
   - **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)**
 - Punctuation: True
 ## Model Error Benchmarks:
@@ -257,16 +258,16 @@ language:
 ### Hindi (test.tsv) [Common Voice 14.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_14_0)
-**Test done on RTX 3060 on 2557 Samples**
-|                         | WER | MER | WIL | WIP | CER |
-| ----------------------- | --- | --- | --- | --- | --- |
-| Original_Model (54 min) | -   | -   | -   | -   | -   |
-| This_Model (38 min)     | -   | -   | -   | -   | -   |
 ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-clean)
-**Test done on RTX 3060 on ___ Samples**
 |                | WER | MER | WIL | WIP | CER |
 | -------------- | --- | --- | --- | --- | --- |
@@ -275,7 +276,7 @@ language:
 ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-other)
-**Test done on RTX 3060 on ___ Samples**
 |                | WER | MER | WIL | WIP | CER |
 | -------------- | --- | --- | --- | --- | --- |
@@ -290,7 +291,7 @@ language:
 ## Usage
-A file ``__init__.py`` is contained inside this repo which contains all the code to use this model.
 Firstly, clone this repo and place all the files inside a folder.
@@ -312,7 +313,7 @@ from whisper_large_v2_fp16_transformers import Model
 # Initialise the model
 model = Model(
             model_name_or_path='whisper_large_v2_fp16_transformers',
-            cuda_visible_device="0",
             device='cuda',
       )
 ```

   - pytorch
   - audio
   - speech
+  - automatic-speech-recognition
   - whisper
   - wav2vec2
 model-index:
   - name: whisper_medium_fp16_transformers
     results:
+      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
             name: Test CER
             description: Character Error Rate
+      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
             value: 0
             name: Test CER
             description: Character Error Rate
+      - task:
           type: automatic-speech-recognition
           name: Automatic Speech Recognition
         dataset:
             language: hi
         metrics:
           - type: wer
+            value: 44.64
             name: Test WER
             description: Word Error Rate
           - type: mer
+            value: 41.69
             name: Test MER
             description: Match Error Rate
           - type: wil
+            value: 59.53
             name: Test WIL
             description: Word Information Lost
           - type: wip
+            value: 40.46
             name: Test WIP
             description: Word Information Preserved
           - type: cer
+            value: 16.80
             name: Test CER
             description: Character Error Rate
   - da
   - hu
   - ta
+  - "no"
   - th
   - ur
   - hr
   - jw
   - su
 ---
 ## Versions:
 - CUDA: 12.1
   | M1 (CPU)          | -                  | -       | N/A       | N/A         |
   | M1 (GPU -> 'mps') | -                  | -       | N/A       | N/A         |
   - **NOTE: TensorCores are efficient in mixed-precision calculations**
   - **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)**
 - Punctuation: True
 ## Model Error Benchmarks:
 ### Hindi (test.tsv) [Common Voice 14.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_14_0)
+**Test done on RTX 3060 on 1000 Samples**
+|                         | WER   | MER   | WIL   | WIP   | CER   |
+| ----------------------- | ----- | ----- | ----- | ----- | ----- |
+| Original_Model (30 min) | 43.99 | 41.65 | 59.47 | 40.52 | 16.23 |
+| This_Model (20 min)     | 44.64 | 41.69 | 59.53 | 40.46 | 16.80 |
 ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-clean)
+**Test done on RTX 3060 on \_\_\_ Samples**
 |                | WER | MER | WIL | WIP | CER |
 | -------------- | --- | --- | --- | --- | --- |
 ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-other)
+**Test done on RTX 3060 on \_\_\_ Samples**
 |                | WER | MER | WIL | WIP | CER |
 | -------------- | --- | --- | --- | --- | --- |
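
The five metrics in the benchmark tables (WER, MER, WIL, WIP, CER) all come from one alignment between a reference transcript and the model output. The sketch below shows the usual definitions in plain Python. It is an illustration only: the diff does not show which tool produced the reported numbers, the names `align_counts` and `error_metrics` are hypothetical, and a non-empty reference is assumed. It returns fractions, whereas the tables appear to report percentages.

```python
from typing import Dict, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for one minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (edit distance, hits, subs, dels, ins) for ref[:i] vs hyp[:j].
    table = [[(0, 0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            dist, h, s, d, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (dist, h + 1, s, d, ins)
            else:
                diagonal = (dist + 1, h, s + 1, d, ins)
            dist, h, s, d, ins = table[i - 1][j]
            deletion = (dist + 1, h, s, d + 1, ins)
            dist, h, s, d, ins = table[i][j - 1]
            insertion = (dist + 1, h, s, d, ins + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda cell: cell[0])
    _, h, s, d, ins = table[-1][-1]
    return h, s, d, ins


def error_metrics(reference: str, hypothesis: str) -> Dict[str, float]:
    """Compute WER, MER, WIL, WIP (word level) and CER (character level) as fractions."""
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    n_ref, n_hyp = h + s + d, h + s + i
    wip = (h / n_ref) * (h / n_hyp) if n_ref and n_hyp else 0.0
    ch, cs, cd, ci = align_counts(list(reference), list(hypothesis))
    return {
        "wer": (s + d + i) / n_ref,          # errors per reference word
        "mer": (s + d + i) / (h + s + d + i),  # errors per aligned word pair
        "wil": 1.0 - wip,
        "wip": wip,
        "cer": (cs + cd + ci) / (ch + cs + cd),  # errors per reference character
    }
```

Because WIL is defined as `1 - WIP`, the two columns should sum to roughly 100 in the tables, which is a quick sanity check on any reported row.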
 ## Usage
+A file `__init__.py` is contained inside this repo which contains all the code to use this model.
 Firstly, clone this repo and place all the files inside a folder.
 # Initilise the model
 model = Model(
             model_name_or_path='whisper_large_v2_fp16_transformers',
+            cuda_visible_device="0",
             device='cuda',
       )
 ```
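
The diff does not show how `Model` uses the `cuda_visible_device` argument. One plausible reading, stated here as an assumption, is that it maps to the standard `CUDA_VISIBLE_DEVICES` environment variable, which limits the GPUs a process can see. A minimal sketch of that mechanism, with the hypothetical helper name `restrict_visible_gpus`:

```python
import os


def restrict_visible_gpus(device_ids: str) -> str:
    """Expose only the listed GPU indices (e.g. "0" or "0,1") to CUDA.

    This must run before any library initialises CUDA in the process;
    changing the variable afterwards has no effect on an initialised runtime.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Assumed equivalent of cuda_visible_device="0" in the snippet above.
restrict_visible_gpus("0")
```

With only one index exposed, the selected GPU is addressed as device `cuda:0` inside the process, regardless of its physical index.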