devasheeshG/whisper_medium_fp16_transformers

@@ -4,7 +4,76 @@ pipeline_tag: automatic-speech-recognition
 tags:
   - pytorch
   - audio
   - automatic-speech-recognition
 language:
   - en
   - zh
@@ -117,7 +186,7 @@ language:
 * transformers Version: 4.30.2
 * accelerate Version: 0.20.3
-## BENCHMARK:
 - RAM: 2.8 GB (Original_Model: 5.5GB)
 - VRAM: 1812 MB (Original_Model: 6GB)
@@ -130,17 +199,44 @@ language:
   | 1660 Super        | OOM                  | 3.3     | 1,408     | -           |
   | Colab (Tesla T4)  | 2.8                  | 2.2     | 2,560     | 320         |
   | Colab (CPU)       | 35                   | -       | -         | -           |
   - **NOTE: TensorCores are efficient in mixed-precision calculations**
-  - CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
 - Punctuation: True
-## Usage
 This repo contains an ``__init__.py`` file with all the code needed to use this model.
 First, clone this repo and place all the files inside a folder.
-# Make sure you have git-lfs installed (https://git-lfs.com)
 ```bash
 git lfs install
 git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers

 tags:
   - pytorch
   - audio
+  - speech
   - automatic-speech-recognition
+  - whisper
+  - wav2vec2
+model-index:
+  - name: whisper_medium_fp16_transformers
+    results:
+      - task:
+        type: automatic-speech-recognition
+        name: Automatic Speech Recognition
+        dataset:
+          type: common_voice
+          name: Common Voice (14.0) (Hindi) (test.tsv -> 2557 samples used)
+          metrics:
+            - type: wer
+              value: 1.7
+              name: Test WER
+              description: Word Error Rate
+            - type: mer
+              value: 1.1
+              name: Test MER
+              description: Match Error Rate
+            - type: wil
+              value: 3,584
+              name: Test WIL
+              description: Word Information Lost
+            - type: wip
+              value: 112
+              name: Test WIP
+              description: Word Information Preserved
+            - type: cer
+              value: 1.7
+              name: Test CER
+              description: Character Error Rate
+      - task:
+        type: automatic-speech-recognition
+        name: Automatic Speech Recognition
+        dataset:
+          type: common_voice
+          name: Common Voice (14.0) (English) (test.tsv -> 2557 samples used)
+          metrics:
+            - type: wer
+              value: -
+              name: Test WER
+              description: Word Error Rate
+            - type: mer
+              value: -
+              name: Test MER
+              description: Match Error Rate
+            - type: wil
+              value: -
+              name: Test WIL
+              description: Word Information Lost
+            - type: wip
+              value: -
+              name: Test WIP
+              description: Word Information Preserved
+            - type: cer
+              value: -
+              name: Test CER
+              description: Character Error Rate
+widget:
+  - example_title: Librispeech sample 1
+    src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
+  - example_title: Librispeech sample 2
+    src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
 language:
   - en
   - zh
 * transformers Version: 4.30.2
 * accelerate Version: 0.20.3
+## Model Benchmarks:
 - RAM: 2.8 GB (Original_Model: 5.5GB)
 - VRAM: 1812 MB (Original_Model: 6GB)
   | 1660 Super        | OOM                  | 3.3     | 1,408     | -           |
   | Colab (Tesla T4)  | 2.8                  | 2.2     | 2,560     | 320         |
   | Colab (CPU)       | 35                   | -       | -         | -           |
+  | M1 (CPU)          | -                    | -       | -         | -           |
+  | M1 (GPU -> 'mps') | -                    | -       | -         | -           |
   - **NOTE: TensorCores are efficient in mixed-precision calculations**
+  - **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)**
 - Punctuation: True
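
The benchmark notes above say that float16 inference did not run on CPU in this setup, so the dtype has to be chosen per device. Below is a minimal sketch of that choice. The helper `pick_device_and_dtype` is hypothetical and not part of this repo's `__init__.py`. It returns plain strings so that it runs without PyTorch installed.

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: choose a device name and dtype name for inference."""
    if cuda_available:
        # On a CUDA GPU the fp16 weights can be used directly.
        return "cuda", "float16"
    # Per the benchmark note, float16 was not supported on CPU here,
    # so fall back to full precision.
    return "cpu", "float32"


if __name__ == "__main__":
    print(pick_device_and_dtype(True))
    print(pick_device_and_dtype(False))
```

With PyTorch installed, one might call it as `pick_device_and_dtype(torch.cuda.is_available())` and resolve the dtype with `getattr(torch, dtype_name)`. The M1 rows in the table are unmeasured, so the sketch makes no assumption about `mps`.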
+## Model Error Benchmarks:
+- **WER: Word Error Rate**
+- **MER: Match Error Rate**
+- **WIL: Word Information Lost**
+- **WIP: Word Information Preserved**
+- **CER: Character Error Rate**
+### Hindi (test.tsv -> 2557 samples used) [Common Voice 14.0](https://commonvoice.mozilla.org/en/datasets)
+  |                   | WER                  | MER     | WIL       | WIP         | CER |
+  | ----------------- | -------------------- | ------- | --------- | ----------- | --- |
+  | Original_Model    | -                    | -       | -         | -           | -   |
+  | This_Model        | -                    | -       | -         | -           | -   |
+### English
+  |                   | WER                  | MER     | WIL       | WIP         | CER |
+  | ----------------- | -------------------- | ------- | --------- | ----------- | --- |
+  | Original_Model    | -                    | -       | -         | -           | -   |
+  | This_Model        | -                    | -       | -         | -           | -   |
+- **'jiwer' library is used for calculations**
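
To make the five metrics above concrete, here is a dependency-free sketch of their standard definitions, computed from the hit, substitution, deletion and insertion counts of a minimum-edit alignment. It is an illustration only, not the `jiwer` implementation the model card uses. The function names `align_counts` and `error_metrics` are hypothetical, and no text normalization is applied.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-edit alignment of two sequences."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # hit or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the end of the table and count each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            hits, i, j = hits + 1, i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs, i, j = subs + 1, i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels, i = dels + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return hits, subs, dels, ins


def error_metrics(reference: str, hypothesis: str) -> dict:
    """Compute WER, MER, WIL, WIP and CER for one reference/hypothesis pair."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    h, s, d, i = align_counts(ref_words, hyp_words)
    wer = (s + d + i) / (h + s + d)        # errors per reference word
    mer = (s + d + i) / (h + s + d + i)    # errors per aligned position
    wip = (h / len(ref_words)) * (h / len(hyp_words)) if hyp_words else 0.0
    wil = 1.0 - wip
    # CER uses the same alignment at character level (spaces included).
    ch, cs, cd, ci = align_counts(list(reference), list(hypothesis))
    cer = (cs + cd + ci) / (ch + cs + cd)
    return {"wer": wer, "mer": mer, "wil": wil, "wip": wip, "cer": cer}


if __name__ == "__main__":
    print(error_metrics("the cat sat on the mat", "the cat sit on mat"))
```

In the example call, one word is substituted and one is deleted out of six reference words, so WER and MER are both 2/6. Lower is better for WER, MER, WIL and CER, while higher is better for WIP.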
+## Code:
+  - ### [$\textbf{Will be uploaded to GitHub soon}$ ](https://github.com/devasheeshG)
+## Usage
 This repo contains an ``__init__.py`` file with all the code needed to use this model.
 First, clone this repo and place all the files inside a folder.
+### Make sure you have git-lfs installed (https://git-lfs.com)
 ```bash
 git lfs install
 git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers

__init__.py CHANGED

@@ -1,19 +1,3 @@
-"""
-CUDA: 12.1
-cuDNN Version: 8.9.2.26_1.0-1_amd64
-Tensorflow Version: 2.12.0
-Torch Version: 2.1.0.dev20230606+cu121
-Transformers Version: 4.30.2
-BENCHMARK:
-    - RAM: 2.8 GB
-    - VRAM: 1812 MB
-    - test.wav: 23 s
-        - GPU (3060) -> 1.1s    (TensorCore is used for fp16 inference)
-        - GPU (1660S) -> 3.3s
-        - CPU -> torch.float16 not supported on CPU (Ryzen 5 3600)
-    - Punctuation: True
-"""
 from transformers import (
     WhisperForConditionalGeneration, WhisperProcessor, WhisperConfig
 )

 from transformers import (
     WhisperForConditionalGeneration, WhisperProcessor, WhisperConfig
 )