Pengwin30 committed on
Commit a5aa2dc · verified · 1 Parent(s): ab1ea01

Update README.md

Files changed (1)
  1. README.md +87 -84
README.md CHANGED
@@ -1,85 +1,88 @@
- ---
- license: mit
- base_model:
- - openai/whisper-medium
- language: en
- tags:
- - automatic-speech-recognition
- - whisper
- - fine-tuning
- - speech
- model-index:
- - name: whisper-medium-finetuned-custom
-   results:
-   - task:
-       type: automatic-speech-recognition
-     dataset:
-       name: Custom Audio Dataset
-       type: audio
-     metrics:
-     - name: Word Error Rate
-       type: wer
-       value: 0.XX # Replace with actual WER
- ---
-
- # Whisper Medium Fine-Tuned on Custom English Dataset
-
- This model is a fine-tuned version of OpenAI's [`whisper-medium`](https://huggingface.co/openai/whisper-medium), optimized for transcribing English speech from a custom dataset.
-
- ## 🛠️ Model Details
-
- - **Base Model:** openai/whisper-medium
- - **Fine-tuned by:** Winardi (Research by Ms. Tong Rong)
- - **Language:** English (monolingual)
- - **Framework:** PyTorch, Hugging Face Transformers
-
- ## 📚 Training Data
-
- The model was fine-tuned on a proprietary/custom audio dataset using `metadata(clean1).csv`. Corrupted or low-quality audio files were excluded. The data was split as follows:
-
- - **Training:** 80%
- - **Validation:** 10%
- - **Testing:** 10% (used only for evaluation, not during training)
-
- ## 🎯 Intended Use
-
- This model is intended for **automatic speech recognition (ASR)** in English, especially for environments similar to the training dataset (e.g., single-speaker, clean audio).
-
- ## 📉 Performance
-
- - **Metric:** Word Error Rate (WER)
- - **WER:** `2.07%`
- - **WER with Limited Vocabulary:** `3.23%`
-
- ## 🚫 Limitations
-
- - Not robust to heavy background noise or overlapping speech
- - May not perform well on dialects or accents not represented in training data
- - Only supports English input
-
- ## 💬 How to Use
-
- ```python
- from transformers import pipeline
-
- asr = pipeline("automatic-speech-recognition", model="your-username/whisper-medium-finetuned-custom")
- result = asr("path/to/audio.wav")
- print(result["text"])
- ```
-
- ## 📜 License
-
- This model is licensed under the **MIT License**.
-
- ## 🙏 Citation
-
- If you use this model in your work, please cite:
-
- ```
- @misc{whisper-finetuned-custom,
- author = {Tong Rong, Winardi},
- title = {Whisper Medium Fine-Tuned on Custom Dataset},
- year = {2025},
- url = {https://huggingface.co/your-username/whisper-medium-finetuned-custom}
- }
  ```
+ ---
+ license: mit
+ base_model:
+ - openai/whisper-medium
+ language: en
+ tags:
+ - automatic-speech-recognition
+ - whisper
+ - fine-tuning
+ - speech
+ model-index:
+ - name: Pengwin30/whisper-medium-fine-tuned
+   results:
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       name: Custom Audio Dataset
+       type: audio
+     metrics:
+     - name: Word Error Rate
+       type: wer
+       value: 2.07%
+     - name: Word Error Rate With Limited Vocabulary
+       type: wer
+       value: 3.23%
+ ---
+
+ # Whisper Medium Fine-Tuned on Custom English Dataset
+
+ This model is a fine-tuned version of OpenAI's [`whisper-medium`](https://huggingface.co/openai/whisper-medium), optimized for transcribing English speech from a custom dataset.
+
+ ## 🛠️ Model Details
+
+ - **Base Model:** openai/whisper-medium
+ - **Fine-tuned by:** Winardi (Research by Ms. Tong Rong)
+ - **Language:** English (monolingual)
+ - **Framework:** PyTorch, Hugging Face Transformers
+
+ ## 📚 Training Data
+
+ The model was fine-tuned on a proprietary/custom audio dataset using `metadata(clean1).csv`. Corrupted or low-quality audio files were excluded. The data was split as follows:
+
+ - **Training:** 80%
+ - **Validation:** 10%
+ - **Testing:** 10% (used only for evaluation, not during training)
+
+ ## 🎯 Intended Use
+
+ This model is intended for **automatic speech recognition (ASR)** in English, especially for environments similar to the training dataset (e.g., single-speaker, clean audio).
+
+ ## 📉 Performance
+
+ - **Metric:** Word Error Rate (WER)
+ - **WER:** `2.07%`
+ - **WER with Limited Vocabulary:** `3.23%`
+
+ ## 🚫 Limitations
+
+ - Not robust to heavy background noise or overlapping speech
+ - May not perform well on dialects or accents not represented in training data
+ - Only supports English input
+
+ ## 💬 How to Use
+
+ ```python
+ from transformers import pipeline
+
+ asr = pipeline("automatic-speech-recognition", model="Pengwin30/whisper-medium-fine-tuned")
+ result = asr("path/to/audio.wav")
+ print(result["text"])
+ ```
+
+ ## 📜 License
+
+ This model is licensed under the **MIT License**.
+
+ ## 🙏 Citation
+
+ If you use this model in your work, please cite:
+
+ ```
+ @misc{Pengwin30/whisper-medium-fine-tuned,
+ author = {Tong Rong, Winardi},
+ title = {Whisper Medium Fine-Tuned on Custom Dataset},
+ year = {2025},
+ url = {https://huggingface.co/Pengwin30/whisper-medium-fine-tuned}
+ }
  ```
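
Note on the reported metric (not part of the commit): the README gives WER figures of 2.07% and 3.23% but does not show how they were computed. As a rough illustration only, the standard definition is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below assumes that definition. The function name `word_error_rate`, the lowercasing step, and the sample transcripts are hypothetical and are not taken from the authors' evaluation code or dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: case-insensitive comparison on whitespace-separated words.
    # The model card does not say which text normalization was applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts for illustration only.
    ref = "turn the volume down please"
    hyp = "turn volume down pleas"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")  # one deletion + one substitution over 5 words
```

The function returns a fraction (0.4 in the example above), so a reported value such as `2.07%` would correspond to roughly 0.0207 on this scale. A corpus-level figure is usually obtained by summing errors and reference lengths over all test utterances rather than averaging per-utterance rates, which is something to confirm against the actual evaluation script.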