Commit f58140b (verified) by fj11 · Parent: 932e928

Update README.md

Files changed: README.md (+88 −61)
base_model:
  - openai/whisper-small
---
# **ScreenTalk**
**Fine-tuned version of `openai/whisper-small` on the `DataLabX/ScreenTalk-XS` dataset**

## **Model Summary**
ScreenTalk is a fine-tuned version of OpenAI's Whisper-Small model, trained for speech-to-text transcription on the **DataLabX/ScreenTalk-XS** dataset. Fine-tuning targets improved automatic speech recognition (ASR) accuracy on that dataset's domain.

On the evaluation set, it achieves:
- **Loss**: `0.375`
- **Word Error Rate (WER)**: `21.27%`
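
For readers unfamiliar with the metric: WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A minimal self-contained sketch (not the `evaluate`/`jiwer` implementation used in training, which normalizes text first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.1667
```

A WER of 21.27% therefore means roughly one word-level error per five reference words.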
## **Intended Uses & Limitations**
### **Intended Use Cases**
- **Speech-to-text transcription** for audio in the domain covered by `ScreenTalk-XS`
- **Automatic subtitling** and **audio content analysis**
- **Voice-assisted applications** where accurate ASR is needed

### **Limitations**
- May not generalize well to **out-of-domain** data
- Performance depends on **audio quality** and **background noise**
- The model is optimized for English (or the target language in `ScreenTalk-XS`)

## **Training and Evaluation Data**
The model was fine-tuned on the `DataLabX/ScreenTalk-XS` dataset, which contains domain-specific speech recordings, preprocessed and formatted for ASR fine-tuning.

## **Training Procedure**
### **Hyperparameters**
The model was trained with the following hyperparameters:

| Hyperparameter | Value |
|-----------------------------|-------|
| Learning Rate | `5e-05` |
| Train Batch Size | `8` |
| Eval Batch Size | `8` |
| Seed | `42` |
| Gradient Accumulation Steps | `8` |
| Total Train Batch Size | `64` |
| Optimizer | `AdamW` (β1=0.9, β2=0.999, ε=1e-08) |
| Learning Rate Scheduler | `linear` |
| Warmup Steps | `10` |
| Total Training Steps | `200` |
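
Two of these settings are derived, which a short sketch can make explicit: the total train batch size is the per-device batch size times the gradient accumulation steps, and the `linear` scheduler warms the learning rate up over 10 steps and then decays it linearly to zero at step 200. The helper below is illustrative, mirroring the usual linear-with-warmup schedule rather than the exact Trainer internals:

```python
train_batch_size = 8
gradient_accumulation_steps = 8
# Gradients are accumulated over 8 micro-batches before each optimizer step.
effective_batch = train_batch_size * gradient_accumulation_steps  # 64

def linear_lr(step, base_lr=5e-5, warmup_steps=10, total_steps=200):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(effective_batch)   # 64
print(linear_lr(10))     # peak LR: 5e-05
print(linear_lr(200))    # 0.0
```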
### **Training Progress**
The model was trained for **200 steps**; validation loss and WER generally improved over the run:

| Step | Training Loss | Validation Loss | WER (%) |
|------|---------------|-----------------|---------|
| 20   | 1.1515        | 1.0011          | 22.33   |
| 40   | 0.7024        | 0.6125          | 26.64   |
| 60   | 0.3648        | 0.4175          | 23.00   |
| 80   | 0.3753        | 0.3991          | 22.09   |
| 100  | 0.3838        | 0.3952          | 22.83   |
| 120  | 0.3358        | 0.3834          | 22.59   |
| 140  | 0.1462        | 0.3924          | 22.01   |
| 160  | 0.1636        | 0.3847          | 21.50   |
| 180  | 0.1587        | 0.3778          | 21.36   |
| 200  | 0.1583        | 0.3759          | 21.27   |
## **Framework Versions**
- **PEFT**: `0.14.0`
- **Transformers**: `4.48.3`
- **PyTorch**: `2.5.1+cu124`
- **Datasets**: `3.3.2`
- **Tokenizers**: `0.21.0`

## **How to Use**
To load and use this model for inference (`your_hf_username/ScreenTalk` is a placeholder for the actual model ID):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline.
asr_pipeline = pipeline("automatic-speech-recognition", model="your_hf_username/ScreenTalk")

# Transcribe an audio file (Whisper resamples input to 16 kHz).
audio_file = "path/to/audio.wav"
transcription = asr_pipeline(audio_file)
print(transcription["text"])
```
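
Since the framework versions list PEFT, the repository may contain a LoRA adapter rather than full model weights. In that case, one possible loading path (a sketch, with `your_hf_username/ScreenTalk` again a placeholder) is to load the base Whisper-Small checkpoint and attach the adapter:

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the base model, then attach the fine-tuned PEFT (LoRA) adapter.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "your_hf_username/ScreenTalk")

# The processor (feature extractor + tokenizer) comes from the base checkpoint.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
```

Optionally, `model.merge_and_unload()` folds the adapter into the base weights for standalone inference.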
## **Citation**
If you use this model, please cite:

```bibtex
@misc{ScreenTalk,
  title={ScreenTalk: A Fine-tuned Whisper-Small Model for Speech Recognition},
  author={Your Name or Organization},
  year={2025},
  url={https://huggingface.co/your_hf_username/ScreenTalk}
}
```