fj11 committed on
Commit 791ed96 · verified · 1 Parent(s): 40b9404

End of training

README.md CHANGED
@@ -1,102 +1,59 @@
  ---
- datasets:
- - DataLabX/ScreenTalk-XS
  language:
  - en
  license: apache-2.0
- metrics:
- - wer
- base_model:
- - openai/whisper-small
  ---

- # **ScreenTalk**
- **Fine-tuned version of `openai/whisper-small` on the `DataLabX/ScreenTalk-XS` dataset**
-
- ## **Model Summary**
- ScreenTalk is a fine-tuned version of OpenAI's Whisper-Small model, specifically trained for speech-to-text transcription using the **DataLabX/ScreenTalk-XS** dataset. The model is optimized to improve automatic speech recognition (ASR) performance in its target domain.
-
- On the evaluation set, it achieves:
- - **Loss**: `0.375`
- - **Word Error Rate (WER)**: `21.27%`
-
- ## **Intended Uses & Limitations**
- ### **Intended Use Cases**
- - **Speech-to-text transcription** for audio in the domain covered by `ScreenTalk-XS`
- - **Automatic subtitling** and **audio content analysis**
- - **Voice-assisted applications** where accurate ASR is needed
-
- ### **Limitations**
- - May not generalize well to **out-of-domain** data
- - Performance is dependent on **audio quality** and **background noise**
- - The model is optimized for English (or the target language in `ScreenTalk-XS`)
-
- ## **Training and Evaluation Data**
- The model was fine-tuned on the `DataLabX/ScreenTalk-XS` dataset, which contains domain-specific speech recordings. The dataset has been preprocessed and formatted to enhance ASR capabilities in specific contexts.
-
- ## **Training Procedure**
- ### **Hyperparameters**
- The model was trained with the following hyperparameters:
-
- | Hyperparameter | Value |
- |--------------------------------|-------------|
- | Learning Rate | `5e-05` |
- | Train Batch Size | `8` |
- | Eval Batch Size | `8` |
- | Seed | `42` |
- | Gradient Accumulation Steps | `8` |
- | Total Train Batch Size | `64` |
- | Optimizer | `AdamW` (β1=0.9, β2=0.999, ε=1e-08) |
- | Learning Rate Scheduler | `Linear` |
- | Warmup Steps | `10` |
- | Total Training Steps | `200` |
-
- ### **Training Progress**
- The model was trained for **200 steps**, and the WER improved over time:
-
- | Step | Training Loss | Validation Loss | WER (%) |
- |------|--------------|----------------|---------|
- | 20 | 1.1515 | 1.0011 | 22.33 |
- | 40 | 0.7024 | 0.6125 | 26.64 |
- | 60 | 0.3648 | 0.4175 | 23.00 |
- | 80 | 0.3753 | 0.3991 | 22.09 |
- | 100 | 0.3838 | 0.3952 | 22.83 |
- | 120 | 0.3358 | 0.3834 | 22.59 |
- | 140 | 0.1462 | 0.3924 | 22.01 |
- | 160 | 0.1636 | 0.3847 | 21.50 |
- | 180 | 0.1587 | 0.3778 | 21.36 |
- | 200 | 0.1583 | 0.3759 | 21.27 |
-
- ## **Framework Versions**
- - **PEFT**: `0.14.0`
- - **Transformers**: `4.48.3`
- - **PyTorch**: `2.5.1+cu124`
- - **Datasets**: `3.3.2`
- - **Tokenizers**: `0.21.0`
-
- ## **How to Use**
- To load and use this model for inference:
-
- ```python
- from transformers import pipeline
-
- asr_pipeline = pipeline("automatic-speech-recognition", model="fj11/ScreenTalk-xs")
- audio_file = "path/to/audio.wav"
-
- transcription = asr_pipeline(audio_file)
- print(transcription["text"])
- ```
-
- ## **Citation**
- If you use this model, please cite:
-
- ```bibtex
- @misc{ScreenTalk-xs,
- title={ScreenTalk: A Fine-tuned Whisper-Small Model for Speech Recognition},
- author={Your Name or Organization},
- year={2025},
- url={https://huggingface.co/fj11/ScreenTalk-xs}
- }
- ```
 
  ---
+ library_name: peft
  language:
  - en
  license: apache-2.0
+ base_model: openai/whisper-small
+ tags:
+ - hf-asr-leaderboard
+ - generated_from_trainer
+ datasets:
+ - DataLabX/ScreenTalk-XS
+ model-index:
+ - name: ScreenTalk-xs
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # ScreenTalk-xs

+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the DataLabX/ScreenTalk-XS dataset.

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 10
+ - training_steps: 5700
+ - mixed_precision_training: Native AMP
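The hyperparameters above fully determine the learning-rate trajectory. As a sanity check, here is a minimal sketch of the `linear` schedule's behavior, assuming the standard linear-warmup/linear-decay formula used by `transformers` (`get_linear_schedule_with_warmup`); the function name is illustrative, not part of this repository:

```python
def lr_at(step, peak_lr=5e-05, warmup_steps=10, total_steps=5700):
    """Learning rate under linear warmup followed by linear decay,
    mirroring lr_scheduler_type: linear with the values listed above."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # ramp from 0 up to the peak rate
    # decay linearly from the peak at the end of warmup down to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch (8) x accumulation steps (4) = 32,
# matching total_train_batch_size above.
print(lr_at(10))    # peak rate, 5e-05
print(lr_at(5700))  # end of training, 0.0
```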

+ ### Framework versions

+ - PEFT 0.14.0
+ - Transformers 4.47.0
+ - PyTorch 2.5.1+cu121
+ - Datasets 3.2.0
+ - Tokenizers 0.21.0
adapter_config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": {
+     "base_model_class": "WhisperForConditionalGeneration",
+     "parent_library": "transformers.models.whisper.modeling_whisper"
+   },
+   "base_model_name_or_path": "openai/whisper-small",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj"
+   ],
+   "task_type": null,
+   "use_dora": false,
+   "use_rslora": false
+ }
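The config above pins down the adapter's footprint: rank-32 LoRA on `q_proj` and `v_proj` in every attention block. A back-of-the-envelope count, assuming whisper-small's dimensions (d_model = 768, 12 encoder layers with self-attention, 12 decoder layers with self- and cross-attention, none of which are stated in this commit), lands close to the ~14 MB `adapter_model.safetensors` added below:

```python
# Rough parameter/size estimate for the LoRA adapter described above.
# Assumed whisper-small dimensions (not stated in this commit):
# d_model = 768; 12 encoder self-attention blocks; 12 decoder layers,
# each with self-attention AND cross-attention.
d_model = 768
r = 32                                   # "r" in adapter_config.json
attention_blocks = 12 + 12 * 2           # encoder + decoder (self + cross)
targeted_linears = attention_blocks * 2  # q_proj and v_proj per block
params_per_linear = r * d_model * 2      # LoRA A (r x d) plus B (d x r)
total_params = targeted_linears * params_per_linear
print(total_params)      # 3538944 trainable parameters
print(total_params * 4)  # 14155776 bytes in fp32, consistent with the
                         # 14176064-byte adapter file (the rest is header)
```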
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:726028606b861d502b0807e0611a0ed77905f5a1e69edcfb1f5a4416b2697269
+ size 14176064
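The three lines in this file (and in the `runs/` files below) are a Git LFS pointer: the actual weights live in LFS storage, and the repository only tracks the object's `version`, `oid`, and `size`. A minimal sketch of reading one (the helper name is illustrative):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    # Each line is "key value"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size is the object's byte count
    return fields

ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:726028606b861d502b0807e0611a0ed77905f5a1e69edcfb1f5a4416b2697269\n"
    "size 14176064\n"
)
print(ptr["size"])  # 14176064
```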
runs/Mar08_01-33-44_9c4eb6e5c5b2/events.out.tfevents.1741397628.9c4eb6e5c5b2.31.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:186373ba88aa74b25eeb5928124a19b829ba62e42a9cd76acb0c3109049c5b63
+ size 19334
runs/Mar08_01-39-29_9c4eb6e5c5b2/events.out.tfevents.1741397972.9c4eb6e5c5b2.31.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e904a20b233e74733b82d4b34dc2bd2064e65212e50de656885241f197c1333b
+ size 6809
runs/Mar08_01-40-19_9c4eb6e5c5b2/events.out.tfevents.1741398023.9c4eb6e5c5b2.31.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f373acfa11124692a64f35253c7f046cfaf6d1b35882c7310c388b118607e171
+ size 14256
runs/Mar08_01-43-13_9c4eb6e5c5b2/events.out.tfevents.1741398201.9c4eb6e5c5b2.31.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffe64bf2536de8da459d54fdfe4e80dabee46c0d2a8e6f19bbb5d72a719752c8
+ size 5876
runs/Mar08_01-44-56_9c4eb6e5c5b2/events.out.tfevents.1741398301.9c4eb6e5c5b2.31.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7d75bb99a62bd5407c556dbe10823ad8b5d5a06e336c64a0ecb154b845f76f4
+ size 13323
runs/Mar08_01-48-20_9c4eb6e5c5b2/events.out.tfevents.1741398503.9c4eb6e5c5b2.31.5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18431f07ba1bae4c69b985953411815cccc76fbab28ed45733438817666d947c
+ size 5876
runs/Mar08_01-49-07_9c4eb6e5c5b2/events.out.tfevents.1741398551.9c4eb6e5c5b2.31.6 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1a710e4f527f1763a76d9d56b04213ee401d439d7b432a750eabdd8657b5d3c
+ size 8780
runs/Mar08_01-51-23_9c4eb6e5c5b2/events.out.tfevents.1741398686.9c4eb6e5c5b2.31.7 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e9a8d76a3de261f0a7b4e83f4278071cbec74174ff95e017e2b237b0c9dc54e
+ size 7016
runs/Mar08_01-55-52_9c4eb6e5c5b2/events.out.tfevents.1741398956.9c4eb6e5c5b2.31.8 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a498dcb976065059978a94cb0ad1670381ef8a0c5c9e172fbddf9a0085603c63
+ size 6290
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3ff489e9a35fdcdc3c7dedc4059cd48b01358bbdd11c4e90fd449571bff5ce21
+ oid sha256:960e2e96a599f9957d3a8376346f9f010edb3fe5f7e6006b846acd86cc2dab9f
  size 5496