Update README.md
README.md (CHANGED)
base_model:
- openai/whisper-small
---

# **ScreenTalk**

**A fine-tuned version of `openai/whisper-small` on the `DataLabX/ScreenTalk-XS` dataset**

## **Model Summary**

ScreenTalk is a fine-tuned version of OpenAI's Whisper-Small model, trained for speech-to-text transcription on the **DataLabX/ScreenTalk-XS** dataset. The fine-tuning targets improved automatic speech recognition (ASR) performance in the domain covered by that dataset.

On the evaluation set, it achieves:

- **Loss**: `0.375`
- **Word Error Rate (WER)**: `21.27%` (see the sketch below for how WER is computed)
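
As a quick illustration, a WER figure like the one above can be computed with the `evaluate` library (an assumption; the original evaluation code is not part of this card, and any WER implementation gives the same number):

```python
import evaluate

# Load the word-error-rate metric (wraps jiwer under the hood).
wer_metric = evaluate.load("wer")

# Toy reference/prediction pair; in practice these come from the evaluation set.
references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over a lazy dog"]

# WER = (substitutions + insertions + deletions) / number of reference words.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2f}%")  # 22.22% for this toy pair (2 substitutions / 9 words)
```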

## **Intended Uses & Limitations**

### **Intended Use Cases**

- **Speech-to-text transcription** for audio in the domain covered by `ScreenTalk-XS`
- **Automatic subtitling** and **audio content analysis**
- **Voice-assisted applications** where accurate ASR is needed

### **Limitations**

- May not generalize well to **out-of-domain** data
- Performance depends on **audio quality** and **background noise**
- The model is tuned for the language of `ScreenTalk-XS` and may underperform on other languages

## **Training and Evaluation Data**

The model was fine-tuned on the `DataLabX/ScreenTalk-XS` dataset, which contains domain-specific speech recordings, preprocessed and formatted for ASR fine-tuning.
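
As a hedged sketch, the data can be pulled from the Hugging Face Hub with `datasets`; the split layout and the `audio` column name below are assumptions, so check the dataset card before relying on them:

```python
from datasets import Audio, load_dataset

# Download the dataset from the Hub.
ds = load_dataset("DataLabX/ScreenTalk-XS")

# Whisper's feature extractor expects 16 kHz input, so resample the
# (assumed) "audio" column to that rate.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
print(ds)
```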

## **Training Procedure**

### **Hyperparameters**

The model was trained with the following hyperparameters:

| Hyperparameter              | Value                               |
|-----------------------------|-------------------------------------|
| Learning rate               | `5e-05`                             |
| Train batch size            | `8`                                 |
| Eval batch size             | `8`                                 |
| Seed                        | `42`                                |
| Gradient accumulation steps | `8`                                 |
| Total train batch size      | `64`                                |
| Optimizer                   | `AdamW` (β1=0.9, β2=0.999, ε=1e-08) |
| LR scheduler                | `linear`                            |
| Warmup steps                | `10`                                |
| Training steps              | `200`                               |
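
For reference, the same configuration expressed as `transformers` `Seq2SeqTrainingArguments` might look like the sketch below; the output directory is a hypothetical placeholder, and the actual training script is not included in this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./screentalk-whisper-small",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 * 8 = total train batch size of 64
    optim="adamw_torch",            # AdamW with default betas and epsilon
    lr_scheduler_type="linear",
    warmup_steps=10,
    max_steps=200,
)
```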

### **Training Progress**

The model was trained for **200 steps**; aside from a bump at step 40, the WER improved steadily:

| Step | Training Loss | Validation Loss | WER (%) |
|------|---------------|-----------------|---------|
| 20   | 1.1515        | 1.0011          | 22.33   |
| 40   | 0.7024        | 0.6125          | 26.64   |
| 60   | 0.3648        | 0.4175          | 23.00   |
| 80   | 0.3753        | 0.3991          | 22.09   |
| 100  | 0.3838        | 0.3952          | 22.83   |
| 120  | 0.3358        | 0.3834          | 22.59   |
| 140  | 0.1462        | 0.3924          | 22.01   |
| 160  | 0.1636        | 0.3847          | 21.50   |
| 180  | 0.1587        | 0.3778          | 21.36   |
| 200  | 0.1583        | 0.3759          | 21.27   |

## **Framework Versions**

- **PEFT**: `0.14.0`
- **Transformers**: `4.48.3`
- **PyTorch**: `2.5.1+cu124`
- **Datasets**: `3.3.2`
- **Tokenizers**: `0.21.0`

## **How to Use**

To load and use this model for inference:

```python
from transformers import pipeline

# Build an ASR pipeline from the fine-tuned checkpoint (the repo id is a placeholder).
asr_pipeline = pipeline("automatic-speech-recognition", model="your_hf_username/ScreenTalk")

# Transcribe a local audio file and print the text.
audio_file = "path/to/audio.wav"
transcription = asr_pipeline(audio_file)
print(transcription["text"])
```
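
Two caveats on the snippet above. For recordings longer than Whisper's 30-second window, the pipeline can chunk the input (e.g. `pipeline(..., chunk_length_s=30)`). And because the framework list includes PEFT, the repository may ship a LoRA-style adapter rather than full model weights; if so, it can also be loaded explicitly with `peft`, as in this hedged sketch (assuming the repo contains an `adapter_config.json` that points at `openai/whisper-small`):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the frozen base model, then attach the fine-tuned adapter on top.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "your_hf_username/ScreenTalk")

# The processor (feature extractor + tokenizer) comes from the base checkpoint.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
```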

## **Citation**

If you use this model, please cite:

```bibtex
@misc{ScreenTalk,
  title={ScreenTalk: A Fine-tuned Whisper-Small Model for Speech Recognition},
  author={Your Name or Organization},
  year={2025},
  url={https://huggingface.co/your_hf_username/ScreenTalk}
}
```