ScreenTalk-xs is a fine-tuned version of OpenAI's `openai/whisper-small` model, optimized for speech-to-text transcription of movie and TV show audio. It is specifically trained to improve ASR (Automatic Speech Recognition) performance in dialogue-heavy scenarios.

The model was fine-tuned on the ScreenTalk-XS dataset, a collection of transcribed movie and TV audio.

How to use fj11/ScreenTalk-xs with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="fj11/ScreenTalk-xs")
```

```python
# Or load the model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("fj11/ScreenTalk-xs")
model = AutoModelForSpeechSeq2Seq.from_pretrained("fj11/ScreenTalk-xs")
```
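Whisper-family checkpoints, including this one, expect 16 kHz mono float audio. The processor computes log-mel features but does not decode or resample files, so audio recorded at another sample rate must be resampled first. A minimal linear-interpolation sketch in NumPy (in practice you would use `librosa` or `torchaudio` for higher-quality resampling):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampler, for illustration only."""
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, audio).astype(np.float32)

sr = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of 440 Hz at 44.1 kHz
audio_16k = resample_linear(tone, sr, 16_000)
print(len(audio_16k))  # 16000 samples = 1 s at 16 kHz
```

The resampled array can be passed to the processor as `processor(audio_16k, sampling_rate=16_000, return_tensors="pt")`.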
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size | 6 |
| Gradient Accumulation | 4 |
| Epochs | 5 |
| LoRA Rank (r) | 4 |
| Optimizer | AdamW |
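The table above maps directly onto a Hugging Face `Seq2SeqTrainingArguments` plus a PEFT `LoraConfig`. The exact training script is not published, so the following is an illustrative sketch under the assumption that the standard `Trainer`/PEFT stack was used; the `output_dir` is hypothetical:

```python
from transformers import Seq2SeqTrainingArguments
from peft import LoraConfig

training_args = Seq2SeqTrainingArguments(
    output_dir="screentalk-xs",       # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,    # effective batch size: 6 * 4 = 24
    num_train_epochs=5,
    optim="adamw_torch",              # AdamW
)

lora_config = LoraConfig(r=4)         # LoRA rank from the table
```

Note that with gradient accumulation, the optimizer sees an effective batch of 24 examples per update step.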
| Epoch | Training Loss | Validation Loss | WER (%) |
|---|---|---|---|
| 1 | 0.502400 | 0.333292 | 20.870653 |
| 2 | 0.244200 | 0.327987 | 20.580875 |
| 3 | 0.523600 | 0.325907 | 21.924394 |
| 4 | 0.445500 | 0.326386 | 20.508430 |
| 5 | 0.285700 | 0.327116 | 20.752107 |
The best checkpoint was from epoch 4, achieving WER = 20.50%.

| Model | WER (%) |
|---|---|
| Whisper-small (baseline) | 30.00 |
| ScreenTalk-xs (fine-tuned) | 27.00 ✅ |
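WER (word error rate) counts word-level substitutions, deletions, and insertions against the reference length: WER = (S + D + I) / N. The numbers above were presumably computed with a library such as `evaluate` or `jiwer`; this pure-Python version illustrates the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.333
```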
You can use this model for speech-to-text transcription with `pipeline`:

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="fj11/ScreenTalk-xs",
    device=0,  # run on GPU 0; use device=-1 for CPU
)

result = pipe("path/to/audio.wav")
print(result["text"])
```
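Whisper processes audio in 30-second windows, so full episodes or films exceed a single pass. The ASR pipeline supports chunked inference via its standard `chunk_length_s` and `return_timestamps` arguments; a sketch for long recordings (the file path is a placeholder):

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="fj11/ScreenTalk-xs",
    chunk_length_s=30,       # split long audio into 30 s windows
    return_timestamps=True,  # include per-segment timestamps
)

result = pipe("path/to/long_episode.wav")
print(result["text"])        # full transcript
print(result["chunks"])      # list of {"timestamp": (start, end), "text": ...}
```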
If you use this model, please cite:

```bibtex
@misc{DataLabX2025ScreenTalkXS,
  author    = {DataLabX},
  title     = {ScreenTalk-xs: ASR Model Fine-Tuned on Movie & TV Audio},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DataLabX/ScreenTalk-xs}
}
```