Introduction
This model is OpenAI Whisper large-v3, fine-tuned on ~770 hours of manually created subtitles from Estonian TV (ETV). As a result, it does not always produce verbatim (word-for-word) subtitles; it often rephrases sentences and compresses the text, especially for spontaneous speech with hesitations, repetitions, and similar disfluencies. However, the length of the generated text chunks almost always conforms to the ETV subtitle requirements (48 characters per line).
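The 48-character line limit mentioned above can be checked on any generated text with a few lines of Python. This is only a minimal sketch: the helper names `lines_over_limit` and `wrap_to_limit` are hypothetical, and it assumes that subtitle lines are separated by newline characters, which the model card does not state.

```python
import textwrap

# Per-line limit quoted in the model card (ETV subtitle requirement).
MAX_CHARS_PER_LINE = 48


def lines_over_limit(text, max_chars=MAX_CHARS_PER_LINE):
    """Return the lines of `text` that are longer than `max_chars`."""
    return [line for line in text.splitlines() if len(line) > max_chars]


def wrap_to_limit(text, max_chars=MAX_CHARS_PER_LINE):
    """Re-wrap `text` at word boundaries so that no line exceeds `max_chars`."""
    return textwrap.wrap(text, width=max_chars)


if __name__ == "__main__":
    sample = "Tere tulemast vaatama tänast saadet, kus räägime ilmast ja spordist."
    print(lines_over_limit(sample))  # lines that break the limit, if any
    for line in wrap_to_limit(sample):
        print(len(line), line)
```

Since the model conforms to the limit "almost always" rather than always, a check like this can flag the occasional chunk that needs manual re-wrapping.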
Usage
It is a fine-tuned version of Whisper large-v3 and can therefore be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:
pip install --upgrade pip
pip install --upgrade transformers accelerate
The model can be used with the Transformers pipeline to transcribe audio of arbitrary length:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "TalTechNLP/whisper-large-v3-et-subs"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Path to the audio file to transcribe
audio = "sample.mp3"
result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
print(result)
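When the pipeline is called with return_timestamps=True, the result also contains a "chunks" list in which each item has a "timestamp" pair (start, end) in seconds and a "text" string. The sketch below turns such a list into SubRip (SRT) text. It is not part of the original model card: the helper names `format_srt_time` and `chunks_to_srt` and the `fallback_duration` parameter are hypothetical, and the 2-second fallback for a missing end time is an arbitrary assumption.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks, fallback_duration=2.0):
    """Convert pipeline-style chunks into SRT text.

    Each chunk is expected to look like
    {"timestamp": (start, end), "text": "..."}.
    If a chunk has no end time (None), `fallback_duration` seconds
    are added to its start time (an assumption made for this sketch).
    """
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            end = start + fallback_duration
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Intended use with the pipeline defined above (not executed here):
# result = pipe(audio, return_timestamps=True,
#               generate_kwargs={"task": "transcribe", "language": "et"})
# print(chunks_to_srt(result["chunks"]))
```

The conversion is kept separate from the model call so that it can be tried on hand-written chunks without downloading the model.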
Citation
@inproceedings{fedorchenko-2025-optimizing,
title = "Optimizing Estonian {TV} Subtitles with Semi-supervised Learning and {LLMs}",
author = {Fedorchenko, Artem and Alum{\"a}e, Tanel},
booktitle = "Proceedings of the 25th Nordic Conference on Computational Linguistics (NoDaLiDa)",
year = "2025"
}
Model tree for TalTechNLP/whisper-large-v3-et-subs
Base model: openai/whisper-large-v3