---
license: mit
language: et
tags:
- audio
- automatic-speech-recognition
pipeline_tag: automatic-speech-recognition
base_model:
- openai/whisper-large-v3
library_name: transformers
---

## Introduction

This model is OpenAI Whisper large-v3, finetuned on ~770 hours of manually created subtitles from Estonian TV (ETV). Therefore, this model does not always create verbatim (word-by-word) subtitles but often rephrases sentences and compresses the text, especially in the case of spontaneous speech, hesitations, repetitions, etc. However, the length of the generated text chunks almost always conforms to the ETV subtitle requirements (48 characters per line).

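As an illustration of this constraint, the sketch below (a hypothetical helper, not part of the model or its own post-processing) wraps a transcribed chunk into subtitle lines of at most 48 characters:

```python
import textwrap

MAX_LINE_CHARS = 48  # ETV subtitle requirement mentioned above

def to_subtitle_lines(chunk_text: str, max_chars: int = MAX_LINE_CHARS) -> list[str]:
    """Wrap a transcribed chunk into subtitle lines of at most `max_chars` characters."""
    return textwrap.wrap(chunk_text.strip(), width=max_chars)

# "This is an example of a longer subtitle that must be broken into several lines."
print(to_subtitle_lines("See on näide pikemast subtiitrist, mis tuleb murda mitmeks reaks."))
```
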
## Usage

It is a finetuned version of Whisper large-v3 and can therefore be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:

```bash
pip install --upgrade pip
pip install --upgrade transformers accelerate
```

The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) class to transcribe audio files of arbitrary length:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use GPU with float16 if available, otherwise fall back to CPU with float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "TalTechNLP/whisper-large-v3-et-subs"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Path to the audio file to transcribe
audio = "sample.mp3"

result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
print(result["text"])
```

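Since the model is intended for subtitling, it is often useful to also request chunk-level timestamps and write the result out as an SRT file. The sketch below is an illustrative continuation of the example above (the helper `format_srt_time` and the output file name are ours, not part of the model card):

```python
def format_srt_time(seconds: float) -> str:
    # Convert seconds to the SRT timestamp format HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Ask the pipeline for chunk-level timestamps in addition to the text
result = pipe(audio, return_timestamps=True,
              generate_kwargs={"task": "transcribe", "language": "et"})

with open("sample.srt", "w", encoding="utf-8") as f:
    for i, chunk in enumerate(result["chunks"], start=1):
        start, end = chunk["timestamp"]
        f.write(f"{i}\n")
        f.write(f"{format_srt_time(start)} --> {format_srt_time(end if end is not None else start)}\n")
        f.write(chunk["text"].strip() + "\n\n")
```
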
## Citation

```
@inproceedings{fedorchenko-2025-optimizing,
    title = "Optimizing Estonian {TV} Subtitles with Semi-supervised Learning and {LLMs}",
    author = {Fedorchenko, Artem and Alum{\"a}e, Tanel},
    booktitle = "Proceedings of the 25th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    year = "2025"
}
```