How to use doof-ferb/whisper-tiny-vi with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi")
# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("doof-ferb/whisper-tiny-vi")
model = AutoModelForSpeechSeq2Seq.from_pretrained("doof-ferb/whisper-tiny-vi")

whisper tiny fine-tuned on a very big collection of vietnamese speech datasets
TODO:
openai-whisper, whisper.cpp, faster-whisper

21k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2)
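
To make the training numbers concrete, here is a small arithmetic sketch. It assumes that "16×2" means a per-device batch size of 16 on each of the 2 T4 GPUs and that no gradient accumulation is used; neither is stated here, so treat the names and the interpretation as illustrative rather than as the actual training configuration.

```python
def training_summary(total_steps: int, warmup_ratio: float,
                     per_device_batch_size: int, num_gpus: int) -> dict:
    """Derive warm-up length and effective batch size from the headline numbers."""
    # Warm-up is given as a fraction of the total number of optimizer steps.
    warmup_steps = round(total_steps * warmup_ratio)
    # ASSUMPTION: "16×2" = 16 samples per GPU × 2 GPUs, no gradient accumulation.
    effective_batch_size = per_device_batch_size * num_gpus
    return {
        "warmup_steps": warmup_steps,
        "effective_batch_size": effective_batch_size,
    }


summary = training_summary(total_steps=21_000, warmup_ratio=0.05,
                           per_device_batch_size=16, num_gpus=2)
print(summary)
```

Under these assumptions, warm-up covers the first 1050 of the 21k steps and each step sees 32 samples.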
WER manually evaluated on the vietnamese part of each test set:
| @ float16 | CommonVoice v16.1 | FLEURS | VIVOS |
|---|---|---|---|
| original whisper-tiny | >100% | 88.6% | 62.5% |
| this model | 26.6% | 37.1% | 18.7% |
all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi
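
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate is commonly defined: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the evaluation script from the repo, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. It also shows why a WER above 100% is possible: insertions are counted, so a hypothesis much longer than the reference can accumulate more errors than the reference has words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # ASSUMPTION: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Three inserted words against a two-word reference give a WER of 150%.
print(word_error_rate("xin chào", "xin chào các bạn ơi"))
```

Real evaluation pipelines usually normalize text (case, punctuation) before scoring, which this sketch leaves out.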
usage example:
import torch
from transformers import pipeline
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}
PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"]
Base model
openai/whisper-tiny