Whisper Small — Vietnamese (LoRA Fine-tuned)
A LoRA fine-tune of openai/whisper-small for Vietnamese speech recognition, trained on the Vietnamese subset of Mozilla Common Voice 11.0.
Training Results
| Metric | Value |
|---|---|
| Training Loss | 0.9382 |
| Epochs | 5 |
| Global Steps | 470 |
| Samples/sec | 7.37 |
| Total FLOPs | 4.60e+18 |
Model Details
- Base model: `openai/whisper-small` (244M params)
- Method: LoRA (Low-Rank Adaptation)
- Trainable params: ~13M (5.09% of all parameters, base plus adapter)
- Target modules: `q_proj`, `v_proj`, `k_proj`, `out_proj`, `fc1`, `fc2`
- LoRA rank: 32 · alpha: 64 · dropout: 0.05
- Language: Vietnamese (`vi`)
- Task: Transcription
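The trainable-parameter figure can be cross-checked by hand. The sketch below assumes the standard LoRA layout (two low-rank matrices per adapted linear layer, no extra bias) and the published whisper-small dimensions (hidden size 768, feed-forward size 3072, 12 encoder and 12 decoder layers); these dimensions are not stated in this card. The 244M base count is the card's rounded figure, so the resulting share is approximate.

```python
# Assumed whisper-small dimensions (not stated in this card).
D_MODEL = 768      # hidden size
D_FFN = 3072       # feed-forward inner size
ENC_LAYERS = 12
DEC_LAYERS = 12

# Values from the card.
RANK = 32
ALPHA = 64
BASE_PARAMS_APPROX = 244_000_000  # rounded figure from the card


def lora_params(d_in, d_out, r=RANK):
    """LoRA adds A (r x d_in) and B (d_out x r) to one linear layer."""
    return r * (d_in + d_out)


# One attention block adapts q_proj, k_proj, v_proj and out_proj.
attn_block = 4 * lora_params(D_MODEL, D_MODEL)
# The MLP adapts fc1 (768 -> 3072) and fc2 (3072 -> 768).
mlp = lora_params(D_MODEL, D_FFN) + lora_params(D_FFN, D_MODEL)

# Encoder layers have self-attention only; decoder layers also have cross-attention.
encoder = ENC_LAYERS * (attn_block + mlp)
decoder = DEC_LAYERS * (2 * attn_block + mlp)
total = encoder + decoder

# Standard LoRA scales the low-rank update by alpha / r.
scaling = ALPHA / RANK

share_of_total = total / (BASE_PARAMS_APPROX + total)

print(f"adapter params: {total:,}")        # adapter params: 12,976,128
print(f"LoRA scaling:   {scaling}")        # LoRA scaling:   2.0
print(f"share of total: {share_of_total:.2%}")
```

Under these assumptions the adapter holds about 13M parameters, roughly 5% of the combined base-plus-adapter count, which agrees with the figure listed above.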
Training Details
- Dataset: Mozilla Common Voice 11.0 (`vi`)
- Learning rate: 1e-4 with linear warmup (500 steps)
- Batch size: 8 × 2 gradient accumulation = effective 16
- Precision: FP16
- Framework: 🤗 Transformers + PEFT
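The batch and schedule numbers above can be tied to the step count in the results table with a little arithmetic. This is a sketch under assumptions: a single training device, one optimizer step per two micro-batches, and a warmup rule of `peak_lr * step / warmup_steps` counted in optimizer steps. The function name `warmup_lr` is illustrative, and any decay after warmup is not modeled.

```python
# Values stated in the card.
PER_STEP_BATCH = 8
GRAD_ACCUM_STEPS = 2
EPOCHS = 5
GLOBAL_STEPS = 470
PEAK_LR = 1e-4
WARMUP_STEPS = 500

effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = GLOBAL_STEPS // EPOCHS
# Upper bound: the last batch of an epoch may be only partly filled.
max_samples_per_epoch = steps_per_epoch * effective_batch


def warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to peak_lr (assumed rule); flat at peak_lr afterwards."""
    return peak_lr * min(step, warmup_steps) / warmup_steps


print(effective_batch)                       # 16
print(steps_per_epoch)                       # 94
print(max_samples_per_epoch)                 # 1504
print(f"{warmup_lr(GLOBAL_STEPS):.2e}")      # 9.40e-05
```

If warmup is indeed counted in optimizer steps, the 470 recorded steps end inside the 500-step warmup window, so under this rule the final step would run at about 94% of the nominal 1e-4.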
Data augmentation applied:
- Speed perturbation ±10% (p=0.3)
- Additive Gaussian noise (p=0.3)
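The two augmentations could be sketched as follows. This is not the training code: it assumes mono float audio in [-1, 1], draws the speed factor uniformly from [0.9, 1.1], resamples by linear interpolation, and uses a hypothetical noise level (`noise_std=0.005`), since the card gives only the probabilities and the ±10% range. All function and parameter names are illustrative.

```python
import numpy as np


def speed_perturb(audio, factor):
    """Resample by linear interpolation so the clip plays `factor` times faster."""
    if len(audio) == 0:
        return audio
    n_out = max(1, int(round(len(audio) / factor)))
    src_positions = np.linspace(0, len(audio) - 1, num=n_out)
    return np.interp(src_positions, np.arange(len(audio)), audio).astype(np.float32)


def add_gaussian_noise(audio, rng, noise_std=0.005):
    """Add zero-mean Gaussian noise; noise_std is a hypothetical value."""
    noise = rng.normal(0.0, noise_std, size=audio.shape).astype(np.float32)
    return np.clip(audio + noise, -1.0, 1.0)


def augment(audio, rng, p_speed=0.3, p_noise=0.3,
            max_speed_change=0.10, noise_std=0.005):
    """Apply each augmentation independently with its own probability."""
    audio = np.asarray(audio, dtype=np.float32)
    if rng.random() < p_speed:
        factor = rng.uniform(1.0 - max_speed_change, 1.0 + max_speed_change)
        audio = speed_perturb(audio, factor)
    if rng.random() < p_noise:
        audio = add_gaussian_noise(audio, rng, noise_std)
    return audio


rng = np.random.default_rng(0)
clip = np.sin(np.linspace(0, 100, 16000)).astype(np.float32)
augmented = augment(clip, rng)
```

Interpolation-based speed change also shifts pitch, which is the usual behavior of speed perturbation; a production pipeline would more likely use a dedicated resampler.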
Usage
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    import torch

    base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
    model = PeftModel.from_pretrained(base, "LakoreAI/whisper-small-vi-lora")
    processor = WhisperProcessor.from_pretrained("LakoreAI/whisper-small-vi-lora")

    # Optional: merge LoRA for faster inference
    model = model.merge_and_unload()
    model.eval()

    # Inference
    def transcribe(audio_array, sampling_rate=16000):
        inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
        with torch.no_grad():
            ids = model.generate(
                inputs.input_features,
                language="vietnamese",
                task="transcribe",
                max_new_tokens=225,
            )
        return processor.tokenizer.decode(ids[0], skip_special_tokens=True)
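`transcribe` takes a mono waveform at 16 kHz, the rate Whisper's feature extractor expects. When the audio arrives as signed-integer PCM or with several channels, a small conversion step is needed first. The helper below is a minimal sketch with a hypothetical name (`prepare_audio`); it assumes a `(num_samples, num_channels)` layout for multi-channel input and does not resample.

```python
import numpy as np

TARGET_SR = 16000  # sampling rate expected by Whisper's feature extractor


def prepare_audio(samples):
    """Convert signed-integer or float PCM to mono float32 in [-1, 1]."""
    audio = np.asarray(samples)
    if np.issubdtype(audio.dtype, np.signedinteger):
        # Scale by the type's full-scale value, e.g. 32768 for int16.
        full_scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / full_scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # Assumed layout: (num_samples, num_channels). Average channels to mono.
        audio = audio.mean(axis=1)
    return audio


stereo_pcm = np.array([[16384, -16384], [32767, 32767]], dtype=np.int16)
mono = prepare_audio(stereo_pcm)
print(mono.shape, mono.dtype)  # (2,) float32
```

The result can then be passed on as `transcribe(prepare_audio(pcm), sampling_rate=TARGET_SR)`, provided the recording is already at 16 kHz.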
Limitations
- Optimized for Vietnamese only; accuracy on other languages is likely to degrade significantly
- Common Voice data skews toward read speech; spontaneous/accented speech may perform worse
- Short clips (<1s) or clipped audio may cause hallucinations
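One way to act on the last point is to screen inputs before transcription. The check below is a sketch: it reads "clipped audio" as amplitude clipping (samples pinned at full scale), and the thresholds other than the 1-second minimum (`clip_level`, `max_clipped_fraction`) are hypothetical values, not taken from this card.

```python
import numpy as np


def clip_warnings(audio, sampling_rate=16000, min_seconds=1.0,
                  clip_level=0.999, max_clipped_fraction=0.01):
    """Return a list of warning tags for a mono float waveform in [-1, 1]."""
    audio = np.asarray(audio, dtype=np.float32)
    warnings = []
    duration = len(audio) / sampling_rate
    if duration < min_seconds:
        warnings.append("short")
    if len(audio) > 0:
        # Fraction of samples sitting at (or numerically next to) full scale.
        clipped_fraction = float(np.mean(np.abs(audio) >= clip_level))
        if clipped_fraction > max_clipped_fraction:
            warnings.append("clipped")
    return warnings


t = np.linspace(0, 2.0, 32000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
overdriven = np.clip(3.0 * np.sin(2 * np.pi * 220 * t), -1.0, 1.0)

print(clip_warnings(np.zeros(8000)))  # ['short']
print(clip_warnings(clean))           # []
print(clip_warnings(overdriven))      # ['clipped']
```

Flagged clips could be skipped or routed for manual review instead of being passed to the model.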