Whisper Large-v3 LoRA: Uzbek Speech Adapter v1
A LoRA (Low-Rank Adaptation) fine-tune of openai/whisper-large-v3 for Uzbek speech recognition, trained on the Mozilla Common Voice Uzbek dataset.
This adapter is designed to work as part of a full Speaker Diarization Pipeline using pyannote/speaker-diarization-3.1, producing timestamped, speaker-labelled Uzbek transcripts from any audio file.
Only ~1% of parameters were trained (15.7M out of 1.55B), so the adapter is just 63 MB while the base model stays completely frozen.
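The parameter count and adapter size above can be checked with a little arithmetic. The sketch below assumes the published Whisper large-v3 dimensions (hidden size 1280, 32 encoder layers, 32 decoder layers), rank-32 LoRA on q_proj and v_proj in every attention block, and fp32 storage of the adapter weights. These are assumptions about the setup, not values read from the adapter files.

```python
# Back-of-the-envelope check of the LoRA parameter count.
# Architecture numbers are assumed from the Whisper large-v3 configuration.
D_MODEL = 1280
ENCODER_LAYERS = 32
DECODER_LAYERS = 32
RANK = 32
TARGETS_PER_ATTENTION = 2  # q_proj and v_proj

# Each encoder layer has one attention block (self-attention); each decoder
# layer has two (self-attention and cross-attention).
attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
adapted_modules = attention_blocks * TARGETS_PER_ATTENTION

# Every adapted d x d projection gains two low-rank matrices:
# A with shape (r, d) and B with shape (d, r).
params_per_module = RANK * (D_MODEL + D_MODEL)
lora_params = adapted_modules * params_per_module

BASE_PARAMS = 1.55e9  # figure quoted in this card
trainable_fraction = lora_params / BASE_PARAMS
adapter_megabytes = lora_params * 4 / 1e6  # 4 bytes per fp32 weight (assumed)

print(f"{lora_params:,} trainable parameters")
print(f"{trainable_fraction:.2%} of the base model")
print(f"about {adapter_megabytes:.0f} MB on disk")
```

Under these assumptions the count comes to 15,728,640 parameters, which is consistent with the 15.7M and 63 MB figures quoted above.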
Key Engineering Features
- Fault-Tolerant Training: Custom run.sh watchdog that survives SIGSEGV / GPU memory crashes and auto-resumes from the latest checkpoint
- Arrow-Free Dataloader: Custom WhisperDataset using plain Python lists in RAM, bypassing HuggingFace Arrow/mmap memory issues that caused segfaults
- LoRA Rank 32: Targets q_proj and v_proj attention layers only, achieving strong Uzbek accuracy without catastrophic forgetting of the base model
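The restart-and-resume idea behind the watchdog can be sketched in a few lines. This is a Python illustration only, not the project's run.sh: it assumes checkpoints are saved as checkpoint-<step> directories (the Hugging Face Trainer convention) and that the training script accepts a --resume_from_checkpoint flag, which is a hypothetical name here.

```python
import re
import subprocess
import sys
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    best_step, best_path = -1, None
    for path in Path(output_dir).glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def run_with_restarts(command, output_dir, max_restarts=5):
    """Re-run `command` until it exits with code 0.

    Before every attempt the newest checkpoint (if any) is appended to the
    command via the hypothetical --resume_from_checkpoint flag.
    Returns the number of restarts that were needed.
    """
    for attempt in range(max_restarts + 1):
        checkpoint = latest_checkpoint(output_dir)
        extra = ["--resume_from_checkpoint", str(checkpoint)] if checkpoint else []
        result = subprocess.run(command + extra)
        if result.returncode == 0:
            return attempt
        # On POSIX a negative return code means the process was killed by a
        # signal, for example -11 for SIGSEGV.
        print(f"run exited with code {result.returncode}; restarting", file=sys.stderr)
    raise RuntimeError("training did not finish within the restart budget")
```

A call such as run_with_restarts([sys.executable, "train.py"], "outputs") would keep relaunching a hypothetical train.py until it finishes cleanly or the restart budget is used up.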
How to Use
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# =====================================================================
# REMINDER: This LoRA adapter was specifically trained on and
# MUST be used with the base model: "openai/whisper-large-v3"
# =====================================================================

# 1. Load the configuration and base model
model_id = "AnvarMexmonov/uz-speech-adapter-v1"
config = PeftConfig.from_pretrained(model_id)

# The adapter config records "openai/whisper-large-v3" as the base model
processor = WhisperProcessor.from_pretrained(config.base_model_name_or_path)
model = WhisperForConditionalGeneration.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# 2. Load the Uzbek LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(model, model_id)
print("Custom Uzbek Whisper Large-v3 model is ready for inference!")
```
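Once the model and processor are loaded, transcription might look like the sketch below. It is a minimal example rather than the project's inference code, and it assumes a mono 16 kHz waveform as a 1-D float array, a transformers version whose Whisper generate accepts the language and task arguments, and that device and dtype are reachable through the PEFT wrapper. The helper name transcribe and the value max_new_tokens=256 are illustrative choices.

```python
def transcribe(model, processor, audio, sampling_rate=16_000):
    """Transcribe one mono waveform (1-D float array) to Uzbek text."""
    # Convert the raw waveform into the log-mel features Whisper expects.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

    # Move the features to the model's device and match its dtype
    # (float16 if the model was loaded as shown above).
    features = inputs.input_features.to(model.device, dtype=model.dtype)

    # Ask for Uzbek transcription instead of relying on language detection.
    token_ids = model.generate(
        input_features=features,
        language="uz",
        task="transcribe",
        max_new_tokens=256,  # illustrative cap on output length
    )
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0].strip()


# Example call, assuming `audio` already holds 16 kHz samples:
# text = transcribe(model, processor, audio)
```

By default the Whisper feature extractor pads or truncates each input to 30 seconds, so longer recordings need to be split into segments before being passed to a helper like this.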
Training Details
| Parameter | Value |
|---|---|
| Base model | openai/whisper-large-v3 |
| Fine-tuning method | LoRA via HuggingFace PEFT |
| LoRA targets | q_proj, v_proj |
| LoRA rank / alpha | 32 / 64 |
| Trainable params | 15.7M / 1.55B (1%) |
| Dataset | yakhyo/mozilla-common-voice-uzbek |
| Train samples | 3,000 clips |
| Eval samples | 200 clips |
| Steps | 1,000 |
| Effective batch size | 8 |
| Learning rate | 5e-4 with 50-step warmup |
| Precision | bf16 |
| GPU | NVIDIA RTX 5090 |
| Training time | ~90 minutes |
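To put the schedule in the table into numbers, the sketch below computes the learning rate at a given step and the number of passes over the training clips. The table only states a 50-step warmup, so the linear decay to zero afterwards (the Hugging Face Trainer default) is an assumption, as are the constant and function names.

```python
PEAK_LR = 5e-4
WARMUP_STEPS = 50
TOTAL_STEPS = 1_000
EFFECTIVE_BATCH_SIZE = 8
TRAIN_CLIPS = 3_000


def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (decay shape assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# 1,000 steps at an effective batch size of 8 means 8,000 clips are seen
# in total, i.e. each of the 3,000 training clips is visited about 2.7 times.
clips_seen = TOTAL_STEPS * EFFECTIVE_BATCH_SIZE
epochs = clips_seen / TRAIN_CLIPS

for step in (0, 25, 50, 525, 1_000):
    print(f"step {step:>5}: lr = {learning_rate(step):.2e}")
print(f"approximately {epochs:.2f} epochs")
```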
Results
| Metric | Value |
|---|---|
| Best eval loss | 0.7835 |
| Training steps | 1,000 |
Full Pipeline
The complete diarization + transcription project (code, training scripts, inference) is available at: https://github.com/anvarmexmonov/WhoSaidWhatWhen-Uzbek
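For orientation, the way diarization output and transcription fit together can be sketched as follows. This is not the code from the repository above: it is a hypothetical, library-free illustration that assumes speaker turns arrive as (start_seconds, end_seconds, speaker) tuples and that transcribe_fn is any function turning an audio slice into text. The speaker labels and function names are illustrative.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def label_transcript(audio, turns, transcribe_fn, sampling_rate=16_000):
    """Transcribe each speaker turn and return timestamped, labelled lines.

    audio: sequence of samples at `sampling_rate`
    turns: iterable of (start_seconds, end_seconds, speaker) tuples
    transcribe_fn: callable mapping an audio slice to a text string
    """
    lines = []
    for start, end, speaker in sorted(turns):
        # Cut out the samples that belong to this speaker turn.
        clip = audio[int(start * sampling_rate):int(end * sampling_rate)]
        text = transcribe_fn(clip)
        if text:  # skip turns that produced no text
            lines.append(
                f"[{format_timestamp(start)} - {format_timestamp(end)}] {speaker}: {text}"
            )
    return lines
```

In a real pipeline the turns would come from the diarization model and transcribe_fn would wrap the adapter-equipped Whisper model.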
License
MIT © AnvarMexmonov