
Whisper Large-v3 LoRA - Uzbek Speech Adapter v1

A LoRA (Low-Rank Adaptation) fine-tune of openai/whisper-large-v3 for Uzbek speech recognition, trained on the Mozilla Common Voice Uzbek dataset.

This adapter is designed to work as part of a full speaker-diarization pipeline using pyannote/speaker-diarization-3.1, producing timestamped, speaker-labelled Uzbek transcripts from any audio file.

Only ~1% of parameters were trained (15.7M out of 1.55B), so the adapter is just 63 MB while the base model stays completely frozen.
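As a sanity check, the 15.7M trainable-parameter figure can be reproduced from Whisper large-v3's architecture (hidden size 1280, 32 encoder layers with self-attention, 32 decoder layers with both self- and cross-attention) and the LoRA setup described below; this arithmetic sketch is illustrative, not code from the repo:

```python
# Reproduce the ~15.7M trainable-parameter count for rank-32 LoRA
# adapters on q_proj and v_proj in every attention block.
d_model, r = 1280, 32                      # Whisper large-v3 hidden size; LoRA rank
enc_attn = 32 * 1                          # 32 encoder layers, self-attention only
dec_attn = 32 * 2                          # 32 decoder layers, self- + cross-attention
projections = (enc_attn + dec_attn) * 2    # q_proj and v_proj per attention block
per_projection = 2 * d_model * r           # LoRA A (r x d) plus B (d x r) matrices
trainable = projections * per_projection
print(trainable)  # 15728640, i.e. ~15.7M
```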


Key Engineering Features

  • Fault-Tolerant Training: a custom run.sh watchdog that survives SIGSEGV / GPU memory crashes and auto-resumes from the latest checkpoint
  • Arrow-Free Dataloader: a custom WhisperDataset using plain Python lists in RAM, bypassing HuggingFace Arrow/mmap memory issues that caused segfaults
  • LoRA Rank 32: targets the q_proj and v_proj attention layers only, achieving strong Uzbek accuracy without catastrophic forgetting of the base model
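The run.sh watchdog itself is not included in this card; the idea can be sketched in a few lines of Python (the run_with_watchdog helper and the train.py entry point are hypothetical names, not files from this repo):

```python
# Minimal sketch of a crash-tolerant training launcher: re-run the training
# command whenever it dies with a non-zero exit code (e.g. after a SIGSEGV),
# relying on the trainer to resume from the latest checkpoint on restart.
import subprocess
import sys

def run_with_watchdog(cmd, max_restarts=5):
    """Re-run `cmd` until it exits cleanly or the restart budget is spent."""
    returncode = 0
    for attempt in range(1, max_restarts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return 0  # clean exit: training finished
        print(f"attempt {attempt} exited with code {returncode}; restarting",
              file=sys.stderr)
    return returncode

# Example invocation (train.py is a placeholder for the real entry point):
# run_with_watchdog([sys.executable, "train.py", "--resume_from_checkpoint", "latest"])
```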

How to Use

import torch
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# =====================================================================
# 💡 REMINDER: This LoRA adapter was specifically trained on and
# MUST be used with the base model: "openai/whisper-large-v3"
# =====================================================================

# 1. Load the Configuration and Base Model
model_id = "AnvarMexmonov/uz-speech-adapter-v1"
config = PeftConfig.from_pretrained(model_id)

# The config automatically pulls "openai/whisper-large-v3" as the base
processor = WhisperProcessor.from_pretrained(config.base_model_name_or_path)

model = WhisperForConditionalGeneration.from_pretrained(
    config.base_model_name_or_path, 
    torch_dtype=torch.float16, 
    device_map="auto"
)

# 2. Load your custom Uzbek LoRA Adapter
model = PeftModel.from_pretrained(model, model_id)

print("Custom Uzbek Whisper Large-v3 model is ready for inference!")
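With the model and processor loaded as above, decoding looks roughly like this; the `transcribe` helper, the 16 kHz mono input, and the librosa loading in the usage comment are illustrative choices, not part of this repo (Whisper's `generate` accepts `language`/`task` kwargs in recent transformers releases):

```python
def transcribe(model, processor, waveform, sampling_rate=16000, language="uz"):
    """Transcribe a mono 16 kHz waveform with the LoRA-adapted Whisper model."""
    import torch
    # Compute log-mel input features; Whisper pads/truncates to 30 s windows.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    features = inputs.input_features.to(model.device, dtype=model.dtype)
    with torch.no_grad():
        ids = model.generate(features, language=language, task="transcribe")
    return processor.batch_decode(ids, skip_special_tokens=True)[0]

# Usage, continuing from the loading code above:
# import librosa
# waveform, _ = librosa.load("uzbek_sample.wav", sr=16000, mono=True)
# print(transcribe(model, processor, waveform))
```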

Training Details

Base model: openai/whisper-large-v3
Fine-tuning method: LoRA via HuggingFace PEFT
LoRA targets: q_proj, v_proj
LoRA rank / alpha: 32 / 64
Trainable params: 15.7M of 1.55B (~1%)
Dataset: yakhyo/mozilla-common-voice-uzbek
Train samples: 3,000 clips
Eval samples: 200 clips
Steps: 1,000
Effective batch size: 8
Learning rate: 5e-4 with 50-step warmup
Precision: bf16
GPU: NVIDIA RTX 5090
Training time: ~90 minutes
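The LoRA hyperparameters above correspond to a PEFT configuration along these lines; this is a sketch of the key adapter_config.json fields, not a verbatim copy, and the dropout value is illustrative since the card does not state one:

```python
# Approximate LoRA configuration matching the table above
# (field names follow peft's LoraConfig / adapter_config.json).
lora_config = {
    "r": 32,                                  # LoRA rank
    "lora_alpha": 64,                         # scaling alpha (alpha / r = 2.0)
    "target_modules": ["q_proj", "v_proj"],   # attention projections only
    "lora_dropout": 0.05,                     # illustrative; not stated in the card
    "bias": "none",
}
```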

Results

Best eval loss: 0.7835
Training steps: 1,000

Full Pipeline

The complete diarization + transcription project (code, training scripts, inference) is available at: https://github.com/anvarmexmonov/WhoSaidWhatWhen-Uzbek


License

MIT Β© AnvarMexmonov
