Based on the paper *LoRA: Low-Rank Adaptation of Large Language Models* (arXiv:2106.09685).
A LoRA (Low-Rank Adaptation) fine-tune of openai/whisper-large-v3 for Uzbek speech recognition, trained on the Mozilla Common Voice Uzbek dataset.
This adapter is designed to work as part of a full speaker-diarization pipeline using pyannote/speaker-diarization-3.1, producing timestamped, speaker-labelled Uzbek transcripts from any audio file.
Only ~1% of parameters were trained (15.7M out of 1.55B), so the adapter is just 63 MB while the base model stays completely frozen.
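The 15.7M / 63 MB figures can be checked with back-of-the-envelope arithmetic. The sketch below assumes Whisper large-v3's published architecture (hidden size 1280, 32 encoder and 32 decoder layers); everything else follows from the LoRA shapes stated in this card.

```python
# Back-of-the-envelope check of the trainable-parameter and adapter-size figures.
d_model = 1280            # hidden size of whisper-large-v3
rank = 32                 # LoRA rank used for this adapter
encoder_layers = 32
decoder_layers = 32

# q_proj and v_proj appear in every attention block: encoder self-attention,
# decoder self-attention, and decoder cross-attention.
attention_blocks = encoder_layers + 2 * decoder_layers   # 96
projections = 2 * attention_blocks                       # q_proj + v_proj per block

# Each LoRA-adapted projection adds A (rank x d_model) and B (d_model x rank).
params_per_projection = 2 * rank * d_model
trainable = projections * params_per_projection
print(trainable)                 # 15,728,640  ~= 15.7M

# Stored in fp32 (4 bytes per parameter), that is roughly 63 MB on disk.
print(trainable * 4 / 1e6)       # ~62.9 MB
```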
Key engineering details:

- A `run.sh` watchdog that survives SIGSEGV / GPU memory crashes and auto-resumes from the latest checkpoint
- A custom `WhisperDataset` that keeps plain Python lists in RAM, bypassing the HuggingFace Arrow/mmap memory issues that caused segfaults
- LoRA applied to the `q_proj` and `v_proj` attention layers only, achieving strong Uzbek accuracy without catastrophic forgetting of the base model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# =====================================================================
# 💡 REMINDER: This LoRA adapter was specifically trained on and
# MUST be used with the base model: "openai/whisper-large-v3"
# =====================================================================

# 1. Load the configuration and base model
model_id = "AnvarMexmonov/uz-speech-adapter-v1"
config = PeftConfig.from_pretrained(model_id)

# The config automatically pulls "openai/whisper-large-v3" as the base
processor = WhisperProcessor.from_pretrained(config.base_model_name_or_path)
model = WhisperForConditionalGeneration.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# 2. Load the custom Uzbek LoRA adapter
model = PeftModel.from_pretrained(model, model_id)
print("Custom Uzbek Whisper Large-v3 model is ready for inference!")
```
| Setting | Value |
|---|---|
| Base model | openai/whisper-large-v3 |
| Fine-tuning method | LoRA via HuggingFace PEFT |
| LoRA targets | q_proj, v_proj |
| LoRA rank / alpha | 32 / 64 |
| Trainable params | 15.7M / 1.55B (1%) |
| Dataset | yakhyo/mozilla-common-voice-uzbek |
| Train samples | 3,000 clips |
| Eval samples | 200 clips |
| Steps | 1,000 |
| Effective batch size | 8 |
| Learning rate | 5e-4 with 50-step warmup |
| Precision | bf16 |
| GPU | NVIDIA RTX 5090 |
| Training time | ~90 minutes |
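The LoRA settings in the table above could be expressed as a `peft` `LoraConfig` roughly as follows. This is a minimal sketch, not the card's actual training script: the rank, alpha, and target modules come from the table, while the dropout and bias values are assumptions not stated here.

```python
from peft import LoraConfig

# Sketch of an adapter config matching the table above.
lora_config = LoraConfig(
    r=32,                                  # LoRA rank (from the table)
    lora_alpha=64,                         # LoRA alpha (from the table)
    target_modules=["q_proj", "v_proj"],   # attention projections only
    lora_dropout=0.05,                     # assumption: not stated in the card
    bias="none",                           # assumption: not stated in the card
)
```

With `get_peft_model(base_model, lora_config)`, this would freeze the base Whisper weights and train only the low-rank adapters.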
| Metric | Value |
|---|---|
| Best eval loss | 0.7835 |
| Training steps | 1,000 |
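The training-length figures above imply roughly 2.7 passes over the data, a quick sanity check of which is:

```python
# Epochs implied by the tables: 1,000 optimizer steps at an effective
# batch size of 8, over 3,000 training clips.
steps = 1000
effective_batch = 8
train_clips = 3000

samples_seen = steps * effective_batch   # 8,000 samples
epochs = samples_seen / train_clips
print(round(epochs, 2))                  # ~2.67 epochs
```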
The complete diarization + transcription project (code, training scripts, inference) is available at: https://github.com/anvarmexmonov/WhoSaidWhatWhen-Uzbek
MIT © AnvarMexmonov