
---
license: apache-2.0
language:
- kk
base_model:
- microsoft/VibeVoice-ASR
tags:
- automatic-speech-recognition
- kazakh
- vibevoice
- lora
- ksc2
datasets:
- InflexionLab/ISSAI-KSC2-Structured
metrics:
- wer
- cer
model-index:
- name: VibeVoice-ASR-Kazakh
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: AudioDataset
      name: ISSAI KSC2
      split: test
    metrics:
    - type: wer
      value: 22
---


# VibeVoice ASR — Kazakh

## Model Description

This is VibeVoice ASR fine-tuned for Kazakh on the [ISSAI KSC2 Structured](https://huggingface.co/datasets/InflexionLab/ISSAI-KSC2-Structured) dataset (~1,200 hours of diverse Kazakh speech). Fine-tuning was performed with LoRA (Low-Rank Adaptation), and the adapter weights were merged into the base model for efficient inference. The model achieves a WER of about 22% on the ISSAI KSC2 test set.

The base VibeVoice ASR model had no prior knowledge of Kazakh. This fine-tuned version produces punctuated and capitalized Kazakh transcriptions.
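For readers unfamiliar with LoRA, the sketch below illustrates what "merging the weights into the base model" means for a single linear layer, using the standard LoRA update `W_merged = W + (alpha / r) * B @ A`. The matrix sizes, the rank `r`, and the scaling `alpha` are hypothetical placeholders; this card does not state which layers were adapted or what rank was used. In a PEFT-based workflow the same folding is typically done by `PeftModel.merge_and_unload()`, but whether this model was merged that way is an assumption.

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a frozen weight: W_merged = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

# Hypothetical layer sizes and LoRA hyperparameters (not taken from this model).
d_out, d_in, r, alpha = 8, 16, 4, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection (r x d_in)
B = rng.standard_normal((d_out, r))     # LoRA up-projection (d_out x r)
x = rng.standard_normal(d_in)           # one input vector

# Unmerged forward pass: base path plus the scaled low-rank adapter path.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward pass: a single matmul with the folded weight.
W_merged = merge_lora(W, A, B, alpha, r)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))          # True
print(np.linalg.matrix_rank(W_merged - W) <= r)  # True: the update has rank at most r
```

Because the merged layer is a single dense matrix, inference runs exactly as in the base model, with no extra adapter matmuls.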

## Training Dataset

[InflexionLab/ISSAI-KSC2-Structured](https://huggingface.co/datasets/InflexionLab/ISSAI-KSC2-Structured) is an enhanced version of the ISSAI KSC2 corpus in which punctuation and capitalization were restored using Gemma 27B. It covers six domains: TV News, Crowdsourced, Parliament, Talkshow, Podcasts, and Radio.

## Evaluation Results

The model was evaluated on the KSC2 test split (9,351 samples) and on 30K samples from farabi-lab/kazakh-stt. The farabi-lab dataset was not used in training.

| Dataset | WER | CER |
|-------------------------|-------|--------|
| ISSAI KSC2 (test split) | ~22% | ~9.6% |
| farabi-lab/kazakh-stt | 17.6% | 4.25% |
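WER and CER are edit distances (substitutions + deletions + insertions) at the word and character level, divided by the number of reference words or characters. The sketch below computes both in plain Python. Two details are assumptions, because this card does not document its evaluation script: scores are pooled over the whole corpus rather than averaged per utterance, and reference and hypothesis are both lowercased and stripped of punctuation before scoring. Since this model emits punctuated, capitalized text, that normalization choice alone can shift the reported numbers. `normalize` and `corpus_scores` are hypothetical helpers, and the example sentence is illustrative, not taken from KSC2.

```python
import re

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost substitution, deletion, insertion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            ))
        prev = curr
    return prev[-1]

def normalize(text):
    """Hypothetical normalization: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def corpus_scores(refs, hyps, normalize_text=True):
    """Corpus-level WER and CER: total edits divided by total reference length."""
    word_err = word_total = char_err = char_total = 0
    for ref, hyp in zip(refs, hyps):
        if normalize_text:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words = ref.split()
        word_err += edit_distance(ref_words, hyp.split())
        word_total += len(ref_words)
        char_err += edit_distance(ref, hyp)  # spaces count as characters here
        char_total += len(ref)
    return word_err / word_total, char_err / char_total

refs = ["Бүгін ауа райы жақсы."]  # illustrative sentence, not from KSC2
hyps = ["бүгін ауа райы жаксы"]
print(corpus_scores(refs, hyps))                        # (0.25, 0.05)
print(corpus_scores(refs, hyps, normalize_text=False))  # WER rises to 0.5 once case and "." count
```

Evaluations are often run with an existing library such as jiwer rather than a hand-written dynamic program, but the arithmetic is the same.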