
ZarosASR — Whisper Small fine-tuned on Central Kurdish (Sorani)

ZarosASR is a fine-tuned version of openai/whisper-small for Central Kurdish (Sorani / CKB) automatic speech recognition, developed as part of a thesis project on low-resource Kurdish ASR.

The model was fine-tuned with LoRA (PEFT) on Mozilla Common Voice 24.0 CKB; the adapter was then merged into the base whisper-small weights for standalone inference.

Language token note: Since Whisper has no native CKB language token, training uses a Persian token hijack strategy — the <|fa|> decoder prompt token is repurposed to condition the model on Central Kurdish audio. This is a standard technique for extending Whisper to unsupported languages using a phonologically adjacent token.

Model Details

  • Developed by: Section (thesis project)
  • Model type: Automatic Speech Recognition (Seq2Seq Transformer)
  • Language(s): Central Kurdish — Sorani (ckb)
  • License: Apache 2.0
  • Fine-tuned from: openai/whisper-small (multilingual)

Performance

| Split | WER (%) |
|-------|---------|
| Test  | 7.96    |

Training converged over ~8,300 steps across ~11 epochs, with eval WER dropping from ~35% at the first checkpoint to 7.96% at step 8,250.

| Step | Train Loss | Eval Loss | Eval WER (%) |
|------|------------|-----------|--------------|
| 750  | 0.211 | 0.177 | 35.26 |
| 1500 | 0.157 | 0.139 | 29.11 |
| 2250 | 0.141 | 0.129 | 26.97 |
| 3000 | 0.122 | 0.119 | 25.26 |
| 3750 | 0.102 | 0.104 | 23.24 |
| 4500 | 0.084 | 0.086 | 19.06 |
| 5250 | 0.068 | 0.072 | 15.54 |
| 6000 | 0.050 | 0.063 | 13.18 |
| 6750 | 0.034 | 0.053 | 10.70 |
| 7500 | 0.021 | 0.047 | 9.26  |
| 8250 | 0.016 | 0.043 | 7.96  |
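The eval WER above is word error rate: word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. Whisper fine-tuning pipelines typically compute it with a library such as `evaluate` or `jiwer`; as an illustration of what the metric measures, here is a minimal self-contained sketch in pure Python:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("one two three four", "one too three"))  # 1 sub + 1 del over 4 words = 0.5
```

A WER of 7.96% therefore means roughly 8 word-level errors per 100 reference words on the held-out test split.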

How to Get Started

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="SECT19N/whisper-small-ckb-merged",
    generate_kwargs={"language": "persian", "task": "transcribe"},
)

result = pipe("audio.wav")
print(result["text"])
```

Or using the model and processor directly:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained(
    "SECT19N/whisper-small-ckb-merged", language="Persian", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("SECT19N/whisper-small-ckb-merged")
model.eval()

# Use the Persian token (<|fa|>) as the decoder prompt — the CKB hijack
forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")
model.generation_config.forced_decoder_ids = forced_decoder_ids
model.generation_config.language = "fa"
model.generation_config.task = "transcribe"

# audio_array: np.ndarray at 16000 Hz sample rate
inputs = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt",
    return_attention_mask=True,
)

with torch.no_grad():
    predicted_ids = model.generate(
        inputs["input_features"],
        attention_mask=inputs["attention_mask"],
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

Training Details

Training Data

Common Voice Scripted Speech 24.0 - Central Kurdish

| Split | Samples |
|-------|---------|
| Train | 95,895 |
| Validation | 11,987 |
| Test | 11,987 |

Preprocessing & Augmentation

Audio was resampled to 16 kHz mono and processed through Whisper's log-mel feature extractor with return_attention_mask=True. Training samples were augmented on-the-fly with:

  • Volume perturbation (scale ×0.7–1.3, p=0.4)
  • Gaussian noise (σ=0.001–0.006, p=0.3)
  • Speed perturbation via resampling (factor ×0.95–1.05, p=0.2)

No augmentation was applied to validation or test sets.
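The exact augmentation code is not published with this card; the following NumPy sketch shows one way to implement the three perturbations with the probabilities and ranges listed above (function name and structure are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Illustrative on-the-fly augmentation matching the listed parameters."""
    out = audio.astype(np.float32)
    # Volume perturbation: random gain in [0.7, 1.3], applied with p=0.4
    if rng.random() < 0.4:
        out = out * np.float32(rng.uniform(0.7, 1.3))
    # Additive Gaussian noise with sigma drawn from [0.001, 0.006], p=0.3
    if rng.random() < 0.3:
        noise = rng.normal(0.0, rng.uniform(0.001, 0.006), size=out.shape)
        out = out + noise.astype(np.float32)
    # Speed perturbation via linear resampling, factor in [0.95, 1.05], p=0.2
    if rng.random() < 0.2:
        factor = rng.uniform(0.95, 1.05)
        n_out = int(round(len(out) / factor))
        out = np.interp(
            np.linspace(0, len(out) - 1, n_out),  # new sample positions
            np.arange(len(out)),
            out,
        ).astype(np.float32)
    return out

augmented = augment(np.zeros(16000, dtype=np.float32))
```

Because each perturbation is gated by its own probability, a given training sample may receive zero, one, or several of the three transforms in a single pass.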

Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | openai/whisper-small |
| Precision | BF16 + TF32 |
| Per-device batch size (train) | 128 |
| Per-device batch size (eval) | 128 |
| Gradient accumulation steps | 1 |
| Learning rate | 1e-3 |
| LR scheduler | Cosine |
| Warmup steps | 375 |
| Max epochs | 15 |
| Eval & save interval | 750 steps |
| Best model metric | WER (lower is better) |
| Gradient checkpointing | Enabled |

LoRA Configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 32 |
| Alpha | 64 |
| Dropout | 0.1 |
| Target modules | q_proj, v_proj, k_proj, out_proj, fc1, fc2 |
| Bias | none |
| Task type | SEQ_2_SEQ_LM |
| Adapter size | ~52 MB (pre-merge) |

Training was run in multiple resumed sessions on Modal (A10G GPU).

Uses

Direct Use

Transcribing Central Kurdish (Sorani) speech to text. Suitable for:

  • Voice-to-text applications for Sorani Kurdish speakers
  • Subtitle and caption generation for CKB audio/video content
  • Research and downstream NLP tasks on Kurdish text

Out-of-Scope Use

  • Not suitable for Kurmanji (kmr), Zazaki, or other Kurdish dialects without further fine-tuning.
  • Not evaluated on telephone-quality, heavily noisy, or far-field audio.
  • Not intended for real-time streaming ASR without additional latency optimization.

Bias, Risks, and Limitations

  • Training data is crowdsourced from Common Voice and may not represent all regional accents, age groups, or speaking styles within the Sorani-speaking community.
  • The Persian token hijack (<|fa|>) is a pragmatic workaround; the model has no explicit linguistic knowledge that it is processing Kurdish rather than Persian.
  • Performance may degrade on accents, domains, or recording conditions outside Common Voice.
  • The model inherits any biases present in the Whisper small multilingual base model.

Environmental Impact

Training was performed on Modal cloud infrastructure.

Citation

```bibtex
@misc{zaros_asr_2026,
  title        = {ZarosASR: Fine-tuning Whisper for Central Kurdish (Sorani) Speech Recognition},
  author       = {Yusf Idres},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/SECT19N/whisper-small-ckb-merged}},
  note         = {LoRA-adapted Whisper small on Mozilla Common Voice 24.0 CKB}
}
```

Model Card Contact

For questions or issues, open a discussion on the model repository. If you're building a commercial product on this model, we'd appreciate you reaching out.
