# Tynda STT 4L

Tynda (Тыңда, "Listen" in Kazakh) is a multilingual speech-to-text model supporting 4 languages of Central Asia and beyond.
## Supported Languages
| Language | Code |
|---|---|
| Kazakh | kk |
| Russian | ru |
| English | en |
| Uzbek | uz |
## Model Details
- Architecture: Whisper Large V3 (1.55B parameters)
- Task: Automatic Speech Recognition / Speech-to-Text
- Audio Input: 16 kHz mono WAV
- Max Duration: 30 seconds per segment
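Because input is capped at 30 seconds per segment, longer recordings must be split before transcription. A minimal NumPy-only sketch of fixed-size segmentation (the 16 kHz rate matches the model's expected input; the segmentation helper itself is illustrative, not part of this model's API):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sample_rate: int = 16000,
                max_seconds: float = 30.0) -> list[np.ndarray]:
    """Split a mono waveform into consecutive segments no longer than max_seconds."""
    max_samples = int(sample_rate * max_seconds)
    return [audio[i:i + max_samples] for i in range(0, len(audio), max_samples)]

# Example: a 75-second clip at 16 kHz splits into 30 s + 30 s + 15 s segments
clip = np.zeros(75 * 16000, dtype=np.float32)
segments = chunk_audio(clip)
print([len(s) / 16000 for s in segments])  # → [30.0, 30.0, 15.0]
```

Each segment can then be transcribed independently. Note that the `transformers` ASR pipeline can also handle long audio on its own via its `chunk_length_s` argument.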
## Usage
```python
import torch
import soundfile as sf
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "nur-dev/tynda-stt-4L"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

# Choose language: "kazakh", "russian", "english", or "uzbek"
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v3", language="kazakh", task="transcribe"
)

# Load your audio (must be 16 kHz mono)
audio, sr = sf.read("audio.wav", dtype="float32")

inputs = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
features = inputs.input_features.to(device, dtype=torch.float16)

forced_ids = processor.get_decoder_prompt_ids(language="kazakh", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(
        features,
        forced_decoder_ids=forced_ids,
        max_new_tokens=200,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
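The snippet above assumes `audio.wav` is already 16 kHz mono. If your source is stereo or recorded at another rate, a rough NumPy-only conversion sketch (linear-interpolation resampling; for higher quality, dedicated resamplers such as torchaudio or librosa are preferable, and the helper below is illustrative rather than part of this model's API):

```python
import numpy as np

def to_16k_mono(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Downmix multi-channel audio to mono and resample via linear interpolation."""
    if audio.ndim == 2:  # soundfile returns shape (frames, channels)
        audio = audio.mean(axis=1)
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio).astype(np.float32)

# Example: 1 second of 44.1 kHz stereo becomes 16000 mono samples
stereo = np.zeros((44100, 2), dtype=np.float32)
mono16k = to_16k_mono(stereo, orig_sr=44100)
print(mono16k.shape)  # → (16000,)
```

The converted array can be passed straight to `processor.feature_extractor` with `sampling_rate=16000`.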
## Using with pipeline
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="nur-dev/tynda-stt-4L",
    torch_dtype=torch.float16,
    device="cuda:0",
)

result = pipe(
    "audio.wav",
    generate_kwargs={"language": "kazakh", "task": "transcribe"},
)
print(result["text"])
```
## License
This model is released under CC BY-NC 4.0. It is free for non-commercial use. For commercial licensing, please contact the authors.
## Base Model

openai/whisper-large-v3