Tynda STT 4L

Tynda (Тыңда, "Listen" in Kazakh) is a multilingual speech-to-text model that supports four languages: Kazakh, Russian, English, and Uzbek.

Supported Languages

Language    Code
Kazakh      kk
Russian     ru
English     en
Uzbek       uz
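
The usage examples in this card pass full language names ("kazakh") rather than the two-letter codes listed above. A minimal sketch of a lookup that accepts either form is shown here; `LANGUAGE_NAMES` and `resolve_language` are hypothetical helpers for illustration and are not part of the model or of transformers.

```python
# Hypothetical helper: maps the two-letter codes from the table to the
# lowercase language names used in the usage examples of this card.
LANGUAGE_NAMES = {
    "kk": "kazakh",
    "ru": "russian",
    "en": "english",
    "uz": "uzbek",
}


def resolve_language(value):
    """Return the lowercase language name for a supported code or name."""
    key = value.strip().lower()
    if key in LANGUAGE_NAMES:
        return LANGUAGE_NAMES[key]
    if key in LANGUAGE_NAMES.values():
        return key
    supported = ", ".join(sorted(LANGUAGE_NAMES))
    raise ValueError(f"Unsupported language {value!r}; expected one of: {supported}")
```

Rejecting unknown values early avoids silently transcribing with a language the model was not fine-tuned on.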

Model Details

  • Architecture: Whisper Large V3 (1.55B parameters)
  • Task: Automatic Speech Recognition / Speech-to-Text
  • Audio Input: 16 kHz mono WAV
  • Max Duration: 30 seconds per segment
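
Recordings longer than 30 seconds have to be split before transcription. The sketch below computes fixed-length segment boundaries in samples, assuming 16 kHz audio; `segment_bounds` is a hypothetical helper, and cutting at fixed offsets is a simplification that may split a word at a boundary (cutting at silences would be gentler).

```python
def segment_bounds(num_samples, sample_rate=16000, max_seconds=30.0):
    """Return (start, end) sample-index pairs that cover the audio in order.

    Every segment is at most `max_seconds` long; the last one may be shorter.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    step = int(sample_rate * max_seconds)  # samples per full segment
    return [
        (start, min(start + step, num_samples))
        for start in range(0, num_samples, step)
    ]
```

With audio loaded as an array, each pair can be used as a slice (`audio[start:end]`), the slices transcribed one by one, and the resulting texts joined.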

Usage

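import soundfile as sf
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "nur-dev/tynda-stt-4L"
language = "kazakh"  # or "russian", "english", "uzbek"

# Use half precision on GPU; fall back to float32 on CPU,
# where float16 can be slow or unsupported
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device.startswith("cuda") else torch.float32

model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=dtype
).to(device)

# The processor (feature extractor + tokenizer) is loaded from the base Whisper checkpoint
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v3", language=language, task="transcribe"
)

# Load your audio (16 kHz mono, at most 30 seconds)
audio, sr = sf.read("audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 16000:
    raise ValueError(f"Expected 16 kHz audio, got {sr} Hz; resample it first")

inputs = processor.feature_extractor(audio, sampling_rate=sr, return_tensors="pt")
features = inputs.input_features.to(device, dtype=dtype)

# Prompt tokens that make the decoder transcribe in the chosen language
forced_ids = processor.get_decoder_prompt_ids(language=language, task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(
        features,
        forced_decoder_ids=forced_ids,
        max_new_tokens=200,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)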

Using with pipeline

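import torch
from transformers import pipeline

# Run on the GPU in half precision when available, otherwise on the CPU in float32
use_gpu = torch.cuda.is_available()

pipe = pipeline(
    "automatic-speech-recognition",
    model="nur-dev/tynda-stt-4L",
    torch_dtype=torch.float16 if use_gpu else torch.float32,
    device="cuda:0" if use_gpu else "cpu",
)

# Choose language: "kazakh", "russian", "english", or "uzbek"
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "kazakh", "task": "transcribe"},
)
print(result["text"])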

License

This model is released under CC BY-NC 4.0. It is free for non-commercial use. For commercial licensing, please contact the authors.
