# Tynda STT 4L

Tynda (Тыңда, "Listen" in Kazakh) is a multilingual speech-to-text model supporting 4 languages of Central Asia and beyond.
## Supported Languages
| Language | Code |
|---|---|
| Kazakh | kk |
| Russian | ru |
| English | en |
| Uzbek | uz |
## Model Details
- Architecture: Whisper Large V3 (1.55B parameters)
- Task: Automatic Speech Recognition / Speech-to-Text
- Audio Input: 16 kHz mono WAV
- Max Duration: 30 seconds per segment
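Because input is capped at 30 seconds per segment, longer recordings must be split before transcription. A minimal NumPy-only sketch of fixed-size segmentation (the 16 kHz rate matches the model's expected input; the segmentation helper itself is illustrative, not part of this model's API):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sample_rate: int = 16000,
                max_seconds: float = 30.0) -> list[np.ndarray]:
    """Split a mono waveform into consecutive segments no longer than max_seconds."""
    max_samples = int(sample_rate * max_seconds)
    return [audio[i:i + max_samples] for i in range(0, len(audio), max_samples)]

# Example: a 75-second clip at 16 kHz splits into 30 s + 30 s + 15 s segments
clip = np.zeros(75 * 16000, dtype=np.float32)
segments = chunk_audio(clip)
print([len(s) / 16000 for s in segments])  # → [30.0, 30.0, 15.0]
```

Each segment can then be transcribed independently. Note that the `transformers` ASR pipeline can also handle long audio on its own via its `chunk_length_s` argument.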
## Usage
```python
import torch
import soundfile as sf
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "nur-dev/tynda-stt-4L"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

# Choose language: "kazakh", "russian", "english", or "uzbek"
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v3", language="kazakh", task="transcribe"
)

# Load your audio (must be 16 kHz mono)
audio, sr = sf.read("audio.wav", dtype="float32")

inputs = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
features = inputs.input_features.to(device, dtype=torch.float16)

forced_ids = processor.get_decoder_prompt_ids(language="kazakh", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(
        features,
        forced_decoder_ids=forced_ids,
        max_new_tokens=200,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
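The snippet above assumes `audio.wav` is already 16 kHz mono. If your source is stereo or recorded at another rate, a rough NumPy-only conversion sketch (linear-interpolation resampling; for higher quality, dedicated resamplers such as torchaudio or librosa are preferable, and the helper below is illustrative rather than part of this model's API):

```python
import numpy as np

def to_16k_mono(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Downmix multi-channel audio to mono and resample via linear interpolation."""
    if audio.ndim == 2:  # soundfile returns shape (frames, channels)
        audio = audio.mean(axis=1)
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio).astype(np.float32)

# Example: 1 second of 44.1 kHz stereo becomes 16000 mono samples
stereo = np.zeros((44100, 2), dtype=np.float32)
mono16k = to_16k_mono(stereo, orig_sr=44100)
print(mono16k.shape)  # → (16000,)
```

The converted array can be passed straight to `processor.feature_extractor` with `sampling_rate=16000`.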
## Using with pipeline
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="nur-dev/tynda-stt-4L",
    torch_dtype=torch.float16,
    device="cuda:0",
)

result = pipe(
    "audio.wav",
    generate_kwargs={"language": "kazakh", "task": "transcribe"},
)
print(result["text"])
```
## License
This model is released under CC BY-NC 4.0. It is free for non-commercial use. For commercial licensing, please contact the authors.
## Base Model

openai/whisper-large-v3