---
language: uz
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- speech
- speech-recognition
- whisper
- uzbek
- stt
---

# OmoN-STT

Uzbek speech recognition model fine-tuned from **islomov/rubaistt_v2_medium**.

- Dataset size: ~244k audio samples
- Training time: ~190 hours
- Epochs: 3

Training setup:

- Base model: islomov/rubaistt_v2_medium
- GPU: RTX 3080 Ti
- Batch size: 8
- Gradient accumulation: 4
- Epochs: 3
- Total training steps: 22251

## Usage

```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("omullaboyev/OmoN-STT")
model = WhisperForConditionalGeneration.from_pretrained("omullaboyev/OmoN-STT")

# Load audio as mono float32 at Whisper's expected 16 kHz sampling rate
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(inputs["input_features"])

# batch_decode returns a list of transcriptions; take the first element
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
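Whisper-family models process audio in windows of up to 30 seconds, so longer recordings need to be split before calling `generate`. A minimal chunking sketch (NumPy only; the helper name and the no-overlap strategy are illustrative assumptions, not part of this model's API):

```python
import numpy as np

SAMPLE_RATE = 16000   # Whisper's expected sampling rate
CHUNK_SECONDS = 30    # maximum window length Whisper processes at once

def chunk_audio(audio: np.ndarray, sr: int = SAMPLE_RATE,
                chunk_s: int = CHUNK_SECONDS) -> list:
    """Split a 1-D audio array into consecutive chunks of at most chunk_s seconds.

    Illustrative helper: real pipelines often add overlap between chunks to
    avoid cutting words at the boundaries.
    """
    chunk_len = sr * chunk_s
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]

# 70 seconds of audio splits into three chunks of 30s, 30s, and 10s
audio = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
chunks = chunk_audio(audio)
print([len(c) / SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then be passed through the processor and `model.generate` exactly as in the usage example, and the transcriptions concatenated.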