Sayro: Uzbek Text-to-Speech (Qwen3-TTS Fine-tuned)
Sayro is a high-quality Uzbek Text-to-Speech model based on the Qwen3-TTS-12Hz-1.7B-Base architecture. This model has been specifically fine-tuned to capture the nuances of the Uzbek language using a curated mix of synthetic data and public Uzbek speech datasets.
Examples
Listen to the model's output generated with the script provided below.
| Sample Description | Audio Player |
|---|---|
| Greeting (Happy) | |
| Project Intro | |
| Contribution (Neutral) | |
| Social Media (Excited) | |
Model Description
This model provides a foundational open-access checkpoint for Uzbek speech synthesis. It is designed for researchers and developers looking to integrate natural-sounding Uzbek voices into their applications. This project was made possible by the dedicated efforts of the Examy.me and Teamwork.uz teams. Their support in data curation and computational resources has been instrumental in bringing Sayro to the Uzbek AI community.
- Architecture: Based on Qwen/Qwen3-TTS-12Hz-1.7B-Base. For detailed architectural specifications, please refer to the original Qwen model page.
- Training Data: A balanced mixture of high-fidelity synthetic audio and diverse public domain Uzbek speech datasets.
- Purpose: To contribute to the growing field of Uzbek Language Technology and Speech AI research.
Premium Models
For users requiring production-grade quality, we offer Sayro Premium models featuring:
- Realistic: Hyper-natural human prosody.
- Dialect-specific: Support for regional Uzbek dialects.
- Literal: Precision-focused speech for formal documents.
Visit sayro.uz for more information on accessing these professional checkpoints.
Quickstart
pip install -U qwen-tts
import torch
import soundfile as sf
from qwen_tts.inference.qwen3_tts_model import Qwen3TTSModel
import time
total_start_time = time.time()
CHECKPOINT_PATH = "uzlm/sayro-tts-1.7B"
print(f"Loading custom Uzbek model from {CHECKPOINT_PATH}...")
tts = Qwen3TTSModel.from_pretrained(
    CHECKPOINT_PATH,
    device_map="cuda:0",  # "cpu" if GPU is unavailable
    dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2",  # enable for faster inference
)
# tts.model.talker = torch.compile(tts.model.talker, mode="reduce-overhead")
start_time = time.time()
test_text1 = "Assalomu alaykum! Bu mening birinchi sun'iy intellekt ovozim. Xabarni eshitayotganingizdan xursandman. Yaxshimisiz? Bugun juda yaxshi kun."
test_text2 = "Ushbu model Examy va Teamwork.uz jamoalari tomonidan ishlab chiqildi."
test_text3 = "Umid qilamizki, bu loyiha O'zbekistondagi sun'iy intellekt rivojiga katta hissa qo'shadi."
test_text4 = "UzLM hugging-face va LinkedIn sahifamizda bizni kuzatib boring."
with torch.inference_mode():
    # text, speaker, and instruct are parallel lists: one entry per utterance
    wavs, sr = tts.generate_custom_voice(
        text=[test_text1, test_text2, test_text3, test_text4],
        speaker=["sayro", "sayro", "sayro", "sayro"],
        instruct=["Happy", "", "Neutral", "Excited"],
    )
print(f"Total time: {time.time() - total_start_time:.2f}s, Generate time: {time.time() - start_time:.2f}s")
sf.write("test_uzbek_output1.mp3", wavs[0], sr)
sf.write("test_uzbek_output2.mp3", wavs[1], sr)
sf.write("test_uzbek_output3.mp3", wavs[2], sr)
sf.write("test_uzbek_output4.mp3", wavs[3], sr)
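The script above passes `text`, `speaker`, and `instruct` as three parallel lists, so they can drift out of sync when utterances are added or removed. The following is a minimal sketch of one way to keep them aligned. `Sample`, `build_batch`, and `output_paths` are hypothetical helpers introduced here for illustration; they are not part of `qwen-tts`. It also assumes, based only on the script above, that an empty `instruct` string means no style instruction.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One utterance to synthesize (hypothetical helper type, not part of qwen-tts)."""
    text: str
    instruct: str = ""  # assumption: empty string means no style instruction


def build_batch(samples, speaker="sayro"):
    """Build keyword arguments shaped like the generate_custom_voice call above.

    The three lists always have the same length, one entry per sample.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "text": [s.text for s in samples],
        "speaker": [speaker] * len(samples),
        "instruct": [s.instruct for s in samples],
    }


def output_paths(count, prefix="test_uzbek_output", ext="mp3"):
    """Return numbered file names following the pattern used in the script above."""
    return [f"{prefix}{i}.{ext}" for i in range(1, count + 1)]


samples = [
    Sample("Assalomu alaykum!", "Happy"),
    Sample("Ushbu model Examy va Teamwork.uz jamoalari tomonidan ishlab chiqildi."),
]
batch = build_batch(samples)
paths = output_paths(len(samples))
```

With the model loaded as in the script above, the batch could then be passed as `wavs, sr = tts.generate_custom_voice(**batch)`, and each waveform written with `sf.write(path, wav, sr)` while iterating over `zip(paths, wavs)`.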
Ethical Use and Restrictions
IMPORTANT: THIS MODEL IS FOR ETHICAL USE ONLY. By requesting access, you agree not to use this model for:
- Deepfaking: Cloning voices without explicit consent.
- Fake News: Generating deceptive or misleading audio content to spread misinformation.
- Fraud: Using synthesized voices for impersonation, phishing, or financial scams.
Any use of this model must comply with international AI safety standards and local Uzbek regulations. Users are required to explicitly agree to the Sayro Terms of Use before access is granted.
How to Access
- Log in to your Hugging Face account.
- Fill out the access request form above.
- Your request will be manually reviewed by the UzLM team.
- Once approved, you can download the weights and fine-tune the model for your own research projects.
Developed with ❤️ for the Uzbek AI community by the UzLM team.