Access the Sayro Uzbek TTS Model

To support responsible AI research in Uzbekistan, we require users to provide their intent and agree to our ethical terms of use. Each request is manually reviewed.

Sayro: Uzbek Text-to-Speech (Qwen3-TTS Fine-tuned)

Sayro is a high-quality Uzbek Text-to-Speech model based on the Qwen3-TTS-12Hz-1.7B-Base architecture. This model has been specifically fine-tuned to capture the nuances of the Uzbek language using a curated mix of synthetic data and public Uzbek speech datasets.

Examples

Listen to the model's output generated with the script provided below.

Sample Description
Greeting (Happy)
Project Intro
Contribution (Neutral)
Social Media (Excited)

Model Description

This model provides a foundational open-access checkpoint for Uzbek speech synthesis. It is designed for researchers and developers looking to integrate natural-sounding Uzbek voices into their applications. This project was made possible by the dedicated efforts of the Examy.me and Teamwork.uz teams. Their support in data curation and computational resources has been instrumental in bringing Sayro to the Uzbek AI community.

Architecture: Based on Qwen/Qwen3-TTS-12Hz-1.7B-Base. For detailed architectural specifications, please refer to the original Qwen model page.
Training Data: A balanced mixture of high-fidelity synthetic audio and diverse public domain Uzbek speech datasets.
Purpose: To contribute to the growing field of Uzbek Language Technology and Speech AI research.

Premium Models

For users requiring production-grade quality, we offer Sayro Premium models featuring:

Realistic: Hyper-natural human prosody.
Dialect-specific: Support for regional Uzbek dialects.
Literal: Precision-focused speech for formal documents.

Visit sayro.uz for more information on accessing these professional checkpoints.

Quickstart

pip install -U qwen-tts

import torch
import soundfile as sf
from qwen_tts.inference.qwen3_tts_model import Qwen3TTSModel
import time

total_start_time = time.time()

CHECKPOINT_PATH = "uzlm/sayro-tts-1.7B"
print(f"Loading custom Uzbek model from {CHECKPOINT_PATH}...")

tts = Qwen3TTSModel.from_pretrained(
    CHECKPOINT_PATH,
    device_map="cuda:0", # "cpu" if GPU is unavailable 
    dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2", # enable for faster inference
)

# tts.model.talker = torch.compile(tts.model.talker, mode="reduce-overhead")

start_time = time.time()

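# English: "Hello! This is my first AI voice. I'm glad you are hearing this message. How are you? Today is a very good day."
test_text1 = "Assalomu alaykum! Bu mening birinchi sun'iy intellekt ovozim. Xabarni eshitayotganingizdan xursandman. Yaxshimisiz? Bugun juda yaxshi kun."
# English: "This model was developed by the Examy and Teamwork.uz teams."
test_text2 = "Ushbu model Examy va Teamwork.uz jamoalari tomonidan ishlab chiqildi."
# English: "We hope this project will contribute greatly to the development of AI in Uzbekistan."
test_text3 = "Umid qilamizki, bu loyiha O'zbekistondagi sun'iy intellekt rivojiga katta hissa qo'shadi."
# English: "Follow us on our UzLM Hugging Face and LinkedIn pages."
test_text4 = "UzLM hugging-face va LinkedIn sahifamizda bizni kuzatib boring."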

with torch.inference_mode():
    wavs, sr = tts.generate_custom_voice(
        text=[test_text1, test_text2, test_text3, test_text4],
        speaker=["sayro", "sayro", "sayro", "sayro"],
        instruct=["Happy", "", "Neutral", "Excited"]
    )
print(f"Total time: {time.time() - total_start_time:.2f}s, Generate time: {time.time() - start_time:.2f}s")

# Writing MP3 needs a libsndfile build with MP3 support; use a .wav filename if this fails.
sf.write("test_uzbek_output1.mp3", wavs[0], sr)
sf.write("test_uzbek_output2.mp3", wavs[1], sr)
sf.write("test_uzbek_output3.mp3", wavs[2], sr)
sf.write("test_uzbek_output4.mp3", wavs[3], sr)
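
The Quickstart call passes three parallel lists (text, speaker, instruct) that have to stay the same length, and it saves MP3 files, which soundfile can only write when the underlying libsndfile build includes MP3 support. The sketch below is one possible way to handle both points using only the Python standard library. `build_batch` and `write_wav_pcm16` are hypothetical helper names, not part of qwen-tts, and the code assumes each waveform is a one-dimensional sequence of floats in roughly [-1, 1], which is an assumption about the model output rather than something this card states.

```python
import struct
import wave


def build_batch(items, speaker="sayro"):
    """Turn (text, instruct) pairs into the three parallel lists used above.

    Hypothetical helper: the lists have equal length by construction.
    """
    texts, speakers, instructs = [], [], []
    for text, instruct in items:
        if not text.strip():
            raise ValueError("empty text in batch")
        texts.append(text)
        speakers.append(speaker)
        instructs.append(instruct or "")  # None becomes an empty instruction
    return {"text": texts, "speaker": speakers, "instruct": instructs}


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM WAV."""
    # Clip out-of-range values so the conversion to int16 cannot overflow.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    pcm = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(int(sample_rate))
        wf.writeframes(pcm)
```

Under these assumptions, the generation step could be written as `batch = build_batch([(test_text1, "Happy"), (test_text2, "")])` followed by `wavs, sr = tts.generate_custom_voice(**batch)`, and each result saved with `write_wav_pcm16(f"test_uzbek_output{i}.wav", wav, sr)`.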

Ethical Use and Restrictions

IMPORTANT: THIS MODEL IS FOR ETHICAL USE ONLY. By requesting access, you agree not to use this model for:

Deepfaking: Cloning voices without explicit consent.
Fake News: Generating deceptive or misleading audio content to spread misinformation.
Fraud: Using synthesized voices for impersonation, phishing, or financial scams.

Any use of this model must comply with international AI safety standards and local Uzbek regulations. Users are required to explicitly agree to the Sayro Terms of Use before access is granted.

How to Access

Log in to your Hugging Face account.
Fill out the access request form above.
Your request will be manually reviewed by the UzLM team.
Once approved, you can download the weights and fine-tune the model for your own research projects.
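
After a request is approved, the gated weights can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that an access token for the approved account is stored in the `HF_TOKEN` environment variable. `resolve_token` and `download_sayro` are hypothetical helper names, and the local directory is an arbitrary choice.

```python
import os

REPO_ID = "uzlm/sayro-tts-1.7B"  # repository name used in the Quickstart


def resolve_token(env=None):
    """Return the Hugging Face token from the environment, or raise."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN to a token from an account with approved access."
        )
    return token


def download_sayro(local_dir="sayro-tts-1.7B"):
    """Download the checkpoint files; fails if access has not been granted."""
    # Imported lazily so the token check works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        token=resolve_token(),
        local_dir=local_dir,
    )
```

Calling `download_sayro()` returns the local folder path. Passing that path to `Qwen3TTSModel.from_pretrained` in place of the repository name should work if the loader accepts local directories, which is an assumption here.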

Developed with ❤️ for the Uzbek AI community by the UzLM team.
