Qwen3-TTS Uyghur Fine-tuned Model
A Uyghur text-to-speech (TTS) model fine-tuned from Qwen3-TTS-12Hz-0.6B-Base.
Model Information
| Item | Value |
|---|---|
| Base model | Qwen3-TTS-12Hz-0.6B-Base |
| Training data | Common Voice Uyghur v24 (14,211 samples) |
| Speaker | uyghur_speaker |
| Sample rate | 24000 Hz |
| Training steps | 4000 |
| Final loss | ~4.5 |
Usage Examples
Python
from qwen_tts import Qwen3TTSModel
import soundfile as sf
import torch
# Load the fine-tuned model onto the first GPU in bfloat16
tts = Qwen3TTSModel.from_pretrained(
    "anke01/qwen3-tts-uyghur",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
# Synthesize speech with the fine-tuned speaker
wavs, sr = tts.generate_custom_voice(
    text="بۇ بىر سىناق ئاۋازى.",
    speaker="uyghur_speaker",
    language="Auto",
)
# Save the first waveform and report its duration
sf.write("output.wav", wavs[0], sr)
print(f"Saved: {len(wavs[0])/sr:.2f}s at {sr}Hz")
Command Line
# Download the model
git lfs install
git clone https://huggingface.co/anke01/qwen3-tts-uyghur
# Run inference
cd qwen3-tts-uyghur
python infer.py \
    --model_path ./checkpoint-step-4000 \
    --text "بۇ بىر سىناق ئاۋازى." \
    --speaker uyghur_speaker \
    --output test.wav
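The source of infer.py is not reproduced in this README, so the following is only a sketch of how the four flags used above could be parsed with argparse. The function name `build_parser`, the help strings, and the default values for `--speaker` and `--output` are assumptions made for illustration; the real script may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Synthesize Uyghur speech from text.")
    parser.add_argument("--model_path", required=True,
                        help="Directory holding the fine-tuned checkpoint.")
    parser.add_argument("--text", required=True, help="Text to synthesize.")
    # The defaults below are assumptions, not taken from infer.py.
    parser.add_argument("--speaker", default="uyghur_speaker", help="Speaker name.")
    parser.add_argument("--output", default="output.wav",
                        help="Path of the WAV file to write.")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args([
    "--model_path", "./checkpoint-step-4000",
    "--text", "بۇ بىر سىناق ئاۋازى.",
    "--output", "test.wav",
])
print(args.model_path, args.speaker, args.output)
```

Passing the argument list explicitly keeps the parsing logic easy to check in isolation, separate from model loading.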
Training Configuration
| Parameter | Value |
|---|---|
| Batch size | 2 |
| Gradient accumulation | 16 |
| Learning rate | 2e-6 |
| Max steps | 4000 |
| Epoch | 0 (55% of the first epoch completed) |
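The table does not say whether a "step" is one micro-batch or one optimizer update after gradient accumulation. The arithmetic below, which uses only numbers from this README, works out both readings. The 55% figure in the Epoch row is close to the micro-batch reading, though that interpretation is an inference and not something the README confirms.

```python
# Values taken from the tables in this README.
NUM_SAMPLES = 14_211   # Common Voice Uyghur v24 samples
BATCH_SIZE = 2
GRAD_ACCUM = 16
MAX_STEPS = 4000

# Reading 1: each step consumes one micro-batch of BATCH_SIZE samples.
epochs_if_micro = MAX_STEPS * BATCH_SIZE / NUM_SAMPLES

# Reading 2: each step is an optimizer update over BATCH_SIZE * GRAD_ACCUM samples.
epochs_if_optimizer = MAX_STEPS * BATCH_SIZE * GRAD_ACCUM / NUM_SAMPLES

print(f"{epochs_if_micro:.2f}")      # 0.56
print(f"{epochs_if_optimizer:.2f}")  # 9.01
```

Under the first reading the model has seen roughly 8,000 of the 14,211 samples, which is under one full pass over the data.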
File Structure
qwen3-tts-uyghur/
├── checkpoint-step-4000/ # fine-tuned model weights
│ ├── config.json
│ ├── model.safetensors # 1.8 GB
│ ├── tokenizer_config.json
│ ├── vocab.json
│ ├── merges.txt
│ └── speech_tokenizer/
├── train.py # training script
├── infer.py # inference script
├── dataset.py # dataset class
├── prepare_data.py # data preprocessing
├── convert_commonvoice_to_jsonl.py
├── requirements.txt
└── README.md
Training Data Format
{
  "audio": "path/to/audio.wav",
  "text": "Uyghur text",
  "ref_audio": "path/to/reference.wav",
  "audio_codes": [[...], [...]]
}
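As a rough illustration of the record layout above, the sketch below checks that one JSON line carries the four fields with the expected top-level types. It assumes one JSON object per line, as the script name convert_commonvoice_to_jsonl.py suggests. The helper `validate_record` is hypothetical and not part of this repository, and the integer values in `audio_codes` are placeholders, not real codec output.

```python
import json

# Field names come from the format shown above; the type checks are an assumption.
REQUIRED_FIELDS = {"audio": str, "text": str, "ref_audio": str, "audio_codes": list}


def validate_record(line: str) -> dict:
    """Parse one JSONL line and verify the presence and type of each field."""
    record = json.loads(line)
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise ValueError(f"field {name} should be {expected_type.__name__}")
    return record


sample_line = json.dumps(
    {
        "audio": "path/to/audio.wav",
        "text": "بۇ بىر سىناق ئاۋازى.",
        "ref_audio": "path/to/reference.wav",
        "audio_codes": [[1, 2], [3, 4]],  # placeholder values only
    },
    ensure_ascii=False,  # keep the Uyghur text readable in the file
)
record = validate_record(sample_line)
print(sorted(record))
```

Running a check like this over the whole file before training can catch malformed lines early.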
Installing Dependencies
pip install -r requirements.txt
requirements.txt:
qwen-tts==0.1.1
transformers>=4.57.3
accelerate>=0.25.0
torch>=2.0.0
torchaudio>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
tqdm>=4.66.0
safetensors>=0.4.0
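After installing, it can help to see which of the listed packages are actually present in the environment. The sketch below uses the standard-library `importlib.metadata` module to report installed versions. It does not compare them against the version constraints above, and the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata

# Distribution names copied from requirements.txt above.
PACKAGES = [
    "qwen-tts", "transformers", "accelerate", "torch", "torchaudio",
    "librosa", "soundfile", "tqdm", "safetensors",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```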
Example Texts
بۇ بىر سىناق ئاۋازى. # This is a test voice.
ياخشىمۇسىز، بۈگۈن ھاۋا ياخشى. # Hello, the weather is nice today.
ئىككىنچى سىناق. # Second test.
License
Apache 2.0
Citation
@misc{qwen3tts2025,
title={Qwen3-TTS: A Multilingual Multimodal TTS Model},
author={Qwen Team},
year={2025},
url={https://github.com/QwenLM/Qwen3-TTS}
}