QOR-TTS-0.6B

QOR-TTS 0.6B — Fast local voice cloning (~2 GB)

About

QOR-TTS is a voice-cloning text-to-speech model, part of the QOR AI system. It runs locally and offline: record a short voice sample, then generate speech in that voice.

This model is based on Qwen/Qwen3-TTS-12Hz-0.6B-Base by Alibaba Cloud (Apache 2.0 license), repackaged for easy use with QOR Voice Studio.

Quick Start

# With QOR Voice Studio (recommended):
# 1. Start QOR server: python -m qor serve
# 2. Open Voice Studio in browser
# 3. Go to Models tab → Download → Load
# 4. Record your voice → Test Clone

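# Direct Python usage:
from qor.qwen_tts import QwenTTSEngine

engine = QwenTTSEngine()
engine.load_model("0.6B")

# generate() returns the path of the synthesized WAV file and its duration.
wav_path, duration = engine.generate(
    text="Hello, this is my cloned voice!",       # text to speak in the cloned voice
    reference_audio="my_voice_sample.wav",        # recording of the voice to clone
    reference_text="What I said in the sample",   # transcript of that recording
    language="en",
)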
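
Before passing a recording as reference_audio, it can help to confirm that the file is a readable WAV of a sensible length. The following is a minimal sketch using only the Python standard library. The helper names (describe_wav, is_usable_sample) and the 3 to 30 second bounds are illustrative assumptions, not requirements documented by QOR-TTS or Qwen3-TTS.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = wf.getnframes()
    return channels, rate, frames / rate


def is_usable_sample(path, min_seconds=3.0, max_seconds=30.0):
    """Heuristic pre-check for a reference recording.

    The duration bounds are assumptions chosen for illustration; adjust them
    to whatever works for your recordings.
    """
    try:
        _, _, seconds = describe_wav(path)
    except (wave.Error, EOFError, OSError):
        # Not a readable PCM WAV file (wrong format, truncated, or missing).
        return False
    return min_seconds <= seconds <= max_seconds
```

A call such as is_usable_sample("my_voice_sample.wav") returns False for missing or non-WAV files instead of raising, so it can guard the engine.generate(...) call shown above.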

Model Details

Property	Value
Parameters	0.6 billion
Size	~2 GB
License	Apache 2.0
Languages	English, Chinese, Japanese, Korean, German, French, Spanish, Portuguese, Russian, Italian
Supports	Voice cloning, text-to-speech, instruction-guided delivery
Device	CUDA, DirectML (Windows), CPU
Based on	Qwen/Qwen3-TTS-12Hz-0.6B-Base
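
The language and device rows above can be turned into small lookup helpers. This is a sketch under stated assumptions: only the code "en" appears in the Quick Start, so the remaining entries are standard ISO 639-1 codes that the engine may or may not accept in this form, and the CUDA, DirectML, CPU preference order simply follows the order in the table. The names LANGUAGE_CODES, language_code, and pick_device are hypothetical and not part of the qor package.

```python
# ISO 639-1 codes for the listed languages. Only "en" is confirmed by the
# Quick Start example; the rest are assumptions about what the engine accepts.
LANGUAGE_CODES = {
    "english": "en",
    "chinese": "zh",
    "japanese": "ja",
    "korean": "ko",
    "german": "de",
    "french": "fr",
    "spanish": "es",
    "portuguese": "pt",
    "russian": "ru",
    "italian": "it",
}

# Assumed preference order, taken from the order of the Device row.
DEVICE_PREFERENCE = ("cuda", "directml", "cpu")


def language_code(name):
    """Map a language name from the table to its ISO 639-1 code."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {name!r}")
    return LANGUAGE_CODES[key]


def pick_device(available):
    """Return the most preferred device among those reported as available."""
    normalized = {device.lower() for device in available}
    for device in DEVICE_PREFERENCE:
        if device in normalized:
            return device
    raise ValueError("no supported device available")
```

For example, language_code("German") yields "de", and pick_device(["cpu", "DirectML"]) yields "directml". How the available devices are detected is left to the caller.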

Attribution

This model is a redistribution of Qwen/Qwen3-TTS-12Hz-0.6B-Base by Alibaba Cloud / Qwen Team. The original model is released under the Apache 2.0 license.

We thank the Qwen team for their excellent work on voice synthesis.

License

Apache 2.0, the same as the original model. See LICENSE for full text.