QOR-TTS-0.6B

QOR-TTS 0.6B: Fast local voice cloning (~2 GB)

About

QOR-TTS is a voice cloning text-to-speech model, part of the QOR AI system. It enables local, offline voice cloning: record a short voice sample and generate speech in that voice.
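
Before cloning, it can help to confirm what the recorded sample actually contains. The sketch below is not part of QOR; it uses only Python's standard `wave` module to report the channel count, sample rate, and duration of a PCM WAV file. The helper name `describe_wav` is hypothetical, and no particular sample length or sample rate is assumed to be required by the model.

```python
import wave


def describe_wav(path):
    """Return basic facts about a PCM WAV file (hypothetical helper, not a QOR API)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,  # number of frames divided by frames per second
        }
```

Calling `describe_wav("my_voice_sample.wav")` returns a dictionary that can be printed or logged before the file is passed on as reference audio.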

This model is based on Qwen/Qwen3-TTS-12Hz-0.6B-Base by Alibaba Cloud (Apache 2.0 license), repackaged for easy use with the QOR Voice Studio.

Quick Start

# With QOR Voice Studio (recommended):
# 1. Start QOR server: python -m qor serve
# 2. Open Voice Studio in browser
# 3. Go to Models tab → Download → Load
# 4. Record your voice → Test Clone

# Direct Python usage:
from qor.qwen_tts import QwenTTSEngine

engine = QwenTTSEngine()
engine.load_model("0.6B")
wav_path, duration = engine.generate(
    text="Hello, this is my cloned voice!",
    reference_audio="my_voice_sample.wav",
    reference_text="What I said in the sample",
    language="en",
)
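
For longer scripts, the same call can be repeated once per line with a single reference voice. The following is a minimal sketch, assuming `engine` is an already loaded object whose `generate()` accepts the keyword arguments shown above and returns a `(wav_path, duration)` pair. The helper `clone_lines` is hypothetical and not part of the `qor` package.

```python
def clone_lines(engine, lines, reference_audio, reference_text, language="en"):
    """Synthesize each non-empty line with the same reference voice.

    Hypothetical helper: `engine.generate()` is assumed to behave as in the
    Quick Start, returning (wav_path, duration) for each call.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines rather than synthesizing silence
        wav_path, duration = engine.generate(
            text=text,
            reference_audio=reference_audio,
            reference_text=reference_text,
            language=language,
        )
        results.append((wav_path, duration))
    # Sum of the durations exactly as reported by the engine
    total_duration = sum(duration for _, duration in results)
    return results, total_duration
```

Passing the engine in as an argument keeps the helper independent of how the model was loaded, so it can also be exercised with a stand-in object during testing.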

Model Details

| Property | Value |
| --- | --- |
| Parameters | 0.6 billion |
| Size | ~2 GB |
| License | Apache 2.0 |
| Languages | English, Chinese, Japanese, Korean, German, French, Spanish, Portuguese, Russian, Italian |
| Supports | Voice cloning, text-to-speech, instruction-guided delivery |
| Device | CUDA, DirectML (Windows), CPU |
| Based on | Qwen/Qwen3-TTS-12Hz-0.6B-Base |
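
The three device options listed above suggest a simple preference order. The sketch below shows one way to check which backend is available on a machine. It assumes the CUDA path goes through PyTorch and the DirectML path through the `torch_directml` package, neither of which is confirmed by this model card. The function `pick_device` is hypothetical, and how `QwenTTSEngine` selects or accepts a device is not documented here.

```python
import importlib.util


def pick_device():
    """Return "cuda", "directml", or "cpu" (hypothetical helper, not a QOR API)."""
    try:
        import torch  # assumption: the CUDA backend is provided by PyTorch

        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # PyTorch is not installed; fall through to the other options
    # Assumption: DirectML support comes from the torch_directml package
    if importlib.util.find_spec("torch_directml") is not None:
        return "directml"
    return "cpu"
```

The function only reports a label. It does not move a model to a device, so it is safe to run on machines without a GPU.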

Attribution

This model is a redistribution of Qwen/Qwen3-TTS-12Hz-0.6B-Base by Alibaba Cloud / Qwen Team. The original model is released under the Apache 2.0 license.

We thank the Qwen team for their excellent work on voice synthesis.

License

Apache 2.0, same as the original model. See LICENSE for full text.
