QOR-TTS-1.7B / README.md

drdraq

Upload QOR-TTS-1.7B (based on Qwen/Qwen3-TTS-12Hz-1.7B-Base)

ab1e970 verified 2 months ago

metadata

license: apache-2.0
language:
  - en
  - zh
  - ja
  - ko
  - de
  - fr
  - es
  - pt
  - ru
  - it
tags:
  - tts
  - voice-cloning
  - text-to-speech
  - qor
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-Base
pipeline_tag: text-to-speech

QOR-TTS-1.7B

QOR-TTS 1.7B: high-quality local voice cloning (~4 GB)

About

QOR-TTS is a voice-cloning text-to-speech model, part of the QOR AI system. It enables local, offline voice cloning: record a short voice sample, then generate speech in that voice.

This model is based on Qwen/Qwen3-TTS-12Hz-1.7B-Base by Alibaba Cloud (Apache 2.0 license), repackaged for easy use with QOR Voice Studio.

Quick Start

# With QOR Voice Studio (recommended):
# 1. Start QOR server: python -m qor serve
# 2. Open Voice Studio in browser
# 3. Go to Models tab → Download → Load
# 4. Record your voice → Test Clone

# Direct Python usage:
from qor.qwen_tts import QwenTTSEngine

engine = QwenTTSEngine()
engine.load_model("1.7B")
wav_path, duration = engine.generate(
    text="Hello, this is my cloned voice!",
    reference_audio="my_voice_sample.wav",
    reference_text="What I said in the sample",
    language="en",
)
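Before calling `engine.generate`, it can help to sanity-check the inputs. The following is a minimal sketch using only the Python standard library, not part of the QOR API: `check_clone_inputs` and `wav_duration_seconds` are hypothetical helper names, and the 3 to 30 second bounds are an assumption about what counts as a "short voice sample", since this README does not state exact limits. The language codes are the ones listed in the model card metadata.

```python
import wave

# Language codes listed in this model card's metadata.
SUPPORTED_LANGUAGES = {"en", "zh", "ja", "ko", "de", "fr", "es", "pt", "ru", "it"}

# Assumed bounds for a "short voice sample"; the README gives no exact limits.
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_clone_inputs(reference_audio, reference_text, language):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language code: {language!r}")
    if not reference_text.strip():
        problems.append("reference_text is empty")
    try:
        seconds = wav_duration_seconds(reference_audio)
    except (OSError, EOFError, wave.Error) as exc:
        problems.append(f"cannot read {reference_audio!r} as WAV: {exc}")
    else:
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(
                f"sample is {seconds:.1f} s; expected "
                f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
            )
    return problems
```

A caller might run `check_clone_inputs("my_voice_sample.wav", "What I said in the sample", "en")` and only proceed to generation when the returned list is empty.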

Model Details

| Property | Value |
|---|---|
| Parameters | 1.7 billion |
| Size | ~4 GB |
| License | Apache 2.0 |
| Languages | English, Chinese, Japanese, Korean, German, French, Spanish, Portuguese, Russian, Italian |
| Supports | Voice cloning, text-to-speech, instruction-guided delivery |
| Device | CUDA, DirectML (Windows), CPU |
| Based on | Qwen/Qwen3-TTS-12Hz-1.7B-Base |
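The device row lists three backends, but this README does not say how QOR chooses between them. The sketch below shows one plausible way to check what is available locally. It assumes PyTorch (`torch`) for CUDA detection and the `torch_directml` package for DirectML. The preference order (CUDA, then DirectML, then CPU) and the function names `detect_backends` and `pick_device` are illustrative assumptions, not QOR's documented behavior.

```python
import importlib.util


def detect_backends():
    """Return (has_cuda, has_directml) based on what is installed locally."""
    has_cuda = False
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            has_cuda = torch.cuda.is_available()
        except Exception:
            # A broken install is treated the same as "no CUDA".
            has_cuda = False
    # Only checks that the package is installed, not that a GPU is usable.
    has_directml = importlib.util.find_spec("torch_directml") is not None
    return has_cuda, has_directml


def pick_device(has_cuda, has_directml):
    """Prefer CUDA, then DirectML, then CPU (an assumed order)."""
    if has_cuda:
        return "cuda"
    if has_directml:
        return "directml"  # a label for display, not a torch device string
    return "cpu"


if __name__ == "__main__":
    print(pick_device(*detect_backends()))
```

Keeping detection separate from the choice makes the preference order easy to test or change without any GPU present.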

Attribution

This model is a redistribution of Qwen/Qwen3-TTS-12Hz-1.7B-Base by Alibaba Cloud / Qwen Team. The original model is released under the Apache 2.0 license.

We thank the Qwen team for their excellent work on voice synthesis.

License

Apache 2.0, the same as the original model. See LICENSE for the full text.