fxstar1128/talentdev_01

A LoRA fine-tune of Qwen3-TTS, trained by fxstar1128 on a custom voice dataset. This checkpoint adapts the Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign base model with Low-Rank Adaptation (LoRA) so that it reproduces the voice characteristics of that dataset.

Model Overview

This model was fine-tuned with:

  • LoRA rank: 32
  • LoRA alpha: 64
  • Training strategy: Single-epoch targeted voice optimization
  • Base architecture: Qwen3-TTS with merged LoRA weights
  • Output format: 24kHz mono WAV

The fine-tuning focused on capturing specific voice characteristics while maintaining the naturalness and expressiveness of the base Qwen3-TTS architecture.

Quick Start

Installation

pip install qwen-tts transformers torch soundfile

Basic Usage

from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the fine-tuned model
model = Qwen3TTSModel.from_pretrained("fxstar1128/talentdev_01")

# Generate speech
audio, sample_rate = model.generate_voice_design(
    text="Hello, this is a demonstration of the fine-tuned voice model.",
    instruct="A natural speaking voice, clear and expressive.",
    language="english",
)

# Save output
sf.write("output.wav", audio[0], sample_rate)
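To confirm that a generated file matches the 24kHz mono WAV format listed in the overview, its header can be inspected with Python's standard `wave` module. This is a minimal sketch: `describe_wav` and `is_24khz_mono` are hypothetical helpers, not part of qwen-tts, and they assume the file holds integer PCM samples, which is what the `wave` module can read.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


def is_24khz_mono(path):
    """Check the header against the output format stated in this card."""
    channels, sample_rate, _ = describe_wav(path)
    return channels == 1 and sample_rate == 24000
```

Calling `is_24khz_mono("output.wav")` after the Basic Usage script gives a quick sanity check before the file is passed to downstream tooling.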

Training Details

This model was trained using the following configuration:

  • Optimizer: AdamW (lr=2.5e-9)
  • Batch size: 2
  • Gradient accumulation: 4 steps
  • Max gradient norm: 1.0
  • Trainable parameters: ~46.8M (2.37% of total)
  • Dataset: Custom voice dataset with Qwen audio codes
  • Speaker embedding: Custom projection layer (1024 → 2048)
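Two derived figures follow from the list above: the effective batch size per optimizer step and the total parameter count implied by the trainable fraction. The arithmetic below is a sketch that assumes a single training device, since the card does not state how many were used; `num_devices` is that assumption.

```python
batch_size = 2
grad_accum_steps = 4
num_devices = 1  # assumption: the device count is not stated in this card

# Samples contributing to each optimizer step
effective_batch = batch_size * grad_accum_steps * num_devices

# Total parameter count implied by "~46.8M (2.37% of total)"
trainable_params = 46.8e6
trainable_fraction = 0.0237
implied_total = trainable_params / trainable_fraction

print(effective_batch)
print(f"{implied_total / 1e9:.2f}B")
```

Under these assumptions each optimizer step sees 8 samples, and the implied total is roughly 1.97 billion parameters.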

The LoRA adapters were merged back into the base weights, so this model runs at full inference speed with no PEFT overhead.
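Merging folds the low-rank update into each adapted weight matrix once, so inference uses a single matrix per layer. The sketch below applies the standard LoRA merge rule, W' = W + (alpha / r) * B A, to tiny pure-Python matrices. The shapes and values are illustrative only, the helper names are hypothetical, and the card does not say which tool performed the actual merge. The toy example uses alpha = 2 and rank = 1 so that the scale alpha / r equals 2, the same ratio as this model's 64 / 32.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy layer: 2 output features, 3 input features, rank 1
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # rank x in_features
lora_b = [[0.5], [-0.5]]     # out_features x rank
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
```

After the merge, applying `merged` to an input gives the same result as applying the base weight and the scaled adapter path separately, which is why no adapter code is needed at inference time.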

Prompt Engineering

The model inherits the prompt-following capabilities of the base Qwen3-TTS model. Effective prompts typically include:

Voice characteristics:

  • Gender and age indicators ("a young woman", "an older man")
  • Speaking style ("conversational", "professional", "warm")
  • Emotional tone ("calm", "enthusiastic", "thoughtful")

Example prompts:

A clear, natural voice speaking conversationally.
A professional speaker with measured pacing.
A warm, friendly voice with subtle expressiveness.
A calm narrator with natural intonation.
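The three kinds of voice characteristics listed above can be combined into a single instruct string programmatically. The helper below is a hypothetical sketch; `build_instruct` is not part of qwen-tts, and the sentence template is an assumption for illustration, not a format the model requires.

```python
def build_instruct(speaker, style, tone):
    """Compose a one-sentence voice description from three attributes."""
    # Capitalize only the first character so the rest of the phrase is untouched
    subject = speaker[0].upper() + speaker[1:]
    return f"{subject}, {style} in style, {tone} in tone."


instruct = build_instruct("a young woman", "conversational", "calm")
print(instruct)
```

The resulting string can be passed as the `instruct` argument shown in Basic Usage.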

Model Architecture

  • Base: Qwen3-TTS Voice Design model
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Modules saved: talker.model.codec_embedding, speaker_embedding_projection
  • Attention: SDPA (Scaled Dot-Product Attention)
  • Mixed precision: bfloat16
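For reference, the adapter settings from this card can be collected in one place. The sketch below uses a plain dictionary whose keys mirror the argument names of PEFT's `LoraConfig` (`r`, `lora_alpha`, `target_modules`, `modules_to_save`). That mapping is an assumption: the card does not confirm which library or exact configuration was used for training.

```python
# Values are taken from this card; the key names follow PEFT's LoraConfig
# arguments as an assumption about how the adapter was configured.
lora_settings = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # feed-forward projections
    ],
    "modules_to_save": [
        "talker.model.codec_embedding",
        "speaker_embedding_projection",
    ],
}

# LoRA scales the low-rank update by alpha / r
scaling = lora_settings["lora_alpha"] / lora_settings["r"]
print(scaling)
```

With rank 32 and alpha 64, the low-rank update is scaled by a factor of 2 before it is added to the base weights.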

Performance Characteristics

Strengths:

  • Natural voice quality with fine-tuned characteristics
  • Fast inference (merged weights, no adapter overhead)
  • Consistent voice across different prompts
  • Maintains base model's expressiveness

Considerations:

  • Optimized for the specific voice characteristics learned during training
  • Best results with prompts similar in style to the training data
  • Focused on English

Files Included

model.safetensors            # Merged model weights (base + LoRA)
config.json                  # Model configuration
tokenizer.json               # Text tokenizer
speech_tokenizer/            # Audio codec components
vocence_config.yaml          # Runtime configuration
chute_config.yml             # Deployment configuration
miner.py                     # Vocence integration
demo.py                      # Example inference script
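After downloading the repository, a quick check can confirm that every entry in the listing above is present locally. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the local directory path is whatever location the files were downloaded to.

```python
from pathlib import Path

# Names copied from the file listing in this card
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "vocence_config.yaml",
    "chute_config.yml",
    "miner.py",
    "demo.py",
]
EXPECTED_DIRS = ["speech_tokenizer"]


def missing_entries(model_dir):
    """Return the expected files and directories that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list from `missing_entries` means the local copy contains everything named in the listing.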

Deployment

This model is compatible with:

  • Bittensor SN78 (Vocence) subnet miners
  • Chutes TEE deployment framework
  • Standard Hugging Face Transformers pipeline
  • Direct qwen-tts inference

License & Attribution

License: CC BY-NC-SA 4.0 (Non-Commercial)

Base model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
Fine-tuned by: fxstar1128
Framework: Qwen3-TTS by Alibaba

This model is intended for research and non-commercial applications only.

Citation

@misc{fxstar1128_talentdev01,
  author = {fxstar1128},
  title = {talentdev_01: LoRA Fine-tuned Qwen3-TTS Voice Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/fxstar1128/talentdev_01}},
}

Built with Qwen3-TTS • Trained with LoRA • Deployed on Vocence
