---
language:
- it
license: apache-2.0
tags:
- text-to-speech
- tts
- italian
- qwen
- audio
pipeline_tag: text-to-speech
---

# Italian TTS Model - Qwen3-TTS Fine-tuned

This is an Italian Text-to-Speech model fine-tuned from Qwen3-TTS-12Hz-1.7B-Base.

## Model Details

- **Base Model:** Qwen3-TTS-12Hz-1.7B-Base
- **Language:** Italian (Italiano)
- **Training Data:** 115,000 Italian audio samples
- **Fine-tuning Parameters:**
  - Batch size: 8
  - Learning rate: 1e-5
  - Epochs: 10
  - Gradient accumulation: 4
  - Mixed precision: bf16

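
To make the fine-tuning parameters above concrete, the sketch below derives the effective batch size and the approximate number of optimizer steps. It assumes a single GPU and that all 115,000 samples are seen in every epoch (the card does not state a train/validation split); the variable names are illustrative only.

```python
import math

# Values taken from the model card
num_samples = 115_000
batch_size = 8
grad_accum_steps = 4
epochs = 10

# Assumption: one device, so there is no data-parallel multiplier
num_devices = 1

# Samples contributing to each optimizer update
effective_batch_size = batch_size * grad_accum_steps * num_devices

# Optimizer updates per pass over the data, and over the whole run
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size)  # 32
print(steps_per_epoch)       # 3594
print(total_steps)           # 35940
```

Under these assumptions each optimizer update averages gradients over 32 samples, which is worth keeping in mind when comparing the learning rate of 1e-5 against other setups.
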
## Usage

```python
import torch
from qwen_tts.inference.qwen3_tts_model import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Aynursusuz/Qwen-TTS-Best-Model",
    torch_dtype=torch.bfloat16,
)

# Generate speech
text = "Buongiorno, come stai oggi?"
audio = model.inference(
    text=text,
    speaker="italian_voice"
)
```

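
The usage snippet stops at generation and does not show how to store the result. As a minimal sketch, the helper below writes a mono waveform to a 16-bit WAV file using only the Python standard library. It assumes the generated audio can be converted to a flat sequence of float samples in [-1.0, 1.0] with a known sample rate; neither the return format nor the sample rate is stated in this card, and `save_wav` is a hypothetical helper, not part of `qwen_tts`. A synthetic tone stands in for model output.

```python
import array
import math
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Stand-in for model output: one second of a 440 Hz tone.
# The sample rate is a placeholder; use the rate the model actually produces.
sample_rate = 24_000
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
save_wav(tone, sample_rate, "output.wav")
```

If the model returns a tensor, it would first need to be moved to the CPU and converted to a plain list or array of floats before being passed to a helper like this.
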
## Training Details

This model was trained on a high-quality Italian speech dataset with the following configuration:
- GPU: NVIDIA A100 80GB
- Training time: ~15-20 hours
- Optimizer: AdamW with weight decay 0.01
- Best checkpoint selected based on validation loss

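
Selecting the best checkpoint by validation loss can be sketched as follows. This is an illustration of the general idea, not the training script used for this model: the checkpoint names and loss values are hypothetical placeholders, not measurements from this training run.

```python
def select_best_checkpoint(val_losses):
    """Return the name of the checkpoint with the lowest validation loss."""
    if not val_losses:
        raise ValueError("no checkpoints to choose from")
    return min(val_losses, key=val_losses.get)


# Hypothetical placeholder values, NOT results from this model's training
val_losses = {
    "checkpoint-epoch-1": 0.92,
    "checkpoint-epoch-2": 0.81,
    "checkpoint-epoch-3": 0.84,
}

best = select_best_checkpoint(val_losses)
print(best)  # checkpoint-epoch-2
```

Choosing on validation loss rather than training loss guards against keeping a later checkpoint that has started to overfit.
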
## Citation

```bibtex
@misc{qwen-tts-italian-2026,
  author = {Aynur},
  title = {Italian TTS Model based on Qwen3-TTS},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Aynursusuz/Qwen-TTS-Best-Model}}
}
```

## License

This model inherits the Apache 2.0 license from the base Qwen3-TTS model.