Duplicated from Aynursusuz/Qwen-TTS-Best-Model


---
language:
- it
license: apache-2.0
tags:
- text-to-speech
- tts
- italian
- qwen
- audio
pipeline_tag: text-to-speech
---

# Italian TTS Model - Qwen3-TTS Fine-tuned

This is an Italian text-to-speech (TTS) model fine-tuned from Qwen3-TTS-12Hz-1.7B-Base.

## Model Details

- Base Model: Qwen3-TTS-12Hz-1.7B-Base
- Language: Italian (Italiano)
- Training Data: 115,000 Italian audio samples
- Fine-tuning Parameters:
  - Batch size: 8
  - Learning rate: 1e-5
  - Epochs: 10
  - Gradient accumulation: 4
  - Mixed precision: bf16
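
The batch size and gradient accumulation values combine into an effective batch size, which in turn fixes the number of optimizer steps. The arithmetic is sketched below as a rough estimate. It assumes a single GPU and that all 115,000 samples are used for training; the card does not say whether a validation split was held out.

```python
import math

# Values taken from the Model Details list.
num_samples = 115_000
batch_size = 8
grad_accum_steps = 4
epochs = 10

# Assumption: one GPU, so there is no extra data-parallel multiplier.
num_gpus = 1

# Samples consumed per optimizer update.
effective_batch = batch_size * grad_accum_steps * num_gpus

# Optimizer updates per epoch (the last partial batch still counts as a step).
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch)   # 32
print(steps_per_epoch)   # 3594
print(total_steps)       # 35940
```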

## Usage

```python
import torch
from qwen_tts.inference.qwen3_tts_model import Qwen3TTSModel

# Load the model in bfloat16
model = Qwen3TTSModel.from_pretrained(
    "Aynursusuz/Qwen-TTS-Best-Model",
    torch_dtype=torch.bfloat16,
)

# Generate speech
text = "Buongiorno, come stai oggi?"
audio = model.inference(
    text=text,
    speaker="italian_voice",
)
```
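
The snippet stops at the `audio` object and does not show how to write it to disk. The helper below is a minimal sketch of one way to do that with the Python standard library. It assumes `audio` can be iterated as mono float samples in the range [-1.0, 1.0]. The card documents neither the return type of `inference` nor the output sample rate, so `save_wav` and the `24000` in the usage comment are hypothetical placeholders, not part of the model's API.

```python
import array
import sys
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1.0, 1.0] and scale to the signed 16-bit range.
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Hypothetical usage; the sample rate is an assumption, check the model output:
# save_wav(audio, 24000, "output.wav")
```

If `inference` turns out to return a tensor or a `(waveform, sample_rate)` pair, convert or unpack it before calling the helper.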

## Training Details

This model was trained on a high-quality Italian speech dataset with the following configuration:
- GPU: NVIDIA A100 80GB
- Training time: ~15-20 hours
- Optimizer: AdamW with weight decay 0.01
- Best checkpoint selected based on validation loss
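
Selecting the best checkpoint by validation loss amounts to taking the minimum over the evaluations recorded during training. The sketch below illustrates the idea only. The loss values are made-up placeholders, not results from this training run, and `best_checkpoint` is a hypothetical helper name.

```python
# Hypothetical validation losses keyed by epoch. These numbers are
# placeholders for illustration, not measurements from the actual run.
val_losses = {1: 2.10, 2: 1.85, 3: 1.72, 4: 1.75, 5: 1.80}


def best_checkpoint(losses):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(losses.items(), key=lambda item: item[1])


best_epoch, best_loss = best_checkpoint(val_losses)
print(best_epoch, best_loss)  # 3 1.72
```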

## Citation

```bibtex
@misc{qwen-tts-italian-2026,
  author = {Aynur},
  title = {Italian TTS Model based on Qwen3-TTS},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Aynursusuz/Qwen-TTS-Best-Model}}
}
```

## License

This model inherits the Apache 2.0 license from the base Qwen3-TTS model.