metadata
license: bsd-3-clause
datasets:
  - thivux/phoaudiobook
language:
  - vi
base_model:
  - coqui/XTTS-v2
pipeline_tag: text-to-speech

XTTS-v2

This repository contains the XTTS-v2 model checkpoint fine-tuned on the PhoAudiobook dataset for Vietnamese text-to-speech. The fine-tuning process and experimental results are described in our ACL 2025 paper, "Zero-Shot Text-to-Speech for Vietnamese". If you use this model in your work, please cite the paper:

@inproceedings{vu2025zeroshottexttospeechvietnamese,
      title={Zero-Shot Text-to-Speech for Vietnamese}, 
      author={Thi Vu and Linh The Nguyen and Dat Quoc Nguyen},
      year={2025},
      booktitle={Proceedings of ACL},
}

How to run

# install Coqui TTS
pip install TTS

# run inference
python infer.py \
--xtts_checkpoint best_model.pth \
--xtts_config config.json \
--xtts_vocab vocab.json \
--speaker_audio /path/to/your_ref.wav \
--lang vi \
--text "Nếu chỉ còn một ngày để sống tôi xin làm một bông hoa đẹp." \
--output output.wav
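
To synthesize several sentences with the same reference voice, the command above can be wrapped in a small script. The following is a minimal sketch, not part of this repository: the helper names `build_infer_command` and `synthesize_all`, the `outputs` directory, and the `utt_001.wav` naming scheme are all hypothetical. It assumes `infer.py` sits in the current working directory and accepts exactly the flags shown above.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_infer_command(text, output, speaker_audio,
                        checkpoint="best_model.pth",
                        config="config.json",
                        vocab="vocab.json",
                        lang="vi"):
    """Build the argument list for one infer.py call.

    Only the flags shown in the command above are used; the default
    file names are the ones from that command.
    """
    return [
        sys.executable, "infer.py",
        "--xtts_checkpoint", str(checkpoint),
        "--xtts_config", str(config),
        "--xtts_vocab", str(vocab),
        "--speaker_audio", str(speaker_audio),
        "--lang", lang,
        "--text", text,
        "--output", str(output),
    ]


def synthesize_all(sentences, speaker_audio, out_dir="outputs", dry_run=False):
    """Run infer.py once per sentence (hypothetical batch helper).

    With dry_run=True nothing is executed; the commands are only returned.
    """
    out_dir = Path(out_dir)
    commands = []
    for i, text in enumerate(sentences, start=1):
        output = out_dir / f"utt_{i:03d}.wav"
        cmd = build_infer_command(text, output, speaker_audio)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if infer.py fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    sentences = [
        "Nếu chỉ còn một ngày để sống tôi xin làm một bông hoa đẹp.",
    ]
    # Dry run: print the commands instead of executing them.
    for cmd in synthesize_all(sentences, "/path/to/your_ref.wav", dry_run=True):
        print(shlex.join(cmd))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with Vietnamese text that contains punctuation or quotation marks. Set `dry_run=False` to actually run inference; each call reloads the checkpoint, so this is convenient rather than fast.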