---
license: apache-2.0
language:
- th
---
# My first TTS

[Fine-tuning Colab notebook](https://colab.research.google.com/drive/1FdCg-fjwiwrkAHXXqYGWq--Lmz_J10NI?usp=sharing)

## Example Code

```python
import torch
from transformers import VitsTokenizer, VitsModel, set_seed
import scipy.io.wavfile

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = VitsModel.from_pretrained("meguscx/VITS-TH-Model").to(device)
tokenizer = VitsTokenizer.from_pretrained("meguscx/VITS-TH-Model")

# Thai input: "Learning a new language helps broaden your worldview."
text = "การเรียนรู้ภาษาใหม่ช่วยเปิดโลกทัศน์ให้กว้างขึ้น"

inputs = tokenizer(text=text, return_tensors="pt").to(device)

# VITS samples randomly during synthesis; fix the seed for repeatable output.
set_seed(456)

with torch.no_grad():
    outputs = model(**inputs)

# Take the first (and only) item in the batch and move it to the CPU.
waveform = outputs.waveform[0].cpu().numpy()

scipy.io.wavfile.write(
    "test.wav",
    rate=model.config.sampling_rate,
    data=waveform,
)

print("Saved successfully.")
```
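
The example above saves the waveform as floating-point samples, which some audio players may not open. As an optional alternative, here is a minimal sketch that clips the samples to [-1, 1] and writes 16-bit PCM using only the Python standard library. The helper names `float_to_pcm16` and `write_pcm16_wav` are hypothetical and not part of this model or of `transformers`; the sketch also assumes the waveform is one-dimensional (mono) with values roughly in [-1, 1].

```python
import sys
import wave
from array import array


def float_to_pcm16(samples):
    """Clip float samples to [-1, 1] and scale them to signed 16-bit integers."""
    pcm = array("h")  # "h" = signed 16-bit integer
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples to `path` as a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # single speaker, mono audio (assumption)
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(pcm.tobytes())
```

With the variables from the example above, this could be called as `write_pcm16_wav("test.wav", waveform, model.config.sampling_rate)`. The per-sample Python loop is slow for long clips; a vectorized NumPy conversion would be faster if speed matters.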

This model was trained for 6 hours on only 1,094 voice samples from a single speaker, so the voice quality is limited and the output can sometimes sound odd or unnatural because the dataset is small ;-;

## Sample Audio
<audio controls>
  <source src="https://huggingface.co/meguscx/VITS-TH-Model/resolve/main/test.wav" type="audio/wav">
  Your browser does not support the audio element.
</audio>