nmcuong committed on
Commit
6f84009
·
verified ·
1 Parent(s): 2d7bca1

Update README.md

  1. README.md +98 -0
README.md CHANGED
@@ -1,3 +1,101 @@
  ---
  license: mit
+ datasets:
+ - doof-ferb/infore1_25hours
  ---
+ <div align="center">
+ <div>&nbsp;</div>
+ <img src="logo.png" width="300"/> <br>
+ <a href="https://trendshift.io/repositories/8133" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8133" alt="myshell-ai%2FMeloTTS | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
+ </div>
+
+ ## Introduction
+ MeloTTS Vietnamese is a version of MeloTTS adapted for Vietnamese. It inherits the quality of the original model and has been adjusted specifically to work well with Vietnamese text.
+
+ ## Technical Features
+ - Uses [underthesea](https://github.com/undertheseanlp/underthesea) for Vietnamese text segmentation
+ - Integrates [PhoBERT](https://github.com/VinAIResearch/PhoBERT) (`vinai/phobert-base-v2`) to extract Vietnamese language features
+ - Fully supports Vietnamese language characteristics:
+   - 45 symbols (phonemes)
+   - 8 tones (7 tonal marks and 1 unmarked tone)
+   - All defined in `melo/text/symbols.py`
+ - Text-to-phoneme conversion source:
+   - Based on the [Text2PhonemeSequence](https://github.com/thelinhbkhn2014/Text2PhonemeSequence) library
+   - A faster, improved version is available as [Text2PhonemeFast](https://github.com/manhcuong02/Text2PhonemeFast)
+
+ ## Fine-tuning from Base Model
+ This model was fine-tuned from the base MeloTTS model by:
+ - Replacing phonemes that are used by neither English nor Vietnamese with Vietnamese phonemes
+   - Specifically, replacing Korean phonemes with the corresponding Vietnamese phonemes
+ - Adjusting parameters to match Vietnamese phonetic characteristics
+
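One way to picture the phoneme replacement described above: if each new symbol overwrites an unused entry in place, the symbol table keeps its length and every other symbol keeps its index. The sketch below is an illustration only. The function name `replace_unused_symbols` and the toy symbol lists are hypothetical and are not taken from `melo/text/symbols.py`, and this is not necessarily how the author performed the replacement.

```python
from typing import Iterable, List


def replace_unused_symbols(
    symbols: List[str], unused: Iterable[str], new_symbols: Iterable[str]
) -> List[str]:
    """Return a copy of `symbols` in which entries listed in `unused` are
    overwritten, in order, by `new_symbols`.

    The table length and the index of every untouched symbol are preserved.
    """
    unused_set = set(unused)
    # Symbols that are already in the table do not need a new slot.
    pending = [s for s in new_symbols if s not in symbols]
    result = list(symbols)
    for i, sym in enumerate(result):
        if not pending:
            break
        if sym in unused_set:
            result[i] = pending.pop(0)
    if pending:
        raise ValueError(f"not enough free slots for: {pending}")
    return result


# Hypothetical toy table: two placeholder entries ("ko_1", "ko_2") are reused.
base = ["_", "a", "b", "ko_1", "ko_2", "t"]
updated = replace_unused_symbols(base, unused=["ko_1", "ko_2"], new_symbols=["ɗ", "ŋ"])
print(updated)
```

Keeping indices stable in this way means that symbols shared with the base model still map to the same positions after the swap.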
+ ## Training Data
+ - The model was trained on the Infore dataset, which contains approximately 25 hours of speech
+ - Note on data quality: this dataset has several limitations, including poor voice quality, missing punctuation, and inaccurate phonetic transcriptions. Results were much better when the model was trained on internal data.
+
+ ## Downloading the Model
+ The pre-trained model can be downloaded from Hugging Face:
+ - [MeloTTS Vietnamese on Hugging Face](https://huggingface.co/nmcuong/MeloTTS_Vietnamese)
+
+ ## Usage Guide
+
+ ### Data Preparation
+ The data preparation process is detailed in `docs/training.md`. In short, you need:
+ - Audio files (a 44100 Hz sample rate is recommended)
+ - A metadata file in the following format:
+ ```
+ path/to/audio_001.wav |<speaker_name>|<language_code>|<text_001>
+ path/to/audio_002.wav |<speaker_name>|<language_code>|<text_002>
+ ```
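Before running preprocessing, it can help to check that every metadata line has the four pipe-separated fields shown above. The following is a minimal sketch, not part of the repository: `Utterance` and `parse_metadata` are hypothetical names, the sample speaker name is made up, and the `VI` language code is assumed from the inference example. The sketch strips whitespace around each field, which is also an assumption, and it does not handle text that itself contains `|`.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    audio_path: str
    speaker: str
    language: str
    text: str


def parse_metadata(lines: List[str]) -> List[Utterance]:
    """Parse `path|speaker|language|text` lines, failing loudly on bad rows."""
    utterances = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        if not all(fields):
            raise ValueError(f"line {lineno}: empty field")
        utterances.append(Utterance(*fields))
    return utterances


# Hypothetical sample rows in the format shown above.
sample = [
    "path/to/audio_001.wav |speaker_a|VI|xin chào",
    "path/to/audio_002.wav |speaker_a|VI|cảm ơn",
]
for utt in parse_metadata(sample):
    print(utt.audio_path, utt.speaker, utt.language, utt.text)
```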
+
+ ### Data Preprocessing
+ To preprocess the data, run:
+ ```bash
+ python melo/preprocess_text.py --metadata /path/to/text_training.list --config_path /path/to/config.json --device cuda:0 --val-per-spk 10 --max-val-total 500
+ ```
+ Alternatively, use the script `melo/preprocess_text.sh` with the appropriate parameters.
+
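The flag names `--val-per-spk` and `--max-val-total` suggest a per-speaker validation hold-out with a global cap. The sketch below shows one plausible reading of that idea and is not confirmed against `melo/preprocess_text.py`: the function `split_train_val`, its `seed` parameter, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_train_val(
    rows: List[Tuple[str, str]],  # (speaker, metadata_line) pairs
    val_per_spk: int,
    max_val_total: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Hold out up to `val_per_spk` lines per speaker, capped at `max_val_total`."""
    by_speaker: Dict[str, List[str]] = defaultdict(list)
    for speaker, line in rows:
        by_speaker[speaker].append(line)

    rng = random.Random(seed)
    train: List[str] = []
    val: List[str] = []
    for speaker in sorted(by_speaker):
        lines = by_speaker[speaker]
        rng.shuffle(lines)
        val.extend(lines[:val_per_spk])
        train.extend(lines[val_per_spk:])

    # Enforce the global cap by moving the overflow back into training.
    if len(val) > max_val_total:
        train.extend(val[max_val_total:])
        val = val[:max_val_total]
    return train, val


# Toy data: 3 hypothetical speakers with 10 utterances each.
rows = [(f"spk{i % 3}", f"utt_{i}") for i in range(30)]
train, val = split_train_val(rows, val_per_spk=4, max_val_total=10)
print(len(train), len(val))
```

With the values from the command above (10 per speaker, 500 in total), the cap would only take effect for datasets with more than 50 speakers under this reading.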
+ ### Using the Model
+ Refer to the notebook `test_infer.ipynb` to learn how to use the model:
+ ```python
+ # colab_infer.py
+ from melo.api import TTS
+
+ # Speed is adjustable
+ speed = 1.0
+
+ # CPU is sufficient for real-time inference.
+ # You can set the device manually to 'cpu', 'cuda', 'cuda:0', or 'mps'.
+ device = "cuda:0"  # Uses the first GPU; change to "cpu" if no GPU is available
+
+ # Vietnamese
+ model = TTS(
+     language="VI",
+     device=device,
+     config_path="/path/to/config.json",
+     ckpt_path="/path/to/G_model.pth",
+ )
+
+ # Mapping from speaker name to speaker ID, read from the model config
+ speaker_ids = model.hps.data.spk2id
+
+ # Convert text to speech
+ text = "Nhập văn bản tại đây"  # "Enter text here"
+ output_path = "output.wav"
+ # Replace "speaker_name" with a speaker defined in your config
+ model.tts_to_file(text, speaker_ids["speaker_name"], output_path, speed=speed, quiet=True)
+ ```
+
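The `"speaker_name"` key in the snippet above is a placeholder, so a wrong name would surface only as a bare lookup error. A small helper can list the valid names instead. This is a sketch under assumptions: `resolve_speaker` is a hypothetical helper, and it assumes that `model.hps.data.spk2id` supports `in`, indexing, and `.keys()` like a dictionary. A plain dict with made-up speakers stands in for it here.

```python
from typing import Mapping


def resolve_speaker(spk2id: Mapping[str, int], name: str) -> int:
    """Return the ID for `name`, or raise an error listing the valid names."""
    if name in spk2id:
        return spk2id[name]
    available = ", ".join(sorted(spk2id.keys()))
    raise KeyError(f"unknown speaker {name!r}; available: {available}")


# Stand-in for model.hps.data.spk2id (hypothetical speakers).
spk2id = {"speaker_a": 0, "speaker_b": 1}
print(resolve_speaker(spk2id, "speaker_b"))
```

Under the same assumption, the result could be passed as the second argument of `model.tts_to_file` in place of `speaker_ids["speaker_name"]`.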
+ ## Audio Examples
+ Listen to sample outputs from the model:
+
+ ### Sample Audio
+ <audio controls>
+ <source src="samples/sample.wav" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+
+ ## License
+ This project is released under the MIT License, like the original MeloTTS project, which permits both commercial and non-commercial use.
+
+ ## Acknowledgements
+
+ This implementation is based on [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), [VITS2](https://github.com/daniilrobnikov/vits2) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.