medallo committed on
Commit 2675c18 · verified
1 Parent(s): ff5aea1

Upload 10 files

Files changed (10)
  1. Dockerfile +44 -0
  2. README.md +88 -14
  3. apple_silicon_requirements.txt +189 -0
  4. gitattributes +35 -0
  5. install.bat +10 -0
  6. install.sh +13 -0
  7. requirements.txt +7 -0
  8. start-container.sh +6 -0
  9. start.bat +5 -0
  10. start.sh +9 -0
Dockerfile ADDED
@@ -0,0 +1,44 @@
+ # syntax=docker/dockerfile:1
+ FROM python:3.11-slim-bookworm AS base
+
+ ARG APP_NAME=xtts-finetune-webui
+ ARG CUDA_VER=cu121
+ ARG GID=966
+ ARG UID=966
+ ARG WHISPER_MODEL="large-v3"
+
+ # Environment
+ ENV APP_NAME=$APP_NAME \
+     CUDA_VER=$CUDA_VER \
+     WHISPER_MODEL=$WHISPER_MODEL
+
+ # User configuration
+ ENV HOME=/app/$APP_NAME
+ RUN groupadd -r app -g $GID && \
+     useradd --no-log-init -m -r -g app app -u $UID
+
+ # Prepare the file system
+ RUN mkdir -p /app/server && chown -R $UID:$GID /app
+ COPY --chown=$UID:$GID *.py *.sh *.txt *.md /app/server/
+ ADD --chown=$UID:$GID utils /app/server/utils
+
+ # Enter the environment and install dependencies
+ WORKDIR /app/server
+
+ USER $UID:$GID
+
+ ENV NVIDIA_VISIBLE_DEVICES=all PATH=$PATH:$HOME/.local/bin
+ # Install nvidia-pyindex & nvidia-cudnn for libcudnn_ops_infer.so.8
+ # See: https://github.com/SYSTRAN/faster-whisper/issues/516
+ RUN pip3 install --user --no-cache-dir nvidia-pyindex && \
+     pip3 install --user --no-cache-dir nvidia-cudnn && \
+     pip3 install --user --no-cache-dir torch torchvision torchaudio \
+         --index-url https://download.pytorch.org/whl/$CUDA_VER && \
+     pip3 install --user --no-cache-dir -r requirements.txt && \
+     python3 -c "import os; from faster_whisper import WhisperModel; WhisperModel(os.environ['WHISPER_MODEL'], device='cpu', compute_type='int8')"
+
+ # Ports and server name
+ EXPOSE 5003
+ ENV GRADIO_ANALYTICS_ENABLED="False"
+
+ CMD ["bash", "start-container.sh"]
README.md CHANGED
@@ -1,14 +1,88 @@
- ---
- title: Xttsv2
- emoji: 🏃
- colorFrom: purple
- colorTo: gray
- sdk: gradio
- sdk_version: 5.27.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: v2
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # xtts-finetune-webui
+
+ This webui is a slightly modified copy of the [official webui](https://github.com/coqui-ai/TTS/pull/3296) for fine-tuning XTTS.
+
+ If you are looking for an option for regular XTTS use, look here: [https://github.com/daswer123/xtts-webui](https://github.com/daswer123/xtts-webui)
+
+ ## TODO
+ - [ ] Add the ability to use via console
+
+ ## Key features
+
+ ### Data processing
+
+ 1. Updated faster-whisper to 0.10.0, with the ability to select the large-v3 model.
+ 2. Changed the output folder to an `output` folder inside the main folder.
+ 3. If there is already a dataset in the output folder and you want to add new data, you can do so by simply adding new audio; what is already there will not be processed again, and the new data will be added automatically.
+ 4. Enabled the VAD filter.
+ 5. After the dataset is created, a file is written that records the language of the dataset. This file is read before training so that the language always matches; this is convenient when you restart the interface.
+
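Points 3 and 5 above can be sketched roughly as follows. This is a minimal illustration, not the webui's actual code; the file names `metadata.csv` and `lang.txt` and the `transcribe` callback are assumptions.

```python
import os

def process_new_audio(output_dir, audio_files, transcribe, language):
    """Sketch of incremental dataset creation: only audio not yet listed
    in metadata.csv is transcribed, and the dataset language is stored
    in lang.txt so it can be re-read after an interface restart."""
    meta_path = os.path.join(output_dir, "metadata.csv")
    done = set()
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            done = {line.split("|")[0] for line in f if line.strip()}
    with open(meta_path, "a") as f:
        for audio in audio_files:
            name = os.path.basename(audio)
            if name in done:
                continue  # already processed on a previous run
            f.write(f"{name}|{transcribe(audio)}\n")
    # Persist the dataset language for later runs
    with open(os.path.join(output_dir, "lang.txt"), "w") as f:
        f.write(language)
```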
+ ### Fine-tuning XTTS Encoder
+
+ 1. Added the ability to select the base model for XTTS; when you retrain, the model does not need to be downloaded again.
+ 2. Added the ability to select a custom model as the base model during training, which lets you fine-tune an already fine-tuned model.
+ 3. Added the possibility to get an optimized version of the model in one click (step 2.5 puts the optimized version in the output folder).
+ 4. You can choose whether to delete the training folders after you have optimized the model.
+ 5. When you optimize the model, the example reference audio is moved to the output folder.
+ 6. Added a check that the specified language matches the dataset language.
+
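The language check in point 6 might look roughly like this. This is a hedged sketch: the file name `lang.txt` and the mismatch handling are assumptions, not the webui's actual implementation.

```python
import os

def check_language(train_lang, dataset_dir):
    """Compare the language selected for training with the language
    recorded when the dataset was created; warn and prefer the
    recorded language on mismatch."""
    lang_file = os.path.join(dataset_dir, "lang.txt")
    if not os.path.exists(lang_file):
        return train_lang  # nothing recorded, trust the user's choice
    with open(lang_file) as f:
        dataset_lang = f.read().strip()
    if dataset_lang != train_lang:
        print(f"Warning: dataset language is '{dataset_lang}', "
              f"not '{train_lang}'; using '{dataset_lang}'.")
    return dataset_lang
```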
+ ### Inference
+
+ 1. Added the possibility to customize inference settings while checking the model.
+
+ ### Other
+
+ 1. If you accidentally restart the interface during one of the steps, you can reload the data with the additional buttons.
+ 2. Removed the log display, as it caused problems on restart.
+ 3. The finished result is copied to the `ready` folder; these are fully finished files, so you can move them anywhere and use them as a standard model.
+ 4. Added support for fine-tuning Japanese.
+
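Copying the finished model into a self-contained `ready` folder (point 3 above) amounts to something like the following. An illustrative sketch with assumed file names, not the actual implementation.

```python
import os
import shutil

def copy_to_ready(run_dir, ready_dir,
                  files=("model.pth", "config.json", "vocab.json")):
    """Collect the files a standard XTTS model needs into a `ready`
    folder that can be moved anywhere and used directly."""
    os.makedirs(ready_dir, exist_ok=True)
    for name in files:
        src = os.path.join(run_dir, name)
        if os.path.exists(src):  # skip files the run did not produce
            shutil.copy2(src, os.path.join(ready_dir, name))
    return sorted(os.listdir(ready_dir))
```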
+ ## Changes in the webui
+
+ ### 1 - Data processing
+
+ ![image](https://github.com/daswer123/xtts-finetune-webui/assets/22278673/8f09b829-098b-48f5-9668-832e7319403b)
+
+ ### 2 - Fine-tuning XTTS Encoder
+
+ ![image](https://github.com/daswer123/xtts-finetune-webui/assets/22278673/897540d9-3a6b-463c-abb8-261c289cc929)
+
+ ### 3 - Inference
+
+ ![image](https://github.com/daswer123/xtts-finetune-webui/assets/22278673/aa05bcd4-8642-4de4-8f2f-bc0f5571af63)
+
+ ## Google Colab
+
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/xtts-finetune-webui/blob/main/notebook/xtts_finetune_webui.ipynb)
+
+ ## 🐳 Run in Docker
+
+ ```bash
+ docker run -it --gpus all --pull always -p 7860:7860 --platform=linux/amd64 athomasson2/fine_tune_xtts:huggingface python app.py
+ ```
+
+ ## Install
+
+ 1. Make sure you have CUDA installed.
+ 2. `git clone https://github.com/daswer123/xtts-finetune-webui`
+ 3. `cd xtts-finetune-webui`
+ 4. `pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118`
+ 5. `pip install -r requirements.txt`
+
+ ### If you're using Windows
+
+ 1. First run `install.bat`.
+ 2. To start the server, run `start.bat`.
+ 3. Go to the local address `127.0.0.1:5003`.
+
+ ### On Linux
+
+ 1. Run `bash install.sh`.
+ 2. To start the server, run `bash start.sh`.
+ 3. Go to the local address `127.0.0.1:5003`.
+
+ ### On an Apple Silicon Mac (Python 3.10 env)
+
+ 1. Run `pip install --no-deps -r apple_silicon_requirements.txt`.
+ 2. To start the server, run `python xtts_demo.py`.
+ 3. Go to the local address `127.0.0.1:5003`.
apple_silicon_requirements.txt ADDED
@@ -0,0 +1,189 @@
+ absl-py==2.1.0
+ aiofiles==23.2.1
+ aiohttp==3.9.5
+ aiosignal==1.3.1
+ altair==5.3.0
+ annotated-types==0.7.0
+ anyascii==0.3.2
+ anyio==3.7.1
+ async-timeout==4.0.3
+ attrs==23.2.0
+ audioread==3.0.1
+ av==12.2.0
+ Babel==2.15.0
+ bangla==0.0.2
+ blinker==1.8.2
+ blis==0.7.11
+ bnnumerizer==0.0.2
+ bnunicodenormalizer==0.1.7
+ catalogue==2.0.10
+ certifi==2024.7.4
+ cffi==1.16.0
+ charset-normalizer==3.3.2
+ click==8.1.7
+ cloudpathlib==0.16.0
+ colorama==0.4.6
+ coloredlogs==15.0.1
+ confection==0.1.5
+ contourpy==1.2.1
+ coqpit==0.0.17
+ coqui-tts==0.24.2
+ coqui-tts-trainer==0.1.4
+ ctranslate2==4.3.1
+ cutlet==0.4.0
+ cycler==0.12.1
+ cymem==2.0.8
+ Cython==3.0.10
+ dateparser==1.1.8
+ decorator==5.1.1
+ dnspython==2.6.1
+ docopt==0.6.2
+ einops==0.8.0
+ email_validator==2.2.0
+ encodec==0.1.1
+ exceptiongroup==1.2.2
+ fastapi==0.103.1
+ fastapi-cli==0.0.4
+ faster-whisper==1.0.2
+ ffmpy==0.3.2
+ filelock==3.15.4
+ Flask==3.0.3
+ flatbuffers==24.3.25
+ fonttools==4.53.1
+ frozenlist==1.4.1
+ fsspec==2024.6.1
+ fugashi==1.3.2
+ g2pkk==0.1.2
+ gradio==4.44.1
+ gradio_client==1.3.0
+ grpcio==1.64.1
+ gruut==2.4.0
+ gruut-ipa==0.13.0
+ gruut_lang_de==2.0.1
+ gruut_lang_en==2.0.1
+ gruut_lang_es==2.0.1
+ gruut_lang_fr==2.0.2
+ h11==0.14.0
+ hangul-romanize==0.1.0
+ httpcore==1.0.5
+ httptools==0.6.1
+ httpx==0.27.0
+ huggingface-hub==0.23.5
+ humanfriendly==10.0
+ idna==3.7
+ importlib_resources==6.4.0
+ inflect==7.3.1
+ itsdangerous==2.2.0
+ jaconv==0.4.0
+ jamo==0.4.1
+ jieba==0.42.1
+ Jinja2==3.1.4
+ joblib==1.4.2
+ jsonlines==1.2.0
+ jsonschema==4.23.0
+ jsonschema-specifications==2023.12.1
+ kiwisolver==1.4.5
+ langcodes==3.4.0
+ language_data==1.2.0
+ lazy_loader==0.4
+ librosa==0.10.2.post1
+ llvmlite==0.43.0
+ marisa-trie==1.2.0
+ Markdown==3.6
+ markdown-it-py==3.0.0
+ MarkupSafe==2.1.5
+ matplotlib==3.8.4
+ mdurl==0.1.2
+ mecab-python3==1.0.9
+ mojimoji==0.0.13
+ more-itertools==10.3.0
+ mpmath==1.3.0
+ msgpack==1.0.8
+ multidict==6.0.5
+ murmurhash==1.0.10
+ networkx==2.8.8
+ nltk==3.8.1
+ num2words==0.5.13
+ numba==0.60.0
+ numpy==1.26.4
+ onnxruntime==1.18.1
+ orjson==3.10.6
+ packaging==24.1
+ pandas==1.5.3
+ pillow==10.4.0
+ platformdirs==4.2.2
+ pooch==1.8.2
+ preshed==3.0.9
+ protobuf==4.25.3
+ psutil==6.0.0
+ pycparser==2.22
+ pydantic==2.3.0
+ pydantic_core==2.6.3
+ pydub==0.25.1
+ pygame==2.6.0
+ Pygments==2.18.0
+ pynndescent==0.5.13
+ pyparsing==3.1.2
+ pypinyin==0.51.0
+ pysbd==0.3.4
+ python-crfsuite==0.9.10
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.0.1
+ python-multipart==0.0.9
+ pytz==2024.1
+ PyYAML==6.0.1
+ referencing==0.35.1
+ regex==2024.5.15
+ requests==2.32.3
+ rich==13.7.1
+ rpds-py==0.19.0
+ ruff==0.5.2
+ safetensors==0.4.3
+ scikit-learn==1.5.1
+ scipy==1.11.4
+ semantic-version==2.10.0
+ shellingham==1.5.4
+ six==1.16.0
+ smart-open==6.4.0
+ sniffio==1.3.1
+ soundfile==0.12.1
+ soxr==0.3.7
+ spacy==3.7.4
+ spacy-legacy==3.0.12
+ spacy-loggers==1.0.5
+ srsly==2.4.8
+ starlette==0.27.0
+ SudachiDict-core==20240409
+ SudachiPy==0.6.8
+ sympy==1.13.0
+ tensorboard==2.17.0
+ tensorboard-data-server==0.7.2
+ thinc==8.2.5
+ threadpoolctl==3.5.0
+ tokenizers==0.19.1
+ tomlkit==0.12.0
+ toolz==0.12.1
+ torch==2.3.1
+ torchaudio==2.3.1
+ tqdm==4.66.4
+ trainer==0.0.36
+ transformers==4.42.4
+ TTS==0.21.3
+ typeguard==4.3.0
+ typer==0.12.5
+ typing_extensions==4.12.2
+ tzdata==2024.1
+ tzlocal==5.2
+ umap-learn==0.5.6
+ Unidecode==1.3.8
+ unidic-lite==1.0.8
+ urllib3==2.2.2
+ uvicorn==0.30.1
+ uvloop==0.19.0
+ wasabi==1.1.3
+ watchfiles==0.22.0
+ weasel==0.3.4
+ websockets==11.0.3
+ Werkzeug==3.0.3
+ wrapt==1.16.0
+ yarl==1.9.4
gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
install.bat ADDED
@@ -0,0 +1,10 @@
+ @echo off
+
+ python -m venv venv
+ call venv\Scripts\activate
+
+
+ pip install -r .\requirements.txt
+ pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
+
+ python xtts_demo.py
install.sh ADDED
@@ -0,0 +1,13 @@
+ #!/bin/bash
+
+ # Create a Python virtual environment
+ python -m venv venv
+ # Activate the virtual environment
+ source venv/bin/activate
+
+ # Install other dependencies from requirements.txt
+ pip install -r requirements.txt
+ pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
+
+ python xtts_demo.py
+
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ faster_whisper==1.0.3
+ gradio==5.1.0
+ spacy==3.7.5
+ coqui-tts[languages]==0.24.2
+
+ cutlet
+ fugashi[unidic-lite]
start-container.sh ADDED
@@ -0,0 +1,6 @@
+ #!/bin/bash
+
+ # Enable resolution of libcudnn_ops_infer.so.8
+ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/app/xtts-finetune-webui/.local/lib/python3.11/site-packages/torch/lib:/app/xtts-finetune-webui/.local/lib/python3.11/site-packages/nvidia/cudnn/lib"
+
+ python3 xtts_demo.py
start.bat ADDED
@@ -0,0 +1,5 @@
+ @echo off
+
+ call venv\Scripts\activate
+
+ python xtts_demo.py
start.sh ADDED
@@ -0,0 +1,9 @@
+ #!/bin/bash
+
+ # Create a Python virtual environment
+ python -m venv venv
+ # Activate the virtual environment
+ source venv/bin/activate
+
+ python xtts_demo.py
+