utkarsh2299 committed · Commit 5f70fe0 · verified · 1 Parent(s): f18b351

Upload 12 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ license.pdf filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,395 @@
+ Attribution 4.0 International
+
+ =======================================================================
+
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
+ does not provide legal services or legal advice. Distribution of
+ Creative Commons public licenses does not create a lawyer-client or
+ other relationship. Creative Commons makes its licenses and related
+ information available on an "as-is" basis. Creative Commons gives no
+ warranties regarding its licenses, any material licensed under their
+ terms and conditions, or any related information. Creative Commons
+ disclaims all liability for damages resulting from their use to the
+ fullest extent possible.
+
+ Using Creative Commons Public Licenses
+
+ Creative Commons public licenses provide a standard set of terms and
+ conditions that creators and other rights holders may use to share
+ original works of authorship and other material subject to copyright
+ and certain other rights specified in the public license below. The
+ following considerations are for informational purposes only, are not
+ exhaustive, and do not form part of our licenses.
+
+ Considerations for licensors: Our public licenses are
+ intended for use by those authorized to give the public
+ permission to use material in ways otherwise restricted by
+ copyright and certain other rights. Our licenses are
+ irrevocable. Licensors should read and understand the terms
+ and conditions of the license they choose before applying it.
+ Licensors should also secure all rights necessary before
+ applying our licenses so that the public can reuse the
+ material as expected. Licensors should clearly mark any
+ material not subject to the license. This includes other CC-
+ licensed material, or material used under an exception or
+ limitation to copyright. More considerations for licensors:
+ wiki.creativecommons.org/Considerations_for_licensors
+
+ Considerations for the public: By using one of our public
+ licenses, a licensor grants the public permission to use the
+ licensed material under specified terms and conditions. If
+ the licensor's permission is not necessary for any reason--for
+ example, because of any applicable exception or limitation to
+ copyright--then that use is not regulated by the license. Our
+ licenses grant only permissions under copyright and certain
+ other rights that a licensor has authority to grant. Use of
+ the licensed material may still be restricted for other
+ reasons, including because others have copyright or other
+ rights in the material. A licensor may make special requests,
+ such as asking that all changes be marked or described.
+ Although not required by our licenses, you are encouraged to
+ respect those requests where reasonable. More_considerations
+ for the public:
+ wiki.creativecommons.org/Considerations_for_licensees
+
+ =======================================================================
+
+ Creative Commons Attribution 4.0 International Public License
+
+ By exercising the Licensed Rights (defined below), You accept and agree
+ to be bound by the terms and conditions of this Creative Commons
+ Attribution 4.0 International Public License ("Public License"). To the
+ extent this Public License may be interpreted as a contract, You are
+ granted the Licensed Rights in consideration of Your acceptance of
+ these terms and conditions, and the Licensor grants You such rights in
+ consideration of benefits the Licensor receives from making the
+ Licensed Material available under these terms and conditions.
+
+
+ Section 1 -- Definitions.
+
+ a. Adapted Material means material subject to Copyright and Similar
+ Rights that is derived from or based upon the Licensed Material
+ and in which the Licensed Material is translated, altered,
+ arranged, transformed, or otherwise modified in a manner requiring
+ permission under the Copyright and Similar Rights held by the
+ Licensor. For purposes of this Public License, where the Licensed
+ Material is a musical work, performance, or sound recording,
+ Adapted Material is always produced where the Licensed Material is
+ synched in timed relation with a moving image.
+
+ b. Adapter's License means the license You apply to Your Copyright
+ and Similar Rights in Your contributions to Adapted Material in
+ accordance with the terms and conditions of this Public License.
+
+ c. Copyright and Similar Rights means copyright and/or similar rights
+ closely related to copyright including, without limitation,
+ performance, broadcast, sound recording, and Sui Generis Database
+ Rights, without regard to how the rights are labeled or
+ categorized. For purposes of this Public License, the rights
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
+ Rights.
+
+ d. Effective Technological Measures means those measures that, in the
+ absence of proper authority, may not be circumvented under laws
+ fulfilling obligations under Article 11 of the WIPO Copyright
+ Treaty adopted on December 20, 1996, and/or similar international
+ agreements.
+
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
+ any other exception or limitation to Copyright and Similar Rights
+ that applies to Your use of the Licensed Material.
+
+ f. Licensed Material means the artistic or literary work, database,
+ or other material to which the Licensor applied this Public
+ License.
+
+ g. Licensed Rights means the rights granted to You subject to the
+ terms and conditions of this Public License, which are limited to
+ all Copyright and Similar Rights that apply to Your use of the
+ Licensed Material and that the Licensor has authority to license.
+
+ h. Licensor means the individual(s) or entity(ies) granting rights
+ under this Public License.
+
+ i. Share means to provide material to the public by any means or
+ process that requires permission under the Licensed Rights, such
+ as reproduction, public display, public performance, distribution,
+ dissemination, communication, or importation, and to make material
+ available to the public including in ways that members of the
+ public may access the material from a place and at a time
+ individually chosen by them.
+
+ j. Sui Generis Database Rights means rights other than copyright
+ resulting from Directive 96/9/EC of the European Parliament and of
+ the Council of 11 March 1996 on the legal protection of databases,
+ as amended and/or succeeded, as well as other essentially
+ equivalent rights anywhere in the world.
+
+ k. You means the individual or entity exercising the Licensed Rights
+ under this Public License. Your has a corresponding meaning.
+
+
+ Section 2 -- Scope.
+
+ a. License grant.
+
+ 1. Subject to the terms and conditions of this Public License,
+ the Licensor hereby grants You a worldwide, royalty-free,
+ non-sublicensable, non-exclusive, irrevocable license to
+ exercise the Licensed Rights in the Licensed Material to:
+
+ a. reproduce and Share the Licensed Material, in whole or
+ in part; and
+
+ b. produce, reproduce, and Share Adapted Material.
+
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
+ Exceptions and Limitations apply to Your use, this Public
+ License does not apply, and You do not need to comply with
+ its terms and conditions.
+
+ 3. Term. The term of this Public License is specified in Section
+ 6(a).
+
+ 4. Media and formats; technical modifications allowed. The
+ Licensor authorizes You to exercise the Licensed Rights in
+ all media and formats whether now known or hereafter created,
+ and to make technical modifications necessary to do so. The
+ Licensor waives and/or agrees not to assert any right or
+ authority to forbid You from making technical modifications
+ necessary to exercise the Licensed Rights, including
+ technical modifications necessary to circumvent Effective
+ Technological Measures. For purposes of this Public License,
+ simply making modifications authorized by this Section 2(a)
+ (4) never produces Adapted Material.
+
+ 5. Downstream recipients.
+
+ a. Offer from the Licensor -- Licensed Material. Every
+ recipient of the Licensed Material automatically
+ receives an offer from the Licensor to exercise the
+ Licensed Rights under the terms and conditions of this
+ Public License.
+
+ b. No downstream restrictions. You may not offer or impose
+ any additional or different terms or conditions on, or
+ apply any Effective Technological Measures to, the
+ Licensed Material if doing so restricts exercise of the
+ Licensed Rights by any recipient of the Licensed
+ Material.
+
+ 6. No endorsement. Nothing in this Public License constitutes or
+ may be construed as permission to assert or imply that You
+ are, or that Your use of the Licensed Material is, connected
+ with, or sponsored, endorsed, or granted official status by,
+ the Licensor or others designated to receive attribution as
+ provided in Section 3(a)(1)(A)(i).
+
+ b. Other rights.
+
+ 1. Moral rights, such as the right of integrity, are not
+ licensed under this Public License, nor are publicity,
+ privacy, and/or other similar personality rights; however, to
+ the extent possible, the Licensor waives and/or agrees not to
+ assert any such rights held by the Licensor to the limited
+ extent necessary to allow You to exercise the Licensed
+ Rights, but not otherwise.
+
+ 2. Patent and trademark rights are not licensed under this
+ Public License.
+
+ 3. To the extent possible, the Licensor waives any right to
+ collect royalties from You for the exercise of the Licensed
+ Rights, whether directly or through a collecting society
+ under any voluntary or waivable statutory or compulsory
+ licensing scheme. In all other cases the Licensor expressly
+ reserves any right to collect such royalties.
+
+
+ Section 3 -- License Conditions.
+
+ Your exercise of the Licensed Rights is expressly made subject to the
+ following conditions.
+
+ a. Attribution.
+
+ 1. If You Share the Licensed Material (including in modified
+ form), You must:
+
+ a. retain the following if it is supplied by the Licensor
+ with the Licensed Material:
+
+ i. identification of the creator(s) of the Licensed
+ Material and any others designated to receive
+ attribution, in any reasonable manner requested by
+ the Licensor (including by pseudonym if
+ designated);
+
+ ii. a copyright notice;
+
+ iii. a notice that refers to this Public License;
+
+ iv. a notice that refers to the disclaimer of
+ warranties;
+
+ v. a URI or hyperlink to the Licensed Material to the
+ extent reasonably practicable;
+
+ b. indicate if You modified the Licensed Material and
+ retain an indication of any previous modifications; and
+
+ c. indicate the Licensed Material is licensed under this
+ Public License, and include the text of, or the URI or
+ hyperlink to, this Public License.
+
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
+ reasonable manner based on the medium, means, and context in
+ which You Share the Licensed Material. For example, it may be
+ reasonable to satisfy the conditions by providing a URI or
+ hyperlink to a resource that includes the required
+ information.
+
+ 3. If requested by the Licensor, You must remove any of the
+ information required by Section 3(a)(1)(A) to the extent
+ reasonably practicable.
+
+ 4. If You Share Adapted Material You produce, the Adapter's
+ License You apply must not prevent recipients of the Adapted
+ Material from complying with this Public License.
+
+
+ Section 4 -- Sui Generis Database Rights.
+
+ Where the Licensed Rights include Sui Generis Database Rights that
+ apply to Your use of the Licensed Material:
+
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+ to extract, reuse, reproduce, and Share all or a substantial
+ portion of the contents of the database;
+
+ b. if You include all or a substantial portion of the database
+ contents in a database in which You have Sui Generis Database
+ Rights, then the database in which You have Sui Generis Database
+ Rights (but not its individual contents) is Adapted Material; and
+
+ c. You must comply with the conditions in Section 3(a) if You Share
+ all or a substantial portion of the contents of the database.
+
+ For the avoidance of doubt, this Section 4 supplements and does not
+ replace Your obligations under this Public License where the Licensed
+ Rights include other Copyright and Similar Rights.
+
+
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+ c. The disclaimer of warranties and limitation of liability provided
+ above shall be interpreted in a manner that, to the extent
+ possible, most closely approximates an absolute disclaimer and
+ waiver of all liability.
+
+
+ Section 6 -- Term and Termination.
+
+ a. This Public License applies for the term of the Copyright and
+ Similar Rights licensed here. However, if You fail to comply with
+ this Public License, then Your rights under this Public License
+ terminate automatically.
+
+ b. Where Your right to use the Licensed Material has terminated under
+ Section 6(a), it reinstates:
+
+ 1. automatically as of the date the violation is cured, provided
+ it is cured within 30 days of Your discovery of the
+ violation; or
+
+ 2. upon express reinstatement by the Licensor.
+
+ For the avoidance of doubt, this Section 6(b) does not affect any
+ right the Licensor may have to seek remedies for Your violations
+ of this Public License.
+
+ c. For the avoidance of doubt, the Licensor may also offer the
+ Licensed Material under separate terms or conditions or stop
+ distributing the Licensed Material at any time; however, doing so
+ will not terminate this Public License.
+
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+ License.
+
+
+ Section 7 -- Other Terms and Conditions.
+
+ a. The Licensor shall not be bound by any additional or different
+ terms or conditions communicated by You unless expressly agreed.
+
+ b. Any arrangements, understandings, or agreements regarding the
+ Licensed Material not stated herein are separate from and
+ independent of the terms and conditions of this Public License.
+
+
+ Section 8 -- Interpretation.
+
+ a. For the avoidance of doubt, this Public License does not, and
+ shall not be interpreted to, reduce, limit, restrict, or impose
+ conditions on any use of the Licensed Material that could lawfully
+ be made without permission under this Public License.
+
+ b. To the extent possible, if any provision of this Public License is
+ deemed unenforceable, it shall be automatically reformed to the
+ minimum extent necessary to make it enforceable. If the provision
+ cannot be reformed, it shall be severed from this Public License
+ without affecting the enforceability of the remaining terms and
+ conditions.
+
+ c. No term or condition of this Public License will be waived and no
+ failure to comply consented to unless expressly agreed to by the
+ Licensor.
+
+ d. Nothing in this Public License constitutes or may be interpreted
+ as a limitation upon, or waiver of, any privileges and immunities
+ that apply to the Licensor or You, including from the legal
+ processes of any jurisdiction or authority.
+
+
+ =======================================================================
+
+ Creative Commons is not a party to its public
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
+ its public licenses to material it publishes and in those instances
+ will be considered the “Licensor.” The text of the Creative Commons
+ public licenses is dedicated to the public domain under the CC0 Public
+ Domain Dedication. Except for the limited purpose of indicating that
+ material is shared under a Creative Commons public license or as
+ otherwise permitted by the Creative Commons policies published at
+ creativecommons.org/policies, Creative Commons does not authorize the
+ use of the trademark "Creative Commons" or any other trademark or logo
+ of Creative Commons without its prior written consent including,
+ without limitation, in connection with any unauthorized modifications
+ to any of its public licenses or any other arrangements,
+ understandings, or agreements concerning use of licensed material. For
+ the avoidance of doubt, this paragraph does not form part of the
+ public licenses.
+
+ Creative Commons may be contacted at creativecommons.org.
README.md CHANGED
@@ -1,3 +1,96 @@
- ---
- license: cc-by-4.0
- ---
+ # Latest Fastspeech2 Models using FLAT Start
+
+ This repository branch (`New-Models`) contains new, high-quality Fastspeech2 models for Indian languages, implemented using Flat Start for speech synthesis. The models generate mel-spectrograms from text input and can be used to synthesize speech.
+
+ **NOTE: The main branch grew large and underwent a few changes to the inference and preprocessing scripts, which necessitated a separate branch. Training information and the training script will be shared after further code optimization and footprint reduction.**
+
+ Clone this branch using the command:
+
+ ```
+ git clone -b New-Models --single-branch https://github.com/smtiitm/Fastspeech2_HS.git
+ ```
+
+ The repository is large. New models are in the `<language>_latest` folders.
+
+ ## Model Files
+
+ The model for each language includes the following files:
+
+ - `config.yaml`: Configuration file for the Fastspeech2 model.
+ - `energy_stats.npz`: Energy statistics for normalization during synthesis.
+ - `feats_stats.npz`: Feature statistics for normalization during synthesis.
+ - `feats_type`: Feature type information.
+ - `pitch_stats.npz`: Pitch statistics for normalization during synthesis.
+ - `model.pth`: Pre-trained Fastspeech2 model weights.
+
+ ## Installation
+
+ 1. Install [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) first. Create a conda environment using the provided `environment.yml` file:
+
+ ```shell
+ conda env create -f environment.yml
+ ```
+
+ 2. Activate the conda environment (check the environment name inside the `environment.yml` file):
+ ```shell
+ conda activate tts-hs-hifigan
+ ```
+
+ 3. Install PyTorch separately (you can install the specific version based on your requirements):
+ ```shell
+ conda install pytorch cudatoolkit
+ pip install torchaudio
+ ```
+ ## Vocoder
+ For generating WAV files from mel-spectrograms, you can use a vocoder of your choice. One popular option is the [HIFIGAN](https://github.com/jik876/hifi-gan) vocoder (clone that repository and place it in the current working directory). Please refer to the documentation of the vocoder you choose for installation and usage instructions.
+
+ (**We have used the HIFIGAN V1 vocoder and have provided vocoders for a few languages in the Vocoder folder. If needed, adjust the path in the inference file.**)
+
+ ## Usage
+
+ The directory paths are relative. (If needed, edit **text_preprocess_for_inference.py** and **inference.py** and update folder/file paths wherever required.)
+
+ **Give the language and gender in lowercase and the sample text in quotes. Adjust the output speed with the alpha parameter (a higher value gives slower speech, and vice versa). The output argument is optional; if provided, its value is used as the output file name.**
+
+ Use the inference file to synthesize speech from text inputs:
+ ```shell
+ python inference.py --sample_text "Your input text here" --language <language> --gender <gender> --alpha <alpha> --output_file <file_name.wav OR path/to/file_name.wav>
+ ```
+
+ **Example:**
+
+ ```
+ python inference.py --sample_text "श्रीलंका और पाकिस्तान में खेला जा रहा एशिया कप अब तक का सबसे विवादित टूर्नामेंट होता जा रहा है।" --language hindi_latest --gender male --alpha 1 --output_file male_hindi_output.wav
+ ```
+ The file will be stored as `male_hindi_output.wav` in the current working directory. If the **--output_file** argument is not given, the output is stored as `<language>_<gender>_output.wav` in the current working directory.
+
+ **Pass `<language>_latest` to --language to use the latest models.**
+
+
+ ### Citation
+ If you use this Fastspeech2 Model in your research or work, please consider citing:
+
+
+ COPYRIGHT
+ 2024, Speech Technology Consortium,
+
+ Bhashini, MeiTY and by Hema A Murthy & S Umesh,
+
+
+ DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
+ and
+ ELECTRICAL ENGINEERING,
+ IIT MADRAS. ALL RIGHTS RESERVED
+
+
+
+ Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
+
+ This work is licensed under a
+ [Creative Commons Attribution 4.0 International License][cc-by].
+
+ [![CC BY 4.0][cc-by-image]][cc-by]
+
+ [cc-by]: http://creativecommons.org/licenses/by/4.0/
+ [cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
+ [cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
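
Note on the README's "Model Files" list: a quick pre-flight check can confirm that a language folder is complete before inference is attempted. The sketch below is illustrative and not part of this commit. The file names come from the README; the `<language>/<gender>/model/` layout is an assumption based on the paths that `app.py` opens, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the README's "Model Files" list.
REQUIRED_FILES = (
    "config.yaml",
    "energy_stats.npz",
    "feats_stats.npz",
    "feats_type",
    "pitch_stats.npz",
    "model.pth",
)


def missing_model_files(root, language, gender):
    """Return the required file names absent from <root>/<language>/<gender>/model.

    The directory layout is an assumption based on the paths app.py opens;
    adjust it if the repository is organized differently.
    """
    model_dir = Path(root) / language / gender / "model"
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Example: check the Hindi male model relative to the current directory.
    missing = missing_model_files(".", "hindi", "male")
    print("missing files:", missing or "none")
```

An empty list means every file named in the README is present; it says nothing about whether the files are valid or complete downloads (for example, Git LFS pointers that were never fetched).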
api.py ADDED
@@ -0,0 +1,63 @@
+ # TTS IITM SPEECH LAB
+ import requests
+ import json
+ import base64
+
+ text = "सुप्रभात, आप कैसे हैं?" # hindi
+ # text = "സുപ്രഭാതം, സുഖമാ?" # malayalam
+ # text = "সুপ্ৰভাত, তুমি কেনে?" # manipuri
+ # text = "सुप्रभात, तुम्ही कसे आहात?" # marathi
+ # text = "ಶುಭೋದಯ, ನೀವು ಹೇಗಿದ್ದೀರಿ?" # kannada
+ # text = "बसु म्विथ्बो, बरि दिबाबो?" # bodo male not working <---
+ # text = "Good morning, how are you?" # english
+ # text = "সুপ্ৰভাত, আপুনি কেমন আছে?" # assamese
+ # text = "காலை வணக்கம், நீங்கள் எப்படி இருக்கின்றீர்கள்?" # tamil
+ # text = "ସୁପ୍ରଭାତ, ଆପଣ କେମିତି ଅଛନ୍ତି?" # odia male not working <---
+ # text = "सुप्रभात, आप कैसे छो?" # rajasthani
+ # text = "శుభోదయం, మీరు ఎలా ఉన్నారు?" # telugu
+ # text = "সুপ্রভাত, আপনি কেমন আছেন?" # bengali male not working <---
+ # text = "સુપ્રભાત, તમે કેમ છો?" # gujarati
+
+ lang = 'hindi'
+ gender = 'female'
+
+ url = "http://localhost:4005/tts"
+ # url = 'http://projects.respark.iitm.ac.in:8009/tts' # proxy
+
+ payload = json.dumps({
+     "input": text,
+     "gender": gender,
+     "lang": lang,
+     "alpha": 1,
+     "segmentwise":"True"
+ })
+ headers = {'Content-Type': 'application/json'}
+ response = requests.request("POST", url, headers=headers, data=payload).json()
+
+ audio = response['audio']
+ file_name = "tts.mp3"
+ wav_file = open(file_name,'wb')
+ decode_string = base64.b64decode(audio)
+ wav_file.write(decode_string)
+ wav_file.close()
+
+ '''
+ Supported languages
+
+ Assamese
+ Bengali
+ Bodo
+ English
+ Gujarati
+ Hindi
+ Kannada
+ Malayalam
+ Manipuri
+ Marathi
+ Odia
+ Punjabi
+ Rajasthani
+ Tamil
+ Telugu
+ Urdu
+ '''
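
Note on `api.py`: the script above indexes `response['audio']` directly, so a failure reply (the server in this commit builds one with `status='failure'` and a `reason`) would surface as a bare `KeyError`. A slightly more defensive client might look like the sketch below. It is not part of this commit; the helper names `build_payload`, `save_audio`, and `synthesize` are hypothetical, and the 60-second timeout is an arbitrary choice. The payload keys mirror the ones `api.py` sends.

```python
import base64
import json
from pathlib import Path


def build_payload(text, lang, gender, alpha=1):
    """Serialize the request body using the same keys api.py sends."""
    return json.dumps({
        "input": text,
        "gender": gender,
        "lang": lang,
        "alpha": alpha,
        "segmentwise": "True",
    })


def save_audio(response_json, path):
    """Decode the base64 'audio' field and write it to path.

    Returns the number of bytes written. Raises ValueError when the reply
    carries no audio, e.g. a failure response from the server.
    """
    if "audio" not in response_json:
        raise ValueError(f"no 'audio' field in response: {response_json}")
    data = base64.b64decode(response_json["audio"])
    Path(path).write_bytes(data)
    return len(data)


def synthesize(url, text, lang, gender, out_path, timeout=60):
    """POST the text to the /tts endpoint and save the returned audio."""
    # Third-party dependency, imported here so the helpers above work without it.
    import requests

    resp = requests.post(
        url,
        headers={"Content-Type": "application/json"},
        data=build_payload(text, lang, gender),
        timeout=timeout,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return save_audio(resp.json(), out_path)
```

With a server running locally, a call such as `synthesize("http://localhost:4005/tts", "Good morning, how are you?", "english", "female", "tts_output")` would write the decoded bytes to `tts_output`. The container format of the returned audio is not visible in this part of the diff, so the output name here deliberately carries no extension.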
app.py ADDED
@@ -0,0 +1,179 @@
1
+ from flask import Flask, render_template, request, send_file, jsonify
2
+ import requests
3
+ import json
4
+ import ssl
5
+ import logging
6
+ import sys
7
+ import os
8
+ import base64
9
+ import io
10
+ #replace the path with your hifigan path to import Generator from models.py
11
+ sys.path.append("hifigan")
12
+ # import argparse
13
+ import torch
14
+ from espnet2.bin.tts_inference import Text2Speech
15
+ from models import Generator
16
+ from scipy.io.wavfile import write
17
+ from meldataset import MAX_WAV_VALUE
18
+ from env import AttrDict
19
+ import json
20
+ import yaml
21
+ from text_preprocess_for_inference import TTSDurAlignPreprocessor
22
+ # import time
23
+
24
+ logging.basicConfig(filename='access.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
25
+
26
+ SAMPLING_RATE = 22050
27
+ if torch.cuda.is_available():
28
+ device = "cuda"
29
+ else:
30
+ device = "cpu"
31
+
32
+ preprocessor = TTSDurAlignPreprocessor()
33
+
34
+ app = Flask(__name__)
35
+ # app.config['SECRET_KEY'] = 'key'
36
+ # socketio = SocketIO(app)
37
+
38
+ # @socketio.on('new_user')
39
+ # def handle_new_user(data):
40
+ # client_id = data['id']
41
+ # # print('\n'+f"New user connected with ID: {client_id}")
42
+ # logging.info('\n'+f"New user connected with ID: {client_id}")
43
+
44
+ def load_hifigan_vocoder(language, gender, device):
45
+ # Load HiFi-GAN vocoder configuration file and generator model for the specified language and gender
46
+ vocoder_config = f"vocoder/{gender}/aryan/hifigan/config.json"
47
+ vocoder_generator = f"vocoder/{gender}/aryan/hifigan/generator"
48
+ # Read the contents of the vocoder configuration file
49
+ with open(vocoder_config, 'r') as f:
50
+ data = f.read()
51
+ json_config = json.loads(data)
52
+ h = AttrDict(json_config)
53
+ torch.manual_seed(h.seed)
54
+ # Move the generator model to the specified device (CPU or GPU)
55
+ device = torch.device(device)
56
+ generator = Generator(h).to(device)
57
+ state_dict_g = torch.load(vocoder_generator, device)
58
+ generator.load_state_dict(state_dict_g['generator'])
59
+ generator.eval()
60
+ generator.remove_weight_norm()
61
+
62
+ # Return the loaded and prepared HiFi-GAN generator model
63
+ return generator
64
+
65
+ def load_fastspeech2_model(language, gender, device):
66
+
67
+ #updating the config.yaml fiel based on language and gender
68
+ with open(f"{language}/{gender}/model/config.yaml", "r") as file:
69
+ config = yaml.safe_load(file)
70
+
71
+ current_working_directory = os.getcwd()
72
+ feat="model/feats_stats.npz"
73
+ pitch="model/pitch_stats.npz"
74
+ energy="model/energy_stats.npz"
75
+
76
+ feat_path=os.path.join(current_working_directory,language,gender,feat)
77
+ pitch_path=os.path.join(current_working_directory,language,gender,pitch)
78
+ energy_path=os.path.join(current_working_directory,language,gender,energy)
79
+
80
+
81
+ config["normalize_conf"]["stats_file"] = feat_path
82
+ config["pitch_normalize_conf"]["stats_file"] = pitch_path
83
+ config["energy_normalize_conf"]["stats_file"] = energy_path
84
+
85
+ with open(f"{language}/{gender}/model/config.yaml", "w") as file:
86
+ yaml.dump(config, file)
87
+
88
+ tts_model = f"{language}/{gender}/model/model.pth"
89
+ tts_config = f"{language}/{gender}/model/config.yaml"
90
+
91
+
92
+ return Text2Speech(train_config=tts_config, model_file=tts_model, device=device)
93
+
94
+ def text_synthesis(language, gender, sample_text, vocoder, MAX_WAV_VALUE, device, alpha=1):
95
+ # Perform Text-to-Speech synthesis
96
+ with torch.no_grad():
97
+ # Load the FastSpeech2 model for the specified language and gender
98
+
99
+ model = load_fastspeech2_model(language, gender, device)
100
+
101
+ # Generate mel-spectrograms from the input text using the FastSpeech2 model
102
+ out = model(sample_text, decode_conf={"alpha": alpha})
103
+ print("TTS Done")
104
+ x = out["feat_gen_denorm"].T.unsqueeze(0) * 2.3262
105
+ x = x.to(device)
106
+
107
+ # Use the HiFi-GAN vocoder to convert mel-spectrograms to raw audio waveforms
108
+ y_g_hat = vocoder(x)
109
+ audio = y_g_hat.squeeze()
110
+ audio = audio * MAX_WAV_VALUE
111
+ audio = audio.cpu().numpy().astype('int16')
112
+
113
+ # Return the synthesized audio
114
+ return audio
115
+
116
+ def setup_app():
117
+ genders = ['male','female']
118
+ # to make dummy calls in all languages available
119
+ languages = {'hindi': "नमस्ते",'malayalam': "ഹലോ",'manipuri': "হ্যালো",'marathi': "हॅलो",'kannada': "ಹಲೋ",'bodo': "हॅलो",'english': "Hello",'assamese': "হ্যালো",'tamil': "ஹலோ",'odia': "ହେଲୋ",'rajasthani': "हॅलो",'telugu': "హలో",'bengali': "হ্যালো",'gujarati': "હલો"}
120
+
121
+ vocoders = {}
122
+ for gender in genders:
123
+ vocoders[gender]={}
124
+ for language,text in languages.items():
125
+ # Load the HiFi-GAN vocoder with dynamic language and gender
126
+ vocoder = load_hifigan_vocoder(language, gender, device)
127
+ vocoders[gender][language] = vocoder
128
+ # dummy calls
129
+ print(f"making dummy calls for {language} - {gender}")
130
+ try:
131
+ out = text_synthesis(language, gender, text, vocoder, MAX_WAV_VALUE, device)
132
+ except Exception:
133
+ message = f"cannot make dummy call for {gender} - {language} <==================="
134
+ print(message.upper())
135
+
136
+ print("Server Started...")
137
+ return vocoders
138
+ vocoders = setup_app()
139
+
140
+ @app.route('/', methods=['GET'])
141
+ def main():
142
+ return "IITM_TTS_V2"
143
+
144
+ @app.route('/tts', methods=['GET', 'POST'], strict_slashes=False)
145
+ def tts():
146
+ try:
147
+ json_data = request.get_json()
148
+ text = json_data["input"]
149
+ if not isinstance(text,str):
150
+ input_type = type(text)
151
+ return jsonify(status='failure', reason=f"Unsupported input type {input_type}. Input text should be in string format.")
152
+ gender = json_data["gender"]
153
+ language = json_data["lang"].lower()
154
+ alpha = json_data.get("alpha", 1)
155
+ # Preprocess the sample text
156
+ preprocessed_text, phrases = preprocessor.preprocess(text, language, gender)
157
+ preprocessed_text = " ".join(preprocessed_text)
158
+ vocoder = vocoders[gender][language]
159
+ out = text_synthesis(language, gender, preprocessed_text, vocoder, MAX_WAV_VALUE, device, alpha=alpha)
160
+
161
+ # output_file = f"{language}_{gender}_output.wav"
162
+ # write(output_file, SAMPLING_RATE, out)
163
+ # audio_wav_bytes = base64.b64encode(open(output_file, "rb").read())
164
+
165
+ # avoid saving file on disk
166
+ output_stream = io.BytesIO()
167
+ write(output_stream, SAMPLING_RATE, out)
168
+ audio_wav_bytes = base64.b64encode(output_stream.getvalue())
169
+
170
+ ret = jsonify(status="success",audio=audio_wav_bytes.decode('utf-8'))
171
+
172
+ except Exception as err:
173
+ ret = jsonify(status="failure", reason=str(err))
174
+ return ret
175
+
176
+ if __name__ == '__main__':
177
+ # ssl_context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
178
+ # ssl_context.load_cert_chain('./ssl2023/iitm2022.crt','./ssl2023/iitm2022.key')
179
+ app.run(host='0.0.0.0', port=4005, debug=False)
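The `/tts` handler above builds the WAV in memory and returns it as base64 text. The following is a minimal sketch of that round trip using only the standard library. It substitutes the `wave` module for `scipy.io.wavfile.write`; the helper names `float_to_int16`, `encode_wav_base64` and `decode_wav_base64` are hypothetical, and the scale factor 32768 is an assumption (the commit imports `MAX_WAV_VALUE` from `meldataset`). Unlike the handler, the sketch clips before casting to 16-bit.

```python
import base64
import io
import struct
import wave

SAMPLING_RATE = 48000    # same value as SAMPLING_RATE in inference.py
MAX_WAV_VALUE = 32768.0  # assumed value; the commit imports it from meldataset


def float_to_int16(samples):
    """Scale floats in [-1, 1] to 16-bit integers, clipping out-of-range values."""
    return [max(-32768, min(32767, int(s * MAX_WAV_VALUE))) for s in samples]


def encode_wav_base64(samples, rate=SAMPLING_RATE):
    """Write mono 16-bit PCM into an in-memory WAV and return it as base64 text."""
    pcm = float_to_int16(samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return base64.b64encode(buf.getvalue()).decode("utf-8")


def decode_wav_base64(text):
    """Client-side inverse: base64 text back to (sample_rate, list of int16)."""
    raw = base64.b64decode(text)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    return rate, list(struct.unpack(f"<{len(frames) // 2}h", frames))
```

A client would apply something like `decode_wav_base64` to the `audio` field of a successful response.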
environment.yml ADDED
@@ -0,0 +1,127 @@
1
+ name: tts-hs-hifigan
2
+ channels:
3
+ - defaults
4
+ dependencies:
5
+ - _libgcc_mutex=0.1=main
6
+ - _openmp_mutex=5.1=1_gnu
7
+ - ca-certificates=2022.10.11=h06a4308_0
8
+ - certifi=2022.9.24=py37h06a4308_0
9
+ - ld_impl_linux-64=2.38=h1181459_1
10
+ - libffi=3.3=he6710b0_2
11
+ - libgcc-ng=11.2.0=h1234567_1
12
+ - libgomp=11.2.0=h1234567_1
13
+ - libstdcxx-ng=11.2.0=h1234567_1
14
+ - ncurses=6.3=h5eee18b_3
15
+ - openssl=1.1.1s=h7f8727e_0
16
+ - pip=22.2.2=py37h06a4308_0
17
+ - python=3.7.15=haa1d7c7_0
18
+ - readline=8.2=h5eee18b_0
19
+ - setuptools=65.5.0=py37h06a4308_0
20
+ - sqlite=3.39.3=h5082296_0
21
+ - tk=8.6.12=h1ccaba5_0
22
+ - wheel=0.37.1=pyhd3eb1b0_0
23
+ - xz=5.2.6=h5eee18b_0
24
+ - zlib=1.2.13=h5eee18b_0
25
+ - pip:
26
+ - aiosignal==1.3.1
27
+ - appdirs==1.4.4
28
+ - attrs==22.1.0
29
+ - audioread==3.0.0
30
+ - backcall==0.2.0
31
+ - cffi==1.15.1
32
+ - charset-normalizer==2.1.1
33
+ - ci-sdr==0.0.2
34
+ - click==8.0.4
35
+ - configargparse==1.5.3
36
+ - ctc-segmentation==1.7.4
37
+ - cycler==0.11.0
38
+ - cython==0.29.32
39
+ - decorator==5.1.1
40
+ - distance==0.1.3
41
+ - distlib==0.3.6
42
+ - docopt==0.6.2
43
+ - einops==0.6.0
44
+ - espnet==202209
45
+ - espnet-tts-frontend==0.0.3
46
+ - fast-bss-eval==0.1.3
47
+ - filelock==3.8.0
48
+ - flask==2.2.2
49
+ - fonttools==4.38.0
50
+ - frozenlist==1.3.3
51
+ - g2p-en==2.1.0
52
+ - grpcio==1.50.0
53
+ - gunicorn==20.1.0
54
+ - h5py==3.7.0
55
+ - humanfriendly==10.0
56
+ - idna==3.4
57
+ - importlib-metadata==4.13.0
58
+ - importlib-resources==5.10.0
59
+ - indic-num2words==1.0.1
60
+ - indic_unified_parser==1.0.6
61
+ - inflect==6.0.2
62
+ - ipython==7.34.0
63
+ - itsdangerous==2.1.2
64
+ - jaconv==0.3
65
+ - jamo==0.4.1
66
+ - jedi==0.18.2
67
+ - jinja2==3.1.2
68
+ - joblib==1.2.0
69
+ - jsonschema==4.17.0
70
+ - kaldiio==2.17.2
71
+ - kiwisolver==1.4.4
72
+ - librosa==0.9.2
73
+ - llvmlite==0.39.1
74
+ - markupsafe==2.1.1
75
+ - matplotlib==3.5.3
76
+ - matplotlib-inline==0.1.6
77
+ - msgpack==1.0.4
78
+ - nltk==3.7
79
+ - numba==0.56.4
80
+ - numpy==1.21.6
81
+ - packaging==21.3
82
+ - pandas==1.3.5
83
+ - parso==0.8.3
84
+ - pexpect==4.8.0
85
+ - pickleshare==0.7.5
86
+ - pillow==9.3.0
87
+ - pkgutil-resolve-name==1.3.10
88
+ - platformdirs==2.5.4
89
+ - pooch==1.6.0
90
+ - prompt-toolkit==3.0.36
91
+ - protobuf==3.20.1
92
+ - ptyprocess==0.7.0
93
+ - pycparser==2.21
94
+ - pydantic==1.10.2
95
+ - pydub==0.25.1
96
+ - pygments==2.14.0
97
+ - pyparsing==3.0.9
98
+ - pypinyin==0.44.0
99
+ - pyrsistent==0.19.2
100
+ - python-dateutil==2.8.2
101
+ - pytorch-wpe==0.0.1
102
+ - pytz==2022.6
103
+ - pyworld==0.3.2
104
+ - pyyaml==6.0
105
+ - ray==2.1.0
106
+ - regex==2022.10.31
107
+ - requests==2.28.1
108
+ - resampy==0.4.2
109
+ - scikit-learn==1.0.2
110
+ - scipy==1.7.3
111
+ - sentencepiece==0.1.97
112
+ - six==1.16.0
113
+ - soundfile==0.11.0
114
+ - threadpoolctl==3.1.0
115
+ - torch-complex==0.4.3
116
+ - tqdm==4.64.1
117
+ - traitlets==5.8.0
118
+ - typeguard==2.13.3
119
+ - typing-extensions==4.4.0
120
+ - unidecode==1.3.6
121
+ - urllib3==1.26.12
122
+ - virtualenv==20.16.7
123
+ - wcwidth==0.2.5
124
+ - webvtt-py==0.4.6
125
+ - werkzeug==2.2.2
126
+ - zipp==3.10.0
127
+ prefix: /speech/Apps/Flask_app_env/conda_dir/envs/tts-hs-hifigan
get_phone_mapped_python.py ADDED
@@ -0,0 +1,75 @@
1
+ class TextReplacer:
2
+ def __init__(self):
3
+ self.replacements = {
4
+ 'aa':'A',
5
+ 'ae':'ऍ',
6
+ 'ag':'ऽ',
7
+ 'ai':'ऐ',
8
+ 'au':'औ',
9
+ 'axx':'अ',
10
+ 'ax':'ऑ',
11
+ 'bh':'B',
12
+ 'ch':'C',
13
+ 'dh':'ध',
14
+ 'dxhq':'T',
15
+ 'dxh':'ढ',
16
+ 'dxq':'D',
17
+ 'dx':'ड',
18
+ 'ee':'E',
19
+ 'ei':'ऐ',
20
+ 'eu':'உ',
21
+ 'gh':'घ',
22
+ 'gq':'G',
23
+ 'hq':'H',
24
+ 'ii':'I',
25
+ 'jh':'J',
26
+ 'khq':'K',
27
+ 'kh':'ख',
28
+ 'kq':'क',
29
+ 'ln':'ൾ',
30
+ 'lw':'ൽ',
31
+ 'lx':'ള',
32
+ 'mq':'M',
33
+ 'nd':'ऩ',
34
+ 'ng':'ङ',
35
+ 'nj':'ञ',
36
+ 'nk':'Y',
37
+ 'nn':'N',
38
+ 'nw':'ൺ',
39
+ 'nx':'ण',
40
+ 'oo':'O',
41
+ 'ou':'औ',
42
+ 'ph':'P',
43
+ 'rqw':'ॠ',
44
+ 'rq':'R',
45
+ 'rw':'ർ',
46
+ 'rx':'ऱ',
47
+ 'sh':'श',
48
+ 'sx':'ष',
49
+ 'txh':'ठ',
50
+ 'th':'थ',
51
+ 'tx':'ट',
52
+ 'uu':'U',
53
+ 'wv':'W',
54
+ 'zh':'Z'
55
+
56
+ # ... Add more replacements as needed
57
+ }
58
+
59
+ def apply_replacements(self, text):
60
+ for key, value in self.replacements.items():
61
+ # print('KEY AND VALUE OF PARSED OUTPUT',key, value)
62
+ text = text.replace(key, value)
63
+ temp=""
64
+ for i in range(len(text)):
65
+ if text[i]!=" ":
66
+ temp=temp+text[i]
67
+
68
+ return temp
69
+
70
+ def apply_replacements_by_phonems(self, text):
71
+ ans=self.replacements[text]
72
+ # for key, value in self.replacements.items():
73
+ # # print('KEY AND VALUE OF PARSED OUTPUT',key, value)
74
+ # text = text.replace(key, value)
75
+ return ans
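`apply_replacements` applies the table entries one after another, so the result depends on dictionary order: a longer key such as `axx` has to be replaced before its prefix `ax`, which is how the table above is arranged. The sketch below illustrates the dependency on a two-entry subset. The helpers `apply_in_order` and `longest_first` are hypothetical and not part of this commit; sorting by key length is one possible way to make the order explicit, not the author's method.

```python
def apply_in_order(text, replacements):
    """Apply each (key, value) pair sequentially, like apply_replacements."""
    for key, value in replacements:
        text = text.replace(key, value)
    return text


def longest_first(mapping):
    """Return pairs sorted so that longer keys are replaced before their prefixes."""
    return sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True)


# Two entries from the table above, deliberately listed in the "wrong" order.
subset = {"ax": "ऑ", "axx": "अ"}

safe = apply_in_order("axx", longest_first(subset))   # 'axx' is matched as a whole
unsafe = apply_in_order("axx", list(subset.items()))  # 'ax' is consumed first, leaving a stray 'x'
```

Anyone extending the table can rely on the same check: a new key must appear before any existing key that is a prefix of it.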
inference.py ADDED
@@ -0,0 +1,153 @@
1
+ import sys
2
+ import os
3
+ # Replace this with the path to your HiFi-GAN checkout so that Generator can be imported from models.py
4
+ sys.path.append("hifigan")
5
+ import argparse
6
+ import torch
7
+ from espnet2.bin.tts_inference import Text2Speech
8
+ from models import Generator
9
+ from scipy.io.wavfile import write
10
+ from meldataset import MAX_WAV_VALUE
11
+ from env import AttrDict
12
+ import json
13
+ import yaml
14
+ import concurrent.futures
15
+ import numpy as np
16
+ import time
17
+
18
+ from text_preprocess_for_inference import TTSDurAlignPreprocessor, CharTextPreprocessor, TTSPreprocessor
19
+
20
+ SAMPLING_RATE = 48000
21
+
22
+ def load_hifigan_vocoder(language, gender, device):
23
+ # Load HiFi-GAN vocoder configuration file and generator model for the specified language and gender
24
+ vocoder_config = f"vocoder/{gender}/{language}/config.json"
25
+ vocoder_generator = f"vocoder/{gender}/{language}/generator"
26
+ # Read the contents of the vocoder configuration file
27
+ with open(vocoder_config, 'r') as f:
28
+ data = f.read()
29
+ json_config = json.loads(data)
30
+ h = AttrDict(json_config)
31
+ torch.manual_seed(h.seed)
32
+ # Move the generator model to the specified device (CPU or GPU)
33
+ device = torch.device(device)
34
+ generator = Generator(h).to(device)
35
+ state_dict_g = torch.load(vocoder_generator, device)
36
+ generator.load_state_dict(state_dict_g['generator'])
37
+ generator.eval()
38
+ generator.remove_weight_norm()
39
+
40
+ # Return the loaded and prepared HiFi-GAN generator model
41
+ return generator
42
+
43
+
44
+ def load_fastspeech2_model(language, gender, device):
45
+
46
+ # Update the stats-file paths in config.yaml for the given language and gender
47
+ with open(f"{language}/{gender}/model/config.yaml", "r") as file:
48
+ config = yaml.safe_load(file)
49
+
50
+ current_working_directory = os.getcwd()
51
+ feat="model/feats_stats.npz"
52
+ pitch="model/pitch_stats.npz"
53
+ energy="model/energy_stats.npz"
54
+
55
+ feat_path=os.path.join(current_working_directory,language,gender,feat)
56
+ pitch_path=os.path.join(current_working_directory,language,gender,pitch)
57
+ energy_path=os.path.join(current_working_directory,language,gender,energy)
58
+
59
+
60
+ config["normalize_conf"]["stats_file"] = feat_path
61
+ config["pitch_normalize_conf"]["stats_file"] = pitch_path
62
+ config["energy_normalize_conf"]["stats_file"] = energy_path
63
+
64
+ with open(f"{language}/{gender}/model/config.yaml", "w") as file:
65
+ yaml.dump(config, file)
66
+
67
+ tts_model = f"{language}/{gender}/model/model.pth"
68
+ tts_config = f"{language}/{gender}/model/config.yaml"
69
+
70
+
71
+ return Text2Speech(train_config=tts_config, model_file=tts_model, device=device)
72
+
73
+ def text_synthesis(language, gender, sample_text, vocoder, MAX_WAV_VALUE, device, alpha):
74
+ # Perform Text-to-Speech synthesis
75
+ with torch.no_grad():
76
+ # Load the FastSpeech2 model for the specified language and gender
77
+
78
+ model = load_fastspeech2_model(language, gender, device)
79
+
80
+
81
+ # Generate mel-spectrograms from the input text using the FastSpeech2 model
82
+ out = model(sample_text, decode_conf={"alpha": alpha})
83
+ print("TTS Done")
84
+ x = out["feat_gen_denorm"].T.unsqueeze(0) * 2.3262
85
+ x = x.to(device)
86
+
87
+ # Use the HiFi-GAN vocoder to convert mel-spectrograms to raw audio waveforms
88
+ y_g_hat = vocoder(x)
89
+ audio = y_g_hat.squeeze()
90
+ audio = audio * MAX_WAV_VALUE
91
+ audio = audio.cpu().numpy().astype('int16')
92
+
93
+ # Return the synthesized audio
94
+ return audio
95
+
96
+ def split_into_chunks(text, words_per_chunk=100):
97
+ words = text.split()
98
+ chunks = [words[i:i + words_per_chunk] for i in range(0, len(words), words_per_chunk)]
99
+ return [' '.join(chunk) for chunk in chunks]
100
+
101
+
102
+ if __name__ == "__main__":
103
+ parser = argparse.ArgumentParser(description="Text-to-Speech Inference")
104
+ parser.add_argument("--language", type=str, required=True, help="Language (e.g., hindi)")
105
+ parser.add_argument("--gender", type=str, required=True, help="Gender (e.g., female)")
106
+ parser.add_argument("--sample_text", type=str, required=True, help="Text to be synthesized")
107
+ parser.add_argument("--output_file", type=str, default="", help="Output WAV file path")
108
+ parser.add_argument("--alpha", type=float, default=1, help="Alpha Parameter for speed control (e.g. 1.1 (slow) or 0.8 (fast))")
109
+
110
+ args = parser.parse_args()
111
+
112
+ phone_dictionary = {}
113
+ # Set the device
114
+ device = "cuda" if torch.cuda.is_available() else "cpu"
115
+
116
+ # Load the HiFi-GAN vocoder with dynamic language and gender
117
+ vocoder = load_hifigan_vocoder(args.language, args.gender, device)
118
+
119
+ if args.language == "urdu" or args.language == "punjabi":
120
+ preprocessor = CharTextPreprocessor()
121
+ elif args.language == "english":
122
+ preprocessor = TTSPreprocessor()
123
+ else:
124
+ preprocessor = TTSDurAlignPreprocessor()
125
+
126
+
127
+
128
+ start_time = time.time()
129
+ audio_arr = []
130
+ result = split_into_chunks(args.sample_text)
131
+
132
+ with concurrent.futures.ThreadPoolExecutor() as executor:
133
+ # Process each text chunk in order (no tasks are submitted to the executor, so this loop runs sequentially)
134
+ for sample_text in result:
135
+
136
+ # Preprocess the text and obtain a list of phrases
137
+ preprocessed_text, phrases = preprocessor.preprocess(sample_text, args.language, args.gender, phone_dictionary)
138
+ preprocessed_text = " ".join(preprocessed_text)
139
+
140
+ # Generate audio from the preprocessed text using a text-to-speech synthesis function
141
+ audio = text_synthesis(args.language, args.gender, preprocessed_text, vocoder, MAX_WAV_VALUE, device, args.alpha)
142
+
143
+
144
+ # Set the output file name
145
+ if args.output_file:
146
+ output_file = f"{args.output_file}"
147
+ else:
148
+ output_file = f"{args.language}_{args.gender}_output.wav"
149
+
150
+ # Append the generated audio to the list
151
+ audio_arr.append(audio)
152
+ result_array = np.concatenate(audio_arr, axis=0)
153
+ write(output_file, SAMPLING_RATE, result_array)
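The main block of `inference.py` opens a `ThreadPoolExecutor` but never submits work to it, so the chunks are synthesized one after another. If concurrent processing is the goal, one option is `executor.map`, which returns results in input order so the audio pieces can still be concatenated correctly. The sketch below is illustrative only: `fake_synthesize` is a hypothetical stand-in for preprocessing plus `text_synthesis`, and `synthesize_all` is not part of this commit. Whether real model inference benefits from threads depends on the model and device, which the source does not establish.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, words_per_chunk=100):
    """Same chunking logic as in inference.py."""
    words = text.split()
    chunks = [words[i:i + words_per_chunk] for i in range(0, len(words), words_per_chunk)]
    return [" ".join(chunk) for chunk in chunks]


def fake_synthesize(chunk):
    # Hypothetical placeholder for preprocessor.preprocess(...) + text_synthesis(...)
    return chunk.upper()


def synthesize_all(text, words_per_chunk=100, max_workers=4):
    """Run the per-chunk work on a thread pool and keep the results in chunk order."""
    chunks = split_into_chunks(text, words_per_chunk)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(executor.map(fake_synthesize, chunks))
```

With real audio, the returned list would take the place of `audio_arr` before the `np.concatenate` call.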
license.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e45a02755dcbb6015e3ff0a8e6de54a929ea5a85233e49773cb8c0fd6177b6ae
3
+ size 138348
multilingualcharmap.json ADDED
@@ -0,0 +1 @@
1
+ {"assamese_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "assamese_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "bengali_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", 
"\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "b", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "bengali_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "b", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "bodo_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", 
"\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "y", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0921", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "\u0921", "f": "P", "\u0930": "r", "M": "n", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "gujarati_male": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "gujarati_female": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", 
"\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "hindi_male": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "\u0915", "K": "K", "G": "G", "z": "z", "D": "D", "T": "T", "f": "f", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "hindi_female": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", 
"J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "\u0915", "K": "K", "G": "G", "z": "z", "D": "D", "T": "T", "f": "f", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "kannada_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "r", "M": "n", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "kannada_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", 
"\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "r", "M": "n", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "malayalam_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "\u0930", "M": "n", "q": "q", "H": "H", "Z": "Z", "\u0928": "n", "N": "N", "\u0d7e": "\u0d7e", "\u0d7d": "\u0d7d", "\u0d7a": "\u0d7a", "\u0d7c": "\u0d7c", "\u0960": "R"}, "malayalam_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", 
"\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "\u0930", "M": "n", "q": "q", "H": "H", "Z": "Z", "\u0928": "n", "N": "N", "\u0d7e": "\u0d7e", "\u0d7d": "\u0d7d", "\u0d7a": "\u0d7a", "\u0d7c": "\u0d7c", "\u0960": "R"}, "manipuri_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "r", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "g", "\u0919": "\u0919", "c": "c", "C": "c", "j": "j", "J": "j", "\u091e": "y", "\u091f": "\u091f", "\u0920": "\u091f", "\u0921": "\u091f", "\u0922": "\u091f", "\u0923": "n", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "d", "n": "n", "p": "p", "P": "P", "b": "b", "B": "b", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "w", "\u0936": "\u0936", "\u0937": "\u0936", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u091f", "T": "\u091f", "f": "P", "\u0930": "r", "M": "n", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "n", "\u0d7c": "r", "\u0960": "r"}, "manipuri_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "r", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "g", "\u0919": "\u0919", "c": "c", "C": "c", "j": "j", "J": "j", "\u091e": "y", "\u091f": "\u091f", "\u0920": "\u091f", "\u0921": "\u091f", "\u0922": "\u091f", "\u0923": "n", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "d", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", 
"y": "y", "r": "r", "l": "l", "\u0d33": "l", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u091f", "T": "\u091f", "f": "P", "\u0930": "r", "M": "n", "q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "l", "\u0d7d": "l", "\u0d7a": "n", "\u0d7c": "r", "\u0960": "r"}, "marathi_male": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "f", "\u0930": "\u0930", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "marathi_female": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "\u090d", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", 
"\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "z", "D": "\u0921", "T": "\u0922", "f": "f", "\u0930": "\u0930", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "odia_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "D", "T": "T", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "odia_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "E", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "\u0919", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": 
"\u0916", "G": "g", "z": "j", "D": "D", "T": "T", "f": "P", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "rajasthani_male": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "z", "D": "D", "T": "T", "f": "f", "\u0930": "r", "M": "M", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "rajasthani_female": {"a": "a", "\u0911": "\u0911", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "E", "E": "E", "\u0910": "\u0910", "o": "o", "O": "o", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "J", "\u091e": "y", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "z", "D": "D", "T": "\u0922", "f": "f", "\u0930": "r", "M": "n", 
"q": "q", "H": "h", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "tamil_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "\u0b89", "U": "U", "R": "r", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "k", "g": "g", "\u0918": "g", "\u0919": "\u0919", "c": "c", "C": "c", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u091f", "\u0921": "\u0921", "\u0922": "\u0921", "\u0923": "\u0923", "t": "t", "\u0925": "t", "d": "d", "\u0927": "d", "n": "n", "p": "p", "P": "p", "b": "b", "B": "b", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0937", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "k", "G": "g", "z": "j", "D": "\u0921", "T": "\u0921", "f": "f", "\u0930": "\u0930", "M": "n", "q": "n", "H": "h", "Z": "Z", "\u0928": "\u0928", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "r"}, "tamil_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "\u0b89", "U": "U", "R": "r", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "k", "g": "g", "\u0918": "g", "\u0919": "\u0919", "c": "c", "C": "c", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u091f", "\u0921": "\u0921", "\u0922": "\u0921", "\u0923": "\u0923", "t": "t", "\u0925": "t", "d": "d", "\u0927": "d", "n": "n", "p": "p", "P": "p", "b": "b", "B": "b", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0937", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "k", "G": "g", "z": "j", "D": "\u0921", "T": "\u0921", "f": "f", "\u0930": "\u0930", "M": "n", "q": "n", "H": "h", "Z": "Z", "\u0928": "\u0928", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", 
"\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "r"}, "telugu_male": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "\u0930", "M": "n", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}, "telugu_female": {"a": "a", "\u0911": "A", "A": "A", "\u0905": "A", "i": "i", "I": "I", "u": "u", "\u0b89": "u", "U": "U", "R": "R", "e": "e", "E": "E", "\u0910": "\u0910", "o": "o", "O": "O", "\u090d": "E", "\u0914": "\u0914", "k": "k", "\u0916": "\u0916", "g": "g", "\u0918": "\u0918", "\u0919": "n", "c": "c", "C": "C", "j": "j", "J": "j", "\u091e": "\u091e", "\u091f": "\u091f", "\u0920": "\u0920", "\u0921": "\u0921", "\u0922": "\u0922", "\u0923": "\u0923", "t": "t", "\u0925": "\u0925", "d": "d", "\u0927": "\u0927", "n": "n", "p": "p", "P": "P", "b": "b", "B": "B", "m": "m", "y": "y", "r": "r", "l": "l", "\u0d33": "\u0d33", "w": "w", "\u0936": "\u0936", "\u0937": "\u0937", "s": "s", "h": "h", "\u0915": "k", "K": "\u0916", "G": "g", "z": "j", "D": "\u0921", "T": "\u0922", "f": "P", "\u0930": "\u0930", "M": "n", "q": "q", "H": "H", "Z": "y", "\u0928": "n", "N": "n", "\u0d7e": "\u0d33", "\u0d7d": "l", "\u0d7a": "\u0923", "\u0d7c": "r", "\u0960": "R"}}
requirements.txt ADDED
@@ -0,0 +1,10 @@
1
+ # use this requirements file if you are using pip rather than conda
2
+ # create the tts-hs-hifigan virtual environment by running, in order: "python3 -m venv tts-hs-hifigan", "source tts-hs-hifigan/bin/activate", "pip install -r requirements.txt"
3
+ flask
4
+ requests
5
+ torch
6
+ espnet
7
+ matplotlib
8
+ pandas
9
+ indic-num2words
10
+ gunicorn
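After installing from this file, it can help to confirm that the packages resolve in the active environment. The sketch below is not part of the commit: the helper `missing_modules` is hypothetical, and the import names are assumptions (`indic-num2words` is taken to import as `num_to_words`, as in `text_preprocess_for_inference.py`; the rest are assumed to match their package names).

```python
import importlib.util

# Assumed import names for the packages listed in requirements.txt.
REQUIRED_MODULES = [
    "flask", "requests", "torch", "espnet",
    "matplotlib", "pandas", "num_to_words", "gunicorn",
]

def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("missing:", missing if missing else "none")
```

`find_spec` only locates a module, so the check stays fast even for heavy packages such as `torch`.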
start.sh ADDED
@@ -0,0 +1,6 @@
1
+ source tts-hs-hifigan/bin/activate
2
+ CUDA_VISIBLE_DEVICES="" gunicorn -w 2 -b 0.0.0.0:4005 app:app --timeout 600 #--daemon # to run on CPU
3
+ # CUDA_VISIBLE_DEVICES=1 gunicorn -w 2 -b 0.0.0.0:4005 app:app --timeout 600 --daemon # to run on a specific GPU
4
+
5
+
6
+ # CUDA_VISIBLE_DEVICES="" hides all available GPUs from the process
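The gunicorn flags used in `start.sh` could also be kept in a config file. The sketch below shows what an equivalent `gunicorn.conf.py` might look like; the file itself is hypothetical and not part of this commit. The setting names are gunicorn's own, and the values are copied from the command above.

```python
# gunicorn.conf.py (hypothetical file; values mirror the flags in start.sh)
bind = "0.0.0.0:4005"   # -b 0.0.0.0:4005
workers = 2             # -w 2
timeout = 600           # --timeout 600, in seconds
daemon = False          # set to True for the commented-out --daemon behaviour
```

With such a file in place, the launch line would shorten to `CUDA_VISIBLE_DEVICES="" gunicorn -c gunicorn.conf.py app:app`; the GPU selection still comes from the environment variable.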
text_preprocess_for_inference.py ADDED
@@ -0,0 +1,949 @@
1
+ '''
2
+ TTS Preprocessing
3
+ Developed by Arun Kumar A(CS20S013) - November 2022
4
+ Code Changes by Utkarsh - 2023
5
+ '''
6
+ import os
7
+ import re
8
+ import json
9
+ import pandas as pd
10
+ import string
11
+ from collections import defaultdict
12
+ import time
13
+ import subprocess
14
+ import shutil
15
+ from multiprocessing import Process
16
+ import traceback
17
+
18
+ #imports of dependencies from environment.yml
19
+ from num_to_words import num_to_word
20
+ from g2p_en import G2p
21
+
22
+ def add_to_dictionary(dict_to_add, dict_file):
23
+ append_string = ""
24
+ for key, value in dict_to_add.items():
25
+ append_string += (str(key) + " " + str(value) + "\n")
26
+
27
+ if os.path.isfile(dict_file):
28
+ # make a copy of the dictionary
29
+ source_dir = os.path.dirname(dict_file)
30
+ dict_file_name = os.path.basename(dict_file)
31
+ temp_file_name = "." + dict_file_name + ".temp"
32
+ temp_dict_file = os.path.join(source_dir, temp_file_name)
33
+ shutil.copy(dict_file, temp_dict_file)
34
+ # append the new words in the dictionary to the temp file
35
+ with open(temp_dict_file, "a") as f:
36
+ f.write(append_string)
37
+ # check if the write is successful and then replace the temp file as the dict file
38
+ try:
39
+ df_orig = pd.read_csv(dict_file, delimiter=" ", header=None, dtype=str)
40
+ df_temp = pd.read_csv(temp_dict_file, delimiter=" ", header=None, dtype=str)
41
+ if len(df_temp) > len(df_orig):
42
+ os.rename(temp_dict_file, dict_file)
43
+ print(f"{len(dict_to_add)} new words appended to Dictionary: {dict_file}")
44
+ except Exception:
45
+ print(traceback.format_exc())
46
+ else:
47
+ # create a new dictionary
48
+ with open(dict_file, "a") as f:
49
+ f.write(append_string)
50
+ print(f"New Dictionary: {dict_file} created with {len(dict_to_add)} words")
51
+
52
+
53
+ class TextCleaner:
54
+ def __init__(self):
55
+ # this is a static set of cleaning rules to be applied
56
+ self.cleaning_rules = {
57
+ " +" : " ",
58
+ "^ +" : "",
59
+ " +$" : "",
60
+ "#" : "",
61
+ "[.,;।!](\r\n)*" : "# ",
62
+ "[.,;।!](\n)*" : "# ",
63
+ "(\r\n)+" : "# ",
64
+ "(\n)+" : "# ",
65
+ "(\r)+" : "# ",
66
+ """[?;:)(!|&’‘,।\."]""": "",
67
+ "[/']" : "",
68
+ "[-–]" : " ",
69
+ }
70
+
71
+ def clean(self, text):
72
+ for key, replacement in self.cleaning_rules.items():
73
+ text = re.sub(key, replacement, text)
74
+ return text
75
+
76
+ def clean_list(self, text):
77
+ # input is supposed to be a list of strings
78
+ output_text = []
79
+ for line in text:
80
+ line = line.strip()
81
+ for key, replacement in self.cleaning_rules.items():
82
+ line = re.sub(key, replacement, line)
83
+ output_text.append(line)
84
+ return output_text
85
+
86
+
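The rules in `TextCleaner` run in dictionary order, so each pattern sees the output of the one before it. The sketch below uses a reduced, illustrative rule set (not the full table from the commit) to show the effect; `RULES` and the standalone `clean` function are names chosen for this example.

```python
import re

# Reduced, illustrative rule set. Python dicts preserve insertion order,
# so the rules are applied top to bottom.
RULES = {
    r" +": " ",             # collapse runs of spaces
    r"^ +": "",             # trim leading spaces
    r" +$": "",             # trim trailing spaces
    r"[.,;!](\n)*": "# ",   # turn phrase-ending punctuation into a "# " marker
    r"[-–]": " ",           # hyphens and en dashes become spaces
}

def clean(text, rules=RULES):
    """Apply every rule in order; later rules operate on earlier results."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text)
    return text

print(repr(clean("  hi  there ")))
print(repr(clean("one, two.")))
```

Because the space-collapsing rules run first, spaces introduced by later rules (for example the one after each `#`) are left in place, which is presumably why `TextNormalizer` carries its own post-cleaning rules.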
87
+ class Phonifier:
88
+ def __init__(self, dict_location=None):
89
+ if dict_location is None:
90
+ dict_location = "phone_dict"
91
+ self.dict_location = dict_location
92
+
93
+ # self.phone_dictionary = {}
94
+ # # load dictionary for all the available languages
95
+ # for dict_file in os.listdir(dict_location):
96
+ # try:
97
+ # if dict_file.startswith("."):
98
+ # # ignore hidden files
99
+ # continue
100
+ # language = dict_file
101
+ # dict_file_path = os.path.join(dict_location, dict_file)
102
+ # df = pd.read_csv(dict_file_path, delimiter=" ", header=None, dtype=str)
103
+ # self.phone_dictionary[language] = df.set_index(0).to_dict('dict')[1]
104
+ # except Exception as e:
105
+ # print(traceback.format_exc())
106
+
107
+ # print("Phone dictionary loaded for the following languages:", list(self.phone_dictionary.keys()))
108
+
109
+ self.g2p = G2p()
110
+ print('Loading G2P model... Done!')
111
+ # Mapping between the cmu phones and the iitm cls
112
+ self.cmu_2_cls_map = {
113
+ "AA" : "aa",
114
+ "AA0" : "aa",
115
+ "AA1" : "aa",
116
+ "AA2" : "aa",
117
+ "AE" : "axx",
118
+ "AE0" : "axx",
119
+ "AE1" : "axx",
120
+ "AE2" : "axx",
121
+ "AH" : "a",
122
+ "AH0" : "a",
123
+ "AH1" : "a",
124
+ "AH2" : "a",
125
+ "AO" : "ax",
126
+ "AO0" : "ax",
127
+ "AO1" : "ax",
128
+ "AO2" : "ax",
129
+ "AW" : "ou",
130
+ "AW0" : "ou",
131
+ "AW1" : "ou",
132
+ "AW2" : "ou",
133
+ "AX" : "a",
134
+ "AY" : "ei",
135
+ "AY0" : "ei",
136
+ "AY1" : "ei",
137
+ "AY2" : "ei",
138
+ "B" : "b",
139
+ "CH" : "c",
140
+ "D" : "dx",
141
+ "DH" : "d",
142
+ "EH" : "ee",
143
+ "EH0" : "ee",
144
+ "EH1" : "ee",
145
+ "EH2" : "ee",
146
+ "ER" : "a r",
147
+ "ER0" : "a r",
148
+ "ER1" : "a r",
149
+ "ER2" : "a r",
150
+ "EY" : "ee",
151
+ "EY0" : "ee",
152
+ "EY1" : "ee",
153
+ "EY2" : "ee",
154
+ "F" : "f",
155
+ "G" : "g",
156
+ "HH" : "h",
157
+ "IH" : "i",
158
+ "IH0" : "i",
159
+ "IH1" : "i",
160
+ "IH2" : "i",
161
+ "IY" : "ii",
162
+ "IY0" : "ii",
163
+ "IY1" : "ii",
164
+ "IY2" : "ii",
165
+ "JH" : "j",
166
+ "K" : "k",
167
+ "L" : "l",
168
+ "M" : "m",
169
+ "N" : "n",
170
+ "NG" : "ng",
171
+ "OW" : "o",
172
+ "OW0" : "o",
173
+ "OW1" : "o",
174
+ "OW2" : "o",
175
+ "OY" : "ei",
176
+ "OY0" : "ei",
177
+ "OY1" : "ei",
178
+ "OY2" : "ei",
179
+ "P" : "p",
180
+ "R" : "r",
181
+ "S" : "s",
182
+ "SH" : "sh",
183
+ "T" : "tx",
184
+ "TH" : "t",
185
+ "UH" : "u",
186
+ "UH0" : "u",
187
+ "UH1" : "u",
188
+ "UH2" : "u",
189
+ "UW" : "uu",
190
+ "UW0" : "uu",
191
+ "UW1" : "uu",
192
+ "UW2" : "uu",
193
+ "V" : "w",
194
+ "W" : "w",
195
+ "Y" : "y",
196
+ "Z" : "z",
197
+ "ZH" : "sh",
198
+ }
199
+
200
+ # Mapping between the iitm cls and iitm char
201
+ self.cls_2_chr_map = {
202
+ "aa" : "A",
203
+ "ii" : "I",
204
+ "uu" : "U",
205
+ "ee" : "E",
206
+ "oo" : "O",
207
+ "nn" : "N",
208
+ "ae" : "ऍ",
209
+ "ag" : "ऽ",
210
+ "au" : "औ",
211
+ "axx" : "अ",
212
+ "ax" : "ऑ",
213
+ "bh" : "B",
214
+ "ch" : "C",
215
+ "dh" : "ध",
216
+ "dx" : "ड",
217
+ "dxh" : "ढ",
218
+ "dxhq" : "T",
219
+ "dxq" : "D",
220
+ "ei" : "ऐ",
221
+ "ai" : "ऐ",
222
+ "eu" : "உ",
223
+ "gh" : "घ",
224
+ "gq" : "G",
225
+ "hq" : "H",
226
+ "jh" : "J",
227
+ "kh" : "ख",
228
+ "khq" : "K",
229
+ "kq" : "क",
230
+ "ln" : "ൾ",
231
+ "lw" : "ൽ",
232
+ "lx" : "ള",
233
+ "mq" : "M",
234
+ "nd" : "न",
235
+ "ng" : "ङ",
236
+ "nj" : "ञ",
237
+ "nk" : "Y",
238
+ "nw" : "ൺ",
239
+ "nx" : "ण",
240
+ "ou" : "औ",
241
+ "ph" : "P",
242
+ "rq" : "R",
243
+ "rqw" : "ॠ",
244
+ "rw" : "ർ",
245
+ "rx" : "र",
246
+ "sh" : "श",
247
+ "sx" : "ष",
248
+ "th" : "थ",
249
+ "tx" : "ट",
250
+ "txh" : "ठ",
251
+ "wv" : "W",
252
+ "zh" : "Z",
253
+ }
254
+
255
+ # Multilingual support for OOV characters
256
+ oov_map_json_file = 'multilingualcharmap.json'
257
+ with open(oov_map_json_file, 'r') as oov_file:
258
+ self.oov_map = json.load(oov_file)
259
+
260
+
261
+
262
+ def load_lang_dict(self, language, phone_dictionary):
263
+ # load dictionary for requested language
264
+ try:
265
+
266
+ dict_file = language
267
+ print("language", language)
268
+ dict_file_path = os.path.join(self.dict_location, dict_file)
269
+ print("dict_file_path", dict_file_path)
270
+ df = pd.read_csv(dict_file_path, delimiter=" ", header=None, dtype=str)
271
+ phone_dictionary[language] = df.set_index(0).to_dict('dict')[1]
272
+
273
+ dict_file = 'english'
274
+ dict_file_path = os.path.join(self.dict_location, dict_file)
275
+ df = pd.read_csv(dict_file_path, delimiter=" ", header=None, dtype=str)
276
+ phone_dictionary['english'] = df.set_index(0).to_dict('dict')[1]
277
+
278
+ except Exception as e:
279
+ print(traceback.format_exc())
280
+
281
+ return phone_dictionary
282
+
283
+ def __is_float(self, word):
284
+ parts = word.split('.')
285
+ if len(parts) != 2:
286
+ return False
287
+ return parts[0].isdecimal() and parts[1].isdecimal()
288
+
289
+ def en_g2p(self, word):
290
+ phn_out = self.g2p(word)
291
+ # print(f"phn_out: {phn_out}")
292
+ # iterate over the string list and replace each word with the corresponding value from the dictionary
293
+ for i, phn in enumerate(phn_out):
294
+ if phn in self.cmu_2_cls_map.keys():
295
+ phn_out[i] = self.cmu_2_cls_map[phn]
296
+ # cls_out = self.cmu_2_cls_map[phn]
297
+ if phn_out[i] in self.cls_2_chr_map.keys():
298
+ phn_out[i] = self.cls_2_chr_map[phn_out[i]]
299
+ else:
300
+ pass
301
+ else:
302
+ pass # ignore words that are not in the dictionary
303
+ # print(f"i: {i}, phn: {phn}, cls_out: {cls_out}, phn_out: {phn_out[i]}")
304
+ return ("".join(phn_out)).strip().replace(" ", "")
305
+
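`en_g2p` maps each CMU phone first to an IITM CLS label and then, where one exists, to a single-character code. The sketch below reproduces that two-step lookup with small excerpts of the two tables; the phone list is hard-coded as a stand-in for `G2p` output, and `phones_to_chars` is a name chosen for this example.

```python
# Excerpts of the two tables defined in Phonifier.__init__.
CMU_2_CLS = {"T": "tx", "IY1": "ii", "CH": "c", "ER0": "a r"}
CLS_2_CHR = {"tx": "ट", "ii": "I"}

def phones_to_chars(phones):
    """Map CMU phones to CLS labels, then to char codes where one is defined."""
    out = []
    for phn in phones:
        if phn in CMU_2_CLS:
            cls = CMU_2_CLS[phn]
            out.append(CLS_2_CHR.get(cls, cls))  # labels without a char code stay as-is
        else:
            out.append(phn)                       # unknown phones pass through unchanged
    return "".join(out).replace(" ", "")

# Hard-coded stand-in for the G2p output of "teacher".
print(phones_to_chars(["T", "IY1", "CH", "ER0"]))
```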
306
+ def __post_phonify(self, text, language, gender):
307
+ language_gender_id = language+'_'+gender
308
+ if language_gender_id in self.oov_map.keys():
309
+ output_string = ''
310
+ for char in text:
311
+ if char in self.oov_map[language_gender_id].keys():
312
+ output_string += self.oov_map[language_gender_id][char]
313
+ else:
314
+ output_string += char
315
+ # output_string += self.oov_map['language_gender_id']['char']
316
+ return output_string
317
+ else:
318
+ return text
319
+
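`__post_phonify` looks up a per-voice table keyed as `<language>_<gender>` in the map loaded from `multilingualcharmap.json` and replaces characters one at a time. The sketch below shows the same lookup as a standalone function; `apply_oov_map` and the four-entry table are illustrative, with the entries copied from the `tamil_male` table in the character map.

```python
def apply_oov_map(text, language, gender, oov_map):
    """Replace each character using the table for `<language>_<gender>`.

    Characters without an entry, and voices without a table, pass through unchanged.
    """
    table = oov_map.get(f"{language}_{gender}")
    if table is None:
        return text
    return "".join(table.get(ch, ch) for ch in text)

# Small excerpt of the tamil_male table, for illustration only.
oov_map = {"tamil_male": {"P": "p", "B": "b", "C": "c", "J": "j"}}

print(apply_oov_map("PaBa", "tamil", "male", oov_map))
print(apply_oov_map("PaBa", "hindi", "male", oov_map))
```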
320
+ def __is_english_word(self, word):
321
+ maxchar = max(word)
322
+ if u'\u0000' <= maxchar <= u'\u007f':
323
+ return True
324
+ return False
325
+
326
+ def __phonify(self, text, language, gender, phone_dictionary):
327
+ # text is expected to be a list of strings
328
+ words = set((" ".join(text)).split(" "))
329
+ #print(f"words test: {words}")
330
+ non_dict_words = []
331
+
332
+
333
+ if language in phone_dictionary:
334
+ for word in words:
335
+ # print(f"word: {word}")
336
+ if word not in phone_dictionary[language] and (language == "english" or (not self.__is_english_word(word))):
337
+ non_dict_words.append(word)
338
+ #print('INSIDE IF CONDITION OF ADDING WORDS')
339
+ else:
340
+ non_dict_words = list(words)
341
+ print(f"word not in dict: {non_dict_words}")
342
+
343
+ if len(non_dict_words) > 0:
344
+ # unified parser has to be run for the non dictionary words
345
+ os.makedirs("tmp", exist_ok=True)
346
+ timestamp = str(time.time())
347
+ non_dict_words_file = os.path.abspath("tmp/non_dict_words_" + timestamp)
348
+ out_dict_file = os.path.abspath("tmp/out_dict_" + timestamp)
349
+ with open(non_dict_words_file, "w") as f:
350
+ f.write("\n".join(non_dict_words))
351
+
352
+ if(language == 'tamil'):
353
+ current_directory = os.getcwd()
354
+ #tamil_parser_cmd = "tamil_parser.sh"
355
+ tamil_parser_cmd = f"{current_directory}/ssn_parser_new/tamil_parser.py"
356
+ #subprocess.run(["bash", tamil_parser_cmd, non_dict_words_file, out_dict_file, timestamp, "ssn_parser"])
357
+ subprocess.run(["python", tamil_parser_cmd, non_dict_words_file, out_dict_file, timestamp, f"{current_directory}/ssn_parser_new"])
358
+ elif(language == 'english'):
359
+ phn_out_dict = {}
360
+ for i in range(0,len(non_dict_words)):
361
+ phn_out_dict[non_dict_words[i]] = self.en_g2p(non_dict_words[i])
362
+ # Create a string representation of the dictionary
363
+ data_str = "\n".join([f"{key}\t{value}" for key, value in phn_out_dict.items()])
364
+ print(f"data_str: {data_str}")
365
+ with open(out_dict_file, "w") as f:
366
+ f.write(data_str)
367
+ else:
368
+
369
+ out_dict_file = os.path.abspath("tmp/out_dict_" + timestamp)
370
+ from get_phone_mapped_python import TextReplacer
371
+
372
+ from indic_unified_parser.uparser import wordparse
373
+
374
+ text_replacer=TextReplacer()
375
+ # def write_output_to_file(output_text, file_path):
376
+ # with open(file_path, 'w') as f:
377
+ # f.write(output_text)
378
+ parsed_output_list = []
379
+ for word in non_dict_words:
380
+ parsed_word = wordparse(word, 0, 0, 1)
381
+ parsed_output_list.append(parsed_word)
382
+ replaced_output_list = [text_replacer.apply_replacements(parsed_word) for parsed_word in parsed_output_list]
383
+ with open(out_dict_file, 'w', encoding='utf-8') as file:
384
+ for original_word, formatted_word in zip(non_dict_words, replaced_output_list):
385
+ line = f"{original_word}\t{formatted_word}\n"
386
+ file.write(line)
387
+ print(line, end='')
388
+
389
+
390
+ try:
391
+
392
+ df = pd.read_csv(out_dict_file, delimiter="\t", header=None, dtype=str)
393
+ #print('DATAFRAME OUTPUT FILE', df.head())
394
+ new_dict = df.dropna().set_index(0).to_dict('dict')[1]
395
+ #print("new dict",new_dict)
396
+ if language not in phone_dictionary:
397
+ phone_dictionary[language] = new_dict
398
+ else:
399
+ phone_dictionary[language].update(new_dict)
400
+ # run a non-blocking child process to update the dictionary file
401
+ #print("phone_dict", self.phone_dictionary)
402
+ p = Process(target=add_to_dictionary, args=(new_dict, os.path.join(self.dict_location, language)))
403
+ p.start()
404
+ except Exception as err:
405
+ print(f"Error: While loading {out_dict_file}")
406
+ traceback.print_exc()
407
+
408
+ # phonify text with dictionary
409
+ text_phonified = []
410
+ for phrase in text:
411
+ phrase_phonified = []
412
+ for word in phrase.split(" "):
413
+ if self.__is_english_word(word):
414
+ if word in phone_dictionary["english"]:
415
+ phrase_phonified.append(str(phone_dictionary["english"][word]))
416
+ else:
417
+ phrase_phonified.append(str(self.en_g2p(word)))
418
+ elif word in phone_dictionary[language]:
419
+ # if a word could not be parsed, skip it
420
+ phrase_phonified.append(str(phone_dictionary[language][word]))
421
+ # text_phonified.append(self.__post_phonify(" ".join(phrase_phonified),language, gender))
422
+ text_phonified.append(" ".join(phrase_phonified))
423
+ return text_phonified
424
+
425
+ def __merge_lists(self, lists):
426
+ merged_string = ""
427
+ for word_list in lists:
428
+ for word in word_list:
429
+ merged_string += word + " "
430
+ return merged_string.strip()
431
+
432
+ def __phonify_list(self, text, language, gender, phone_dictionary):
433
+ # text is expected to be a list of list of strings
434
+ words = set(self.__merge_lists(text).split(" "))
435
+ non_dict_words = []
436
+ if language in phone_dictionary:
437
+ for word in words:
438
+ if word not in phone_dictionary[language] and (language == "english" or (not self.__is_english_word(word))):
439
+ non_dict_words.append(word)
440
+ else:
441
+ non_dict_words = list(words)
442
+
443
+ if len(non_dict_words) > 0:
444
+ print(len(non_dict_words))
445
+ print(non_dict_words)
446
+ # unified parser has to be run for the non dictionary words
447
+ os.makedirs("tmp", exist_ok=True)
448
+ timestamp = str(time.time())
449
+ non_dict_words_file = os.path.abspath("tmp/non_dict_words_" + timestamp)
450
+ out_dict_file = os.path.abspath("tmp/out_dict_" + timestamp)
451
+ with open(non_dict_words_file, "w") as f:
452
+ f.write("\n".join(non_dict_words))
453
+
454
+ if(language == 'tamil'):
455
+ current_directory = os.getcwd()
456
+ #tamil_parser_cmd = "tamil_parser.sh"
457
+ tamil_parser_cmd = f"{current_directory}/ssn_parser_new/tamil_parser.py"
458
+ #subprocess.run(["bash", tamil_parser_cmd, non_dict_words_file, out_dict_file, timestamp, "ssn_parser"])
459
+ subprocess.run(["python", tamil_parser_cmd, non_dict_words_file, out_dict_file, timestamp, f"{current_directory}/ssn_parser_new"])
460
+
461
+ elif(language == 'english'):
462
+ phn_out_dict = {}
463
+ for i in range(0,len(non_dict_words)):
464
+ phn_out_dict[non_dict_words[i]] = self.en_g2p(non_dict_words[i])
465
+ # Create a string representation of the dictionary
466
+ data_str = "\n".join([f"{key}\t{value}" for key, value in phn_out_dict.items()])
467
+ print(f"data_str: {data_str}")
468
+ with open(out_dict_file, "w") as f:
469
+ f.write(data_str)
470
+ else:
471
+ out_dict_file = os.path.abspath("tmp/out_dict_" + timestamp)
472
+ from get_phone_mapped_python import TextReplacer
473
+
474
+ from indic_unified_parser.uparser import wordparse
475
+
476
+ text_replacer=TextReplacer()
477
+
478
+ parsed_output_list = []
479
+ for word in non_dict_words:
480
+ parsed_word = wordparse(word, 0, 0, 1)
481
+ parsed_output_list.append(parsed_word)
482
+ replaced_output_list = [text_replacer.apply_replacements(parsed_word) for parsed_word in parsed_output_list]
483
+ with open(out_dict_file, 'w', encoding='utf-8') as file:
484
+ for original_word, formatted_word in zip(non_dict_words, replaced_output_list):
485
+ line = f"{original_word}\t{formatted_word}\n"
486
+ file.write(line)
487
+ print(line, end='')
488
+
489
+ try:
490
+ df = pd.read_csv(out_dict_file, delimiter="\t", header=None, dtype=str)
491
+ new_dict = df.dropna().set_index(0).to_dict('dict')[1]
492
+ print(new_dict)
493
+ if language not in phone_dictionary:
494
+ phone_dictionary[language] = new_dict
495
+ else:
496
+ phone_dictionary[language].update(new_dict)
497
+ # run a non-blocking child process to update the dictionary file
498
+ p = Process(target=add_to_dictionary, args=(new_dict, os.path.join(self.dict_location, language)))
499
+ p.start()
500
+ except Exception as err:
501
+ traceback.print_exc()
502
+
503
+ # phonify text with dictionary
504
+ text_phonified = []
505
+ for line in text:
506
+ line_phonified = []
507
+ for phrase in line:
508
+ phrase_phonified = []
509
+ for word in phrase.split(" "):
510
+ if self.__is_english_word(word):
511
+ if word in phone_dictionary["english"]:
512
+ phrase_phonified.append(str(phone_dictionary["english"][word]))
513
+ else:
514
+ phrase_phonified.append(str(self.en_g2p(word)))
515
+ elif word in phone_dictionary[language]:
516
+ # if a word could not be parsed, skip it
517
+ phrase_phonified.append(str(phone_dictionary[language][word]))
518
+ # line_phonified.append(self.__post_phonify(" ".join(phrase_phonified), language, gender))
519
+ line_phonified.append(" ".join(phrase_phonified))
520
+ text_phonified.append(line_phonified)
521
+ return text_phonified
522
+
523
+ def phonify(self, text, language, gender, phone_dictionary):
524
+ if not isinstance(text, list):
525
+ out = self.__phonify([text], language, gender, phone_dictionary)
526
+ return out[0]
527
+ return self.__phonify(text, language, gender, phone_dictionary)
528
+
529
+ def phonify_list(self, text, language, gender, phone_dictionary):
530
+ if isinstance(text, list):
531
+ return self.__phonify_list(text, language, gender, phone_dictionary)
532
+ else:
533
+ print("Error!! Expected to have a list as input.")
534
+
535
+
536
+ class TextNormalizer:
537
+ def __init__(self, char_map_location=None):
538
+ # self.phonifier = phonifier
539
+ if char_map_location is None:
540
+ char_map_location = "charmap"
541
+
542
+ # this is a static set of cleaning rules to be applied
543
+ self.cleaning_rules = {
544
+ " +" : " ",
545
+ "^ +" : "",
546
+ " +$" : "",
547
+ "#$" : "",
548
+ "# +$" : "",
549
+ }
550
+
551
+ # this is the list of languages supported by num_to_words
552
+ self.keydict = {"english" : "en",
553
+ "hindi" : "hi",
554
+ "gujarati" : "gu",
555
+ "marathi" : "mr",
556
+ "bengali" : "bn",
557
+ "telugu" : "te",
558
+ "tamil" : "ta",
559
+ "kannada" : "kn",
560
+ "odia" : "or",
561
+ "punjabi" : "pa"
562
+ }
563
+
564
+ # self.g2p = G2p()
565
+ # print('Loading G2P model... Done!')
566
+
567
+ def __post_cleaning(self, text):
568
+ for key, replacement in self.cleaning_rules.items():
569
+ text = re.sub(key, replacement, text)
570
+ return text
571
+
572
+ def __post_cleaning_list(self, text):
573
+ # input is supposed to be a list of strings
574
+ output_text = []
575
+ for line in text:
576
+ for key, replacement in self.cleaning_rules.items():
577
+ line = re.sub(key, replacement, line)
578
+ output_text.append(line)
579
+ return output_text
580
+
581
+ def __check_char_type(self, str_c):
582
+ # Determine the type of the character
583
+ if str_c.isnumeric():
584
+ char_type = "number"
585
+ elif str_c in string.punctuation:
586
+ char_type = "punctuation"
587
+ elif str_c in string.whitespace:
588
+ char_type = "whitespace"
589
+ elif str_c.isalpha() and str_c.isascii():
590
+ char_type = "ascii"
591
+ else:
592
+ char_type = "non-ascii"
593
+ return char_type
594
+
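`__check_char_type` sorts each character into one of five buckets, and `insert_space` uses a change of bucket between neighbouring characters as the signal to add a space. The sketch below is a simplified standalone version of that idea; `char_type` and `split_on_type_change` are names chosen for this example.

```python
import re
import string

def char_type(c):
    """Classify a single character the way __check_char_type does."""
    if c.isnumeric():
        return "number"
    if c in string.punctuation:
        return "punctuation"
    if c in string.whitespace:
        return "whitespace"
    if c.isalpha() and c.isascii():
        return "ascii"
    return "non-ascii"

def split_on_type_change(text):
    """Insert a space wherever the character type changes to a non-separator type."""
    out = []
    prev = None
    for c in text:
        cur = char_type(c)
        if prev is not None and cur != prev and cur not in ("punctuation", "whitespace"):
            out.append(" ")
        out.append(c)
        prev = cur
    return re.sub(r" +", " ", "".join(out))

print(split_on_type_change("room5b"))
```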
595
+ def insert_space(self, text):
596
+ '''
597
+ Insert a space wherever numbers and English words run together without one.
598
+ '''
599
+ # Initialize variables to track the previous character type and whether a space should be inserted
600
+ prev_char_type = None
601
+ next_char_type = None
602
+ insert_space = False
603
+
604
+ # Output string
605
+ output_string = ""
606
+
607
+ # Iterate through each character in the text
608
+ for i, c in enumerate(text):
609
+ # Determine the type of the character
610
+ char_type = self.__check_char_type(c)
611
+ if i == (len(text) - 1):
612
+ next_char_type = None
613
+ else:
614
+ next_char_type = self.__check_char_type(text[i+1])
615
+ # print(f"{i}: {c} is a {char_type} character and next character is a {next_char_type}")
616
+
617
+ # If the character type has changed from the previous character, check if a space should be inserted
618
+ if (char_type != prev_char_type and prev_char_type != None and char_type != "punctuation" and char_type != "whitespace"):
619
+ if next_char_type != "punctuation" or next_char_type != "whitespace":
620
+ insert_space = True
621
+
622
+ # Insert a space if needed
623
+ if insert_space:
624
+ output_string += " "+c
625
+ insert_space = False
626
+ else:
627
+ output_string += c
628
+
629
+ # Update the previous character type
630
+ prev_char_type = char_type
631
+
632
+ # Print the modified text
633
+ output_string = re.sub(r' +', ' ', output_string)
634
+ return output_string
635
+
636
+ def insert_space_list(self, text):
637
+ '''
638
+ Expects the input to be a list of strings.
639
+ Inserts a space wherever numbers and English words run together without one.
640
+ '''
641
+ # Output string list
642
+ output_list = []
643
+
644
+ for line in text:
645
+ # Initialize variables to track the previous character type and whether a space should be inserted
646
+ prev_char_type = None
647
+ next_char_type = None
648
+ insert_space = False
649
+ # Output string
650
+ output_string = ""
651
+ # Iterate through each character in the line
652
+ for i, c in enumerate(line):
653
+ # Determine the type of the character
654
+ char_type = self.__check_char_type(c)
655
+ if i == (len(line) - 1):
656
+ next_char_type = None
657
+ else:
658
+ next_char_type = self.__check_char_type(line[i+1])
659
+ # print(f"{i}: {c} is a {char_type} character and next character is a {next_char_type}")
660
+
661
+ # If the character type has changed from the previous character, check if a space should be inserted
662
+ if (char_type != prev_char_type and prev_char_type != None and char_type != "punctuation" and char_type != "whitespace"):
663
+ if next_char_type != "punctuation" or next_char_type != "whitespace":
664
+ insert_space = True
665
+
666
+ # Insert a space if needed
667
+ if insert_space:
668
+ output_string += " "+c
669
+ insert_space = False
670
+ else:
671
+ output_string += c
672
+
673
+ # Update the previous character type
674
+ prev_char_type = char_type
675
+
676
+ # Print the modified line
677
+ output_string = re.sub(r' +', ' ', output_string)
678
+ output_list.append(output_string)
679
+ return output_list
680
+
681
+ def num2text(self, text, language):
682
+ if language in self.keydict.keys():
683
+ digits = sorted(list(map(int, re.findall(r'\d+', text))),reverse=True)
684
+ if digits:
685
+ for digit in digits:
686
+ text = re.sub(str(digit), ' '+num_to_word(digit, self.keydict[language])+' ', text)
687
+ return self.__post_cleaning(text)
688
+ else:
689
+ print(f"No num-to-char for the given language {language}.")
690
+ return self.__post_cleaning(text)
691
+
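`num2text` replaces numbers from largest to smallest, presumably so that a short number is not substituted inside a longer one (replacing `1` first would corrupt `12`). The sketch below illustrates the ordering with a hypothetical stand-in converter, `spell_digits`, in place of `num_to_word`; it is not the commit's implementation.

```python
import re

def spell_digits(n):
    """Hypothetical stand-in for num_to_word: spells each digit separately."""
    names = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
    return " ".join(names[int(d)] for d in str(n))

def num2text(text, to_words=spell_digits):
    """Replace every integer in text with words, largest value first."""
    numbers = sorted(map(int, re.findall(r"\d+", text)), reverse=True)
    for n in numbers:
        text = re.sub(str(n), " " + to_words(n) + " ", text)
    # Collapse the padding spaces added around each replacement.
    return re.sub(r" +", " ", text).strip()

print(num2text("12 apples and 1 pear"))
```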
+     def num2text_list(self, text, language):
+         # input is supposed to be a list of strings
+         if language in self.keydict.keys():
+             output_text = []
+             for line in text:
+                 digits = sorted(list(map(int, re.findall(r'\d+', line))),reverse=True)
+                 if digits:
+                     for digit in digits:
+                         line = re.sub(str(digit), ' '+num_to_word(digit, self.keydict[language])+' ', line)
+                 output_text.append(line)
+             return self.__post_cleaning_list(output_text)
+         else:
+             print(f"No num-to-char for the given language {language}.")
+             return self.__post_cleaning_list(text)
+
+     def numberToTextConverter(self, text, language):
+         if language in self.keydict.keys():
+             # Match decimals first so "3.14" is captured whole rather than as "3" and "14"
+             matches = re.findall(r'\d+\.\d+|\d+', text)
+             # Integers become int, decimals stay as strings; largest values are replaced first
+             digits = sorted([int(match) if match.isdigit() else match for match in matches], key=float, reverse=True)
+             if digits:
+                 for digit in digits:
+
+                     if isinstance(digit, int):
+                         text = re.sub(str(digit), ' '+num_to_word(digit, self.keydict[language]).replace(",", "")+' ', text)
+                     else:
+                         parts = str(digit).split('.')
+                         integer_part = int(parts[0])
+                         data1 = num_to_word(integer_part, self.keydict[language]).replace(",", "")
+                         decimal_part = str(parts[1])
+                         data2 = ''
+                         # Read the fractional part digit by digit
+                         for i in decimal_part:
+                             data2 = data2+' '+num_to_word(i, self.keydict[language])
+                         if language == 'hindi':
+                             final_data = f'{data1} दशमलव {data2}'
+                         elif language == 'tamil':
+                             final_data = f'{data1} புள்ளி {data2}'
+                         else:
+                             final_data = f'{data1} point {data2}'
+
+                         # Escape the pattern so "." matches a literal decimal point, not any character
+                         text = re.sub(re.escape(str(digit)), ' '+final_data+' ', text)
+
+             return self.__post_cleaning(text)
+         else:
+
+             words = {
+                 '0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
+                 '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'
+             }
+
+             # Use regular expression to find and replace decimal points in numbers
+             text = re.sub(r'(?<=\d)\.(?=\d)', ' point ', text)
+
+             # Find the digits that follow each decimal point and spell them out one by one
+             matches = re.findall(r'point (\d+)', text)
+
+             for match in matches:
+                 replacement = ' '.join(words[digit] for digit in match)
+                 text = text.replace(f'point {match}', f'point {replacement}', 1)
+
+             return self.__post_cleaning(text)
+
+
+     def normalize(self, text, language):
+         return self.__post_cleaning(text)
+
+     def normalize_list(self, text, language):
+         # input is supposed to be a list of strings
+         return self.__post_cleaning_list(text)
+
+
+ class TextPhrasifier:
+     @classmethod
+     def phrasify(cls, text):
+         phrase_list = []
+         for phrase in text.split("#"):
+             phrase = phrase.strip()
+             if phrase != "":
+                 phrase_list.append(phrase)
+         return phrase_list
+
+ class TextPhrasifier_List:
+     @classmethod
+     def phrasify(cls, text):
+         # input is supposed to be a list of strings
+         # output is list of list of strings
+         output_list = []
+         for line in text:
+             phrase_list = []
+             for phrase in line.split("#"):
+                 phrase = phrase.strip()
+                 if phrase != "":
+                     phrase_list.append(phrase)
+             output_list.append(phrase_list)
+         return output_list
+
+ class DurAlignTextProcessor:
+     def __init__(self):
+         # this is a static set of cleaning rules to be applied
+         # rules run in insertion order: collapse repeated spaces,
+         # prefix "$" at the start of the string, append "." at the end
+         self.cleaning_rules = {
+             " +" : " ",
+             "^" : "$",
+             "$" : ".",
+         }
+         # English variant: no "$" prefix, only space collapsing and a trailing "."
+         self.cleaning_rules_English = {
+             " +" : " ",
+             "$" : ".",
+         }
+     def textProcesor(self, text):
+         # input is a list of strings; it is modified in place and returned
+         for key, replacement in self.cleaning_rules.items():
+             for idx in range(0,len(text)):
+                 text[idx] = re.sub(key, replacement, text[idx])
+
+         return text
+
+     def textProcesorForEnglish(self, text):
+         for key, replacement in self.cleaning_rules_English.items():
+             for idx in range(0,len(text)):
+                 text[idx] = re.sub(key, replacement, text[idx])
+
+         return text
+
+     def textProcesor_list(self, text):
+         # input expected in 'list of list of string' format
+         output_text = []
+         for line in text:
+             for key, replacement in self.cleaning_rules.items():
+                 for idx in range(0,len(line)):
+                     line[idx] = re.sub(key, replacement, line[idx])
+             output_text.append(line)
+
+         return output_text
+
+
+
+
+ class SharedInit:
+     # The default components are created once, when this class is defined,
+     # and are shared by every preprocessor that does not pass its own.
+     def __init__(self,
+                  text_cleaner = TextCleaner(),
+                  text_normalizer=TextNormalizer(),
+                  phonifier = Phonifier(),
+                  text_phrasefier = TextPhrasifier(),
+                  post_processor = DurAlignTextProcessor()):
+         self.text_cleaner = text_cleaner
+         self.text_normalizer = text_normalizer
+         self.phonifier = phonifier
+         self.text_phrasefier = text_phrasefier
+         self.post_processor = post_processor
+
+
+
+ class TTSDurAlignPreprocessor(SharedInit):
+
+     def preprocess(self, text, language, gender, phone_dictionary):
+         # text = text.strip()
+         #print(text)
+         text = self.text_normalizer.numberToTextConverter(text, language)
+         text = self.text_cleaner.clean(text)
+         #print("cleaned text", text)
+         # text = self.text_normalizer.insert_space(text)
+         #text = self.text_normalizer.num2text(text, language)
+         # print(text)
+         text = self.text_normalizer.normalize(text, language)
+         # print(text)
+         phrasified_text = TextPhrasifier.phrasify(text)
+         #print("phrased",phrasified_text)
+
+         if language not in list(phone_dictionary.keys()):
+             phone_dictionary = self.phonifier.load_lang_dict(language, phone_dictionary)
+
+         #print(phone_dictionary.keys())
+
+         phonified_text = self.phonifier.phonify(phrasified_text, language, gender, phone_dictionary)
+         #print("phonetext",phonified_text)
+         phonified_text = self.post_processor.textProcesor(phonified_text)
+         #print(phonified_text)
+         return phonified_text, phrasified_text
+
+ class TTSDurAlignPreprocessor_VTT(SharedInit):
+
+     def preprocess(self, text, language, gender):
+         # text = text.strip()
+         text = self.text_cleaner.clean_list(text)
+         # text = self.text_normalizer.insert_space_list(text)
+         text = self.text_normalizer.num2text_list(text, language)
+         text = self.text_normalizer.normalize_list(text, language)
+         phrasified_text = TextPhrasifier_List.phrasify(text)
+         phonified_text = self.phonifier.phonify_list(phrasified_text, language, gender)
+         phonified_text = self.post_processor.textProcesor_list(phonified_text)
+         return phonified_text, phrasified_text
+
+
+ class CharTextPreprocessor(SharedInit):
+
+     def preprocess(self, text, language, gender=None, phone_dictionary=None):
+         text = text.strip()
+         text = self.text_normalizer.numberToTextConverter(text, language)
+         text = self.text_cleaner.clean(text)
+         # text = self.text_normalizer.insert_space(text)
+         #text = self.text_normalizer.num2text(text, language)
+         text = self.text_normalizer.normalize(text, language)
+         phrasified_text = TextPhrasifier.phrasify(text)
+         phonified_text = phrasified_text # No phonification for character TTS models
+         return phonified_text, phrasified_text
+
+ class CharTextPreprocessor_VTT(SharedInit):
+
+     def preprocess(self, text, language, gender=None):
+         # text = text.strip()
+         text = self.text_cleaner.clean_list(text)
+         # text = self.text_normalizer.insert_space_list(text)
+         text = self.text_normalizer.num2text_list(text, language)
+         text = self.text_normalizer.normalize_list(text, language)
+         phrasified_text = TextPhrasifier_List.phrasify(text)
+         phonified_text = phrasified_text # No phonification for character TTS models
+         return phonified_text, phrasified_text
+
+
+ class TTSPreprocessor(SharedInit):
+
+     def preprocess(self, text, language, gender, phone_dictionary):
+         text = text.strip()
+         text = self.text_normalizer.numberToTextConverter(text, language)
+         text = self.text_cleaner.clean(text)
+         # text = self.text_normalizer.insert_space(text)
+         #text = self.text_normalizer.num2text(text, language)
+         text = self.text_normalizer.normalize(text, language)
+         phrasified_text = TextPhrasifier.phrasify(text)
+         if language not in list(phone_dictionary.keys()):
+             phone_dictionary = self.phonifier.load_lang_dict(language, phone_dictionary)
+         phonified_text = self.phonifier.phonify(phrasified_text, language, gender, phone_dictionary)
+         #print(phonified_text)
+         phonified_text = self.post_processor.textProcesorForEnglish(phonified_text)
+         #print(phonified_text)
+         return phonified_text, phrasified_text
+
+ class TTSPreprocessor_VTT(SharedInit):
+
+     def preprocess(self, text, language, gender):
+         # print(f"Original text: {text}")
+         text = self.text_cleaner.clean_list(text)
+         # print(f"After text cleaner: {text}")
+         # text = self.text_normalizer.insert_space_list(text)
+         # print(f"After insert space: {text}")
+         text = self.text_normalizer.num2text_list(text, language)
+         # print(f"After num2text: {text}")
+         text = self.text_normalizer.normalize_list(text, language)
+         # print(f"After text normalizer: {text}")
+         phrasified_text = TextPhrasifier_List.phrasify(text)
+         # print(f"phrasified_text: {phrasified_text}")
+         phonified_text = self.phonifier.phonify_list(phrasified_text, language, gender)
+         # print(f"phonified_text: {phonified_text}")
+         return phonified_text, phrasified_text