ANLGBOY committed on
Commit 75e6727 · 1 Parent(s): 53f3023
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.img filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,5 @@
window.json
filter_bank.json
style_extractor.onnx
*.yml
*.npy
LICENSE ADDED
@@ -0,0 +1,209 @@
BigScience Open RAIL-M License
dated August 18, 2022

Section I: PREAMBLE

This Open RAIL-M License was created by BigScience, a collaborative open innovation project aimed at the responsible development and use of large multilingual datasets and Large Language Models (“LLMs”). While a similar license was originally designed for the BLOOM model, we decided to adapt it and create this license in order to propose a general open and responsible license applicable to other machine learning based AI models (e.g. multimodal generative models).

In short, this license strives for both the open and responsible downstream use of the accompanying model. When it comes to the open character, we took inspiration from open source permissive licenses regarding the grant of IP rights. Referring to the downstream responsible use, we added use-based restrictions not permitting the use of the Model in very specific scenarios, in order for the licensor to be able to enforce the license in case potential misuses of the Model may occur. Even though downstream derivative versions of the model could be released under different licensing terms, the latter will always have to include - at minimum - the same use-based restrictions as the ones in the original license (this license).

The development and use of artificial intelligence (“AI”) does not come without concerns. The world has witnessed how AI techniques may, in some instances, become risky for the public in general. These risks come in many forms, from racial discrimination to the misuse of sensitive information.

BigScience believes in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI. This License governs the use of the model (and its derivatives) and is informed by the model card associated with the model.

NOW THEREFORE, You and Licensor agree as follows:

1. Definitions
(a) "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
(b) “Data” means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
(c) “Output” means the results of operating a Model as embodied in informational content resulting therefrom.
(d) “Model” means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
(e) “Derivatives of the Model” means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
(f) “Complementary Material” means the accompanying source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, etc, if any.
(g) “Distribution” means any transmission, reproduction, publication or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means - e.g. API-based or web access.
(h) “Licensor” means the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
(i) "You" (or "Your") means an individual or Legal Entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application - e.g. chatbot, translator, image generator.
(j) “Third Parties” means individuals or legal entities that are not under common control with Licensor or You.
(k) "Contribution" means any work of authorship, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
(l) "Contributor" means Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.

Section II: INTELLECTUAL PROPERTY RIGHTS

Both copyright and patent grants apply to the Model, Derivatives of the Model and Complementary Material. The Model and Derivatives of the Model are subject to additional terms as described in Section III.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.

3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.

Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION

4. Distribution and Redistribution. You may host for Third Party remote access purposes (e.g. software-as-a-service), reproduce and distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the following conditions:

a. Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply to the use of Complementary Material.

b. You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;

c. You must cause any modified files to carry prominent notices stating that You changed the files;

d. You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model, Derivatives of the Model.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions - respecting paragraph 4.a. - for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.

5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Therefore You cannot use the Model and the Derivatives of the Model for the specified restricted uses. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph (paragraph 5).

6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are accountable for the Output you generate and its subsequent uses. No use of the output can contravene any provision as stated in the License.

Section IV: OTHER PROVISIONS

7. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model through electronic means, or modify the Output of the Model based on updates. You shall undertake reasonable efforts to use the latest version of the Model.

8. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.

9. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model and the Complementary Material (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model, Derivatives of the Model, and the Complementary Material and assume any risks associated with Your exercise of permissions under this License.

10. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model and the Complementary Material (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

11. Accepting Warranty or Additional Liability. While redistributing the Model, Derivatives of the Model and the Complementary Material thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

12. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.

END OF TERMS AND CONDITIONS

Attachment A

Use Restrictions

You agree not to use the Model or Derivatives of the Model:
(a) In any way that violates any applicable national, federal, state, local or international law or regulation;
(b) For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
(c) To generate or disseminate verifiably false information and/or content with the purpose of harming others;
(d) To generate or disseminate personal identifiable information that can be used to harm an individual;
(e) To generate or disseminate information and/or content (e.g. images, code, posts, articles), and place the information and/or content in any context (e.g. bot generating tweets) without expressly and intelligibly disclaiming that the information and/or content is machine generated;
(f) To defame, disparage or otherwise harass others;
(g) To impersonate or attempt to impersonate (e.g. deepfakes) others without their consent;
(h) For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
(i) For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
(j) To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
(k) For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
(l) To provide medical advice and medical results interpretation;
(m) To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
README.md ADDED
@@ -0,0 +1,123 @@
---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- text-to-speech
- speech-synthesis
- tts
- onnx
library_name: supertonic
---

# Supertonic 2 — Lightning-Fast, On-Device, Multilingual TTS

![Supertonic Preview](img/supertonic_preview_0.1.jpg)

<p align="center">
<a href="https://huggingface.co/spaces/Supertone/supertonic-2"><img src="https://img.shields.io/badge/🤗_Demo-Hugging_Face-yellow?style=for-the-badge" alt="Demo"></a>
<a href="https://github.com/supertone-inc/supertonic"><img src="https://img.shields.io/badge/💻_Code-GitHub-black?style=for-the-badge&logo=github" alt="Code"></a>
</p>

**Supertonic** is a lightning-fast, on-device text-to-speech system designed for **extreme performance** with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device—no cloud, no API calls, no privacy concerns.
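The pipeline ships as four ONNX graphs under `onnx/` in this repository. A minimal loading sketch with ONNX Runtime is below; the stage ordering in the comments is inferred from the file names (the authoritative call order, and the per-graph input/output tensor names, live in the GitHub code linked above):

```python
from pathlib import Path

# The four ONNX graphs added to this repo; ordering inferred from names.
PIPELINE_FILES = [
    "text_encoder.onnx",        # text -> encoder states
    "duration_predictor.onnx",  # predicts per-token durations
    "vector_estimator.onnx",    # flow-matching vector-field estimator
    "vocoder.onnx",             # latents -> 44.1 kHz waveform
]

def load_pipeline(model_dir: str) -> dict:
    """Create one ONNX Runtime session per pipeline stage (CPU provider)."""
    import onnxruntime as ort  # requires `pip install onnxruntime`
    sessions = {}
    for name in PIPELINE_FILES:
        path = Path(model_dir) / name
        if not path.exists():
            raise FileNotFoundError(path)
        sessions[name] = ort.InferenceSession(
            str(path), providers=["CPUExecutionProvider"]
        )
    return sessions
```

On GPU-capable builds of `onnxruntime`, swapping in a different execution provider (e.g. CUDA) is the usual way to accelerate the same graphs.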

## What's New in Supertonic 2

**Supertonic 2** extends multilingual capabilities while maintaining the same inference speed and efficiency as the original.

### 🌍 Multilingual Support

| Language | Code |
|----------|------|
| English | `en` |
| Korean | `ko` |
| Spanish | `es` |
| Portuguese | `pt` |
| French | `fr` |

### ⚡ Same Speed, More Languages

- **No speed degradation**: Supertonic 2 delivers the same ultra-fast inference speed as the original—up to **167× faster than real-time**
- **Efficient architecture**: Only **66M parameters**, optimized for on-device deployment
- **Cross-language consistency**: All supported languages share the same model architecture and inference pipeline
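As a rough sanity check on the 66M-parameter figure, the four ONNX files added in this commit sum to about 263 MB; assuming float32 weights dominate the file contents (ONNX files also carry graph metadata, so this is only a back-of-the-envelope estimate), that works out to roughly 66M parameters:

```python
# Git LFS sizes (bytes) of the ONNX files listed in this repository.
onnx_sizes = {
    "duration_predictor.onnx": 1_521_526,
    "text_encoder.onnx": 27_431_318,
    "vector_estimator.onnx": 132_471_364,
    "vocoder.onnx": 101_405_066,
}

total_bytes = sum(onnx_sizes.values())
approx_params = total_bytes / 4  # assumes 4-byte float32 weights

print(f"{total_bytes / 1e6:.1f} MB ≈ {approx_params / 1e6:.0f}M parameters")
```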

## Performance

We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).

**Metrics:**
- **Characters per Second**: Measures throughput by dividing the number of input characters by the time required to generate audio. Higher is better.
- **Real-time Factor (RTF)**: Measures the time taken to synthesize audio relative to its duration. Lower is better (e.g., an RTF of 0.1 means it takes 0.1 seconds to generate one second of audio).
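Both metrics can be computed directly from a single synthesis run; a minimal sketch (the timing values in the example are illustrative, not measurements):

```python
def chars_per_second(num_chars: int, synthesis_seconds: float) -> float:
    """Throughput: input characters divided by generation time (higher is better)."""
    return num_chars / synthesis_seconds

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time to synthesize relative to the audio's duration (lower is better)."""
    return synthesis_seconds / audio_seconds

# Matches the definition above: generating one second of audio
# in 0.1 seconds gives an RTF of 0.1.
print(real_time_factor(synthesis_seconds=0.1, audio_seconds=1.0))  # 0.1
```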

### Characters per Second

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 912 | 1048 | 1263 |
| **Supertonic** (M4 Pro - WebGPU) | 996 | 1801 | 2509 |
| **Supertonic** (RTX4090) | 2615 | 6548 | 12164 |
| `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 144 | 209 | 287 |
| `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 37 | 55 | 82 |
| `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 12 | 18 | 24 |
| `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 38 | 64 | 92 |
| `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 104 | 107 | 117 |
| `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 37 | 42 | 47 |

> **Notes:**
> - `API` = cloud-based API services (measured from Seoul)
> - `Open` = open-source models
> - Supertonic (M4 Pro - CPU) and (M4 Pro - WebGPU): tested with ONNX
> - Supertonic (RTX4090): tested with the PyTorch model
> - Kokoro: tested on M4 Pro CPU with ONNX
> - NeuTTS Air: tested on M4 Pro CPU with Q8-GGUF

### Real-time Factor

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 0.015 | 0.013 | 0.012 |
| **Supertonic** (M4 Pro - WebGPU) | 0.014 | 0.007 | 0.006 |
| **Supertonic** (RTX4090) | 0.005 | 0.002 | 0.001 |
| `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 0.133 | 0.077 | 0.057 |
| `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 0.471 | 0.302 | 0.201 |
| `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 1.060 | 0.673 | 0.541 |
| `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 0.372 | 0.206 | 0.163 |
| `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 0.144 | 0.124 | 0.126 |
| `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 0.390 | 0.338 | 0.343 |
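The "up to 167× faster than real-time" figure quoted earlier is consistent with the best on-device RTF in this table: speedup over real time is simply the reciprocal of the RTF.

```python
# Speedup over real time = 1 / RTF.
rtf_webgpu_long = 0.006  # Supertonic, M4 Pro WebGPU, 266-char input (table above)

speedup = 1 / rtf_webgpu_long
print(f"{speedup:.0f}x faster than real-time")  # 167x faster than real-time
```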

<details>
<summary><b>Additional Performance Data (5-step inference)</b></summary>

<br>

**Characters per Second (5-step)**

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 596 | 691 | 850 |
| **Supertonic** (M4 Pro - WebGPU) | 570 | 1118 | 1546 |
| **Supertonic** (RTX4090) | 1286 | 3757 | 6242 |

**Real-time Factor (5-step)**

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| **Supertonic** (M4 Pro - CPU) | 0.023 | 0.019 | 0.018 |
| **Supertonic** (M4 Pro - WebGPU) | 0.024 | 0.012 | 0.010 |
| **Supertonic** (RTX4090) | 0.011 | 0.004 | 0.002 |

</details>

## License

This project’s sample code is released under the MIT License; see the [LICENSE](https://github.com/supertone-inc/supertonic?tab=MIT-1-ov-file) for details.

The accompanying model is released under the OpenRAIL-M License; see the [LICENSE](https://huggingface.co/Supertone/supertonic-2/blob/main/LICENSE) file for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project; see the [LICENSE](https://docs.pytorch.org/FBGEMM/general/License.html) for details.

Copyright (c) 2026 Supertone Inc.
config.json ADDED
@@ -0,0 +1,5 @@
{
  "model_name": "Supertonic 2",
  "model_type": "onnx",
  "description": "This is a stub config for Hugging Face download counting. The actual model is located at onnx/"
}
img/supertonic_preview_0.1.jpg ADDED

Git LFS Details

  • SHA256: 4648945559928f84ad00aa91c76ef6bf1d29f60617f81114e49afaa8c4f390df
  • Pointer size: 131 Bytes
  • Size of remote file: 785 kB
onnx/duration_predictor.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d556b3691165c364be91dc0bd894656b5949f5acd2750d8ec2f954010845011
size 1521526
onnx/text_encoder.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd5f535ed629f7df86071043e15f541ce1b2ab7f1bdbce4c7892b307bca79fa3
size 27431318
onnx/tts.json ADDED
@@ -0,0 +1,316 @@
{
  "tts_version": "v1.6.0",
  "split": "opensource-multilingual",
  "ttl_ckpt_path": "unknown.pt",
  "dp_ckpt_path": "unknown.pt",
  "ae_ckpt_path": "unknown.pt",
  "ttl_train": "unknown",
  "dp_train": "unknown",
  "ae_train": "unknown",
  "ttl": {
    "latent_dim": 24,
    "chunk_compress_factor": 6,
    "batch_expander": { "n_batch_expand": 6 },
    "normalizer": { "scale": 0.25 },
    "text_encoder": {
      "char_dict_path": "resources/metadata/char_dict/opensource-multilingual2/char_dict.json",
      "text_embedder": {
        "char_dict_path": "resources/metadata/char_dict/opensource-multilingual2/char_dict.json",
        "char_emb_dim": 256
      },
      "convnext": { "idim": 256, "ksz": 5, "intermediate_dim": 1024, "num_layers": 6, "dilation_lst": [1, 1, 1, 1, 1, 1] },
      "attn_encoder": { "hidden_channels": 256, "filter_channels": 1024, "n_heads": 4, "n_layers": 4, "p_dropout": 0.1 },
      "proj_out": { "idim": 256, "odim": 256 }
    },
    "flow_matching": { "sig_min": 0 },
    "style_encoder": {
      "proj_in": { "ldim": 24, "chunk_compress_factor": 6, "odim": 256 },
      "convnext": { "idim": 256, "ksz": 5, "intermediate_dim": 1024, "num_layers": 6, "dilation_lst": [1, 1, 1, 1, 1, 1] },
      "style_token_layer": { "input_dim": 256, "n_style": 50, "style_key_dim": 256, "style_value_dim": 256, "prototype_dim": 256, "n_units": 256, "n_heads": 2 }
    },
    "speech_prompted_text_encoder": { "text_dim": 256, "style_dim": 256, "n_units": 256, "n_heads": 2 },
    "uncond_masker": { "prob_both_uncond": 0.04, "prob_text_uncond": 0.01, "std": 0.1, "text_dim": 256, "n_style": 50, "style_key_dim": 256, "style_value_dim": 256 },
    "vector_field": {
      "proj_in": { "ldim": 24, "chunk_compress_factor": 6, "odim": 512 },
      "time_encoder": { "time_dim": 64, "hdim": 256 },
      "main_blocks": {
        "n_blocks": 4,
        "time_cond_layer": { "idim": 512, "time_dim": 64 },
        "style_cond_layer": { "idim": 512, "style_dim": 256 },
        "text_cond_layer": { "idim": 512, "text_dim": 256, "n_heads": 4, "use_residual": true, "rotary_base": 10000, "rotary_scale": 10 },
        "convnext_0": { "idim": 512, "ksz": 5, "intermediate_dim": 1024, "num_layers": 4, "dilation_lst": [1, 2, 4, 8] },
        "convnext_1": { "idim": 512, "ksz": 5, "intermediate_dim": 1024, "num_layers": 1, "dilation_lst": [1] },
        "convnext_2": { "idim": 512, "ksz": 5, "intermediate_dim": 1024, "num_layers": 1, "dilation_lst": [1] }
      },
      "last_convnext": { "idim": 512, "ksz": 5, "intermediate_dim": 1024, "num_layers": 4, "dilation_lst": [1, 1, 1, 1] },
      "proj_out": { "idim": 512, "chunk_compress_factor": 6, "ldim": 24 }
    }
  },
  "ae": {
    "sample_rate": 44100,
    "n_delay": 0,
    "base_chunk_size": 512,
    "chunk_compress_factor": 1,
    "ldim": 24,
    "encoder": {
      "spec_processor": { "n_fft": 2048, "win_length": 2048, "hop_length": 512, "n_mels": 228, "sample_rate": 44100, "eps": 1e-05, "norm_mean": 0.0, "norm_std": 1.0 },
      "ksz_init": 7,
      "ksz": 7,
      "num_layers": 10,
      "dilation_lst": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      "intermediate_dim": 2048,
      "idim": 1253,
      "hdim": 512,
      "odim": 24
    },
    "decoder": {
      "ksz_init": 7,
      "ksz": 7,
      "num_layers": 10,
      "dilation_lst": [1, 2, 4, 1, 2, 4, 1, 1, 1, 1],
      "intermediate_dim": 2048,
      "idim": 24,
      "hdim": 512,
      "head": { "idim": 512, "hdim": 2048, "odim": 512, "ksz": 3 }
    }
  },
  "dp": {
    "latent_dim": 24,
    "chunk_compress_factor": 6,
    "normalizer": { "scale": 1.0 },
    "sentence_encoder": {
      "char_emb_dim": 64,
      "char_dict_path": "resources/metadata/char_dict/opensource-multilingual2/char_dict.json",
      "text_embedder": {
        "char_dict_path": "resources/metadata/char_dict/opensource-multilingual2/char_dict.json",
        "char_emb_dim": 64
      },
      "convnext": { "idim": 64, "ksz": 5, "intermediate_dim": 256, "num_layers": 6, "dilation_lst": [1, 1, 1, 1, 1, 1] },
      "attn_encoder": { "hidden_channels": 64, "filter_channels": 256, "n_heads": 2, "n_layers": 2, "p_dropout": 0.0 },
      "proj_out": { "idim": 64, "odim": 64 }
    },
    "style_encoder": {
      "proj_in": { "ldim": 24, "chunk_compress_factor": 6, "odim": 64 },
      "convnext": { "idim": 64, "ksz": 5, "intermediate_dim": 256, "num_layers": 4, "dilation_lst": [1, 1, 1, 1] },
      "style_token_layer": { "input_dim": 64, "n_style": 8, "style_key_dim": 0, "style_value_dim": 16, "prototype_dim": 64, "n_units": 64, "n_heads": 2 }
    },
    "predictor": { "sentence_dim": 64, "n_style": 8, "style_dim": 16, "hdim": 128, "n_layer": 2 }
  }
}
onnx/unicode_indexer.json ADDED
The diff for this file is too large to render. See raw diff
 
onnx/vector_estimator.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:105e9d66fd8756876b210a6b4aa03fc393b1eaca3a8dadcc8d9a3bc785c86a35
size 132471364
onnx/vocoder.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19bd51f47a186069c752403518a40f7ea4c647455056d2511f7249691ecddf7c
size 101405066
voice_styles/F1.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F2.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F3.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F4.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F5.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M1.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M2.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M3.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M4.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M5.json ADDED
The diff for this file is too large to render. See raw diff