HFHash789 committed on
Commit dc6a411 · verified · 1 Parent(s): 2752ea2

Upload folder using huggingface_hub

Files changed (10)
  1. .dockerignore +17 -0
  2. .gitignore +17 -0
  3. Dockerfile +26 -0
  4. LICENSE +674 -0
  5. README.md +237 -6
  6. app.py +331 -0
  7. requirements.txt +10 -0
  8. templates/index.html +343 -0
  9. 启动服务.bat +24 -0
  10. 安装N卡GPU支持.bat +20 -0
.dockerignore ADDED
@@ -0,0 +1,17 @@
+ .git
+ .gitignore
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .pytest_cache
+ .mypy_cache
+ .ruff_cache
+
+ # Local artifacts
+ models
+ *.wav
+ *.mp3
+
+ # Windows helpers
+ *.bat
.gitignore ADDED
@@ -0,0 +1,17 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ runtime/
+ models/
+ dist/
+ wheels/
+ *.egg-info
+ tools
+ # Virtual environments
+ .venv
+ venv
+ *.7z
+ *.exe
+ *.001
+ *.002
Dockerfile ADDED
@@ -0,0 +1,26 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ # System deps:
+ # - ffmpeg: mp3 encode/decode + prompt audio convert
+ # - libsndfile1: required by python-soundfile
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     libsndfile1 \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ ENV HOST=0.0.0.0
+ ENV PORT=7860
+ EXPOSE 7860
+
+ CMD ["python", "-u", "app.py"]
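Reviewer sketch: the Dockerfile sets HOST and PORT and then starts app.py, but how app.py consumes those variables is not visible in this part of the diff. The snippet below is only an assumption about one reasonable way to read them; the helper name `resolve_bind_address` is hypothetical and does not come from the commit.

```python
import os


def resolve_bind_address(env=None):
    """Return (host, port) from HOST/PORT, falling back to the Dockerfile values.

    Hypothetical helper for illustration; not taken from app.py.
    """
    env = os.environ if env is None else env
    host = env.get("HOST", "0.0.0.0")
    # PORT arrives as a string from the environment, so convert it explicitly.
    port = int(env.get("PORT", "7860"))
    return host, port


if __name__ == "__main__":
    host, port = resolve_bind_address()
    print(f"would bind to {host}:{port}")
```

Keeping the fallbacks equal to the ENV lines means the script behaves the same inside and outside the container when neither variable is set.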
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
+ excuse you from the conditions of this License. If you cannot convey a
545
+ covered work so as to satisfy simultaneously your obligations under this
546
+ License and any other pertinent obligations, then as a consequence you may
547
+ not convey it at all. For example, if you agree to terms that obligate you
548
+ to collect a royalty for further conveying from those to whom you convey
549
+ the Program, the only way you could satisfy both those terms and this
550
+ License would be to refrain entirely from conveying the Program.
551
+
552
+ 13. Use with the GNU Affero General Public License.
553
+
554
+ Notwithstanding any other provision of this License, you have
555
+ permission to link or combine any covered work with a work licensed
556
+ under version 3 of the GNU Affero General Public License into a single
557
+ combined work, and to convey the resulting work. The terms of this
558
+ License will continue to apply to the part which is the covered work,
559
+ but the special requirements of the GNU Affero General Public License,
560
+ section 13, concerning interaction through a network will apply to the
561
+ combination as such.
562
+
563
+ 14. Revised Versions of this License.
564
+
565
+ The Free Software Foundation may publish revised and/or new versions of
566
+ the GNU General Public License from time to time. Such new versions will
567
+ be similar in spirit to the present version, but may differ in detail to
568
+ address new problems or concerns.
569
+
570
+ Each version is given a distinguishing version number. If the
571
+ Program specifies that a certain numbered version of the GNU General
572
+ Public License "or any later version" applies to it, you have the
573
+ option of following the terms and conditions either of that numbered
574
+ version or of any later version published by the Free Software
575
+ Foundation. If the Program does not specify a version number of the
576
+ GNU General Public License, you may choose any version ever published
577
+ by the Free Software Foundation.
578
+
579
+ If the Program specifies that a proxy can decide which future
580
+ versions of the GNU General Public License can be used, that proxy's
581
+ public statement of acceptance of a version permanently authorizes you
582
+ to choose that version for the Program.
583
+
584
+ Later license versions may give you additional or different
585
+ permissions. However, no additional obligations are imposed on any
586
+ author or copyright holder as a result of your choosing to follow a
587
+ later version.
588
+
589
+ 15. Disclaimer of Warranty.
590
+
591
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599
+
600
+ 16. Limitation of Liability.
601
+
602
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610
+ SUCH DAMAGES.
611
+
612
+ 17. Interpretation of Sections 15 and 16.
613
+
614
+ If the disclaimer of warranty and limitation of liability provided
615
+ above cannot be given local legal effect according to their terms,
616
+ reviewing courts shall apply local law that most closely approximates
617
+ an absolute waiver of all civil liability in connection with the
618
+ Program, unless a warranty or assumption of liability accompanies a
619
+ copy of the Program in return for a fee.
620
+
621
+ END OF TERMS AND CONDITIONS
622
+
623
+ How to Apply These Terms to Your New Programs
624
+
625
+ If you develop a new program, and you want it to be of the greatest
626
+ possible use to the public, the best way to achieve this is to make it
627
+ free software which everyone can redistribute and change under these terms.
628
+
629
+ To do so, attach the following notices to the program. It is safest
630
+ to attach them to the start of each source file to most effectively
631
+ state the exclusion of warranty; and each file should have at least
632
+ the "copyright" line and a pointer to where the full notice is found.
633
+
634
+ <one line to give the program's name and a brief idea of what it does.>
635
+ Copyright (C) <year> <name of author>
636
+
637
+ This program is free software: you can redistribute it and/or modify
638
+ it under the terms of the GNU General Public License as published by
639
+ the Free Software Foundation, either version 3 of the License, or
640
+ (at your option) any later version.
641
+
642
+ This program is distributed in the hope that it will be useful,
643
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
644
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645
+ GNU General Public License for more details.
646
+
647
+ You should have received a copy of the GNU General Public License
648
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
649
+
650
+ Also add information on how to contact you by electronic and paper mail.
651
+
652
+ If the program does terminal interaction, make it output a short
653
+ notice like this when it starts in an interactive mode:
654
+
655
+ <program> Copyright (C) <year> <name of author>
656
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657
+ This is free software, and you are welcome to redistribute it
658
+ under certain conditions; type `show c' for details.
659
+
660
+ The hypothetical commands `show w' and `show c' should show the appropriate
661
+ parts of the General Public License. Of course, your program's commands
662
+ might be different; for a GUI interface, you would use an "about box".
663
+
664
+ You should also get your employer (if you work as a programmer) or school,
665
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
666
+ For more information on this, and how to apply and follow the GNU GPL, see
667
+ <https://www.gnu.org/licenses/>.
668
+
669
+ The GNU General Public License does not permit incorporating your program
670
+ into proprietary programs. If your program is a subroutine library, you
671
+ may consider it more useful to permit linking proprietary applications with
672
+ the library. If this is what you want to do, use the GNU Lesser General
673
+ Public License instead of this License. But first, please read
674
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
README.md CHANGED
@@ -1,10 +1,241 @@
1
  ---
2
- title: Chatterbox Api
3
- emoji: 🌖
4
- colorFrom: purple
5
- colorTo: red
6
  sdk: docker
7
- pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Chatterbox TTS API
 
 
 
3
  sdk: docker
4
+ app_port: 7860
5
  ---
6
 
7
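+ # Chatterbox TTS API Service
+
+ This is a high-performance text-to-speech (TTS) service built on [Chatterbox-TTS](https://github.com/resemble-ai/chatterbox). It provides an OpenAI TTS-compatible API endpoint, an extended endpoint that supports voice cloning, and a simple web UI.
+
+ The project aims to give developers and content creators a self-hosted, capable, and easy-to-integrate TTS solution.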
12
+
13
+
14
+ ![](https://pvtr2.pyvideotrans.com/1751778208772_image.png)
15
+
16
+
17
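+ ## ✨ Features
+
+ - **Supports 23 languages**
+ - **Two API endpoints**:
+     1. **OpenAI-compatible endpoint**: `/v1/audio/speech`, which plugs into any existing workflow that uses the OpenAI SDK.
+     2. **Voice cloning endpoint**: `/v2/audio/speech_with_prompt`, which generates speech in the same voice as a short reference audio clip that you upload.
+ - **Web UI**: an intuitive front-end page for quickly testing and using the TTS features without writing any code.
+ - **Flexible output formats**: generates audio in `.mp3` and `.wav` formats.
+ - **Cross-platform support**: detailed installation guides for Windows, macOS, and Linux.
+ - **One-click Windows deployment**: a compressed package for Windows users that contains all dependencies and launch scripts and works out of the box.
+ - **GPU acceleration**: supports NVIDIA GPUs (CUDA), with a one-click upgrade script for Windows users.
+ - **Seamless integration**: can serve as a backend for tools such as [pyVideoTrans](https://github.com/jianchang512/pyvideotrans).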
29
+
30
+ ---
31
+
32
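+ ## 🚀 Quick Start
+
+ ### Option 1: Windows users (recommended, one-click launch)
+
+ We provide Windows users with a portable package, `win.7z`, that bundles all dependencies and greatly simplifies installation.
+
+ 1. **Download and extract**:
+
+     - Baidu Netdisk download (models included, 5.9 GB in total): https://pan.baidu.com/s/12Le6rhOQnBL-sNd0uGZ91A?pwd=t55t
+     > The archive contains the models and the required environment files, so it is large and is split into 2 volumes. After downloading, select all volumes and extract them together. The extraction path should preferably not contain Chinese characters.
+
+     - Baidu Netdisk download (no models, 460 MB; the models are downloaded automatically on first launch, which requires a proxy in mainland China): https://pan.baidu.com/s/1g3w1jFHxe_IjqgasoiVB6w?pwd=nf8v
+
+     - GitHub download (no models, 460 MB; the models are downloaded automatically on first launch, which requires a proxy in mainland China): https://github.com/jianchang512/chatterbox-api/releases/download/0.2/1020-chatterbox-win-NO-Models.7z
+
+ 2. **Start the service**:
+     - Double-click the **`启动服务.bat`** script in the root directory.
+
+ When you see output like the following in the command-line window, the service has started successfully: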
51
+
52
+ ![](https://pvtr2.pyvideotrans.com/1751778142538_image.png)
53
+
54
+ ```
55
+ ✅ 模型加载完成.
56
+ 服务启动完成,http地址是: http://127.0.0.1:5093
57
+ ```
58
+
59
+ ### Option 2: macOS, Linux, and manual installation
+
+ macOS and Linux users, or Windows users who prefer to set up the environment manually, should follow the steps below.
+
+ > ### Note on CPU-only operation
+ >
+ > On Windows and Linux, running on CPU requires editing the library source: open `Lib/site-packages/chatterbox/mtl_tts.py` inside the virtual environment.
+ >
+ > Find `torch.load(ckpt_dir / "ve.pt",` and change it to `torch.load(ckpt_dir / "ve.pt", map_location=device,`
+ >
+ > Find `torch.load(ckpt_dir / "s3gen.pt",` and change it to `torch.load(ckpt_dir / "s3gen.pt", map_location=device,`
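An alternative to editing the installed library is to wrap the loader temporarily, which is the approach `load_tts_model` in this commit's `app.py` takes for `torch.load`. The sketch below shows that pattern in isolation. The helper name `default_kwarg` and the `fake_torch` stand-in are illustrative assumptions, not part of the project; the stand-in only exists so the example runs without PyTorch installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def default_kwarg(module, func_name, **defaults):
    """Temporarily wrap module.func_name so the given kwargs are filled in when absent."""
    original = getattr(module, func_name)

    def wrapper(*args, **kwargs):
        for key, value in defaults.items():
            kwargs.setdefault(key, value)
        return original(*args, **kwargs)

    setattr(module, func_name, wrapper)
    try:
        yield
    finally:
        # Always restore the original function, even if loading fails.
        setattr(module, func_name, original)


# Hypothetical stand-in for the `torch` module; it just echoes its arguments.
fake_torch = SimpleNamespace(load=lambda path, map_location=None: (path, map_location))

with default_kwarg(fake_torch, "load", map_location="cpu"):
    patched_result = fake_torch.load("ve.pt")

restored_result = fake_torch.load("ve.pt")
```

With real PyTorch the same idea would wrap `torch.load` around the `from_pretrained(device="cpu")` call, so checkpoints saved with CUDA tensors are mapped to the CPU while the model loads and the original function is restored afterwards.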
70
+
71
+ #### 1. Prerequisites
+
+ - **Python**: make sure Python 3.10 or later is installed.
+ - **ffmpeg**: a required audio/video processing tool.
+     - **macOS (using Homebrew)**: `brew install ffmpeg`
+     - **Debian/Ubuntu**: `sudo apt-get update && sudo apt-get install ffmpeg`
+     - **Windows (manual)**: download [ffmpeg](https://ffmpeg.org/download.html) and add it to the system `PATH` environment variable.
78
+
79
+ #### 2. Installation
+
+ ```bash
+ # 1. Clone the repository
+ git clone https://github.com/jianchang512/chatterbox-api.git
+ cd chatterbox-api
+
+ # 2. Create and activate a Python virtual environment (recommended)
+ python3 -m venv venv
+ # on Windows:
+ # venv\Scripts\activate
+ # on macOS/Linux:
+ source venv/bin/activate
+
+ # 3. Install dependencies
+ pip install -r requirements.txt
+
+ # 4. Start the service
+ python app.py
+ ```
+
+ Once the service has started, the terminal shows the service address `http://127.0.0.1:5093`.
101
+
102
+ ---
103
+
104
+ ## ☁️ Deploying to Hugging Face Spaces (free CPU, publicly reachable API)
+
+ 1. Create a new Space and choose `Docker`.
+ 2. Upload this repository's code (the contents of the `chatterbox-api` directory) to the Space.
+ 3. Once the Space has finished building, the following are available:
+     - Web UI: `https://<your-space>.hf.space/`
+     - OpenAI-compatible API: `https://<your-space>.hf.space/v1/audio/speech`
+     - Voice cloning API: `https://<your-space>.hf.space/v2/audio/speech_with_prompt`
112
+
113
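+ ## ⚡ Upgrading to the GPU version (optional)
+
+ If your computer has a CUDA-capable NVIDIA GPU and the [NVIDIA driver](https://www.nvidia.com/Download/index.aspx) and [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) are correctly installed, you can upgrade to the GPU version for a significant performance gain.
+
+ ### Windows users (one-click upgrade)
+
+ 1. First make sure you have successfully run `启动服务.bat` once, so that the base environment is installed.
+ 2. Double-click the **`安装N卡GPU支持.bat`** script.
+ 3. The script automatically uninstalls the CPU build of PyTorch and installs a GPU build compatible with CUDA 12.8.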
122
+
123
+ ### Manual upgrade on Linux
+
+ With the virtual environment activated, run:
+
+ ```bash
+ # 1. Uninstall the existing CPU build of PyTorch
+ pip uninstall -y torch torchaudio
+
+ # 2. Install the PyTorch build that matches your CUDA version
+ # The command below is for CUDA 12.6; get the right command for your CUDA version from the PyTorch website
+ pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
+ ```
+ *See the [PyTorch website](https://pytorch.org/get-started/locally/) for the install command that fits your system.*
+
+ After upgrading, restart the service; the startup log should show `Using device: cuda`.
138
+
139
+ ---
140
+
141
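+ ## 📖 Usage Guide
+
+ ### 1. Web UI
+
+ Once the service is running, open **`http://127.0.0.1:5093`** in a browser to reach the web UI.
+
+ - **Enter text**: type the text you want to convert into the text box.
+ - **Adjust parameters**:
+     - `cfg_weight`: (range 0.0 - 1.0) controls the pacing of the speech. Lower values give slower, more deliberate speech. For fast-paced reference audio, lower this value somewhat (for example to 0.3).
+     - `exaggeration`: (range 0.25 - 2.0) controls how exaggerated the emotion and intonation are. Higher values give more expressive speech and may also speed it up.
+ - **Voice cloning**: click "选择文件" (Choose File) to upload a reference audio clip (such as .mp3 or .wav). If a reference clip is provided, the service uses the cloning endpoint.
+ - **Generate speech**: click the "生成语音" (Generate Speech) button; after a short wait you can listen to the result online and download the generated MP3 file.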
153
+
154
+ ### 2. Calling the API
+
+ #### Endpoint 1: OpenAI-compatible endpoint (`/v1/audio/speech`)
+
+ This endpoint needs no reference audio and can be called directly with the OpenAI SDK.
+
+ **Python example (`openai` SDK):**
161
+
162
+ ```python
+ from openai import OpenAI
+
+ # Point the client at the local service
+ client = OpenAI(
+     base_url="http://127.0.0.1:5093/v1",
+     api_key="not-needed"  # no API key is required, but the SDK insists on one
+ )
+
+ response = client.audio.speech.create(
+     model="chatterbox-tts",   # ignored by the service
+     voice="en",               # language code; the service reads it from `voice`
+     speed=0.5,                # mapped to the cfg_weight parameter
+     input="Hello, this is a test from the OpenAI compatible API.",
+     instructions="0.5",       # (optional) mapped to exaggeration; must be a string
+     response_format="mp3"     # 'mp3' or 'wav'
+ )
+
+ # Save the audio stream to a file
+ response.stream_to_file("output_api1.mp3")
+ print("Audio saved to output_api1.mp3")
+ ```
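The same endpoint can also be reached without the OpenAI SDK by posting JSON directly. The sketch below builds the request body using the field mapping that `app.py` reads (`voice` for the language code, `speed` for `cfg_weight`, `instructions` for `exaggeration`). The helper name `build_v1_payload` is an illustrative assumption, and the actual HTTP call is left commented out because it needs a running service.

```python
import json


def build_v1_payload(text, language="en", cfg_weight=0.5, exaggeration=0.5, response_format="mp3"):
    """Build the JSON body for /v1/audio/speech using the service's field mapping."""
    if not text:
        raise ValueError("input text must not be empty")
    return {
        "input": text,
        "voice": language,                   # the service reads the language code from `voice`
        "speed": cfg_weight,                 # forwarded as cfg_weight
        "instructions": str(exaggeration),   # forwarded as exaggeration
        "response_format": response_format,  # 'mp3' or 'wav'
    }


payload = build_v1_payload("Hello from a plain HTTP client.", language="en")
body = json.dumps(payload)

# Sending it (assumes the service is running locally and `requests` is installed):
# import requests
# r = requests.post("http://127.0.0.1:5093/v1/audio/speech", json=payload, timeout=600)
# r.raise_for_status()
# with open("output_plain.mp3", "wb") as f:
#     f.write(r.content)
```

Keeping the payload construction in one function makes the unusual mapping (`speed` and `instructions` carrying synthesis parameters) explicit at the call site.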
185
+
186
+ #### Endpoint 2: Voice cloning endpoint (`/v2/audio/speech_with_prompt`)
+
+ This endpoint expects the text and the reference audio file to be uploaded together as `multipart/form-data`.
+
+ **Python example (`requests` library):**
191
+
192
+ ```python
+ import requests
+
+ API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
+ REFERENCE_AUDIO = "path/to/your/reference.mp3"  # replace with the path to your reference audio
+
+ form_data = {
+     'input': 'This voice should sound like the reference audio.',
+     'cfg_weight': '0.5',
+     'exaggeration': '0.5',
+     'response_format': 'mp3'  # 'mp3' or 'wav'
+ }
+
+ with open(REFERENCE_AUDIO, 'rb') as audio_file:
+     files = {'audio_prompt': audio_file}
+     response = requests.post(API_URL, data=form_data, files=files)
+
+ if response.ok:
+     with open("output_api2.mp3", "wb") as f:
+         f.write(response.content)
+     print("Cloned audio saved to output_api2.mp3")
+ else:
+     print("Request failed:", response.text)
+ ```
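The server in this commit converts `cfg_weight` and `exaggeration` with `float()` and does not check them against the ranges documented for the web UI (0.0 to 1.0 and 0.25 to 2.0). A client can validate them before uploading. The sketch below is one way to do that; the helper name `build_v2_form` is an assumption made for illustration, and the `language` field is the optional form field that `app.py` reads.

```python
def build_v2_form(text, cfg_weight=0.5, exaggeration=0.5, response_format="mp3", language="en"):
    """Validate parameters client-side and return form fields for /v2/audio/speech_with_prompt."""
    if not text:
        raise ValueError("input text must not be empty")
    if not 0.0 <= cfg_weight <= 1.0:
        raise ValueError("cfg_weight must be between 0.0 and 1.0")
    if not 0.25 <= exaggeration <= 2.0:
        raise ValueError("exaggeration must be between 0.25 and 2.0")
    if response_format not in ("mp3", "wav"):
        raise ValueError("response_format must be 'mp3' or 'wav'")
    # Form fields are sent as strings in multipart/form-data.
    return {
        "input": text,
        "cfg_weight": str(cfg_weight),
        "exaggeration": str(exaggeration),
        "response_format": response_format,
        "language": language,
    }


form = build_v2_form("This voice should sound like the reference audio.",
                     cfg_weight=0.3, exaggeration=0.7)
```

The returned dictionary can be passed as `data=` to `requests.post`, alongside the `audio_prompt` file in `files=`.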
216
+
217
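+ ### 3. Using it with pyVideoTrans
+
+ This project can act as a TTS backend for [pyVideoTrans](https://github.com/jianchang512/pyvideotrans), providing high-quality English dubbing.
+
+ 1. **Start this project**: make sure the Chatterbox TTS API service is running locally (`http://127.0.0.1:5093`).
+ 2. **Update pyVideoTrans**: make sure your pyVideoTrans version is `v3.73` or later.
+ 3. **Configure pyVideoTrans**:
+ ![](https://pvtr2.pyvideotrans.com/1751778270190_image.png)
+
+     - In the pyVideoTrans menu, go to `TTS设置` (TTS Settings) -> `Chatterbox TTS`.
+     - **API address**: enter the address of this service, `http://127.0.0.1:5093` by default.
+     - **Reference audio** (optional): to use voice cloning, enter the file name of the reference audio here (for example `my_voice.wav`). The audio file must be placed in the `chatterbox` folder under the pyVideoTrans root directory.
+     - **Adjust parameters**: tune `cfg_weight` and `exaggeration` as needed for the best result.
+
+ **Parameter tuning tips**:
+ - **General use (TTS, voice assistants)**: the defaults (`cfg_weight=0.5`, `exaggeration=0.5`) work for most cases.
+ - **Fast-paced reference audio**: if the reference audio is spoken quickly, try lowering `cfg_weight` to about `0.3` to improve the pacing of the generated speech.
+ - **Expressive or dramatic speech**: try a lower `cfg_weight` (such as `0.3`) and a higher `exaggeration` (such as `0.7` or more). Raising `exaggeration` usually speeds up the speech; lowering `cfg_weight` helps balance that and makes the pacing more deliberate and clearer.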
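The tuning tips above can be captured as named presets so that callers do not repeat the numbers. This is only a sketch: the preset names and the `preset_form_fields` helper are assumptions made for illustration, and the `exaggeration` value for the fast-reference preset is assumed to stay at the default 0.5 because the tips do not state one.

```python
# Presets derived from the tuning tips; names are illustrative.
PRESETS = {
    "general": {"cfg_weight": 0.5, "exaggeration": 0.5},
    "fast_reference": {"cfg_weight": 0.3, "exaggeration": 0.5},  # exaggeration assumed default
    "expressive": {"cfg_weight": 0.3, "exaggeration": 0.7},
}


def preset_form_fields(name):
    """Return the preset as string-valued fields, ready to merge into form data."""
    preset = PRESETS[name]
    return {key: str(value) for key, value in preset.items()}


expressive_fields = preset_form_fields("expressive")
```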
235
+
236
+ ---
237
+
238
+
239
+ ## Acknowledgements
240
+
241
+ [Chatterbox-TTS](https://github.com/resemble-ai/chatterbox)
app.py ADDED
@@ -0,0 +1,331 @@
1
+ import os,time,shutil,sys
2
+ #os.environ['https_proxy']='http://127.0.0.1:10808'
+ #os.environ['http_proxy']='http://127.0.0.1:10808'
4
+ from pathlib import Path
5
+ import threading
6
+ import warnings
7
+ warnings.filterwarnings("ignore", category=FutureWarning)
8
+ #from chatterbox.tts import ChatterboxTTS
9
+
10
+ def _env_int(name: str, default: int) -> int:
+     val = os.environ.get(name)
+     if val is None or val == "":
+         return default
+     try:
+         return int(val)
+     except ValueError:
+         return default
18
+
19
+ host = os.environ.get("HOST", "127.0.0.1")
20
+ port = _env_int("PORT", 5093)
21
+ threads = _env_int("THREADS", 4)
22
+
23
+ ROOT_DIR=Path(os.getcwd()).as_posix()
24
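+ # For users in mainland China a Hugging Face mirror can speed up downloads considerably; no mirror is configured here, the settings below only control the cache directory, warnings, progress bars and download timeout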
25
+ os.environ['HF_HOME'] = ROOT_DIR + "/models"
26
+ os.environ['HF_HUB_DISABLE_SYMLINKS_WARNING'] = 'true'
27
+ os.environ['HF_HUB_DISABLE_PROGRESS_BARS'] = 'true'
28
+ os.environ['HF_HUB_DOWNLOAD_TIMEOUT'] = "1200"
29
+
30
+ import subprocess,traceback
31
+ import io
32
+ import uuid
33
+ import tempfile
34
+ from flask import Flask, request, jsonify, send_file, render_template, make_response
35
+ from waitress import serve
36
+ import torch
37
+ import torchaudio as ta
38
+
39
+ try:
+     import soundfile as sf
+ except ImportError:
+     print('No soundfile, exec cmd ` runtime\\python -m pip install soundfile`')
+     sys.exit(1)
+
+ try:
+     from pydub import AudioSegment
+ except ImportError:
+     print('No pydub, exec cmd ` runtime\\python -m pip install pydub`')
+     sys.exit(1)
50
+
51
+
52
+ if sys.platform == 'win32':
+     os.environ['PATH'] = ROOT_DIR + f';{ROOT_DIR}/ffmpeg;{ROOT_DIR}/tools;' + os.environ['PATH']
54
+ from chatterbox.mtl_tts import ChatterboxMultilingualTTS as ChatterboxTTS
55
+
56
+ # Check whether ffmpeg is installed
+ def check_ffmpeg():
+     """Check that ffmpeg is available on this system."""
+     try:
+         subprocess.run(["ffmpeg", "-version"], check=True, capture_output=True)
+         print("FFmpeg 已安装.")
+         return True
+     except (subprocess.CalledProcessError, FileNotFoundError):
+         print("ERROR: 不存在ffmpeg,请先安装ffmpeg.")
+         sys.exit(1)  # exit: MP3 conversion is a required feature
66
+
67
+ # Load the Chatterbox TTS model
+ def load_tts_model():
+     """Load the TTS model onto the selected device."""
+     print("⏳ 开始加载模型 ChatterboxTTS model... 请耐心等待.")
+     try:
+         # Auto-detect the available device (CUDA > CPU)
+         device = "cuda" if torch.cuda.is_available() else "cpu"
+         print(f"Using device: {device}")
+
+         # Load from the pretrained checkpoint.
+         # On CPU, some checkpoints may contain CUDA tensors; force map_location so deserialization does not fail.
+         if device == "cpu":
+             original_torch_load = torch.load
+
+             def _cpu_safe_load(*args, **kwargs):
+                 kwargs.setdefault("map_location", "cpu")
+                 return original_torch_load(*args, **kwargs)
+
+             torch.load = _cpu_safe_load
+             try:
+                 tts_model = ChatterboxTTS.from_pretrained(device=device)
+             finally:
+                 torch.load = original_torch_load
+         else:
+             tts_model = ChatterboxTTS.from_pretrained(device=device)
+         print("模型加载完成.")
+         return tts_model
+     except Exception as e:
+         print(f"FATAL: 模型加载失败: {e}")
+         sys.exit(1)
97
+
98
+ # --- Global initialization ---
+ check_ffmpeg()
+ model = None
+ model_lock = threading.Lock()
+ app = Flask(__name__)
+
+ def get_model():
+     global model
+     if model is not None:
+         return model
+     with model_lock:
+         if model is None:
+             model = load_tts_model()
+         return model
112
+
113
+
114
+
115
+ def convert_to_wav(input_path, output_path, sample_rate=16000):
+     """
+     Converts any audio file to a standardized WAV format using ffmpeg.
+     - 16-bit PCM
+     - Specified sample rate (default 16kHz, common for TTS)
+     - Mono channel
+     """
+     print(f" - Converting '{input_path}' to WAV at {sample_rate}Hz...")
+     command = [
+         'ffmpeg',
+         '-i', input_path,          # Input file
+         '-y',                      # Overwrite output file if it exists
+         '-acodec', 'pcm_s16le',    # Use 16-bit PCM encoding
+         '-ar', str(sample_rate),   # Set audio sample rate
+         '-ac', '1',                # Set to 1 audio channel (mono)
+         output_path                # Output file
+     ]
+     try:
+         process = subprocess.run(
+             command,
+             check=True,            # Raise an exception if ffmpeg fails
+             capture_output=True,   # Capture stdout and stderr
+             text=True,             # Decode stdout/stderr as text
+             encoding='utf-8',      # Decode explicitly as UTF-8
+             errors='replace'       # Substitute the replacement character on decode errors instead of crashing
+         )
+         print(f" - FFmpeg conversion successful.")
+     except subprocess.CalledProcessError as e:
+         # If ffmpeg fails, print its error output for easier debugging
+         print("FFmpeg conversion failed!")
+         print(f" - Command: {' '.join(command)}")
+         print(f" - Stderr: {e.stderr}")
+         raise e  # Re-raise the exception to be caught by the main try...except block
149
+
150
+ # --- API endpoints ---
+
+ @app.route('/')
+ def index():
+     """Serve the front-end page."""
+     return render_template('index.html')
+
+ # Endpoint 1: OpenAI TTS-compatible endpoint
+ @app.route('/v1/audio/speech', methods=['POST'])
+ def tts_openai_compatible():
+     """
+     OpenAI TTS-compatible endpoint.
+     Accepts JSON: {"input": "text", "model": "chatterbox", "voice": "en", ...}
+     `model` is accepted but ignored; `voice` carries the language code,
+     `speed` carries cfg_weight and `instructions` carries exaggeration.
+     """
+     if not request.is_json:
+         return jsonify({"error": "Request must be JSON"}), 400
+
+     data = request.get_json()
+     text = data.get('input')
+     # `voice` carries the language code
+     lang=data.get('voice','en')
+     # `speed` carries cfg_weight
+     cfg_weight=float(data.get('speed',0.5))
+     # `instructions` carries exaggeration
+     exaggeration=float(data.get('instructions',0.5))
+     #if lang != 'en':
+     #    return jsonify({"error": "Only support English"}), 400
+
+
+     if not text:
+         return jsonify({"error": "Missing 'input' field in request body"}), 400
+
+     print(f"[APIv1] Received text: '{text[:50]}...'")
+
+     try:
+         # Generate the audio
+         tts_model = get_model()
+         wav_tensor = tts_model.generate(text,exaggeration=exaggeration,cfg_weight=cfg_weight,language_id=lang)
+
+         # Requested response format; defaults to mp3
+         response_format = data.get('response_format', 'mp3').lower()
+         download_name=f'{time.time()}'
+
+         # Encode to WAV in memory first; MP3 is produced from this buffer when requested
+         wav_buffer = io.BytesIO()
+         wav_tensor = wav_tensor.detach().cpu()
+         if wav_tensor.ndim == 2:
+             wav_np = wav_tensor.transpose(0, 1).numpy()
+         else:
+             wav_np = wav_tensor.numpy()
+         # Write WAV data to the in-memory buffer
+         sf.write(wav_buffer, wav_np, tts_model.sr, format='wav')
+         wav_buffer.seek(0)
+         if response_format=='mp3':
+             mp3_buffer = io.BytesIO()
+             AudioSegment.from_file(wav_buffer, format="wav").export(mp3_buffer, format="mp3")
+             mp3_buffer.seek(0)
+
+             return send_file(
+                 mp3_buffer,
+                 mimetype='audio/mpeg',
+                 as_attachment=False,
+                 download_name=f'{download_name}.mp3'
+             )
+
+         return send_file(
+             wav_buffer,
+             mimetype='audio/wav',
+             as_attachment=False,
+             download_name=f'{download_name}.wav'
+         )
+
+     except Exception as e:
+         print(f"[APIv1] Error during TTS generation: {e}")
+         return jsonify({"error": f"An internal error occurred: {str(e)}"}), 500
226
+
227
+
228
+ # Endpoint 2: TTS with a reference audio prompt
+ @app.route('/v2/audio/speech_with_prompt', methods=['POST'])
+ def tts_with_prompt():
+     """
+     Endpoint with a reference audio prompt.
+     Accepts multipart/form-data:
+     - 'input': (string) text to synthesize
+     - 'audio_prompt': (file) reference audio file
+     - 'response_format', 'cfg_weight', 'exaggeration', 'language': optional form fields
+     """
+     if 'input' not in request.form:
+         return jsonify({"error": "Missing 'input' field in form data"}), 400
+     if 'audio_prompt' not in request.files:
+         return jsonify({"error": "Missing 'audio_prompt' file in form data"}), 400
+
+     text = request.form['input']
+     audio_file = request.files['audio_prompt']
+     response_format = request.form.get('response_format', 'wav').lower()
+
+     cfg_weight=float(request.form.get('cfg_weight',0.5))
+
+     exaggeration=float(request.form.get('exaggeration',0.5))
+     lang = request.form.get('language','en')
+     #if lang != 'en':
+     #    return jsonify({"error": "Only support English"}), 400
+
+     print(f"[APIv2] Received text: '{text[:50]}...' with audio prompt '{audio_file.filename}'")
+
+
+     temp_upload_path = None
+     temp_wav_path = None
+     try:
+         # --- Stage 1 & 2: Save and Convert uploaded file ---
+         temp_dir = tempfile.gettempdir()
+         upload_suffix = os.path.splitext(audio_file.filename)[1]
+         temp_upload_path = os.path.join(temp_dir, f"{uuid.uuid4()}{upload_suffix}")
+         audio_file.save(temp_upload_path)
+         print(f" - Uploaded audio saved to: {temp_upload_path}")
+
+         temp_wav_path = os.path.join(temp_dir, f"{uuid.uuid4()}.wav")
+         convert_to_wav(temp_upload_path, temp_wav_path)
+
+         # --- Stage 3: Generate TTS using the converted WAV file ---
+         print(f" - Generating TTS with prompt: {temp_wav_path}")
+         tts_model = get_model()
+         wav_tensor = tts_model.generate(text, audio_prompt_path=temp_wav_path,exaggeration=exaggeration,cfg_weight=cfg_weight,language_id=lang)
+
+         # --- Stage 4: Format and Return Response Based on Request ---
+         download_name=f'{time.time()}'
+
+         print(" - Formatting response as WAV.")
+         wav_buffer = io.BytesIO()
+         wav_tensor = wav_tensor.detach().cpu()
+         if wav_tensor.ndim == 2:
+             wav_np = wav_tensor.transpose(0, 1).numpy()
+         else:
+             wav_np = wav_tensor.numpy()
+         # Write WAV data to the in-memory buffer
+         sf.write(wav_buffer, wav_np, tts_model.sr, format='wav')
+         wav_buffer.seek(0)
+         if response_format == 'mp3':
+             mp3_buffer = io.BytesIO()
+             AudioSegment.from_file(wav_buffer, format="wav").export(mp3_buffer, format="mp3")
+             mp3_buffer.seek(0)
+             return send_file(
+                 mp3_buffer,
+                 mimetype='audio/mpeg',
+                 as_attachment=False,
+                 download_name=f'{download_name}.mp3'
+             )
+
+         return send_file(
+             wav_buffer,
+             mimetype='audio/wav',
+             as_attachment=False,
+             download_name=f'{download_name}.wav'
+         )
+
+     except Exception as e:
+         print(f"[APIv2] An error occurred: {e}")
+         traceback.print_exc()
+         return jsonify({"error": f"An internal error occurred: {str(e)}"}), 500
+
+     finally:
+         # --- Stage 5: Cleanup ---
+         if temp_upload_path and os.path.exists(temp_upload_path):
+             try:
+                 os.remove(temp_upload_path)
+                 print(f" - Cleaned up upload file: {temp_upload_path}")
+             except OSError as e:
+                 print(f" - Error cleaning up upload file {temp_upload_path}: {e}")
+
+         if temp_wav_path and os.path.exists(temp_wav_path):
+             try:
+                 os.remove(temp_wav_path)
+                 print(f" - Cleaned up WAV file: {temp_wav_path}")
+             except OSError as e:
+                 print(f" - Error cleaning up WAV file {temp_wav_path}: {e}")
325
+
326
+
327
+ # --- Start the service ---
+ if __name__ == '__main__':
+
+     print(f"\n服务启动完成,http地址是: http://{host}:{port} \n")
+     serve(app, host=host, port=port, threads=threads)
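`get_model` in `app.py` defers the expensive model load until the first request and guards it with a lock, since the server handles requests on several threads. The sketch below isolates that lazy-initialization pattern. `_load_model` is a hypothetical stand-in for the real loader, and the `load_calls` counter exists only to show that the load runs once.

```python
import threading

_model = None
_model_lock = threading.Lock()
load_calls = 0


def _load_model():
    """Hypothetical stand-in for an expensive model load."""
    global load_calls
    load_calls += 1
    return object()


def get_model():
    """Return the shared model, loading it at most once across threads."""
    global _model
    if _model is not None:       # fast path: already loaded, no lock needed
        return _model
    with _model_lock:
        if _model is None:       # re-check: another thread may have loaded it meanwhile
            _model = _load_model()
        return _model


# Simulate several concurrent first requests.
workers = [threading.Thread(target=get_model) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The second check inside the lock is what prevents two threads that both saw `None` on the fast path from loading the model twice.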
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch==2.6.0+cpu
+ torchaudio==2.6.0+cpu
+
+ numpy==1.24.0
+ chatterbox-tts
+ flask>=3.1.2
+ waitress>=3.0.2
+ pydub>=0.25.1
+ soundfile>=0.12.1
templates/index.html ADDED
@@ -0,0 +1,343 @@
1
+ <!-- templates/index.html -->
2
+ <!DOCTYPE html>
3
+ <html lang="zh-CN">
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
7
+ <title>Chatterbox TTS Service</title>
8
+ <style>
9
+ body {
10
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
11
+ background-color: #f4f7f9;
12
+ color: #333;
13
+ display: flex;
14
+ justify-content: center;
15
+ align-items: center;
16
+ min-height: 100vh;
17
+ margin: 0;
18
+ padding: 20px;
19
+ box-sizing: border-box;
20
+ }
21
+ .container {
22
+ width: 100%;
23
+ max-width: 1200px;
24
+ background: #fff;
25
+ padding: 30px;
26
+ border-radius: 12px;
27
+ box-shadow: 0 8px 30px rgba(0,0,0,0.1);
28
+ }
29
+ h1 {
30
+ text-align: center;
31
+ color: #1a237e;
32
+ margin-bottom: 25px;
33
+ }
34
+ .form-group {
35
+ margin-bottom: 20px;
36
+ }
37
+ label {
38
+ display: block;
39
+ font-weight: 600;
40
+ color: #555;
41
+ }
42
+ textarea {
43
+ width: 100%;
44
+ padding: 12px;
45
+ border-radius: 8px;
46
+ border: 1px solid #ccc;
47
+ font-size: 16px;
48
+ min-height: 120px;
49
+ resize: vertical;
50
+ box-sizing: border-box;
51
+ transition: border-color 0.3s;
52
+ }
53
+ textarea:focus {
54
+ border-color: #3f51b5;
55
+ outline: none;
56
+ }
57
+ .file-input-wrapper {
58
+ position: relative;
59
+ overflow: hidden;
60
+ display: inline-block;
61
+ cursor: pointer;
62
+ padding: 10px 15px;
63
+ border: 1px dashed #ccc;
64
+ border-radius: 8px;
65
+ background-color: #f9f9f9;
66
+ }
67
+ .file-input-wrapper:hover {
68
+ border-color: #3f51b5;
69
+ }
70
+ input[type="file"] {
71
+ position: absolute;
72
+ left: 0;
73
+ top: 0;
74
+ opacity: 0;
75
+ width: 100%;
76
+ height: 100%;
77
+ cursor: pointer;
78
+ }
79
+ input[type="number"]{
80
+ line-height:25px;
81
+ margin-right:10px;
82
+ margin-left:5px;
83
+ }
84
+ .d-flex{
85
+ display:flex;
86
+ align-items:center
87
+
88
+ }
89
+
90
+ #file-name {
91
+ margin-left: 10px;
92
+ font-style: italic;
93
+ color: #777;
94
+ }
95
+ .submit-btn {
96
+ width: 100%;
97
+ padding: 15px;
98
+ font-size: 18px;
99
+ font-weight: bold;
100
+ color: #fff;
101
+ background-color: #3f51b5;
102
+ border: none;
103
+ border-radius: 8px;
104
+ cursor: pointer;
105
+ transition: background-color 0.3s, transform 0.1s;
106
+ }
107
+ .submit-btn:hover:not(:disabled) {
108
+ background-color: #303f9f;
109
+ }
110
+ .submit-btn:active:not(:disabled) {
111
+ transform: scale(0.98);
112
+ }
113
+ .submit-btn:disabled {
114
+ background-color: #9fa8da;
115
+ cursor: not-allowed;
116
+ }
117
+ .result-container {
118
+ margin-top: 30px;
119
+ display: none; /* Initially hidden */
120
+ }
121
+ audio {
122
+ width: 100%;
123
+ margin-bottom: 15px;
124
+ }
125
+ .download-link {
126
+ display: block;
127
+ text-align: center;
128
+ padding: 10px;
129
+ background: #e8eaf6;
130
+ color: #3f51b5;
131
+ border-radius: 8px;
132
+ text-decoration: none;
133
+ font-weight: 600;
134
+ }
135
+ .download-link:hover {
136
+ background: #c5cae9;
137
+ }
138
+ .status {
139
+ text-align: center;
140
+ margin-top: 20px;
141
+ font-weight: 500;
142
+ height: 24px; /* Reserve space to prevent layout shift */
143
+ }
144
+ .status.loading::after {
145
+ content: '...';
146
+ display: inline-block;
147
+ animation: BouncingDots 1.4s infinite ease-in-out both;
148
+ }
149
+ @keyframes BouncingDots {
150
+ 0%, 80%, 100% { transform: scale(0); }
151
+ 40% { transform: scale(1.0); }
152
+ }
153
+ #tips{
154
+ display:block;
155
+ width:30%;
156
+ max-width:600px;
157
+ min-width:200px;
158
+ text-align:center;
159
+ color:#777;
160
+ margin:15px auto;
161
+ }
162
+ #params-tip{
163
+ font-size:12px;color:#999;
164
+ margin:5px 10px 0;
165
+ }
166
+ #language{padding:5px;}
167
+ </style>
168
+ </head>
+ <body>
+
+     <div class="container">
+         <h1>Chatterbox TTS 服务</h1>
+         <form id="tts-form">
+             <div class="form-group">
+                 <label for="text-input">输入文本</label>
+                 <textarea id="text-input" placeholder="在此输入您想转换的文字..." required>你好啊,亲爱的朋友,祝你早日发财.</textarea>
+             </div>
+             <div class="d-flex">
+
+                 <label for="language">语言</label>
+                 <select id="language" class="">
+                     <option value="zh">中文</option>
+                     <option value="en">英语</option>
+                     <option value="ar">阿拉伯语</option>
+                     <option value="da">丹麦语</option>
+                     <option value="de">德语</option>
+                     <option value="el">希腊语</option>
+                     <option value="es">西班牙语</option>
+                     <option value="fi">芬兰语</option>
+                     <option value="fr">法语</option>
+                     <option value="he">希伯来语</option>
+                     <option value="hi">印地语</option>
+                     <option value="it">意大利语</option>
+                     <option value="ja">日语</option>
+                     <option value="ko">韩语</option>
+                     <option value="ms">马来语</option>
+                     <option value="nl">荷兰语</option>
+                     <option value="no">挪威语</option>
+                     <option value="pl">波兰语</option>
+                     <option value="pt">葡萄牙语</option>
+                     <option value="ru">俄语</option>
+                     <option value="sv">瑞典语</option>
+                     <option value="sw">斯瓦西里语</option>
+                     <option value="tr">土耳其语</option>
+
+
+                 </select>
+                 <label for="cfg_weight">cfg_weight</label>
+                 <input type="number" id="cfg_weight" value="0.5" min="0.0" max="1.0" step='0.05'>
+                 <label for="exaggeration">exaggeration</label>
+                 <input type="number" id="exaggeration" value="0.5" min="0.25" max="2.0" step='0.05'>
+             </div>
+             <div id="params-tip">
+                 <strong>cfg_weight: (范围 0.0 - 1.0)</strong> 控制语音的节奏。值越低,语速越慢、越从容。
+                 <strong>exaggeration: (范围 0.25 - 2.0)</strong> 控制语音的情感和语调夸张程度。值越高,情感越丰富。
+             </div>
+
+             <div class="form-group d-flex">
+                 <label for="audio-prompt">参考音频 (可选, 用于声音克隆) </label>
+                 <div class="file-input-wrapper">
+                     <span>选择文件</span>
+                     <input type="file" id="audio-prompt" accept="audio/*">
+                 </div>
+                 <span id="file-name"></span>
+             </div>
+             <button type="submit" id="generate-btn" class="submit-btn">生成语音</button>
+         </form>
+         <div id="status" class="status"></div>
+         <div id="result-container" class="result-container">
+             <audio id="audio-player" controls></audio>
+             <a id="download-link" href="#" download="synthesis.mp3" class="download-link">下载 MP3</a>
+         </div>
+
+
+         <a href="https://github.com/jianchang512/chatterbox-api" target="_blank" id="tips">GitHub:jianchang512/chatterbox-api</a>
+     </div>
+
+     <script>
+         document.addEventListener('DOMContentLoaded', () => {
+             const ttsForm = document.getElementById('tts-form');
+             const textInput = document.getElementById('text-input');
+             const audioPromptInput = document.getElementById('audio-prompt');
+             const fileNameSpan = document.getElementById('file-name');
+             const generateBtn = document.getElementById('generate-btn');
+             const statusDiv = document.getElementById('status');
+             const resultContainer = document.getElementById('result-container');
+             const audioPlayer = document.getElementById('audio-player');
+             const downloadLink = document.getElementById('download-link');
+
+             // Update the displayed file name
+             audioPromptInput.addEventListener('change', () => {
+                 if (audioPromptInput.files.length > 0) {
+                     fileNameSpan.textContent = audioPromptInput.files[0].name;
+                 } else {
+                     fileNameSpan.textContent = '未选择文件';
+                 }
+             });
+
+             ttsForm.addEventListener('submit', async (event) => {
+                 event.preventDefault(); // Prevent the default form submission
+
+                 const text = textInput.value.trim();
+                 if (!text) {
+                     alert('请输入要转换的文本!');
+                     return;
+                 }
+
+                 // Disable the button and show the loading state
+                 generateBtn.disabled = true;
+                 generateBtn.textContent = '生成中...';
+                 statusDiv.textContent = '正在请求服务器,请稍候';
+                 statusDiv.style.color = ''; // Clear the green/red left over from a previous result
+                 statusDiv.classList.add('loading');
+                 resultContainer.style.display = 'none';
+                 const cfg_weight = document.getElementById('cfg_weight').value;
+                 const language = document.getElementById('language').value;
+                 const exaggeration = String(document.getElementById('exaggeration').value);
+                 const audioFile = audioPromptInput.files[0];
+
+                 try {
+                     let response;
+                     if (audioFile) {
+                         // Endpoint 2: with a reference audio prompt
+                         const formData = new FormData();
+                         formData.append('input', text);
+                         formData.append('response_format', 'mp3');
+                         formData.append('exaggeration', exaggeration);
+                         formData.append('cfg_weight', cfg_weight);
+                         formData.append('audio_prompt', audioFile);
+
+                         response = await fetch('/v2/audio/speech_with_prompt', {
+                             method: 'POST',
+                             body: formData,
+                         });
+                     } else {
+                         // Endpoint 1: OpenAI-compatible
+                         const payload = {
+                             input: text,
+                             speed: cfg_weight,
+                             voice: language,
+                             instructions: exaggeration,
+                             model: 'chatterbox-tts', // Compatibility parameter
+                             response_format: 'mp3' // Request MP3 output
+                         };
+                         response = await fetch('/v1/audio/speech', {
+                             method: 'POST',
+                             headers: { 'Content-Type': 'application/json' },
+                             body: JSON.stringify(payload),
+                         });
+                     }
+
+                     if (!response.ok) {
+                         const errorData = await response.json().catch(() => ({error: '无法解析的服务器错误'}));
+                         throw new Error(`服务器错误: ${response.status} - ${errorData.error || '未知错误'}`);
+                     }
+
+                     // Handle the successful audio stream
+                     const blob = await response.blob();
+                     const audioUrl = URL.createObjectURL(blob);
+
+                     audioPlayer.src = audioUrl;
+                     downloadLink.href = audioUrl;
+
+                     resultContainer.style.display = 'block';
+                     statusDiv.textContent = '🎉 生成成功!';
+                     statusDiv.style.color = 'green';
+
+                 } catch (error) {
+                     console.error('TTS Generation Error:', error);
+                     statusDiv.textContent = `❌ 生成失败: ${error.message}`;
+                     statusDiv.style.color = 'red';
+                 } finally {
+                     // Restore the button state
+                     generateBtn.disabled = false;
+                     generateBtn.textContent = '生成语音';
+                     statusDiv.classList.remove('loading');
+                 }
+             });
+         });
+     </script>
+
+ </body>
+ </html>
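
The page's script maps its form fields onto OpenAI-style request fields when no reference audio is chosen: `speed` carries `cfg_weight`, `voice` carries the language code, and `instructions` carries `exaggeration`. A minimal Python sketch of the same request, for scripted clients, might look like the following. The function name `build_speech_request`, the `clamp` helper, and the default `base_url` are hypothetical (the port is taken from the Dockerfile's `PORT=7860`), and clamping to the ranges shown in the form is an assumption, since the server's own validation is not visible here.

```python
import json
import urllib.request

# Ranges copied from the number inputs in templates/index.html.
CFG_WEIGHT_RANGE = (0.0, 1.0)
EXAGGERATION_RANGE = (0.25, 2.0)


def clamp(value, bounds):
    """Limit value to the inclusive (low, high) range."""
    low, high = bounds
    return max(low, min(high, value))


def build_speech_request(text, language="zh", cfg_weight=0.5, exaggeration=0.5,
                         base_url="http://127.0.0.1:7860"):
    """Build, but do not send, a POST request for /v1/audio/speech.

    The field mapping mirrors the page's script; base_url is an assumption.
    """
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    payload = {
        "input": text,
        # The page sends both numeric values as strings, so this sketch does too.
        "speed": str(clamp(float(cfg_weight), CFG_WEIGHT_RANGE)),
        "voice": language,
        "instructions": str(clamp(float(exaggeration), EXAGGERATION_RANGE)),
        "model": "chatterbox-tts",
        "response_format": "mp3",
    }
    return urllib.request.Request(
        base_url + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` should yield the MP3 bytes if the server behaves as the page expects. Building the request separately from sending it keeps the field mapping easy to inspect without a running service.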
启动服务.bat ADDED
@@ -0,0 +1,24 @@
+ @echo off
+ :: Set the current code page to UTF-8 so that Chinese characters display correctly.
+ chcp 65001 > nul
+
+ TITLE Chatterbox TTS 服务启动器
+
+ :: =======================================================
+ :: ==          Chatterbox TTS service launcher          ==
+ :: =======================================================
+ echo.
+
+
+
+ :: Define the path to the Python interpreter in the virtual environment
+ rem set HF_ENDPOINT=https://hf-mirror.com
+ rem set https_proxy=http://127.0.0.1:10808
+ set "VENV_PYTHON=%~dp0runtime\python.exe"
+
+ "%VENV_PYTHON%" app.py
+
+ echo.
+ echo(服务已停止。
+ echo.
+ pause
安装N卡GPU支持.bat ADDED
@@ -0,0 +1,20 @@
+ @echo off
+ :: Set the current code page to UTF-8 so that Chinese characters display correctly.
+ chcp 65001 > nul
+
+ TITLE 安装N卡GPU支持
+
+
+
+ set "VENV_PYTHON=%~dp0runtime\python.exe"
+
+
+
+ call "%VENV_PYTHON%" -m pip uninstall -y torch torchaudio
+ call "%VENV_PYTHON%" -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128
+
+ echo.
+ echo( 安装 cuda12.8 完毕,请重新执行启动脚本
+ echo.
+
+ pause
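
After the script above swaps in the `cu128` wheels, it can be useful to confirm that the bundled interpreter actually sees a GPU before restarting the service. The following is a minimal sketch of such a check, not part of this commit; the function name `cuda_status` is hypothetical, and it assumes the check is run with the same `runtime\python.exe` that the batch files use.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter at all.
        return {"torch_installed": False, "cuda_available": False, "cuda_build": None}

    import torch  # imported lazily so the check also runs without PyTorch

    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
    }


if __name__ == "__main__":
    print(cuda_status())
```

A `cuda_build` value with `cuda_available` still false usually points at a driver problem rather than at the wheel, which is why the sketch reports the two separately.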