HFHash789 committed on
Commit dc89cfc · verified · 1 Parent(s): 49e9941

Upload folder using huggingface_hub

Files changed (9)
  1. .dockerignore +18 -0
  2. .gitignore +12 -0
  3. Dockerfile +29 -0
  4. LICENSE +674 -0
  5. README.md +167 -6
  6. app.py +267 -0
  7. index.html +271 -0
  8. pyproject.toml +17 -0
  9. requirements.txt +11 -0
.dockerignore ADDED
@@ -0,0 +1,18 @@
+ .git
+ .gitignore
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .pytest_cache
+ .mypy_cache
+ .ruff_cache
+
+ # Local tooling / deploy helper (not needed inside the Space image)
+ chouxiang
+
+ # Local-only artifacts
+ token.txt
+ shibie.wav
+ uv.lock
+
.gitignore ADDED
@@ -0,0 +1,12 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+ shibie.wav
+ token.txt
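
Note on the bytecode patterns: the .gitignore uses the single character-class pattern `*.py[oc]`, while the .dockerignore lists `*.pyc`, `*.pyo`, and `*.pyd` separately. The sketch below uses Python's `fnmatch` module as a rough stand-in for both matchers (an assumption: Git and Docker each have their own matching rules, which only agree with `fnmatch` for simple basename patterns like these). The helper names are hypothetical. It suggests that `.pyd` files are covered by the .dockerignore but not by the .gitignore.

```python
from fnmatch import fnmatchcase

# Patterns copied from the two ignore files in this commit.
GITIGNORE_PATTERN = "*.py[oc]"
DOCKERIGNORE_PATTERNS = ["*.pyc", "*.pyo", "*.pyd"]


def matches_gitignore_pattern(name: str) -> bool:
    """Approximate check of a basename against the .gitignore bytecode pattern."""
    return fnmatchcase(name, GITIGNORE_PATTERN)


def matches_dockerignore_patterns(name: str) -> bool:
    """Approximate check of a basename against the .dockerignore bytecode patterns."""
    return any(fnmatchcase(name, pattern) for pattern in DOCKERIGNORE_PATTERNS)


if __name__ == "__main__":
    for name in ["app.py", "app.pyc", "app.pyo", "ext.pyd"]:
        print(name, matches_gitignore_pattern(name), matches_dockerignore_patterns(name))
```

If `.pyd` files should also stay out of the repository, widening the .gitignore pattern to `*.py[ocd]` would be one option.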
Dockerfile ADDED
@@ -0,0 +1,29 @@
+ FROM python:3.12-slim
+
+ WORKDIR /app
+
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ # System deps:
+ # - ffmpeg: input media decoding/encoding
+ # - libsndfile1: audio I/O deps used by common audio stacks
+ # - libgomp1: OpenMP runtime (often needed by ML deps like ctranslate2)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     libsndfile1 \
+     libgomp1 \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ ENV HOST=0.0.0.0
+ ENV PORT=7860
+ EXPOSE 7860
+
+ CMD ["python", "-u", "app.py"]
+
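
Note on `HOST` and `PORT`: the Dockerfile exports `HOST=0.0.0.0` and `PORT=7860` and then runs `app.py`. The contents of `app.py` are not shown in this part of the diff, so the following is only a minimal sketch of how an entry point might read those two variables. The function name `resolve_bind_address` and the fallback defaults are assumptions for illustration, not the author's code.

```python
import os


def resolve_bind_address(env=None):
    """Return (host, port) from HOST/PORT, defaulting to the Dockerfile's values."""
    env = os.environ if env is None else env
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "7860"))
    # Reject values that cannot be a TCP port before trying to bind.
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port


if __name__ == "__main__":
    host, port = resolve_bind_address()
    print(f"would bind to {host}:{port}")
```

Reading the values at startup rather than hard-coding them keeps the image usable both on a Space, which expects port 7860, and locally, where a different port can be passed with `docker run -e PORT=...`.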
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
544
+ excuse you from the conditions of this License. If you cannot convey a
545
+ covered work so as to satisfy simultaneously your obligations under this
546
+ License and any other pertinent obligations, then as a consequence you may
547
+ not convey it at all. For example, if you agree to terms that obligate you
548
+ to collect a royalty for further conveying from those to whom you convey
549
+ the Program, the only way you could satisfy both those terms and this
550
+ License would be to refrain entirely from conveying the Program.
551
+
552
+ 13. Use with the GNU Affero General Public License.
553
+
554
+ Notwithstanding any other provision of this License, you have
555
+ permission to link or combine any covered work with a work licensed
556
+ under version 3 of the GNU Affero General Public License into a single
557
+ combined work, and to convey the resulting work. The terms of this
558
+ License will continue to apply to the part which is the covered work,
559
+ but the special requirements of the GNU Affero General Public License,
560
+ section 13, concerning interaction through a network will apply to the
561
+ combination as such.
562
+
563
+ 14. Revised Versions of this License.
564
+
565
+ The Free Software Foundation may publish revised and/or new versions of
566
+ the GNU General Public License from time to time. Such new versions will
567
+ be similar in spirit to the present version, but may differ in detail to
568
+ address new problems or concerns.
569
+
570
+ Each version is given a distinguishing version number. If the
571
+ Program specifies that a certain numbered version of the GNU General
572
+ Public License "or any later version" applies to it, you have the
573
+ option of following the terms and conditions either of that numbered
574
+ version or of any later version published by the Free Software
575
+ Foundation. If the Program does not specify a version number of the
576
+ GNU General Public License, you may choose any version ever published
577
+ by the Free Software Foundation.
578
+
579
+ If the Program specifies that a proxy can decide which future
580
+ versions of the GNU General Public License can be used, that proxy's
581
+ public statement of acceptance of a version permanently authorizes you
582
+ to choose that version for the Program.
583
+
584
+ Later license versions may give you additional or different
585
+ permissions. However, no additional obligations are imposed on any
586
+ author or copyright holder as a result of your choosing to follow a
587
+ later version.
588
+
589
+ 15. Disclaimer of Warranty.
590
+
591
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599
+
600
+ 16. Limitation of Liability.
601
+
602
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610
+ SUCH DAMAGES.
611
+
612
+ 17. Interpretation of Sections 15 and 16.
613
+
614
+ If the disclaimer of warranty and limitation of liability provided
615
+ above cannot be given local legal effect according to their terms,
616
+ reviewing courts shall apply local law that most closely approximates
617
+ an absolute waiver of all civil liability in connection with the
618
+ Program, unless a warranty or assumption of liability accompanies a
619
+ copy of the Program in return for a fee.
620
+
621
+ END OF TERMS AND CONDITIONS
622
+
623
+ How to Apply These Terms to Your New Programs
624
+
625
+ If you develop a new program, and you want it to be of the greatest
626
+ possible use to the public, the best way to achieve this is to make it
627
+ free software which everyone can redistribute and change under these terms.
628
+
629
+ To do so, attach the following notices to the program. It is safest
630
+ to attach them to the start of each source file to most effectively
631
+ state the exclusion of warranty; and each file should have at least
632
+ the "copyright" line and a pointer to where the full notice is found.
633
+
634
+ <one line to give the program's name and a brief idea of what it does.>
635
+ Copyright (C) <year> <name of author>
636
+
637
+ This program is free software: you can redistribute it and/or modify
638
+ it under the terms of the GNU General Public License as published by
639
+ the Free Software Foundation, either version 3 of the License, or
640
+ (at your option) any later version.
641
+
642
+ This program is distributed in the hope that it will be useful,
643
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
644
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645
+ GNU General Public License for more details.
646
+
647
+ You should have received a copy of the GNU General Public License
648
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
649
+
650
+ Also add information on how to contact you by electronic and paper mail.
651
+
652
+ If the program does terminal interaction, make it output a short
653
+ notice like this when it starts in an interactive mode:
654
+
655
+ <program> Copyright (C) <year> <name of author>
656
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657
+ This is free software, and you are welcome to redistribute it
658
+ under certain conditions; type `show c' for details.
659
+
660
+ The hypothetical commands `show w' and `show c' should show the appropriate
661
+ parts of the General Public License. Of course, your program's commands
662
+ might be different; for a GUI interface, you would use an "about box".
663
+
664
+ You should also get your employer (if you work as a programmer) or school,
665
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
666
+ For more information on this, and how to apply and follow the GNU GPL, see
667
+ <https://www.gnu.org/licenses/>.
668
+
669
+ The GNU General Public License does not permit incorporating your program
670
+ into proprietary programs. If your program is a subroutine library, you
671
+ may consider it more useful to permit linking proprietary applications with
672
+ the library. If this is what you want to do, use the GNU Lesser General
673
+ Public License instead of this License. But first, please read
674
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
README.md CHANGED
@@ -1,10 +1,171 @@
1
  ---
2
- title: Whisperx Api
3
- emoji: 🏃
4
- colorFrom: green
5
- colorTo: gray
6
  sdk: docker
7
- pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: WhisperX API
3
  sdk: docker
4
+ app_port: 7860
5
  ---
6
 
7
+ # WhisperX API with Web UI
8
+
9
+
10
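+ Deploy a WhisperX service with a web UI on your own machine in one step. It offers high-accuracy speech transcription and speaker diarization, and is compatible with the OpenAI API.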
11
+
12
+ ![Web UI Screenshot](https://pvtr2.pyvideotrans.com/1762936042163_image.png)
13
+
14
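+ ## ✨ Highlights
15
+
16
+ * 🚀 **One-command startup**: With the `uv` tool, a single command sets up the environment and starts the service.
17
+ * 💻 **Simple Web UI**: A ready-to-use web page; drag and drop an audio/video file to transcribe it.
18
+ * 🗣️ **Speaker diarization**: Based on `pyannote.audio`, automatically identifies and labels the different speakers in a conversation.
19
+ * ⚡ **OpenAI-compatible API**: Works as a local drop-in replacement for the OpenAI Whisper API and integrates into existing projects.
20
+ * 🔒 **Fully local**: All computation runs on your own computer, keeping your data private and secure.
21
+ * 🎯 **High-accuracy transcription**: Built on WhisperX (FasterWhisper) for fast and accurate results.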
22
+
23
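+ ## 🛠️ Prerequisites
24
+
25
+ Before you start, make sure the following are available on your system:
26
+
27
+ 1. **Hardware**:
28
+     * **Strongly recommended**: An NVIDIA GPU with [CUDA](https://developer.nvidia.com/cuda-toolkit) installed and more than 6 GB of VRAM.
29
+     * **Minimum**: A modern multi-core CPU, although processing will be slow.
30
+
31
+ 2. **Software dependencies**:
32
+     * **Python**: version `3.10` to `3.12`.
33
+     * **[uv](https://github.com/astral-sh/uv)**: A very fast Python package manager.
34
+     * **[FFmpeg](https://ffmpeg.org/download.html)**: Used for audio/video format conversion.
35
+
36
+
37
+ 3. **Network**:
38
+     * The first run downloads models from Hugging Face, so make sure your network can reach `huggingface.co`.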
39
+
40
+ ## 🚀 Quick Start
41
+
42
+ #### Step 1: Clone the project
43
+
44
+ ```bash
45
+ git clone https://github.com/jianchang512/whisperx-api.git
46
+ cd whisperx-api
47
+ ```
48
+
49
+ #### Step 2: Configure speaker diarization (optional)
50
+
51
+
52
+ 1. **Log in to Hugging Face**: Visit [huggingface.co](https://huggingface.co/) and sign up or log in.
53
+
54
+ 2. **Accept the model terms**:
55
+     * Visit [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
56
+     * Visit [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
57
+     * Agree to and accept the user agreement on both pages.
58
+
59
+ 3. **Get and configure a token**:
60
+     * On the [Hugging Face Tokens page](https://huggingface.co/settings/tokens), create a new access token with **read** permission.
61
+     * In the project root, create a file named `token.txt` and paste the copied token into it.
62
+
63
+ #### Step 3: Start the service
64
+
65
+ Make sure your terminal is in the project root, then run:
66
+
67
+ ```bash
68
+ uv run app.py
69
+ ```
70
+
71
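+ `uv` installs all Python dependencies automatically. The first run downloads models, so please be patient. When you see the following log, the service has started successfully: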
72
+
73
+ ![Startup Log](https://pvtr2.pyvideotrans.com/1762936028281_image.png)
74
+
75
+ Once the service is running, it automatically opens **`http://127.0.0.1:9092`** in your browser.
76
+
77
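+ ## ☁️ Deploy to Hugging Face Spaces (free CPU, public API)
78
+
79
+ 1. Create a new Space and choose `Docker`.
80
+ 2. Upload this repository's code directly (or use `chouxiang/deploy.py` to upload it automatically).
81
+ 3. The Space builds automatically and exposes:
82
+     * Web UI: `https://<your-space>.hf.space/`
83
+     * OpenAI-compatible API: `https://<your-space>.hf.space/v1/audio/transcriptions`
84
+
85
+ Optional environment variables (Space `Settings -> Secrets`):
86
+ * `HUGGING_FACE_TOKEN`: Required to enable speaker diarization (pyannote); if unset, diarization is disabled automatically.
87
+ * `DEFAULT_MODEL`: On CPU, `tiny|base|small` is recommended (the CPU default is `small`).
88
+ * `ALLOW_LARGE_ON_CPU=1`: Allows requesting `large-*` models on CPU (not recommended; prone to timeouts and out-of-memory errors).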
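
The interaction between `DEFAULT_MODEL`, `ALLOW_LARGE_ON_CPU`, and the requested model can be sketched as below. This is a minimal standalone sketch that mirrors the selection logic in this commit's `app.py`; the helper names `env_bool` and `resolve_model` are hypothetical and exist only for illustration.

```python
# Hypothetical helpers mirroring the model-selection rules in app.py.
ALLOWED_MODELS = ["tiny", "base", "small", "medium",
                  "large-v1", "large-v2", "large-v3", "large-v3-turbo"]


def env_bool(value, default=False):
    """Interpret an environment-style string ("1", "true", "yes", ...) as a boolean."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y", "on"}


def resolve_model(requested, device, default_model=None, allow_large_on_cpu=False):
    """Return the model name that would actually be loaded for a request."""
    # Without DEFAULT_MODEL, the default is "small" on CPU and "large-v3" on GPU.
    default_model = default_model or ("small" if device == "cpu" else "large-v3")
    # app.py maps the "large-v3-turbo" identifier onto "large-v3".
    name = "large-v3" if requested == "large-v3-turbo" else requested
    if name not in ALLOWED_MODELS:
        name = default_model
    # On CPU, large models are downgraded unless explicitly allowed.
    if device == "cpu" and name.startswith("large-") and not allow_large_on_cpu:
        name = "small"
    return name


print(resolve_model("large-v3", "cpu"))
print(resolve_model("large-v3", "cpu", allow_large_on_cpu=env_bool("1")))
```

On a CPU Space, a request for a `large-*` model is therefore served by `small` unless `ALLOW_LARGE_ON_CPU` is set to a truthy value.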
89
+
90
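+ ## 📖 Usage Guide
91
+
92
+ ### Option 1: Web UI (recommended)
93
+
94
+ 1. **Upload a file**: Click the upload area or drag an audio/video file onto it.
95
+ 2. **Set the parameters**:
96
+     * **Language**: Choose the audio language or keep "auto-detect".
97
+     * **Model**: Larger models give better results but run more slowly. `large-v3-turbo` is the recommended balance.
98
+     * **Prompt**: Supplying technical terms, names, and similar vocabulary can improve recognition accuracy (for example `OpenAI, WhisperX, PyTorch`).
99
+ 3. **Start transcription**: Click the "Submit transcription" button.
100
+ 4. **View and download**: The result appears in the text box below in SRT subtitle format; you can edit it directly and download it.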
101
+
102
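+ ### Option 2: OpenAI-compatible API
103
+
104
+ You can use this service as a local replacement for the OpenAI Whisper API.
105
+
106
+ model: tiny|base|small|medium|large-v2|large-v3|large-v3-turbo
107
+
108
+ response_format: fixed value diarized_json
109
+
110
+ extra_body:
111
+
112
+ max_speakers: maximum number of speakers. -1: diarization disabled; 0: diarization enabled with no upper limit; >0: maximum number of speakers
113
+
114
+ min_speakers: minimum number of speakers. 0: no minimum; >0: minimum number of speakers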
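
The `max_speakers` / `min_speakers` conventions above can be summarized in a few lines of code. The sketch below follows the way this commit's `app.py` interprets the two values; the function name `diarization_kwargs` is hypothetical and only illustrates the mapping.

```python
def diarization_kwargs(max_speakers=-1, min_speakers=0):
    """Translate the request fields into diarizer arguments.

    Returns None when diarization is disabled (max_speakers == -1, the default);
    otherwise returns a dict in which 0 becomes None, meaning "no limit".
    """
    if max_speakers <= -1:
        return None  # diarization is not run at all
    return {
        "max_speakers": max_speakers if max_speakers > 0 else None,
        "min_speakers": min_speakers if min_speakers > 0 else None,
    }


print(diarization_kwargs())      # disabled
print(diarization_kwargs(0, 0))  # enabled, no bounds
print(diarization_kwargs(4, 2))  # enabled, between 2 and 4 speakers
```

Note that omitting `max_speakers` leaves diarization off, so a client that wants speaker labels has to send `0` or a positive value.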
115
+
116
+ **Example Python code:**
117
+
118
+ ```python
119
+ from openai import OpenAI
120
+
121
+ # base_url points at the local service; api_key can be any value
122
+ client = OpenAI(base_url='http://127.0.0.1:9092/v1', api_key='dummy-key')
123
+
124
+ audio_path = "path/to/your/audio.wav"
125
+
126
+ with open(audio_path, "rb") as audio_file:
127
+     transcript = client.audio.transcriptions.create(
128
+         model="large-v3",  # e.g. 'tiny', 'base', 'large-v3'
129
+         file=audio_file,
130
+         response_format="diarized_json",  # fixed value
131
+         extra_body={
132
+             "max_speakers": 4,  # -1 = diarization disabled, 0 = enabled with no upper limit, >0 = maximum number of speakers
133
+             "min_speakers": 2,  # 0 = no minimum, >0 = minimum number of speakers
134
+         },
135
+     )
136
+
137
+ # Print each subtitle segment with its speaker label
138
+ for segment in transcript.segments:  # segments are objects, so use attribute access rather than dict access
139
+     speaker = getattr(segment, 'speaker', None) or 'Unknown'
140
+     start_time = segment.start
141
+     end_time = segment.end
142
+     text = segment.text
143
+
144
+     print(f"[{start_time:.2f}s -> {end_time:.2f}s] {speaker}: {text}")
145
+
146
+ # output: [TranscriptionDiarizedSegment(id=None, end=24.283, speaker=None, start=0.031, text='五老星系中發訊的有機分子我們林第三類接觸還有多人微博 真是展開拍攝任務已經進來中年最近也傳過來許多過去難以拍攝到的照片又越出天文學家在自然期看上發表了這場照片在藍色核心外環繞著一圈橘黃色的光 芒這是一個星系規模的甜甜圈', type=None), TranscriptionDiarizedSegment(id=None, end=40.821, speaker=None, start=24.263, text='這是一個傳送門這是外星文明的代生環其實這是一個還有有幾五多環方向聽的古老星系他的名字是SPT 臨四一巴 带選四十七因為名字很長以下我們就檢稱為SPT 臨四一巴吧', type=None), TranscriptionDiarizedSegment(id=None, end=57.544, speaker=None, start=40.801, text='這個結果有什麼特殊意義這代表我們發現外形生命的嗎?本集節目是販唐會員選題紅每個月都會製作由會員投票出來的題目如果你有好題目希望我們做一集來講解或討論哪上點擊加入按鈕成為我們的會員吧', type=None),...]
147
+
148
+ ```
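
Since the Web UI presents results as SRT, a client may want to turn the returned segments into SRT text as well. The following is a minimal sketch under stated assumptions: it takes the segments as plain dictionaries with `start`, `end`, `text`, and an optional `speaker` key (the shape of the JSON built in `app.py`), and the `[Speaker1]` prefix style is borrowed from the FAQ wording rather than confirmed from the UI code. The names `srt_timestamp` and `segments_to_srt` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from a list of segment dictionaries."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker"):
            # Prefix the cue with the speaker label when one is present.
            text = f"[{seg['speaker']}] {text}"
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


example = [
    {"start": 0.0, "end": 1.5, "text": " hello ", "speaker": "Speaker1"},
    {"start": 1.5, "end": 2.0, "text": "bye"},
]
print(segments_to_srt(example))
```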
149
+
150
+
151
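+ ## ❓ FAQ
152
+
153
+ * **Q: Startup reports `FFmpeg not found`?**
154
+   A: FFmpeg is not installed correctly or has not been added to the system PATH. See the FFmpeg link in the prerequisites section.
155
+
156
+ * **Q: Nothing happens for a long time, or an error appears, after clicking "Submit transcription"?**
157
+   A: The first run has to download models, so please be patient. If an error appears, check the terminal log. The most common cause is a network problem that makes the model download fail.
158
+
159
+ * **Q: Why are there no `[Speaker1]`, `[Speaker2]` markers in the result?**
160
+   A: 1) The audio contains only one speaker, which the program detects automatically. 2) You have not configured speaker diarization (**Step 2**), or your request for access to the models on Hugging Face has not been approved yet.
161
+
162
+ * **Q: What can I do if processing is slow?**
163
+   A: This happens when the computation runs on the CPU. Using an NVIDIA GPU speeds up processing considerably.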
164
+
165
+
166
+
167
+ ## Acknowledgements
168
+
169
+ * [WhisperX](https://github.com/m-bain/whisperX)
170
+ * [Faster Whisper](https://github.com/guillaumekln/faster-whisper)
171
+ * [pyannote.audio](https://github.com/pyannote/pyannote-audio)
app.py ADDED
@@ -0,0 +1,267 @@
1
+
2
+ import os
3
+ import tempfile
4
+ import torch
5
+ import whisperx
6
+ from flask import Flask, request, jsonify, render_template
7
+ from waitress import serve
8
+ import logging
9
+ import webbrowser
10
+ from threading import Timer
11
+ import shutil
12
+ import sys
13
+ import ffmpeg
14
+ try:
15
+     from whisperx.diarize import DiarizationPipeline
16
+ except Exception:
17
+     DiarizationPipeline = None
18
+
19
+ # --- Global configuration and initialization ---
20
+
21
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
22
+
23
+ def _env_bool(name: str, default: bool) -> bool:
24
+     val = os.environ.get(name)
25
+     if val is None:
26
+         return default
27
+     return val.strip().lower() in {"1", "true", "yes", "y", "on"}
28
+
29
+ def get_hf_token():
30
+     """
31
+     Get the Hugging Face token.
32
+     Reads 'token.txt' in the current directory first; if that fails, falls back to the 'HUGGING_FACE_TOKEN' environment variable.
33
+     """
34
+     token = None
35
+     token_file = 'token.txt'
36
+     if os.path.exists(token_file):
37
+         try:
38
+             with open(token_file, 'r', encoding='utf-8') as f:
39
+                 token = f.read().strip()
40
+             if token:
41
+                 logging.info(f"Read the Hugging Face token from {token_file}.")
42
+                 return token
43
+         except Exception as e:
44
+             logging.warning(f"Could not read the token from {token_file}: {e}")
45
+
46
+     token = os.environ.get("HUGGING_FACE_TOKEN")
47
+     if token:
48
+         logging.info("Read the Hugging Face token from the environment.")
49
+     else:
50
+         logging.warning("No Hugging Face token found in token.txt or the environment. Speaker diarization will be disabled.")
51
+     return token
52
+
53
+ HF_TOKEN = get_hf_token()
54
+
55
+ # Device and compute type configuration
56
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
57
+ COMPUTE_TYPE = "float16" if torch.cuda.is_available() else "int8"
58
+ BATCH_SIZE = 16 if DEVICE == "cuda" else 8
59
+
60
+ logging.info(f"Using device: {DEVICE}, compute type: {COMPUTE_TYPE}")
61
+
62
+ # Model configuration
63
+ ALLOWED_MODELS = ['tiny', 'base', 'small', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large-v3-turbo']
64
+ DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL") or ("small" if DEVICE == "cpu" else "large-v3")
65
+ ALLOW_LARGE_ON_CPU = _env_bool("ALLOW_LARGE_ON_CPU", False)
66
+
67
+ # Model caches
68
+ whisper_models_cache = {}
69
+ diarize_model = None
70
+ diarize_model_loaded = False
71
+ align_models_cache = {}
72
+
73
+ def get_whisper_model(model_name: str):
74
+     if model_name not in whisper_models_cache:
75
+         logging.info(f"Loading Whisper model '{model_name}'...")
76
+         try:
77
+             model = whisperx.load_model(model_name, DEVICE, compute_type=COMPUTE_TYPE)
78
+             whisper_models_cache[model_name] = model
79
+             logging.info(f"Model '{model_name}' loaded.")
80
+         except Exception as e:
81
+             logging.error(f"Failed to load Whisper model '{model_name}': {e}")
82
+             if 'huggingface' in str(e):  # str.find() returns -1 (truthy) when absent, so test membership instead
83
+                 print("\n\n=======The model download may have failed; check your network or proxy and try again=======\n\n")
84
+             raise
85
+     return whisper_models_cache[model_name]
86
+
87
+ def get_align_model(language_code: str):
88
+     if language_code not in align_models_cache:
89
+         logging.info(f"Loading alignment model (language={language_code})...")
90
+         model_a, metadata = whisperx.load_align_model(language_code=language_code, device=DEVICE)
91
+         align_models_cache[language_code] = (model_a, metadata)
92
+         logging.info("Alignment model loaded.")
93
+     return align_models_cache[language_code]
94
+
95
+ def get_diarize_model():
96
+     global diarize_model, diarize_model_loaded
97
+
98
+
99
+     if not diarize_model_loaded:
100
+         logging.info("Trying to load the speaker diarization model...")
101
+         if DiarizationPipeline is None:
102
+             logging.warning("Speaker diarization dependency (DiarizationPipeline) not found; this feature will be disabled.")
103
+             diarize_model_loaded = True
104
+             return None
105
+         if not HF_TOKEN:
106
+             diarize_model_loaded = True
107
+             return None
108
+         try:
109
+             diarize_model = DiarizationPipeline(use_auth_token=HF_TOKEN, device=DEVICE)
110
+             diarize_model_loaded = True
111
+             logging.info("Speaker diarization model loaded.")
112
+         except Exception as e:
113
+             logging.error(f"Critical: failed to load the speaker diarization model. This feature will be disabled. Error: {e}")
114
+             diarize_model = None
115
+             diarize_model_loaded = True
116
+     return diarize_model
117
+
118
+ # --- Flask application ---
119
+ app = Flask(__name__, template_folder='.')
120
+
121
+ @app.route('/', methods=['GET'])
122
+ def index():
123
+     return render_template('index.html')
124
+
125
+ @app.route('/v1/audio/transcriptions', methods=['POST'])
126
+ def audio_transcriptions():
127
+     if 'file' not in request.files:
128
+         return jsonify({"error": "The request does not contain a file part"}), 400
129
+     file = request.files['file']
130
+     if file.filename == '':
131
+         return jsonify({"error": "No file selected"}), 400
132
+
133
+     print(request.form)
134
+     model_id = request.form.get('model', DEFAULT_MODEL)
135
+     model_name = 'large-v3' if model_id == 'large-v3-turbo' else model_id
136
+     if model_name not in ALLOWED_MODELS:
137
+         model_name = DEFAULT_MODEL
138
+     if DEVICE == "cpu" and (model_name.startswith("large-") or model_name == "large") and not ALLOW_LARGE_ON_CPU:
139
+         logging.warning(f"Large model '{model_name}' requested on CPU; downgrading to 'small' (set ALLOW_LARGE_ON_CPU=1 to disable the downgrade).")
140
+         model_name = "small"
141
+
142
+     language = request.form.get('language') or None
143
+     prompt = request.form.get('prompt')
144
+     max_speakers=int(request.form.get('max_speakers',-1))
145
+     min_speakers=int(request.form.get('min_speakers',0))
146
+
147
+     logging.info(f"Request received: model='{model_id}', language='{language or 'auto-detect'}', prompt='{'yes' if prompt else 'no'}'")
148
+
149
+     input_file_path = None
150
+     processed_wav_path = None
151
+     try:
152
+         suffix = os.path.splitext(file.filename)[1]
153
+         with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
154
+             file.save(tmp.name)
155
+             input_file_path = tmp.name
156
+
157
+         logging.info(f"Converting uploaded file '{file.filename}' to standard 16 kHz mono WAV...")
158
+         with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_wav:
159
+             processed_wav_path = tmp_wav.name
160
+
161
+         try:
162
+             (
163
+                 ffmpeg
164
+                 .input(input_file_path)
165
+                 .output(processed_wav_path, ac=1, ar=16000, acodec='pcm_s16le', vn=None)
166
+                 .run(capture_stdout=True, capture_stderr=True, overwrite_output=True)
167
+             )
168
+             logging.info("File conversion succeeded.")
169
+         except ffmpeg.Error as e:
170
+             error_details = e.stderr.decode('utf-8', errors='ignore')
171
+             logging.error(f"FFmpeg conversion failed: {error_details}")
172
+             return jsonify({"error": "Failed to process the audio/video file; it may be corrupted or in an unsupported format."}), 400
173
+
174
+         audio = whisperx.load_audio(processed_wav_path)
175
+         model = get_whisper_model(model_name)
176
+
177
+         # ---
178
+         # *** FIX IS HERE ***
179
+         # ---
180
+         transcribe_options = {}
181
+         if language:
182
+             transcribe_options['language'] = language
183
+         if prompt:
184
+             # Use the correct parameter name 'prompt'
185
+             transcribe_options['prompt'] = prompt
186
+         print('Starting transcription')
187
+         result = model.transcribe(audio, batch_size=BATCH_SIZE, **transcribe_options)
188
+         print('Transcription finished, preparing alignment')
189
+         model_a, metadata = get_align_model(result["language"])
190
+         result = whisperx.align(result["segments"], model_a, metadata, audio, DEVICE, return_char_alignments=False)
191
+
192
+         if max_speakers>-1:
193
+             print('Starting speaker diarization')
194
+             diar_model = get_diarize_model()
195
+             if diar_model:
196
+                 try:
197
+                     diarize_segments = diar_model(audio,max_speakers=max_speakers if max_speakers>0 else None,min_speakers=min_speakers if min_speakers>0 else None)
198
+                     result = whisperx.assign_word_speakers(diarize_segments, result)
199
+                 except Exception as e:
200
+                     logging.error(f"Speaker diarization failed at runtime: {e}. Falling back to single-speaker mode.")
201
+
202
+         speakers = {segment.get('speaker') for segment in result["segments"] if 'speaker' in segment}
203
+         is_single_speaker = len(speakers) <= 1
204
+         logging.info(f"Detected speakers: {speakers}. Single-speaker mode: {'yes' if is_single_speaker else 'no'}")
205
+
206
+         speaker_mapping = {f"SPEAKER_{i:02d}": f"Speaker{i+1}" for i in range(20)}
207
+
208
+
209
+         print(result)
210
+         formatted_segments = []
211
+         for segment in result["segments"]:
212
+             speaker_raw = segment.get("speaker", "SPEAKER_00")
213
+             speaker_name = speaker_mapping.get(speaker_raw, speaker_raw)
214
+             text = segment['text'].strip()
215
+             if not text:
216
+                 continue
217
+
218
+
219
+             tmp={
220
+                 "start": segment['start'],
221
+                 "end": segment['end'],
222
+                 "text": text
223
+             }
224
+             segment_speaker = speaker_name if not is_single_speaker else None
225
+             if segment_speaker:
226
+                 tmp['speaker']=segment_speaker
227
+             formatted_segments.append(tmp)
228
+
229
+         response_data = {"segments": formatted_segments}
230
+         return jsonify(response_data)
231
+
232
+     except Exception as e:
233
+         logging.error(f"Unexpected error in the processing pipeline: {e}", exc_info=True)
234
+         return jsonify({"error": "An internal error occurred during processing."}), 500
235
+     finally:
236
+         if input_file_path and os.path.exists(input_file_path):
237
+             os.remove(input_file_path)
238
+             logging.info(f"Removed temporary upload file: {input_file_path}")
239
+         if processed_wav_path and os.path.exists(processed_wav_path):
240
+             os.remove(processed_wav_path)
241
+             logging.info(f"Removed temporary WAV file: {processed_wav_path}")
242
+
243
+ # --- Start the service ---
244
+ def check_ffmpeg():
245
+     if not shutil.which("ffmpeg"):
246
+         logging.error("Error: FFmpeg not found in the system PATH.")
247
+         print("\nError: FFmpeg not found in the system PATH.")
248
+         print("Make sure FFmpeg is installed and its path has been added to your system environment variables.")
249
+         print("Windows install guide: https://www.wikihow.com/Install-FFmpeg-on-Windows")
250
+         print("macOS (with Homebrew): brew install ffmpeg")
251
+         print("Linux (Ubuntu/Debian): sudo apt update && sudo apt install ffmpeg")
252
+         sys.exit(1)
253
+     logging.info("FFmpeg check passed.")
254
+
255
+ def open_browser(url):
256
+     webbrowser.open_new(url)
257
+
258
+ if __name__ == '__main__':
259
+     check_ffmpeg()
260
+     host = os.environ.get("HOST", "127.0.0.1")
261
+     port = int(os.environ.get("PORT", "9092"))
262
+     url = f"http://{host}:{port}"
263
+     running_in_space = bool(os.environ.get("SPACE_ID")) or bool(os.environ.get("HF_SPACE")) or bool(os.environ.get("SYSTEM") == "spaces")
264
+     if _env_bool("OPEN_BROWSER", True) and not running_in_space:
265
+         Timer(1, lambda: open_browser(url)).start()
266
+     logging.info(f"Service started, listening on http://{host}:{port}")
267
+     serve(app, host=host, port=port, threads=10)
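
For readers who want to reproduce the audio normalization step of `audio_transcriptions` outside the service, the conversion to 16 kHz mono 16-bit PCM WAV corresponds roughly to the FFmpeg command line built below. This is a sketch using the standard library rather than the `ffmpeg` Python wrapper that `app.py` imports; the function names `build_ffmpeg_command` and `convert_to_wav` are hypothetical.

```python
import subprocess


def build_ffmpeg_command(src, dst):
    """Return the argv list for converting `src` to 16 kHz mono PCM WAV at `dst`."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input audio or video file
        "-vn",                   # drop any video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst):
    """Run FFmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)


print(" ".join(build_ffmpeg_command("input.mp4", "output.wav")))
```

Building the command as a list avoids shell quoting problems with uploaded file names.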
index.html ADDED
@@ -0,0 +1,271 @@
1
+ <!DOCTYPE html>
2
+ <html lang="zh-CN">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>WhisperX API</title>
7
+ <style>
8
+ :root {
9
+ --bg-color: #f8f9fa; --font-color: #212529; --primary-color: #007bff;
10
+ --primary-hover-color: #0056b3; --border-color: #dee2e6; --card-bg: #ffffff;
11
+ --input-bg: #ffffff; --disabled-color: #6c757d; --error-color: #dc3545;
12
+ --success-color: #28a745;
13
+ }
14
+ body {
15
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif;
16
+ margin: 0; padding: 2rem 1rem; background-color: var(--bg-color);
17
+ color: var(--font-color); display: flex; flex-direction: column; align-items: center; min-height: 100vh;
18
+ }
19
+ main { width: 100%; max-width: 1200px; }
20
+ .container { background-color: var(--card-bg); border-radius: 8px; padding: 2rem; box-shadow: 0 4px 8px rgba(0,0,0,0.05); }
21
+ h1 { text-align: center; color: var(--font-color); margin-bottom: 2rem; }
22
+ .form-group { margin-bottom: 1.5rem; }
23
+ .controls-row { display: flex; gap: 1.5rem; align-items: flex-end; flex-wrap: wrap; }
24
+ .control-item { flex: 1; min-width: 200px; }
25
+ label { display: block; font-weight: 600; margin-bottom: 0.5rem; }
26
+ input[type="text"], select {
27
+ width: 100%; padding: 0.75rem; border: 1px solid var(--border-color);
28
+ border-radius: 4px; font-size: 1rem; background-color: var(--input-bg);
29
+ box-sizing: border-box;
30
+ }
31
+ .prompt-help { font-size: 0.875rem; color: var(--disabled-color); margin-top: 0.5rem; display: none; }
32
+ #drop-zone {
33
+ border: 2px dashed var(--border-color); border-radius: 8px; padding: 3rem;
34
+ text-align: center; cursor: pointer; transition: background-color 0.2s ease, border-color 0.2s ease;
35
+ }
36
+ #drop-zone.drag-over { border-color: var(--primary-color); background-color: #e9f3ff; }
37
+ #drop-zone p { margin: 0; font-size: 1.1rem; color: var(--disabled-color); }
38
+ #file-name { font-weight: bold; color: var(--primary-color); margin-top: 0.5rem; }
39
+ #submit-btn {
40
+ width: 100%; padding: 0.8rem; font-size: 1.1rem; font-weight: 600;
41
+ color: #fff; background-color: var(--primary-color); border: none;
42
+ border-radius: 4px; cursor: pointer; transition: background-color 0.2s ease;
43
+ }
44
+ #submit-btn:hover:not(:disabled) { background-color: var(--primary-hover-color); }
45
+ #submit-btn:disabled { background-color: var(--disabled-color); cursor: not-allowed; }
46
+ #result-container { margin-top: 2rem; display: none; }
47
+ textarea {
48
+ width: 100%; height: 400px; padding: 1rem; border: 1px solid var(--border-color);
49
+ border-radius: 4px; font-family: "Courier New", Courier, monospace;
50
+ font-size: 0.95rem; line-height: 1.5; box-sizing: border-box; resize: vertical;
51
+ }
52
+ .result-actions { text-align: right; margin-top: 1rem; }
53
+ #download-btn {
54
+ padding: 0.6rem 1.2rem; font-size: 1rem; color: #fff;
55
+ background-color: var(--success-color); border: none; border-radius: 4px; cursor: pointer;
56
+ }
57
+ footer {
58
+ margin-top: 2rem;
59
+ padding: 1rem;
60
+ text-align: center;
61
+ color: var(--disabled-color);
62
+ font-size: 0.9rem;
63
+ }
64
+ footer a {
65
+ color: var(--primary-color);
66
+ text-decoration: none;
67
+ }
68
+ footer a:hover {
69
+ text-decoration: underline;
70
+ }
71
+ </style>
72
+ </head>
73
+ <body>
74
+ <main>
75
+ <div class="container">
76
+ <h1>WhisperX Speech Transcription UI & API</h1>
77
+
78
+ <div class="form-group">
79
+ <input type="file" id="file-input" accept="audio/*,video/*" style="display: none;">
80
+ <div id="drop-zone">
81
+ <p>Click here to choose a file, or drag an audio/video file here</p>
82
+ <p id="file-name"></p>
83
+ </div>
84
+ </div>
85
+
86
+ <div class="form-group controls-row">
87
+ <div class="control-item">
88
+ <label for="language">Language</label>
89
+ <select id="language"></select>
90
+ </div>
91
+ <div class="control-item">
92
+ <label for="model">Model</label>
93
+ <select id="model">
94
+ <option value="large-v3-turbo">large-v3-turbo (recommended)</option>
95
+ <option value="large-v3">large-v3</option>
96
+ <option value="large-v2">large-v2</option>
97
+ <option value="medium">medium</option>
98
+ <option value="small">small</option>
99
+ <option value="base">base</option>
100
+ <option value="tiny">tiny</option>
101
+ </select>
102
+ </div>
103
+ <div class="control-item" style="flex: 1.5;">
104
+ <label for="prompt">Prompt</label>
105
+ <input type="text" id="prompt" placeholder="Improves recognition of specific terms, e.g. OpenAI, WhisperX">
106
+ </div>
107
+ </div>
108
+
109
+ <button id="submit-btn" disabled>Submit transcription</button>
110
+
111
+ <div id="result-container">
112
+ <h2>Preview and edit</h2>
113
+ <textarea id="srt-output" placeholder="The transcription result will appear here..."></textarea>
114
+ <div class="result-actions">
115
+ <button id="download-btn">Download SRT file</button>
116
+ </div>
117
+ </div>
118
+ </div>
119
+ </main>
120
+ <footer>
121
+ <p>By <a href="https://github.com/jianchang512/whisperx-api" target="_blank">jianchang512/whisperx-api</a></p>
122
+ </footer>
123
+
124
+ <script>
125
+         // DOM Elements
+         const dropZone = document.getElementById('drop-zone');
+         const fileInput = document.getElementById('file-input');
+         const fileNameDisplay = document.getElementById('file-name');
+         const languageSelect = document.getElementById('language');
+         const modelSelect = document.getElementById('model');
+         const promptInput = document.getElementById('prompt');
+         const submitBtn = document.getElementById('submit-btn');
+         const resultContainer = document.getElementById('result-container');
+         const srtOutput = document.getElementById('srt-output');
+         const downloadBtn = document.getElementById('download-btn');
+
+         let selectedFile = null;
+
+         const languages = {
+             "auto": "Auto-detect (自动检测)", "en": "English (英语)", "zh": "Chinese (中文)", "de": "German (德语)", "es": "Spanish (西班牙语)",
+             "ru": "Russian (俄语)", "ko": "Korean (韩语)", "fr": "French (法语)", "ja": "Japanese (日语)", "pt": "Portuguese (葡萄牙语)",
+             "tr": "Turkish (土耳其语)", "pl": "Polish (波兰语)", "ca": "Catalan (加泰罗尼亚语)", "nl": "Dutch (荷兰语)", "ar": "Arabic (阿拉伯语)",
+             "sv": "Swedish (瑞典语)", "it": "Italian (意大利语)", "id": "Indonesian (印尼语)", "hi": "Hindi (印地语)", "fi": "Finnish (芬兰语)",
+             "vi": "Vietnamese (越南语)", "he": "Hebrew (希伯来语)", "uk": "Ukrainian (乌克兰语)", "el": "Greek (希腊语)", "ms": "Malay (马来语)",
+             "cs": "Czech (捷克语)", "ro": "Romanian (罗马尼亚语)", "da": "Danish (丹麦语)", "hu": "Hungarian (匈牙利语)", "ta": "Tamil (泰米尔语)",
+             "no": "Norwegian (挪威语)", "th": "Thai (泰语)", "ur": "Urdu (乌尔都语)", "hr": "Croatian (克罗地亚语)", "bg": "Bulgarian (保加利亚语)",
+             "lt": "Lithuanian (立陶宛语)", "la": "Latin (拉丁语)", "mi": "Maori (毛利语)", "ml": "Malayalam (马拉雅拉姆语)", "cy": "Welsh (威尔士语)",
+             "sk": "Slovak (斯洛伐克语)", "te": "Telugu (泰卢固语)", "fa": "Persian (波斯语)", "lv": "Latvian (拉脱维亚语)", "bn": "Bengali (孟加拉语)",
+             "sr": "Serbian (塞尔维亚语)", "az": "Azerbaijani (阿塞拜疆语)", "sl": "Slovenian (斯洛文尼亚语)", "kn": "Kannada (卡纳达语)", "et": "Estonian (爱沙尼亚语)",
+             "mk": "Macedonian (马其顿语)", "br": "Breton (布列塔尼语)", "eu": "Basque (巴斯克语)", "is": "Icelandic (冰岛语)", "hy": "Armenian (亚美尼亚语)",
+             "ne": "Nepali (尼泊尔语)", "mn": "Mongolian (蒙古语)", "bs": "Bosnian (波斯尼亚语)", "kk": "Kazakh (哈萨克语)", "sq": "Albanian (阿尔巴尼亚语)",
+             "sw": "Swahili (斯瓦希里语)", "gl": "Galician (加利西亚语)", "mr": "Marathi (马拉地语)", "pa": "Punjabi (旁遮普语)", "si": "Sinhala (僧伽罗语)",
+             "km": "Khmer (高棉语)", "sn": "Shona (绍纳语)", "yo": "Yoruba (约鲁巴语)", "so": "Somali (索马里语)", "af": "Afrikaans (南非荷兰语)",
+             "oc": "Occitan (奥克语)", "ka": "Georgian (格鲁吉亚语)", "be": "Belarusian (白俄罗斯语)", "tg": "Tajik (塔吉克语)", "sd": "Sindhi (信德语)",
+             "gu": "Gujarati (古吉拉特语)", "am": "Amharic (阿姆哈拉语)", "yi": "Yiddish (意第绪语)", "lo": "Lao (老挝语)", "uz": "Uzbek (乌兹别克语)",
+             "fo": "Faroese (法罗语)", "ht": "Haitian Creole (海地克里奥尔语)", "ps": "Pashto (普什图语)", "tk": "Turkmen (土库曼语)", "nn": "Nynorsk (新挪威语)",
+             "mt": "Maltese (马耳他语)", "sa": "Sanskrit (梵语)", "lb": "Luxembourgish (卢森堡语)", "my": "Myanmar (Burmese) (缅甸语)", "bo": "Tibetan (藏语)",
+             "tl": "Tagalog (他加禄语)", "mg": "Malagasy (马尔加什语)", "as": "Assamese (阿萨姆语)", "tt": "Tatar (鞑靼语)", "haw": "Hawaiian (夏威夷语)",
+
+             "ln": "Lingala (林加拉语)", "ha": "Hausa (豪萨语)", "ba": "Bashkir (巴什基尔语)", "jw": "Javanese (爪哇语)", "su": "Sundanese (巽他语)"
+         }; // Fixed: added the missing closing '}'
+
+         function populateLanguages() {
+             for (const [code, name] of Object.entries(languages)) {
+                 const option = document.createElement('option');
+                 option.value = code === 'auto' ? '' : code;
+                 option.textContent = name;
+                 languageSelect.appendChild(option);
+             }
+         }
+
+         // File Handling Logic
+         dropZone.addEventListener('click', () => fileInput.click());
+         fileInput.addEventListener('change', (e) => handleFile(e.target.files[0]));
+         ['dragenter', 'dragover', 'dragleave', 'drop'].forEach(eventName => {
+             dropZone.addEventListener(eventName, preventDefaults, false);
+         });
+         function preventDefaults(e) {
+             e.preventDefault();
+             e.stopPropagation();
+         }
+         dropZone.addEventListener('dragenter', () => dropZone.classList.add('drag-over'));
+         dropZone.addEventListener('dragleave', () => dropZone.classList.remove('drag-over'));
+         dropZone.addEventListener('drop', (e) => {
+             dropZone.classList.remove('drag-over');
+             handleFile(e.dataTransfer.files[0]);
+         });
+
+         function handleFile(file) {
+             if (file) {
+                 selectedFile = file;
+                 fileNameDisplay.textContent = `Selected file: ${file.name}`;
+                 submitBtn.disabled = false;
+             }
+         }
+
+         // Submission Logic
+         submitBtn.addEventListener('click', async () => {
+             if (!selectedFile) return alert('Please select a file first!');
+
+             submitBtn.disabled = true;
+             submitBtn.textContent = 'Transcribing, please wait...';
+             resultContainer.style.display = 'none';
+             srtOutput.value = '';
+
+             const formData = new FormData();
+             formData.append('file', selectedFile);
+             formData.append('model', modelSelect.value);
+             if (languageSelect.value) formData.append('language', languageSelect.value);
+             if (promptInput.value) formData.append('prompt', promptInput.value);
+
+             try {
+                 const response = await fetch('/v1/audio/transcriptions', { method: 'POST', body: formData });
+                 const data = await response.json();
+
+                 if (!response.ok) throw new Error(data.error || `HTTP error! status: ${response.status}`);
+
+                 const srtContent = jsonToSrt(data);
+                 srtOutput.value = srtContent;
+                 resultContainer.style.display = 'block';
+             } catch (error) {
+                 console.error('Error:', error);
+                 alert(`Transcription failed: ${error.message}`);
+             } finally {
+                 submitBtn.disabled = false;
+                 submitBtn.textContent = 'Submit transcription';
+             }
+         });
+
+         // Download Logic
+         downloadBtn.addEventListener('click', () => {
+             if (!srtOutput.value) return alert('Nothing to download!');
+             const blob = new Blob([srtOutput.value], { type: 'text/srt;charset=utf-8' });
+             const url = URL.createObjectURL(blob);
+             const a = document.createElement('a');
+             a.href = url;
+             a.download = `${getTimestamp()}.srt`;
+             document.body.appendChild(a);
+             a.click();
+             document.body.removeChild(a);
+             URL.revokeObjectURL(url);
+         });
+
+         // Helper Functions
+         function jsonToSrt(data) {
+             return data.segments.map((segment, index) => {
+                 const startTime = formatSrtTime(segment.start);
+                 const endTime = formatSrtTime(segment.end);
+                 const text = segment.text.trim();
+                 // The speaker field may be null; in that case no tag is added
+                 const speakerTag = segment.speaker ? `[${segment.speaker}] ` : '';
+
+                 return `${index + 1}\n${startTime} --> ${endTime}\n${speakerTag}${text}\n`;
+             }).join('\n');
+         }
+
+         function formatSrtTime(seconds) {
+             // Build the Date from milliseconds: setSeconds() truncates fractional seconds
+             const date = new Date(Math.round(seconds * 1000));
+             return date.toISOString().slice(11, 23).replace('.', ',');
+         }
+
+         function getTimestamp() {
+             return new Date().toISOString().replace(/[-:T.]/g, '').slice(0, 14);
+         }
+
+         // Initialize
+         populateLanguages();
+     </script>
+ </body>
+ </html>
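
The SRT helpers in the script above can be exercised outside the browser. The following is a minimal sketch, not the page's exact code: it assumes the response shape that `jsonToSrt` reads (`segments` with `start`, `end`, `text`, and an optional `speaker`), and it builds the `Date` from milliseconds so that sub-second offsets show up in the timestamp. The `sample` payload and its values are made up for illustration.

```javascript
// Standalone sketch of the page's SRT helpers, runnable under Node.js.

// Convert a duration in seconds to an SRT timestamp (HH:MM:SS,mmm).
// Assumes 0 <= seconds < 86400, because the clock part of an ISO string wraps after 24 hours.
function formatSrtTime(seconds) {
  const date = new Date(Math.round(seconds * 1000));
  return date.toISOString().slice(11, 23).replace('.', ',');
}

// Turn a transcription response into SRT text, one numbered cue per segment.
function jsonToSrt(data) {
  return data.segments.map((segment, index) => {
    const speakerTag = segment.speaker ? `[${segment.speaker}] ` : '';
    const start = formatSrtTime(segment.start);
    const end = formatSrtTime(segment.end);
    return `${index + 1}\n${start} --> ${end}\n${speakerTag}${segment.text.trim()}\n`;
  }).join('\n');
}

// Hypothetical payload: field names follow what jsonToSrt reads, values are invented.
const sample = {
  segments: [
    { start: 0, end: 1.5, text: ' Hello there. ', speaker: null },
    { start: 61.25, end: 3725.004, text: 'Second cue.', speaker: 'SPEAKER_00' },
  ],
};

const srt = jsonToSrt(sample);
console.log(srt);
```

Segments without a speaker produce a plain text line, while labelled segments get a `[SPEAKER]` prefix, matching the behaviour of the page's helper.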
pyproject.toml ADDED
@@ -0,0 +1,17 @@
+ [project]
+ name = "wx"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.12"
+ dependencies = [
+     "ffmpeg-python>=0.2.0",
+     "flask>=3.1.2",
+     "openai>=2.7.2",
+     "pydub>=0.25.1",
+     "waitress>=3.0.2",
+     "whisperx>=3.7.4",
+ ]
+
+ [[tool.uv.index]]
+ url = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch
+ torchaudio
+
+ ffmpeg-python>=0.2.0
+ flask>=3.1.2
+ openai>=2.7.2
+ pydub>=0.25.1
+ waitress>=3.0.2
+ whisperx>=3.7.4
+
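
Once the server built from these requirements is running, the endpoint that `index.html` posts to can also be called from a script. The sketch below is an illustration under stated assumptions: the form field names (`file`, `model`, `language`, `prompt`) and the `error` key are taken from the page's submit handler, while `buildTranscriptionForm`, `transcribe`, and the `baseUrl` argument are hypothetical names introduced here. It relies on the global `FormData`, `Blob`, and `fetch` available in recent Node.js versions and in browsers.

```javascript
// Build the multipart body that the page sends to /v1/audio/transcriptions.
// Field names mirror the page's submit handler.
function buildTranscriptionForm(blob, filename, options = {}) {
  const { model = 'large-v3-turbo', language = '', prompt = '' } = options;
  const form = new FormData();
  form.append('file', blob, filename);
  form.append('model', model);
  // An empty language means auto-detect, so the field is omitted, as in the page.
  if (language) form.append('language', language);
  if (prompt) form.append('prompt', prompt);
  return form;
}

// Hypothetical helper: POST the form and return the parsed JSON body.
// baseUrl is an assumption; point it at wherever the server is listening.
async function transcribe(baseUrl, form) {
  const response = await fetch(`${baseUrl}/v1/audio/transcriptions`, {
    method: 'POST',
    body: form,
  });
  const text = await response.text();
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    // Non-JSON error pages are surfaced as the error message.
    data = { error: text };
  }
  if (!response.ok) throw new Error(data.error || `HTTP ${response.status}`);
  return data;
}

// Demo: build a form from placeholder bytes (not real audio) and list its fields.
const demoForm = buildTranscriptionForm(new Blob(['not real audio']), 'sample.wav', { language: 'en' });
console.log([...demoForm.keys()]);
```

Reading the body as text before parsing keeps the error message readable when the server answers with something other than JSON; whether the server ever does so is not shown in this diff.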