owlninjam committed
Commit 21b59b8 · verified · 1 Parent(s): 46c731b

Upload 4 files

Files changed (4)
  1. Dockerfile +42 -0
  2. README.md +45 -10
  3. app.py +492 -0
  4. requirements.txt +8 -0
Dockerfile ADDED
@@ -0,0 +1,42 @@
+ FROM python:3.11-slim
+
+ # System dependencies
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends \
+     ffmpeg \
+     git \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install numpy FIRST (pkuseg needs it at build time)
+ RUN pip install --no-cache-dir numpy==1.25.2
+
+ # Install chatterbox-tts (now pkuseg can build because numpy is available)
+ # Using --no-build-isolation so pkuseg's setup.py can see the installed numpy
+ RUN pip install --no-cache-dir --no-build-isolation chatterbox-tts
+
+ # Install remaining dependencies
+ RUN pip install --no-cache-dir \
+     torch \
+     torchaudio \
+     soundfile \
+     pydub \
+     fastapi \
+     uvicorn \
+     gradio==5.31.0
+
+ # Create non-root user (required by HF Spaces)
+ RUN useradd -m -u 1000 user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR /home/user/app
+
+ # Copy application
+ COPY --chown=user app.py .
+
+ USER user
+
+ EXPOSE 7860
+
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,10 +1,45 @@
- ---
- title: Chatterbox Tts
- emoji: 🚀
- colorFrom: blue
- colorTo: indigo
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Chatterbox TTS API
+ emoji: 🎙️
+ colorFrom: purple
+ colorTo: blue
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ license: mit
+ ---
+
+ # Chatterbox TTS API
+
+ A free, CPU-powered TTS service with **voice cloning** and an **OpenAI-compatible API**.
+
+ ## Features
+ - 🎤 **Voice Cloning** — clone any voice from a ~10s reference clip
+ - 🔌 **OpenAI-Compatible API** — drop-in replacement at `/v1/audio/speech`
+ - 🌊 **Streaming** — chunked audio streaming for faster time-to-first-byte
+ - 🆓 **Free** — runs on HF Spaces CPU tier
+
+ ## API Usage
+
+ ```bash
+ # Basic TTS
+ curl -X POST https://YOUR-SPACE.hf.space/v1/audio/speech \
+   -H "Content-Type: application/json" \
+   -d '{"model":"chatterbox","input":"Hello world!","voice":"default"}' \
+   --output speech.wav
+
+ # Voice cloning (multipart)
+ curl -X POST https://YOUR-SPACE.hf.space/v1/audio/speech \
+   -F 'request={"model":"chatterbox","input":"Hello!","voice":"clone"};type=application/json' \
+   -F "file=@reference.wav" \
+   --output cloned.wav
+ ```
+
+ ## OpenAI SDK
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(base_url="https://YOUR-SPACE.hf.space/v1", api_key="not-needed")
+ response = client.audio.speech.create(model="chatterbox", voice="default", input="Hello!")
+ response.stream_to_file("output.wav")
+ ```
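The multipart curl invocation above can also be mirrored from Python. A minimal sketch (the URL is a placeholder, the actual POST is commented out and would use the third-party `requests` library; the payload shape matches the endpoint in `app.py`):

```python
import json

# Build the same multipart body as the voice-cloning curl example:
# a JSON "request" part plus a "file" part carrying the reference clip.
payload = {"model": "chatterbox", "input": "Hello!", "voice": "clone"}
files = {
    "request": (None, json.dumps(payload), "application/json"),
    "file": ("reference.wav", b"<raw wav bytes here>", "audio/wav"),
}

# import requests
# resp = requests.post("https://YOUR-SPACE.hf.space/v1/audio/speech", files=files)
# open("cloned.wav", "wb").write(resp.content)
```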
app.py ADDED
@@ -0,0 +1,492 @@
+ """
+ Chatterbox TTS — HF Space with Gradio UI + OpenAI-Compatible API
+ Supports voice cloning and chunked streaming on CPU.
+ """
+
+ import io
+ import os
+ import re
+ import json
+ import tempfile
+ import logging
+ from typing import Optional
+
+ import torch
+ import torchaudio as ta
+ import soundfile as sf
+ import numpy as np
+ import gradio as gr
+ from fastapi import FastAPI, UploadFile, File, Form, Request, HTTPException
+ from fastapi.responses import StreamingResponse, Response
+ from pydub import AudioSegment
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger("chatterbox-tts")
+
+ # ---------------------------------------------------------------------------
+ # Global model (loaded once at startup)
+ # ---------------------------------------------------------------------------
+ MODEL = None
+ MODEL_NAME = "chatterbox"
+ DEVICE = "cpu"
+
+
+ def get_model():
+     """Lazy-load the Chatterbox model."""
+     global MODEL
+     if MODEL is None:
+         logger.info("Loading Chatterbox TTS model on CPU — this may take 30-60s on first run...")
+         try:
+             # Try Turbo first (faster, 350M, 1-step decoder)
+             from chatterbox.tts_turbo import ChatterboxTurboTTS
+             MODEL = ChatterboxTurboTTS.from_pretrained(device=DEVICE)
+             logger.info("Loaded ChatterboxTurboTTS (350M) successfully.")
+         except Exception as e:
+             logger.warning(f"Turbo model failed ({e}), falling back to standard ChatterboxTTS...")
+             from chatterbox.tts import ChatterboxTTS
+             MODEL = ChatterboxTTS.from_pretrained(device=DEVICE)
+             logger.info("Loaded ChatterboxTTS (standard) successfully.")
+     return MODEL
+
+
+ # ---------------------------------------------------------------------------
+ # Audio helpers
+ # ---------------------------------------------------------------------------
+ def wav_tensor_to_bytes(wav: torch.Tensor, sr: int, fmt: str = "wav") -> bytes:
+     """Convert a waveform tensor to audio bytes in the requested format."""
+     # Ensure 2D: (channels, samples)
+     if wav.dim() == 1:
+         wav = wav.unsqueeze(0)
+
+     buf = io.BytesIO()
+     ta.save(buf, wav, sr, format="wav")
+     buf.seek(0)
+
+     if fmt == "wav":
+         return buf.read()
+     elif fmt == "mp3":
+         audio_seg = AudioSegment.from_wav(buf)
+         mp3_buf = io.BytesIO()
+         audio_seg.export(mp3_buf, format="mp3")
+         mp3_buf.seek(0)
+         return mp3_buf.read()
+     elif fmt == "opus":
+         audio_seg = AudioSegment.from_wav(buf)
+         opus_buf = io.BytesIO()
+         audio_seg.export(opus_buf, format="opus")
+         opus_buf.seek(0)
+         return opus_buf.read()
+     elif fmt == "flac":
+         audio_seg = AudioSegment.from_wav(buf)
+         flac_buf = io.BytesIO()
+         audio_seg.export(flac_buf, format="flac")
+         flac_buf.seek(0)
+         return flac_buf.read()
+     else:
+         return buf.read()
+
+
+ def split_into_sentences(text: str) -> list[str]:
+     """Split text into sentences for chunked streaming."""
+     # Split on sentence-ending punctuation followed by space or end
+     parts = re.split(r'(?<=[.!?])\s+', text.strip())
+     # Merge very short fragments with their predecessor
+     merged = []
+     for p in parts:
+         p = p.strip()
+         if not p:
+             continue
+         if merged and len(merged[-1]) < 20:
+             merged[-1] = merged[-1] + " " + p
+         else:
+             merged.append(p)
+     return merged if merged else [text]
+
+
+ MIME_TYPES = {
+     "wav": "audio/wav",
+     "mp3": "audio/mpeg",
+     "opus": "audio/opus",
+     "flac": "audio/flac",
+ }
+
+
+ # ---------------------------------------------------------------------------
+ # Core TTS generation
+ # ---------------------------------------------------------------------------
+ def generate_speech(
+     text: str,
+     ref_audio_path: Optional[str] = None,
+     response_format: str = "wav",
+     stream: bool = False,
+ ):
+     """
+     Generate speech from text. Optionally clone voice from ref_audio_path.
+     If stream=True, yields audio chunks per sentence.
+     """
+     model = get_model()
+
+     if stream:
+         sentences = split_into_sentences(text)
+         for sentence in sentences:
+             logger.info(f"Generating chunk: {sentence[:50]}...")
+             if ref_audio_path:
+                 wav = model.generate(sentence, audio_prompt_path=ref_audio_path)
+             else:
+                 wav = model.generate(sentence)
+             # NOTE: each streamed chunk is a complete audio file with its own
+             # header; clients should treat the stream as concatenated files.
+             chunk_bytes = wav_tensor_to_bytes(wav, model.sr, response_format)
+             yield chunk_bytes
+     else:
+         logger.info(f"Generating full: {text[:80]}...")
+         if ref_audio_path:
+             wav = model.generate(text, audio_prompt_path=ref_audio_path)
+         else:
+             wav = model.generate(text)
+         yield wav_tensor_to_bytes(wav, model.sr, response_format)
+
+
+ # ---------------------------------------------------------------------------
+ # FastAPI — OpenAI-compatible /v1/audio/speech
+ # ---------------------------------------------------------------------------
+ api_app = FastAPI(title="Chatterbox TTS API", version="1.0.0")
+
+
+ @api_app.get("/v1/models")
+ async def list_models():
+     """OpenAI-compatible model listing."""
+     return {
+         "object": "list",
+         "data": [
+             {
+                 "id": "chatterbox",
+                 "object": "model",
+                 "created": 1700000000,
+                 "owned_by": "resemble-ai",
+             },
+             {
+                 "id": "chatterbox-turbo",
+                 "object": "model",
+                 "created": 1700000000,
+                 "owned_by": "resemble-ai",
+             },
+         ],
+     }
+
+
+ @api_app.post("/v1/audio/speech")
+ async def openai_tts(request: Request):
+     """
+     OpenAI-compatible TTS endpoint.
+
+     Accepts either:
+     1. JSON body: {"model": "chatterbox", "input": "text", "voice": "default"}
+     2. Multipart form: model, input, voice fields + optional 'file' for voice cloning
+
+     voice="clone" + file upload = voice cloning
+     voice="default" (or anything else) = default voice
+     """
+     content_type = request.headers.get("content-type", "")
+     ref_audio_path = None
+     tmp_file = None
+
+     try:
+         if "multipart/form-data" in content_type:
+             # Parse multipart — could have JSON part + file
+             form = await request.form()
+
+             # Check if there's a 'request' JSON field (for combined JSON+file uploads)
+             if "request" in form:
+                 try:
+                     params = json.loads(form["request"])
+                 except (json.JSONDecodeError, TypeError):
+                     params = {}
+                 model = params.get("model", "chatterbox")
+                 text = params.get("input", "")
+                 voice = params.get("voice", "default")
+                 response_format = params.get("response_format", "wav")
+             else:
+                 model = form.get("model", "chatterbox")
+                 text = form.get("input", "")
+                 voice = form.get("voice", "default")
+                 response_format = form.get("response_format", "wav")
+
+             # Handle file upload for voice cloning
+             file_field = form.get("file")
+             if file_field and hasattr(file_field, "read"):
+                 tmp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+                 content = await file_field.read()
+                 tmp_file.write(content)
+                 tmp_file.flush()
+                 ref_audio_path = tmp_file.name
+                 voice = "clone"
+
+         elif "application/json" in content_type:
+             body = await request.json()
+             model = body.get("model", "chatterbox")
+             text = body.get("input", "")
+             voice = body.get("voice", "default")
+             response_format = body.get("response_format", "wav")
+         else:
+             # Try JSON anyway
+             try:
+                 body = await request.json()
+                 model = body.get("model", "chatterbox")
+                 text = body.get("input", "")
+                 voice = body.get("voice", "default")
+                 response_format = body.get("response_format", "wav")
+             except Exception:
+                 raise HTTPException(status_code=400, detail="Unsupported content type. Use application/json or multipart/form-data.")
+
+         if not text:
+             raise HTTPException(status_code=400, detail="'input' field is required.")
+
+         if response_format not in MIME_TYPES:
+             response_format = "wav"
+
+         mime = MIME_TYPES[response_format]
+
+         # Determine if voice cloning
+         use_clone = voice == "clone" and ref_audio_path is not None
+
+         # Check if streaming is beneficial (multiple sentences)
+         sentences = split_into_sentences(text)
+         use_streaming = len(sentences) > 1
+
+         if use_streaming:
+             def audio_stream():
+                 try:
+                     for chunk in generate_speech(
+                         text,
+                         ref_audio_path=ref_audio_path if use_clone else None,
+                         response_format=response_format,
+                         stream=True,
+                     ):
+                         yield chunk
+                 finally:
+                     if tmp_file and os.path.exists(tmp_file.name):
+                         os.unlink(tmp_file.name)
+
+             return StreamingResponse(
+                 audio_stream(),
+                 media_type=mime,
+                 headers={
+                     "Content-Disposition": f"attachment; filename=speech.{response_format}",
+                     "Transfer-Encoding": "chunked",
+                 },
+             )
+         else:
+             # Single chunk — return directly
+             try:
+                 audio_bytes = b""
+                 for chunk in generate_speech(
+                     text,
+                     ref_audio_path=ref_audio_path if use_clone else None,
+                     response_format=response_format,
+                     stream=False,
+                 ):
+                     audio_bytes += chunk
+             finally:
+                 if tmp_file and os.path.exists(tmp_file.name):
+                     os.unlink(tmp_file.name)
+
+             return Response(
+                 content=audio_bytes,
+                 media_type=mime,
+                 headers={
+                     "Content-Disposition": f"attachment; filename=speech.{response_format}",
+                 },
+             )
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         logger.error(f"TTS generation failed: {e}", exc_info=True)
+         if tmp_file and os.path.exists(tmp_file.name):
+             os.unlink(tmp_file.name)
+         raise HTTPException(status_code=500, detail=f"TTS generation failed: {str(e)}")
+
+
+ # ---------------------------------------------------------------------------
+ # Gradio UI
+ # ---------------------------------------------------------------------------
+ def gradio_tts(text: str, ref_audio, response_format: str = "wav"):
+     """Gradio handler for TTS generation with optional voice cloning."""
+     if not text or not text.strip():
+         return None
+
+     ref_path = None
+     if ref_audio is not None:
+         ref_path = ref_audio  # Gradio gives us a file path
+
+     model = get_model()
+
+     logger.info(f"Gradio TTS: text={text[:60]}..., clone={ref_path is not None}")
+
+     if ref_path:
+         wav = model.generate(text, audio_prompt_path=ref_path)
+     else:
+         wav = model.generate(text)
+
+     # Save to temp file for Gradio audio output
+     if wav.dim() == 1:
+         wav = wav.unsqueeze(0)
+
+     # Always write an intermediate WAV first (the temp file must carry a
+     # .wav suffix so the conversion path below can rename it correctly).
+     tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+     ta.save(tmp.name, wav, model.sr, format="wav")
+
+     if response_format != "wav":
+         audio_seg = AudioSegment.from_wav(tmp.name)
+         out_path = tmp.name.replace(".wav", f".{response_format}")
+         audio_seg.export(out_path, format=response_format)
+         os.unlink(tmp.name)
+         return out_path
+
+     return tmp.name
+
+
+ # Build Gradio interface
+ with gr.Blocks(
+     title="🎙️ Chatterbox TTS",
+     theme=gr.themes.Soft(
+         primary_hue="purple",
+         secondary_hue="blue",
+     ),
+ ) as demo:
+     gr.Markdown(
+         """
+         # 🎙️ Chatterbox TTS
+         ### Free, open-source text-to-speech with voice cloning
+         *Powered by [Resemble AI Chatterbox](https://github.com/resemble-ai/chatterbox) — MIT Licensed*
+         """
+     )
+
+     with gr.Tabs():
+         # ---- Tab 1: TTS ----
+         with gr.TabItem("🗣️ Text to Speech"):
+             with gr.Row():
+                 with gr.Column(scale=3):
+                     text_input = gr.Textbox(
+                         label="Text",
+                         placeholder="Type or paste your text here...",
+                         lines=5,
+                         max_lines=20,
+                     )
+                     ref_audio_input = gr.Audio(
+                         label="🎤 Reference Audio (optional — for voice cloning)",
+                         type="filepath",
+                         sources=["upload", "microphone"],
+                     )
+                     with gr.Row():
+                         format_dropdown = gr.Dropdown(
+                             choices=["wav", "mp3"],
+                             value="wav",
+                             label="Output Format",
+                         )
+                     generate_btn = gr.Button(
+                         "🔊 Generate Speech",
+                         variant="primary",
+                         size="lg",
+                     )
+
+                 with gr.Column(scale=2):
+                     audio_output = gr.Audio(
+                         label="Generated Audio",
+                         type="filepath",
+                     )
+
+             generate_btn.click(
+                 fn=gradio_tts,
+                 inputs=[text_input, ref_audio_input, format_dropdown],
+                 outputs=[audio_output],
+             )
+
+             gr.Examples(
+                 examples=[
+                     ["Hello! This is Chatterbox TTS running on a free Hugging Face Space. Pretty cool, right?", None, "wav"],
+                     ["The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs.", None, "wav"],
+                     ["I can't believe it worked! [laugh] This is absolutely amazing.", None, "wav"],
+                 ],
+                 inputs=[text_input, ref_audio_input, format_dropdown],
+                 outputs=[audio_output],
+                 fn=gradio_tts,
+                 cache_examples=False,
+             )
+
+         # ---- Tab 2: API Docs ----
+         with gr.TabItem("🔌 API"):
+             gr.Markdown(
+                 """
+                 ## OpenAI-Compatible API
+
+                 This Space exposes an OpenAI-compatible `/v1/audio/speech` endpoint.
+
+                 ### Base URL
+                 ```
+                 https://YOUR-SPACE-NAME.hf.space/v1
+                 ```
+
+                 ---
+
+                 ### Basic TTS (JSON)
+                 ```bash
+                 curl -X POST https://YOUR-SPACE.hf.space/v1/audio/speech \\
+                   -H "Content-Type: application/json" \\
+                   -d '{"model":"chatterbox","input":"Hello world!","voice":"default","response_format":"wav"}' \\
+                   --output speech.wav
+                 ```

+                 ### Voice Cloning (Multipart)
+                 ```bash
+                 curl -X POST https://YOUR-SPACE.hf.space/v1/audio/speech \\
+                   -F 'request={"model":"chatterbox","input":"Hello!","voice":"clone"};type=application/json' \\
+                   -F "file=@your_reference.wav" \\
+                   --output cloned.wav
+                 ```
+
+                 ### OpenAI Python SDK
+                 ```python
+                 from openai import OpenAI
+
+                 client = OpenAI(
+                     base_url="https://YOUR-SPACE.hf.space/v1",
+                     api_key="not-needed"
+                 )
+
+                 # Default voice
+                 response = client.audio.speech.create(
+                     model="chatterbox",
+                     voice="default",
+                     input="Hello, this is a test!",
+                     response_format="wav"
+                 )
+                 response.stream_to_file("output.wav")
+                 ```
+
+                 ### Streaming
+                 Multi-sentence inputs are automatically streamed sentence-by-sentence
+                 for faster time-to-first-byte.
+
+                 ### Parameters
+                 | Parameter | Type | Required | Description |
+                 |---|---|---|---|
+                 | `model` | string | ✅ | `"chatterbox"` or `"chatterbox-turbo"` |
+                 | `input` | string | ✅ | Text to synthesize |
+                 | `voice` | string | ✅ | `"default"` or `"clone"` |
+                 | `response_format` | string | ❌ | `"wav"` (default), `"mp3"`, `"opus"`, `"flac"` |
+                 | `file` | binary | ❌ | Reference audio for cloning (multipart only) |
+
+                 ---
+                 *⚡ Running on CPU — expect 5-15s per sentence. Multi-sentence inputs stream chunks as they're ready.*
+                 """
+             )
+
+ # Mount FastAPI + Gradio together
+ app = gr.mount_gradio_app(api_app, demo, path="/")
+
+ # For local development
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
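The streaming path in `openai_tts` kicks in only when the text splits into more than one sentence. The sketch below reproduces `split_into_sentences` from `app.py` so its merge-short-fragments rule can be checked standalone:

```python
import re

def split_into_sentences(text: str) -> list[str]:
    # Split on sentence-ending punctuation followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    merged = []
    for p in parts:
        p = p.strip()
        if not p:
            continue
        # Glue a fragment onto a short predecessor (< 20 chars) so tiny
        # chunks like "Hi." don't become their own synthesis calls.
        if merged and len(merged[-1]) < 20:
            merged[-1] = merged[-1] + " " + p
        else:
            merged.append(p)
    return merged if merged else [text]

# A short opener is merged into the sentence that follows it:
print(split_into_sentences("Hi there. This is a much longer second sentence."))
# Two long sentences stay separate, so the API streams two chunks:
print(len(split_into_sentences(
    "This is the first reasonably long sentence. And here is the second one."
)))  # 2
```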
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ numpy
+ chatterbox-tts
+ torch
+ torchaudio
+ soundfile
+ pydub
+ fastapi
+ uvicorn