---
title: Voice Clone Bench (Chatterbox)
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
short_description: Zero-shot voice cloning + TTS to A/B against ElevenLabs
---

# Voice Clone Bench: Chatterbox Multilingual (zero-shot voice cloning)

A standalone prototype for A/B testing open-weight **voice cloning + TTS** against ElevenLabs. Powered by **[Chatterbox Multilingual](https://huggingface.co/ResembleAI/chatterbox)** (Resemble AI, MIT license), which Resemble AI reports was preferred over ElevenLabs in blind preference tests.

## How to use (manual A/B)

1. Upload a **reference audio** clip of the voice to clone (5–20 s of clean speech is ideal).
2. (Optional) If the clip has music or noise, tick **🧹 Remove background audio from reference** to isolate the voice with HT-Demucs before cloning. Use **Preview cleaned reference** to hear the isolated result first.
3. Pick the **language** (default: English).
4. Type the **text** to speak (long scripts are auto-chunked at sentence boundaries).
5. Click **Clone & Speak** → you get audio in the cloned voice.

Tip: leave the reference empty to hear a built-in sample voice for the selected language.

### Cloning defaults (tuned for faithful cloning)

Tuned for **speaker similarity**, not expressiveness:

- `exaggeration=0.4`: neutral delivery.
- `cfg_weight=0.5`: balanced. Lower it to ~0.3 for a faster pace; set it to 0.0 for cross-lingual cloning (reference clip in a different language from the target text).
- `temperature=0.7`: consistent output.

All knobs are exposed as sliders.
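These defaults map one-to-one onto keyword arguments of the Space's `/clone` endpoint (see the API section). A minimal sketch, assuming a bot wants them in one place: `CloneSettings` and `clone_kwargs` are hypothetical names, not part of the Space, and `reference_language` is an assumed input, since the endpoint itself has no such parameter. The helper switches `cfg_weight` to 0.0 whenever the reference clip's language differs from the target language, following the cross-lingual advice above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CloneSettings:
    """Hypothetical bundle of /clone sampling knobs, defaulting to the README values."""

    exaggeration: float = 0.4  # neutral delivery
    cfg_weight: float = 0.5  # balanced; ~0.3 for a faster pace
    temperature: float = 0.7  # consistent output
    seed: int = 0
    clean_reference: bool = False  # True = strip background music/noise first
    repetition_penalty: float = 2.0
    min_p: float = 0.05
    top_p: float = 1.0


def clone_kwargs(
    text: str,
    language_id: str,
    reference_language: Optional[str] = None,
    settings: CloneSettings = CloneSettings(),
) -> dict:
    """Build keyword arguments for client.predict(..., api_name="/clone").

    audio_prompt_path is left to the caller, who wraps the file with
    gradio_client.handle_file(). reference_language is an assumed input:
    the language of the reference clip, if known.
    """
    kwargs = {"text": text, "language_id": language_id, **asdict(settings)}
    # Cross-lingual cloning (reference clip language != target language): use cfg_weight=0.0.
    if reference_language is not None and reference_language != language_id:
        kwargs["cfg_weight"] = 0.0
    return kwargs


if __name__ == "__main__":
    # English reference clip, French target text, so cfg_weight is forced to 0.0.
    kwargs = clone_kwargs("Bonjour, ravi de vous entendre.", "fr", reference_language="en")
    print(kwargs)
    # With the client from the API example:
    # wav_path = client.predict(
    #     audio_prompt_path=handle_file("reference.wav"), api_name="/clone", **kwargs
    # )
```

Keeping the defaults in a frozen dataclass makes per-call overrides explicit, e.g. `clone_kwargs(text, "en", settings=CloneSettings(cfg_weight=0.3))`.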
## API (for bot integration later)

Gradio exposes a programmatic endpoint named **`clone`** (plus **`isolate_voice`** for standalone background-audio removal):

```python
from gradio_client import Client, handle_file

client = Client("ZeroPointMonkey/voice-clone-bench")

wav_path = client.predict(
    text="Hey, it's good to finally hear your voice.",
    language_id="en",
    audio_prompt_path=handle_file("reference.wav"),
    exaggeration=0.4,
    cfg_weight=0.5,
    temperature=0.7,
    seed=0,
    clean_reference=False,  # True = strip background music/noise first
    repetition_penalty=2.0,
    min_p=0.05,
    top_p=1.0,
    api_name="/clone",
)
print(wav_path)  # local path to the generated wav

# Just clean a reference clip (returns the isolated-voice wav):
cleaned = client.predict(handle_file("noisy_reference.wav"), api_name="/isolate_voice")
```

## Notes

- Hardware: ZeroGPU (`zero-a10g`). Outputs are PerTh-watermarked by the model.
- License: the model weights are **MIT**-licensed (Resemble AI / Chatterbox) and free for commercial use.
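Step 4 of *How to use* says long scripts are auto-chunked at sentence boundaries. The Space's actual splitter is not documented here, so the following is only a sketch of one common approach: a naive regex sentence split followed by greedy packing of whole sentences into chunks of at most `max_chars` characters. `split_sentences`, `chunk_text`, and the 300-character default are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    # Scripts that do not put spaces between sentences (e.g. Chinese, Japanese)
    # would need a different rule.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk instead of
    being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Hey there. It's good to finally hear your voice! How have you been?"
    for i, chunk in enumerate(chunk_text(script, max_chars=40)):
        print(i, chunk)
```

Each chunk would then be synthesized separately and the waveforms concatenated. Greedy packing keeps chunks near the limit without splitting inside a sentence; a sentence longer than `max_chars` is kept whole rather than cut.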