Foradc and Claude Sonnet 4.6 committed on
Commit · 8b035dd
Parent(s): 737927a
docs: update README for 6-engine Voxtral release
- 6 engines (was 5), 8 pills (was 7)
- Add Voxtral row in engine table with FR★ quality note
- Add Voxtral server setup section (vLLM-Omni, HF token, narrator ref)
- Add VOXTRAL_URL env var note
- Update project structure with voxtral_server.py and make_narrator_reference.py
- Add Voxtral model in auto-download table (~8GB, gated)
- Add voxtral/mistral/vllm to GitHub topics
- Add Mistral AI and arXiv:2508.17494 credits
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README.md CHANGED
@@ -10,7 +10,7 @@ pinned: false
 
 # 🎙 Boovore — Multi-Engine TTS Studio
 
-**Boovore** is a self-hosted, GPU-accelerated Text-to-Speech studio with
+**Boovore** is a self-hosted, GPU-accelerated Text-to-Speech studio with 6 best-in-class engines and a built-in audiobook generator. Run it on any CUDA machine (tested on RTX 3090) via a clean, dark-mode web UI.
 
 > **Name**: Boovore = *Book* + *Devour* — built to devour books in audio.
 
@@ -27,6 +27,9 @@ pinned: false
 | **F5-TTS** | ★★★★ | ⚡⚡ | French voice cloning |
 | **Fish-Speech 1.5** | ★★★★★ | ⚡⚡ | Multilingual voice cloning (fishaudio) |
 | **Qwen3-TTS** | ★★★★★ | ⚡ | Clone · Custom · Voice Design |
+| **Voxtral 4B** | ★★★★★ | ⚡⚡ | French-first, 68% win vs ElevenLabs (Mistral AI) |
+
+> **Voxtral** uses vLLM-Omni (`mistralai/Voxtral-4B-TTS-2603`) with voice cloning via a reference WAV. Start it separately with `python3 voxtral_server.py`.
 
 ---
 
@@ -38,10 +41,12 @@ In your Space → **Settings → Variables and secrets**, set:
 |---|---|---|
 | `kokoro,f5` | CPU (free tier) | Kokoro · F5-TTS |
 | `kokoro,f5,chatterbox` | GPU T4 (~6 GB) | + Chatterbox |
-| `all` | GPU A10G / A100 | All
+| `all` | GPU A10G / A100 | All engines + Qwen3 |
 
 > Default is `all` — on free CPU tier, set `kokoro,f5` to avoid crashes.
 
+For **Voxtral**, also set `VOXTRAL_URL` to point to your vLLM-Omni server (default: `http://localhost:8000`).
+
 ---
 
 ## 🚀 Quick Start (Vast.ai / GPU server)
@@ -70,7 +75,26 @@ pip3 install -e . --no-deps
 huggingface-cli download fishaudio/fish-speech-1.5 --local-dir /root/fish-speech-model
 ```
 
-### 2. Start
+### 2. (Optional) Start Voxtral TTS server
+
+Voxtral requires a separate vLLM-Omni process (~8 GB VRAM). Needs a HuggingFace token — accept the CC BY-NC license at [mistralai/Voxtral-4B-TTS-2603](https://huggingface.co/mistralai/Voxtral-4B-TTS-2603) first.
+
+```bash
+pip install "vllm[audio]>=0.18.0" httpx soundfile
+export HF_TOKEN=hf_xxxx
+nohup python3 voxtral_server.py >> /root/voxtral.log 2>&1 &
+# Wait 5-10 min for model download + load (first run only)
+```
+
+Optionally generate a narrator reference WAV (for voice cloning):
+
+```bash
+# While the Qwen3 server is running:
+python3 make_narrator_reference.py
+# Output: /workspace/narrator_reference.wav
+```
+
+### 3. Start the main server
 
 ```bash
 nohup python3 server.py --port 7860 >> /root/server.log 2>&1 &
@@ -88,7 +112,7 @@ ssh -p <PORT> root@<HOST> -L 7860:localhost:7860 -N
 
 ## 📖 Features
 
-- **TTS Studio** — one-click engine selector (
+- **TTS Studio** — one-click engine selector (8 pills), single generate button
 - **Audiobook Generator** — import `.txt` / `.pdf` / `.epub`, auto-detect chapters, batch generate with any engine, download per chapter or merge into one WAV
 - **Voice Cloning** — upload a reference audio clip (Chatterbox, F5-TTS, Fish-Speech, Qwen3)
 - **Real-time metrics** — TTFA, RTF, duration, buffer
@@ -100,8 +124,11 @@ ssh -p <PORT> root@<HOST> -L 7860:localhost:7860 -N
 ## 🗂 Project Structure
 
 ```
-server.py
-index.html
+server.py — FastAPI backend (6 engines)
+index.html — UI single-page (vanilla JS, no frontend deps)
+voxtral_server.py — vLLM-Omni server manager (start/stop/status)
+make_narrator_reference.py — Generate narrator reference WAV via Qwen3
+narrator_reference.wav — (generated) voice clone reference for Voxtral
 requirements.txt
 Dockerfile
 ```
@@ -126,12 +153,13 @@
 | `SWivid/F5-TTS` | ~1.2 GB | F5-TTS |
 | `resemble-ai/chatterbox` | ~1.5 GB | Chatterbox |
 | `fishaudio/fish-speech-1.5` | ~1.4 GB | Fish-Speech |
+| `mistralai/Voxtral-4B-TTS-2603` | ~8 GB (BF16) | Voxtral (gated — HF token required) |
 
 ---
 
 ## 🏷️ GitHub Topics
 
-`text-to-speech` `tts` `voice-cloning` `audiobook` `french-tts` `kokoro` `f5-tts` `fish-speech` `chatterbox` `qwen3` `fastapi` `cuda` `self-hosted` `gpu` `french` `multilingual`
+`text-to-speech` `tts` `voice-cloning` `audiobook` `french-tts` `kokoro` `f5-tts` `fish-speech` `chatterbox` `qwen3` `voxtral` `mistral` `vllm` `fastapi` `cuda` `self-hosted` `gpu` `french` `multilingual`
 
 ---
 
@@ -142,6 +170,8 @@ Dockerfile
 - [Chatterbox](https://github.com/resemble-ai/chatterbox) — ResembleAI
 - [F5-TTS](https://github.com/SWivid/F5-TTS) — SWivid
 - [Kokoro](https://github.com/hexgrad/kokoro) — hexgrad
+- [Voxtral](https://mistral.ai) — Mistral AI (`mistralai/Voxtral-4B-TTS-2603`, CC BY-NC)
+- French prosody preprocessing inspired by [arXiv:2508.17494](https://arxiv.org/abs/2508.17494)
 
 ---
 
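The Quick Start in this commit launches `voxtral_server.py` in the background and notes a 5-10 minute first load. A small standalone poller (an editor's sketch, not part of the repo; `wait_for_server` is a made-up name) can block until the host in `VOXTRAL_URL` accepts TCP connections before `server.py` is started:

```python
import os
import socket
import time
from urllib.parse import urlparse


def wait_for_server(url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll until the host:port in `url` accepts TCP connections; False on timeout."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 8000  # README's documented default is http://localhost:8000
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Read the same env var the README documents, with its documented default.
VOXTRAL_URL = os.environ.get("VOXTRAL_URL", "http://localhost:8000")
```

This only checks TCP reachability, not model readiness, and makes no assumption about the HTTP API `voxtral_server.py` exposes.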
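Among the real-time metrics the README lists (TTFA, RTF, duration, buffer), RTF has a standard definition that can be sketched in a few lines (illustrative only; this is not the repo's code):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time divided by duration of audio produced.

    RTF < 1.0 means the engine synthesizes faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# 10 s of audio synthesized in 2 s of wall-clock time → RTF 0.2
print(real_time_factor(2.0, 10.0))  # → 0.2
```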
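The Audiobook Generator can "merge into one WAV". A minimal sketch of such a merge using only the stdlib `wave` module (illustrative; `merge_wavs` is a hypothetical name, and the repo's actual implementation is not shown in this commit):

```python
import wave


def merge_wavs(paths: list[str], out_path: str) -> None:
    """Concatenate same-format WAV files (e.g. per-chapter output) into one WAV."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
        chunks = [first.readframes(first.getnframes())]
    for p in paths[1:]:
        with wave.open(p, "rb") as w:
            # channels, sample width, and frame rate must match to concatenate raw frames
            if w.getparams()[:3] != params[:3]:
                raise ValueError(f"format mismatch in {p}")
            chunks.append(w.readframes(w.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # header frame count is fixed up on close
        for chunk in chunks:
            out.writeframes(chunk)
```

This holds every chapter in memory at once; a streaming variant would write each file's frames as it reads them.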