# BoomConnex Voice Studio: Backend

Single FastAPI app that hosts three TTS endpoints and serves the React SPA on
the same origin. Designed for one HuggingFace Space on a dedicated GPU.

## Endpoints

| Method | Path                        | Purpose                                                |
| ------ | --------------------------- | ------------------------------------------------------ |
| POST   | `/api/voice-clone`          | OmniVoice voice clone (LavaSR pre-enhances reference)  |
| POST   | `/api/voice-design`         | OmniVoice voice design (LavaSR post-enhancement)       |
| POST   | `/api/emotion-tts`          | Chatterbox emotional TTS                               |
| GET    | `/api/health`               | Which models are loaded                                |
| GET    | `/api/languages`            | OmniVoice language list                                |
| GET    | `/api/voice-design/options` | Voice Design dropdown taxonomy                         |
| GET    | `/`, `/emotion`, `/design`  | React SPA (with client-side routing)                   |

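Serving an SPA with client-side routing usually means that any non-API path which is not a real file falls back to `index.html`, so the React router can resolve `/emotion` or `/design` in the browser. The sketch below shows that lookup as a pure function. `resolve_spa_path` is a hypothetical helper written for illustration; `main.py` may implement the fallback differently.

```python
import tempfile
from pathlib import Path


def resolve_spa_path(static_dir: Path, request_path: str) -> Path:
    """Return the file to serve for a non-API request path.

    Hypothetical helper: real files (JS bundles, images) are served as-is;
    anything else falls back to index.html so the client router handles it.
    """
    root = static_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse paths that escape the static directory (e.g. "/../secret").
    inside = candidate == root or root in candidate.parents
    if inside and candidate.is_file():
        return candidate
    return root / "index.html"


# Demo against a throwaway directory standing in for STATIC_DIR.
with tempfile.TemporaryDirectory() as tmp:
    static = Path(tmp)
    (static / "index.html").write_text("<!doctype html>")
    (static / "assets").mkdir()
    (static / "assets" / "app.js").write_text("console.log('hi')")

    route_hit = resolve_spa_path(static, "/emotion").name
    asset_hit = resolve_spa_path(static, "/assets/app.js").name
    escape_hit = resolve_spa_path(static, "/../etc/passwd").name
```

In this demo, the client route and the escaping path both resolve to `index.html`, while the bundled asset resolves to itself.
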
The three POST generation endpoints accept `multipart/form-data` and return
`audio/wav` (16-bit PCM). See `main.py` for the full form-field list.

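When scripting against the generation endpoints, it can help to confirm that a response body really is 16-bit PCM WAV before saving or playing it. The following is a minimal sketch using only the standard library. `describe_wav` and `make_silent_wav` are hypothetical helpers, and the 24 kHz mono test signal is an arbitrary stand-in for a real response, not a statement about what the backend emits.

```python
import io
import wave


def describe_wav(payload: bytes) -> dict:
    """Parse a WAV response body and report its basic format."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "seconds": frames / rate,
        }


def make_silent_wav(seconds: float = 0.5, rate: int = 24000) -> bytes:
    """Build a 16-bit mono WAV in memory to stand in for a server response."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


# Stand-in for the bytes returned by a generation endpoint.
info = describe_wav(make_silent_wav())
```

With a real client, `payload` would be the raw response bytes from one of the POST endpoints; a `bits_per_sample` other than 16 would indicate an unexpected format.
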
## Local development

Run two processes: the backend on `:7860` and the frontend on `:8080`, whose
Vite dev server proxies API calls to the backend:

```bash
# Terminal 1: backend
cd backend
pip install -r requirements.txt
python main.py

# Terminal 2: frontend (sibling repo)
cd remix-of-voicecraft-studio-main
npm install
npm run dev
```

The Vite config proxies `/api/*` to `http://localhost:7860`, so the
frontend talks to the real backend with no CORS gymnastics.

To skip loading models locally (so the server starts fast for UI work):

```bash
LOAD_OMNIVOICE=0 LOAD_LAVASR=0 LOAD_CHATTERBOX=0 python main.py
```

## Production / HuggingFace Space

The Space repo layout must be:

```
.
├── main.py
├── omnivoice/          # vendored from Voice-Cloning/omnivoice
├── frontend/           # copy of remix-of-voicecraft-studio-main
├── requirements.txt
├── Dockerfile
└── README.md
```

Build and run locally to mirror the Space:

```bash
# from backend/, with frontend/ copied in as shown above
docker build -t voice-studio .
docker run --rm --gpus all -p 7860:7860 voice-studio
```

## Configuration (env vars)

| Variable               | Default            | Notes                                               |
| ---------------------- | ------------------ | --------------------------------------------------- |
| `OMNIVOICE_CHECKPOINT` | `k2-fsa/OmniVoice` | HF repo id or local path                            |
| `LAVASR_CHECKPOINT`    | `YatharthS/LavaSR` | HF repo id or local path                            |
| `LOAD_OMNIVOICE`       | `1`                | `0` to skip                                         |
| `LOAD_LAVASR`          | `1`                | `0` to skip                                         |
| `LOAD_CHATTERBOX`      | `1`                | `0` to skip                                         |
| `LOAD_ASR`             | `1`                | OmniVoice's Whisper, for ref-text auto-transcription |
| `STATIC_DIR`           | `static`           | Where the built SPA lives                           |
| `HOST` / `PORT`        | `0.0.0.0` / `7860` |                                                     |

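As an illustration of how these variables and defaults fit together, here is a minimal sketch of a settings loader. The `Settings` dataclass, `load_settings`, and `_flag` are hypothetical names, and treating any value other than `0` as "on" is an assumption; `main.py` may parse the environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: str = "1") -> bool:
    """Treat "0" as off and anything else as on (an assumed convention)."""
    return env.get(name, default).strip() != "0"


@dataclass(frozen=True)
class Settings:
    omnivoice_checkpoint: str
    lavasr_checkpoint: str
    load_omnivoice: bool
    load_lavasr: bool
    load_chatterbox: bool
    load_asr: bool
    static_dir: str
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the table's defaults."""
    return Settings(
        omnivoice_checkpoint=env.get("OMNIVOICE_CHECKPOINT", "k2-fsa/OmniVoice"),
        lavasr_checkpoint=env.get("LAVASR_CHECKPOINT", "YatharthS/LavaSR"),
        load_omnivoice=_flag(env, "LOAD_OMNIVOICE"),
        load_lavasr=_flag(env, "LOAD_LAVASR"),
        load_chatterbox=_flag(env, "LOAD_CHATTERBOX"),
        load_asr=_flag(env, "LOAD_ASR"),
        static_dir=env.get("STATIC_DIR", "static"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )


# An empty mapping yields the documented defaults.
defaults = load_settings({})

# Mirrors the "skip loading models" command used for UI work.
ui_only = load_settings(
    {"LOAD_OMNIVOICE": "0", "LOAD_LAVASR": "0", "LOAD_CHATTERBOX": "0"}
)
```

Passing the environment in as a mapping keeps the loader easy to exercise without touching the real process environment.
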