Commit · 3fb876e
1 Parent(s): e28a050

Add walkthrough screenshots and sample audio to README

Four screenshots walk through the full flow
(prompt -> script+voices -> generating -> complete)
plus a sample WAV from the Wizard/Orc/Mom demo
prompt. Also updates repo-layout and features list
to mention gender-aware casting and voice preview.
README.md CHANGED

@@ -29,6 +29,54 @@ Generate realistic multi-speaker conference calls, meetings, and podcasts from a
 - **Editable turn-by-turn script** — tweak speaker assignments or dialogue before rendering
 - **Title generation** — the LLM names each script automatically
 - **Two model sizes** — VibeVoice-1.5B (fast) and VibeVoice-7B (higher quality)
+- **Gender-aware voice casting** — female characters get female voices automatically (Mom → Cherry, Wizard → Chicago, etc.) with one-click override
+- **Voice preview** — sample any of the 6 voices before committing to a long generation
+
+---
+
+## Walkthrough
+
+### 1. Describe your scenario
+
+Type any scenario — a meeting, podcast, argument, TED talk — and the LLM writes the full script.
+
+<p align="center">
+<img src="public/images/Screenshot-1.png" alt="Step 1: Prompt input" width="100%"/>
+</p>
+
+### 2. Review the script and pick voices
+
+Speaker tags auto-assign by gender. Every voice dropdown stays in sync with the tags above. Preview any voice before generating.
+
+<p align="center">
+<img src="public/images/Screenshot-2.png" alt="Step 2: Script editor with voice sync" width="100%"/>
+</p>
+
+### 3. Generate the audio
+
+Kick off the GPU job on Modal. A funny parody narration keeps you entertained during the wait.
+
+<p align="center">
+<img src="public/images/Screenshot-3.png" alt="Step 3: Generating" width="85%"/>
+</p>
+
+### 4. Listen and download
+
+Full-length multi-speaker audio, ready to play or download as a WAV.
+
+<p align="center">
+<img src="public/images/Screenshot-4.png" alt="Step 4: Complete" width="85%"/>
+</p>
+
+---
+
+## Sample output
+
+A 3-speaker example generated from the prompt _"A Wizard and Orc arguing about which spell is most powerful against dragons. Suddenly, their Mom comes downstairs to interrupt their LARPing session."_
+
+▶️ **[Listen to the sample (WAV)](public/sample-generations/sample-generation-001.wav)**
+
+Voices used: **Chicago (M)** as the Wizard, **Janus (M)** as the Orc, **Cherry (F)** as Mom.
 
 ---
 
@@ -76,14 +124,14 @@ This project separates the lightweight Gradio frontend (hosted on HF Spaces) fro
 
 ## Voices
 
-| Voice
-|-------|:----
-| Cherry
-| Chicago
-| Janus
-| Mantis
-| Sponge
-| Starchild |
+| Voice | Gender |
+| --------- | :----: |
+| Cherry | F |
+| Chicago | M |
+| Janus | M |
+| Mantis | F |
+| Sponge | M |
+| Starchild | F |
 
 Voice samples live in `public/voices/` and are loaded as short reference clips by the VibeVoice backend.
 
@@ -106,6 +154,7 @@ python app.py
 ```
 
 Required env:
+
 - `HF_TOKEN` — Hugging Face token with Inference API access
 
 ---
@@ -117,9 +166,11 @@ Required env:
 ├── app.py # Gradio frontend + script generation
 ├── requirements.txt # gradio, modal, huggingface_hub
 ├── public/
-│ ├── images/
-│
-
+│ ├── images/ # Banner, architecture diagram, screenshots
+│ ├── voices/ # Voice reference clips (Cherry, Chicago, ...)
+│ └── sample-generations/ # Example WAV outputs
+├── text_examples/ # Example scripts (1p, 2p, 3p, 4p scenarios)
+├── tests/ # Parser tests + example prompts
 └── README.md
 ```
 
@@ -134,4 +185,4 @@ Required env:
 
 ---
 
-<sub>HF Spaces configuration reference: https://huggingface.co/docs/hub/spaces-config-reference</sub>
+`<sub>`HF Spaces configuration reference: https://huggingface.co/docs/hub/spaces-config-reference`</sub>`

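The gender-aware casting described in the new feature bullet and the Voices table could work roughly as follows. This is a minimal Python sketch, not code from `app.py`: the `cast_voices` function, the layout of the `VOICES` mapping, and the `(name, gender)` speaker format are all assumptions made for illustration. Only the six voice names and their genders come from the README.

```python
from itertools import cycle

# Voice names and genders as listed in the README's Voices table.
VOICES = {
    "Cherry": "F",
    "Chicago": "M",
    "Janus": "M",
    "Mantis": "F",
    "Sponge": "M",
    "Starchild": "F",
}


def cast_voices(speakers, overrides=None):
    """Assign a voice to each speaker, matching gender where possible.

    speakers:  list of (name, gender) pairs with gender "F" or "M"
               (hypothetical format; the real app may tag speakers differently).
    overrides: optional {speaker_name: voice_name} for manual choices.
    """
    overrides = overrides or {}
    # One round-robin pool per gender, so voices repeat only when a pool runs out.
    pools = {
        gender: cycle([v for v, g in VOICES.items() if g == gender])
        for gender in ("F", "M")
    }
    # Speakers with an unknown gender draw from all six voices.
    fallback = cycle(VOICES)

    cast = {}
    for name, gender in speakers:
        if name in overrides:
            voice = overrides[name]
            if voice not in VOICES:
                raise ValueError(f"unknown voice: {voice!r}")
            cast[name] = voice
        else:
            cast[name] = next(pools.get(gender, fallback))
    return cast


if __name__ == "__main__":
    print(cast_voices([("Wizard", "M"), ("Orc", "M"), ("Mom", "F")]))
```

With the table order above, this sketch happens to reproduce the casting listed under "Sample output" (Chicago, Janus, Cherry), but that agreement is a property of the sketch, not evidence of how the project actually assigns voices.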
public/images/Screenshot-1.png ADDED (Git LFS)
public/images/Screenshot-2.png ADDED (Git LFS)
public/images/Screenshot-3.png ADDED (Git LFS)
public/images/Screenshot-4.png ADDED (Git LFS)

public/sample-generations/sample-generation-001.wav ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a5ed09f7df485f3e41c00e557d7ef24ce9796c68a00fdc606ee5fe645fe32f6f
+size 8160044
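
The three added lines are a Git LFS pointer file rather than the audio itself: each line is a key, a space, and a value. A small Python sketch of reading those fields follows. `parse_lfs_pointer` is a hypothetical helper written for this note; it is not part of the repository or of the Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid value has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a5ed09f7df485f3e41c00e557d7ef24ce9796c68a00fdc606ee5fe645fe32f6f
size 8160044
"""

info = parse_lfs_pointer(POINTER)

if __name__ == "__main__":
    print(f"{info['oid_algo']} object of {info['size']} bytes")
```

The `size` field is the byte length of the actual WAV stored in LFS, so the sample is roughly 8 MB even though the diff shows only three lines.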