ACloudCenter committed on
Commit 3fb876e · 1 Parent(s): e28a050

Add walkthrough screenshots and sample audio to README

Four screenshots walk through the full flow
(prompt -> script+voices -> generating -> complete)
plus a sample WAV from the Wizard/Orc/Mom demo
prompt. Also updates repo-layout and features list
to mention gender-aware casting and voice preview.

README.md CHANGED
@@ -29,6 +29,54 @@ Generate realistic multi-speaker conference calls, meetings, and podcasts from a
  - **Editable turn-by-turn script** — tweak speaker assignments or dialogue before rendering
  - **Title generation** — the LLM names each script automatically
  - **Two model sizes** — VibeVoice-1.5B (fast) and VibeVoice-7B (higher quality)
+ - **Gender-aware voice casting** — female characters get female voices automatically (Mom → Cherry, Wizard → Chicago, etc.) with one-click override
+ - **Voice preview** — sample any of the 6 voices before committing to a long generation
+
+ ---
+
+ ## Walkthrough
+
+ ### 1. Describe your scenario
+
+ Type any scenario — a meeting, podcast, argument, TED talk — and the LLM writes the full script.
+
+ <p align="center">
+ <img src="public/images/Screenshot-1.png" alt="Step 1: Prompt input" width="100%"/>
+ </p>
+
+ ### 2. Review the script and pick voices
+
+ Speaker tags auto-assign by gender. Every voice dropdown stays in sync with the tags above. Preview any voice before generating.
+
+ <p align="center">
+ <img src="public/images/Screenshot-2.png" alt="Step 2: Script editor with voice sync" width="100%"/>
+ </p>
+
+ ### 3. Generate the audio
+
+ Kick off the GPU job on Modal. A funny parody narration keeps you entertained during the wait.
+
+ <p align="center">
+ <img src="public/images/Screenshot-3.png" alt="Step 3: Generating" width="85%"/>
+ </p>
+
+ ### 4. Listen and download
+
+ Full-length multi-speaker audio, ready to play or download as a WAV.
+
+ <p align="center">
+ <img src="public/images/Screenshot-4.png" alt="Step 4: Complete" width="85%"/>
+ </p>
+
+ ---
+
+ ## Sample output
+
+ A 3-speaker example generated from the prompt _"A Wizard and Orc arguing about which spell is most powerful against dragons. Suddenly, their Mom comes downstairs to interrupt their LARPing session."_
+
+ ▶️ **[Listen to the sample (WAV)](public/sample-generations/sample-generation-001.wav)**
+
+ Voices used: **Chicago (M)** as the Wizard, **Janus (M)** as the Orc, **Cherry (F)** as Mom.
 
  ---
 
@@ -76,14 +124,14 @@ This project separates the lightweight Gradio frontend (hosted on HF Spaces) fro
 
  ## Voices
 
- | Voice | Gender |
- |-------|:------:|
- | Cherry | F |
- | Chicago | M |
- | Janus | M |
- | Mantis | F |
- | Sponge | M |
- | Starchild | F |
+ | Voice | Gender |
+ | --------- | :----: |
+ | Cherry | F |
+ | Chicago | M |
+ | Janus | M |
+ | Mantis | F |
+ | Sponge | M |
+ | Starchild | F |
 
  Voice samples live in `public/voices/` and are loaded as short reference clips by the VibeVoice backend.
 
@@ -106,6 +154,7 @@ python app.py
  ```
 
  Required env:
+
  - `HF_TOKEN` — Hugging Face token with Inference API access
 
  ---
@@ -117,9 +166,11 @@ Required env:
  ├── app.py # Gradio frontend + script generation
  ├── requirements.txt # gradio, modal, huggingface_hub
  ├── public/
- │ ├── images/ # Banner, architecture diagram, benchmark chart
- ── voices/ # Voice reference clips (Cherry, Chicago, ...)
- ── text_examples/ # Example scripts (1p, 2p, 3p, 4p scenarios)
+ │ ├── images/ # Banner, architecture diagram, screenshots
+ │ ├── voices/ # Voice reference clips (Cherry, Chicago, ...)
+ │ └── sample-generations/ # Example WAV outputs
+ ├── text_examples/ # Example scripts (1p, 2p, 3p, 4p scenarios)
+ ├── tests/ # Parser tests + example prompts
  └── README.md
  ```
 
@@ -134,4 +185,4 @@ Required env:
 
  ---
 
- <sub>HF Spaces configuration reference: https://huggingface.co/docs/hub/spaces-config-reference</sub>
+ `<sub>`HF Spaces configuration reference: https://huggingface.co/docs/hub/spaces-config-reference`</sub>`
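
Reviewer note: the gender-aware casting feature this commit adds to the features list can be illustrated with a minimal sketch. The voice table is copied from the README's Voices section. The function name `cast_voices`, the `(name, gender)` input shape, and the round-robin policy are assumptions made for illustration; the actual logic in `app.py` is not part of this diff and may differ.

```python
from itertools import cycle

# Voice table copied from the README's "Voices" section.
VOICES = {
    "Cherry": "F",
    "Chicago": "M",
    "Janus": "M",
    "Mantis": "F",
    "Sponge": "M",
    "Starchild": "F",
}


def cast_voices(characters, overrides=None):
    """Assign one voice per (name, gender) pair, matching gender when known.

    Hypothetical helper: `overrides` maps a character name to a voice name
    and stands in for the README's "one-click override".
    """
    overrides = overrides or {}
    # One round-robin pool per gender, in table order.
    pools = {
        gender: cycle([v for v, g in VOICES.items() if g == gender])
        for gender in ("F", "M")
    }
    fallback = cycle(VOICES)  # used when the gender is unknown
    cast = {}
    for name, gender in characters:
        if name in overrides:
            if overrides[name] not in VOICES:
                raise ValueError(f"unknown voice: {overrides[name]}")
            cast[name] = overrides[name]
        elif gender in pools:
            cast[name] = next(pools[gender])
        else:
            cast[name] = next(fallback)
    return cast


demo = cast_voices([("Wizard", "M"), ("Orc", "M"), ("Mom", "F")])
print(demo)
```

Under these assumptions the Wizard/Orc/Mom prompt yields the same casting the README reports for the sample output (Chicago, Janus, Cherry). Passing `overrides={"Mom": "Mantis"}` would swap only that one character.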
public/images/Screenshot-1.png ADDED

Git LFS Details

  • SHA256: 3a8af8f5adaddaa2456b54ab81a1234392f8bd7ba36e24e9a418a334b91f2194
  • Pointer size: 130 Bytes
  • Size of remote file: 99.4 kB
public/images/Screenshot-2.png ADDED

Git LFS Details

  • SHA256: 01c81d4269ad21b1145b84ee390aedc963232365fe5e7f3e7fb94cd9d2b26b92
  • Pointer size: 131 Bytes
  • Size of remote file: 254 kB
public/images/Screenshot-3.png ADDED

Git LFS Details

  • SHA256: 38c66faeeab5f410e0103be9485400df96a1e6e529e5f20e654ddbb738b0d468
  • Pointer size: 131 Bytes
  • Size of remote file: 111 kB
public/images/Screenshot-4.png ADDED

Git LFS Details

  • SHA256: aa41ef0e213fba80bef8c4fdd887d504872c73e707b2ae3a8cb6887b1c117833
  • Pointer size: 130 Bytes
  • Size of remote file: 78 kB
public/sample-generations/sample-generation-001.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5ed09f7df485f3e41c00e557d7ef24ce9796c68a00fdc606ee5fe645fe32f6f
+ size 8160044
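
Reviewer note: the three added lines are a Git LFS pointer, not the WAV itself. The pointer records the spec version, the SHA-256 object ID, and the size in bytes of the real file. As a rough sketch, it can be read as space-separated key/value pairs; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>", split at the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from this commit's diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a5ed09f7df485f3e41c00e557d7ef24ce9796c68a00fdc606ee5fe645fe32f6f
size 8160044
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size_bytes"])
```

The size field shows that the sample WAV is 8,160,044 bytes (about 8.2 MB), while the repository stores only the short pointer.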