---
title: AudioBook Forge
emoji: 🎧
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI audiobook generator with character voices via Qwen3-TTS
---

# AudioBook Forge

**Model-agnostic, high-fidelity audiobook generator** powered by [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS). Create audiobooks where every character speaks with their own unique voice.

## Features

- **File Upload**: Import EPUB, PDF, TXT, or HTML directly
- **Chapter Detection**: Auto-detects chapters/sections for selective generation
- **Character Voice Mapping**: Auto-extract characters and assign unique voices
- **Three Voice Modes for Every Voice**
  - **Preset**: 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
  - **Clone**: Upload a 3–10 second voice sample to clone any real voice
  - **Design**: Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
- **Quick Generate**: One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
- **Speed & Temperature Control**: Adjust playback speed per voice (0.5x–2.0x) and generation temperature
- **Multi-format Export**: MP3, WAV, or ZIP of individual segments
- **Save/Load Projects**: Export and restore your voice configurations
- **10 Languages**: English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
- **ZeroGPU**: Runs on Hugging Face ZeroGPU (free A100/H200 compute)
- **Model Agnostic**: Backend is swappable; upgrade to future SOTA TTS models without changing the UI

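
One way a swappable backend could be structured is a narrow interface that the UI depends on, with each TTS engine as one implementation. The sketch below is illustrative only: `VoiceSpec`, `TTSBackend`, `SilentBackend`, and `render` are hypothetical names, not taken from `backend.py`, and the silent engine exists only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class VoiceSpec:
    """How a voice is defined. Field names are illustrative, not the app's schema."""
    mode: str            # "preset", "clone", or "design"
    value: str           # speaker name, sample path, or text description
    speed: float = 1.0   # playback speed multiplier (0.5 to 2.0)


class TTSBackend(Protocol):
    """Any object with this shape could serve as the engine behind the UI."""
    sample_rate: int

    def synthesize(self, text: str, voice: VoiceSpec) -> list[float]:
        """Return mono samples in [-1.0, 1.0] for one text segment."""
        ...


class SilentBackend:
    """Stand-in engine: emits silence whose length scales with word count."""
    sample_rate = 24_000

    def synthesize(self, text: str, voice: VoiceSpec) -> list[float]:
        seconds = 0.05 * len(text.split()) / voice.speed
        return [0.0] * round(seconds * self.sample_rate)


def render(segments: list[tuple[str, VoiceSpec]], backend: TTSBackend) -> list[float]:
    """Synthesize each (text, voice) pair and concatenate the audio."""
    audio: list[float] = []
    for text, voice in segments:
        audio.extend(backend.synthesize(text, voice))
    return audio
```

Under this assumption, moving to a different model means writing one new class with a `synthesize` method; `render` and anything built on it stay unchanged.
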

## How to Use

1. **Paste or upload** your story in the Story tab.
2. **Quick Generate** (optional): Generate a full audiobook immediately with a customized narrator voice.
3. **Extract characters** with the extraction button (AI enhancement is on by default for richer voice descriptions).
4. **Configure voices** in the Voice Cast tab:
   - Set the **Narrator** voice (preset, cloned, or AI-designed)
   - Assign a voice to each **Character** (all default to AI-designed voices)
5. **Generate** in the Generate tab and download your MP3, WAV, or ZIP audiobook.
6. **Save your project** in the Project tab to preserve voice configs for later.

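
Character extraction (step 3) depends on separating quoted dialogue from narration. This README does not show how the app does it, so the following is only a rough regex heuristic, without the AI enhancement. The function names, the speech-verb list, and the pronoun filter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Straight or curly double quotes around a line of dialogue.
QUOTE_RE = re.compile(r'"([^"]+)"|“([^”]+)”')
# A capitalized word next to a speech verb, e.g. `said Mara` or `Mara said`.
VERBS = r"(?:said|asked|replied|whispered|shouted)"
ATTRIB_RE = re.compile(rf"\b{VERBS}\s+([A-Z][a-z]+)|\b([A-Z][a-z]+)\s+{VERBS}\b")
# Capitalized pronouns at sentence starts are not character names.
PRONOUNS = {"He", "She", "They", "We", "It", "You"}


def split_segments(paragraph: str) -> list[tuple[str, str]]:
    """Split a paragraph into ("narration" | "dialogue", text) pairs in reading order."""
    segments: list[tuple[str, str]] = []
    cursor = 0
    for match in QUOTE_RE.finditer(paragraph):
        before = paragraph[cursor:match.start()].strip()
        if before:
            segments.append(("narration", before))
        segments.append(("dialogue", match.group(1) or match.group(2)))
        cursor = match.end()
    tail = paragraph[cursor:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


def candidate_characters(text: str) -> list[str]:
    """Return names found next to speech verbs, most frequent first."""
    counts: Counter[str] = Counter()
    for paragraph in text.split("\n\n"):
        for kind, segment in split_segments(paragraph):
            if kind != "narration":
                continue
            for m in ATTRIB_RE.finditer(segment):
                name = m.group(1) or m.group(2)
                if name not in PRONOUNS:
                    counts[name] += 1
    return [name for name, _ in counts.most_common()]
```

Each `("dialogue", text)` segment would then be routed to the voice assigned to its speaker, and each `("narration", text)` segment to the Narrator voice. A heuristic like this misses unattributed lines and nested quotes, which is presumably where the AI enhancement helps.
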

## Architecture

- `app.py`: Gradio frontend with dark-themed custom UI
- `backend.py`: Model-agnostic TTS engine, dialogue parser, and audio stitcher
- **TTS Backend:** Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
- **Text Processing:** Paragraph-aware chunking, sentence-boundary splitting, quote detection
- **Audio Pipeline:** Per-segment synthesis → crossfade stitching → peak normalization → MP3 export

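
The stitching and normalization stages of the audio pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `backend.py`: it uses a linear crossfade, an overlap measured in samples, and a target peak of 0.95, none of which this README specifies. MP3 encoding is left out because it needs an external encoder.

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two clips, blending the last `overlap` samples of `a` into the start of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises linearly across the overlap
        blended.append(a[start + i] * (1.0 - w) + b[i] * w)
    return a[:start] + blended + b[overlap:]


def stitch(clips: list[list[float]], overlap: int) -> list[float]:
    """Crossfade a sequence of clips into one continuous track."""
    out: list[float] = []
    for clip in clips:
        out = crossfade(out, clip, overlap) if out else list(clip)
    return out


def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale the track so its loudest sample has magnitude `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Each crossfade shortens the result by `overlap` samples, so three 4-sample clips stitched with an overlap of 2 give 8 samples rather than 12. Normalizing after stitching, rather than per segment, keeps the relative loudness between voices intact.
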

## License

The application code is licensed under Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, making this stack fully commercially usable.