---
title: AudioBook Forge
emoji: 🎧
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI audiobook generator with character voices via Qwen3-TTS
---
# AudioBook Forge
**Model-agnostic, high-fidelity audiobook generator** powered by [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS). Create audiobooks where every character speaks with their own unique voice.
## Features
- **File Upload**: Import EPUB, PDF, TXT, or HTML directly
- **Chapter Detection**: Auto-detects chapters/sections for selective generation
- **Character Voice Mapping**: Auto-extract characters and assign unique voices
- **Three Voice Modes for Every Voice**
  - **Preset**: 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
  - **Clone**: Upload a 3–10 second voice sample to clone any real voice
  - **Design**: Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
- ⚡ **Quick Generate**: One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
- **Speed & Temperature Control**: Adjust playback speed per voice (0.5x–2.0x) and generation temperature
- 📦 **Multi-format Export**: MP3, WAV, or ZIP of individual segments
- 💾 **Save/Load Projects**: Export and restore your voice configurations
- **10 Languages**: English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
- ⚡ **ZeroGPU**: Runs on Hugging Face ZeroGPU (free A100/H200 compute)
- **Model Agnostic**: Backend is swappable; upgrade to future SOTA TTS models without changing the UI
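
To illustrate what a swappable backend could look like, here is a minimal sketch. It is not the code in `backend.py`: the names `VoiceConfig`, `TTSBackend`, `SilentBackend`, and `render`, the field names, and the default temperature are all assumptions made for this example. Only the three voice modes and the 0.5x to 2.0x speed range come from the feature list above.

```python
from dataclasses import dataclass
from typing import Protocol

VOICE_MODES = {"preset", "clone", "design"}  # the three modes listed above


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical per-voice settings; field names are illustrative."""
    mode: str                 # "preset", "clone", or "design"
    value: str                # speaker name, sample path, or text description
    speed: float = 1.0        # playback speed, 0.5x to 2.0x
    temperature: float = 0.7  # generation temperature (default is a guess)

    def __post_init__(self) -> None:
        if self.mode not in VOICE_MODES:
            raise ValueError(f"unknown voice mode: {self.mode!r}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")


class TTSBackend(Protocol):
    """Any object with this shape can sit behind the UI."""
    sample_rate: int

    def synthesize(self, text: str, voice: VoiceConfig) -> list[float]:
        """Return mono samples in the range [-1.0, 1.0]."""
        ...


class SilentBackend:
    """Stand-in backend that emits silence; handy for wiring tests."""
    sample_rate = 24_000

    def synthesize(self, text: str, voice: VoiceConfig) -> list[float]:
        # 60 ms of audio per character, shortened or stretched by speed.
        samples_per_char = self.sample_rate * 60 // 1000
        return [0.0] * int(len(text) * samples_per_char / voice.speed)


def render(backend: TTSBackend, text: str, voice: VoiceConfig) -> list[float]:
    """The caller depends only on the protocol, not on a concrete model."""
    return backend.synthesize(text, voice)
```

With this shape, replacing the model would mean writing another class with a `synthesize` method and passing it to `render`; the calling code stays the same.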
## How to Use
1. **Paste or upload** your story in the Story tab.
2. **Quick Generate** (optional): Generate a full audiobook immediately with a customized narrator voice.
3. **Extract characters** with the character extraction button (AI enhancement is on by default for richer voice descriptions).
4. **Configure voices** in the Voice Cast tab:
   - Set the **Narrator** voice (preset, cloned, or AI-designed)
   - Assign a voice to each **Character** (all default to AI-designed voices)
5. **Generate** in the ⚡ Generate tab and download your MP3, WAV, or ZIP audiobook.
6. **Save your project** in the 💾 Project tab to preserve voice configs for later.
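
As a rough picture of what saving and restoring voice configurations might involve, the sketch below round-trips a narrator and a character map through a JSON file. The schema (`version`, `narrator`, `characters`) and the function names `save_project` and `load_project` are hypothetical; the actual project file format used by the app is not documented here.

```python
import json
from pathlib import Path


def save_project(path: Path, narrator: dict, characters: dict[str, dict]) -> None:
    """Write voice assignments to a JSON file (hypothetical schema)."""
    project = {"version": 1, "narrator": narrator, "characters": characters}
    path.write_text(
        json.dumps(project, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_project(path: Path) -> tuple[dict, dict[str, dict]]:
    """Read a project file back and return (narrator, characters)."""
    project = json.loads(path.read_text(encoding="utf-8"))
    if project.get("version") != 1:
        raise ValueError("unsupported project version")
    return project["narrator"], project["characters"]
```

Keeping a version number in the file is a small design choice that lets a later release detect and reject, or migrate, older project files.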
## Architecture
- `app.py`: Gradio frontend with dark-themed custom UI
- `backend.py`: Model-agnostic TTS engine, dialogue parser, and audio stitcher
- **TTS Backend:** Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
- **Text Processing:** Paragraph-aware chunking, sentence-boundary splitting, quote detection
- **Audio Pipeline:** Per-segment synthesis → crossfade stitching → peak normalization → MP3 export
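
The text-processing and audio steps above can be sketched in plain Python. This is an illustration of the general techniques, not the implementation in `backend.py`: every function name, the regular expressions, the linear fade curve, and the default values are assumptions. Synthesis and MP3 export are left out because they depend on the TTS model and an audio encoder.

```python
import re

QUOTE_RE = re.compile(r'"([^"]+)"|“([^”]+)”')  # straight or curly double quotes
SENTENCE_RE = re.compile(r'(?<=[.!?])\s+')     # whitespace after end punctuation


def split_dialogue(paragraph: str) -> list[tuple[str, str]]:
    """Split one paragraph into ("narration" | "dialogue", text) pieces."""
    pieces, pos = [], 0
    for m in QUOTE_RE.finditer(paragraph):
        before = paragraph[pos:m.start()].strip()
        if before:
            pieces.append(("narration", before))
        pieces.append(("dialogue", m.group(1) or m.group(2)))
        pos = m.end()
    tail = paragraph[pos:].strip()
    if tail:
        pieces.append(("narration", tail))
    return pieces


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.
    A single sentence longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for sentence in SENTENCE_RE.split(text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two segments, linearly fading a out and b in over the overlap."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        mixed.append(a[start + i] * (1 - w) + b[i] * w)
    return a[:start] + mixed + b[overlap:]


def peak_normalize(samples: list[float], target: float = 0.95) -> list[float]:
    """Scale so the largest absolute sample equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target / peak for s in samples]
```

In a pipeline of this shape, each paragraph would pass through `split_dialogue` so that quoted text can be routed to a character voice, long pieces would be cut with `chunk_sentences` before synthesis, and the resulting segments would be joined with `crossfade` and scaled once with `peak_normalize` before encoding.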
## License
The application code is Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, making this stack fully commercially usable.