metadata
title: AudioBook Forge
emoji: 🎧
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI audiobook generator with character voices via Qwen3-TTS
AudioBook Forge
Model-agnostic, high-fidelity audiobook generator powered by Qwen3-TTS. Create audiobooks where every character speaks with their own unique voice.
Features
- File Upload: Import EPUB, PDF, TXT, or HTML directly
- Chapter Detection: Auto-detects chapters/sections for selective generation
- Character Voice Mapping: Auto-extract characters and assign unique voices
- Three Voice Modes for Every Voice
  - Preset: 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
  - Clone: Upload a 3–10 second voice sample to clone any real voice
  - Design: Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
- Quick Generate: One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
- Speed & Temperature Control: Adjust playback speed per voice (0.5x–2.0x) and generation temperature
- Multi-format Export: MP3, WAV, or ZIP of individual segments
- Save/Load Projects: Export and restore your voice configurations
- 10 Languages: English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
- ZeroGPU: Runs on Hugging Face ZeroGPU (free A100/H200 compute)
- Model Agnostic: Backend is swappable; upgrade to future SOTA TTS models without changing the UI
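To make the voice modes, speed range, and project save/load features more concrete, here is a minimal sketch of how a per-voice configuration could be modeled and round-tripped through JSON. It is not taken from app.py or backend.py: `VoiceConfig`, `save_project`, `load_project`, the field names, and the default temperature are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

VOICE_MODES = ("preset", "clone", "design")  # the three modes listed above
SPEED_RANGE = (0.5, 2.0)                     # per-voice playback speed range


@dataclass
class VoiceConfig:
    """Hypothetical per-voice settings; field names are illustrative only."""
    name: str
    mode: str = "design"
    source: str = ""          # preset speaker, sample path, or text description
    speed: float = 1.0
    temperature: float = 0.7  # assumed default, not taken from the app

    def __post_init__(self):
        if self.mode not in VOICE_MODES:
            raise ValueError(f"unknown voice mode: {self.mode!r}")
        lo, hi = SPEED_RANGE
        # Clamp rather than reject so a hand-edited project file still loads.
        self.speed = min(max(float(self.speed), lo), hi)


def save_project(voices):
    """Serialize a list of VoiceConfig objects to a JSON string."""
    return json.dumps({"voices": [asdict(v) for v in voices]}, indent=2)


def load_project(text):
    """Rebuild VoiceConfig objects from a string produced by save_project."""
    return [VoiceConfig(**item) for item in json.loads(text)["voices"]]


cast = [
    VoiceConfig("Narrator", mode="preset", source="some-preset-speaker"),
    VoiceConfig("Old Tom", mode="design",
                source="a raspy old man with a warm chuckle", speed=3.0),
]
restored = load_project(save_project(cast))
```

Clamping the speed on load is one possible design choice; a real project format might prefer to reject out-of-range values instead.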
How to Use
- Paste or upload your story in the Story tab.
- Quick Generate (optional): Generate a full audiobook immediately with a customized narrator voice.
- Extract characters with the character-extraction button (AI enhancement is on by default for richer voice descriptions).
- Configure voices in the Voice Cast tab:
  - Set the Narrator voice (preset, cloned, or AI-designed)
  - Assign a voice to each Character (all default to AI-designed voices)
- Generate in the Generate tab and download your MP3, WAV, or ZIP audiobook.
- Save your project in the Project tab to preserve voice configs for later.
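Character extraction and per-character voices depend on separating quoted dialogue from narration and on cutting long passages at sentence boundaries. The sketch below shows one simple way this could work using regular expressions. It is an illustration only, not the parser in backend.py: `split_dialogue`, `chunk_sentences`, and the `max_chars` limit are hypothetical, and it assumes dialogue is wrapped in double quotes.

```python
import re

# Straight or curly double-quoted spans (assumption: dialogue uses double quotes).
QUOTE_RE = re.compile(r'"([^"]+)"|\u201c([^\u201d]+)\u201d')
# Split after ., !, or ? when followed by whitespace.
SENTENCE_RE = re.compile(r'(?<=[.!?])\s+')


def split_dialogue(paragraph):
    """Return (kind, text) pairs, where kind is 'narration' or 'dialogue'."""
    segments, pos = [], 0
    for m in QUOTE_RE.finditer(paragraph):
        before = paragraph[pos:m.start()].strip()
        if before:
            segments.append(("narration", before))
        segments.append(("dialogue", (m.group(1) or m.group(2)).strip()))
        pos = m.end()
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars where possible."""
    chunks, current = [], ""
    for sentence in SENTENCE_RE.split(text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


parts = split_dialogue('"Run!" she cried. "They are coming."')
```

A single sentence longer than `max_chars` is kept whole here rather than cut mid-sentence; attributing each dialogue segment to a named speaker is a separate step that this sketch does not attempt.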
Architecture
- app.py: Gradio frontend with dark-themed custom UI
- backend.py: Model-agnostic TTS engine, dialogue parser, and audio stitcher
- TTS Backend: Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
- Text Processing: Paragraph-aware chunking, sentence-boundary splitting, quote detection
- Audio Pipeline: Per-segment synthesis → crossfade stitching → peak normalization → MP3 export
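The stitching and normalization stages of the audio pipeline can be sketched in plain Python. This is a simplified illustration, not the code in backend.py: audio segments are represented as lists of float samples in the range -1.0 to 1.0, the function names, `overlap`, and `target_peak` are hypothetical, and synthesis and MP3 export are left out.

```python
def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for b, between 0 and 1
        blended.append(tail[i] * (1.0 - w) + b[i] * w)
    return list(head) + blended + list(b[overlap:])


def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def stitch(segments, overlap=4, target_peak=0.95):
    """Crossfade all segments in order, then peak-normalize the result."""
    out = []
    for seg in segments:
        out = crossfade(out, list(seg), overlap)
    return peak_normalize(out, target_peak)


joined = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
audio = stitch([[0.5] * 8, [0.5] * 8], overlap=4)
```

Each crossfade shortens the output by `overlap` samples, so the stitched audio is slightly shorter than the sum of its segments. A real pipeline would typically work on NumPy arrays and use an overlap of some milliseconds rather than a handful of samples.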
License
The application code is licensed under Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, so the whole stack can be used commercially.