---
title: AudioBook Forge
emoji: 🎧
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI audiobook generator with character voices via Qwen3-TTS
---

# AudioBook Forge

**Model-agnostic, high-fidelity audiobook generator** powered by [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS). Create audiobooks where every character speaks with their own unique voice.

## Features

- 📁 **File Upload** — Import EPUB, PDF, TXT, or HTML directly
- 📖 **Chapter Detection** — Auto-detects chapters/sections for selective generation
- 🎙️ **Character Voice Mapping** — Auto-extract characters and assign unique voices
- 🎭 **Three Voice Modes for Every Voice**
  - **Preset** — 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
  - **Clone** — Upload a 3–10 second voice sample to clone any real voice
  - **Design** — Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
- ⚡ **Quick Generate** — One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
- 🎚️ **Speed & Temperature Control** — Adjust playback speed per voice (0.5x–2.0x) and generation temperature
- 📦 **Multi-format Export** — MP3, WAV, or ZIP of individual segments
- 💾 **Save/Load Projects** — Export and restore your voice configurations
- 🌐 **10 Languages** — English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
- ⚡ **ZeroGPU** — Runs on Hugging Face ZeroGPU (free A100/H200 compute)
- 🔧 **Model Agnostic** — Backend is swappable; upgrade to future SOTA TTS models without changing the UI

## How to Use

1. **Paste or upload** your story in the 📖 Story tab.
2. **Quick Generate** (optional) — Generate a full audiobook immediately with a customized narrator voice.
3. **Extract characters** with the 🔍 button (AI enhancement is on by default for richer voice descriptions).
4. **Configure voices** in the 🎭 Voice Cast tab:
   - Set the **Narrator** voice (preset, cloned, or AI-designed)
   - Assign a voice to each **Character** (all default to AI-designed voices)
5. **Generate** in the ⚡ Generate tab and download your MP3, WAV, or ZIP audiobook.
6. **Save your project** in the 💾 Project tab to preserve voice configs for later.

## Architecture

- `app.py` — Gradio frontend with dark-themed custom UI
- `backend.py` — Model-agnostic TTS engine, dialogue parser, and audio stitcher
- **TTS Backend:** Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
- **Text Processing:** Paragraph-aware chunking, sentence-boundary splitting, quote detection
- **Audio Pipeline:** Per-segment synthesis → crossfade stitching → peak normalization → MP3 export

## License

The application code is Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, making this stack fully commercially usable.
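
## Sketch: Text Processing

The text-processing steps listed under Architecture (paragraph-aware chunking, sentence-boundary splitting, quote detection) could look roughly like the following. This is a minimal illustration, not the code in `backend.py`: the function names, the regular expressions, and the `MAX_CHARS` limit are all assumptions made for the example.

```python
import re

# Hypothetical per-chunk character budget; the real limit is not documented here.
MAX_CHARS = 200

# Straight or curly double-quoted spans are treated as dialogue (an assumption).
QUOTE_RE = re.compile(r'"[^"]*"|“[^”]*”')
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r'(?<=[.!?])\s+')


def split_quotes(paragraph):
    """Split a paragraph into (kind, text) spans: 'dialogue' or 'narration'."""
    spans = []
    pos = 0
    for match in QUOTE_RE.finditer(paragraph):
        before = paragraph[pos:match.start()].strip()
        if before:
            spans.append(("narration", before))
        # Drop the surrounding quote characters.
        spans.append(("dialogue", match.group(0)[1:-1].strip()))
        pos = match.end()
    tail = paragraph[pos:].strip()
    if tail:
        spans.append(("narration", tail))
    return spans


def chunk_sentences(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for sentence in SENTENCE_END_RE.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def segment_story(story, max_chars=MAX_CHARS):
    """Paragraph-aware segmentation: never merge text across blank lines."""
    segments = []
    for paragraph in re.split(r'\n\s*\n', story.strip()):
        for kind, text in split_quotes(paragraph):
            for chunk in chunk_sentences(text, max_chars):
                segments.append((kind, chunk))
    return segments
```

Each resulting `(kind, text)` pair could then be routed to the narrator voice or a character voice. Attributing a dialogue span to a specific character is a separate step that this sketch does not attempt.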
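
## Sketch: Audio Pipeline

The stitching and normalization stages of the audio pipeline listed under Architecture can be illustrated with plain Python lists of float samples in the range -1.0 to 1.0. This is a sketch under stated assumptions, not the project's implementation: the linear fade curve, the `target_peak` of 0.95, and all function names are hypothetical, and a real backend would more likely operate on NumPy arrays. Synthesis and MP3 export are omitted because they depend on the model and on an external encoder.

```python
def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending the last `overlap`
    samples of `a` with the first `overlap` samples of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    blended = []
    for i in range(overlap):
        # Weight of `b` rises toward 1 while the weight of `a` falls toward 0.
        w = (i + 1) / (overlap + 1)
        blended.append(a[len(a) - overlap + i] * (1.0 - w) + b[i] * w)
    return list(a[:len(a) - overlap]) + blended + list(b[overlap:])


def stitch(segments, overlap):
    """Crossfade a sequence of segments into one continuous sample list."""
    out = []
    for segment in segments:
        out = crossfade(out, segment, overlap) if out else list(segment)
    return out


def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one has magnitude `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

The overlap is expressed in samples, so a 20 ms crossfade at a 24 kHz sample rate would correspond to `overlap=480`. Both figures are illustrative only. Each join shortens the output by `overlap` samples, which is the usual trade-off for avoiding clicks at segment boundaries.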
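
## Sketch: Saving Voice Configurations

The Save/Load Projects feature exports and restores voice configurations. One plausible shape for such a project file is a JSON list of per-voice settings. The `VoiceConfig` class, its field names, and the default temperature below are hypothetical and chosen only for illustration. The three modes and the 0.5x to 2.0x speed range come from the feature list above.

```python
import json
from dataclasses import dataclass, asdict

# The three voice modes named in the feature list.
VOICE_MODES = {"preset", "clone", "design"}


@dataclass
class VoiceConfig:
    """Hypothetical per-voice settings; not the app's actual schema."""
    name: str                 # "Narrator" or a character name
    mode: str = "design"      # characters default to AI-designed voices
    source: str = ""          # preset speaker, sample path, or text description
    speed: float = 1.0        # playback speed multiplier
    temperature: float = 0.7  # assumed default, not taken from the app

    def __post_init__(self):
        if self.mode not in VOICE_MODES:
            raise ValueError(f"unknown voice mode: {self.mode!r}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")


def dump_project(voices):
    """Serialize a list of VoiceConfig objects to a JSON string."""
    return json.dumps([asdict(v) for v in voices], ensure_ascii=False, indent=2)


def load_project(text):
    """Rebuild VoiceConfig objects from JSON, re-running validation."""
    return [VoiceConfig(**item) for item in json.loads(text)]
```

Validating in `__post_init__` means a hand-edited project file with an out-of-range speed or an unknown mode fails when it is loaded rather than later during generation.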