metadata
title: AudioBook Forge
emoji: 🎧
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: AI audiobook generator with character voices via Qwen3-TTS

AudioBook Forge

Model-agnostic, high-fidelity audiobook generator powered by Qwen3-TTS. Create audiobooks where every character speaks with their own unique voice.

Features

  • 📁 File Upload: Import EPUB, PDF, TXT, or HTML directly
  • 📖 Chapter Detection: Auto-detects chapters/sections for selective generation
  • 🎙️ Character Voice Mapping: Auto-extract characters and assign unique voices
  • 🎭 Three Voice Modes for Every Voice
    • Preset: 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
    • Clone: Upload a 3–10 second voice sample to clone any real voice
    • Design: Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
  • ⚡ Quick Generate: One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
  • 🎚️ Speed & Temperature Control: Adjust playback speed per voice (0.5x–2.0x) and generation temperature
  • 📦 Multi-format Export: MP3, WAV, or ZIP of individual segments
  • 💾 Save/Load Projects: Export and restore your voice configurations
  • 🌐 10 Languages: English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
  • ⚡ ZeroGPU: Runs on Hugging Face ZeroGPU (free A100/H200 compute)
  • 🔧 Model Agnostic: Backend is swappable; upgrade to future SOTA TTS models without changing the UI
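
As a rough illustration of what chapter detection can look like, the sketch below splits plain text at lines that resemble chapter headings. It is not taken from backend.py: the `HEADING` pattern and the `detect_chapters` function are hypothetical names, and the heuristic (keywords such as "Chapter", "Part", "Prologue") is an assumption about typical book formatting.

```python
import re

# Hypothetical heading heuristic: "Chapter 1", "CHAPTER IV: Title", "Part 2",
# "Section 3", "Prologue", "Epilogue". Real books need a broader pattern.
HEADING = re.compile(
    r"^\s*(?:(?:chapter|part|section)\s+(?:\d+|[ivxlcdm]+)\b.*|prologue|epilogue)\s*$",
    re.IGNORECASE,
)


def detect_chapters(text: str) -> list[tuple[str, str]]:
    """Split text into (title, body) pairs at lines that look like headings.

    Text before the first heading is returned under "Front matter".
    Headings with an empty body are dropped.
    """
    chapters: list[tuple[str, str]] = []
    title, lines = "Front matter", []
    for line in text.splitlines():
        if HEADING.match(line):
            body = "\n".join(lines).strip()
            if body:
                chapters.append((title, body))
            title, lines = line.strip(), []
        else:
            lines.append(line)
    body = "\n".join(lines).strip()
    if body:
        chapters.append((title, body))
    return chapters


if __name__ == "__main__":
    sample = "Prologue\nIt was dark.\n\nChapter 1: The Door\nShe knocked twice."
    for chapter_title, chapter_body in detect_chapters(sample):
        print(f"{chapter_title}: {len(chapter_body)} characters")
```

Returning (title, body) pairs keeps selective generation simple: a UI can list the titles and pass only the chosen bodies on to synthesis.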

How to Use

  1. Paste or upload your story in the 📖 Story tab.
  2. Quick Generate (optional): generate a full audiobook immediately with a customized narrator voice.
  3. Extract characters with the 🔍 button (AI enhancement is on by default for richer voice descriptions).
  4. Configure voices in the 🎭 Voice Cast tab:
    • Set the Narrator voice (preset, cloned, or AI-designed)
    • Assign a voice to each Character (all default to AI-designed voices)
  5. Generate in the ⚡ Generate tab and download your MP3, WAV, or ZIP audiobook.
  6. Save your project in the 💾 Project tab to preserve voice configs for later.
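
To make the save/load step concrete, here is a minimal sketch of how a voice cast could be serialized to JSON and restored. The actual project file format is not documented here, so everything in it is an assumption: the `VoiceConfig` fields, the `dump_project` and `load_project` helpers, and the default temperature are hypothetical. Only the three modes and the 0.5x–2.0x speed range come from the feature list.

```python
import json
from dataclasses import asdict, dataclass

VOICE_MODES = {"preset", "clone", "design"}


@dataclass
class VoiceConfig:
    """Hypothetical per-voice settings; field names are illustrative only."""

    mode: str = "design"      # characters default to AI-designed voices
    source: str = ""          # preset name, sample path, or text description
    speed: float = 1.0        # clamped to the 0.5x-2.0x range
    temperature: float = 0.7  # assumed default, not taken from the app

    def __post_init__(self) -> None:
        if self.mode not in VOICE_MODES:
            raise ValueError(f"unknown voice mode: {self.mode!r}")
        self.speed = min(2.0, max(0.5, self.speed))


def dump_project(cast: dict[str, VoiceConfig]) -> str:
    """Serialize a name -> VoiceConfig mapping to a JSON string."""
    return json.dumps({name: asdict(cfg) for name, cfg in cast.items()}, indent=2)


def load_project(payload: str) -> dict[str, VoiceConfig]:
    """Rebuild the mapping from JSON; validation runs again on load."""
    return {name: VoiceConfig(**fields) for name, fields in json.loads(payload).items()}


if __name__ == "__main__":
    cast = {
        "Narrator": VoiceConfig(mode="preset", source="speaker_1", speed=1.1),
        "Old Tom": VoiceConfig(source="a raspy old man with a warm chuckle"),
    }
    print(dump_project(cast))
```

Because validation lives in `__post_init__`, a hand-edited project file with an out-of-range speed or an unknown mode is corrected or rejected at load time rather than during generation.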

Architecture

  • app.py: Gradio frontend with dark-themed custom UI
  • backend.py: Model-agnostic TTS engine, dialogue parser, and audio stitcher
  • TTS Backend: Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
  • Text Processing: Paragraph-aware chunking, sentence-boundary splitting, quote detection
  • Audio Pipeline: Per-segment synthesis → crossfade stitching → peak normalization → MP3 export
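
The stitching and normalization stages of the audio pipeline can be sketched in plain Python. This is an illustration, not the code in backend.py: it assumes mono audio as lists of floats in [-1.0, 1.0], uses a linear crossfade (the real stitcher may use a different curve), and the names `crossfade`, `peak_normalize`, and `stitch` are hypothetical. Synthesis and MP3 encoding are left out.

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join a and b, linearly blending the last `overlap` samples of a
    with the first `overlap` samples of b."""
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(tail[i] * (1.0 - w) + b[i] * w)
    return head + blended + b[overlap:]


def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def stitch(segments: list[list[float]], overlap: int, target_peak: float = 0.95) -> list[float]:
    """Crossfade consecutive segments together, then peak-normalize the result."""
    out: list[float] = []
    for seg in segments:
        out = crossfade(out, seg, overlap) if out else list(seg)
    return peak_normalize(out, target_peak)


if __name__ == "__main__":
    first, second = [0.2] * 8, [-0.4] * 8
    audio = stitch([first, second], overlap=4)
    print(len(audio), max(abs(s) for s in audio))
```

Normalizing once after stitching, rather than per segment, preserves the relative loudness between voices. Each join shortens the output by `overlap` samples, which matters when segment timestamps are tracked.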

License

The application code is licensed under Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, so the full stack can be used commercially.