	---
	title: AudioBook Forge
	emoji: 🎧
	colorFrom: indigo
	colorTo: blue
	sdk: gradio
	sdk_version: 6.13.0
	python_version: '3.12'
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: AI audiobook generator with character voices via Qwen3-TTS
	---

	# AudioBook Forge

	A model-agnostic, high-fidelity audiobook generator powered by [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS). Create audiobooks in which every character speaks with a distinct voice.

	## Features

	- 📁 File Upload — Import EPUB, PDF, TXT, or HTML directly
	- 📖 Chapter Detection — Auto-detects chapters/sections for selective generation
	- 🎙️ Character Voice Mapping — Auto-extract characters and assign unique voices
	- 🎭 Three Voice Modes for Every Voice
	  - Preset: 9 premium built-in speakers (English, Chinese, Japanese, Korean, dialects)
	  - Clone: Upload a 3–10 second voice sample to clone any real voice
	  - Design: Describe a voice in text (e.g., "a raspy old man with a warm chuckle") and the AI creates it
	- ⚡ Quick Generate — One-click audiobook from the Story tab with full voice customization (preset, clone, or design)
	- 🎚️ Speed & Temperature Control — Adjust playback speed per voice (0.5x–2.0x) and generation temperature
	- 📦 Multi-format Export — MP3, WAV, or ZIP of individual segments
	- 💾 Save/Load Projects — Export and restore your voice configurations
	- 🌐 10 Languages — English, Chinese, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian
	- ⚡ ZeroGPU — Runs on Hugging Face ZeroGPU (free A100/H200 compute)
	- 🔧 Model Agnostic — Backend is swappable; upgrade to future SOTA TTS models without changing the UI
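
As a rough illustration of what chapter detection can look like, the sketch below splits plain text on heading lines. The regular expression, the `split_chapters` name, and the "Front matter" / "Full text" labels are assumptions made for this example; they are not taken from `backend.py`, which may use different rules.

```python
import re

# Hypothetical heading pattern: "Chapter 1", "CHAPTER IV", "Part 2: Title",
# "Prologue", "Epilogue". The real app may detect chapters differently.
CHAPTER_RE = re.compile(
    r"^[ \t]*((?:chapter|part)[ \t]+(?:\d+|[ivxlcdm]+)\b.*|prologue|epilogue)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs in reading order.

    Text before the first heading is kept under the title "Front matter".
    If no heading is found, the whole text is returned as one entry.
    """
    matches = list(CHAPTER_RE.finditer(text))
    if not matches:
        return [("Full text", text.strip())]

    chapters: list[tuple[str, str]] = []
    preface = text[: matches[0].start()].strip()
    if preface:
        chapters.append(("Front matter", preface))

    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(1).strip(), text[match.end():end].strip()))
    return chapters
```

A caller could then offer the returned titles as a checklist and synthesize only the selected bodies, which is the "selective generation" idea described above.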

	## How to Use

	1. Paste or upload your story in the 📖 Story tab.
	2. Quick Generate (optional) — Generate a full audiobook immediately with a customized narrator voice.
	3. Extract characters with the 🔍 button (AI enhancement is on by default for richer voice descriptions).
	4. Configure voices in the 🎭 Voice Cast tab:
	   - Set the Narrator voice (preset, cloned, or AI-designed)
	   - Assign a voice to each Character (all default to AI-designed voices)
	5. Generate in the ⚡ Generate tab and download your MP3, WAV, or ZIP audiobook.
	6. Save your project in the 💾 Project tab to preserve voice configs for later.
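
To make the save/restore step more concrete, here is a minimal sketch of how voice configurations could be serialized to JSON and read back. The `VoiceConfig` fields, the function names, and the JSON layout are all hypothetical; the actual project file format is not documented in this README. Only the three voice modes and the 0.5x to 2.0x speed range come from the feature list.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VoiceConfig:
    # Every field name here is an assumption for illustration only.
    mode: str = "design"                # "preset", "clone", or "design"
    preset: Optional[str] = None        # built-in speaker name (preset mode)
    description: Optional[str] = None   # text prompt (design mode)
    sample_path: Optional[str] = None   # reference clip (clone mode)
    speed: float = 1.0                  # README range: 0.5x to 2.0x

    def __post_init__(self) -> None:
        if self.mode not in {"preset", "clone", "design"}:
            raise ValueError(f"unknown voice mode: {self.mode!r}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed out of range: {self.speed}")


def dump_project(narrator: VoiceConfig, characters: dict[str, VoiceConfig]) -> str:
    """Serialize the narrator and per-character voices to a JSON string."""
    payload = {
        "narrator": asdict(narrator),
        "characters": {name: asdict(cfg) for name, cfg in characters.items()},
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)


def load_project(payload: str) -> tuple[VoiceConfig, dict[str, VoiceConfig]]:
    """Rebuild voice configs from JSON; invalid values raise ValueError."""
    data = json.loads(payload)
    narrator = VoiceConfig(**data["narrator"])
    characters = {
        name: VoiceConfig(**cfg) for name, cfg in data.get("characters", {}).items()
    }
    return narrator, characters
```

Validating in `__post_init__` means a hand-edited project file with an out-of-range speed fails at load time, before any synthesis starts.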

	## Architecture

	- `app.py` — Gradio frontend with dark-themed custom UI
	- `backend.py` — Model-agnostic TTS engine, dialogue parser, and audio stitcher
	- TTS Backend: Qwen3-TTS 1.7B (CustomVoice + Base + VoiceDesign)
	- Text Processing: Paragraph-aware chunking, sentence-boundary splitting, quote detection
	- Audio Pipeline: Per-segment synthesis → crossfade stitching → peak normalization → export (MP3, WAV, or ZIP of segments)
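
The stitching and normalization stages of the audio pipeline can be sketched in plain Python on lists of float samples. This is an illustration under stated assumptions, not the code in `backend.py`: the linear fade shape, the overlap length, the `0.95` target peak, and all function names are invented for the example, and a real implementation would more likely operate on NumPy arrays.

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two mono segments with a linear crossfade.

    The last `overlap` samples of `a` are blended with the first `overlap`
    samples of `b`, so the result is len(a) + len(b) - overlap samples long.
    """
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises across the overlap
        blended.append(tail[i] * (1.0 - w) + b[i] * w)
    return head + blended + b[overlap:]


def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def stitch(segments: list[list[float]], overlap: int = 4,
           target_peak: float = 0.95) -> list[float]:
    """Crossfade segments in order, then peak-normalize the result."""
    out: list[float] = []
    for seg in segments:
        out = crossfade(out, seg, overlap) if out else list(seg)
    return peak_normalize(out, target_peak)
```

Normalizing once after stitching, instead of per segment, keeps the relative loudness between voices intact; whether the app does the same is not stated here.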

	## License

	The application code is licensed under Apache 2.0. The underlying Qwen3-TTS models are also Apache 2.0, so the full stack can be used commercially under the terms of that license.
