jayaspjacob

Update README.md

5878205 verified 16 days ago


	---
	title: Podify - AI Podcast Generator
	emoji: 🎙️
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 5.27.0
	app_file: app.py
	python_version: "3.10"
	hardware: zero-a10g
	suggested_hardware: zero-a10g
	pinned: false
	short_description: Research a topic and turn it into a voiced podcast
	tags:
	- track:backyard
	- achievement:offgrid
	- achievement:offbrand
	- achievement:fieldnotes
	---

	![Podify banner](assets/readme-banner.png)

	# 🎙️ Podify — AI Podcast Generator

	Turn any topic into a finished, voiced podcast in two phases:

	1. Content — research agents (LangGraph) use a HuggingFace-hosted LLM plus live
	DuckDuckGo web search to research the topic and write a speaker-tagged script.
	2. Audio — the self-hosted Fish Audio / OpenAudio S1-mini model speaks the
	script, with selectable preset voices and zero-shot voice cloning from an
	uploaded clip or a live mic recording.

	Everything runs inside this single Gradio Space; the TTS model runs on ZeroGPU.

	## Architecture

	```
	Topic ─▶ LangGraph: plan ─▶ DDG search ─▶ outline ─▶ write ─▶ Script
	Script ─▶ Fish Audio S1-mini (@spaces.GPU): per-line synth ─▶ stitched podcast WAV
	```
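
The flow in the diagram can be pictured as a chain of functions that each read and extend a shared state. The sketch below is plain Python rather than LangGraph, and every stage body is a stub: the state keys, the `Host`/`Guest` speaker names, and the function names are assumptions made for illustration, not the contents of `research/graph.py`.

```python
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    topic: str
    queries: list[str]
    notes: list[str]
    outline: list[str]
    script: list[tuple[str, str]]  # (speaker, line) pairs


def plan(state: ResearchState) -> ResearchState:
    # Stub: a real node would ask the LLM to propose search queries.
    topic = state["topic"]
    return {**state, "queries": [f"{topic} overview", f"{topic} latest news"]}


def search(state: ResearchState) -> ResearchState:
    # Stub: stands in for the DuckDuckGo search step.
    return {**state, "notes": [f"note about '{q}'" for q in state["queries"]]}


def outline(state: ResearchState) -> ResearchState:
    # Stub: one outline segment per research note.
    segments = [f"Segment {i + 1}: {n}" for i, n in enumerate(state["notes"])]
    return {**state, "outline": segments}


def write(state: ResearchState) -> ResearchState:
    # Stub: alternate two assumed speakers over the outline segments.
    speakers = ("Host", "Guest")
    script = [(speakers[i % 2], seg) for i, seg in enumerate(state["outline"])]
    return {**state, "script": script}


STAGES = (plan, search, outline, write)


def run_pipeline(topic: str) -> ResearchState:
    """Run the four stages in order, threading the state through each one."""
    state: ResearchState = {"topic": topic}
    for stage in STAGES:
        state = stage(state)
    return state
```

Calling `run_pipeline("solar power")` returns a state whose `script` entry is a list of speaker-tagged lines, which is the shape the audio phase consumes.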

	- `app.py` — Gradio Blocks UI (two tabs) wiring both phases.
	- `research/` — `llm.py` (HF Inference client), `search.py` (DuckDuckGo), `graph.py`
	(LangGraph research graph).
	- `tts/` — `engine.py` (model load + GPU synthesis + multi-speaker stitching),
	`voices.py` (preset voice registry).
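
To make "per-line synth, then stitch" concrete, here is a minimal standard-library sketch. It assumes the script uses `Speaker: text` lines and replaces the model call with a placeholder that returns silence, so `fake_synth`, the 16 kHz sample rate, and the 250 ms gap are all illustrative assumptions and not what `tts/engine.py` does.

```python
import io
import wave

SAMPLE_RATE = 16_000  # assumed; the real model's output rate may differ
SAMPLE_WIDTH = 2      # bytes per sample (16-bit mono PCM)


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split 'Speaker: text' lines into (speaker, text) pairs; skip the rest."""
    pairs = []
    for raw in script.splitlines():
        speaker, sep, text = raw.partition(":")
        if sep and text.strip():
            pairs.append((speaker.strip(), text.strip()))
    return pairs


def silence(milliseconds: int) -> bytes:
    """Return raw PCM silence of the given duration."""
    return b"\x00" * (SAMPLE_RATE * milliseconds // 1000 * SAMPLE_WIDTH)


def fake_synth(speaker: str, text: str) -> bytes:
    """Placeholder for the TTS call: 50 ms of silence per word."""
    return silence(50 * len(text.split()))


def stitch(script: str, gap_ms: int = 250) -> bytes:
    """Synthesize each tagged line and join the clips into one WAV file."""
    clips = [fake_synth(speaker, text) for speaker, text in parse_script(script)]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(silence(gap_ms).join(clips))
    return buffer.getvalue()
```

Swapping `fake_synth` for a real per-speaker synthesis function would leave the parsing and stitching logic unchanged, provided every clip shares the same sample rate and width.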

	## Configuration

	Set these as Space secrets / variables (Settings → Variables and secrets):

	| Name | Required | Purpose |
	|------|----------|---------|
	| `HF_TOKEN` | ✅ | LLM inference (Inference Providers) + model download. |
	| `LLM_MODEL` | optional | Override the content LLM (default `Qwen/Qwen2.5-14B-Instruct`, <32B). |
	| `TTS_MODEL_REPO` | optional | Override the TTS model repo (default `fishaudio/openaudio-s1-mini`). |

	ZeroGPU requires the Space owner to have a HuggingFace PRO account.
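
One way an app could read these settings is sketched below. The variable names and defaults come from the table; the `load_settings` helper and its error message are hypothetical and may not match how `app.py` handles configuration.

```python
import os
from typing import Mapping, Optional

DEFAULT_LLM_MODEL = "Qwen/Qwen2.5-14B-Instruct"
DEFAULT_TTS_MODEL_REPO = "fishaudio/openaudio-s1-mini"


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read the Space settings, falling back to defaults for optional ones."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # HF_TOKEN is the only required value in the table.
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret.")
    return {
        "hf_token": token,
        # `or` also covers variables that are set but empty.
        "llm_model": env.get("LLM_MODEL") or DEFAULT_LLM_MODEL,
        "tts_model_repo": env.get("TTS_MODEL_REPO") or DEFAULT_TTS_MODEL_REPO,
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.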

	## Run locally

	```bash
	pip install -r requirements.txt
	export HF_TOKEN=hf_xxx # PowerShell: $env:HF_TOKEN="hf_xxx"
	python app.py
	```

	Phase 1 (research + script) runs on CPU. Phase 2 (TTS) needs a GPU and the
	`fish-speech` package; on CPU-only machines the UI loads but synthesis is disabled.
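
A local check along these lines could decide whether to enable synthesis. This is a sketch under assumptions: it treats `torch` and `fish_speech` as the import names to look for (the latter is assumed from the `fish-speech` package name), and the helper names are hypothetical, not taken from `app.py`.

```python
import importlib.util
from typing import Iterable

# Assumed import names; adjust if the installed packages differ.
REQUIRED_MODULES = ("torch", "fish_speech")


def has_modules(names: Iterable[str]) -> bool:
    """True if every named top-level module can be found without importing it."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def tts_available(names: Iterable[str] = REQUIRED_MODULES) -> bool:
    """True only if the required modules exist and PyTorch reports a CUDA device."""
    if not has_modules(names):
        return False
    import torch  # imported lazily so CPU-only installs without torch still work

    return bool(torch.cuda.is_available())


def synthesis_status() -> str:
    """Short label a UI could show next to the synthesis controls."""
    return "enabled" if tts_available() else "disabled (no GPU or TTS package found)"
```

The check never raises on a machine without PyTorch, so the research and script phase can still run while the audio controls stay disabled.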

	## Models used
	- Qwen/Qwen2.5-7B-Instruct: research and script generation
	- fishaudio/openaudio-s1-mini (0.5B parameters): audio generation

	## Deploy to a Space

	```bash
	huggingface-cli login
	huggingface-cli upload <user>/podify . --repo-type=space
	# or: git push to the Space remote (preset .wav files tracked via Git LFS)
	```

	## Credits / assets

	- Voice samples (`tts/voices/`): derived from [CMU ARCTIC](http://festvox.org/cmu_arctic/)
	(free for research and commercial use). Rebuild with `scripts/build_voice_samples.py`.
	- Background-music loops (`tts/music_loops/`): [FreePD](https://freepd.com/) by Kevin
	MacLeod — 100% public domain (CC0). Rebuild with `scripts/build_music_loops.py`.
	A procedural numpy fallback in `tts/music.py` is used if the loops are absent.

	## Contributors
	- nvipin63
	- jayaspjacob


	#backyard-ai
	- Blog: [Article](https://huggingface.co/blog/build-small-hackathon/podify)
	- Social Media Post: [Post](https://substack.com/@nvipin63/note/c-276881572?r=637t58&utm_source=notes-share-action&utm_medium=web)
	- Demo: [Video](https://youtu.be/DRVf_Q8IoOI)