---
title: Podify - AI Podcast Generator
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
python_version: "3.10"
hardware: zero-a10g
suggested_hardware: zero-a10g
pinned: false
short_description: Research a topic and turn it into a voiced podcast
tags:
  - track:backyard
  - achievement:offgrid
  - achievement:offbrand
  - achievement:fieldnotes
---

![Podify banner](assets/readme-banner.png)

# 🎙️ Podify - AI Podcast Generator

Turn any topic into a finished, voiced podcast in two phases:

1. **Content**: research agents (LangGraph) use a Hugging Face-hosted LLM plus live DuckDuckGo web search to research the topic and write a speaker-tagged script.
2. **Audio**: the self-hosted **Fish Audio / OpenAudio S1-mini** model speaks the script, with selectable preset voices and zero-shot **voice cloning** from an uploaded clip or a live mic recording.

Everything runs inside this single Gradio Space; the TTS model runs on **ZeroGPU**.

## Architecture

```
Topic  ─▶ LangGraph: plan ─▶ DDG search ─▶ outline ─▶ write ─▶ Script
Script ─▶ Fish Audio S1-mini (@spaces.GPU): per-line synth ─▶ stitched podcast WAV
```

- `app.py`: Gradio Blocks UI (two tabs) that wires both phases together.
- `research/`: `llm.py` (HF Inference client), `search.py` (DuckDuckGo), `graph.py` (LangGraph research graph).
- `tts/`: `engine.py` (model loading, GPU synthesis, and multi-speaker stitching), `voices.py` (preset voice registry).

## Configuration

Set these as **Space secrets / variables** (Settings → Variables and secrets):

| Name             | Required | Purpose                                                                              |
|------------------|----------|--------------------------------------------------------------------------------------|
| `HF_TOKEN`       | ✅       | LLM inference (Inference Providers) and model download.                              |
| `LLM_MODEL`      | optional | Override the content LLM (default `Qwen/Qwen2.5-14B-Instruct`, <32B parameters).     |
| `TTS_MODEL_REPO` | optional | Override the TTS model repo (default `fishaudio/openaudio-s1-mini`).                 |

**ZeroGPU** requires the Space owner to have a Hugging Face **PRO** account.

## Run locally

```bash
pip install -r requirements.txt
export HF_TOKEN=hf_xxx   # PowerShell: $env:HF_TOKEN="hf_xxx"
python app.py
```

Phase 1 (research and script) runs on CPU. Phase 2 (TTS) needs a GPU and the `fish-speech` package; on CPU-only machines the UI loads but synthesis is disabled.

## Models Used

- **Qwen/Qwen2.5-7B-Instruct** for research and script generation
- **fishaudio/openaudio-s1-mini** (0.5B) for audio generation

## Deploy to a Space

```bash
huggingface-cli login
huggingface-cli upload /podify . --repo-type=space
# or: git push to the Space remote (preset .wav files tracked via Git LFS)
```

## Credits / assets

- **Voice samples** (`tts/voices/`): derived from [CMU ARCTIC](http://festvox.org/cmu_arctic/) (free for research and commercial use). Rebuild with `scripts/build_voice_samples.py`.
- **Background-music loops** (`tts/music_loops/`): [FreePD](https://freepd.com/) by Kevin MacLeod, 100% public domain (CC0). Rebuild with `scripts/build_music_loops.py`. If the loops are absent, `tts/music.py` falls back to procedurally generated music (numpy).

## Contributors

- **nvipin63**
- **jayaspjacob**

#backyard-ai

- Blog: [Article](https://huggingface.co/blog/build-small-hackathon/podify)
- Social Media Post: [Post](https://substack.com/@nvipin63/note/c-276881572?r=637t58&utm_source=notes-share-action&utm_medium=web)
- Demo: [Video](https://youtu.be/DRVf_Q8IoOI)
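
The Audio phase from the Architecture diagram (per-line synthesis, then stitching into one podcast WAV) can be illustrated with only the Python standard library. This is a minimal sketch under stated assumptions, not the code in `tts/engine.py`. The `SPEAKER: text` script format, the 44.1 kHz sample rate, the 0.3 s pause, and `fake_synthesize` are all hypothetical. `fake_synthesize` is a sine-tone placeholder standing in for the Fish Audio S1-mini call.

```python
import array
import io
import math
import re
import sys
import wave
from typing import Callable

SAMPLE_RATE = 44_100  # assumed output rate, not taken from tts/engine.py
GAP_SECONDS = 0.3     # hypothetical pause inserted between speaker turns

# Hypothetical script format: one "SPEAKER: text" turn per line.
LINE_RE = re.compile(r"^\s*([A-Za-z0-9_ ]+?)\s*:\s*(.+?)\s*$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a speaker-tagged script into (speaker, text) pairs; untagged lines are skipped."""
    turns = []
    for line in script.splitlines():
        match = LINE_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def fake_synthesize(text: str, speaker: str) -> array.array:
    """Placeholder for the real TTS call: a sine tone whose pitch depends on the speaker."""
    freq = 220.0 if speaker.lower().startswith("host") else 330.0
    seconds = min(0.05 * len(text.split()), 2.0)  # rough clip length from word count
    n = int(SAMPLE_RATE * seconds)
    return array.array(
        "h", (int(8000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)) for i in range(n))
    )


def stitch(
    turns: list[tuple[str, str]],
    synthesize: Callable[[str, str], array.array] = fake_synthesize,
) -> array.array:
    """Synthesize each turn separately and join the clips with silence in between."""
    gap = array.array("h", [0] * int(SAMPLE_RATE * GAP_SECONDS))
    out = array.array("h")
    for idx, (speaker, text) in enumerate(turns):
        if idx:
            out.extend(gap)  # no leading silence before the first turn
        out.extend(synthesize(text, speaker))
    return out


def to_wav_bytes(samples: array.array) -> bytes:
    """Encode mono 16-bit PCM samples as an in-memory WAV file."""
    if sys.byteorder == "big":  # WAV PCM data is little-endian
        samples = array.array("h", samples)
        samples.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
    return buf.getvalue()
```

For example, `to_wav_bytes(stitch(parse_script("HOST: Welcome.\nGUEST: Thanks.")))` returns a playable WAV. In this sketch, only `fake_synthesize` would need replacing with a function that returns 16-bit samples from the real model. The silence between clips keeps speaker turns audibly separated.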
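
The Configuration table maps onto a small settings loader. This sketch is hypothetical: the `Settings` class and `load_settings` helper do not come from the Podify source. It encodes only what the table states. `HF_TOKEN` is required, and `LLM_MODEL` and `TTS_MODEL_REPO` fall back to the documented defaults. The table gives a 14B default, while Models Used names the 7B model. The sketch follows the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical helper: defaults copied from the Configuration table.
DEFAULT_LLM_MODEL = "Qwen/Qwen2.5-14B-Instruct"
DEFAULT_TTS_MODEL_REPO = "fishaudio/openaudio-s1-mini"


@dataclass(frozen=True)
class Settings:
    hf_token: str
    llm_model: str
    tts_model_repo: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read Space secrets/variables (or a supplied mapping), applying the documented defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # The table marks HF_TOKEN as required for inference and model download.
        raise RuntimeError("HF_TOKEN is required for LLM inference and model download")
    return Settings(
        hf_token=token,
        # An empty or missing variable falls back to the default.
        llm_model=env.get("LLM_MODEL") or DEFAULT_LLM_MODEL,
        tts_model_repo=env.get("TTS_MODEL_REPO") or DEFAULT_TTS_MODEL_REPO,
    )
```

Passing an explicit mapping (for example `load_settings({"HF_TOKEN": "hf_xxx"})`) keeps the loader testable without touching the real environment.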