| title: Podify - AI Podcast Generator | |
| emoji: 🎙️ | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.27.0 | |
| app_file: app.py | |
| python_version: "3.10" | |
| hardware: zero-a10g | |
| suggested_hardware: zero-a10g | |
| pinned: false | |
| short_description: Research a topic and turn it into a voiced podcast | |
| tags: | |
| - track:backyard | |
| - achievement:offgrid | |
| - achievement:offbrand | |
| - achievement:fieldnotes | |
|  | |
| # 🎙️ Podify - AI Podcast Generator | |
| Turn any topic into a finished, voiced podcast in two phases: | |
| 1. **Content**: research agents (LangGraph) use a Hugging Face-hosted LLM plus live | |
| DuckDuckGo web search to research the topic and write a speaker-tagged script. | |
| 2. **Audio**: the self-hosted **Fish Audio / OpenAudio S1-mini** model speaks the | |
| script, with selectable preset voices and zero-shot **voice cloning** from an | |
| uploaded clip or a live mic recording. | |
| Everything runs inside this single Gradio Space; the TTS model runs on **ZeroGPU**. | |
| ## Architecture | |
| ``` | |
| Topic ─▶ LangGraph: plan ─▶ DDG search ─▶ outline ─▶ write ─▶ Script | |
| Script ─▶ Fish Audio S1-mini (@spaces.GPU): per-line synth ─▶ stitched podcast WAV | |
| ``` | |
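The first line of the diagram is a linear chain of steps that each read and extend a shared state. The sketch below illustrates that flow in plain Python rather than LangGraph, so it runs without extra dependencies. The state fields, the stub node bodies, and the `Host`/`Guest` speaker names are all assumptions made for illustration; the real nodes in `research/graph.py` call the LLM and DuckDuckGo.

```python
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    # Hypothetical state shape; the real graph state may differ.
    topic: str
    queries: list[str]
    notes: list[str]
    outline: list[str]
    script: str


def plan(state: ResearchState) -> ResearchState:
    # A real node would ask the LLM for search queries; this one is a stub.
    topic = state["topic"]
    return {**state, "queries": [f"{topic} overview", f"{topic} recent developments"]}


def search(state: ResearchState) -> ResearchState:
    # Stand-in for the DuckDuckGo step; no network call is made here.
    return {**state, "notes": [f"(stub result for: {q})" for q in state["queries"]]}


def outline(state: ResearchState) -> ResearchState:
    # Turn each research note into one outline entry.
    entries = [f"Segment {i + 1}: {note}" for i, note in enumerate(state["notes"])]
    return {**state, "outline": entries}


def write(state: ResearchState) -> ResearchState:
    # Alternate two assumed speakers to produce a speaker-tagged script.
    speakers = ("Host", "Guest")
    lines = [f"{speakers[i % 2]}: {item}" for i, item in enumerate(state["outline"])]
    return {**state, "script": "\n".join(lines)}


PIPELINE = (plan, search, outline, write)


def run_pipeline(topic: str) -> ResearchState:
    # Run the nodes in the same order as the diagram.
    state: ResearchState = {"topic": topic}
    for node in PIPELINE:
        state = node(state)
    return state
```

Calling `run_pipeline("solar power")` returns a state whose `script` field holds one tagged line per outline entry. Replacing a stub with a real LLM or search call leaves the chain unchanged.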
| - `app.py`: Gradio Blocks UI (two tabs) wiring both phases. | |
| - `research/`: `llm.py` (HF Inference client), `search.py` (DuckDuckGo), `graph.py` | |
| (LangGraph research graph). | |
| - `tts/`: `engine.py` (model load + GPU synthesis + multi-speaker stitching), | |
| `voices.py` (preset voice registry). | |
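As a rough illustration of the per-line synthesis and stitching step, the sketch below splits a speaker-tagged script into turns and joins already-synthesized segments into one WAV. It uses only the standard library and is not the code in `tts/engine.py`. The `Speaker: text` line format, the 44.1 kHz mono 16-bit output, the silence gap, and the function names are all assumptions; the model call that produces each segment is left out.

```python
import io
import sys
import wave
from array import array

SAMPLE_RATE = 44_100  # assumed output rate; the real engine may use another


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split 'Speaker: text' lines into (speaker, text) pairs; skip other lines."""
    turns = []
    for raw in script.splitlines():
        speaker, sep, text = raw.partition(":")
        if sep and speaker.strip() and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def stitch(segments: list[array], gap_seconds: float = 0.25) -> array:
    """Concatenate 16-bit mono segments with a short silence between them."""
    gap = array("h", [0] * int(SAMPLE_RATE * gap_seconds))
    out = array("h")
    for index, segment in enumerate(segments):
        if index:
            out.extend(gap)
        out.extend(segment)
    return out


def to_wav_bytes(samples: array) -> bytes:
    """Encode 16-bit mono samples as an in-memory WAV file."""
    if sys.byteorder == "big":
        # WAV data is little-endian; swap a copy on big-endian hosts.
        samples = array("h", samples)
        samples.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
    return buffer.getvalue()
```

In use, each `(speaker, text)` pair from `parse_script` would be synthesized with that speaker's voice, and the resulting segments passed to `stitch` in script order.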
| ## Configuration | |
| Set these as **Space secrets / variables** (Settings → Variables and secrets): | |
| | Name | Required | Purpose | | |
| |-------------|----------|----------------------------------------------------------------| | |
| `HF_TOKEN` | yes | LLM inference (Inference Providers) + model download. | |
| | `LLM_MODEL` | optional | Override the content LLM (default `Qwen/Qwen2.5-14B-Instruct`, <32B). | | |
| | `TTS_MODEL_REPO` | optional | Override the TTS model repo (default `fishaudio/openaudio-s1-mini`). | | |
| **ZeroGPU** requires the Space owner to have a Hugging Face **PRO** account. | |
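One way an app could read these settings is sketched below. The variable names and defaults come from the table above; the `Settings` dataclass, the `load_settings` helper, and the fail-fast check on `HF_TOKEN` are illustrative assumptions, not necessarily how `app.py` does it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_LLM_MODEL = "Qwen/Qwen2.5-14B-Instruct"
DEFAULT_TTS_MODEL_REPO = "fishaudio/openaudio-s1-mini"


@dataclass(frozen=True)
class Settings:
    hf_token: str
    llm_model: str
    tts_model_repo: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # Hypothetical policy: fail early instead of at the first LLM call.
        raise RuntimeError("HF_TOKEN is required for LLM inference and model download")
    return Settings(
        hf_token=token,
        # `or` also falls back to the default when a variable is set but empty.
        llm_model=env.get("LLM_MODEL") or DEFAULT_LLM_MODEL,
        tts_model_repo=env.get("TTS_MODEL_REPO") or DEFAULT_TTS_MODEL_REPO,
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test, for example `load_settings({"HF_TOKEN": "hf_xxx"})`.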
| ## Run locally | |
| ```bash | |
| pip install -r requirements.txt | |
| export HF_TOKEN=hf_xxx # PowerShell: $env:HF_TOKEN="hf_xxx" | |
| python app.py | |
| ``` | |
| Phase 1 (research + script) runs on CPU. Phase 2 (TTS) needs a GPU and the | |
| `fish-speech` package; on CPU-only machines the UI loads but synthesis is disabled. | |
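The "UI loads but synthesis is disabled" behaviour can be approximated with a capability check like the one below. This is a sketch for local runs, not the project's actual logic: the import name `fish_speech` and the helper `tts_available` are assumptions, and a ZeroGPU Space may need a different check because the GPU is only attached inside `@spaces.GPU` functions.

```python
import importlib.util


def tts_available() -> tuple[bool, str]:
    """Return (enabled, reason) without importing heavy packages up front."""
    # Assumed import name for the fish-speech package.
    if importlib.util.find_spec("fish_speech") is None:
        return False, "fish-speech package is not installed"
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed"
    import torch  # imported lazily so CPU-only setups still start quickly

    if not torch.cuda.is_available():
        return False, "no CUDA GPU detected"
    return True, "ok"
```

The UI could call this once at startup and show the returned reason next to a disabled "Generate audio" control.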
| ## Models Used | |
| - **Qwen/Qwen2.5-7B-Instruct**: research and script generation | |
| - **fishaudio/openaudio-s1-mini** (0.5B parameters): audio generation | |
| ## Deploy to a Space | |
| ```bash | |
| huggingface-cli login | |
| huggingface-cli upload <user>/podify . --repo-type=space | |
| # or: git push to the Space remote (preset .wav files tracked via Git LFS) | |
| ``` | |
| ## Credits / assets | |
| - **Voice samples** (`tts/voices/`): derived from [CMU ARCTIC](http://festvox.org/cmu_arctic/) | |
| (free for research and commercial use). Rebuild with `scripts/build_voice_samples.py`. | |
| - **Background-music loops** (`tts/music_loops/`): [FreePD](https://freepd.com/) by Kevin | |
| MacLeod, 100% public domain (CC0). Rebuild with `scripts/build_music_loops.py`. | |
| A procedural numpy fallback in `tts/music.py` is used if the loops are absent. | |
| ## Contributors | |
| - **nvipin63** | |
| - **jayaspjacob** | |
| #backyard-ai | |
| - Blog: [Article](https://huggingface.co/blog/build-small-hackathon/podify) | |
| - Social Media Post: [Post](https://substack.com/@nvipin63/note/c-276881572?r=637t58&utm_source=notes-share-action&utm_medium=web) | |
| - Demo: [Video](https://youtu.be/DRVf_Q8IoOI) |