---
title: Podify - AI Podcast Generator
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
python_version: "3.10"
hardware: zero-a10g
suggested_hardware: zero-a10g
pinned: false
short_description: Research a topic and turn it into a voiced podcast
tags:
- track:backyard
- achievement:offgrid
- achievement:offbrand
- achievement:fieldnotes
---
![Podify banner](assets/readme-banner.png)
# πŸŽ™οΈ Podify β€” AI Podcast Generator
Turn any topic into a finished, voiced podcast in two phases:
1. **Content**: research agents (LangGraph) use a Hugging Face-hosted LLM plus live
DuckDuckGo web search to research the topic and write a speaker-tagged script.
2. **Audio**: the self-hosted **Fish Audio / OpenAudio S1-mini** model speaks the
script, with selectable preset voices and zero-shot **voice cloning** from an
uploaded clip or a live mic recording.
Everything runs inside this single Gradio Space; the TTS model runs on **ZeroGPU**.
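Phase 1 hands phase 2 a speaker-tagged script, but the exact tag format is not documented here. The following is a minimal sketch of how that script could be split into ordered turns for per-line synthesis. It assumes one `SPEAKER: text` turn per line and a known set of speaker names. Both the format and the `Turn`/`parse_script` names are hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


def parse_script(script: str, speakers: set[str]) -> list[Turn]:
    """Split a hypothetical `SPEAKER: text` script into ordered turns for per-line TTS."""
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no speech
        tag, sep, rest = line.partition(":")
        if sep and tag.strip() in speakers:
            # A known speaker tag starts a new turn (the text may follow on later lines).
            turns.append(Turn(tag.strip(), rest.strip()))
        elif turns:
            # Anything else continues the current speaker's turn.
            turns[-1].text = f"{turns[-1].text} {line}".strip()
        # Lines before the first tagged turn (e.g. an episode title) are dropped.
    return turns
```

Checking the tag against known speakers, rather than treating any text before a colon as a speaker, prevents a sentence such as `It's great: really.` from being mistaken for a new turn.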
## Architecture
```
Topic ─▶ LangGraph: plan ─▶ DDG search ─▶ outline ─▶ write ─▶ Script
Script ─▶ Fish Audio S1-mini (@spaces.GPU): per-line synth ─▶ stitched podcast WAV
```
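The phase-1 row of the diagram is a chain of nodes. Each node reads a shared state and adds to it. The sketch below reproduces that state flow in plain Python without LangGraph, so it can run offline. The node bodies, state keys, prompts, and the injected `llm`/`search` callables are all hypothetical stand-ins for what `research/llm.py` and `research/search.py` provide.

```python
from typing import Callable, TypedDict


class ResearchState(TypedDict, total=False):
    topic: str
    queries: list[str]
    findings: list[str]
    outline: str
    script: str


Node = Callable[[ResearchState], ResearchState]


def make_nodes(llm: Callable[[str], str], search: Callable[[str], list[str]]) -> list[Node]:
    """Build hypothetical plan -> search -> outline -> write nodes around injected clients."""

    def plan(state: ResearchState) -> ResearchState:
        # Ask the LLM for search queries, one per line.
        reply = llm(f"List web search queries for: {state['topic']}")
        return {"queries": [q.strip() for q in reply.splitlines() if q.strip()]}

    def do_search(state: ResearchState) -> ResearchState:
        # Collect result snippets for every planned query.
        return {"findings": [hit for q in state["queries"] for hit in search(q)]}

    def outline(state: ResearchState) -> ResearchState:
        notes = "\n".join(state["findings"])
        return {"outline": llm(f"Outline a podcast on {state['topic']} from:\n{notes}")}

    def write(state: ResearchState) -> ResearchState:
        return {"script": llm(f"Write a HOST/GUEST script from this outline:\n{state['outline']}")}

    return [plan, do_search, outline, write]


def run_pipeline(topic: str, nodes: list[Node]) -> ResearchState:
    """Run nodes in order, merging each node's partial update into the shared state."""
    state: ResearchState = {"topic": topic}
    for node in nodes:
        state.update(node(state))
    return state
```

In LangGraph itself, the same four functions would be registered as nodes on a `StateGraph` and connected with edges. The `update` merge in `run_pipeline` mirrors how a returned partial update overwrites the matching keys of the graph state when no custom reducer is configured.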
- `app.py`: Gradio Blocks UI (two tabs) wiring both phases.
- `research/`: `llm.py` (HF Inference client), `search.py` (DuckDuckGo), `graph.py`
(LangGraph research graph).
- `tts/`: `engine.py` (model load + GPU synthesis + multi-speaker stitching),
`voices.py` (preset voice registry).
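`tts/engine.py` synthesizes each script line separately and then stitches the results together. The following is a minimal, dependency-free sketch of the stitching step. It assumes each clip is a list of float samples in [-1, 1] at a shared sample rate. The clips are joined with a short pause, and the result is encoded as 16-bit mono WAV using the standard-library `wave` module. The 44.1 kHz default and the 0.35 s gap are illustrative assumptions, not values read from the model or the app.

```python
import io
import struct
import wave


def stitch(clips: list[list[float]], sample_rate: int = 44_100, gap_s: float = 0.35) -> list[float]:
    """Concatenate per-line clips, inserting a short silence between consecutive lines."""
    gap = [0.0] * int(sample_rate * gap_s)
    out: list[float] = []
    for i, clip in enumerate(clips):
        if i:
            out.extend(gap)  # pause before every line except the first
        out.extend(clip)
    return out


def to_wav_bytes(samples: list[float], sample_rate: int = 44_100) -> bytes:
    """Encode float samples in [-1, 1] as a 16-bit mono PCM WAV file in memory."""
    pcm = b"".join(
        # Clamp to [-1, 1] before scaling so loud samples cannot overflow int16.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```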
## Configuration
Set these as **Space secrets / variables** (Settings → Variables and secrets):
| Name             | Required | Purpose                                                                            |
|------------------|----------|------------------------------------------------------------------------------------|
| `HF_TOKEN`       | ✅       | LLM inference (Inference Providers) + model download.                              |
| `LLM_MODEL`      | optional | Override the content LLM (default `Qwen/Qwen2.5-14B-Instruct`, under 32B parameters). |
| `TTS_MODEL_REPO` | optional | Override the TTS model repo (default `fishaudio/openaudio-s1-mini`).               |
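The variables in the table map naturally onto a small settings loader. In the hypothetical sketch below, `Settings` and `load_settings` are illustrative names, not names from the app. The loader fails fast when `HF_TOKEN` is missing and falls back to the defaults listed in the table when the optional variables are unset or empty.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Defaults copied from the configuration table.
DEFAULT_LLM_MODEL = "Qwen/Qwen2.5-14B-Instruct"
DEFAULT_TTS_MODEL_REPO = "fishaudio/openaudio-s1-mini"


@dataclass(frozen=True)
class Settings:
    hf_token: str
    llm_model: str
    tts_model_repo: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read Space secrets/variables, raising early if the required HF_TOKEN is absent."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is required; set it as a Space secret.")
    return Settings(
        hf_token=token,
        # `or` treats an empty string the same as an unset variable.
        llm_model=env.get("LLM_MODEL") or DEFAULT_LLM_MODEL,
        tts_model_repo=env.get("TTS_MODEL_REPO") or DEFAULT_TTS_MODEL_REPO,
    )
```

Passing a plain `dict` as `env` makes the loader easy to test without touching the real process environment.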
**ZeroGPU** requires the Space owner to have a Hugging Face **PRO** account.
## Run locally
```bash
pip install -r requirements.txt
export HF_TOKEN=hf_xxx # PowerShell: $env:HF_TOKEN="hf_xxx"
python app.py
```
Phase 1 (research + script) runs on CPU. Phase 2 (TTS) needs a GPU and the
`fish-speech` package; on CPU-only machines the UI loads but synthesis is disabled.
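The CPU-only behaviour described above amounts to a capability check at startup. The following is a minimal sketch that makes two assumptions. First, the `fish-speech` package is importable as `fish_speech`. Second, a CUDA check through PyTorch is an adequate proxy for "has a GPU". On ZeroGPU, GPU access is granted per call through `@spaces.GPU`, so the real app may make this decision differently. `can_synthesize` is a hypothetical name.

```python
import importlib.util


def can_synthesize(tts_package: str = "fish_speech") -> bool:
    """Return True only when the TTS package is installed and PyTorch sees a CUDA GPU."""
    if importlib.util.find_spec(tts_package) is None:
        return False  # fish-speech missing: keep the UI, disable phase 2
    if importlib.util.find_spec("torch") is None:
        return False  # no PyTorch, so no GPU inference either
    import torch  # imported lazily so CPU-only installs without torch still start

    return torch.cuda.is_available()
```

For example, the result could be passed as `interactive=` when creating the Gradio synthesis button. The Audio tab would still render, but synthesis could not be triggered.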
## Models used
- **Qwen/Qwen2.5-7B-Instruct**: research and script generation.
- **fishaudio/openaudio-s1-mini** (0.5B parameters): audio generation.
## Deploy to a Space
```bash
huggingface-cli login
huggingface-cli upload <user>/podify . --repo-type=space
# or: git push to the Space remote (preset .wav files tracked via Git LFS)
```
## Credits / assets
- **Voice samples** (`tts/voices/`): derived from [CMU ARCTIC](http://festvox.org/cmu_arctic/)
(free for research and commercial use). Rebuild with `scripts/build_voice_samples.py`.
- **Background-music loops** (`tts/music_loops/`): [FreePD](https://freepd.com/) by Kevin
MacLeod β€” 100% public domain (CC0). Rebuild with `scripts/build_music_loops.py`.
A procedural numpy fallback in `tts/music.py` is used if the loops are absent.
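The music fallback follows a simple rule: use a bundled loop if one exists, and otherwise generate a bed procedurally. The sketch below assumes loops are stored as `*.wav` files. Unlike the real `tts/music.py`, it uses the standard-library `math` module instead of numpy. The function names, chord frequencies, sample rate, and level are arbitrary illustrative choices.

```python
import math
from pathlib import Path
from typing import List, Tuple, Union


def procedural_pad(
    seconds: float,
    sample_rate: int = 22_050,
    freqs: Tuple[float, ...] = (220.0, 277.18, 329.63),  # roughly an A-major triad
    level: float = 0.1,
) -> List[float]:
    """Generate a quiet sustained chord as the averaged sum of sine waves."""
    n = int(seconds * sample_rate)
    out: List[float] = []
    for i in range(n):
        t = i / sample_rate
        chord = sum(math.sin(2 * math.pi * f * t) for f in freqs) / len(freqs)
        out.append(level * chord)  # averaging keeps |chord| <= 1, so |sample| <= level
    return out


def choose_music_bed(loop_dir: Path, seconds: float) -> Tuple[str, Union[Path, List[float]]]:
    """Prefer a bundled loop file; fall back to procedural audio when none are present."""
    loops = sorted(loop_dir.glob("*.wav")) if loop_dir.is_dir() else []
    if loops:
        return "loop", loops[0]
    return "procedural", procedural_pad(seconds)
```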
## Contributors
- **nvipin63**
- **jayaspjacob**
#backyard-ai
- Blog: [Article](https://huggingface.co/blog/build-small-hackathon/podify)
- Social Media Post: [Post](https://substack.com/@nvipin63/note/c-276881572?r=637t58&utm_source=notes-share-action&utm_medium=web)
- Demo: [Video](https://youtu.be/DRVf_Q8IoOI)