title: Podify - AI Podcast Generator
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
python_version: '3.10'
hardware: zero-a10g
suggested_hardware: zero-a10g
pinned: false
short_description: Research a topic and turn it into a voiced podcast
tags:
  - track:backyard
  - achievement:offgrid
  - achievement:offbrand
  - achievement:fieldnotes

Podify banner

🎙️ Podify - AI Podcast Generator

Turn any topic into a finished, voiced podcast in two phases:

  1. Content: research agents (LangGraph) use a Hugging Face-hosted LLM plus live DuckDuckGo web search to research the topic and write a speaker-tagged script.
  2. Audio: the self-hosted Fish Audio / OpenAudio S1-mini model speaks the script, with selectable preset voices and zero-shot voice cloning from an uploaded clip or a live mic recording.

Everything runs inside this single Gradio Space; the TTS model runs on ZeroGPU.
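
The hand-off between the two phases is the speaker-tagged script. This README does not specify the tag format, so the sketch below assumes one `Speaker: text` utterance per line; `parse_script` is a hypothetical helper for illustration, not a function from this repo.

```python
import re
from typing import List, Tuple

# Assumed format: "SPEAKER: utterance", one per line. The script actually
# produced by the research graph may use a different tagging scheme.
_LINE = re.compile(r"^\s*([^:\n]{1,40}?)\s*:\s*(.+?)\s*$")


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Split a speaker-tagged script into (speaker, text) pairs."""
    turns: List[Tuple[str, str]] = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines between turns
        match = _LINE.match(raw)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # Untagged line: treat it as a continuation of the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {raw.strip()}")
    return turns
```

One limitation of this simple pattern: an untagged line that happens to contain a colon (for example a time such as 10:30) would be read as a new speaker tag, so a stricter format may be preferable in practice.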

Architecture

Topic ─▶ LangGraph: plan ─▶ DDG search ─▶ outline ─▶ write ─▶ Script
Script ─▶ Fish Audio S1-mini (@spaces.GPU): per-line synth ─▶ stitched podcast WAV

  • app.py: Gradio Blocks UI (two tabs) wiring both phases.
  • research/: llm.py (HF Inference client), search.py (DuckDuckGo), graph.py (LangGraph research graph).
  • tts/: engine.py (model load + GPU synthesis + multi-speaker stitching), voices.py (preset voice registry).
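
The "per-line synth, then stitch" step of the audio phase can be sketched with the standard library alone. This is not the code in tts/engine.py: `fake_synth` is a hypothetical stand-in for the GPU-backed model call, and the 24 kHz sample rate and 0.25 s gap between turns are assumptions chosen for the example.

```python
import io
import math
import struct
import wave
from typing import Callable, Iterable, Tuple

SAMPLE_RATE = 24_000  # assumed output rate; the real model's rate may differ


def fake_synth(speaker: str, text: str) -> bytes:
    """Stand-in for the TTS call: a tone of 50 ms per word, as 16-bit PCM."""
    n = (SAMPLE_RATE // 20) * max(1, len(text.split()))
    freq = 220.0 if speaker == "Host" else 330.0  # one pitch per "voice"
    return b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )


def stitch(
    turns: Iterable[Tuple[str, str]],
    synth: Callable[[str, str], bytes] = fake_synth,
    gap_s: float = 0.25,
) -> bytes:
    """Synthesize each turn and join the clips into one mono 16-bit WAV."""
    silence = b"\x00\x00" * int(SAMPLE_RATE * gap_s)
    pcm = silence.join(synth(speaker, text) for speaker, text in turns)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Passing the synthesizer in as a parameter keeps the stitching logic testable on a CPU-only machine; in the Space, the real model call would take the place of `fake_synth`.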

Configuration

Set these as Space secrets / variables (Settings → Variables and secrets):

| Name | Required | Purpose |
|---|---|---|
| HF_TOKEN | ✅ | LLM inference (Inference Providers) + model download. |
| LLM_MODEL | optional | Override the content LLM (default Qwen/Qwen2.5-14B-Instruct, <32B). |
| TTS_MODEL_REPO | optional | Override the TTS model repo (default fishaudio/openaudio-s1-mini). |
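
A minimal sketch of how these three variables could be read at startup, using the defaults listed in the table. The `Settings` class and `load_settings` function are hypothetical names for this example; the app's actual configuration code may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: str
    llm_model: str
    tts_model_repo: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # HF_TOKEN is the only required variable.
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret.")
    return Settings(
        hf_token=token,
        # `or` also falls back to the default when a variable is set but empty.
        llm_model=env.get("LLM_MODEL") or "Qwen/Qwen2.5-14B-Instruct",
        tts_model_repo=env.get("TTS_MODEL_REPO") or "fishaudio/openaudio-s1-mini",
    )
```

Failing fast on a missing token surfaces the misconfiguration at launch rather than on the first LLM request.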

ZeroGPU requires the Space owner to have a Hugging Face PRO account.

Run locally

pip install -r requirements.txt
export HF_TOKEN=hf_xxx          # PowerShell: $env:HF_TOKEN="hf_xxx"
python app.py

Phase 1 (research + script) runs on CPU. Phase 2 (TTS) needs a GPU and the fish-speech package; on CPU-only machines the UI loads but synthesis is disabled.
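
One way to decide whether synthesis should be enabled is to probe for the two requirements named above. This is a sketch under assumptions, not the app's actual check: it assumes the fish-speech package is importable as `fish_speech` and that GPU availability is detected through PyTorch; `tts_available` and `synthesis_status` are hypothetical names.

```python
import importlib.util


def tts_available() -> bool:
    """True only if fish-speech is importable and PyTorch reports a CUDA device."""
    # Assumption: the fish-speech package exposes the module name "fish_speech".
    if importlib.util.find_spec("fish_speech") is None:
        return False
    try:
        import torch  # imported lazily so CPU-only installs without torch still work
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def synthesis_status() -> str:
    """Short message a UI could show next to the synthesis controls."""
    if tts_available():
        return "TTS ready"
    return "TTS disabled: needs a GPU and the fish-speech package"
```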

Models Used

  • Qwen/Qwen2.5-7B-Instruct for research and script generation
  • fishaudio/openaudio-s1-mini (0.5B) for audio generation

Deploy to a Space

huggingface-cli login
huggingface-cli upload <user>/podify . --repo-type=space
# or: git push to the Space remote (preset .wav files tracked via Git LFS)

Credits / assets

  • Voice samples (tts/voices/): derived from CMU ARCTIC (free for research and commercial use). Rebuild with scripts/build_voice_samples.py.
  • Background-music loops (tts/music_loops/): FreePD by Kevin MacLeod, 100% public domain (CC0). Rebuild with scripts/build_music_loops.py. A procedural numpy fallback in tts/music.py is used if the loops are absent.
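
The "use the loop files if present, otherwise generate audio procedurally" behaviour can be pictured roughly as follows. This sketch deliberately uses only the standard library instead of numpy so it runs without extra dependencies; the function names, the A-minor chord, and the 24 kHz sample rate are illustrative assumptions and do not describe the contents of tts/music.py.

```python
import math
from array import array
from pathlib import Path
from typing import Optional

SAMPLE_RATE = 24_000  # assumed rate for the example


def pick_loop(loops_dir: Path) -> Optional[Path]:
    """Return the first .wav file in loops_dir, or None if there is none."""
    if not loops_dir.is_dir():
        return None
    return next(iter(sorted(loops_dir.glob("*.wav"))), None)


def procedural_loop(seconds: float = 2.0) -> array:
    """Generate a quiet three-note pad as 16-bit mono samples."""
    freqs = (220.0, 261.63, 329.63)  # A3, C4, E4
    samples = array("h")
    for i in range(int(SAMPLE_RATE * seconds)):
        t = i / SAMPLE_RATE
        # Average the sine waves so the mix stays within [-1, 1].
        value = sum(math.sin(2 * math.pi * f * t) for f in freqs) / len(freqs)
        samples.append(int(3000 * value))
    return samples


def background_music(loops_dir: Path):
    """Prefer a bundled loop file; fall back to generated samples."""
    loop = pick_loop(loops_dir)
    return loop if loop is not None else procedural_loop()
```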

Contributors

  • nvipin63
  • jayaspjacob

#backyard-ai