---
title: CineStory AI
emoji: 🎬
colorFrom: indigo
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
sdk_version: 6.8.0
python_version: '3.12'
---

# CineStory AI

Upload an image, generate a short interactive story, and export a narrated cinematic storyboard video.

Current stack in code:

- Vision: Groq `llama-4-scout-17b-16e-instruct`
- Story: Together `Meta-Llama-3.1-8B-Instruct-Turbo`
- Chapter images: Together `black-forest-labs/FLUX.1-schnell`
- Narration: Kokoro 82M (local CPU)
- Video composition: ffmpeg Ken Burns slideshow (local CPU)

No video-generation API (Veo/Seedance) is used in the active pipeline.
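The scene-analysis step can be sketched with the Groq SDK. This is a minimal sketch, not the app's actual code: `analyze_scene` and `parse_scene_json` are illustrative names, the prompt wording is an assumption, and the fully-qualified Groq model id may differ from the short name listed above.

```python
import base64
import json


def parse_scene_json(text: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating ``` fences."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model reply")
    return json.loads(text[start:end + 1])


def analyze_scene(image_path: str) -> dict:
    """Hypothetical sketch: ask the vision model for structured scene JSON."""
    from groq import Groq  # deferred so parse_scene_json works without the SDK
    client = Groq()  # reads GROQ_API_KEY from the environment
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",  # id per the stack list
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as JSON with keys: "
                         "mood, atmosphere, setting, palette."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return parse_scene_json(resp.choices[0].message.content)
```

Parsing out the JSON object defensively matters because chat models often wrap structured output in markdown fences or surrounding prose.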

## End-to-end flow

1. Upload an image
2. Scene analysis returns structured JSON (mood, atmosphere, setting, palette, etc.)
3. A story opening is generated with 3 choices
4. The user picks choices; branching is capped at 2 selections (the second choice produces the final ending)
5. For each chapter:
   - Extract a visual brief from the chapter text
   - Build a character/world consistency bible from chapter 1
   - Generate a stylized chapter image
   - Generate chapter narration audio
6. Compose the final MP4 with Ken Burns motion, chapter by chapter, synced to each chapter's audio duration
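
The Ken Burns step in item 6 maps to ffmpeg's `zoompan` filter, with the frame count derived from each chapter's narration length. A minimal sketch of a per-chapter command builder (the function name, zoom rate, and output size are illustrative assumptions, not the app's actual settings):

```python
def kenburns_cmd(image: str, audio: str, out: str,
                 duration: float, fps: int = 25,
                 size: str = "1280x720") -> list[str]:
    """Build an ffmpeg command that slowly zooms over one still image for
    `duration` seconds and muxes it with that chapter's narration audio."""
    frames = int(round(duration * fps))
    zoompan = (
        f"zoompan=z='min(zoom+0.0015,1.2)':d={frames}"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':s={size}:fps={fps}"
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # still image as looped video source
        "-i", audio,                 # chapter narration
        "-vf", zoompan,
        "-t", f"{duration:.3f}",     # segment length = narration length
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest",
        out,
    ]
```

Each command can be run with `subprocess.run(cmd, check=True)`, and the per-chapter MP4s then joined, for example with ffmpeg's concat demuxer.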

## Requirements

Python deps:

- gradio
- groq
- together
- kokoro
- soundfile
- numpy
- requests
- Pillow
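
The Python list above can be captured in a minimal, unpinned `requirements.txt` (add version pins as needed):

```text
gradio
groq
together
kokoro
soundfile
numpy
requests
Pillow
```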

System deps:

- ffmpeg
- espeak-ng

## Local run

From the project root:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Install system deps:

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg espeak-ng
```

macOS (Homebrew):

```bash
brew install ffmpeg espeak-ng
```

Set the environment variables in `.env` or your shell:

```bash
GROQ_API_KEY=...
TOGETHER_API_KEY=...
```
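
Both API clients read their keys from the environment at startup, so a small preflight check can fail fast when one is missing. This is a sketch; `missing_keys` is an illustrative helper, not part of `app.py`:

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TOGETHER_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required API keys that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]


if __name__ == "__main__":
    if missing := missing_keys():
        raise SystemExit(f"Set these environment variables first: {', '.join(missing)}")
```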

Run:

```bash
python app.py
```

Open the local Gradio URL shown in the terminal.

## App usage order

1. Click **Step 1: Analyze Image**
2. Click **Step 2: Generate Story**
3. Choose branch options (up to 2 rounds)
4. Optional: click **Step 3: Preview Narration Audio**
5. Click **Step 4: Create Cinematic Story Video**

## Notes

- `.env` is auto-loaded by `app.py`.
- Story state stores `max_branches=2` by default.
- If image generation fails for a chapter, the app falls back to a generated placeholder image.
- Final video timing is audio-driven, so narration/video alignment stays consistent.