metadata
title: CineStory AI
emoji: 🎬
colorFrom: indigo
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
sdk_version: 6.8.0
python_version: '3.12'
CineStory AI
Upload an image, generate a short interactive story, and export a narrated cinematic storyboard video.
Current stack in code:
- Vision: Groq llama-4-scout-17b-16e-instruct
- Story: Together Meta-Llama-3.1-8B-Instruct-Turbo
- Chapter images: Together black-forest-labs/FLUX.1-schnell
- Narration: Kokoro 82M (local CPU)
- Video composition: ffmpeg Ken Burns slideshow (local CPU)
No video-generation API (Veo/Seedance) is used in the active pipeline.
End-to-end flow
- Upload image
- Scene analysis returns structured JSON (mood, atmosphere, setting, palette, etc.)
- Story opening is generated with 3 choices
- User picks choices; branching is capped at 2 selections, and the second choice leads to the final ending
- For each chapter:
  - Extract visual brief from chapter text
  - Build character/world consistency bible from chapter 1
  - Generate stylized chapter image
  - Generate chapter narration audio
- Compose final MP4 with Ken Burns motion, chapter by chapter, synced to chapter audio durations
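The audio-driven composition step can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names `frames_for` and `ken_burns_command`, the 25 fps rate, the 1280x720 size, and the 1.2 maximum zoom are all assumptions. It only builds an ffmpeg argument list for one chapter clip, sizing the zoompan frame count from the chapter's narration length.

```python
import math


def frames_for(audio_seconds, fps=25):
    """Number of video frames needed to cover the narration (at least 1)."""
    return max(1, math.ceil(audio_seconds * fps))


def ken_burns_command(image_path, audio_path, out_path, audio_seconds,
                      fps=25, size=(1280, 720), max_zoom=1.2):
    """Build an ffmpeg argument list for one chapter clip (hypothetical helper)."""
    frames = frames_for(audio_seconds, fps)
    # Zoom increment per frame so the zoom approaches max_zoom by the last frame.
    step = (max_zoom - 1.0) / frames
    width, height = size
    zoompan = (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={width}x{height}:fps={fps}"
    )
    return [
        "ffmpeg", "-y",
        "-i", image_path,            # still image for this chapter
        "-i", audio_path,            # narration audio for this chapter
        "-filter_complex", f"[0:v]{zoompan}[v]",
        "-map", "[v]", "-map", "1:a",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-t", f"{audio_seconds:.3f}",  # clip length follows the audio
        out_path,
    ]


cmd = ken_burns_command("chapter1.png", "chapter1.wav", "chapter1.mp4", 4.0)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, and the per-chapter clips concatenated afterwards. Because the clip length is derived from the narration duration, picture and voice stay aligned regardless of how long each chapter's audio turns out to be.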
Requirements
Python deps:
gradio, groq, together, kokoro, soundfile, numpy, requests, Pillow
System deps:
ffmpeg, espeak-ng
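A quick pre-flight check for the system dependencies might look like the sketch below. The helper name `missing_tools` is hypothetical and not part of app.py; it only uses the standard library's `shutil.which` to look for each binary on PATH.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required binaries that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system deps:", ", ".join(missing))
    else:
        print("All system deps found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests.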
Local run
From project root (/Users/adi/Desktop/img2aud):
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Install system deps:
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y ffmpeg espeak-ng
macOS (Homebrew):
brew install ffmpeg espeak-ng
Set env vars in .env or shell:
GROQ_API_KEY=...
TOGETHER_API_KEY=...
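To confirm both keys are present before launching, a small check along these lines could help. It is a sketch under assumptions: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parsing handles only simple `KEY=value` lines, which may differ from how app.py loads .env.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TOGETHER_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            # Shell variables take precedence over .env entries here.
            env = {**parse_env_text(handle.read()), **env}
    print("Missing keys:", missing_keys(env) or "none")
```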
Run:
python app.py
Open the local Gradio URL shown in the terminal.
App usage order
- Click Step 1: Analyze Image
- Click Step 2: Generate Story
- Choose branch options (up to 2 rounds)
- Optional: Step 3: Preview Narration Audio
- Click Step 4: Create Cinematic Story Video
Notes
- .env is auto-loaded by app.py.
- Story state stores max_branches=2 by default.
- Image generation failures fall back to generated placeholder images per chapter.
- Final video timing is audio-driven, so narration/video alignment stays consistent.