AI Reel Creator Platform
End-to-End Reel Generation Platform: From Raw Assets to Editable Reels
A complete pipeline: ingest raw videos, images, brand guidelines and brochure data → build a semantic database → let a user query drive curation of clips, captions, voiceover, audio and effects → output a fully composed reel ready for Remotion editing.
Repository: https://huggingface.co/acd23/ai-reel-creator-platform
What's Implemented
This repo implements the Phase 2–3 pipelines of the full 6-phase PRD:
| Phase | Story | Component | Status |
|---|---|---|---|
| Phase 1 | S1.1–S1.3 | Asset Ingestion & Database (schema + configs) | Complete |
| Phase 2 | S2.1 | Brochure Knowledge Extraction (Gemini 1.5 Pro) | Complete |
| Phase 2 | S2.2 | Brochure-to-Asset Mapping | Complete |
| Phase 2 | S2.3 | Caption & Voiceover Script Library | Complete |
| Phase 3 | S3.1 | User Query Interface (FastAPI) | Complete |
| Phase 3 | S3.2 | LLM Orchestrator → Reel Script Generation | Complete |
| Phase 3 | S3.3 | Beat-Level Asset Retrieval + Manifest Assembly | Complete |
| Phase 4 | S4.1–S4.5 | Reel Composition Engine (FFmpeg) | Skeleton |
| Phase 5 | S5.1–S5.2 | Remotion Export | Planned |
| Phase 6 | S6.1–S6.2 | Integration Testing & Baseline | Planned |
Quick Architecture
User Query → Query Interface → LLM Orchestrator → Reel Script
↓
Asset Metadata Summary (CLIP 768-dim)
↓
Beat-Level Asset Retrieval
↓
Caption / Voiceover Library Lookup
↓
Reel Manifest (JSON)
↓
Video Composition (FFmpeg) / Remotion Export
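The last steps of this flow (choosing one asset per beat and emitting a manifest) can be sketched in a few lines of Python. This is an illustration only, not the repository's implementation: the `Beat` fields, `pick_asset`, `assemble_manifest`, and the manifest keys are hypothetical names, and the candidate scores are assumed to come from an earlier retrieval step.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Beat:
    """One beat of a reel script (hypothetical shape, for illustration only)."""
    index: int
    description: str
    duration_s: float
    # (asset_id, similarity score) pairs produced by a retrieval step
    candidates: List[Tuple[str, float]] = field(default_factory=list)


def pick_asset(beat: Beat) -> Optional[str]:
    """Return the highest-scoring candidate asset id, or None if there are none."""
    if not beat.candidates:
        return None
    return max(beat.candidates, key=lambda c: c[1])[0]


def assemble_manifest(beats: List[Beat]) -> dict:
    """Build a JSON-serialisable manifest with one entry per beat, in beat order."""
    entries = []
    start = 0.0
    for beat in sorted(beats, key=lambda b: b.index):
        entries.append({
            "beat": beat.index,
            "asset_id": pick_asset(beat),
            "start_s": start,
            "duration_s": beat.duration_s,
        })
        start += beat.duration_s
    return {"total_duration_s": start, "beats": entries}


beats = [
    Beat(1, "front grille close-up", 4.0, [("vid_12", 0.81), ("img_03", 0.77)]),
    Beat(0, "hero shot", 6.0, [("vid_07", 0.92)]),
]
manifest = assemble_manifest(beats)
print(json.dumps(manifest, indent=2))
```

A real manifest would presumably also carry captions, voiceover lines and effects; those are left out here to keep the sketch short.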
Gemini-Powered Brochure Parsing
The brochure parsing pipeline uses Google Gemini 1.5 Pro for multimodal PDF understanding:
- PDF → Images: Converts brochure PDF pages to images via pdf2image or PyMuPDF
- Gemini Upload: Uploads page images to Gemini's file API
- Structured Extraction: Gemini returns JSON with sections, key features, taglines, car parts, tone tags
- Post-processing: Deduplication, validation, CLIP text embedding
- Storage: Persisted to the PostgreSQL brochure_nodes table with 768-dim vectors
Environment variable: GEMINI_API_KEY
See: src/pipelines/brochure_parsing.py
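The post-processing step (validation and deduplication of the extracted JSON) might look roughly like the sketch below. It is not taken from src/pipelines/brochure_parsing.py: the JSON field names (`sections`, `title`, `key_features`), the `REQUIRED_KEYS` tuple, and the function `postprocess_sections` are assumptions made for the example, and the embedding step is omitted.

```python
import json

# Assumed minimal fields a section needs to be kept (hypothetical).
REQUIRED_KEYS = ("title", "key_features")


def postprocess_sections(raw_json: str) -> list:
    """Drop incomplete sections and duplicates (by normalised title)."""
    data = json.loads(raw_json)
    seen_titles = set()
    cleaned = []
    for section in data.get("sections", []):
        # Validation: skip sections with missing or empty required fields.
        if not all(section.get(key) for key in REQUIRED_KEYS):
            continue
        # Deduplication: compare titles case- and whitespace-insensitively.
        norm_title = " ".join(section["title"].lower().split())
        if norm_title in seen_titles:
            continue
        seen_titles.add(norm_title)
        # Remove repeated features while preserving their original order.
        features = list(dict.fromkeys(f.strip() for f in section["key_features"]))
        cleaned.append({**section,
                        "title": section["title"].strip(),
                        "key_features": features})
    return cleaned


raw = json.dumps({"sections": [
    {"title": "Front Design",
     "key_features": ["LED headlights", "Sport grille", "LED headlights"]},
    {"title": "  front   design ", "key_features": ["Sport grille"]},
    {"title": "Interior", "key_features": []},
]})
sections = postprocess_sections(raw)
print(sections)
```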
Database Schema
Full PostgreSQL + pgvector schema in configs/database_schema.sql:
| Table | Purpose |
|---|---|
| assets | Raw asset registry (video/image/audio) |
| asset_metadata | Rich metadata + CLIP 768-dim embedding |
| video_events | Temporal sub-assets within videos |
| brand_configs | Machine-readable brand rules |
| brochure_nodes | Knowledge graph nodes (Gemini-extracted) |
| brochure_asset_map | Semantic bridge (cosine similarity + rules) |
| captions_library | Pre-generated brand-compliant captions |
| voiceover_library | Pre-generated voiceover lines |
| reel_requests | User query tracking |
| reel_scripts | LLM orchestrator output |
| beat_asset_candidates | Per-beat top-k retrieval log |
| reel_manifests | Final assembly plan |
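To make the "cosine similarity" and "top-k" entries concrete, here is a small pure-Python sketch of ranking assets against a brochure node embedding. It assumes embeddings are plain lists of floats and uses 3-dimensional toy vectors instead of 768-dimensional CLIP vectors; `cosine_similarity` and `top_k_assets` are hypothetical helper names, not functions from this repo. In a pgvector deployment the same ranking can be done in SQL, since pgvector's `<=>` operator computes cosine distance.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_assets(node_vec: List[float],
                 asset_vecs: Dict[str, List[float]],
                 k: int = 3,
                 threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Return up to k (asset_id, score) pairs with score >= threshold, best first."""
    scored = [(asset_id, cosine_similarity(node_vec, vec))
              for asset_id, vec in asset_vecs.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


node = [1.0, 0.0, 0.0]
assets = {
    "a": [1.0, 0.0, 0.0],   # same direction as the node
    "b": [1.0, 1.0, 0.0],   # 45 degrees away
    "c": [0.0, 1.0, 0.0],   # orthogonal
}
ranked = top_k_assets(node, assets, k=2, threshold=0.1)
print(ranked)
```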
API Endpoints (FastAPI)
Mount via src/api/main.py:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/assets | POST/GET | Upload / list assets |
| /api/v1/brand | POST/GET | Brand config CRUD + compliance |
| /api/v1/brochures | POST/GET | Upload PDF → Gemini parse → nodes |
| /api/v1/brochures/{id}/mappings | GET | Brochure-to-asset mappings |
| /api/v1/captions | GET/POST | Caption library |
| /api/v1/reels/request | POST | Submit reel request |
| /api/v1/reels/{id}/script | POST/GET | Generate / get reel script |
| /api/v1/reels/{id}/manifest | GET | Get assembled manifest |
| /api/v1/reels/{id}/render | POST | Trigger FFmpeg render |
| /api/v1/reels/{id}/download/mp4 | GET | Download rendered reel |
| /api/v1/reels/{id}/download/remotion | GET | Download Remotion ZIP |
Environment Variables
# Gemini (Brochure Parsing)
GEMINI_API_KEY=your_gemini_api_key
# OpenAI (LLM Orchestrator)
OPENAI_API_KEY=your_openai_api_key
# Anthropic (LLM Fallback)
ANTHROPIC_API_KEY=your_anthropic_api_key
# ElevenLabs (TTS)
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=reel_creator
DB_USER=reel_user
DB_PASSWORD=your_password
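As an illustration of how these variables could be consumed, the sketch below builds a PostgreSQL connection URL and reports missing API keys using only the standard library. The helper names `build_database_url` and `missing_api_keys` are hypothetical, and treating GEMINI_API_KEY and OPENAI_API_KEY as the required pair is an assumption based on the local setup steps, not something the app is known to enforce.

```python
import os
from typing import List, Mapping, Sequence
from urllib.parse import quote_plus


def build_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a postgresql:// URL from DB_* variables, with the defaults listed above."""
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "reel_creator")
    user = env.get("DB_USER", "reel_user")
    password = env.get("DB_PASSWORD", "")
    # Percent-encode credentials so characters like '@' or ':' do not break the URL.
    return (f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{name}")


def missing_api_keys(env: Mapping[str, str] = os.environ,
                     required: Sequence[str] = ("GEMINI_API_KEY", "OPENAI_API_KEY"),
                     ) -> List[str]:
    """Return the names of required keys that are unset or empty (assumed list)."""
    return [key for key in required if not env.get(key)]


example_env = {"DB_PASSWORD": "p@ss", "GEMINI_API_KEY": "dummy"}
print(build_database_url(example_env))
print(missing_api_keys(example_env))
```

Passing the environment as a mapping keeps both helpers easy to test without touching the real process environment.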
Project Structure
ai-reel-creator-platform/
├── configs/
│   ├── database_schema.sql          # Full PostgreSQL+pgvector schema
│   └── settings.yaml                # App configuration
├── src/
│   ├── api/
│   │   ├── main.py                  # FastAPI app factory
│   │   └── routes/
│   │       ├── assets.py            # Asset endpoints
│   │       ├── brand.py             # Brand config endpoints
│   │       ├── brochures.py         # Brochure upload + Gemini parse
│   │       ├── captions.py          # Caption/voiceover library
│   │       ├── preview.py           # Preview + swap + regenerate
│   │       └── reels.py             # Reel request/orchestrate/render
│   ├── models/
│   │   ├── database.py              # SQLAlchemy ORM models
│   │   ├── embedding.py             # CLIP embedder (openai/clip-vit-large-patch14)
│   │   └── schemas.py               # Pydantic request/response schemas
│   ├── pipelines/
│   │   ├── brochure_parsing.py      # **Gemini 1.5 Pro PDF parser**
│   │   ├── caption_generation.py    # Caption + voiceover generator
│   │   └── reel_composition.py      # FFmpeg video assembly
│   ├── services/
│   │   ├── asset_mapping.py         # Brochure-to-asset cosine similarity
│   │   ├── asset_retrieval.py       # Beat-level top-k retrieval
│   │   ├── brand_compliance.py      # Text validation against brand rules
│   │   ├── llm_orchestrator.py      # GPT-4o / Claude script generation
│   │   └── tts_service.py           # ElevenLabs / OpenAI TTS
│   └── utils/
│       ├── ffmpeg.py                # FFmpeg command builder
│       ├── file_utils.py            # Canonical storage paths
│       └── pdf_utils.py             # PDF → image conversion for Gemini
├── tests/
│   └── test_pipelines.py            # Unit tests
├── data/
│   ├── sample_brochure.md           # Sample automotive brochure
│   └── sample_brand_guidelines.json
├── TASKS_README.md                  # **Detailed task-to-file mapping**
└── README.md                        # This file
Running Locally
# 1. Clone
git clone https://huggingface.co/acd23/ai-reel-creator-platform
cd ai-reel-creator-platform
# 2. Install
pip install -r requirements.txt
# 3. Set env vars
export GEMINI_API_KEY="..."
export OPENAI_API_KEY="..."
# 4. Start API
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
# 5. Test brochure parsing
curl -X POST http://localhost:8000/api/v1/brochures \
-F "file=@your_brochure.pdf"
# 6. Submit reel request
curl -X POST http://localhost:8000/api/v1/reels/request \
-H "Content-Type: application/json" \
-d '{
"user_query": "10s sporty reel highlighting the car front",
"duration_target": "10s",
"platform": "instagram_reels",
"tone": "sporty"
}'
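The same reel request can be issued from Python. The sketch below only builds the request with the standard library's `urllib.request`, so it runs without a server; the helper name `build_reel_request` is hypothetical, the payload fields mirror the curl example in step 6, and the response format is not documented here, so none is assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in step 4


def build_reel_request(user_query: str, duration_target: str, platform: str,
                       tone: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/v1/reels/request."""
    payload = {
        "user_query": user_query,
        "duration_target": duration_target,
        "platform": platform,
        "tone": tone,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/reels/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reel_request(
    "10s sporty reel highlighting the car front",
    "10s", "instagram_reels", "sporty",
)
print(req.get_method(), req.full_url)

# To actually send it once the API is running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```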
Task-to-File Mapping
For a detailed mapping of every PRD task to its implementation file, see:
➡️ TASKS_README.md
This document contains:
- Phase-by-phase task breakdown
- Exact file paths for each component
- Key classes and function names
- API endpoint reference
- Environment variable reference
- curl examples for every endpoint
License
MIT License