AI Reel Creator Platform

End-to-End Reel Generation Platform — From Raw Assets to Editable Reels

A complete pipeline: ingest raw videos, images, brand guidelines, and brochure data → build a semantic database → let a user query drive curation of clips, captions, voiceover, audio, and effects → output a fully composed reel ready for Remotion editing.

Repository: https://huggingface.co/acd23/ai-reel-creator-platform

What's Implemented

This repo implements the Phase 1 schema and configs and the Phase 2–3 pipelines of the full 6-phase PRD. Phase 4 is a skeleton, and Phases 5–6 are planned:

Phase	Story	Component	Status
Phase 1	S1.1–S1.3	Asset Ingestion & Database (schema + configs)	✅ Complete
Phase 2	S2.1	Brochure Knowledge Extraction (Gemini 1.5 Pro)	✅ Complete
Phase 2	S2.2	Brochure-to-Asset Mapping	✅ Complete
Phase 2	S2.3	Caption & Voiceover Script Library	✅ Complete
Phase 3	S3.1	User Query Interface (FastAPI)	✅ Complete
Phase 3	S3.2	LLM Orchestrator — Reel Script Generation	✅ Complete
Phase 3	S3.3	Beat-Level Asset Retrieval + Manifest Assembly	✅ Complete
Phase 4	S4.1–S4.5	Reel Composition Engine (FFmpeg)	🔄 Skeleton
Phase 5	S5.1–S5.2	Remotion Export	🔄 Planned
Phase 6	S6.1–S6.2	Integration Testing & Baseline	🔄 Planned

Quick Architecture

User Query → Query Interface → LLM Orchestrator → Reel Script
                                              ↓
                              Asset Metadata Summary (CLIP 768-dim)
                                              ↓
                                    Beat-Level Asset Retrieval
                                              ↓
                              Caption / Voiceover Library Lookup
                                              ↓
                                    Reel Manifest (JSON)
                                              ↓
                              Video Composition (FFmpeg) / Remotion Export
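
The retrieval and assembly steps in the diagram can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation: the `Beat` and `Asset` shapes, the function names, and the manifest keys are all assumptions, and tiny 3-dim vectors stand in for the 768-dim CLIP embeddings.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class Asset:
    asset_id: str
    embedding: list[float]  # stand-in for a 768-dim CLIP vector


@dataclass
class Beat:
    index: int
    duration_s: float
    embedding: list[float]  # embedding of the beat's description


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_assets(beat, assets, k=3):
    """Rank assets by similarity to one beat and keep the best k."""
    scored = [(cosine(beat.embedding, a.embedding), a.asset_id) for a in assets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_manifest(beats, assets, k=3):
    """Pick the top candidate per beat and record the rest as alternates."""
    entries = []
    for beat in beats:
        ranked = top_k_assets(beat, assets, k)
        entries.append({
            "beat": beat.index,
            "duration_s": beat.duration_s,
            "asset_id": ranked[0][1] if ranked else None,
            "alternates": [asset_id for _, asset_id in ranked[1:]],
        })
    return {"beats": entries}


# Hypothetical sample data, for illustration only.
assets = [
    Asset("front_grille", [1.0, 0.0, 0.0]),
    Asset("interior_dash", [0.0, 1.0, 0.0]),
    Asset("rear_lights", [0.0, 0.0, 1.0]),
]
beats = [Beat(0, 4.0, [0.9, 0.1, 0.0]), Beat(1, 6.0, [0.0, 0.2, 0.8])]
manifest = build_manifest(beats, assets, k=2)
print(json.dumps(manifest, indent=2))
```

Keeping the runners-up as alternates is one way a manifest could support swapping a clip later without re-running retrieval. Whether the actual manifest does this is not stated here.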

Gemini-Powered Brochure Parsing

The brochure parsing pipeline uses Google Gemini 1.5 Pro for multimodal PDF understanding:

PDF → Images: Converts brochure PDF pages to images via pdf2image or PyMuPDF
Gemini Upload: Uploads page images to Gemini's file API
Structured Extraction: Gemini returns JSON with sections, key features, taglines, car parts, tone tags
Post-processing: Deduplication, validation, CLIP text embedding
Storage: Persisted to PostgreSQL brochure_nodes table with 768-dim vectors
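
The post-processing step above (deduplication and validation) might look something like the sketch below. The node fields `section`, `text`, and `tone_tags` are assumed from the description of the extracted JSON, and `postprocess_nodes` is a hypothetical name. The Gemini call and the CLIP embedding are left out so the example stays self-contained.

```python
REQUIRED_KEYS = ("section", "text")  # assumed minimal node shape


def normalize(text):
    """Collapse whitespace and case so near-identical strings compare equal."""
    return " ".join(text.split()).lower()


def postprocess_nodes(raw_nodes):
    """Drop invalid nodes, then drop duplicates, preserving first-seen order."""
    seen = set()
    cleaned = []
    for node in raw_nodes:
        # Validation: every required key must be a non-empty string.
        if not all(
            isinstance(node.get(k), str) and node[k].strip() for k in REQUIRED_KEYS
        ):
            continue
        # Deduplication: compare on normalized section + text.
        key = (normalize(node["section"]), normalize(node["text"]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "section": node["section"].strip(),
            "text": node["text"].strip(),
            "tone_tags": sorted(set(node.get("tone_tags", []))),
        })
    return cleaned


# Hypothetical extraction output, for illustration only.
raw = [
    {"section": "Exterior", "text": "Bold LED headlamps", "tone_tags": ["sporty", "sporty"]},
    {"section": "exterior", "text": "Bold  LED headlamps "},  # duplicate after normalizing
    {"section": "Interior", "text": ""},                       # fails validation
    {"section": "Interior", "text": "Panoramic sunroof", "tone_tags": ["premium"]},
]
nodes = postprocess_nodes(raw)
print(len(nodes))
```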

Environment variable: GEMINI_API_KEY

See: src/pipelines/brochure_parsing.py

Database Schema

Full PostgreSQL + pgvector schema in configs/database_schema.sql:

Table	Purpose
`assets`	Raw asset registry (video/image/audio)
`asset_metadata`	Rich metadata + CLIP 768-dim embedding
`video_events`	Temporal sub-assets within videos
`brand_configs`	Machine-readable brand rules
`brochure_nodes`	Knowledge graph nodes (Gemini-extracted)
`brochure_asset_map`	Semantic bridge (cosine similarity + rules)
`captions_library`	Pre-generated brand-compliant captions
`voiceover_library`	Pre-generated voiceover lines
`reel_requests`	User query tracking
`reel_scripts`	LLM orchestrator output
`beat_asset_candidates`	Per-beat top-k retrieval log
`reel_manifests`	Final assembly plan
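
As a rough illustration of how the 768-dim embeddings might be queried, the sketch below builds a parameterized top-k search using pgvector's cosine-distance operator (`<=>`). The column names `asset_id` and `embedding` are assumptions; check configs/database_schema.sql for the real ones. Nothing is executed here, the function only returns the SQL string and its parameters.

```python
def to_pgvector_literal(vec):
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def top_k_query(query_vec, k=5):
    """Return (sql, params) for a cosine-distance search over asset_metadata.

    Column names are hypothetical. `<=>` yields cosine distance, so
    similarity is computed as 1 - distance.
    """
    sql = (
        "SELECT asset_id, 1 - (embedding <=> %s::vector) AS cosine_similarity "
        "FROM asset_metadata "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_pgvector_literal(query_vec)
    return sql, (literal, literal, k)


sql, params = top_k_query([0.1, 0.2, 1], k=3)
print(sql)
print(params)
```

The `%s` placeholders follow the psycopg parameter style. Passing the vector as a parameter rather than formatting it into the SQL string keeps the query safe from injection.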

API Endpoints (FastAPI)

Mount via src/api/main.py:

Endpoint	Method	Description
`/api/v1/assets`	POST/GET	Upload / list assets
`/api/v1/brand`	POST/GET	Brand config CRUD + compliance
`/api/v1/brochures`	POST/GET	Upload PDF → Gemini parse → nodes
`/api/v1/brochures/{id}/mappings`	GET	Brochure→asset mappings
`/api/v1/captions`	GET/POST	Caption library
`/api/v1/reels/request`	POST	Submit reel request
`/api/v1/reels/{id}/script`	POST/GET	Generate / get reel script
`/api/v1/reels/{id}/manifest`	GET	Get assembled manifest
`/api/v1/reels/{id}/render`	POST	Trigger FFmpeg render
`/api/v1/reels/{id}/download/mp4`	GET	Download rendered reel
`/api/v1/reels/{id}/download/remotion`	GET	Download Remotion ZIP

Environment Variables

# Gemini (Brochure Parsing)
GEMINI_API_KEY=your_gemini_api_key

# OpenAI (LLM Orchestrator)
OPENAI_API_KEY=your_openai_api_key

# Anthropic (LLM Fallback)
ANTHROPIC_API_KEY=your_anthropic_api_key

# ElevenLabs (TTS)
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=reel_creator
DB_USER=reel_user
DB_PASSWORD=your_password
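
A minimal sketch of how these variables could be read at startup. The helper names are hypothetical, the fallback values simply mirror the examples above, and treating only `GEMINI_API_KEY` and `OPENAI_API_KEY` as required is an assumption based on the local-run steps.

```python
import os
from urllib.parse import quote


def database_url(env=os.environ):
    """Build a PostgreSQL connection URL from the DB_* variables."""
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "reel_creator")
    user = env.get("DB_USER", "reel_user")
    password = env.get("DB_PASSWORD", "")
    # Percent-encode credentials so characters like '@' or '/' stay valid in a URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


def missing_keys(env=os.environ, required=("GEMINI_API_KEY", "OPENAI_API_KEY")):
    """List required variables that are unset or empty."""
    return [key for key in required if not env.get(key)]


# Example with an explicit mapping instead of the real environment.
example_env = {"DB_PASSWORD": "p@ss/word", "GEMINI_API_KEY": "abc"}
print(database_url(example_env))
print(missing_keys(example_env))
```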

Project Structure

ai-reel-creator-platform/
├── configs/
│   ├── database_schema.sql       # Full PostgreSQL+pgvector schema
│   └── settings.yaml             # App configuration
├── src/
│   ├── api/
│   │   ├── main.py               # FastAPI app factory
│   │   └── routes/
│   │       ├── assets.py         # Asset endpoints
│   │       ├── brand.py          # Brand config endpoints
│   │       ├── brochures.py      # Brochure upload + Gemini parse
│   │       ├── captions.py       # Caption/voiceover library
│   │       ├── preview.py        # Preview + swap + regenerate
│   │       └── reels.py          # Reel request/orchestrate/render
│   ├── models/
│   │   ├── database.py           # SQLAlchemy ORM models
│   │   ├── embedding.py          # CLIP embedder (openai/clip-vit-large-patch14)
│   │   └── schemas.py            # Pydantic request/response schemas
│   ├── pipelines/
│   │   ├── brochure_parsing.py   # Gemini 1.5 Pro PDF parser
│   │   ├── caption_generation.py # Caption + voiceover generator
│   │   └── reel_composition.py   # FFmpeg video assembly
│   ├── services/
│   │   ├── asset_mapping.py      # Brochure→asset cosine similarity
│   │   ├── asset_retrieval.py    # Beat-level top-k retrieval
│   │   ├── brand_compliance.py   # Text validation against brand rules
│   │   ├── llm_orchestrator.py   # GPT-4o / Claude script generation
│   │   └── tts_service.py        # ElevenLabs / OpenAI TTS
│   └── utils/
│       ├── ffmpeg.py             # FFmpeg command builder
│       ├── file_utils.py         # Canonical storage paths
│       └── pdf_utils.py          # PDF → image conversion for Gemini
├── tests/
│   └── test_pipelines.py         # Unit tests
├── data/
│   ├── sample_brochure.md        # Sample automotive brochure
│   └── sample_brand_guidelines.json
├── TASKS_README.md              # Detailed task-to-file mapping
└── README.md                    # This file

Running Locally

# 1. Clone
git clone https://huggingface.co/acd23/ai-reel-creator-platform
cd ai-reel-creator-platform

# 2. Install
pip install -r requirements.txt

# 3. Set env vars
export GEMINI_API_KEY="..."
export OPENAI_API_KEY="..."

# 4. Start API
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

# 5. Test brochure parsing
curl -X POST http://localhost:8000/api/v1/brochures \
  -F "file=@your_brochure.pdf"

# 6. Submit reel request
curl -X POST http://localhost:8000/api/v1/reels/request \
  -H "Content-Type: application/json" \
  -d '{
    "user_query": "10s sporty reel highlighting the car front",
    "duration_target": "10s",
    "platform": "instagram_reels",
    "tone": "sporty"
  }'
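
The reel request in step 6 can also be built from Python using only the standard library. This is a sketch: `build_reel_request` is a hypothetical helper, the payload fields are copied from the curl example, and the response shape is not documented here, so the example stops short of parsing it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in step 4


def build_reel_request(user_query, duration_target, platform, tone, base_url=BASE_URL):
    """Construct (but do not send) the POST request for /api/v1/reels/request."""
    payload = {
        "user_query": user_query,
        "duration_target": duration_target,
        "platform": platform,
        "tone": tone,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/reels/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reel_request(
    user_query="10s sporty reel highlighting the car front",
    duration_target="10s",
    platform="instagram_reels",
    tone="sporty",
)
print(req.get_method(), req.full_url)

# With the API running locally, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```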

Task-to-File Mapping

For a detailed mapping of every PRD task to its implementation file, see:

➡️ TASKS_README.md

This document contains:

Phase-by-phase task breakdown
Exact file paths for each component
Key classes and function names
API endpoint reference
Environment variable reference
curl examples for every endpoint

License

MIT License
