AI Reel Creator Platform

End-to-End Reel Generation Platform: From Raw Assets to Editable Reels

A complete pipeline: ingest raw videos, images, brand guidelines and brochure data → build a semantic database → let a user query drive curation of clips, captions, voiceover, audio and effects → output a fully composed reel ready for Remotion editing.

Repository: https://huggingface.co/acd23/ai-reel-creator-platform


What's Implemented

This repo implements the Phase 2–3 pipelines of the full 6-phase PRD, on top of the Phase 1 schema and configs; Phases 4–6 are a skeleton or still planned:

| Phase | Story | Component | Status |
|---|---|---|---|
| Phase 1 | S1.1–S1.3 | Asset Ingestion & Database (schema + configs) | ✅ Complete |
| Phase 2 | S2.1 | Brochure Knowledge Extraction (Gemini 1.5 Pro) | ✅ Complete |
| Phase 2 | S2.2 | Brochure-to-Asset Mapping | ✅ Complete |
| Phase 2 | S2.3 | Caption & Voiceover Script Library | ✅ Complete |
| Phase 3 | S3.1 | User Query Interface (FastAPI) | ✅ Complete |
| Phase 3 | S3.2 | LLM Orchestrator: Reel Script Generation | ✅ Complete |
| Phase 3 | S3.3 | Beat-Level Asset Retrieval + Manifest Assembly | ✅ Complete |
| Phase 4 | S4.1–S4.5 | Reel Composition Engine (FFmpeg) | 🔄 Skeleton |
| Phase 5 | S5.1–S5.2 | Remotion Export | 🔄 Planned |
| Phase 6 | S6.1–S6.2 | Integration Testing & Baseline | 🔄 Planned |

Quick Architecture

User Query → Query Interface → LLM Orchestrator → Reel Script
                                              ↓
                              Asset Metadata Summary (CLIP 768-dim)
                                              ↓
                                    Beat-Level Asset Retrieval
                                              ↓
                              Caption / Voiceover Library Lookup
                                              ↓
                                    Reel Manifest (JSON)
                                              ↓
                              Video Composition (FFmpeg) / Remotion Export
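The retrieval-to-manifest part of the flow above can be sketched in plain Python. This is an in-memory illustration only, not the code in src/services/asset_retrieval.py: Beat, Asset, top_k_for_beat, assemble_manifest and every manifest key are hypothetical names, and cosine similarity over tiny vectors stands in for the 768-dim CLIP embeddings stored in PostgreSQL.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Beat:
    """One beat of a reel script (hypothetical shape)."""
    index: int
    description: str
    duration_s: float
    embedding: list[float]


@dataclass
class Asset:
    """A candidate clip or image with its embedding (hypothetical shape)."""
    asset_id: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_for_beat(beat, assets, k=3):
    """Return up to k (score, asset_id) pairs, most similar first."""
    scored = ((cosine(beat.embedding, a.embedding), a.asset_id) for a in assets)
    return heapq.nlargest(k, scored)


def assemble_manifest(reel_id, beats, assets, k=3):
    """Pick the best candidate per beat and lay beats end to end on a timeline."""
    timeline, cursor = [], 0.0
    for beat in beats:
        candidates = top_k_for_beat(beat, assets, k)
        timeline.append({
            "beat": beat.index,
            "start_s": cursor,
            "end_s": cursor + beat.duration_s,
            "asset_id": candidates[0][1] if candidates else None,
            # Keep the runners-up so a clip can be swapped later without re-querying.
            "candidates": [{"asset_id": aid, "score": round(s, 4)} for s, aid in candidates],
        })
        cursor += beat.duration_s
    return {"reel_id": reel_id, "duration_s": cursor, "timeline": timeline}
```

Passing the result to `json.dumps` gives a JSON document of roughly the shape a reel manifest might take; keeping all k candidates per beat is what would make a later "swap" cheap.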

Gemini-Powered Brochure Parsing

The brochure parsing pipeline uses Google Gemini 1.5 Pro for multimodal PDF understanding:

  1. PDF → Images: Converts brochure PDF pages to images via pdf2image or PyMuPDF
  2. Gemini Upload: Uploads page images to Gemini's file API
  3. Structured Extraction: Gemini returns JSON with sections, key features, taglines, car parts, tone tags
  4. Post-processing: Deduplication, validation, CLIP text embedding
  5. Storage: Persisted to PostgreSQL brochure_nodes table with 768-dim vectors

Environment variable: GEMINI_API_KEY

See: src/pipelines/brochure_parsing.py
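Step 4 (post-processing) might look roughly like the sketch below, which validates the model's JSON and deduplicates the list fields. The key names (sections with title/text, key_features, taglines, car_parts, tone_tags) are assumptions based on step 3, not the schema actually used in brochure_parsing.py, and the CLIP embedding step is omitted.

```python
import json

# Hypothetical list-valued keys, named after the fields listed in step 3.
LIST_FIELDS = ("key_features", "taglines", "car_parts", "tone_tags")


def _dedupe(items):
    """Drop blanks and case-insensitive duplicates, keeping the first occurrence."""
    seen, out = set(), []
    for item in items:
        text = str(item).strip()
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            out.append(text)
    return out


def postprocess_extraction(raw: str) -> dict:
    """Validate and clean a JSON extraction returned by the model."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    sections = data.get("sections")
    if not isinstance(sections, list):
        raise ValueError("missing or invalid 'sections' list")

    clean = {"sections": []}
    for section in sections:
        if not isinstance(section, dict):
            continue  # ignore malformed entries
        title = str(section.get("title", "")).strip()
        if not title:
            continue  # skip untitled sections
        clean["sections"].append({"title": title, "text": str(section.get("text", "")).strip()})

    for name in LIST_FIELDS:
        values = data.get(name, [])
        if not isinstance(values, list):
            raise ValueError(f"'{name}' must be a list")
        clean[name] = _dedupe(values)
    return clean
```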


Database Schema

Full PostgreSQL + pgvector schema in configs/database_schema.sql:

| Table | Purpose |
|---|---|
| assets | Raw asset registry (video/image/audio) |
| asset_metadata | Rich metadata + CLIP 768-dim embedding |
| video_events | Temporal sub-assets within videos |
| brand_configs | Machine-readable brand rules |
| brochure_nodes | Knowledge graph nodes (Gemini-extracted) |
| brochure_asset_map | Semantic bridge (cosine similarity + rules) |
| captions_library | Pre-generated brand-compliant captions |
| voiceover_library | Pre-generated voiceover lines |
| reel_requests | User query tracking |
| reel_scripts | LLM orchestrator output |
| beat_asset_candidates | Per-beat top-k retrieval log |
| reel_manifests | Final assembly plan |
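The brochure_asset_map table is described as a cosine-similarity bridge plus rules. A minimal sketch, assuming each brochure node and asset already carries an embedding, might score every pair and keep the best matches. The 0.25 threshold, the +0.1 boost for a matching car-part tag, and the dict shapes are all hypothetical stand-ins for whatever rules src/services/asset_mapping.py applies.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_brochure_to_assets(nodes, assets, threshold=0.25, top_n=5, part_boost=0.1):
    """Link each brochure node to its most similar assets.

    nodes:  {node_id: {"embedding": [...], "car_part": str or None}}
    assets: {asset_id: {"embedding": [...], "car_parts": set of str}}
    Returns rows shaped like hypothetical brochure_asset_map records.
    """
    rows = []
    for node_id, node in nodes.items():
        part = node.get("car_part")
        scored = []
        for asset_id, asset in assets.items():
            score = cosine(node["embedding"], asset["embedding"])
            # Hypothetical rule: a shared car-part tag nudges the score upward.
            if part and part in asset.get("car_parts", ()):
                score += part_boost
            if score >= threshold:
                scored.append((score, asset_id))
        scored.sort(reverse=True)  # best match first
        rows.extend(
            {"node_id": node_id, "asset_id": aid, "score": round(s, 4)}
            for s, aid in scored[:top_n]
        )
    return rows
```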

API Endpoints (FastAPI)

All routes are mounted in src/api/main.py:

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/assets | POST/GET | Upload / list assets |
| /api/v1/brand | POST/GET | Brand config CRUD + compliance |
| /api/v1/brochures | POST/GET | Upload PDF → Gemini parse → nodes |
| /api/v1/brochures/{id}/mappings | GET | Brochure→asset mappings |
| /api/v1/captions | GET/POST | Caption library |
| /api/v1/reels/request | POST | Submit reel request |
| /api/v1/reels/{id}/script | POST/GET | Generate / get reel script |
| /api/v1/reels/{id}/manifest | GET | Get assembled manifest |
| /api/v1/reels/{id}/render | POST | Trigger FFmpeg render |
| /api/v1/reels/{id}/download/mp4 | GET | Download rendered reel |
| /api/v1/reels/{id}/download/remotion | GET | Download Remotion ZIP |
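The endpoints can also be called from Python using only the standard library. This sketch builds the same request as the curl example in Running Locally, plus a manifest fetch; the helper names are hypothetical, and because the response bodies are not documented here, send() simply decodes whatever JSON comes back.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # port 8000, as in the Running Locally section


def build_reel_request(user_query, duration_target="10s", platform="instagram_reels",
                       tone="sporty", base_url=BASE_URL):
    """Build POST /api/v1/reels/request with the fields from the curl example."""
    payload = {
        "user_query": user_query,
        "duration_target": duration_target,
        "platform": platform,
        "tone": tone,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/reels/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_manifest_request(reel_id, base_url=BASE_URL):
    """Build GET /api/v1/reels/{id}/manifest for an existing reel."""
    rid = urllib.parse.quote(str(reel_id), safe="")
    return urllib.request.Request(f"{base_url}/api/v1/reels/{rid}/manifest", method="GET")


def send(req, timeout=30.0):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `send(build_reel_request("10s sporty reel highlighting the car front"))` submits the same request as the curl command.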

Environment Variables

# Gemini (Brochure Parsing)
GEMINI_API_KEY=your_gemini_api_key

# OpenAI (LLM Orchestrator)
OPENAI_API_KEY=your_openai_api_key

# Anthropic (LLM Fallback)
ANTHROPIC_API_KEY=your_anthropic_api_key

# ElevenLabs (TTS)
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=reel_creator
DB_USER=reel_user
DB_PASSWORD=your_password
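A small helper can turn these variables into a PostgreSQL connection URL and flag missing API keys at startup. This is a sketch that assumes the app reads plain environment variables (it may instead go through configs/settings.yaml); database_url and missing_api_keys are hypothetical names, and the defaults mirror the sample values above.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def database_url(env: Mapping = os.environ) -> str:
    """Build a postgresql:// URL from the DB_* variables."""
    host = env.get("DB_HOST", "localhost")
    port = int(env.get("DB_PORT", "5432"))
    name = env.get("DB_NAME", "reel_creator")
    user = env.get("DB_USER", "reel_user")
    password = env.get("DB_PASSWORD")
    if not password:
        raise RuntimeError("DB_PASSWORD is not set")
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{name}"


def missing_api_keys(env: Mapping = os.environ,
                     required=("GEMINI_API_KEY", "OPENAI_API_KEY")) -> list:
    """Return the names of required keys that are unset or empty."""
    return [key for key in required if not env.get(key)]
```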

Project Structure

ai-reel-creator-platform/
├── configs/
│   ├── database_schema.sql       # Full PostgreSQL+pgvector schema
│   └── settings.yaml             # App configuration
├── src/
│   ├── api/
│   │   ├── main.py               # FastAPI app factory
│   │   └── routes/
│   │       ├── assets.py         # Asset endpoints
│   │       ├── brand.py          # Brand config endpoints
│   │       ├── brochures.py      # Brochure upload + Gemini parse
│   │       ├── captions.py       # Caption/voiceover library
│   │       ├── preview.py        # Preview + swap + regenerate
│   │       └── reels.py          # Reel request/orchestrate/render
│   ├── models/
│   │   ├── database.py           # SQLAlchemy ORM models
│   │   ├── embedding.py          # CLIP embedder (openai/clip-vit-large-patch14)
│   │   └── schemas.py            # Pydantic request/response schemas
│   ├── pipelines/
│   │   ├── brochure_parsing.py   # **Gemini 1.5 Pro PDF parser**
│   │   ├── caption_generation.py # Caption + voiceover generator
│   │   └── reel_composition.py   # FFmpeg video assembly
│   ├── services/
│   │   ├── asset_mapping.py      # Brochure→asset cosine similarity
│   │   ├── asset_retrieval.py    # Beat-level top-k retrieval
│   │   ├── brand_compliance.py   # Text validation against brand rules
│   │   ├── llm_orchestrator.py   # GPT-4o / Claude script generation
│   │   └── tts_service.py        # ElevenLabs / OpenAI TTS
│   └── utils/
│       ├── ffmpeg.py             # FFmpeg command builder
│       ├── file_utils.py         # Canonical storage paths
│       └── pdf_utils.py          # PDF → image conversion for Gemini
├── tests/
│   └── test_pipelines.py         # Unit tests
├── data/
│   ├── sample_brochure.md        # Sample automotive brochure
│   └── sample_brand_guidelines.json
├── TASKS_README.md              # **Detailed task-to-file mapping**
└── README.md                    # This file
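src/utils/ffmpeg.py is described as an FFmpeg command builder. A hypothetical helper in that spirit might trim one clip and fit it to a vertical frame; the function name, the 1080x1920 (9:16) default and the choice to drop audio with -an are assumptions, not taken from the repo. It returns an argument list rather than a shell string, so it can go straight to subprocess.run without quoting issues.

```python
def trim_and_scale_cmd(src, dst, start_s, duration_s, width=1080, height=1920, fps=30):
    """Build an ffmpeg argv that cuts [start_s, start_s + duration_s) and fills a WxH frame."""
    # Scale up until both sides cover the frame, then center-crop the overflow.
    vf = (
        f"scale={width}:{height}:force_original_aspect_ratio=increase,"
        f"crop={width}:{height},fps={fps}"
    )
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",       # seek on the input (placed before -i)
        "-i", src,
        "-t", f"{duration_s:.3f}",     # limit output duration
        "-vf", vf,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-an",                         # drop source audio; voiceover/music mixed separately
        dst,
    ]
```

Run it with `subprocess.run(cmd, check=True)`, or print `shlex.join(cmd)` to inspect the exact command line.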

Running Locally

# 1. Clone
git clone https://huggingface.co/acd23/ai-reel-creator-platform
cd ai-reel-creator-platform

# 2. Install
pip install -r requirements.txt

# 3. Set env vars
export GEMINI_API_KEY="..."
export OPENAI_API_KEY="..."

# 4. Start API
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

# 5. Test brochure parsing
curl -X POST http://localhost:8000/api/v1/brochures \
  -F "file=@your_brochure.pdf"

# 6. Submit reel request
curl -X POST http://localhost:8000/api/v1/reels/request \
  -H "Content-Type: application/json" \
  -d '{
    "user_query": "10s sporty reel highlighting the car front",
    "duration_target": "10s",
    "platform": "instagram_reels",
    "tone": "sporty"
  }'

Task-to-File Mapping

For a detailed mapping of every PRD task to its implementation file, see:

➡️ TASKS_README.md

This document contains:

  • Phase-by-phase task breakdown
  • Exact file paths for each component
  • Key classes and function names
  • API endpoint reference
  • Environment variable reference
  • curl examples for every endpoint

License

MIT License