---
license: apache-2.0
library_name: diffusers
tags:
- text-to-video
- image-to-video
- video-generation
- diffusers
pipeline_tag: text-to-video
inference: true
base_model: deathlegionteam/LEGION-Video-Gen
widget:
- text: "A serene mountain lake at sunset with colorful clouds reflecting on the water"
---

# ⚔️ LEGION VIDEO GENERATION: The Ultimate AI Video Engine

State-of-the-art video generation with 8.3B parameters
Text-to-Video · Image-to-Video · QWatermark System


## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🌐 API Documentation](#-api-documentation)
- [💧 QWatermark System](#-qwatermark-system)
- [🤗 HuggingFace](#-huggingface)
- [🖥️ Project Structure](#️-project-structure)
- [🎬 Example Prompts](#-example-prompts)
- [📊 Performance](#-performance)
- [📝 Notes](#-notes)
- [📜 License](#-license)

## ✨ Features

- **🎬 Text-to-Video Generation**: Create videos from any text prompt with cinematic quality
- **🖼️ Image-to-Video Generation**: Animate still images with controlled motion
- **💧 QWatermark System**: Configurable semi-transparent quality assurance watermark with position, size, opacity, and text controls
- **🌐 Web Application**: Full Gradio UI with dark theme and FastAPI backend
- **📡 REST API**: Programmatic video generation via HTTP endpoints
- **🛡️ Graceful Fallback**: Mock generation mode when no GPU is available

## 🚀 Quick Start

### Prerequisites

- **GPU (Recommended):** NVIDIA GPU with 16GB+ VRAM (RTX 4090, A100, H100)
- **CPU (Fallback):** Works with mock generation mode (test pattern videos)
- **Python 3.10+**
- **~30GB free disk space** (model weights)

### Installation

```bash
# Clone the repository
git clone https://huggingface.co/deathlegionteam/LEGION-Video-Gen
cd LEGION-Video-Gen

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Verify installation
python3 -c "import torch, diffusers, gradio, fastapi; print('OK')"
```

### Quick Start: Generate Your First Video

```python
from inference import LegionVideoGenerator

generator = LegionVideoGenerator()

# Generate a 49-frame, 480x480 clip and get back the path of the saved file
video_path = generator.generate_from_text(
    prompt="A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
    watermark_strength=0.3,
)
print(f"Video saved to: {video_path}")
```

### Starting the Web UI

```bash
# Start the API backend in the background
python3 backend/main.py &

# Start the Gradio frontend
python3 frontend/app.py

# Open http://localhost:8080 in your browser
```

## 🌐 API Documentation

### REST API Endpoints

The backend runs on port **8081** by default.

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/status` | Health check with model and device info |
| `POST` | `/api/generate/text` | Generate video from text prompt |
| `POST` | `/api/generate/image` | Generate video from image + text prompt |
| `GET` | `/` | API root with endpoint listing |

### Text-to-Video Generation

```python
import requests

response = requests.post(
    "http://localhost:8081/api/generate/text",
    json={
        "prompt": "A cyberpunk city street at night with neon lights reflecting on wet pavement",
        "negative_prompt": "warped, distorted, flickering, jittery, low quality, blurry, artifacts",
        "num_frames": 49,
        "width": 480,
        "height": 480,
        "num_inference_steps": 50,
        "guidance_scale": 6.0,
        "watermark_strength": 0.3,
    },
)
# Fail on an HTTP error instead of writing an error body to disk
response.raise_for_status()

# Save the returned video bytes
with open("output.mp4", "wb") as f:
    f.write(response.content)
```

### Image-to-Video Generation

```python
import requests

# Upload the source image as multipart form data alongside the generation settings
with open("input_image.jpg", "rb") as img:
    response = requests.post(
        "http://localhost:8081/api/generate/image",
        files={"file": img},
        data={
            "prompt": "Gentle motion, cinematic camera movement, atmospheric",
            "num_frames": 49,
            "width": 480,
            "height": 480,
            "num_inference_steps": 50,
            "guidance_scale": 6.0,
            "watermark_strength": 0.3,
        },
    )
# Fail on an HTTP error instead of writing an error body to disk
response.raise_for_status()

# Save the returned video bytes
with open("animated.mp4", "wb") as f:
    f.write(response.content)
```

## 💧 QWatermark System

The QWatermark (Quality Watermark) system imprints a configurable assurance marker on every generated video.

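To make the idea of a semi-transparent, positioned marker concrete, here is a minimal sketch of the two pieces of arithmetic such an overlay usually involves: placing a box relative to a frame corner, and alpha-blending a watermark pixel over a frame pixel. This is not LEGION's implementation. `WatermarkConfig`, `anchor_xy`, `blend_pixel`, the 16-pixel margin, and every position name other than `bottom-right` are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed set of corner names; only "bottom-right" appears in this README.
CORNERS = {"top-left", "top-right", "bottom-left", "bottom-right"}


@dataclass
class WatermarkConfig:
    """Hypothetical settings holder; not part of the LEGION API."""
    text: str = "LEGION"
    position: str = "bottom-right"
    font_size: int = 36
    opacity: float = 0.3


def anchor_xy(frame_w, frame_h, box_w, box_h, position, margin=16):
    """Return the top-left corner of a box_w x box_h box placed at a named corner."""
    if position not in CORNERS:
        raise ValueError(f"unsupported position: {position!r}")
    vertical, horizontal = position.split("-")
    x = margin if horizontal == "left" else frame_w - box_w - margin
    y = margin if vertical == "top" else frame_h - box_h - margin
    return x, y


def blend_pixel(pixel, mark, alpha):
    """Alpha-blend one RGB watermark pixel over one RGB frame pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return tuple(int((1.0 - alpha) * p + alpha * m) for p, m in zip(pixel, mark))


cfg = WatermarkConfig()
# A 120x40 text box in the bottom-right corner of a 480x480 frame
print(anchor_xy(480, 480, 120, 40, cfg.position))   # (344, 424)
# Half-transparent orange over a black pixel
print(blend_pixel((0, 0, 0), (200, 100, 50), 0.5))  # (100, 50, 25)
```

With `alpha` at 0.0 the frame pixel is returned unchanged, and at 1.0 the watermark pixel fully replaces it. How the real system combines its opacity and strength settings is not specified in this README, so the sketch takes a single blend factor.
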
| Parameter | Description | Default |
|-----------|-------------|---------|
| Text | Watermark text | "LEGION" |
| Position | Placement on frame | bottom-right |
| Font Size | Text size | 36 |
| Opacity | Transparency | 0.3 |
| Strength | Overall intensity | Range: 0.0 (disabled) to 1.0 (full) |

## 🤗 HuggingFace

- **Model Repository**: [deathlegionteam/LEGION-Video-Gen](https://huggingface.co/deathlegionteam/LEGION-Video-Gen)
- **Space (Live Demo)**: [deathlegionteam/LEGION-Video-Gen-Space](https://huggingface.co/spaces/deathlegionteam/LEGION-Video-Gen-Space)

### Model Weights

The model is available as a complete Diffusers pipeline on the HuggingFace Hub and can be loaded directly with the Diffusers library:

```python
from diffusers import DiffusionPipeline
import torch

# Load the pipeline in half precision
pipe = DiffusionPipeline.from_pretrained(
    "deathlegionteam/LEGION-Video-Gen",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Reduce peak VRAM usage
pipe.vae.enable_tiling()
pipe.enable_attention_slicing()

# Generate video
video_frames = pipe(
    prompt="A serene mountain lake at sunset",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
```

## 🖥️ Project Structure

```
/app/video_generation_pipeline_1006/
├── inference.py              # Core generation class (LegionVideoGenerator)
├── backend/
│   └── main.py               # FastAPI backend (port 8081)
├── frontend/
│   ├── app.py                # Gradio frontend (port 8080)
│   └── streamlit_app.py      # Streamlit frontend
├── models/
│   ├── t2v/                  # T2V model weights (safetensors format)
│   └── i2v/                  # I2V model directory
├── outputs/                  # Generated videos
├── requirements.txt          # Python dependencies
├── README.md                 # This file
└── .space/                   # HuggingFace Space configuration
```

## 🎬 Example Prompts

### Text-to-Video

| Prompt | Style |
|--------|-------|
| "A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality" | Nature |
| "A cyberpunk city street at night with neon lights reflecting on wet pavement, flying cars, cinematic, dramatic lighting" | Sci-Fi |
| "A majestic eagle soaring through misty mountain peaks, golden hour lighting, slow motion, National Geographic quality" | Wildlife |
| "An astronaut floating in space with Earth in the background, stars twinkling, cinematic, hyperrealistic" | Space |
| "A cozy medieval tavern interior with fireplace, warm lighting, people chatting, fantasy RPG aesthetic" | Fantasy |

### Image-to-Video

| Prompt | Motion Effect |
|--------|---------------|
| "Gentle motion, cinematic camera pan, atmospheric" | Camera movement |
| "Flowing water, leaves rustling in the wind, peaceful" | Nature animation |
| "Slow zoom in, dramatic reveal, cinematic lighting" | Zoom effect |
| "Character breathing gently, subtle movement, portrait" | Portrait animation |

## 📊 Performance

| Hardware | Resolution | Frames | Steps | Time |
|----------|------------|--------|-------|------|
| RTX 4090 (24GB) | 480p | 49 | 50 | ~2-3 min |
| A100 (80GB) | 480p | 49 | 50 | ~1-2 min |
| CPU (16+ cores) | N/A | Mock | N/A | ~20-30 sec |

## 📝 Notes

- **GPU Required for Real Inference:** The 8.3B-parameter model requires ~16GB VRAM for FP16 inference. Without a GPU, the system falls back to mock mode, which produces test pattern videos.
- **Disk Space:** Full model weights (T2V) are approximately 13GB. An additional I2V variant would add roughly another 13GB.

## 📜 License

This project is licensed under **Apache 2.0**.

⚔️ LEGION VIDEO GENERATION
Built with ❤️ for the open-source AI community