---
license: apache-2.0
library_name: diffusers
tags:
- text-to-video
- image-to-video
- video-generation
- diffusers
pipeline_tag: text-to-video
inference: true
base_model: deathlegionteam/LEGION-Video-Gen
widget:
- text: "A serene mountain lake at sunset with colorful clouds reflecting on the water"
---

# ⚔️ LEGION VIDEO GENERATION — The Ultimate AI Video Engine

<p align="center">
<strong>State-of-the-art video generation with 8.3B parameters</strong><br>
Text-to-Video · Image-to-Video · QWatermark System
</p>

<p align="center">
<img src="https://img.shields.io/badge/Params-8.3B-blue" alt="Parameters">
<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
<img src="https://img.shields.io/badge/GPU-Recommended-red" alt="GPU">
<a href="https://huggingface.co/deathlegionteam/LEGION-Video-Gen"><img src="https://img.shields.io/badge/🤗%20HuggingFace-LEGION--Video--Gen-blue" alt="HuggingFace"></a>
</p>

## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🌐 API Documentation](#-api-documentation)
- [💧 QWatermark System](#-qwatermark-system)
- [🤗 HuggingFace](#-huggingface)
- [🖥️ Project Structure](#️-project-structure)
- [🎬 Example Prompts](#-example-prompts)
- [📊 Performance](#-performance)
- [📝 Notes](#-notes)
- [📜 License](#-license)

## ✨ Features

- **🎬 Text-to-Video Generation** — Create videos from any text prompt with cinematic quality
- **🖼️ Image-to-Video Generation** — Animate still images with controlled motion
- **💧 QWatermark System** — Configurable semi-transparent quality assurance watermark with position, size, opacity, and text controls
- **🌐 Web Application** — Full Gradio UI with dark theme and FastAPI backend
- **📡 REST API** — Programmatic video generation via HTTP endpoints
- **🛡️ Graceful Fallback** — Mock generation mode when no GPU is available

## 🚀 Quick Start

### Prerequisites

- **GPU (Recommended):** NVIDIA GPU with 16GB+ VRAM (RTX 4090, A100, H100)
- **CPU (Fallback):** Works with mock generation mode (test pattern videos)
- **Python 3.10+**
- **~30GB free disk space** (model weights)
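
The GPU/CPU split above can be checked before launching anything. The following is a minimal sketch, not the project's actual detection logic: `cuda_available` and `select_mode` are hypothetical helper names, and the only library call used is PyTorch's `torch.cuda.is_available()`.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch

    return torch.cuda.is_available()


def select_mode(has_cuda: bool) -> str:
    """Hypothetical helper: 'real' inference on a GPU, 'mock' test patterns otherwise."""
    return "real" if has_cuda else "mock"


if __name__ == "__main__":
    print(f"Generation mode: {select_mode(cuda_available())}")
```

Keeping the decision in a pure function makes it easy to unit-test both paths on a machine without a GPU.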

### Installation

```bash
# Clone the repository
git clone https://huggingface.co/deathlegionteam/LEGION-Video-Gen
cd LEGION-Video-Gen

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Verify installation
python3 -c "import torch, diffusers, gradio, fastapi; print('OK')"
```

### Quick Start — Generate Your First Video

```python
from inference import LegionVideoGenerator

generator = LegionVideoGenerator()
video_path = generator.generate_from_text(
    prompt="A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
    watermark_strength=0.3,
)
print(f"Video saved to: {video_path}")
```

### Starting the Web UI

```bash
# Start the API backend
python3 backend/main.py &

# Start the Gradio frontend
python3 frontend/app.py

# Open http://localhost:8080 in your browser
```
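
Because the backend is started in the background, the frontend may come up before the API is accepting connections. One way to guard against that is a small readiness check, sketched here with only the standard library. `wait_for_port` is a hypothetical helper, not part of this repository; the port 8081 comes from this README.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 8081)` between the two commands above would delay the Gradio launch until the backend is reachable. A successful TCP connect only shows that the port is open, so a follow-up request to `/api/status` is still the better health signal.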

## 🌐 API Documentation

### REST API Endpoints

The backend runs on port **8081** by default.

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/status` | Health check with model and device info |
| `POST` | `/api/generate/text` | Generate video from text prompt |
| `POST` | `/api/generate/image` | Generate video from image + text prompt |
| `GET` | `/` | API root with endpoint listing |

### Text-to-Video Generation

```python
import requests

response = requests.post(
    "http://localhost:8081/api/generate/text",
    json={
        "prompt": "A cyberpunk city street at night with neon lights reflecting on wet pavement",
        "negative_prompt": "warped, distorted, flickering, jittery, low quality, blurry, artifacts",
        "num_frames": 49,
        "width": 480,
        "height": 480,
        "num_inference_steps": 50,
        "guidance_scale": 6.0,
        "watermark_strength": 0.3,
    }
)

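# Fail early on an HTTP error instead of saving an error body as a video
response.raise_for_status()

with open("output.mp4", "wb") as f:
    f.write(response.content)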
```
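
Validating the request body on the client side can catch mistakes before a long generation job starts. The sketch below is illustrative only: `build_text_payload` is a hypothetical helper, its defaults simply mirror the example values in this README (they are not confirmed server defaults), and the positive-integer checks are assumptions. The 0.0 to 1.0 range for `watermark_strength` comes from the QWatermark table.

```python
def build_text_payload(
    prompt: str,
    negative_prompt: str = "",
    num_frames: int = 49,
    width: int = 480,
    height: int = 480,
    num_inference_steps: int = 50,
    guidance_scale: float = 6.0,
    watermark_strength: float = 0.3,
) -> dict:
    """Validate arguments and return a JSON-ready body for /api/generate/text."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0.0 <= watermark_strength <= 1.0:
        raise ValueError("watermark_strength must be within 0.0 to 1.0")

    # Assumed constraint: all size-like fields must be positive integers.
    sizes = {
        "num_frames": num_frames,
        "width": width,
        "height": height,
        "num_inference_steps": num_inference_steps,
    }
    for name, value in sizes.items():
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer")

    payload = {
        "prompt": prompt,
        **sizes,
        "guidance_scale": guidance_scale,
        "watermark_strength": watermark_strength,
    }
    # Only send a negative prompt when one was actually provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the example above.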

### Image-to-Video Generation

```python
import requests

with open("input_image.jpg", "rb") as img:
    response = requests.post(
        "http://localhost:8081/api/generate/image",
        files={"file": img},
        data={
            "prompt": "Gentle motion, cinematic camera movement, atmospheric",
            "num_frames": 49,
            "width": 480,
            "height": 480,
            "num_inference_steps": 50,
            "guidance_scale": 6.0,
            "watermark_strength": 0.3,
        }
    )

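# Fail early on an HTTP error instead of saving an error body as a video
response.raise_for_status()

with open("animated.mp4", "wb") as f:
    f.write(response.content)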
```

## 💧 QWatermark System

The QWatermark (Quality Watermark) system overlays a configurable, semi-transparent text marker on generated videos. Setting the strength to 0.0 disables it.

| Parameter | Description | Default |
|-----------|-------------|---------|
| Text | Watermark text | "LEGION" |
| Position | Placement on frame | bottom-right |
| Font Size | Text size | 36 |
| Opacity | Transparency | 0.3 |
| Strength | Overall intensity, from 0.0 (disabled) to 1.0 (full) | Not stated (examples use 0.3) |
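
To make the table concrete, here is a minimal sketch of how such settings could be represented and how an opacity value acts on a single pixel channel. `WatermarkConfig` and `blend_channel` are hypothetical names, not the project's API. The defaults are taken from the table, the validation rules are assumptions, and the blend is the standard alpha-compositing formula rather than a confirmed description of the QWatermark implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkConfig:
    """Hypothetical container for the watermark settings listed in the table."""

    text: str = "LEGION"
    position: str = "bottom-right"
    font_size: int = 36
    opacity: float = 0.3

    def __post_init__(self) -> None:
        if not 0.0 <= self.opacity <= 1.0:
            raise ValueError("opacity must be between 0.0 and 1.0")
        if self.font_size <= 0:
            raise ValueError("font_size must be positive")


def blend_channel(frame_value: int, mark_value: int, opacity: float) -> int:
    """Alpha-blend one 8-bit channel: (1 - opacity) * frame + opacity * mark."""
    return round((1.0 - opacity) * frame_value + opacity * mark_value)


if __name__ == "__main__":
    config = WatermarkConfig()
    # At opacity 0.5 the result sits halfway between frame and watermark.
    print(config.text, blend_channel(100, 200, 0.5))
```

With an opacity of 0.0 the frame pixel is returned unchanged, and with 1.0 the watermark pixel fully replaces it.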

## 🤗 HuggingFace

- **Model Repository**: [deathlegionteam/LEGION-Video-Gen](https://huggingface.co/deathlegionteam/LEGION-Video-Gen)
- **Space (Live Demo)**: [deathlegionteam/LEGION-Video-Gen-Space](https://huggingface.co/spaces/deathlegionteam/LEGION-Video-Gen-Space)

### Model Weights

The model is available as a complete Diffusers pipeline on HuggingFace Hub. You can load it directly using the Diffusers library:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "deathlegionteam/LEGION-Video-Gen",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.vae.enable_tiling()
pipe.enable_attention_slicing()

# Generate video
video_frames = pipe(
    prompt="A serene mountain lake at sunset",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
```

## 🖥️ Project Structure

```
/app/video_generation_pipeline_1006/
├── inference.py           # Core generation class (LegionVideoGenerator)
├── backend/
│   └── main.py            # FastAPI backend (port 8081)
├── frontend/
│   ├── app.py             # Gradio frontend (port 8080)
│   └── streamlit_app.py   # Streamlit frontend
├── models/
│   ├── t2v/               # T2V model weights (safetensors format)
│   └── i2v/               # I2V model directory
├── outputs/               # Generated videos
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── .space/                # HuggingFace Space configuration
```

## 🎬 Example Prompts

### Text-to-Video

| Prompt | Style |
|--------|-------|
| "A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality" | Nature |
| "A cyberpunk city street at night with neon lights reflecting on wet pavement, flying cars, cinematic, dramatic lighting" | Sci-Fi |
| "A majestic eagle soaring through misty mountain peaks, golden hour lighting, slow motion, National Geographic quality" | Wildlife |
| "An astronaut floating in space with Earth in the background, stars twinkling, cinematic, hyperrealistic" | Space |
| "A cozy medieval tavern interior with fireplace, warm lighting, people chatting, fantasy RPG aesthetic" | Fantasy |

### Image-to-Video

| Prompt | Motion Effect |
|--------|---------------|
| "Gentle motion, cinematic camera pan, atmospheric" | Camera movement |
| "Flowing water, leaves rustling in the wind, peaceful" | Nature animation |
| "Slow zoom in, dramatic reveal, cinematic lighting" | Zoom effect |
| "Character breathing gently, subtle movement, portrait" | Portrait animation |

## 📊 Performance

| Hardware | Resolution | Frames | Steps | Time |
|----------|------------|--------|-------|------|
| RTX 4090 (24GB) | 480p | 49 | 50 | ~2-3 min |
| A100 (80GB) | 480p | 49 | 50 | ~1-2 min |
| CPU (16+ cores) | N/A | Mock | — | ~20-30 sec |

## 📝 Notes

- **GPU Required for Real Inference:** The 8.3B-parameter model requires ~16GB of VRAM for FP16 inference. Without a GPU, the system runs in mock mode.
- **Disk Space:** The full T2V model weights are approximately 13GB. An additional I2V variant would add roughly another 13GB.
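
The ~16GB VRAM figure follows from simple arithmetic: FP16 stores each parameter in 2 bytes. The sketch below works that out for the raw weights only. `weights_gb` is a hypothetical helper, and the result excludes activations, caches, and framework overhead, so real peak usage will differ. The on-disk size quoted above may also count components differently.

```python
# Bytes used per parameter for common floating-point formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weights_gb(num_params: float, dtype: str = "fp16") -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 8.3e9 parameters * 2 bytes = 16.6e9 bytes
    print(f"FP16 weights: {weights_gb(8.3e9):.1f} GB")
```

The same function shows why FP32 is impractical on a 24GB card: doubling the bytes per parameter doubles the weight footprint.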

## 📜 License

This project is licensed under **Apache 2.0**.

<p align="center">
<strong>⚔️ LEGION VIDEO GENERATION</strong><br>
Built with ❤️ for the open-source AI community
</p>