---
license: apache-2.0
library_name: diffusers
tags:
- text-to-video
- image-to-video
- video-generation
- diffusers
pipeline_tag: text-to-video
inference: true
base_model: deathlegionteam/LEGION-Video-Gen
widget:
- text: "A serene mountain lake at sunset with colorful clouds reflecting on the water"
---

⚔️ LEGION VIDEO GENERATION — The Ultimate AI Video Engine

State-of-the-art video generation with 8.3B parameters
Text-to-Video · Image-to-Video · QWatermark System

✨ Features

🎬 Text-to-Video Generation — Create videos from any text prompt with cinematic quality
🖼️ Image-to-Video Generation — Animate still images with controlled motion
💧 QWatermark System — Configurable semi-transparent quality assurance watermark with position, size, opacity, and text controls
🌐 Web Application — Full Gradio UI with dark theme and FastAPI backend
📡 REST API — Programmatic video generation via HTTP endpoints
🛡️ Graceful Fallback — Mock generation mode when no GPU is available

🚀 Quick Start

Prerequisites

GPU (Recommended): NVIDIA GPU with 16GB+ VRAM (RTX 4090, A100, H100)
CPU (Fallback): Works with mock generation mode (test pattern videos)
Python 3.10+
~30GB free disk space (model weights)
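
To see which mode a machine is likely to land in, a check along these lines can help. This is a minimal sketch, not the project's actual fallback logic: `detect_mode` and `MIN_VRAM_GB` are hypothetical names, and the 16 GB threshold is simply taken from the prerequisites above.

```python
# Hypothetical helper: guess whether this host can run real inference
# or would fall back to mock generation. Not part of the LEGION codebase.
MIN_VRAM_GB = 16  # threshold taken from the prerequisites list


def detect_mode(min_vram_gb: float = MIN_VRAM_GB) -> str:
    """Return "gpu" if a CUDA device with enough memory is visible, else "mock"."""
    try:
        import torch
    except ImportError:
        return "mock"  # PyTorch is not installed, so GPU inference is impossible
    if not torch.cuda.is_available():
        return "mock"
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    # Compare in decimal GB so that cards marketed as "16GB" pass the check.
    return "gpu" if total_bytes / 1e9 >= min_vram_gb else "mock"


if __name__ == "__main__":
    print(f"Expected mode: {detect_mode()}")
```

Only the first CUDA device is inspected; a multi-GPU host would need to loop over `torch.cuda.device_count()`.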

Installation

# Clone the repository
git clone https://huggingface.co/deathlegionteam/LEGION-Video-Gen
cd LEGION-Video-Gen

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Verify installation
python3 -c "import torch, diffusers, gradio, fastapi; print('OK')"

Quick Start — Generate Your First Video

from inference import LegionVideoGenerator

generator = LegionVideoGenerator()
video_path = generator.generate_from_text(
    prompt="A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
    watermark_strength=0.3,
)
print(f"Video saved to: {video_path}")

Starting the Web UI

# Start the API backend
python3 backend/main.py &

# Start the Gradio frontend
python3 frontend/app.py

# Open http://localhost:8080 in your browser

🌐 API Documentation

REST API Endpoints

The backend runs on port 8081 by default.

Method	Endpoint	Description
`GET`	`/api/status`	Health check with model and device info
`POST`	`/api/generate/text`	Generate video from text prompt
`POST`	`/api/generate/image`	Generate video from image + text prompt
`GET`	`/`	API root with endpoint listing
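
Before sending a generation request, it can be useful to confirm the backend is up. The sketch below queries `/api/status` using only the standard library. It assumes the endpoint answers with a JSON object; the response schema is not documented here, so no particular keys are relied upon. `endpoint` and `get_status` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # default backend port from this README


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_status(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch /api/status. Assumes (unverified) that the body is a JSON object."""
    url = endpoint("/api/status", base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `get_status()` returns the parsed object. If the server is down, `urllib` raises `URLError` (a subclass of `OSError`), which the caller can catch to report that the backend is unreachable.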

Text-to-Video Generation

import requests

response = requests.post(
    "http://localhost:8081/api/generate/text",
    json={
        "prompt": "A cyberpunk city street at night with neon lights reflecting on wet pavement",
        "negative_prompt": "warped, distorted, flickering, jittery, low quality, blurry, artifacts",
        "num_frames": 49,
        "width": 480,
        "height": 480,
        "num_inference_steps": 50,
        "guidance_scale": 6.0,
        "watermark_strength": 0.3,
    }
)

# Fail on HTTP errors instead of writing an error body to disk
response.raise_for_status()

with open("output.mp4", "wb") as f:
    f.write(response.content)
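
The JSON fields in the request above can be validated on the client before anything is sent. The following is a sketch under stated assumptions: `TextToVideoRequest` is a hypothetical client-side class, its defaults are the example values used in this README (not confirmed server defaults), and the 0.0 to 1.0 range for `watermark_strength` comes from the QWatermark table.

```python
from dataclasses import asdict, dataclass


@dataclass
class TextToVideoRequest:
    """Hypothetical container mirroring the JSON fields of /api/generate/text."""

    prompt: str
    negative_prompt: str = ""
    num_frames: int = 49
    width: int = 480
    height: int = 480
    num_inference_steps: int = 50
    guidance_scale: float = 6.0
    watermark_strength: float = 0.3

    def to_payload(self) -> dict:
        """Validate the fields and return a dict suitable for `json=`."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 0.0 <= self.watermark_strength <= 1.0:
            raise ValueError("watermark_strength must be between 0.0 and 1.0")
        for name in ("num_frames", "width", "height", "num_inference_steps"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        payload = asdict(self)
        if not payload["negative_prompt"]:
            del payload["negative_prompt"]  # omit the field rather than send ""
        return payload
```

The result can be passed straight to `requests.post(..., json=TextToVideoRequest(prompt="...").to_payload())`. The server may apply stricter limits than these checks.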

Image-to-Video Generation

import requests

with open("input_image.jpg", "rb") as img:
    response = requests.post(
        "http://localhost:8081/api/generate/image",
        files={"file": img},
        data={
            "prompt": "Gentle motion, cinematic camera movement, atmospheric",
            "num_frames": 49,
            "width": 480,
            "height": 480,
            "num_inference_steps": 50,
            "guidance_scale": 6.0,
            "watermark_strength": 0.3,
        }
    )

# Fail on HTTP errors instead of writing an error body to disk
response.raise_for_status()

with open("animated.mp4", "wb") as f:
    f.write(response.content)

💧 QWatermark System

The QWatermark (Quality Watermark) system imprints a configurable assurance marker on every generated video.

Parameter	Description	Default
Text	Watermark text	"LEGION"
Position	Placement on frame	bottom-right
Font Size	Text size	36
Opacity	Transparency	0.3
Strength	Overall intensity	Range: 0.0 (disabled) to 1.0 (full)
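
To make the position and opacity parameters concrete, here is a sketch of how a semi-transparent overlay is commonly placed and blended. It illustrates standard alpha blending, not the QWatermark implementation: `blend` and `anchor_xy` are hypothetical names, the 16 px margin is an assumption, and so are all position names other than `bottom-right`.

```python
def blend(frame_px: float, mark_px: float, alpha: float) -> float:
    """Alpha-blend one channel value: alpha=0 keeps the frame, alpha=1 shows only the mark."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * frame_px + alpha * mark_px


def anchor_xy(frame_w, frame_h, mark_w, mark_h, position="bottom-right", margin=16):
    """Top-left corner of the watermark box for a "<vertical>-<horizontal>" position.

    The margin and the accepted position names are assumptions for illustration.
    """
    xs = {"left": margin, "right": frame_w - mark_w - margin}
    ys = {"top": margin, "bottom": frame_h - mark_h - margin}
    vertical, horizontal = position.split("-")
    return xs[horizontal], ys[vertical]
```

For a 480×480 frame and a 120×40 text box, `anchor_xy(480, 480, 120, 40)` places the box at `(344, 424)`. How the system combines Opacity and Strength into a single blend factor is not specified here.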

🤗 HuggingFace

Model Repository: deathlegionteam/LEGION-Video-Gen
Space (Live Demo): deathlegionteam/LEGION-Video-Gen-Space

Model Weights

The model is available as a complete Diffusers pipeline on HuggingFace Hub. You can load it directly using the Diffusers library:

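from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the pipeline in half precision to reduce memory use
pipe = DiffusionPipeline.from_pretrained(
    "deathlegionteam/LEGION-Video-Gen",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Memory-saving options: tiled VAE decoding and sliced attention
pipe.vae.enable_tiling()
pipe.enable_attention_slicing()

# Generate video
video_frames = pipe(
    prompt="A serene mountain lake at sunset",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

# Write the frames to disk using the library's default frame rate
export_to_video(video_frames, "output.mp4")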

🖥️ Project Structure

/app/video_generation_pipeline_1006/
├── inference.py           # Core generation class (LegionVideoGenerator)
├── backend/
│   └── main.py            # FastAPI backend (port 8081)
├── frontend/
│   ├── app.py             # Gradio frontend (port 8080)
│   └── streamlit_app.py   # Streamlit frontend
├── models/
│   ├── t2v/               # T2V model weights (safetensor format)
│   └── i2v/               # I2V model directory
├── outputs/               # Generated videos
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── .space/                # HuggingFace Space configuration

🎬 Example Prompts

Text-to-Video

Prompt	Style
"A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality"	Nature
"A cyberpunk city street at night with neon lights reflecting on wet pavement, flying cars, cinematic, dramatic lighting"	Sci-Fi
"A majestic eagle soaring through misty mountain peaks, golden hour lighting, slow motion, National Geographic quality"	Wildlife
"An astronaut floating in space with Earth in the background, stars twinkling, cinematic, hyperrealistic"	Space
"A cozy medieval tavern interior with fireplace, warm lighting, people chatting, fantasy RPG aesthetic"	Fantasy

Image-to-Video

Prompt	Motion Effect
"Gentle motion, cinematic camera pan, atmospheric"	Camera movement
"Flowing water, leaves rustling in the wind, peaceful"	Nature animation
"Slow zoom in, dramatic reveal, cinematic lighting"	Zoom effect
"Character breathing gently, subtle movement, portrait"	Portrait animation

📊 Performance

Hardware	Resolution	Frames	Steps	Time
RTX 4090 (24GB)	480p	49	50	~2-3 min
A100 (80GB)	480p	49	50	~1-2 min
CPU (16+ cores)	N/A	Mock	—	~20-30 sec
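
The GPU rows can be turned into a rough per-step cost, which helps when estimating runs with a different step count. This is back-of-the-envelope arithmetic on the table's figures, not a benchmark: the totals include fixed costs such as text encoding and VAE decoding, so the result is an upper bound on the time per denoising step. `seconds_per_step` is a hypothetical helper.

```python
def seconds_per_step(total_minutes: tuple[float, float], steps: int) -> tuple[float, float]:
    """Convert a (low, high) total time in minutes into seconds per inference step."""
    low, high = total_minutes
    return (low * 60 / steps, high * 60 / steps)


# Figures from the table above: 50 steps each
rtx_4090 = seconds_per_step((2, 3), steps=50)
a100 = seconds_per_step((1, 2), steps=50)

print(f"RTX 4090: {rtx_4090[0]:.1f} to {rtx_4090[1]:.1f} s/step")
print(f"A100:     {a100[0]:.1f} to {a100[1]:.1f} s/step")
```

Scaling these numbers linearly to other step counts is itself an assumption. Resolution and frame count also affect the cost per step.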

📝 Notes

GPU Required for Real Inference: The 8.3B parameter model requires ~16GB VRAM for FP16 inference. Without a GPU, the system runs in mock mode.
Disk Space: The full T2V model weights are approximately 13GB. The I2V variant would add another ~13GB.
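
The VRAM figure can be sanity-checked from the parameter count alone. The sketch below computes the memory needed just to hold the weights, assuming 2 bytes per parameter for FP16. Activations, the text encoder, and VAE decoding come on top of this, so it is a lower bound rather than a full requirement. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GiB (2 bytes per parameter for FP16)."""
    return num_params * bytes_per_param / 1024**3


fp16_gib = weight_memory_gib(8.3e9)
print(f"8.3B parameters in FP16: about {fp16_gib:.1f} GiB for weights")
```

For 8.3B parameters this comes to roughly 15.5 GiB (16.6 GB in decimal units), in line with the ~16GB stated above. Memory-saving options such as VAE tiling and attention slicing reduce peak usage during generation but do not shrink the weights.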

📜 License

This project is licensed under Apache 2.0.

⚔️ LEGION VIDEO GENERATION
Built with ❤️ for the open-source AI community
