
---
license: apache-2.0
library_name: diffusers
tags:
  - text-to-video
  - image-to-video
  - video-generation
  - diffusers
pipeline_tag: text-to-video
inference: true
base_model: deathlegionteam/LEGION-Video-Gen
widget:
  - text: "A serene mountain lake at sunset with colorful clouds reflecting on the water"
---

⚔️ LEGION VIDEO GENERATION: The Ultimate AI Video Engine

State-of-the-art video generation with 8.3B parameters
Text-to-Video · Image-to-Video · QWatermark System



✨ Features

  • 🎬 Text-to-Video Generation β€” Create videos from any text prompt with cinematic quality
  • πŸ–ΌοΈ Image-to-Video Generation β€” Animate still images with controlled motion
  • πŸ’§ QWatermark System β€” Configurable semi-transparent quality assurance watermark with position, size, opacity, and text controls
  • 🌐 Web Application β€” Full Gradio UI with dark theme and FastAPI backend
  • πŸ“‘ REST API β€” Programmatic video generation via HTTP endpoints
  • πŸ›‘οΈ Graceful Fallback β€” Mock generation mode when no GPU is available
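The fallback described in the last bullet could be sketched roughly as follows. This is an illustration only, not the project's actual code: `select_mode` is a hypothetical helper, and the only real API it relies on is `torch.cuda.is_available()`.

```python
def select_mode() -> str:
    """Return "gpu" when CUDA is usable, otherwise "mock".

    Hypothetical helper for illustration; the project may decide differently.
    """
    try:
        import torch  # a missing torch install is treated like a missing GPU
    except ImportError:
        return "mock"
    return "gpu" if torch.cuda.is_available() else "mock"


mode = select_mode()
print(f"Running in {mode} mode")
```

Importing `torch` inside the function keeps the check usable on machines where the GPU stack is not installed at all.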

🚀 Quick Start

Prerequisites

  • GPU (Recommended): NVIDIA GPU with 16GB+ VRAM (RTX 4090, A100, H100)
  • CPU (Fallback): Works with mock generation mode (test pattern videos)
  • Python 3.10+
  • ~30GB free disk space (model weights)

Installation

# Clone the repository
git clone https://huggingface.co/deathlegionteam/LEGION-Video-Gen
cd LEGION-Video-Gen

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Verify installation
python3 -c "import torch, diffusers, gradio, fastapi; print('OK')"

Quick Start: Generate Your First Video

from inference import LegionVideoGenerator

generator = LegionVideoGenerator()
video_path = generator.generate_from_text(
    prompt="A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
    watermark_strength=0.3,
)
print(f"Video saved to: {video_path}")

Starting the Web UI

# Start the API backend
python3 backend/main.py &

# Start the Gradio frontend
python3 frontend/app.py

# Open http://localhost:8080 in your browser

🌐 API Documentation

REST API Endpoints

The backend runs on port 8081 by default.

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/status | Health check with model and device info |
| POST | /api/generate/text | Generate video from text prompt |
| POST | /api/generate/image | Generate video from image + text prompt |
| GET | / | API root with endpoint listing |
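Before submitting a generation request, it can help to confirm that the backend is up. The sketch below polls `/api/status` using only the standard library. It assumes the endpoint answers with JSON; the response fields are not documented here, so the code simply returns whatever it parses. `endpoint` and `backend_status` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # default backend port from this README


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def backend_status(base_url: str = BASE_URL, timeout: float = 5.0):
    """Return the parsed /api/status body, or None if the backend is unreachable.

    Assumption: the endpoint returns JSON. Network errors (URLError is an
    OSError subclass) and invalid JSON (a ValueError subclass) both yield None.
    """
    url = endpoint("/api/status", base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    status = backend_status()
    print(status if status is not None else "Backend not reachable")
```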

Text-to-Video Generation

import requests

# Generation can take minutes, so set an explicit timeout (600 s is an assumed value).
response = requests.post(
    "http://localhost:8081/api/generate/text",
    json={
        "prompt": "A cyberpunk city street at night with neon lights reflecting on wet pavement",
        "negative_prompt": "warped, distorted, flickering, jittery, low quality, blurry, artifacts",
        "num_frames": 49,
        "width": 480,
        "height": 480,
        "num_inference_steps": 50,
        "guidance_scale": 6.0,
        "watermark_strength": 0.3,
    },
    timeout=600,
)
# Raise on an HTTP error instead of writing an error body to output.mp4.
response.raise_for_status()

with open("output.mp4", "wb") as f:
    f.write(response.content)

Image-to-Video Generation

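import requests

# Upload the source image as multipart form data alongside the generation settings.
with open("input_image.jpg", "rb") as img:
    response = requests.post(
        "http://localhost:8081/api/generate/image",
        files={"file": img},
        data={
            "prompt": "Gentle motion, cinematic camera movement, atmospheric",
            "num_frames": 49,
            "width": 480,
            "height": 480,
            "num_inference_steps": 50,
            "guidance_scale": 6.0,
            "watermark_strength": 0.3,
        },
        timeout=600,  # assumed value; generation can take minutes
    )
# Raise on an HTTP error instead of writing an error body to animated.mp4.
response.raise_for_status()

with open("animated.mp4", "wb") as f:
    f.write(response.content)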
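The request examples above write `response.content` to disk as-is. A light sanity check before saving can catch a body that is not a video at all, such as a JSON error message. The sketch below uses a heuristic: MP4 (ISO base media) files normally begin with a box whose type field, at byte offsets 4 to 8, is `ftyp`. `looks_like_mp4` and `save_video` are hypothetical helpers, not part of this project.

```python
def looks_like_mp4(data: bytes) -> bool:
    """Heuristic check: bytes 4..8 of a typical MP4 file spell 'ftyp'."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


def save_video(data: bytes, path: str) -> None:
    """Write the bytes to disk only if they pass the MP4 heuristic."""
    if not looks_like_mp4(data):
        raise ValueError("response body does not look like an MP4 file")
    with open(path, "wb") as f:
        f.write(data)


# A minimal fake header passes the check; a JSON error body does not.
sample = b"\x00\x00\x00\x18ftypisom" + b"\x00" * 16
print(looks_like_mp4(sample), looks_like_mp4(b'{"detail": "error"}'))  # True False
```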

💧 QWatermark System

The QWatermark (Quality Watermark) system imprints a configurable assurance marker on every generated video.

| Parameter | Description | Default |
| --- | --- | --- |
| Text | Watermark text | "LEGION" |
| Position | Placement on frame | bottom-right |
| Font Size | Text size | 36 |
| Opacity | Transparency | 0.3 |
| Strength | Overall intensity | range: 0.0 (disabled) to 1.0 (full) |
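One way to picture these settings is as a small validated configuration object. The sketch below is hypothetical and is not the project's implementation: `WatermarkConfig`, `bottom_right_origin`, and the 16-pixel margin are assumptions, and the `strength` default is the value used in this README's examples, not a documented default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkConfig:
    """Defaults mirror the table above; the class itself is hypothetical."""
    text: str = "LEGION"
    position: str = "bottom-right"
    font_size: int = 36
    opacity: float = 0.3
    strength: float = 0.3  # taken from the README examples, not a documented default

    def __post_init__(self):
        # Both values are fractions and must stay within [0.0, 1.0].
        for name in ("opacity", "strength"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")

    @property
    def enabled(self) -> bool:
        # A strength of 0.0 disables the watermark.
        return self.strength > 0.0


def bottom_right_origin(frame_w, frame_h, text_w, text_h, margin=16):
    """Top-left pixel of the text box for bottom-right placement (margin is assumed)."""
    return frame_w - text_w - margin, frame_h - text_h - margin


cfg = WatermarkConfig()
print(cfg.enabled, bottom_right_origin(480, 480, 120, 36))  # True (344, 428)
```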

🤗 HuggingFace

Model Weights

The model is available as a complete Diffusers pipeline on HuggingFace Hub. You can load it directly using the Diffusers library:

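from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the pipeline in half precision and move it to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "deathlegionteam/LEGION-Video-Gen",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
# Reduce peak VRAM use at some cost in speed.
pipe.vae.enable_tiling()
pipe.enable_attention_slicing()

# Generate video
video_frames = pipe(
    prompt="A serene mountain lake at sunset",
    num_frames=49,
    width=480,
    height=480,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

# Save the frames as an MP4 (fps=8 is an assumed value, not taken from this README).
export_to_video(video_frames, "output.mp4", fps=8)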

🖥️ Project Structure

/app/video_generation_pipeline_1006/
├── inference.py           # Core generation class (LegionVideoGenerator)
├── backend/
│   └── main.py            # FastAPI backend (port 8081)
├── frontend/
│   ├── app.py             # Gradio frontend (port 8080)
│   └── streamlit_app.py   # Streamlit frontend
├── models/
│   ├── t2v/               # T2V model weights (safetensors format)
│   └── i2v/               # I2V model directory
├── outputs/               # Generated videos
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── .space/                # HuggingFace Space configuration

🎬 Example Prompts

Text-to-Video

Prompt Style
"A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality" Nature
"A cyberpunk city street at night with neon lights reflecting on wet pavement, flying cars, cinematic, dramatic lighting" Sci-Fi
"A majestic eagle soaring through misty mountain peaks, golden hour lighting, slow motion, National Geographic quality" Wildlife
"An astronaut floating in space with Earth in the background, stars twinkling, cinematic, hyperrealistic" Space
"A cozy medieval tavern interior with fireplace, warm lighting, people chatting, fantasy RPG aesthetic" Fantasy

Image-to-Video

| Prompt | Motion Effect |
| --- | --- |
| "Gentle motion, cinematic camera pan, atmospheric" | Camera movement |
| "Flowing water, leaves rustling in the wind, peaceful" | Nature animation |
| "Slow zoom in, dramatic reveal, cinematic lighting" | Zoom effect |
| "Character breathing gently, subtle movement, portrait" | Portrait animation |

📊 Performance

| Hardware | Resolution | Frames | Steps | Time |
| --- | --- | --- | --- | --- |
| RTX 4090 (24GB) | 480p | 49 | 50 | ~2-3 min |
| A100 (80GB) | 480p | 49 | 50 | ~1-2 min |
| CPU (16+ cores) | N/A | Mock | - | ~20-30 sec |
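Dividing the GPU timings above by the 50 denoising steps gives a rough per-step cost. The sketch assumes the whole reported time is spent in the denoising loop. That overstates the per-step figure, because the totals presumably also include model loading, VAE decoding, and video encoding. `seconds_per_step` is a hypothetical helper.

```python
def seconds_per_step(total_minutes: float, steps: int = 50) -> float:
    """Upper-bound estimate of seconds per denoising step."""
    return total_minutes * 60 / steps


# (low, high) wall-clock minutes taken from the table above.
timings = {"RTX 4090": (2, 3), "A100": (1, 2)}

for name, (low, high) in timings.items():
    print(f"{name}: {seconds_per_step(low):.1f}-{seconds_per_step(high):.1f} s/step")
# RTX 4090: 2.4-3.6 s/step
# A100: 1.2-2.4 s/step
```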

📝 Notes

  • GPU Required for Real Inference: The 8.3B-parameter model requires ~16GB of VRAM for FP16 inference. Without a GPU, the system runs in mock mode.
  • Disk Space: The full T2V model weights are approximately 13GB. An additional I2V variant would add roughly another 13GB.
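As a back-of-the-envelope check on the VRAM note, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch counts weights only and uses decimal gigabytes; activations, the text encoder's working memory, and framework overhead come on top. `weight_gb` is a hypothetical helper.

```python
PARAMS = 8.3e9  # parameter count stated in this README
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gb(params: float, dtype: str) -> float:
    """Size of the weights in decimal gigabytes for the given precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(PARAMS, dtype):.1f} GB")
# fp32: 33.2 GB
# fp16: 16.6 GB
```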

📜 License

This project is licensed under Apache 2.0.

⚔️ LEGION VIDEO GENERATION
Built with ❤️ for the open-source AI community