	---
	license: apache-2.0
	library_name: diffusers
	tags:
	- text-to-video
	- image-to-video
	- video-generation
	- diffusers
	pipeline_tag: text-to-video
	inference: true
	base_model: deathlegionteam/LEGION-Video-Gen
	widget:
	- text: "A serene mountain lake at sunset with colorful clouds reflecting on the water"
	---

	# ⚔️ LEGION VIDEO GENERATION — The Ultimate AI Video Engine

	<p align="center">
	<strong>State-of-the-art video generation with 8.3B parameters</strong><br>
	Text-to-Video · Image-to-Video · QWatermark System
	</p>

	<p align="center">
	<img src="https://img.shields.io/badge/Params-8.3B-blue" alt="Parameters">
	<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
	<img src="https://img.shields.io/badge/GPU-Recommended-red" alt="GPU">
	<a href="https://huggingface.co/deathlegionteam/LEGION-Video-Gen"><img src="https://img.shields.io/badge/🤗%20HuggingFace-LEGION--Video--Gen-blue" alt="HuggingFace"></a>
	</p>

	## 📋 Table of Contents

	- [✨ Features](#-features)
	- [🚀 Quick Start](#-quick-start)
	- [🌐 API Documentation](#-api-documentation)
	- [💧 QWatermark System](#-qwatermark-system)
	- [🤗 HuggingFace](#-huggingface)
	- [🖥️ Project Structure](#️-project-structure)
	- [🎬 Example Prompts](#-example-prompts)
	- [📊 Performance](#-performance)
	- [📝 Notes](#-notes)
	- [📜 License](#-license)

	## ✨ Features

	- 🎬 Text-to-Video Generation — Create videos from any text prompt with cinematic quality
	- 🖼️ Image-to-Video Generation — Animate still images with controlled motion
	- 💧 QWatermark System — Configurable semi-transparent quality assurance watermark with position, size, opacity, and text controls
	- 🌐 Web Application — Full Gradio UI with dark theme and FastAPI backend
	- 📡 REST API — Programmatic video generation via HTTP endpoints
	- 🛡️ Graceful Fallback — Mock generation mode when no GPU is available

	## 🚀 Quick Start

	### Prerequisites

	- GPU (Recommended): NVIDIA GPU with 16GB+ VRAM (RTX 4090, A100, H100)
	- CPU (Fallback): Works with mock generation mode (test pattern videos)
	- Python 3.10+
	- ~30GB free disk space (model weights)
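
The prerequisites above can be checked before installing anything. The following is a minimal sketch that uses the standard library plus an optional `torch` import; the helper name `check_prerequisites` and the keys of the returned dictionary are illustrative and not part of this repository.

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 10), min_free_gb=30):
    """Report whether the host meets the prerequisites listed in this README.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    try:
        import torch  # optional: only needed for real (non-mock) inference
        has_cuda = torch.cuda.is_available()
    except (ImportError, OSError):
        has_cuda = False
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "disk_ok": free_gb >= min_free_gb,
        "free_gb": round(free_gb, 1),
        # Without CUDA, the README says generation falls back to mock mode.
        "mode": "gpu" if has_cuda else "mock",
    }
```

Calling `check_prerequisites()` from the directory where the weights will be stored returns a dictionary such as `{"python_ok": ..., "disk_ok": ..., "free_gb": ..., "mode": ...}`. It does not check the amount of VRAM.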

	### Installation

	```bash
	# Clone the repository
	git clone https://huggingface.co/deathlegionteam/LEGION-Video-Gen
	cd LEGION-Video-Gen

	# Create virtual environment
	python3 -m venv venv
	source venv/bin/activate

	# Install dependencies
	pip install --upgrade pip
	pip install -r requirements.txt

	# Verify installation
	python3 -c "import torch, diffusers, gradio, fastapi; print('OK')"
	```

	### Quick Start — Generate Your First Video

	```python
	from inference import LegionVideoGenerator

	generator = LegionVideoGenerator()
	video_path = generator.generate_from_text(
	    prompt="A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality",
	    num_frames=49,
	    width=480,
	    height=480,
	    num_inference_steps=50,
	    guidance_scale=6.0,
	    watermark_strength=0.3,
	)
	print(f"Video saved to: {video_path}")
	```

	### Starting the Web UI

	```bash
	# Start the API backend
	python3 backend/main.py &

	# Start the Gradio frontend
	python3 frontend/app.py

	# Open http://localhost:8080 in your browser
	```
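
Because the backend is started in the background, the frontend may come up before the API is accepting connections. If that matters in your setup (whether the frontend contacts the backend at startup is an assumption here), a readiness check along these lines could be run between the two commands. `wait_for_port` is a hypothetical helper, not part of this repository; port 8081 is the backend default stated in this README.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8081, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A successful TCP connection only shows that the port is open, not that the model has finished loading.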

	## 🌐 API Documentation

	### REST API Endpoints

	The backend runs on port 8081 by default.

	| Method | Endpoint | Description |
	|--------|----------|-------------|
	| `GET` | `/api/status` | Health check with model and device info |
	| `POST` | `/api/generate/text` | Generate video from text prompt |
	| `POST` | `/api/generate/image` | Generate video from image + text prompt |
	| `GET` | `/` | API root with endpoint listing |
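
As an illustration of the health check, the sketch below polls `/api/status` with the standard library. It assumes the endpoint responds with a JSON body; the field names are not documented here, so the parsed dictionary is returned untouched. `status_url` and `get_status` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request


def status_url(base_url="http://localhost:8081"):
    """Build the health-check URL from the backend base URL."""
    return base_url.rstrip("/") + "/api/status"


def get_status(base_url="http://localhost:8081", timeout=5.0):
    """Return the parsed JSON from /api/status, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers connection failures, HTTP errors, and non-JSON bodies.
        return None
```

A `None` result means the backend is not reachable or did not answer with JSON; anything else is whatever the server reported.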

	### Text-to-Video Generation

	```python
	import requests

	# Request a clip from the local backend (port 8081 by default).
	response = requests.post(
	    "http://localhost:8081/api/generate/text",
	    json={
	        "prompt": "A cyberpunk city street at night with neon lights reflecting on wet pavement",
	        "negative_prompt": "warped, distorted, flickering, jittery, low quality, blurry, artifacts",
	        "num_frames": 49,
	        "width": 480,
	        "height": 480,
	        "num_inference_steps": 50,
	        "guidance_scale": 6.0,
	        "watermark_strength": 0.3,
	    },
	)
	# Fail loudly on HTTP errors instead of saving an error body as a video.
	response.raise_for_status()

	with open("output.mp4", "wb") as f:
	    f.write(response.content)
	```

	### Image-to-Video Generation

	```python
	import requests

	# Send the request inside the `with` block so the image file is still open during upload.
	with open("input_image.jpg", "rb") as img:
	    response = requests.post(
	        "http://localhost:8081/api/generate/image",
	        files={"file": img},
	        data={
	            "prompt": "Gentle motion, cinematic camera movement, atmospheric",
	            "num_frames": 49,
	            "width": 480,
	            "height": 480,
	            "num_inference_steps": 50,
	            "guidance_scale": 6.0,
	            "watermark_strength": 0.3,
	        },
	    )
	# Fail loudly on HTTP errors instead of saving an error body as a video.
	response.raise_for_status()

	with open("animated.mp4", "wb") as f:
	    f.write(response.content)
	```
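
Both examples write whatever the server returns straight to an `.mp4` file. As a small safeguard, the bytes can be checked for the MP4 `ftyp` box before saving. This sketch assumes the endpoints return a raw MP4 body, as the examples above imply; `looks_like_mp4` and `save_video` are hypothetical helper names.

```python
from pathlib import Path


def looks_like_mp4(data: bytes) -> bool:
    """Heuristic check: MP4 files normally begin with an 'ftyp' box,
    so bytes 4..8 of the file hold the ASCII tag b'ftyp'."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


def save_video(data: bytes, path: str) -> Path:
    """Write data to path only if it looks like an MP4; raise otherwise."""
    if not looks_like_mp4(data):
        raise ValueError("response body does not look like an MP4 file")
    out = Path(path)
    out.write_bytes(data)
    return out
```

With this in place, `save_video(response.content, "output.mp4")` could replace the final `with open(...)` block in either example.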

	## 💧 QWatermark System

	The QWatermark (Quality Watermark) system overlays a configurable, semi-transparent text marker on generated videos. Setting the strength to 0.0 disables it.

	| Parameter | Description | Default |
	|-----------|-------------|---------|
	| Text | Watermark text | "LEGION" |
	| Position | Placement on frame | bottom-right |
	| Font Size | Text size | 36 |
	| Opacity | Transparency | 0.3 |
	| Strength | Overall intensity | Range: 0.0 (disabled) to 1.0 (full) |
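
To make these parameters concrete, the sketch below shows one way a semi-transparent overlay could be positioned and blended. It is not the repository's implementation: the position names other than `bottom-right`, the `margin` value, and the assumption that the effective alpha equals opacity multiplied by strength are all illustrative.

```python
def watermark_origin(frame_w, frame_h, mark_w, mark_h,
                     position="bottom-right", margin=16):
    """Top-left pixel (x, y) for a mark of size mark_w x mark_h.

    position is a corner name such as "top-left" or "bottom-right".
    """
    vertical, horizontal = position.split("-")
    x = margin if horizontal == "left" else frame_w - mark_w - margin
    y = margin if vertical == "top" else frame_h - mark_h - margin
    return x, y


def blend_channel(frame_value, mark_value, opacity=0.3, strength=1.0):
    """Alpha-blend one 0-255 channel value.

    Assumption: effective alpha = opacity * strength, clamped to [0, 1].
    """
    alpha = max(0.0, min(1.0, opacity * strength))
    return round((1.0 - alpha) * frame_value + alpha * mark_value)
```

Under this assumption a strength of 0.0 leaves every pixel unchanged, which matches the "disabled" end of the range in the table.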

	## 🤗 HuggingFace

	- Model Repository: [deathlegionteam/LEGION-Video-Gen](https://huggingface.co/deathlegionteam/LEGION-Video-Gen)
	- Space (Live Demo): [deathlegionteam/LEGION-Video-Gen-Space](https://huggingface.co/spaces/deathlegionteam/LEGION-Video-Gen-Space)

	### Model Weights

	The model is available as a complete Diffusers pipeline on HuggingFace Hub. You can load it directly using the Diffusers library:

	```python
	from diffusers import DiffusionPipeline
	import torch

	pipe = DiffusionPipeline.from_pretrained(
	    "deathlegionteam/LEGION-Video-Gen",
	    torch_dtype=torch.float16,
	)
	pipe = pipe.to("cuda")
	pipe.vae.enable_tiling()
	pipe.enable_attention_slicing()

	# Generate video
	video_frames = pipe(
	    prompt="A serene mountain lake at sunset",
	    num_frames=49,
	    width=480,
	    height=480,
	    num_inference_steps=50,
	    guidance_scale=6.0,
	).frames[0]
	```
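
The frames returned by the pipeline still have to be written to a file. Diffusers provides `export_to_video` in `diffusers.utils` for this, and the sketch below wraps it. The frame rate is an assumption, since this README does not state the model's native fps, and `clip_seconds` and `save_frames` are hypothetical helper names.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback duration of a clip with num_frames frames at fps frames/second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def save_frames(video_frames, path="output.mp4", fps=8):
    """Encode frames (PIL images or NumPy arrays) to a video file.

    fps=8 is an assumed value, not one documented for this model.
    """
    # Imported lazily so clip_seconds works without diffusers installed.
    from diffusers.utils import export_to_video

    return export_to_video(video_frames, path, fps=fps)
```

With the 49 frames used throughout this README, `clip_seconds(49, 8)` gives 6.125 seconds. Depending on the Diffusers version, `export_to_video` needs either `imageio` with ffmpeg support or OpenCV installed.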

	## 🖥️ Project Structure

	```
	/app/video_generation_pipeline_1006/
	├── inference.py              # Core generation class (LegionVideoGenerator)
	├── backend/
	│   └── main.py               # FastAPI backend (port 8081)
	├── frontend/
	│   ├── app.py                # Gradio frontend (port 8080)
	│   └── streamlit_app.py      # Streamlit frontend
	├── models/
	│   ├── t2v/                  # T2V model weights (safetensors format)
	│   └── i2v/                  # I2V model directory
	├── outputs/                  # Generated videos
	├── requirements.txt          # Python dependencies
	├── README.md                 # This file
	└── .space/                   # HuggingFace Space configuration
	```

	## 🎬 Example Prompts

	### Text-to-Video

	| Prompt | Style |
	|--------|-------|
	| "A serene mountain lake at sunset with colorful clouds reflecting on the water, gentle ripples, cinematic quality" | Nature |
	| "A cyberpunk city street at night with neon lights reflecting on wet pavement, flying cars, cinematic, dramatic lighting" | Sci-Fi |
	| "A majestic eagle soaring through misty mountain peaks, golden hour lighting, slow motion, National Geographic quality" | Wildlife |
	| "An astronaut floating in space with Earth in the background, stars twinkling, cinematic, hyperrealistic" | Space |
	| "A cozy medieval tavern interior with fireplace, warm lighting, people chatting, fantasy RPG aesthetic" | Fantasy |

	### Image-to-Video

	| Prompt | Motion Effect |
	|--------|---------------|
	| "Gentle motion, cinematic camera pan, atmospheric" | Camera movement |
	| "Flowing water, leaves rustling in the wind, peaceful" | Nature animation |
	| "Slow zoom in, dramatic reveal, cinematic lighting" | Zoom effect |
	| "Character breathing gently, subtle movement, portrait" | Portrait animation |

	## 📊 Performance

	| Hardware | Resolution | Frames | Steps | Time |
	|----------|------------|--------|-------|------|
	| RTX 4090 (24GB) | 480p | 49 | 50 | ~2-3 min |
	| A100 (80GB) | 480p | 49 | 50 | ~1-2 min |
	| CPU (16+ cores) | N/A | Mock | N/A | ~20-30 sec |

	## 📝 Notes

	- GPU Required for Real Inference: The 8.3B-parameter model needs ~16GB of VRAM for FP16 inference. Without a GPU, the system runs in mock mode and produces test-pattern videos.
	- Disk Space: The full T2V model weights are approximately 13GB. An I2V variant would add roughly another 13GB.
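
As a rough cross-check of the memory figure, weight storage scales with parameter count times bytes per parameter. The sketch below covers weights only and ignores activations and other runtime buffers; on-disk size can differ from this estimate, for instance if components are stored at different precisions. `weight_gigabytes` is a hypothetical helper name.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gigabytes(num_params: int, dtype: str = "fp16") -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9
```

For 8.3B parameters, `weight_gigabytes(8_300_000_000)` comes to about 16.6 GB in FP16, in line with the ~16GB VRAM figure above.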

	## 📜 License

	This project is licensed under Apache 2.0.

	<p align="center">
	<strong>⚔️ LEGION VIDEO GENERATION</strong><br>
	Built with ❤️ for the open-source AI community
	</p>