# HY-WorldPlay FP8 Quantized (48GB GPU Ready)
HY-WorldPlay (8B Dense DiT, 72GB VRAM at BF16) compressed to a 37.4GB peak via:

- Native FP8 weights (`float8_e4m3fn`, per-tensor scale): 32GB → 8GB (4×)
- turbo3 V cache compression (PolarQuant 3-bit): applied at runtime, no pre-saved data needed
Successfully runs on a single RTX 4090 48GB or L40S 48GB (SM89 required for FP8).
## Results
| Configuration | GPU | Peak VRAM | Status |
|---|---|---|---|
| BF16 baseline | A800 80GB | 73.8 GB | ✅ |
| BF16 baseline | RTX 4090 48GB | OOM (46.5 GB) | ❌ |
| FP8 + turbo3 | RTX 4090 48GB | 37.4 GB | ✅ |
## Inference Speed (v3: `torch._scaled_mm` + SageAttention)
| Chunk | Time/step |
|---|---|
| 0 | ~0.86s |
| 4 | ~2.4s |
| 7 | ~2.8s |
| Total | 196.8s (4 steps × 8 chunks) |
## Files
- `diffusion_pytorch_model.fp8.safetensors`: FP8 quantized transformer weights (8GB)
- `scripts/native_fp8_patch.py`: FP8 Linear layer with `torch._scaled_mm`
- `scripts/turbo3_integration.py`: V cache PolarQuant compression (GPU optimized)
- `scripts/run_fp8_turbo3_gpu.py`: inference wrapper
- `scripts/run_fp8_turbo3_gpu.sh`: one-click launch script
- `scripts/batch_inference.py`: batch inference with random WASD poses
- `videos/`: generated video samples
## Usage

### Requirements
- GPU: SM89 (RTX 4090 / L40S) with ≥ 48GB VRAM
- PyTorch ≥ 2.1 with FP8 support
- HY-WorldPlay codebase
- SageAttention (optional, 1.8x attention speedup)
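Because the FP8 `torch._scaled_mm` path needs SM89, it can help to fail fast with a capability check. A sketch; `supports_native_fp8` is an illustrative name, not part of the repo:

```python
import torch

def supports_native_fp8() -> bool:
    # torch._scaled_mm's FP8 kernels require compute capability >= 8.9
    # (Ada-generation parts such as the RTX 4090 and L40S).
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)
```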
### Quick Start
```bash
# 1. Clone HY-WorldPlay
git clone https://github.com/Tencent/HunyuanVideo.git
cd HunyuanVideo

# 2. Download this repo's FP8 weights
# Place diffusion_pytorch_model.fp8.safetensors in your model directory

# 3. Run inference
bash scripts/run_fp8_turbo3_gpu.sh
```
### Loading FP8 Weights
```python
import safetensors.torch
import torch

# Load the FP8 quantized weights
state_dict = safetensors.torch.load_file("diffusion_pytorch_model.fp8.safetensors")

# Weights with dtype float8_e4m3fn are quantized; each has a corresponding
# *_scale tensor containing its per-tensor scale.
# Dequantize: weight_bf16 = fp8_weight.to(torch.bfloat16) * weight_scale
```
## Quality
| Optimization | Cosine Similarity | Verified |
|---|---|---|
| FP8 weights | > 0.999 | ✅ |
| V cache turbo3 (3-bit) | 0.983 | ✅ (A800 real KV cache) |
| FP8 + turbo3 combined | end-to-end video generated | ✅ |
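The similarity metric in the table above can be computed per tensor with a small helper (a sketch; `cosine_sim` is an illustrative name, and the repo's own verification harness may differ):

```python
import torch

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> float:
    # Compare the two tensors as flat vectors in float32;
    # 1.0 means identical direction, i.e. no quantization drift.
    return torch.nn.functional.cosine_similarity(
        a.flatten().float(), b.flatten().float(), dim=0
    ).item()
```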
## Acknowledgments
Based on Tencent-Hunyuan/HY-WorldPlay (Apache 2.0).