Wan2.2 Benchmark

Executive Summary

This report presents a comprehensive performance evaluation of the Athena framework compared to the baseline LightX2V framework. The benchmarks were conducted using the Wan2.2-TI2V-5B model on NVIDIA H100 hardware.


🎯 Test Environment & Versioning

Hardware & Settings

| Parameter | Value |
| --- | --- |
| Hardware | NVIDIA H100 |
| Model | Wan2.2-TI2V-5B |
| Precision | torch.bfloat16 |
| Inference Steps | 50 |
| Resolution | 704 × 1280 (720p) |
| FPS | 24 |
| CFG | Enabled |

Software Versioning

To ensure reproducibility, the following specific commits were used for this benchmark:

| Framework | Branch / Tag | Commit |
| --- | --- | --- |
| Athena | main | f676ae6 |
| LightX2V | main | 33f0f67 |

πŸ† Performance Benchmarks

📊 We compared the iteration speed (seconds per iteration, lower is better) of Athena and LightX2V across four Context Parallel (CP) configurations.

| Configuration | Frames | LightX2V (s/it) | Athena (s/it) | Speedup |
| --- | --- | --- | --- | --- |
| CP1 | 121 | 1.928 | 1.69 | 1.14x 🚀 |
| CP2 | 121 | 1.197 | 1.06 | 1.13x 🚀 |
| CP4 | 241 | 1.767 | 1.32 | 1.34x 🚀 |
| CP8 | 241 | 1.507 | 1.35 | 1.12x 🚀 |
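The Speedup column can be rechecked from the two timing columns. The sketch below assumes speedup is defined as the LightX2V time divided by the Athena time, and that one "iteration" corresponds to one of the 50 inference steps; the helper names are illustrative and not part of either framework. The time estimate covers the denoising loop only.

```python
# Timings (seconds per iteration) copied from the benchmark table.
RESULTS = {
    "CP1": {"frames": 121, "lightx2v": 1.928, "athena": 1.69},
    "CP2": {"frames": 121, "lightx2v": 1.197, "athena": 1.06},
    "CP4": {"frames": 241, "lightx2v": 1.767, "athena": 1.32},
    "CP8": {"frames": 241, "lightx2v": 1.507, "athena": 1.35},
}
INFERENCE_STEPS = 50  # from the Hardware & Settings table


def speedup(baseline_s_per_it: float, candidate_s_per_it: float) -> float:
    """Speedup of the candidate over the baseline (>1 means faster)."""
    return baseline_s_per_it / candidate_s_per_it


def denoising_seconds(s_per_it: float, steps: int = INFERENCE_STEPS) -> float:
    """Approximate time in the denoising loop only.

    Assumption: excludes text encoding, VAE decoding, and warm-up.
    """
    return s_per_it * steps


if __name__ == "__main__":
    for name, row in RESULTS.items():
        ratio = speedup(row["lightx2v"], row["athena"])
        saved = denoising_seconds(row["lightx2v"]) - denoising_seconds(row["athena"])
        print(f"{name}: {ratio:.2f}x, ~{saved:.1f} s saved over {INFERENCE_STEPS} steps")
```

Rounded to two decimals, the ratios agree with the reported Speedup values for all four configurations.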

💡 Reproduction Guide

To reproduce the results presented in this report, follow the steps below using the specified commit hashes.

Setup

git clone https://github.com/world-sim-dev/athena
cd athena
git checkout f676ae6
pip install -r requirements.txt

# Clone and install LightX2V (for baseline comparison)
git clone https://github.com/ModelTC/LightX2V
cd LightX2V
git checkout 33f0f67
pip install -r requirements.txt
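Before running anything, it may be worth confirming that both checkouts really sit on the pinned commits. The following is a minimal sketch, not part of either repository: the helper names are hypothetical, and the repository paths passed to `check_pins` are assumptions about where the clones were placed. It relies only on `git rev-parse HEAD`.

```python
import subprocess

# Commit hashes pinned in the Software Versioning table.
PINNED = {"athena": "f676ae6", "LightX2V": "33f0f67"}


def matches_pin(full_hash: str, pinned_prefix: str) -> bool:
    """True if a full commit hash starts with the pinned short hash."""
    return full_hash.strip().lower().startswith(pinned_prefix.lower())


def head_commit(repo_dir: str) -> str:
    """Return the full hash of HEAD for the repository at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def check_pins(repo_dirs: dict) -> dict:
    """Map each repo name to whether its checkout matches the pinned commit."""
    return {
        name: matches_pin(head_commit(path), PINNED[name])
        for name, path in repo_dirs.items()
    }
```

For example, `check_pins({"athena": "./athena", "LightX2V": "./LightX2V"})` returns a dictionary of booleans, one per repository; adjust the paths to match the local layout.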

Running Benchmarks

For Athena, run:

bash ./scripts/run_wan2_2_ti2v_i2v.sh

For LightX2V, clone the benchmark scripts from the "Benchmark for LightX2V" gist and run:

git clone https://gist.github.com/wtr0504/629388f17ed38d1c12d5ef5c25a15197
# git clone places the gist in a directory named after its ID
cd 629388f17ed38d1c12d5ef5c25a15197
bash run_wan.sh

🔎 MagiCompiler Optimization Methodology

Whole Graph Compilation

Constant Folding & Dead Code Elimination: MagiCompiler streamlines the computation graph prior to execution by evaluating constant subexpressions ahead of time and removing operations whose results are never used.

Coarse-grained Kernel Fusion

MagiCompiler aggregates multiple smaller operators into larger, fused kernels. Fewer kernel launches and fewer intermediate tensors written to and read back from GPU memory make this optimization critical for efficient GPU execution.

All-to-All Communication

MagiCompiler uses all_to_all_single (1 communication op per sync point), while LightX2V uses all_to_all × 3 (3 separate communication ops).
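To illustrate why one packed exchange can replace three separate ones, here is a single-process simulation in plain Python. It is a sketch of the general packing idea, not MagiCompiler's or LightX2V's actual code: the assumption that the three exchanged tensors are Q, K, and V, the equal chunk sizes, and all function names are illustrative. A real implementation would call a collective such as `torch.distributed.all_to_all_single` across GPUs instead of the `all_to_all` stand-in below.

```python
def all_to_all(send):
    """Simulate one all-to-all collective.

    send[src][dst] is the chunk rank `src` sends to rank `dst`.
    Returns recv, where recv[dst][src] is that same chunk.
    """
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


def make_chunks(tag, world, chunk_len):
    """Build labelled per-rank send buffers so results are easy to inspect."""
    return [
        [[f"{tag}{src}->{dst}:{i}" for i in range(chunk_len)] for dst in range(world)]
        for src in range(world)
    ]


def exchange_separately(q, k, v):
    """Three communication ops: one all-to-all per tensor."""
    outputs = tuple(all_to_all(t) for t in (q, k, v))
    return outputs, 3


def exchange_fused(q, k, v):
    """One communication op: pack q, k, v per destination, exchange, unpack."""
    world = len(q)
    n = len(q[0][0])  # assumption: every chunk has the same length
    packed = [
        [q[s][d] + k[s][d] + v[s][d] for d in range(world)] for s in range(world)
    ]
    recv = all_to_all(packed)  # the single collective
    q_out = [[chunk[:n] for chunk in row] for row in recv]
    k_out = [[chunk[n:2 * n] for chunk in row] for row in recv]
    v_out = [[chunk[2 * n:] for chunk in row] for row in recv]
    return (q_out, k_out, v_out), 1


if __name__ == "__main__":
    world, chunk_len = 4, 2
    q, k, v = (make_chunks(tag, world, chunk_len) for tag in "qkv")
    separate, ops_separate = exchange_separately(q, k, v)
    fused, ops_fused = exchange_fused(q, k, v)
    print("same result:", separate == fused)
    print("ops:", ops_separate, "vs", ops_fused)
```

Both paths deliver identical data to every rank. The fused path pays the fixed per-collective cost once per sync point instead of three times, in exchange for the extra pack and unpack steps.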