Hunyuan1.5 Benchmark

Executive Summary

This report compares the inference speed of the Athena framework against the baseline LightX2V framework. The benchmarks were conducted using the Hunyuan-1.5 model on NVIDIA H100 hardware. Across the tested Context Parallel configurations, Athena is 1.17x to 1.27x faster per iteration.


🎯 Test Environment & Versioning

Hardware & Settings

| Parameter | Value |
| --- | --- |
| Hardware | NVIDIA H100 |
| Model | Hunyuan-1.5 480p_t2v_distilled |
| Precision | torch.bfloat16 |
| Inference Steps | 20 |
| Resolution | 480p |
| FPS | 24 |
| CFG | Disabled |

Software Versioning

To ensure reproducibility, the following specific commits were used for this benchmark:

| Framework | Branch / Tag | Commit |
| --- | --- | --- |
| Athena | main | 5e6086b |
| LightX2V | main | 5573905 |

πŸ† Performance Benchmarks

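📊 We compared the iteration speed (seconds per iteration, lower is better) between Athena and LightX2V across four Context Parallel (CP) configurations. CP1 and CP2 were run with 121 frames, CP4 and CP8 with 241 frames. Speedup is the LightX2V time divided by the Athena time.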

| Configuration | Frames | LightX2V (s/it) | Athena (s/it) | Speedup |
| --- | --- | --- | --- | --- |
| CP1 | 121 | 2.42 | 2.06 | 1.17x 🚀 |
| CP2 | 121 | 1.38 | 1.13 | 1.22x 🚀 |
| CP4 | 241 | 2.25 | 1.85 | 1.22x 🚀 |
| CP8 | 241 | 1.28 | 1.01 | 1.27x 🚀 |
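
As a sanity check on the Speedup column, the following minimal sketch recomputes it from the reported s/it figures and estimates the time saved in the denoising loop. It assumes that one iteration corresponds to one of the 20 inference steps and that speedup is defined as baseline time over Athena time; the helper names (`speedup`, `denoising_seconds`, `summarize`) are illustrative and not part of either framework.

```python
# Benchmark figures copied from the table above:
# (configuration, frames, LightX2V s/it, Athena s/it).
RESULTS = [
    ("CP1", 121, 2.42, 2.06),
    ("CP2", 121, 1.38, 1.13),
    ("CP4", 241, 2.25, 1.85),
    ("CP8", 241, 1.28, 1.01),
]
INFERENCE_STEPS = 20  # from the Hardware & Settings table


def speedup(baseline_s_per_it: float, candidate_s_per_it: float) -> float:
    """Speedup of the candidate over the baseline; lower s/it is faster."""
    return baseline_s_per_it / candidate_s_per_it


def denoising_seconds(s_per_it: float, steps: int = INFERENCE_STEPS) -> float:
    """Time spent in the denoising loop only.

    Assumption: one iteration equals one inference step. Text encoding,
    VAE decoding, and model loading are not covered by the s/it figure.
    """
    return s_per_it * steps


def summarize(results=RESULTS):
    """Return one dict per configuration with the derived metrics."""
    rows = []
    for config, frames, lightx2v, athena in results:
        saved = denoising_seconds(lightx2v) - denoising_seconds(athena)
        rows.append({
            "config": config,
            "frames": frames,
            "speedup": round(speedup(lightx2v, athena), 2),
            "saved_s": round(saved, 1),
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(f"{row['config']} ({row['frames']} frames): "
              f"{row['speedup']:.2f}x, about {row['saved_s']:.1f} s saved "
              f"over {INFERENCE_STEPS} steps")
```

The rounded ratios reproduce the Speedup column. Note that configurations with different frame counts (121 vs. 241) are not directly comparable to each other; only the Athena vs. LightX2V comparison within a row is like-for-like.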

📹 Output Comparison

| Framework | Video Result |
| --- | --- |
| Athena | [video] |
| LightX2V | [video] |

💡 Reproduction Guide

To reproduce the results presented in this report, follow the steps below using the specified commit hashes.

Setup

git clone https://github.com/world-sim-dev/athena
cd athena
git checkout 5e6086b
pip install -r requirements.txt
pip install -r requirements-nodeps.txt
pip install -e ./pkgs/MagiCompiler --no-build-isolation --config-settings editable_mode=compat


# Clone and install LightX2V (for baseline comparison)
git clone https://github.com/ModelTC/LightX2V
cd LightX2V
git checkout 5573905
pip install -v .

Running Benchmarks

For Athena, run:

RESOLUTION=480p CFG_DISTILLED=true TASK=t2v CHECKPOINT_PATH=path/to/480p_t2v_distilled bash ./scripts/run_hunyuan.sh

For LightX2V, clone the benchmark scripts from the "Benchmark for LightX2V" gist and run:

git clone https://gist.github.com/wtr0504/d80bbebb7da1ef7b58f3e6faf1c68880
MODEL_PATH=path/to/HunyuanVideo-1.5 DISTILL_CKPT=path/to/480p_t2v_distilled/diffusion_pytorch_model.safetensors bash run_hunyuan.sh

🔎 MagiCompiler Optimization Methodology

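Whole Graph Compilation

Constant Folding & Dead Code Elimination: MagiCompiler streamlines the computation graph prior to execution by evaluating constant subexpressions ahead of time and removing operations whose results are never used.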
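
To make the two passes concrete, here is a minimal sketch of constant folding and dead code elimination on a toy graph. This is not MagiCompiler's implementation: the `Node` IR, the pass functions, and the example graph are all hypothetical and exist only to illustrate the general technique.

```python
import operator
from dataclasses import dataclass


# Hypothetical toy IR: a node is a constant, a runtime input,
# or a binary op applied to earlier nodes.
@dataclass(frozen=True)
class Node:
    name: str
    op: str              # "const", "input", "add", or "mul"
    args: tuple = ()     # names of operand nodes (for "add" / "mul")
    value: float = 0.0   # payload (for "const")


OPS = {"add": operator.add, "mul": operator.mul}


def constant_fold(nodes):
    """Replace ops whose operands are all constants with one const node."""
    folded = {}
    for n in nodes:  # nodes are assumed to be in topological order
        if n.op in OPS and all(folded[a].op == "const" for a in n.args):
            value = OPS[n.op](*(folded[a].value for a in n.args))
            n = Node(n.name, "const", (), value)
        folded[n.name] = n
    return list(folded.values())


def eliminate_dead_code(nodes, outputs):
    """Keep only the nodes reachable from the requested outputs."""
    by_name = {n.name: n for n in nodes}
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(by_name[name].args)
    return [n for n in nodes if n.name in live]


def optimize(nodes, outputs):
    """Fold first, so operands of folded nodes become dead and get removed."""
    return eliminate_dead_code(constant_fold(nodes), outputs)


def evaluate(nodes, outputs, inputs):
    """Reference interpreter, used to check that optimization keeps results."""
    env = {}
    for n in nodes:
        if n.op == "const":
            env[n.name] = n.value
        elif n.op == "input":
            env[n.name] = inputs[n.name]
        else:
            env[n.name] = OPS[n.op](*(env[a] for a in n.args))
    return [env[o] for o in outputs]


GRAPH = [
    Node("x", "input"),
    Node("two", "const", value=2.0),
    Node("three", "const", value=3.0),
    Node("scale", "mul", ("two", "three")),  # foldable: 2 * 3
    Node("y", "mul", ("x", "scale")),
    Node("unused", "add", ("x", "two")),     # dead: never reaches the output
]

if __name__ == "__main__":
    optimized = optimize(GRAPH, ["y"])
    print([n.name for n in optimized])
    print(evaluate(optimized, ["y"], {"x": 5.0}))
```

In this toy graph, folding turns `scale` into a constant, which in turn makes `two` and `three` unreachable, so the optimized graph shrinks from six nodes to three while producing the same output. Running the fold before the elimination pass is what lets the second pass pick up the newly orphaned constants.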

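Coarse-grained Kernel Fusion

MagiCompiler aggregates multiple smaller operators into larger, fused kernels. Fusion reduces the number of kernel launches and avoids writing intermediate results back to GPU memory between operators, which is why it matters for efficient execution on the GPU.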
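
The idea behind fusion can be sketched in plain Python. The example below is an illustration under simplifying assumptions, not MagiCompiler's algorithm: it greedily merges runs of adjacent elementwise ops into one function and counts each op as one "kernel launch". The `fuse_elementwise` and `run_pipeline` helpers and the example pipeline are hypothetical.

```python
from typing import Callable, List, Tuple

# Each op is (name, kind, fn). "elementwise" ops map one value to one value;
# any other kind (here a reduction) acts as a fusion barrier in this sketch.
Op = Tuple[str, str, Callable]


def fuse_elementwise(ops: List[Op]) -> List[Op]:
    """Greedily merge each run of adjacent elementwise ops into one fused op."""
    fused: List[Op] = []
    run: List[Op] = []

    def flush():
        if not run:
            return
        fns = [fn for _, _, fn in run]
        if len(run) > 1:
            name = "fused(" + "+".join(n for n, _, _ in run) + ")"
        else:
            name = run[0][0]

        def kernel(v, fns=fns):
            # One pass per element: intermediates stay in local variables.
            for fn in fns:
                v = fn(v)
            return v

        fused.append((name, "elementwise", kernel))
        run.clear()

    for op in ops:
        if op[1] == "elementwise":
            run.append(op)
        else:
            flush()
            fused.append(op)
    flush()
    return fused


def run_pipeline(ops: List[Op], data: list):
    """Execute ops in order; each op counts as one 'kernel launch'."""
    launches = 0
    for _, kind, fn in ops:
        data = [fn(v) for v in data] if kind == "elementwise" else fn(data)
        launches += 1
    return data, launches


PIPELINE: List[Op] = [
    ("scale", "elementwise", lambda v: v * 2.0),
    ("bias", "elementwise", lambda v: v - 3.0),
    ("relu", "elementwise", lambda v: max(v, 0.0)),
    ("sum", "reduction", lambda xs: [sum(xs)]),
    ("halve", "elementwise", lambda v: v / 2.0),
]

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(run_pipeline(PIPELINE, data))
    print(run_pipeline(fuse_elementwise(PIPELINE), data))
```

Here the three leading elementwise ops collapse into one fused op, so the pipeline goes from five launches to three with identical results. On a real GPU the benefit also comes from keeping intermediates in registers instead of global memory, which this CPU-side sketch can only hint at.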