## Hunyuan1.5 Benchmark

### Executive Summary

This report evaluates the performance of the **[Athena](https://github.com/world-sim-dev/athena)** framework against the baseline **[LightX2V](https://github.com/ModelTC/LightX2V)** framework. The benchmarks were run with the **[Hunyuan-1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5)** model on NVIDIA H100 hardware.

---

### 🎯 Test Environment & Versioning

#### Hardware & Settings

| Parameter       | Value                          |
| --------------- | ------------------------------ |
| Hardware        | NVIDIA H100                    |
| Model           | Hunyuan-1.5 480p_t2v_distilled |
| Precision       | torch.bfloat16                 |
| Inference Steps | 20                             |
| Resolution      | 480p                           |
| FPS             | 24                             |
| CFG             | Disabled                       |

#### Software Versioning

To ensure reproducibility, the following commits were used for this benchmark:

| Framework | Branch / Tag | Commit |
| --------- | ------------ | ------ |
| Athena    | main         | [5e6086b](https://github.com/world-sim-dev/athena/commit/5e6086b4dc2ab60bc4d44dbe39745b4354075121) |
| LightX2V  | main         | [5573905](https://github.com/ModelTC/LightX2V/commit/5573905f3f38d876d468b815f86d417a608975b6) |

### 🏆 Performance Benchmarks 📊

We compared iteration speed (seconds per iteration; lower is better) between Athena and LightX2V across four Context Parallel (CP) configurations. Speedup is LightX2V's s/it divided by Athena's s/it.

| Configuration | Frames | LightX2V (s/it) | Athena (s/it) | Speedup       |
| ------------- | ------ | --------------- | ------------- | ------------- |
| CP1           | 121    | 2.42            | **2.06**      | **1.17x** 🚀 |
| CP2           | 121    | 1.38            | **1.13**      | **1.22x** 🚀 |
| CP4           | 241    | 2.25            | **1.85**      | **1.22x** 🚀 |
| CP8           | 241    | 1.28            | **1.01**      | **1.27x** 🚀 |

---

### 📹 Output Comparison

| Framework | Video Result |
| --------- | ------------ |
| Athena    |              |
| LightX2V  |              |

### 💡 Reproduction Guide

To reproduce the results presented in this report, follow the steps below using the specified commit hashes.
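When reproducing, the Speedup column of the Performance Benchmarks table can be recomputed from your own measured s/it values as the ratio of the baseline time to the Athena time. The short sketch below does this for the numbers reported above; the `RESULTS` dictionary and the helper names are illustrative only and are not part of either framework.

```python
# Per-iteration latency (seconds per iteration) copied from the benchmark table.
# Replace these with your own measurements when reproducing.
RESULTS = {
    # config: (frames, lightx2v_s_per_it, athena_s_per_it)
    "CP1": (121, 2.42, 2.06),
    "CP2": (121, 1.38, 1.13),
    "CP4": (241, 2.25, 1.85),
    "CP8": (241, 1.28, 1.01),
}


def speedup(baseline_s_per_it: float, candidate_s_per_it: float) -> float:
    """Speedup of the candidate over the baseline; > 1.0 means the candidate is faster."""
    if baseline_s_per_it <= 0 or candidate_s_per_it <= 0:
        raise ValueError("s/it values must be positive")
    return baseline_s_per_it / candidate_s_per_it


def speedup_table(results):
    """Return {config: speedup rounded to two decimals}, matching the table's precision."""
    return {
        cfg: round(speedup(base, cand), 2)
        for cfg, (_frames, base, cand) in results.items()
    }


if __name__ == "__main__":
    for cfg, s in speedup_table(RESULTS).items():
        print(f"{cfg}: {s:.2f}x")
```

Because s/it is a time per step, dividing baseline time by candidate time (rather than the reverse) yields a number above 1.0 whenever Athena is faster, which is how the reported 1.17x to 1.27x figures are expressed.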
#### Setup

```bash
# Clone and install Athena
git clone https://github.com/world-sim-dev/athena
cd athena
git checkout 5e6086b
pip install -r requirements.txt
pip install -r requirements-nodeps.txt
pip install -e ./pkgs/MagiCompiler --no-build-isolation --config-settings editable_mode=compat
cd ..

# Clone and install LightX2V (for baseline comparison)
git clone https://github.com/ModelTC/LightX2V
cd LightX2V
git checkout 5573905
pip install -v .
```

#### Running Benchmarks

For Athena, run from the `athena` directory:

```bash
RESOLUTION=480p CFG_DISTILLED=true TASK=t2v CHECKPOINT_PATH=path/to/480p_t2v_distilled bash ./scripts/run_hunyuan.sh
```

For LightX2V, clone the scripts from [Benchmark for LightX2V](https://gist.github.com/wtr0504/d80bbebb7da1ef7b58f3e6faf1c68880) and run:

```bash
git clone https://gist.github.com/wtr0504/d80bbebb7da1ef7b58f3e6faf1c68880
cd d80bbebb7da1ef7b58f3e6faf1c68880
MODEL_PATH=path/to/HunyuanVideo-1.5 DISTILL_CKPT=path/to/480p_t2v_distilled/diffusion_pytorch_model.safetensors bash run_hunyuan.sh
```

### 🔎 MagiCompiler Optimization Methodology

**Whole Graph Compilation**

MagiCompiler compiles the model as a whole graph and applies constant folding and dead code elimination, streamlining the computation graph before execution.

**Coarse-grained Kernel Fusion**

MagiCompiler aggregates multiple small operators into larger fused kernels. Fusion generally reduces the number of kernel launches and the intermediate tensors written to and read back from GPU memory, which is critical for efficient execution on the GPU.
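The constant-folding and dead-code-elimination passes described under Whole Graph Compilation can be illustrated on a toy graph. This is a hypothetical sketch only: the `Node` IR and the `constant_fold`, `eliminate_dead_code`, and `evaluate` functions are invented for illustration and are not MagiCompiler's API or implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical toy IR node; a graph is a topologically ordered list of nodes."""
    name: str
    op: str           # "input", "const", "add", or "mul"
    args: Tuple = ()  # operand node names, or (value,) for "const"


def constant_fold(nodes: List[Node]) -> List[Node]:
    """Replace add/mul nodes whose operands are all constants with a const node."""
    consts: Dict[str, float] = {}
    out = []
    for n in nodes:
        if n.op == "const":
            consts[n.name] = n.args[0]
            out.append(n)
        elif n.op in ("add", "mul") and all(a in consts for a in n.args):
            a, b = (consts[x] for x in n.args)
            value = a + b if n.op == "add" else a * b
            consts[n.name] = value
            out.append(Node(n.name, "const", (value,)))
        else:
            out.append(n)
    return out


def eliminate_dead_code(nodes: List[Node], outputs: List[str]) -> List[Node]:
    """Keep only nodes whose results reach one of the graph outputs."""
    by_name = {n.name: n for n in nodes}
    live = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        node = by_name[name]
        if node.op not in ("input", "const"):  # leaves have no operand names
            stack.extend(node.args)
    return [n for n in nodes if n.name in live]


def evaluate(nodes: List[Node], inputs: Dict[str, float], output: str) -> float:
    """Interpret the toy graph to check that optimizations preserve results."""
    env: Dict[str, float] = {}
    for n in nodes:
        if n.op == "input":
            env[n.name] = inputs[n.name]
        elif n.op == "const":
            env[n.name] = n.args[0]
        else:
            a, b = (env[x] for x in n.args)
            env[n.name] = a + b if n.op == "add" else a * b
    return env[output]


# y = x * (2.0 * 3.0); "unused" never reaches the output.
GRAPH = [
    Node("x", "input"),
    Node("c1", "const", (2.0,)),
    Node("c2", "const", (3.0,)),
    Node("scale", "mul", ("c1", "c2")),
    Node("y", "mul", ("x", "scale")),
    Node("unused", "add", ("x", "c1")),
]

if __name__ == "__main__":
    optimized = eliminate_dead_code(constant_fold(GRAPH), outputs=["y"])
    for n in optimized:
        print(n)
    print(evaluate(optimized, {"x": 5.0}, "y"))
```

Here `scale = c1 * c2` is folded into the constant 6.0 ahead of time; afterwards `c1`, `c2`, and `unused` no longer feed the output and are dropped, while `y` still evaluates to 30.0 for `x = 5.0`. Running folding before elimination matters, since folding is what makes `c1` and `c2` dead. A compiler would then fuse the surviving elementwise chain into a single kernel (coarse-grained fusion), which this sketch does not model.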