## Wan2.2 Benchmark

### Executive Summary

This report compares the performance of the **[Athena](https://github.com/world-sim-dev/athena)** framework against the baseline **[LightX2V](https://github.com/ModelTC/LightX2V)** framework. All benchmarks were run with the **[Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI)** model on NVIDIA H100 hardware.

---

### 🎯 Test Environment & Versioning

#### Hardware & Settings

| Parameter       | Value              |
| --------------- | ------------------ |
| Hardware        | NVIDIA H100        |
| Model           | Wan2.2-TI2V-5B     |
| Precision       | torch.bfloat16     |
| Inference Steps | 50                 |
| Resolution      | 704 × 1280 (720p)  |
| FPS             | 24                 |
| CFG             | Enabled            |

#### Software Versioning

To ensure reproducibility, the following commits were used for this benchmark:

| Framework | Branch / Tag | Commit |
| --------- | ------------ | ------ |
| Athena    | main         | [f676ae6](https://github.com/world-sim-dev/athena/commit/f676ae64ad2fc581289d1c3ae5eb51c15ce76f1d) |
| LightX2V  | main         | [33f0f67](https://github.com/ModelTC/LightX2V/commit/33f0f67f4ecdff86b1db676d3e0786628cc31c7b) |

### 🏆 Performance Benchmarks 📊

We compared iteration time (seconds per iteration, lower is better) between Athena and LightX2V across four Context Parallel (CP) configurations. Speedup is the LightX2V time divided by the Athena time.

| Configuration | Frames | LightX2V (s/it) | Athena (s/it) | Speedup      |
| ------------- | ------ | --------------- | ------------- | ------------ |
| CP1           | 121    | 1.928           | **1.69**      | **1.14x** 🚀 |
| CP2           | 121    | 1.197           | **1.06**      | **1.13x** 🚀 |
| CP4           | 241    | 1.767           | **1.32**      | **1.34x** 🚀 |
| CP8           | 241    | 1.507           | **1.35**      | **1.12x** 🚀 |

---

### 💡 Reproduction Guide

To reproduce the results in this report, follow the steps below using the commit hashes listed above.

#### Setup

```bash
# Clone and install Athena
git clone https://github.com/world-sim-dev/athena
cd athena
git checkout f676ae6
pip install -r requirements.txt
cd ..  # return to the parent directory so LightX2V is not cloned inside athena

# Clone and install LightX2V (for baseline comparison)
git clone https://github.com/ModelTC/LightX2V
cd LightX2V
git checkout 33f0f67
pip install -r requirements.txt
```

#### Running Benchmarks

For Athena, run the following from the root of the `athena` checkout:

```bash
bash ./scripts/run_wan2_2_ti2v_i2v.sh
```

For LightX2V, clone the scripts from [Benchmark for LightX2V](https://gist.github.com/wtr0504/629388f17ed38d1c12d5ef5c25a15197) and run:

```bash
git clone https://gist.github.com/wtr0504/629388f17ed38d1c12d5ef5c25a15197
# git clone places the gist in a directory named after its ID
bash 629388f17ed38d1c12d5ef5c25a15197/run_wan.sh
```

### 🔎 MagiCompiler Optimization Methodology

**Whole Graph Compilation**

Constant folding and dead code elimination streamline the computation graph before execution.

**Coarse-grained Kernel Fusion**

MagiCompiler aggregates multiple smaller operators into larger, fused kernels. Fusion reduces kernel launch overhead and avoids writing intermediate results back to GPU memory between operators.

**All to All Communication**

MagiCompiler uses `all_to_all_single` (one communication op per sync point), whereas LightX2V uses three separate `all_to_all` calls (three communication ops per sync point).
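
To illustrate why one fused collective can stand in for three, the sketch below simulates an all-to-all exchange in a single Python process. It is not code from MagiCompiler, Athena, or LightX2V, and it does not call `torch.distributed`. `SimulatedGroup`, `make_send`, `exchange_separately`, and `exchange_fused` are hypothetical names, and the assumption that the three exchanged tensors share the same chunk layout (for example the Q, K, and V of an attention layer) is ours, not the source's.

```python
from typing import List

Chunk = List[float]
# send[src][dst] is the chunk that rank `src` sends to rank `dst`.
SendMatrix = List[List[Chunk]]


class SimulatedGroup:
    """Toy in-process stand-in for a context-parallel group (hypothetical).

    It only counts how many collectives were issued; no real communication.
    """

    def __init__(self, world_size: int) -> None:
        self.world_size = world_size
        self.num_collectives = 0

    def all_to_all(self, send: SendMatrix) -> SendMatrix:
        """Return recv, where recv[dst][src] == send[src][dst]."""
        self.num_collectives += 1
        n = self.world_size
        return [[send[src][dst] for src in range(n)] for dst in range(n)]


def make_send(world_size: int, chunk_len: int, base: int) -> SendMatrix:
    """Build distinguishable dummy data for one tensor."""
    return [
        [[float(base + 100 * src + 10 * dst + i) for i in range(chunk_len)]
         for dst in range(world_size)]
        for src in range(world_size)
    ]


def exchange_separately(group: SimulatedGroup, tensors: List[SendMatrix]) -> List[SendMatrix]:
    """One collective per tensor: three tensors cost three collectives."""
    return [group.all_to_all(t) for t in tensors]


def exchange_fused(group: SimulatedGroup, tensors: List[SendMatrix]) -> List[SendMatrix]:
    """Pack all tensors into one buffer, exchange once, then unpack."""
    n = group.world_size
    # Assumes every chunk of a given tensor has the same length.
    sizes = [len(t[0][0]) for t in tensors]

    # Pack: concatenate the chunks that share the same (src, dst) pair.
    packed: SendMatrix = []
    for src in range(n):
        row = []
        for dst in range(n):
            buf: Chunk = []
            for t in tensors:
                buf.extend(t[src][dst])
            row.append(buf)
        packed.append(row)

    recv = group.all_to_all(packed)  # the single collective

    # Unpack: slice each tensor's part back out of the received buffers.
    results = []
    offset = 0
    for size in sizes:
        results.append(
            [[recv[dst][src][offset:offset + size] for src in range(n)]
             for dst in range(n)]
        )
        offset += size
    return results


if __name__ == "__main__":
    tensors = [make_send(4, 2, base) for base in (0, 1000, 2000)]
    separate_group, fused_group = SimulatedGroup(4), SimulatedGroup(4)
    same = exchange_separately(separate_group, tensors) == exchange_fused(fused_group, tensors)
    print(same, separate_group.num_collectives, fused_group.num_collectives)
```

Both paths deliver identical data. The fused path pays for one collective per sync point plus local pack and unpack steps, which is the trade-off that a single `all_to_all_single` call makes against three `all_to_all` calls.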