## Wan2.2 Benchmark

### Executive Summary
This report presents a comprehensive performance evaluation of the **[Athena](https://github.com/world-sim-dev/athena)** framework compared to the baseline **[LightX2V](https://github.com/ModelTC/LightX2V)** framework. The benchmarks were conducted using the **[Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI)** model on NVIDIA H100 hardware.

---
### 🎯 Test Environment & Versioning
#### Hardware & Settings

| Parameter           | Value           |
| ------------------- | --------------  |
| Hardware            | NVIDIA H100     |
| Model               | Wan2.2-TI2V-5B  |
| Precision           | torch.bfloat16  |
| Inference Steps     | 50              |
| Resolution          | 704 × 1280 (720p) |
| FPS                 | 24              |
| CFG                 | Enabled         |

#### Software Versioning
To ensure reproducibility, the following specific commits were used for this benchmark:

| Framework | Branch / Tag | Commit |
| --------- | ------------ | ------ |
| Athena | main | [f676ae6](https://github.com/world-sim-dev/athena/commit/f676ae64ad2fc581289d1c3ae5eb51c15ce76f1d) |
| LightX2V | main | [33f0f67](https://github.com/ModelTC/LightX2V/commit/33f0f67f4ecdff86b1db676d3e0786628cc31c7b) |

### πŸ† Performance Benchmarks
πŸ“Š We compared the iteration speed (seconds per iteration) between Athena and LightX2V across three distinct Context Parallel (CP) configurations.
| Configuration | Frames | LightX2V (s/it) | Athena (s/it)     | Speedup     |
| ------------- | ------ | -------------- | --------------     | -------     |
| CP1           | 121    | 1.928          | **1.69**           | **1.14x** πŸš€|
| CP2           | 121    | 1.197          | **1.06**           | **1.13x** πŸš€|
| CP4           | 241    | 1.767          | **1.32**           | **1.34x** πŸš€|
| CP8           | 241    | 1.507          | **1.35**           | **1.12x** πŸš€|

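The Speedup column can be checked from the two timing columns. The sketch below assumes speedup is the LightX2V time divided by the Athena time; the report does not state the formula, but this definition reproduces every row. The names `TIMINGS` and `speedup` are illustrative only.

```python
# Timings in seconds per iteration, copied from the benchmark table:
# configuration -> (LightX2V s/it, Athena s/it).
TIMINGS = {
    "CP1": (1.928, 1.69),
    "CP2": (1.197, 1.06),
    "CP4": (1.767, 1.32),
    "CP8": (1.507, 1.35),
}


def speedup(baseline_s_per_it: float, candidate_s_per_it: float) -> float:
    """Assumed definition: baseline time divided by candidate time."""
    return baseline_s_per_it / candidate_s_per_it


for config, (lightx2v, athena) in TIMINGS.items():
    print(f"{config}: {speedup(lightx2v, athena):.2f}x")
```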
---

### 💡 Reproduction Guide
To reproduce the results presented in this report, follow the steps below using the specified commit hashes.
#### Setup
```bash
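# Clone and install Athena
git clone https://github.com/world-sim-dev/athena
cd athena
git checkout f676ae6
pip install -r requirements.txt

# Clone and install LightX2V (for baseline comparison)
# Return to the parent directory first so the repositories sit side by side.
cd ..
git clone https://github.com/ModelTC/LightX2V
cd LightX2V  # the clone directory keeps the repository's capitalization
git checkout 33f0f67
pip install -r requirements.txt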
```

#### Running Benchmarks
For Athena, run the following from the root of the `athena` checkout:
```bash
bash ./scripts/run_wan2_2_ti2v_i2v.sh
```
For LightX2V, clone the scripts from [Benchmark for LightX2V](https://gist.github.com/wtr0504/629388f17ed38d1c12d5ef5c25a15197) and run:
```bash
git clone https://gist.github.com/wtr0504/629388f17ed38d1c12d5ef5c25a15197
# git clone places the gist in a directory named after its ID
bash 629388f17ed38d1c12d5ef5c25a15197/run_wan.sh
```

### 🔎 MagiCompiler Optimization Methodology
#### Whole Graph Compilation
MagiCompiler compiles the model as a whole graph and applies constant folding and dead code elimination to streamline the computation graph before execution.
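
As a rough illustration of these two passes, here is a toy sketch in plain Python. It does not reflect MagiCompiler's actual implementation, which this report does not describe; the dictionary-based graph format and the function names are hypothetical.

```python
import operator

# Toy graph format (hypothetical): node name -> (op, payload).
# "const" nodes carry a value, "input" nodes stand for runtime tensors,
# and op nodes carry a list of input node names.
OPS = {"add": operator.add, "mul": operator.mul}


def constant_fold(graph):
    """Replace ops whose inputs are all constants with a single const node."""
    folded = {}
    for name, (op, payload) in graph.items():  # assumes topological order
        if op in OPS and all(folded[a][0] == "const" for a in payload):
            value = OPS[op](*(folded[a][1] for a in payload))
            folded[name] = ("const", value)
        else:
            folded[name] = (op, payload)
    return folded


def eliminate_dead_code(graph, outputs):
    """Keep only the nodes that the requested outputs depend on."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        op, payload = graph[name]
        if op in OPS:
            stack.extend(payload)
    return {name: node for name, node in graph.items() if name in live}


graph = {
    "x": ("input", None),
    "two": ("const", 2),
    "three": ("const", 3),
    "six": ("mul", ["two", "three"]),  # foldable: both inputs are constants
    "y": ("mul", ["x", "six"]),        # depends on a runtime input
    "unused": ("add", ["x", "two"]),   # never reaches the output
}

optimized = eliminate_dead_code(constant_fold(graph), outputs=["y"])
print(optimized)
```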
#### Coarse-grained Kernel Fusion
MagiCompiler aggregates multiple smaller operators into larger, fused kernels. Fewer kernels means fewer kernel launches and fewer round trips to GPU memory for intermediate results, which is critical for efficient execution on the GPU.
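
The idea can be sketched in plain Python, with lists standing in for GPU buffers and each list comprehension standing in for one kernel. This is only a conceptual model under those assumptions, not MagiCompiler's fusion logic; the scale, bias, and tanh chain is a made-up example.

```python
import math


def unfused(xs, scale, bias):
    """Three elementwise 'kernels', each reading and writing a full buffer."""
    scaled = [x * scale for x in xs]        # kernel 1: writes an intermediate
    shifted = [s + bias for s in scaled]    # kernel 2: writes another one
    return [math.tanh(t) for t in shifted]  # kernel 3: final result


def fused(xs, scale, bias):
    """One 'kernel': the same math in a single pass, no intermediate buffers."""
    return [math.tanh(x * scale + bias) for x in xs]


xs = [0.0, 0.5, -1.25, 2.0]
assert unfused(xs, 2.0, 0.1) == fused(xs, 2.0, 0.1)
```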
#### All-to-All Communication
MagiCompiler uses `all_to_all_single` (one communication op per sync point), whereas LightX2V uses three separate `all_to_all` calls (three communication ops).
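
The difference can be modeled with a small simulation in plain Python. It does not call `torch.distributed`, and it assumes, purely for illustration, that the three exchanges carry query, key, and value chunks; the report does not say which tensors are involved. The helper `all_to_all` below is a hypothetical stand-in for the collective.

```python
def all_to_all(send):
    """Simulated all-to-all: send[src][dst] is delivered to recv[dst][src]."""
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


WORLD = 2
# Hypothetical per-rank data: q[rank][dst] is the chunk destined for rank dst.
q = [[f"q{r}->{d}" for d in range(WORLD)] for r in range(WORLD)]
k = [[f"k{r}->{d}" for d in range(WORLD)] for r in range(WORLD)]
v = [[f"v{r}->{d}" for d in range(WORLD)] for r in range(WORLD)]

# Three separate collectives, one per tensor.
separate = (all_to_all(q), all_to_all(k), all_to_all(v))

# One collective: pack the q, k and v chunks for each destination together,
# exchange once, then unpack on the receiving side.
packed = [[(q[r][d], k[r][d], v[r][d]) for d in range(WORLD)]
          for r in range(WORLD)]
received = all_to_all(packed)
fused = tuple(
    [[received[d][s][i] for s in range(WORLD)] for d in range(WORLD)]
    for i in range(3)
)

# Both strategies deliver the same data; only the number of collectives differs.
assert fused == separate
```

Packing costs an extra copy on each side, but every collective carries fixed launch and synchronization overhead, so one larger exchange is typically cheaper than three smaller ones.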