## Hunyuan1.5 Benchmark

### Executive Summary
This report compares the inference speed of the **[Athena](https://github.com/world-sim-dev/athena)** framework against the baseline **[LightX2V](https://github.com/ModelTC/LightX2V)** framework. All benchmarks were run with the **[Hunyuan-1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5)** model on NVIDIA H100 hardware. Across four Context Parallel configurations, Athena was 1.17x to 1.27x faster per iteration.

---
### 🎯 Test Environment & Versioning
#### Hardware & Settings

| Parameter           | Value                          |
| ------------------- | ------------------------------ |
| Hardware            | NVIDIA H100                    |
| Model               | Hunyuan-1.5 480p_t2v_distilled |
| Precision           | torch.bfloat16                 |
| Inference Steps     | 20                             |
| Resolution          | 480p                           |
| FPS                 | 24                             |
| CFG                 | Disabled                       |

#### Software Versioning
To ensure reproducibility, the benchmark used the following commits:

| Framework | Branch / Tag | Commit |
| --------- | ------------ | ------ |
| Athena    | main         | [5e6086b](https://github.com/world-sim-dev/athena/commit/5e6086b4dc2ab60bc4d44dbe39745b4354075121) |
| LightX2V  | main         | [5573905](https://github.com/ModelTC/LightX2V/commit/5573905f3f38d876d468b815f86d417a608975b6) |

### 🏆 Performance Benchmarks
📊 We compared the iteration speed (seconds per iteration, lower is better) of Athena and LightX2V across four Context Parallel (CP) configurations. CP1 and CP2 generate 121 frames, while CP4 and CP8 generate 241 frames, so s/it values are directly comparable only between rows with the same frame count.

| Configuration | Frames | LightX2V (s/it) | Athena (s/it) | Speedup      |
| ------------- | ------ | --------------- | ------------- | ------------ |
| CP1           | 121    | 2.42            | **2.06**      | **1.17x** 🚀 |
| CP2           | 121    | 1.38            | **1.13**      | **1.22x** 🚀 |
| CP4           | 241    | 2.25            | **1.85**      | **1.22x** 🚀 |
| CP8           | 241    | 1.28            | **1.01**      | **1.27x** 🚀 |
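
The Speedup column is the LightX2V time divided by the Athena time. The short Python sketch below recomputes it from the table values and also multiplies each s/it figure by the 20 inference steps to estimate denoising-loop time. The names `RESULTS`, `speedup`, and `summarize` are illustrative and not part of either framework, and the loop estimate assumes nothing but the per-step cost (no text encoding, VAE decoding, or warm-up).

```python
# Seconds per iteration, copied from the benchmark table.
RESULTS = {
    "CP1": {"frames": 121, "lightx2v": 2.42, "athena": 2.06},
    "CP2": {"frames": 121, "lightx2v": 1.38, "athena": 1.13},
    "CP4": {"frames": 241, "lightx2v": 2.25, "athena": 1.85},
    "CP8": {"frames": 241, "lightx2v": 1.28, "athena": 1.01},
}

STEPS = 20  # inference steps from the settings table


def speedup(baseline_s_per_it, candidate_s_per_it):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s_per_it / candidate_s_per_it


def summarize(results, steps=STEPS):
    """Compute the speedup and an estimated denoising-loop time per row."""
    rows = {}
    for name, r in results.items():
        rows[name] = {
            "speedup": round(speedup(r["lightx2v"], r["athena"]), 2),
            # Assumption: loop time is simply steps * s/it.
            "lightx2v_loop_s": round(r["lightx2v"] * steps, 1),
            "athena_loop_s": round(r["athena"] * steps, 1),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(RESULTS).items():
        print(
            f"{name}: {row['speedup']:.2f}x "
            f"(LightX2V ~{row['lightx2v_loop_s']} s, "
            f"Athena ~{row['athena_loop_s']} s for {STEPS} steps)"
        )
```

Rounding the ratios to two decimals reproduces the Speedup column of the table.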

---
### 📹 Output Comparison
| Framework | Video Result |
| --------- | ---------------------------- |
| Athena    | <img src="../../../assets/athena_hunyuan_1_5_test_videos_20260213_155842_idx0_A_close-up815965.gif" width="480" /> |
| LightX2V  | <img src="../../../assets/lightx2v_hunyuan_1_5_result_A_close-up122526.gif" width="480" /> |


### 💡 Reproduction Guide
To reproduce the results presented in this report, follow the steps below using the specified commit hashes.
#### Setup
```bash
# Clone and install Athena
git clone https://github.com/world-sim-dev/athena
cd athena
git checkout 5e6086b
pip install -r requirements.txt
pip install -r requirements-nodeps.txt
pip install -e ./pkgs/MagiCompiler --no-build-isolation --config-settings editable_mode=compat

# Clone and install LightX2V (for baseline comparison)
cd ..  # leave the athena checkout so the two repositories sit side by side
git clone https://github.com/ModelTC/LightX2V
cd LightX2V  # the directory name is case-sensitive on Linux
git checkout 5573905
pip install -v .
```

#### Running Benchmarks
For Athena, run the following from the root of the `athena` checkout:
```bash
RESOLUTION=480p CFG_DISTILLED=true TASK=t2v CHECKPOINT_PATH=path/to/480p_t2v_distilled bash ./scripts/run_hunyuan.sh
```
For LightX2V, clone the scripts from [Benchmark for LightX2V](https://gist.github.com/wtr0504/d80bbebb7da1ef7b58f3e6faf1c68880) and run:
```bash
git clone https://gist.github.com/wtr0504/d80bbebb7da1ef7b58f3e6faf1c68880
cd d80bbebb7da1ef7b58f3e6faf1c68880  # git names the clone directory after the gist ID
MODEL_PATH=path/to/HunyuanVideo-1.5 DISTILL_CKPT=path/to/480p_t2v_distilled/diffusion_pytorch_model.safetensors bash run_hunyuan.sh
```

### 🔎 MagiCompiler Optimization Methodology
**Whole Graph Compilation**
MagiCompiler compiles the model as a whole graph, which lets it apply constant folding and dead code elimination to streamline the computation graph before execution.
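
To make the two passes concrete, here is a toy sketch in plain Python. It is not MagiCompiler's implementation: the dictionary-based graph format and the names `fold_constants` and `eliminate_dead_code` are assumptions made purely for illustration.

```python
import operator

# Hypothetical toy IR: node name -> (op, args), listed in topological order.
# "input" nodes are runtime values, "const" nodes carry a literal in args[0],
# and arithmetic nodes reference the names of earlier nodes.
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(graph):
    """Replace arithmetic nodes whose operands are all constants."""
    folded = {}
    for name, (op, args) in graph.items():
        if op in OPS and all(folded[a][0] == "const" for a in args):
            value = OPS[op](*(folded[a][1][0] for a in args))
            folded[name] = ("const", (value,))
        else:
            folded[name] = (op, args)
    return folded


def eliminate_dead_code(graph, outputs):
    """Keep only nodes that are reachable from the requested outputs."""
    live = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        op, args = graph[name]
        if op in OPS:  # only arithmetic nodes reference other nodes
            stack.extend(args)
    return {n: node for n, node in graph.items() if n in live}


graph = {
    "x": ("input", ()),
    "two": ("const", (2,)),
    "three": ("const", (3,)),
    "six": ("mul", ("two", "three")),  # both operands known ahead of time
    "y": ("mul", ("x", "six")),
    "unused": ("add", ("x", "two")),   # never contributes to the output
}

optimized = eliminate_dead_code(fold_constants(graph), outputs=["y"])
```

In this toy graph, folding turns `six` into a constant, after which `two`, `three`, and `unused` are no longer reachable from `y` and are dropped. Both passes need to see the whole graph at once, which is the motivation for compiling it as a single unit.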

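**Coarse-grained Kernel Fusion**
MagiCompiler aggregates multiple smaller operators into larger, fused kernels. Fusion reduces the number of kernel launches and avoids writing intermediate results to GPU memory and reading them back, both of which matter for efficient GPU execution.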
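
The effect of fusion can be illustrated with a CPU-only toy in Python. This is a sketch under loud assumptions: plain lists stand in for GPU tensors, each full pass over the data stands in for one kernel launch, and `LaunchCounter`, `run_unfused`, and `run_fused` are hypothetical names that do not exist in MagiCompiler.

```python
class LaunchCounter:
    """Counts simulated kernel launches (one per full pass over the data)."""

    def __init__(self):
        self.launches = 0


def run_unfused(xs, scale, bias, counter):
    """Three separate 'kernels', each materializing an intermediate list."""
    counter.launches += 1
    scaled = [x * scale for x in xs]          # kernel 1: multiply
    counter.launches += 1
    shifted = [x + bias for x in scaled]      # kernel 2: add
    counter.launches += 1
    return [max(x, 0.0) for x in shifted]     # kernel 3: ReLU


def run_fused(xs, scale, bias, counter):
    """One 'kernel' that applies all three operations per element."""
    counter.launches += 1
    return [max(x * scale + bias, 0.0) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5]

unfused_counter = LaunchCounter()
unfused_out = run_unfused(xs, scale=2.0, bias=1.0, counter=unfused_counter)

fused_counter = LaunchCounter()
fused_out = run_fused(xs, scale=2.0, bias=1.0, counter=fused_counter)
```

Both versions compute the same result, but the fused version makes a single pass and never materializes the `scaled` or `shifted` intermediates. On a GPU, those intermediates would typically be round trips through device memory.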