---
base_model:
- Wan-AI/Wan2.2-TI2V-5B
language:
- en
license: mit
pipeline_tag: video-to-video
tags:
- Chain-of-Frames
- Video-Reasoning
- Visual-Planning
- Maze
- Wan
task_categories:
- video-classification
- reinforcement-learning
- robotics
size_categories:
- 10K<n<100K
---

<h2 align="center">
  <strong>Wan-R1: A Reasoning-via-Video Maze-Solving Model</strong>
</h2>

<p align="center">
  Fine-tuned on <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">VR-Bench</a> to evaluate and enhance video-based reasoning ability across structured maze environments.
</p>

<p align="center">
  <a href="https://imyangc7.github.io/VRBench_Web/">
    <img alt="Project" src="https://img.shields.io/badge/Project-Homepage-blue?logo=windowsterminal&logoColor=white" />
  </a>
  <a href="https://github.com/ImYangC7/VR-Bench">
    <img alt="GitHub" src="https://img.shields.io/badge/GitHub-black?logo=github" />
  </a>
  <a href="https://huggingface.co/HY-Wan/Wan-R1">
    <img alt="HuggingFace" src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=black" />
  </a>
</p>


<h2>πŸ“° News</h2>

<ul>
  <li><strong>2025-11-20</strong>: Released 5 fine-tuned Wan-R1 models (3D, Regular, Irregular, Sokoban, Trapfield) trained on VR-Bench.</li>
  <li><strong>2025-12</strong>: Preparing the fine-tuning and evaluation codebase for release.</li>
</ul>

<h2>πŸ”§ Future Work</h2>

<ul>
  <li>πŸ“¦ Release <strong>LoRA fine-tuning scripts</strong> based on VR-Bench.</li>
  <li>πŸ“Š Open-source <strong>evaluation toolkit</strong> for reasoning via video.</li>
  <li>πŸ“ Provide <strong>training logs & hyperparameters</strong> for full reproducibility.</li>
</ul>


<h2>🧠 Models</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: center;">Model</th>
      <th style="text-align: center;">Download</th>
      <th style="text-align: left;">Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_3d_maze_5B</strong></td>
      <td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_3d_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
      <td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze3D</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
    </tr>
    <tr>
      <td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_irregular_maze_5B</strong></td>
      <td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_irregular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
      <td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">PathFinder</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
    </tr>
    <tr>
      <td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_regular_maze_5B</strong></td>
      <td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_regular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
      <td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
    </tr>
    <tr>
      <td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_sokoban_5B</strong></td>
      <td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_sokoban_wan22_5b_lora.safetensors">HuggingFace</a></td>
      <td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Sokoban</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
    </tr>
    <tr>
      <td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_trapfield_5B</strong></td>
      <td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_trapfield_wan22_5b_lora.safetensors">HuggingFace</a></td>
      <td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">TrapField</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
    </tr>
  </tbody>
</table>
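
The official fine-tuning and evaluation code is not yet released, so the snippet below is only an illustrative sketch: it fetches one of the LoRA checkpoints listed above with `huggingface_hub`, assuming the repository and filenames stay as shown in the table. The `download_lora` helper and the `LORA_FILES` mapping are names introduced here for illustration, not part of the released code.

```python
# Sketch: download a Wan-R1 LoRA checkpoint from the table above.
# Assumes the repo id and filenames listed in this card remain valid.
REPO_ID = "HY-Wan/Wan-R1"
LORA_FILES = {
    "3d_maze": "Wan_R1_3d_maze_wan22_5b_lora.safetensors",
    "irregular_maze": "Wan_R1_irregular_maze_wan22_5b_lora.safetensors",
    "regular_maze": "Wan_R1_regular_maze_wan22_5b_lora.safetensors",
    "sokoban": "Wan_R1_sokoban_wan22_5b_lora.safetensors",
    "trapfield": "Wan_R1_trapfield_wan22_5b_lora.safetensors",
}

def download_lora(task: str) -> str:
    """Fetch the LoRA weights for one maze task; returns the local file path."""
    # Lazy import: requires `pip install huggingface_hub` and network access.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=LORA_FILES[task])

# Example (downloads ~hundreds of MB):
# lora_path = download_lora("sokoban")
```

How the downloaded `.safetensors` file is applied to the Wan2.2-TI2V-5B base model depends on your inference stack; the authors' own loading code is slated for the codebase release noted above.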

<h2 align="center">πŸ“‘ Citation</h2>

<p align="center">
  If you use this model or the VR-Bench dataset in your work, please cite:
</p>

<p align="center">
  πŸ“„ <a href="https://arxiv.org/abs/2511.15065">
    Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
  </a>
</p>

<pre>
<code>
@misc{yang2025reasoningvideoevaluationvideo,
      title={Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks}, 
      author={Cheng Yang and Haiyuan Wan and Yiran Peng and Xin Cheng and Zhaoyang Yu and Jiayi Zhang and Junchi Yu and Xinlei Yu and Xiawu Zheng and Dongzhan Zhou and Chenglin Wu},
      year={2025},
      eprint={2511.15065},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15065}, 
}
</code>
</pre>