---
license: mit
task_categories:
- video-classification
- reinforcement-learning
- robotics
language:
- en
tags:
- Chain-of-Frames
- Video-Reasoning
- Visual-Planning
- Maze
- Wan
size_categories:
- 10K<n<100K
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---
<h2 align="center">
<strong>Wan-R1: A Reasoning-via-Video Maze-Solving Model</strong>
</h2>
<p align="center">
Fine-tuned on <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">VR-Bench</a> to evaluate and enhance video-based reasoning ability across structured maze environments.
</p>
<p align="center">
<a href="https://imyangc7.github.io/VRBench_Web/">
<img alt="Project" src="https://img.shields.io/badge/Project-Homepage-blue?logo=windowsterminal&logoColor=white" />
</a>
<a href="https://github.com/ImYangC7/VR-Bench">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-black?logo=github" />
</a>
<a href="https://huggingface.co/HY-Wan/Wan-R1">
<img alt="HuggingFace" src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=black" />
</a>
</p>
<h2>πŸ“° News</h2>
<ul>
<li><strong>2026-01-04</strong>: πŸš€ Released <b>Wan_R1_General_5B</b>, a general-purpose model fine-tuned on the <b>entire VR-Bench suite</b> (all sub-tasks combined).</li>
<li><strong>2025-12</strong>: In progress: preparing the fine-tuning and evaluation codebase for release.</li>
<li><strong>2025-11-20</strong>: Released five fine-tuned Wan-R1 models (3D, Regular, Irregular, Sokoban, Trapfield), each trained on VR-Bench.</li>
</ul>
<h2>πŸ”§ Future Work</h2>
<ul>
<li>πŸ“¦ Release <strong>LoRA fine-tuning scripts</strong> based on VR-Bench.</li>
<li>πŸ“Š Open-source <strong>evaluation toolkit</strong> for reasoning via video.</li>
<li>πŸ“ Provide <strong>training logs & hyperparameters</strong> for full reproducibility.</li>
</ul>
<h2>🧠 Models</h2>
<table>
<thead>
<tr>
<th style="text-align: center;">Model</th>
<th style="text-align: center;">Download</th>
<th style="text-align: left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_General_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_5B.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;"><strong>New!</strong> LoRA fine-tuned on <b>all VR-Bench tasks</b> combined, from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_3d_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_3d_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze3D</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_irregular_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_irregular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">PathFinder</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_regular_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_regular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_sokoban_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_sokoban_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Sokoban</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_trapfield_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">πŸ€— <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_trapfield_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">TrapField</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
</tbody>
</table>
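The checkpoints above follow a consistent naming scheme, so downloading one programmatically can be sketched in a few lines. This is a hedged example, not part of the official release: the <code>lora_path</code> helper and the <code>LORA_FILES</code> mapping are illustrative names introduced here, the snippet assumes the <code>huggingface_hub</code> package, and the <code>diffusers</code> usage at the bottom is hypothetical, since the official fine-tuning and evaluation code is still being prepared (see News).

```python
# Hypothetical helper: maps each Wan-R1 sub-task to its LoRA filename
# as listed in the table above.
LORA_FILES = {
    "general": "Wan_R1_5B.safetensors",
    "3d_maze": "Wan_R1_3d_maze_wan22_5b_lora.safetensors",
    "irregular_maze": "Wan_R1_irregular_maze_wan22_5b_lora.safetensors",
    "regular_maze": "Wan_R1_regular_maze_wan22_5b_lora.safetensors",
    "sokoban": "Wan_R1_sokoban_wan22_5b_lora.safetensors",
    "trapfield": "Wan_R1_trapfield_wan22_5b_lora.safetensors",
}


def lora_path(task: str) -> str:
    """Download one Wan-R1 LoRA checkpoint and return its local file path."""
    from huggingface_hub import hf_hub_download  # deferred: triggers a network call

    return hf_hub_download(repo_id="HY-Wan/Wan-R1", filename=LORA_FILES[task])


if __name__ == "__main__":
    # Hypothetical usage -- verify the correct diffusers pipeline class and
    # LoRA compatibility for Wan2.2-TI2V-5B against the base model card
    # before relying on this.
    from diffusers import WanImageToVideoPipeline

    pipe = WanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-TI2V-5B-Diffusers")
    pipe.load_lora_weights(lora_path("sokoban"))
```

The deferred import keeps the filename mapping usable without pulling in <code>huggingface_hub</code>; only calling <code>lora_path</code> actually downloads a checkpoint.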
<h2 align="center">πŸ“‘ Citation</h2>
<p align="center">
If you use this model or the VR-Bench dataset in your work, please cite:
</p>
<p align="center">
πŸ“„ <a href="https://arxiv.org/abs/2511.15065">
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
</a>
</p>
<pre><code>@misc{yang2025reasoningvideoevaluationvideo,
  title={Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks},
  author={Cheng Yang and Haiyuan Wan and Yiran Peng and Xin Cheng and Zhaoyang Yu and Jiayi Zhang and Junchi Yu and Xinlei Yu and Xiawu Zheng and Dongzhan Zhou and Chenglin Wu},
  year={2025},
  eprint={2511.15065},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.15065},
}</code></pre>