---
base_model:
- Wan-AI/Wan2.2-TI2V-5B
language:
- en
license: mit
pipeline_tag: video-to-video
tags:
- Chain-of-Frames
- Video-Reasoning
- Visual-Planning
- Maze
- Wan
task_categories:
- video-classification
- reinforcement-learning
- robotics
size_categories:
- 10K<n<100K
---
<h2 align="center">
<strong>Wan-R1: A Reasoning-via-Video Maze-Solving Model</strong>
</h2>
<p align="center">
Fine-tuned on <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">VR-Bench</a> to evaluate and enhance video-based reasoning ability across structured maze environments.
</p>
<p align="center">
<a href="https://imyangc7.github.io/VRBench_Web/">
<img alt="Project" src="https://img.shields.io/badge/Project-Homepage-blue?logo=windowsterminal&logoColor=white" />
</a>
<a href="https://github.com/ImYangC7/VR-Bench">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-black?logo=github" />
</a>
<a href="https://huggingface.co/HY-Wan/Wan-R1">
<img alt="HuggingFace" src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=black" />
</a>
</p>
<h2>📰 News</h2>
<ul>
<li><strong>2025-11-20</strong>: Released 5 fine-tuned Wan-R1 models (3D, Regular, Irregular, Sokoban, Trapfield) trained on VR-Bench.</li>
<li><strong>2025-12</strong>: In progress: preparing the fine-tuning and evaluation codebase for release.</li>
</ul>
<h2>🚧 Future Work</h2>
<ul>
<li>📦 Release <strong>LoRA fine-tuning scripts</strong> based on VR-Bench.</li>
<li>📊 Open-source <strong>evaluation toolkit</strong> for reasoning via video.</li>
<li>📝 Provide <strong>training logs &amp; hyperparameters</strong> for full reproducibility.</li>
</ul>
<h2>🔧 Models</h2>
<table>
<thead>
<tr>
<th style="text-align: center;">Model</th>
<th style="text-align: center;">Download</th>
<th style="text-align: left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_3d_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">π€ <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_3d_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze3D</a> tasks (easy, medium, and hard) from the base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_irregular_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">π€ <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_irregular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">PathFinder</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_regular_maze_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">π€ <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_regular_maze_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Maze</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_sokoban_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">π€ <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_sokoban_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">Sokoban</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"><strong>Wan_R1_trapfield_5B</strong></td>
<td style="text-align: center; vertical-align: middle;">π€ <a href="https://huggingface.co/HY-Wan/Wan-R1/blob/main/Wan_R1_trapfield_wan22_5b_lora.safetensors">HuggingFace</a></td>
<td style="vertical-align: middle;">Fine-tuned LoRA for <a href="https://huggingface.co/datasets/amagipeng/VR-Bench">TrapField</a> tasks (easy, medium, and hard) from base model <a href="https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B">Wan2.2-TI2V-5B</a>.</td>
</tr>
</tbody>
</table>
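<h2>🚀 Example Usage</h2>
<p>
The LoRA checkpoints in the table above can be applied on top of the base Wan2.2-TI2V-5B model. Below is a minimal, untested sketch using the 🤗 Diffusers library; it assumes the Diffusers-format base repo <code>Wan-AI/Wan2.2-TI2V-5B-Diffusers</code>, a first-frame maze image named <code>maze_start_frame.png</code>, and the example prompt are placeholders — adapt them to your setup and check the official Diffusers documentation for the exact Wan2.2 pipeline API.
</p>

```python
# Hypothetical sketch: load a Wan-R1 LoRA on top of Wan2.2-TI2V-5B.
# Repo ids, file paths, and generation parameters below are illustrative
# assumptions, not values confirmed by this model card.
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the base TI2V model (assumed Diffusers-format mirror of Wan2.2-TI2V-5B).
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
)

# Attach one of the task-specific Wan-R1 LoRAs, e.g. the 3D-maze checkpoint.
pipe.load_lora_weights(
    "HY-Wan/Wan-R1",
    weight_name="Wan_R1_3d_maze_wan22_5b_lora.safetensors",
)
pipe.to("cuda")

# Condition on the first frame of a maze (hypothetical local file).
image = load_image("maze_start_frame.png")
frames = pipe(
    image=image,
    prompt="The agent moves through the maze to the goal.",
    num_frames=121,
).frames[0]
export_to_video(frames, "maze_solution.mp4", fps=24)
```

<p>
The other four LoRAs are loaded the same way by swapping <code>weight_name</code> for the corresponding <code>.safetensors</code> file.
</p>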
<h2 align="center">π Citation</h2>
<p align="center">
If you use this model or the VR-Bench dataset in your work, please cite:
</p>
<p align="center">
π <a href="https://arxiv.org/abs/2511.15065">
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
</a>
</p>
<pre>
<code>
@misc{yang2025reasoningvideoevaluationvideo,
title={Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks},
author={Cheng Yang and Haiyuan Wan and Yiran Peng and Xin Cheng and Zhaoyang Yu and Jiayi Zhang and Junchi Yu and Xinlei Yu and Xiawu Zheng and Dongzhan Zhou and Chenglin Wu},
year={2025},
eprint={2511.15065},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.15065},
}
</code>
</pre>