Update README.md #2
Opened by yyuncong

README.md CHANGED
@@ -4,29 +4,30 @@ pipeline_tag: image-to-video
 library_name: diffusers
 ---
 
-<
-<h1 align="center">
+<br/>
+<h1 align="center" style="font-size: 1.7rem">MindJourney: Test-Time Scaling with World Models for Spatial Reasoning</h1>
 <p align="center">
-
-
-
-<a href="https://
+NeurIPS 2025
+</p>
+<p align="center">
+<a href="https://yyuncong.github.io/">Yuncong Yang</a>,
+<a href="https://jiagengliu02.github.io/">Jiageng Liu</a>,
+<a href="https://cozheyuanzhangde.github.io/">Zheyuan Zhang</a>,
 <a href="https://rainbow979.github.io/">Siyuan Zhou</a>,
+<a href="https://cs-people.bu.edu/rxtan/">Reuben Tan</a>,
+<a href="https://jwyang.github.io/">Jianwei Yang</a>,
 <a href="https://yilundu.github.io/">Yilun Du</a>,
 <a href="https://people.csail.mit.edu/ganchuang">Chuang Gan</a>
 </p>
-
-
-<
-
-
-
-
-
-
-<a href="https://github.com/UMass-Embodied-AGI/TesserAct">Code</a>
+<p align="center">
+<a href="https://arxiv.org/abs/2507.12508">
+<img src='https://img.shields.io/badge/Paper-PDF-red?style=flat&logo=arXiv&logoColor=red' alt='Paper PDF'>
+</a>
+<a href='https://umass-embodied-agi.github.io/MindJourney/' style='padding-left: 0.5rem;'>
+<img src='https://img.shields.io/badge/Project-Page-blue?style=flat&logo=Google%20chrome&logoColor=blue' alt='Project Page'>
+</a>
+</p>
 </p>
 
 
-
-and normal videos, reconstructing a 4D scene and predicting actions.
+MindJourney is a test-time scaling framework that leverages the 3D imagination capability of World Models to strengthen spatial reasoning in Vision-Language Models (VLMs). We evaluate on the SAT dataset and provide a baseline pipeline, a Stable Virtual Camera (SVC) based spatial beam search pipeline, and a Search World Model (SWM) based spatial beam search pipeline.