Files changed (1)

README.md (+19 −18)
@@ -4,29 +4,30 @@ pipeline_tag: image-to-video
 library_name: diffusers
 ---
 
-<p align="center">
-<h1 align="center">TesserAct: Learning 4D Embodied World Models</h1>
+<br/>
+<h1 align="center" style="font-size: 1.7rem">MindJourney: Test-Time Scaling with World Models for Spatial Reasoning</h1>
 <p align="center">
-<a href="https://haoyuzhen.com">Haoyu Zhen*</a>,
-<a href="https://qiaosun22.github.io/">Qiao Sun*</a>,
-<a href="https://icefoxzhx.github.io/">Hongxin Zhang</a>,
-<a href="https://senfu.github.io/">Junyan Li</a>,
+NeurIPS 2025
+</p>
+<p align="center">
+<a href="https://yyuncong.github.io/">Yuncong Yang</a>,
+<a href="https://jiagengliu02.github.io/">Jiageng Liu</a>,
+<a href="https://cozheyuanzhangde.github.io/">Zheyuan Zhang</a>,
 <a href="https://rainbow979.github.io/">Siyuan Zhou</a>,
+<a href="https://cs-people.bu.edu/rxtan/">Reuben Tan</a>,
+<a href="https://jwyang.github.io/">Jianwei Yang</a>,
 <a href="https://yilundu.github.io/">Yilun Du</a>,
 <a href="https://people.csail.mit.edu/ganchuang">Chuang Gan</a>
 </p>
-</p>
-
-<p align="center">
-<a href="https://arxiv.org/abs/2504.20995">Paper PDF</a>
-&nbsp;|&nbsp;
-<a href="https://tesseractworld.github.io">Project Page</a>
-&nbsp;|&nbsp;
-<a href="https://huggingface.co/anyeZHY/tesseract">Model on Hugging Face</a>
-&nbsp;|&nbsp;
-<a href="https://github.com/UMass-Embodied-AGI/TesserAct">Code</a>
+<p align="center">
+<a href="https://arxiv.org/abs/2507.12508">
+<img src='https://img.shields.io/badge/Paper-PDF-red?style=flat&logo=arXiv&logoColor=red' alt='Paper PDF'>
+</a>
+<a href='https://umass-embodied-agi.github.io/MindJourney/' style='padding-left: 0.5rem;'>
+<img src='https://img.shields.io/badge/Project-Page-blue?style=flat&logo=Google%20chrome&logoColor=blue' alt='Project Page'>
+</a>
+</p>
 </p>
 
 
-We propose TesserAct, the 4D Embodied World Model, which takes input images and text instruction to generate RGB, depth,
-and normal videos, reconstructing a 4D scene and predicting actions.
+MindJourney is a test-time scaling framework that leverages the 3D imagination capability of World Models to strengthen spatial reasoning in Vision-Language Models (VLMs). We evaluate on the SAT dataset and provide a baseline pipeline, a Stable Virtual Camera (SVC) based spatial beam search pipeline, and a Search World Model (SWM) based spatial beam search pipeline.
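The spatial beam search mentioned in the new description can be sketched roughly as follows. This is an illustrative toy, not the MindJourney API: `imagine`, `vlm_score`, and the `ACTIONS` vocabulary are hypothetical stand-ins for the world model's view synthesis, the VLM's usefulness score, and the camera actions.

```python
ACTIONS = ["turn_left", "turn_right", "move_forward"]  # hypothetical action vocabulary

def imagine(trajectory, action):
    """Stand-in for the world model: 'imagines' the view reached after an action.
    Here a view is represented simply by the tuple of actions taken so far."""
    return trajectory + (action,)

def vlm_score(trajectory):
    """Stand-in for the VLM scorer: rates how helpful an imagined view is.
    This toy scorer just prefers trajectories with more forward motion."""
    return sum(action == "move_forward" for action in trajectory)

def spatial_beam_search(depth=3, beam_width=2):
    """Expand every kept trajectory by every action, score the imagined views,
    and keep only the beam_width best trajectories at each depth."""
    beams = [((), 0)]  # (trajectory, score), starting from the observed view
    for _ in range(depth):
        candidates = []
        for trajectory, _ in beams:
            for action in ACTIONS:
                new_trajectory = imagine(trajectory, action)
                candidates.append((new_trajectory, vlm_score(new_trajectory)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_trajectory, best_score = spatial_beam_search(depth=3, beam_width=2)[0]
```

With the toy scorer, the search settles on three forward moves; in the actual pipelines the score would instead come from the VLM judging imagined views against the spatial question.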