How to use anyeZHY/tesseract with Diffusers:
pip install -U diffusers transformers accelerate

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Switch "cuda" to "mps" on Apple devices.
# device_map already places the pipeline on the GPU, so no separate pipe.to("cuda") call is needed.
pipe = DiffusionPipeline.from_pretrained("anyeZHY/tesseract", dtype=torch.bfloat16, device_map="cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# Generate the video frames and write them to disk.
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
Update README.md #2
by yyuncong - opened
README.md CHANGED
@@ -4,29 +4,30 @@ pipeline_tag: image-to-video
 library_name: diffusers
 ---
 
-<
-<h1 align="center">
+<br/>
+<h1 align="center" style="font-size: 1.7rem">MindJourney: Test-Time Scaling with World Models for Spatial Reasoning</h1>
 <p align="center">
-
-
-
-<a href="https://
+NeurIPS 2025
+</p>
+<p align="center">
+<a href="https://yyuncong.github.io/">Yuncong Yang</a>,
+<a href="https://jiagengliu02.github.io/">Jiageng Liu</a>,
+<a href="https://cozheyuanzhangde.github.io/">Zheyuan Zhang</a>,
 <a href="https://rainbow979.github.io/">Siyuan Zhou</a>,
+<a href="https://cs-people.bu.edu/rxtan/">Reuben Tan</a>,
+<a href="https://jwyang.github.io/">Jianwei Yang</a>,
 <a href="https://yilundu.github.io/">Yilun Du</a>,
 <a href="https://people.csail.mit.edu/ganchuang">Chuang Gan</a>
 </p>
-<
-
-<
-
-
-
-
-<
-
-<a href="https://github.com/UMass-Embodied-AGI/TesserAct">Code</a>
+<p align="center">
+<a href="https://arxiv.org/abs/2507.12508">
+<img src='https://img.shields.io/badge/Paper-PDF-red?style=flat&logo=arXiv&logoColor=red' alt='Paper PDF'>
+</a>
+<a href='https://umass-embodied-agi.github.io/MindJourney/' style='padding-left: 0.5rem;'>
+<img src='https://img.shields.io/badge/Project-Page-blue?style=flat&logo=Google%20chrome&logoColor=blue' alt='Project Page'>
+</a>
+</p>
 </p>
 
 
-
-and normal videos, reconstructing a 4D scene and predicting actions.
+MindJourney is a test-time scaling framework that leverages the 3D imagination capability of World Models to strengthen spatial reasoning in Vision-Language Models (VLMs). We evaluate on the SAT dataset and provide a baseline pipeline, a Stable Virtual Camera (SVC) based spatial beam search pipeline, and a Search World Model (SWM) based spatial beam search pipeline.
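The description added in this PR mentions a spatial beam search driven by a world model, but the README text does not spell out the algorithm. As a rough illustration of the general idea only, the sketch below runs a generic beam search over camera actions. Everything in it is an assumption for illustration: the action names, the `toy_world_model` and `toy_score` stand-ins (a real pipeline would presumably call a video world model and a VLM instead), and the `spatial_beam_search` signature. It is not the MindJourney implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical camera actions; the README does not list the real action set.
ACTIONS = ("forward", "turn_left", "turn_right")

# Toy stand-in for an imagined view: (position, heading), heading is +1 or -1.
View = Tuple[int, int]
Candidate = Tuple[float, List[str], View]


def toy_world_model(view: View, action: str) -> View:
    """Toy stand-in for a world model: return the 'imagined' next view."""
    pos, heading = view
    if action == "forward":
        return (pos + heading, heading)
    if action == "turn_left":
        return (pos, -1)
    return (pos, 1)  # turn_right


def toy_score(view: View, target: int = 3) -> float:
    """Toy stand-in for a VLM usefulness score: closer to the target is better."""
    return -abs(view[0] - target)


def spatial_beam_search(
    start: View,
    imagine: Callable[[View, str], View],
    score: Callable[[View], float],
    actions: Sequence[str] = ACTIONS,
    beam_width: int = 2,
    depth: int = 3,
) -> Candidate:
    """Keep the `beam_width` best imagined trajectories at each step."""
    beam: List[Candidate] = [(score(start), [], start)]
    best = beam[0]
    for _ in range(depth):
        candidates: List[Candidate] = []
        for _, path, view in beam:
            for action in actions:
                nxt = imagine(view, action)
                candidates.append((score(nxt), path + [action], nxt))
        # Highest score first; ties keep their expansion order (stable sort).
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best


best_score, best_path, best_view = spatial_beam_search(
    (0, 1), toy_world_model, toy_score
)
print(best_path, best_view, best_score)
```

In this toy setup, `beam_width` and `depth` are the knobs that trade extra test-time compute for a wider or deeper search over imagined views.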