SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL
Paper
•
2602.09432
•
Published
Please fill out the form below. Access will be granted automatically after submission.
By requesting access to SceneReVis-7B, you agree to the following terms: 1. You will use this model only for academic research purposes. 2. You will not redistribute the model weights without permission. 3. You will cite our paper in any published work that uses this model.
Log in or Sign Up to review the conditions and access this model content.
SceneReVis-7B is a vision-language model fine-tuned for iterative 3D indoor scene generation and editing.
See the SceneReVis repository for inference instructions.
@article{zhao2026scenerevis,
title={SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL},
author={Yang Zhao and Shizhao Sun and Meisheng Zhang and Yingdong Shi and Xubo Yang and Jiang Bian},
journal={arXiv preprint arXiv:2602.09432},
year={2026}
}
Base model
Qwen/Qwen2.5-VL-7B-Instruct