By requesting access to SceneReVis-7B, you agree to the following terms:

1. You will use this model only for academic research purposes.
2. You will not redistribute the model weights without permission.
3. You will cite our paper in any published work that uses this model.

SceneReVis-7B

SceneReVis-7B is a vision-language model fine-tuned for iterative 3D indoor scene generation and editing.

Model Details

Base Model: Qwen2.5-VL-7B-Instruct
Training: supervised fine-tuning (SFT) on SceneChain-12K, followed by GRPO (Group Relative Policy Optimization) reinforcement learning with voxel-based physics rewards
Architecture: Vision-Language Model with tool-calling capabilities
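The training recipe above pairs a voxel-based physics reward with GRPO. This model card does not give the reward formula or the scene representation, so the following is only a hedged illustration of the two ideas and is not the SceneReVis implementation. It assumes each object is an axis-aligned bounding box and scores a layout by the share of occupied voxels that belong to exactly one object, so interpenetrating furniture lowers the reward. It then computes GRPO-style group-relative advantages, where each reward is standardised against the mean and standard deviation of the other samples drawn for the same prompt. `Box`, `voxelize`, `collision_reward`, `group_relative_advantages`, and the 0.25 m voxel size are hypothetical names and values chosen for this sketch.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


@dataclass(frozen=True)
class Box:
    """Hypothetical scene object: an axis-aligned bounding box in metres."""
    min_xyz: Vec3
    max_xyz: Vec3


def voxelize(box: Box, voxel_size: float) -> Set[Voxel]:
    """Return indices of grid cells [i*s, (i+1)*s) that the box overlaps.

    Boxes that merely touch along a face share no voxel.
    """
    lo = [math.floor(c / voxel_size) for c in box.min_xyz]
    hi = [math.ceil(c / voxel_size) for c in box.max_xyz]
    return {
        (i, j, k)
        for i in range(lo[0], hi[0])
        for j in range(lo[1], hi[1])
        for k in range(lo[2], hi[2])
    }


def collision_reward(boxes: Sequence[Box], voxel_size: float = 0.25) -> float:
    """Hypothetical physics reward in [0, 1]: share of occupied voxels held by exactly one object."""
    counts: Dict[Voxel, int] = {}
    for box in boxes:
        for v in voxelize(box, voxel_size):
            counts[v] = counts.get(v, 0) + 1
    if not counts:
        return 1.0  # an empty scene has nothing to collide
    collided = sum(1 for c in counts.values() if c > 1)
    return 1.0 - collided / len(counts)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardise each reward against its sampling group.

    Uses the population standard deviation; some implementations use the sample one.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    table = Box((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    chair_clear = Box((1.0, 0.0, 0.0), (1.5, 0.5, 1.0))  # touches the table, no overlap
    chair_clash = Box((0.5, 0.0, 0.0), (1.0, 0.5, 1.0))  # sits inside the table
    # One GRPO "group": several candidate layouts sampled for the same prompt.
    group = [[table, chair_clear], [table, chair_clash], [table]]
    rewards = [collision_reward(layout) for layout in group]
    print(rewards)  # [1.0, 0.75, 1.0]
    print(group_relative_advantages(rewards))
```

In this toy group the colliding layout receives a negative advantage and the two collision-free layouts receive equal positive ones, which is the signal a policy-gradient update would use to push the model away from interpenetrating placements. Because advantages are relative within the group, a group in which every sample scores the same contributes no gradient.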

Usage

See the SceneReVis repository for inference instructions.

Citation

@article{zhao2026scenerevis,
  title={SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL},
  author={Yang Zhao and Shizhao Sun and Meisheng Zhang and Yingdong Shi and Xubo Yang and Jiang Bian},
  journal={arXiv preprint arXiv:2602.09432},
  year={2026}
}