---
license: mit
language:
  - en
tags:
  - 3d-scene-generation
  - indoor-scene
  - vision-language
  - reinforcement-learning
base_model: Qwen/Qwen2.5-VL-7B-Instruct
gated: auto
extra_gated_prompt: >-
  By requesting access to SceneReVis-7B, you agree to the following terms: 1.
  You will use this model only for academic research purposes. 2. You will not
  redistribute the model weights without permission. 3. You will cite our paper
  in any published work that uses this model.
extra_gated_fields:
  Name: text
  Affiliation: text
  I want to use this model for:
    type: select
    options:
      - Academic Research
      - Education
      - label: Commercial Use
        value: commercial
      - label: Other
        value: other
  I agree to use this model for non-commercial research only: checkbox
extra_gated_heading: Request access to SceneReVis-7B
extra_gated_description: >-
  Please fill out the form below. Access will be granted automatically after
  submission.
extra_gated_button_content: Submit & Get Access
---

SceneReVis-7B

SceneReVis-7B is a vision-language model fine-tuned for iterative 3D indoor scene generation and editing.

Model Details

Base Model: Qwen2.5-VL-7B-Instruct
Training: supervised fine-tuning (SFT) on SceneChain-12K, followed by GRPO (Group Relative Policy Optimization) reinforcement learning with voxel-based physics rewards
Architecture: Vision-Language Model with tool-calling capabilities
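
This card does not specify the reward beyond "voxel-based physics rewards", so the following is only a hypothetical sketch of the general idea, not the authors' implementation. It assumes each object and the room are axis-aligned bounding boxes (the `Box` type, the voxel size, and the `physics_reward` formula are all illustrative assumptions), scores a layout by the fraction of occupied voxels that are neither shared by two objects nor outside the room, and then converts a group of sampled layouts' rewards into GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation).

```python
import itertools
import math
import statistics
from collections import Counter
from dataclasses import dataclass

Voxel = tuple[int, int, int]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box (metres) for an object or the room."""
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def voxelize(box: Box, size: float) -> set[Voxel]:
    """Indices of grid voxels whose centres fall in [min, max) on every axis."""
    axes = []
    for lo, hi in zip(box.min_xyz, box.max_xyz):
        candidates = range(math.floor(lo / size), math.ceil(hi / size))
        axes.append([i for i in candidates if lo <= (i + 0.5) * size < hi])
    return set(itertools.product(*axes))


def physics_reward(objects: list[Box], room: Box, size: float = 0.1) -> float:
    """Hypothetical reward in [0, 1]: share of occupied voxels that are neither
    claimed by two or more objects (collision) nor outside the room."""
    room_voxels = voxelize(room, size)
    counts: Counter = Counter()
    for obj in objects:
        counts.update(voxelize(obj, size))
    if not counts:
        return 1.0
    bad = [v for v, n in counts.items() if n > 1 or v not in room_voxels]
    return 1.0 - len(bad) / len(counts)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalise each rollout's reward by the mean and
    standard deviation of its own sampling group (no learned value function)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: three candidate layouts sampled for the same prompt.
room = Box((0, 0, 0), (4, 4, 3))
bed = Box((0, 0, 0), (2, 2, 1))
layouts = [
    [bed, Box((3, 0, 0), (4, 1, 1))],      # desk clear of the bed
    [bed, Box((1, 1, 0), (2, 2, 1))],      # desk embedded in the bed
    [bed, Box((3.5, 0, 0), (4.5, 1, 1))],  # desk poking through a wall
]
rewards = [physics_reward(layout, room, size=0.5) for layout in layouts]
print(rewards)  # [1.0, 0.75, 0.9]
advantages = group_relative_advantages(rewards)
```

Normalising within the group is what lets GRPO skip a separate critic: layouts that are physically cleaner than their siblings get positive advantages, the rest get negative ones.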

Usage

See the SceneReVis GitHub repository (https://github.com/Runder-sun/SceneReVis) for inference instructions.

Citation

```bibtex
@article{zhao2026scenerevis,
  title={SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL},
  author={Yang Zhao and Shizhao Sun and Meisheng Zhang and Yingdong Shi and Xubo Yang and Jiang Bian},
  journal={arXiv preprint arXiv:2602.09432},
  year={2026}
}
```