Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D
Paper | Project Page | Code
Steer3D is a feedforward method that adds text steerability to image-to-3D models, enabling generated 3D assets to be edited with natural language instructions. Inspired by ControlNet, Steer3D adapts that architecture to image-to-3D generation so that text steering happens directly in a single forward pass. It follows language instructions faithfully, stays more consistent with the original 3D asset than competing methods, and runs significantly faster.
Overview
Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. It is trained on a 100k-scale synthetic dataset produced by a custom data engine. This repository provides code for both the data engine and the model; the scripts for each data-engine step are documented in dataengine/README.md.
Environment Setup
To set up the environment for the model, please follow the instructions below. Note that the data engine requires a separate environment setup, detailed in dataengine/README.md.
conda env create -f environment.yml
conda activate steer3d
Libraries such as kaolin, nvdiffrast, diffoctreerast, mip-splatting, and vox2seq may require manual installation. Refer to the setup script from TRELLIS for guidance on installing these dependencies.
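A rough sketch of the manual installs (the versions and wheel URLs below are illustrative and must match your CUDA/PyTorch build; the TRELLIS setup script is the authoritative reference):

# nvdiffrast installs directly from the NVlabs repository
pip install git+https://github.com/NVlabs/nvdiffrast.git

# kaolin ships prebuilt wheels keyed to your torch/CUDA versions;
# adjust the wheel index below to your installed build
pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html

diffoctreerast, mip-splatting, and vox2seq are built from the sources referenced in the TRELLIS setup script.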
Usage: Inference in the Wild
This section demonstrates text-steerable 3D editing on user-provided images and text prompts. The flags mirror those used for benchmark evaluation. Pass an image path via --image_path and an editing instruction via --text; --text may also point to a .txt file containing multiple editing instructions, one per line (see the example below). Set --texture_only for better geometry consistency during texture-only edits. A visualization PNG is written to the output directory, and if --export_glb is set, GLB files of the 3D objects are exported as well.
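When --text points at a .txt file, the file is plain text with one editing instruction per line. A minimal sketch (the filename prompts.txt is arbitrary):

cat > prompts.txt <<'EOF'
Remove the entire bottom base
Turn the entire cone into a metallic silver texture
EOF

Then pass --text prompts.txt in place of a single instruction string.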
Important: first, set PYTHONPATH=[path to Steer3D] so the project directory is on your Python path. Model checkpoints must be downloaded from the Hugging Face repository; replace [path-to-checkpoints] in the commands below with their actual location.
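For example (the Hugging Face repository id below is a placeholder; substitute the one linked at the top of this page):

export PYTHONPATH=/path/to/Steer3D

# download all checkpoints into a local directory
huggingface-cli download <hf-repo-id> --local-dir checkpoints

The local directory (checkpoints here) then stands in for [path-to-checkpoints] in the commands below.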
Below are three example edits on a natural photo of a traffic cone, demonstrating removal, texture change, and addition.
Removal Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Remove the entire bottom base" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
Texture Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Turn the entire cone into a metallic silver texture" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--texture_only \
--num_seeds 1
Addition Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Add a cap shaped light on top of the cone" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
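To additionally export GLB meshes of the edited objects, append --export_glb to any of the commands above; the files are written alongside the visualization PNGs in the output directory.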
Citation
If you find our work helpful, please use the following BibTeX entry to cite it:
@misc{ma2025feedforward3deditingtextsteerable,
      title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
      author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
      year={2025},
      eprint={2512.13678},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13678},
}
