Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D
Paper | Project Page | Code
Steer3D is a feedforward method that adds text steerability to image-to-3D models, enabling generated 3D assets to be edited with natural language instructions. Inspired by ControlNet, Steer3D adapts that architecture to image-to-3D generation so that text steering happens directly in a single forward pass. It follows language instructions faithfully, stays more consistent with the original 3D asset than competing methods, and runs significantly faster.
Overview
Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. It is trained on a 100k-scale synthetic dataset produced by a custom data engine. This repository provides code for both the data engine and the model; the scripts for each data-engine step are documented in dataengine/README.md.
Environment Setup
To set up the environment for the model, please follow the instructions below. Note that the data engine requires a separate environment setup, detailed in dataengine/README.md.
conda env create -f environment.yml
conda activate steer3d
Libraries such as kaolin, nvdiffrast, diffoctreerast, mip-splatting, and vox2seq may require manual installation. Refer to the setup script from TRELLIS for guidance on installing these dependencies.
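A rough sketch of the manual installs (the versions and wheel URLs below are illustrative and must match your CUDA/PyTorch build; the TRELLIS setup script is the authoritative reference):

# nvdiffrast installs directly from the NVlabs repository
pip install git+https://github.com/NVlabs/nvdiffrast.git

# kaolin ships prebuilt wheels keyed to your torch/CUDA versions;
# adjust the wheel index below to your installed build
pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html

diffoctreerast, mip-splatting, and vox2seq are built from the sources referenced in the TRELLIS setup script.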
Usage: Inference in the Wild
This section demonstrates text-steerable 3D editing on user-provided images and text prompts. The flags mirror those used for benchmark evaluation. Pass an image path via --image_path and an editing instruction via --text; --text may also point to a .txt file containing multiple editing instructions, one per line (see the example below). Set --texture_only for better geometry consistency during texture-only edits. A visualization PNG is written to the output directory, and if --export_glb is set, GLB files of the 3D objects are exported as well.
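When --text points at a .txt file, the file is plain text with one editing instruction per line. A minimal sketch (the filename prompts.txt is arbitrary):

cat > prompts.txt <<'EOF'
Remove the entire bottom base
Turn the entire cone into a metallic silver texture
EOF

Then pass --text prompts.txt in place of a single instruction string.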
Important: first, set PYTHONPATH=[path to Steer3D] so the project directory is on your Python path. Model checkpoints must be downloaded from the Hugging Face repository; replace [path-to-checkpoints] in the commands below with their actual location.
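For example (the Hugging Face repository id below is a placeholder; substitute the one linked at the top of this page):

export PYTHONPATH=/path/to/Steer3D

# download all checkpoints into a local directory
huggingface-cli download <hf-repo-id> --local-dir checkpoints

The local directory (checkpoints here) then stands in for [path-to-checkpoints] in the commands below.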
Below are three example edits on a natural photo of a traffic cone, demonstrating removal, texture change, and addition.
Removal Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Remove the entire bottom base" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
Texture Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Turn the entire cone into a metallic silver texture" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--texture_only \
--num_seeds 1
Addition Example
python evaluation/eval_wild.py \
--image_path media/cone.jpg \
--text "Add a cap shaped light on top of the cone" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
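To additionally export GLB meshes of the edited objects, append --export_glb to any of the commands above; the files are written alongside the visualization PNGs in the output directory.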
Citation
If you find our work helpful, please use the following BibTeX entry to cite it:
@misc{ma2025feedforward3deditingtextsteerable,
      title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
      author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
      year={2025},
      eprint={2512.13678},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13678},
}
