Multiview DROID image-edit (FLUX.2-klein-4B)

Full finetune of black-forest-labs/FLUX.2-klein-4B for the multiview DROID image-edit task. Given a wrist-camera reference (cond 1) and a side-camera target (cond 2), the model inserts the prompted object into the side view with plausible location, scale, orientation, and occlusion, using the wrist view as reference.

Files

  • config.json: Flux2Transformer2DModel architecture config.
  • diffusion_pytorch_model.safetensors: finetuned transformer weights.

Load directly with Flux2Transformer2DModel.from_pretrained("<this repo or local dir>").
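
A minimal sketch of that load step, assuming the two files listed above sit in a local directory and that the installed diffusers release exports Flux2Transformer2DModel. The helper names missing_files and load_transformer, and the bfloat16 dtype, are illustrative assumptions and not part of this repo.

```python
from pathlib import Path

# File names taken from the "Files" list above.
EXPECTED_FILES = ("config.json", "diffusion_pytorch_model.safetensors")


def missing_files(checkpoint_dir):
    """Return the expected checkpoint files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_transformer(checkpoint_dir):
    """Load the finetuned transformer from a local directory (hypothetical helper)."""
    missing = missing_files(checkpoint_dir)
    if missing:
        raise FileNotFoundError(f"{checkpoint_dir} is missing: {', '.join(missing)}")

    # Imported lazily so the file check above works without torch/diffusers installed.
    import torch
    from diffusers import Flux2Transformer2DModel

    # bfloat16 is an assumption; use the dtype your hardware and pipeline expect.
    return Flux2Transformer2DModel.from_pretrained(
        checkpoint_dir, torch_dtype=torch.bfloat16
    )
```

Checking for the files first turns an incomplete download into a readable error before any weights are loaded.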

Usage

See the IROM docs and inference script in the source repo:

# Download
uv run examples/inference/download_multiview_droid.py

# Inference
uv run examples/inference/multiview_droid.py \
    --checkpoint_path checkpoints/multiview_droid_v0 \
    --wrist_cond assets/droid/wrist_banana.jpg \
    --side_cond  assets/droid/side1.jpg \
    --prompt     "banana" \
    --output_dir outputs/multiview_droid/banana

--prompt is the object phrase (e.g. "banana", "blue mug"); it is slotted into the prompt template the model was trained on. Pass --full_prompt to override the template verbatim.
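
The slotting behaviour described above can be sketched as follows. This is an illustration only: the real training template lives in the source repo and is not reproduced here, so PLACEHOLDER_TEMPLATE and build_prompt are hypothetical, and the sketch assumes --full_prompt carries the complete prompt string.

```python
# Placeholder only: NOT the template the model was trained on.
PLACEHOLDER_TEMPLATE = "PLACEHOLDER TEMPLATE: {object}"


def build_prompt(object_phrase, full_prompt=None, template=PLACEHOLDER_TEMPLATE):
    """Return the text prompt for one inference call (hypothetical helper).

    If full_prompt is given it is returned verbatim, mirroring --full_prompt.
    Otherwise the object phrase is slotted into the template, mirroring --prompt.
    """
    if full_prompt is not None:
        return full_prompt
    return template.format(object=object_phrase.strip())
```

For the best match to training, keep --prompt to a short object phrase and reserve the verbatim override for experiments outside the trained template.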
