Multiview DROID image-edit (FLUX.2-klein-4B)
Full finetune of black-forest-labs/FLUX.2-klein-4B
for the multiview DROID image-edit task. Given a wrist-camera reference
image (cond 1) and a side-camera target image (cond 2), the model inserts
the prompted object into the side view with a plausible location, scale,
orientation, and occlusion, using the wrist view as the reference.
Files
- config.json: Flux2Transformer2DModel architecture config.
- diffusion_pytorch_model.safetensors: finetuned transformer weights.
Load directly with Flux2Transformer2DModel.from_pretrained("<this repo or local dir>").
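A minimal sketch of that load step, assuming a recent diffusers release that ships Flux2Transformer2DModel and a local copy of the checkpoint. The helper names (missing_files, load_transformer) and the bfloat16 dtype are illustrative assumptions, not part of this repo. The file check runs first, so a partial download fails with a clear message before any heavy import.

```python
from pathlib import Path

# The two files this repo is documented to contain.
EXPECTED_FILES = ("config.json", "diffusion_pytorch_model.safetensors")


def missing_files(checkpoint_dir):
    """Return the expected checkpoint files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_transformer(checkpoint_dir):
    """Load the finetuned transformer from a local directory (hypothetical helper)."""
    missing = missing_files(checkpoint_dir)
    if missing:
        raise FileNotFoundError(f"{checkpoint_dir} is missing: {', '.join(missing)}")

    # Imported lazily so the file check above works without torch/diffusers installed.
    import torch
    from diffusers import Flux2Transformer2DModel

    # bfloat16 is an assumption; pick the dtype that suits your hardware.
    return Flux2Transformer2DModel.from_pretrained(
        checkpoint_dir, torch_dtype=torch.bfloat16
    )
```

For example, load_transformer("checkpoints/multiview_droid_v0") would return the transformer only; the remaining pipeline components (text encoder, VAE, scheduler) are not part of this repo and would still come from the base model.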
Usage
See the IROM docs and inference script in the source repo:
# Download
uv run examples/inference/download_multiview_droid.py
# Inference
uv run examples/inference/multiview_droid.py \
--checkpoint_path checkpoints/multiview_droid_v0 \
--wrist_cond assets/droid/wrist_banana.jpg \
--side_cond assets/droid/side1.jpg \
--prompt "banana" \
--output_dir outputs/multiview_droid/banana
--prompt is the object phrase (e.g. "banana", "blue mug"); it is
slotted into the prompt template the model was trained on. Pass
--full_prompt to override the template verbatim.
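To run the script over several object phrases, the command above can be assembled programmatically. The sketch below uses only the flags shown in this card. The function name build_inference_command is hypothetical, and it assumes --full_prompt takes a string value and can be passed alongside --prompt, which this card does not confirm.

```python
import shlex


def build_inference_command(checkpoint_path, wrist_cond, side_cond, prompt,
                            output_dir, full_prompt=None):
    """Build the argv list for the inference script (hypothetical helper)."""
    cmd = [
        "uv", "run", "examples/inference/multiview_droid.py",
        "--checkpoint_path", checkpoint_path,
        "--wrist_cond", wrist_cond,
        "--side_cond", side_cond,
        "--prompt", prompt,
        "--output_dir", output_dir,
    ]
    if full_prompt is not None:
        # Assumption: --full_prompt accepts the complete prompt text as its value.
        cmd += ["--full_prompt", full_prompt]
    return cmd


def format_command(cmd):
    """Render the argv list as a shell-safe string for logging or copy-paste."""
    return shlex.join(cmd)
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting problems with multi-word phrases such as "blue mug"; format_command is only for display.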
Model tree for tennyyyin/multiview_droid_v0
Base model
black-forest-labs/FLUX.2-klein-4B