---
license: other
language:
- en
tags:
- diffusion
- image-harmonization
- lighting
- 3d-gaussian-splatting
- computer-vision
- flux
pipeline_tag: image-to-image
---

# Lighting-Consistent Object Transfer Across Radiance Fields

**Paper:** [PDF](https://repo-sam.inria.fr/nerphys/dot3d/dot3d.pdf) | **Project Page:** [dot3d](https://repo-sam.inria.fr/nerphys/dot3d) | **HAL:** [hal-05657202](https://inria.hal.science/hal-05657202v1)

**Authors:** Nicolas Violante¹ · George Kopanas² · Linus Franke¹ · Julien Philip³ · George Drettakis¹

> ¹Inria, Université Côte d'Azur · ²Google DeepMind · ³Eyeline Labs

---

## Model Description

DOT3D is a diffusion-based image harmonization model that corrects lighting inconsistencies when compositing objects from one scene into another. It is the 2D harmonization backbone of a complete 3D pipeline for transferring objects between 3D Gaussian Splatting (3DGS) captures.

When an object is extracted from a source scene and naively pasted into a target scene, the result is unrealistic due to mismatched lighting. DOT3D harmonizes each rendered view of the composite to match the target scene's lighting, and these views are then consolidated via 3DGS post-optimization.
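
The naive paste described above, which produces the inconsistent input that DOT3D corrects, amounts to a per-pixel blend of the source object over the target view. This NumPy sketch is illustrative only and is not taken from the DOT3D repository; the function name `naive_composite` and the conventions (float RGB arrays in [0, 1], a binary mask with the same height and width) are assumptions.

```python
import numpy as np


def naive_composite(obj_rgb: np.ndarray, mask: np.ndarray, target_rgb: np.ndarray) -> np.ndarray:
    """Paste obj_rgb onto target_rgb wherever mask is set, with no lighting correction.

    Hypothetical helper (not from the dot3d codebase).
    obj_rgb, target_rgb: float arrays in [0, 1] with shape (H, W, 3).
    mask: array of shape (H, W); values > 0.5 mark the inserted object.
    """
    if obj_rgb.shape != target_rgb.shape or obj_rgb.shape[:2] != mask.shape:
        raise ValueError("object, mask and target must share the same height and width")
    # Binarize the mask and add a channel axis so it broadcasts over RGB.
    alpha = (mask > 0.5).astype(obj_rgb.dtype)[..., None]
    # Object pixels keep their source-scene shading; everything else is the target.
    return alpha * obj_rgb + (1.0 - alpha) * target_rgb
```

Because the pasted pixels keep the source scene's shading, the composite exhibits exactly the kind of lighting mismatch the harmonization model is meant to remove.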

The model is built on [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) and fine-tuned on a heterogeneous dataset combining synthetic, generated, and real image pairs (inconsistent composite input → consistent output).

Two variants are available:

| Variant | Inputs | Checkpoint |
|---|---|---|
| **Image-Mask** | Composite image + binary mask of the inserted object | `image-mask/` |
| **Image-Background** | Composite image + original background | `image-background/` |

---

## Usage

### Installation

```bash
git clone https://github.com/graphdeco-inria/dot3d
cd dot3d
conda create -n dot3d python=3.10 -y
conda activate dot3d
export CUDA_VERSION=cu121  # adjust to your system
bash install.sh
```

### Image-Mask variant

```python
from PIL import Image
from huggingface_hub import snapshot_download

# `wrappers` is provided by the cloned dot3d repository; run from the repo root.
from wrappers import DOT3DHarmonizationWrapper

# Download only the Image-Mask checkpoint into ./checkpoints/image-mask
snapshot_download(repo_id="nviolante/dot3d", allow_patterns="image-mask/*", local_dir="checkpoints")
wrapper = DOT3DHarmonizationWrapper("checkpoints/image-mask")

# Composite image and the binary mask of the inserted object (loaded as RGB)
image = Image.open("composite.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")

# Harmonize with 4 inference steps and save the prediction
result = wrapper.predict_image(image, mask, num_inference_steps=4)
result["prediction"].save("harmonized.png")
```
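
The Image-Mask example expects a `mask.png` aligned with the composite. If the inserted object is available as an RGBA render on a transparent background, one way to produce that file is to threshold its alpha channel. This is a hypothetical helper, not part of the dot3d codebase; the name `mask_from_alpha` and the threshold of 127 are assumptions, and the repository may expect a different mask convention.

```python
import numpy as np
from PIL import Image


def mask_from_alpha(rgba: Image.Image, threshold: int = 127) -> Image.Image:
    """Return a black/white RGB mask: white where the object's alpha exceeds `threshold`.

    Hypothetical helper: assumes the object was rendered as RGBA on a transparent
    background, pixel-aligned with the composite.
    """
    alpha = np.asarray(rgba.convert("RGBA"))[..., 3]
    binary = np.where(alpha > threshold, 255, 0).astype(np.uint8)
    # A 2D uint8 array becomes a single-channel "L" image; expand to 3 channels
    # to match the .convert("RGB") call in the usage example.
    return Image.fromarray(binary).convert("RGB")
```

For example, `mask_from_alpha(Image.open("object_rgba.png")).save("mask.png")`, where `object_rgba.png` is a placeholder filename.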

### Image-Background variant

```python
from PIL import Image
from huggingface_hub import snapshot_download
from wrappers import DOT3DHarmonizationWrapper  # from the cloned dot3d repository

# Download only the Image-Background checkpoint into ./checkpoints/image-background
snapshot_download(repo_id="nviolante/dot3d", allow_patterns="image-background/*", local_dir="checkpoints")
wrapper = DOT3DHarmonizationWrapper("checkpoints/image-background")

# Composite image and the original background
image = Image.open("composite.png").convert("RGB")
background = Image.open("background.png").convert("RGB")

result = wrapper.predict_image(image, background, num_inference_steps=4)
result["prediction"].save("harmonized.png")
```
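
Since the full pipeline harmonizes every rendered view of the composite before 3DGS post-optimization, it can be convenient to run either variant over a folder of views. This loop is a hypothetical sketch, not the repository's own multi-view script: the directory layout (one condition image per view, matched by filename) and the `harmonize_views` name are assumptions, and the harmonizer is passed in as a callable so the loop does not depend on the model.

```python
from pathlib import Path
from typing import Callable

from PIL import Image

# (view, condition) -> harmonized view; the condition is a mask or a background.
Harmonizer = Callable[[Image.Image, Image.Image], Image.Image]


def harmonize_views(view_dir: Path, cond_dir: Path, out_dir: Path, harmonize: Harmonizer) -> list[Path]:
    """Harmonize every PNG in `view_dir` using the same-named PNG in `cond_dir`.

    Hypothetical layout: view_dir/0000.png pairs with cond_dir/0000.png.
    Results are written to out_dir under the same filenames.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for view_path in sorted(view_dir.glob("*.png")):
        cond_path = cond_dir / view_path.name
        if not cond_path.exists():
            raise FileNotFoundError(f"missing condition image for {view_path.name}")
        # convert() returns a new image, so the files can be closed right away.
        with Image.open(view_path) as im:
            image = im.convert("RGB")
        with Image.open(cond_path) as im:
            cond = im.convert("RGB")
        out_path = out_dir / view_path.name
        harmonize(image, cond).save(out_path)
        written.append(out_path)
    return written
```

With the wrapper from the examples above, the callable could be `lambda img, cond: wrapper.predict_image(img, cond, num_inference_steps=4)["prediction"]`.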

---

## Training Data

The model was trained on a heterogeneous mixture of image pairs (inconsistent composite, consistent ground truth):

- **Blender**: synthetic renders with controlled relighting
- **FLUX-generated**: synthetically composited pairs produced with a generative model
- **ORIDA**: real image pairs from [ORIDA](https://hello-jinwoo.github.io/orida/) with relighting supervision

Full dataset: [`nviolante/dot3d`](https://huggingface.co/datasets/nviolante/dot3d)

---

## Evaluation

The model is evaluated with PSNR, SSIM, LPIPS, FID, and KID on the test splits of all three datasets. Pre-computed 3D results are available at [results_3d](https://huggingface.co/datasets/nviolante/dot3d/tree/main/results_3d).
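
Of these metrics, PSNR is the simplest to reproduce. A minimal NumPy sketch of the standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, assuming images scaled to [0, 1]; the exact protocol used for the paper (color space, masking, resolution) is not specified here, so values from this function are indicative only and not a substitute for the repository's evaluation code.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Compute in float64 to avoid overflow/precision issues with uint8 inputs.
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For 8-bit images, pass `max_val=255`.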

---

## Hardware Requirements

| Task | Hardware |
|---|---|
| Training | 4× H100 (96 GB VRAM) |
| 3D post-optimization | 1× H100 (96 GB VRAM) |
| Inference (harmonization only) | Consumer GPU |

---

## BibTeX

```bibtex
@article{violante2026dot3d,
  author  = {Violante, Nicolás and Kopanas, George and Franke, Linus and Philip, Julien and Drettakis, George},
  title   = {Lighting-Consistent Object Transfer Across Radiance Fields},
  journal = {Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering)},
  year    = {2026},
  volume  = 45,
  number  = 4
}
```

---

## Acknowledgments

Funded by the European Union ERC Advanced Grants [NERPHYS](https://project.inria.fr/nerphys) (101141721) and [EXPLORER](https://cordis.europa.eu/project/id/101097259) (101097259). Experiments used the [Grid'5000](https://www.grid5000.fr) testbed. The authors thank Adobe and NVIDIA for software and hardware donations.