MObI: Multimodal Object Inpainting Using Diffusion Models
Pretrained weights for MObI, a diffusion-based model for joint multimodal object inpainting across camera and lidar, conditioned on a single reference image and a 3D bounding box.
Paper: arXiv:2501.03173 · Code: github.com/alexbuburuzan/MObI · Venue: CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS), 2025
MObI extends Paint-by-Example to:
- inpaint objects jointly across camera and lidar range views, and
- condition generation on a 3D bounding box in addition to a single reference image.

This combines the realism of reference-based inpainting with the controllability of 3D-aware methods.
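The 3D bounding box is what gives explicit control over where and how the object is placed. As a rough illustration of the geometry involved, the sketch below projects the eight corners of a box into a camera view; the frame conventions and function names are assumptions for illustration, not MObI's actual conditioning code.

```python
import numpy as np

def box_corners_3d(center, size, yaw):
    """8 corners of a 3D box (center xyz, size l/w/h, yaw about the up axis), in the ego frame."""
    l, w, h = size
    x = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * l / 2
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    z = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # yaw rotation about the up axis
    return R @ np.stack([x, y, z]) + np.asarray(center).reshape(3, 1)  # (3, 8)

def project_to_camera(corners_ego, ego_to_cam, K):
    """Project ego-frame corners into pixels using a 4x4 extrinsic and 3x3 intrinsics."""
    homo = np.vstack([corners_ego, np.ones((1, corners_ego.shape[1]))])  # (4, 8) homogeneous
    cam = (ego_to_cam @ homo)[:3]                                        # (3, 8) camera frame
    uvw = K @ cam                                                        # pinhole projection
    return uvw[:2] / uvw[2:]                                             # (2, 8) pixel coordinates
```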
| File | Description |
|---|---|
| `mobi_nuscenes_epoch28.ckpt` | MObI trained on nuScenes |
| `autoencoders/range_autoencoder.ckpt` | Range-view VAE for lidar |
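A minimal sketch for inspecting the released checkpoints with PyTorch, assuming a Lightning-style layout where the weights sit under a `state_dict` key:

```python
import torch

# Load the released checkpoint on CPU (filename taken from the table above).
ckpt = torch.load("mobi_nuscenes_epoch28.ckpt", map_location="cpu")

# Lightning-style checkpoints usually keep the weights under "state_dict";
# fall back to the raw object if that key is absent.
state_dict = ckpt.get("state_dict", ckpt)
print(f"{len(state_dict)} tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```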
| Reference Type | FID ↓ | LPIPS ↓ | CLIP ↑ | D-LPIPS ↓ | I-LPIPS ↓ |
|---|---|---|---|---|---|
| id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |
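The perceptual scores above follow standard definitions; as an example, a minimal LPIPS computation with the `lpips` package is sketched below. The backbone, crops, and resolutions used for the table come from the repository's evaluation scripts, not from this sketch.

```python
import lpips
import torch

# Standard LPIPS with an AlexNet backbone; inputs are (N, 3, H, W) tensors in [-1, 1].
loss_fn = lpips.LPIPS(net="alex")

edited = torch.rand(1, 3, 256, 256) * 2 - 1      # inpainted camera crop (placeholder)
reference = torch.rand(1, 3, 256, 256) * 2 - 1   # ground-truth crop (placeholder)
print(loss_fn(edited, reference).item())
```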
See the GitHub repository for installation, data preprocessing, inference, and training instructions.
```bash
git clone https://github.com/alexbuburuzan/MObI.git
cd MObI
bash scripts/download_models.sh
bash scripts/realism_test_bench.sh
```
```bibtex
@InProceedings{Buburuzan_2025_CVPR,
  author    = {Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K. and Mueller, Romain},
  title     = {MObI: Multimodal Object Inpainting Using Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {1999-2009}
}
```
Released under CC BY-NC 4.0. Note that this work builds on Paint-by-Example and BEVFusion, which have their own licenses.