ALI β RADIO 2.5-H relighting ControlNet (stage 2)
Reference-based image relighting: given an input image and a reference image, re-render the input's content under the reference's lighting. Built on Stable Diffusion 2.1 ControlNet with a frozen Latent-Intrinsics prior whose intrinsic features are enriched by a frozen RADIO v2.5-H vision foundation model through a trained multi-scale adapter.
This checkpoint is the stage-2 model (check_model_epoch=01.ckpt): the
ControlNet + SD cross-attention layers trained with the intrinsics prior frozen,
after a stage-1 round that trained the VFM adapter and light decoders.
Training
- Backbone: radio_v2.5-h (frozen), Latent-Intrinsics UNet (frozen, last.pth.tar).
- Data: BigTime + Multi-Illumination (MIIW) multi-lighting pairs.
- Stages: stage 1 (prior/adapter), then stage 2 (ControlNet + cross-attn).
MIIW test metrics (cross-scene, color-corrected)
| RMSE | PSNR | SSIM | LPIPS |
|---|---|---|---|
| 0.130 | 18.04 | 0.718 | 0.223 |
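For readers who want to sanity-check the pixel-level numbers, the following is a minimal sketch of RMSE and PSNR on images scaled to [0, 1]. It is not the evaluation script behind the table. The exact color-correction step is not specified in this card, so the `color_correct` helper below (a per-channel least-squares scale) is an assumption for illustration only. SSIM and LPIPS are omitted because they need additional libraries.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-square error over all pixels and channels."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with peak value max_val."""
    err = rmse(pred, target)
    if err == 0.0:
        return float("inf")
    return float(20.0 * np.log10(max_val / err))


def color_correct(pred, target, eps=1e-8):
    """ASSUMPTION: per-channel least-squares scale aligning pred to target.

    Expects H x W x C arrays. The correction actually used for the
    reported metrics may differ.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # One scale per channel, minimizing ||scale * pred - target||^2.
    scale = (pred * target).sum(axis=(0, 1)) / ((pred * pred).sum(axis=(0, 1)) + eps)
    return pred * scale
```

Per-image values are typically averaged over the test set, so the mean PSNR need not equal the PSNR computed from the mean RMSE.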
Usage
Run with the release code (inference.py --vfm radio_v2.5-h --vae-ckpt <modi-vae>).
The Latent-Intrinsics checkpoint is required; the skip-connection "modi-vae" is
optional but recommended for best detail. See the project repository for details.
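When scripting batch runs, the command line above could be assembled programmatically. This is a minimal sketch, not part of the release code: `build_inference_cmd` is a hypothetical helper, the checkpoint path is a placeholder, and only the two flags named in this card are used. Any flags for input or reference images are not stated here and are left out; consult the repository for them. It also assumes that `--vae-ckpt` can simply be omitted when no "modi-vae" is available.

```python
import shlex


def build_inference_cmd(vfm="radio_v2.5-h", vae_ckpt=None, script="inference.py"):
    """Hypothetical helper: build the argv list for the release script."""
    cmd = ["python", script, "--vfm", vfm]
    # The skip-connection VAE is optional, so add its flag only when given.
    if vae_ckpt is not None:
        cmd += ["--vae-ckpt", str(vae_ckpt)]
    return cmd


if __name__ == "__main__":
    # Placeholder path; substitute the real "modi-vae" checkpoint location.
    print(shlex.join(build_inference_cmd(vae_ckpt="path/to/modi-vae.ckpt")))
```

The resulting list can be passed to `subprocess.run` once the remaining arguments required by the script have been appended.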