F-ViTA: Foundation Model Guided Visible to Thermal Translation
Paper: arXiv:2504.02801
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("jay-jnp/F-ViTA_KAIST", dtype=torch.bfloat16, device_map="cuda")
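# NOTE: the prompt and image below are generic placeholders, not a thermal-translation example.
# Substitute a visible (RGB) image; the expected prompt format is not stated here, so check the F-ViTA code repository.
prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")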
image = pipe(image=input_image, prompt=prompt).images[0]

This repository contains the model described in the paper F-ViTA: Foundation Model Guided Visible to Thermal Translation.
F-ViTA leverages foundation models (SAM and Grounded DINO) to guide the visible-to-thermal image translation process using an InstructPix2Pix diffusion model. This approach improves translation accuracy and generalizes well to out-of-distribution scenarios.
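To make the idea of foundation-model guidance concrete, the following is a minimal sketch of how per-object masks and class labels could be turned into conditioning inputs for an instruction-driven diffusion model. It is not taken from the F-ViTA code: the helper names `build_label_map` and `build_prompt`, the integer label-map encoding, and the prompt wording are all hypothetical, and the masks and labels are assumed to have been produced beforehand by a segmentation model and an open-vocabulary detector.

```python
from typing import List, Sequence

Mask = List[List[bool]]  # H x W binary mask, True where the object is


def build_label_map(masks: Sequence[Mask], height: int, width: int) -> List[List[int]]:
    """Merge per-object binary masks into one integer map (0 = background).

    Hypothetical encoding: object i (1-based) writes the value i into its pixels.
    """
    label_map = [[0] * width for _ in range(height)]
    for index, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = index  # later masks overwrite earlier ones
    return label_map


def build_prompt(labels: Sequence[str]) -> str:
    """Compose a text instruction naming the detected classes (hypothetical wording)."""
    base = "Translate this visible image to a thermal image"
    unique = sorted(set(labels))
    if not unique:
        return base + "."
    return base + " containing: " + ", ".join(unique) + "."


# Toy 2 x 3 example with two detected objects.
example_labels = ["person", "car"]
example_masks = [
    [[True, False, False], [True, False, False]],   # person
    [[False, False, True], [False, False, False]],  # car
]
example_map = build_label_map(example_masks, height=2, width=3)
example_prompt = build_prompt(example_labels)
```

In a real pipeline the label map would typically be an array or tensor at image resolution, and how it is fed to the diffusion model (extra channels, prompt text, or both) depends on the implementation in the code repository.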
Code: https://github.com/jay-jnp/F-ViTA
Pre-trained checkpoints are available for several datasets: