---
pipeline_tag: image-to-image
library_name: diffusers
license: mit
---

# F-ViTA: Foundation Model Guided Visible to Thermal Translation

This repository contains the model described in the paper [F-ViTA: Foundation Model Guided Visible to Thermal Translation](https://huggingface.co/papers/2504.02801).

F-ViTA leverages foundation models (SAM and Grounded DINO) to guide the visible-to-thermal image translation process using an InstructPix2Pix diffusion model. This approach improves translation accuracy and generalizes well to out-of-distribution scenarios.

Code: https://github.com/jay-jnp/F-ViTA

Pre-trained checkpoints are available for several datasets:

* **KAIST:** [huggingface.co/jay-jnp/F-ViTA\_KAIST](https://huggingface.co/jay-jnp/F-ViTA_KAIST)
* **FLIR:** [huggingface.co/jay-jnp/F-VITA\_FLIR](https://huggingface.co/jay-jnp/F-VITA_FLIR)
* **NIRSCENE:** [huggingface.co/jay-jnp/F-VITA\_NIRSCENE](https://huggingface.co/jay-jnp/F-VITA_NIRSCENE)
* **OSU:** [huggingface.co/jay-jnp/F-VITA\_OSU](https://huggingface.co/jay-jnp/F-VITA_OSU)
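
The checkpoint list above can be turned into a small lookup for scripts that switch between datasets. This is a minimal sketch, not part of the F-ViTA codebase: the helper names `checkpoint_repo` and `download_checkpoint` are hypothetical, the repo IDs are copied from the links above, and it assumes the `huggingface_hub` package is installed for the download step. It only fetches the files; how they are loaded for inference is defined by the code repository and is not assumed here.

```python
# Repo IDs copied from the checkpoint list in this model card.
# The helper names below are hypothetical, not part of the F-ViTA code.
CHECKPOINT_REPOS = {
    "KAIST": "jay-jnp/F-ViTA_KAIST",
    "FLIR": "jay-jnp/F-VITA_FLIR",
    "NIRSCENE": "jay-jnp/F-VITA_NIRSCENE",
    "OSU": "jay-jnp/F-VITA_OSU",
}


def checkpoint_repo(dataset: str) -> str:
    """Return the Hub repo ID for a dataset name (case-insensitive)."""
    key = dataset.strip().upper()
    if key not in CHECKPOINT_REPOS:
        known = ", ".join(sorted(CHECKPOINT_REPOS))
        raise KeyError(f"Unknown dataset {dataset!r}; expected one of: {known}")
    return CHECKPOINT_REPOS[key]


def download_checkpoint(dataset: str) -> str:
    """Download the checkpoint repo and return the local directory path."""
    # Imported lazily so the lookup helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(dataset))
```

For example, `download_checkpoint("kaist")` would fetch the KAIST repository into the local Hub cache and return its path, assuming network access to the Hub.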