hash-map/conditional_gan_vis_to_ir

Image-to-Image


hash-map committed on Oct 29, 2025

Commit cd5ce2a (verified) · 1 Parent(s): 6f3ce53

Update README.md

Files changed (1)

README.md +7 -25


@@ -41,8 +41,7 @@ A higher emphasis is given to **L1 loss**, ensuring that overall brightness and
 - Output: single-channel IR image
 ### ⚔️ Discriminator
-- PatchGAN (70×70 receptive field)
-- Evaluates realism of local patches for fine detail learning
 ---
@@ -55,7 +54,7 @@ A higher emphasis is given to **L1 loss**, ensuring that overall brightness and
 | **Batch Size** | 4 |
 | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
 | **Learning Rate** | 2e-4 |
-| **Precision** | Mixed (FP16/32) |
 | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
 ---
@@ -94,7 +93,10 @@ L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} +
 | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
 ---
 ## 🖼️ Visual Results
@@ -141,13 +143,13 @@ All training metrics are logged in:
 - The model **captures IR brightness and object distinction**, but early epochs show slight blur due to L1-dominant stages.
 - **Contrast and edge sharpness improve** after ~70 epochs as adversarial and perceptual losses gain weight.
 - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
 ---
 ## 🚀 Future Work
 - Apply **feature matching loss** for smoother discriminator gradients
-- Introduce **spectral normalization** for training stability
 - Add **temporal or sequence consistency** for video IR translation
 - Adaptive loss balancing with epoch-based dynamic weighting
@@ -160,23 +162,3 @@ TensorFlow and VGG-19 for perceptual feature extraction
 Kaggle GPU for high-performance model training
-## 📜 License
-**MIT License © 2025**
-Author: **Sai Sumanth Appala**
----
-## 🧾 Citation
-If you use this work, please cite:
-```bibtex
-@misc{appala2025visible2ir,
-  author = {Appala, Sai Sumanth},
-  title = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
-  year = {2025},
-  license = {MIT},
-  dataset = {UserNae3/LLVIP},
-  framework = {TensorFlow},
-}

 - Output: single-channel IR image
 ### ⚔️ Discriminator
+- Evaluates realism  for fine detail learning
 ---
 | **Batch Size** | 4 |
 | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
 | **Learning Rate** | 2e-4 |
+| **Precision** | Mixed (32) |
 | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
 ---
 | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
 ---
+Data Exploration
+We analysed the LLVIP dataset and found that ~70% of image pairs are captured at < 50 lux lighting and ~30% at 50-200 lux.
+The average pedestrian height in IR channel was X pixels; outliers with <20 pixels height were excluded.
 ## 🖼️ Visual Results
 - The model **captures IR brightness and object distinction**, but early epochs show slight blur due to L1-dominant stages.
 - **Contrast and edge sharpness improve** after ~70 epochs as adversarial and perceptual losses gain weight.
 - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
+- We compared three variants: (i) U-Net regression (L1 only) → SSIM = 0.80;
+- (ii) cGAN with L1+adv → SSIM = 0.83; (iii) cGAN with L1+adv+perc+edge (our final) → SSIM = 0.8386
 ---
 ## 🚀 Future Work
 - Apply **feature matching loss** for smoother discriminator gradients
 - Add **temporal or sequence consistency** for video IR translation
 - Adaptive loss balancing with epoch-based dynamic weighting
 Kaggle GPU for high-performance model training
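
The README text touched by this diff says that adversarial and perceptual losses "gain weight" over training and lists epoch-based dynamic weighting as future work. One way such a schedule could look is sketched below in plain Python. Everything numeric here is an assumption for illustration: the helper names `ramp`, `loss_weights`, and `generator_loss`, the 70-epoch ramp length, and all weight values are hypothetical and do not come from the repository. Only the three loss terms visible in the truncated hunk header (`L1`, `perc`, `adv`) are included; the remainder of that formula is cut off in the diff and is not reconstructed.

```python
def ramp(epoch, start, end, lo, hi):
    """Linearly interpolate a weight from lo to hi between epochs start and end."""
    if epoch <= start:
        return lo
    if epoch >= end:
        return hi
    t = (epoch - start) / (end - start)
    return lo + t * (hi - lo)


def loss_weights(epoch):
    """Return per-term weights for a given epoch.

    Hypothetical schedule: L1 dominates early, then perceptual and
    adversarial weights grow. All numbers are illustrative assumptions.
    """
    return {
        "l1": ramp(epoch, 0, 70, 100.0, 50.0),
        "perc": ramp(epoch, 0, 70, 0.0, 10.0),
        "adv": ramp(epoch, 0, 70, 0.1, 1.0),
    }


def generator_loss(terms, weights):
    """Weighted sum of scalar loss terms, one weight per named term."""
    return sum(weights[name] * terms[name] for name in weights)


if __name__ == "__main__":
    # Example scalar loss values; in training these would come from the model.
    example_terms = {"l1": 0.2, "perc": 1.0, "adv": 0.5}
    for epoch in (0, 35, 70):
        w = loss_weights(epoch)
        print(epoch, w, generator_loss(example_terms, w))
```

In a TensorFlow training loop the same weights could be recomputed at the start of each epoch and applied to the per-batch loss tensors; whether the repository does anything like this is not shown in the diff.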