# Naka-guided Chroma-Correction Model
This repository hosts the Naka-guided Chroma-correction model used in the Naka-GS pipeline.
The model is designed to refine a Naka-enhanced low-light image by suppressing color distortion in bright regions while preserving edge and texture details. In the released implementation, the network predicts a single-channel multiplicative correction map and a three-channel additive correction map, and applies them only to the low-frequency component of the Naka-enhanced image before adding the preserved high-frequency details back to the final output. The model input is an 18-channel representation built from the low-light image, the Naka-enhanced image, their residual, and standardized counterparts.
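The frequency-decoupled correction described above can be sketched as follows. This is a minimal NumPy illustration, not the released PyTorch code: the separable Gaussian blur used to split frequencies, its kernel parameters, and the final clipping are assumptions made for the sketch.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Separable 1-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, size=5, sigma=1.0):
    """Blur an HxWxC image with a separable Gaussian (reflect padding)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    out = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # horizontal pass, then vertical pass
    out = sum(k[i] * out[:, i:i + img.shape[1], :] for i in range(size))
    out = sum(k[i] * out[i:i + img.shape[0], :, :] for i in range(size))
    return out

def frequency_decoupled_correction(naka, mul_map, add_map, size=5, sigma=1.0):
    """Correct only the low-frequency band, then restore high frequencies.
    mul_map is HxWx1 (multiplicative), add_map is HxWx3 (additive)."""
    low_freq = gaussian_blur(naka, size, sigma)
    high_freq = naka - low_freq
    corrected_low = low_freq * mul_map + add_map
    return np.clip(corrected_low + high_freq, 0.0, 1.0)
```

With an identity correction (`mul_map = 1`, `add_map = 0`) the output reproduces the Naka-enhanced input, which is a useful sanity check for the decomposition.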
## Associated resources

- Project page / code: https://github.com/RunyuZhu/Naka-GS
- Paper page: https://huggingface.co/papers/2604.11142
- arXiv: https://arxiv.org/abs/2604.11142
## What this model does
Given a low-light RGB image:
- a Naka phototransduction transform is applied,
- the correction network predicts `mul_map` and `add_map`,
- the low-frequency component of the Naka image is corrected,
- the high-frequency component is added back,
- the final enhanced image is saved.
In the provided code, inference saves the corrected result as `<image_name>_enhanced.JPG`.
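The Naka phototransduction step above follows the classic Naka-Rushton response curve. A minimal sketch is shown below; the semi-saturation constant `sigma` and exponent `n` are illustrative assumptions, not the parameters of the released `Phototransduction` implementation.

```python
import numpy as np

def naka_rushton(img, sigma=0.2, n=1.0):
    """Naka-Rushton response y = x^n / (x^n + sigma^n) on a [0, 1] image,
    rescaled so that an input of 1.0 maps back to 1.0.
    Dark inputs are lifted, highlights are compressed."""
    x = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    y = x ** n / (x ** n + sigma ** n)
    return y * (1.0 + sigma ** n)
```

Because the curve is concave, low intensities gain proportionally more than highlights, which is what makes the subsequent chroma correction in bright regions necessary.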
## Model details

### Architecture
The released model is a U-Net-style encoder-decoder with residual blocks and SE attention. The core model class is `ChromaGuidedUNet`. Its forward pass takes `low` and `naka` tensors as input, constructs an 18-channel feature tensor, predicts `mul_map` and `add_map`, and performs frequency-decoupled correction on the Naka image.
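The 18-channel input can be sketched as six three-channel groups. This NumPy sketch assumes per-image standardization for the "standardized counterparts"; the released code may standardize differently (e.g. per channel or with fixed statistics).

```python
import numpy as np

def standardize(x, eps=1e-6):
    """Zero-mean, unit-variance per image (one plausible reading of
    'standardized counterparts'; an assumption for this sketch)."""
    return (x - x.mean()) / (x.std() + eps)

def build_input(low, naka):
    """Stack 6 three-channel groups -> HxWx18:
    low, naka, residual, and standardized versions of each."""
    residual = naka - low
    groups = [low, naka, residual,
              standardize(low), standardize(naka), standardize(residual)]
    return np.concatenate(groups, axis=-1)
```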
### Input

- RGB low-light image
- automatically generated Naka-enhanced intermediate image

### Output

- corrected RGB image (`enhanced`)
- optional intermediate maps: `mul_map`, `add_map`
### Checkpoints

Recommended checkpoint filenames:

- `best.pth`: best checkpoint selected during evaluation
- `latest.pth`: latest training state; typically the one used for inference
The training script saves `latest.pth` every epoch and updates `best.pth` whenever validation PSNR improves.
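The latest/best bookkeeping amounts to a few lines per epoch. The sketch below abstracts the actual serialization behind a `save_fn` callback (standing in for `torch.save(state, path)`); the function name and structure are illustrative, not the released code.

```python
def save_checkpoints(epoch, val_psnr, best_psnr, save_fn):
    """Save latest.pth every epoch; update best.pth only when
    validation PSNR improves. Returns the updated best PSNR."""
    save_fn("latest.pth")
    if val_psnr > best_psnr:
        save_fn("best.pth")
        best_psnr = val_psnr
    return best_psnr
```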
## Intended use
This model is intended to be used as the color-correction / enhancement stage in the Naka-GS low-light 3D reconstruction pipeline, or as a standalone low-light image refinement module when a Naka-style phototransduction preprocessing step is available.
## Limitations
- This repository contains custom PyTorch code and is not a Transformers-native model.
- The script depends on a custom `Phototransduction` implementation and tries to import it from either `retina.phototransduction` or `phototransduction`. For a standalone release, place `phototransduction.py` next to `naka_color_correction.py`, or preserve the original package layout.
- The model card does not claim broad robustness outside the training setting used by the original project.
## Repository layout

A minimal Hugging Face release layout is:

```
.
├── README.md
├── requirements.txt
├── naka_color_correction.py
├── phototransduction.py
├── best.pth
├── latest.pth
└── assets/
    ├── ***          # optional
    └── results.png  # optional
```
## Installation

```bash
git clone https://huggingface.co/<your-username-or-org>/<your-model-repo>
cd <your-model-repo>
pip install -r requirements.txt
```
### Requirements

Core dependencies used directly in the provided script:

- `torch`
- `torchvision`
- `numpy`
- `opencv-python`
- `Pillow`

The script also uses `torchvision.models.vgg19` for the perceptual loss branch during training.
## Quick start: inference

### 1. Prepare files

Place the following files in the same directory:

- `naka_color_correction.py`
- `phototransduction.py`
- `latest.pth` / `best.pth`

Create an input folder such as `./test_images/` and put your test images inside.
### 2. Run inference

```bash
python naka_color_correction.py \
    --mode infer \
    --input_dir ./test_images \
    --output_dir ./outputs/infer_results \
    --ckpt ./latest.pth
```
### 3. Inference on large images

The code supports tiled forwarding for large inputs:

```bash
python naka_color_correction.py \
    --mode infer \
    --input_dir ./test_images \
    --output_dir ./outputs/infer_results \
    --ckpt ./best.pth \
    --tile_size 512 \
    --tile_overlap 32
```
The command-line parser exposes `--mode`, `--input_dir`, `--output_dir`, `--ckpt`, `--tile_size`, and `--tile_overlap` for inference.
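Tiled forwarding with overlap can be sketched as follows. This is a minimal NumPy illustration that averages overlapping tile outputs uniformly; the released code may blend overlaps differently (e.g. with feathered weights).

```python
import numpy as np

def tiled_forward(img, model_fn, tile_size=512, tile_overlap=32):
    """Run model_fn on overlapping tiles of an HxWxC image and
    average the results where tiles overlap."""
    h, w, c = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    stride = tile_size - tile_overlap
    for y in range(0, max(h - tile_overlap, 1), stride):
        for x in range(0, max(w - tile_overlap, 1), stride):
            # clamp tiles so they never run past the image border
            y0 = min(y, max(h - tile_size, 0))
            x0 = min(x, max(w - tile_size, 0))
            y1, x1 = min(y0 + tile_size, h), min(x0 + tile_size, w)
            out[y0:y1, x0:x1] += model_fn(img[y0:y1, x0:x1])
            weight[y0:y1, x0:x1] += 1.0
    return out / weight
```

With an identity `model_fn` the stitched output equals the input, which verifies that the tiling covers every pixel and normalizes overlaps correctly.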
## Training

### Dataset format

The training and validation data must follow this layout:
```
datasets/
├── train/
│   ├── low/
│   └── normal/
└── val/
    ├── low/
    └── normal/
```
Files are paired by identical filename between `low/` and `normal/`.
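The filename pairing can be sketched with a few lines of `pathlib`; the function name `list_pairs` is illustrative, not part of the released script.

```python
from pathlib import Path

def list_pairs(root, split="train"):
    """Pair low/normal images by identical filename, as the loader expects.
    Skips low images with no matching ground-truth file."""
    low_dir = Path(root, split, "low")
    normal_dir = Path(root, split, "normal")
    pairs = []
    for low_path in sorted(low_dir.iterdir()):
        normal_path = normal_dir / low_path.name
        if normal_path.exists():
            pairs.append((low_path, normal_path))
    return pairs
```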
### Basic training command

```bash
python naka_color_correction.py \
    --mode train \
    --data_root ./datasets \
    --output_dir ./outputs/naka_color_correction_v2 \
    --epochs 200 \
    --batch_size 8 \
    --num_workers 4 \
    --crop_size 256 \
    --lr 2e-4 \
    --weight_decay 1e-4 \
    --base_ch 32 \
    --amp
```
### Resume training

```bash
python naka_color_correction.py \
    --mode train \
    --data_root ./datasets \
    --output_dir ./outputs/naka_color_correction_v2 \
    --resume_ckpt ./outputs/naka_color_correction_v2/checkpoints/latest.pth \
    --amp
```
### Initialize from a checkpoint

```bash
python naka_color_correction.py \
    --mode train \
    --data_root ./datasets \
    --output_dir ./outputs/naka_color_correction_v2 \
    --init_ckpt ./best.pth \
    --amp
```
The parser defaults include `epochs=200`, `batch_size=8`, `crop_size=256`, `lr=2e-4`, `weight_decay=1e-4`, `base_ch=32`, `mul_range=0.6`, `add_range=0.25`, `hf_kernel_size=5`, and `hf_sigma=1.0`.
### Training objective
The provided implementation combines:
- RGB reconstruction loss
- YCbCr chroma/luma consistency loss
- SSIM loss
- edge loss
- VGG perceptual loss
- map regularization
- gray-edge masked loss
- bright-region masked loss
These are implemented through `NakaCorrectionLoss` and `NakaCorrectionLossWithMasks`.
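As one example of these terms, the edge loss can be sketched as an L1 distance between Sobel gradient magnitudes. This is a minimal NumPy illustration; the exact operators and per-term weights live in `NakaCorrectionLoss` in the released code.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)

def sobel_grad(gray):
    """|Gx| + |Gy| via 3x3 Sobel filters (valid region only)."""
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += SOBEL_X[i, j] * patch
            gy += SOBEL_X[j, i] * patch  # Sobel-Y is the transpose
    return np.abs(gx) + np.abs(gy)

def edge_loss(pred, target):
    """L1 distance between Sobel gradient magnitudes of two grayscale images."""
    return np.abs(sobel_grad(pred) - sobel_grad(target)).mean()
```

Penalizing gradient differences rather than pixel differences is what lets the correction alter chroma in bright regions without washing out edges and texture.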
## Notes on reproducibility

- Validation uses full-resolution images with `batch_size=1`.
- Mixed precision is enabled with `--amp` on CUDA.
- Checkpoint loading is backward-compatible with older 3-channel `mul_head` weights via `adapt_mul_head_to_single_channel()`.
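One plausible way to adapt an older 3-channel `mul_head` to the current single-channel head is to average the convolution weights over the output-channel axis. This NumPy sketch is an assumption about what the released helper does, shown on raw arrays rather than a PyTorch state dict.

```python
import numpy as np

def adapt_mul_head(weight, bias):
    """Collapse a 3-output-channel conv head to 1 channel by averaging
    over the output-channel axis (an assumed adaptation strategy).
    Conv weight layout: (out_ch, in_ch, kH, kW); bias: (out_ch,)."""
    return weight.mean(axis=0, keepdims=True), bias.mean(axis=0, keepdims=True)
```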
## Suggested requirements.txt

```
torch>=2.1.0
torchvision>=0.16.0
numpy>=1.24.0
opencv-python>=4.8.0
Pillow>=10.0.0
```