DDColor — Comic Book Colorization

Fine-tuned weights of DDColor for automatic colorization of grayscale comic book pages and covers.
Developed as part of the Digital Image Processing (Procesamiento de Imágenes Digitales, PID) course at the Universidad de Sevilla.


Model Description

DDColor is a state-of-the-art image colorization model presented at ICCV 2023 ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a ConvNeXt backbone extracts image features, a pixel decoder restores spatial resolution at multiple scales, and a transformer-based color decoder learns semantic-aware color queries via cross-attention. The two decoders are fused to produce vivid, photo-realistic colorized outputs in the LAB color space.

This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.


Intended Uses

  • Automatic colorization of black-and-white comic pages and covers
  • Restoration or re-colorization of vintage comics
  • Research and education in digital image processing

Out-of-scope Uses

  • General natural photo colorization (use the original DDColor weights for that)
  • High-resolution medical or satellite imagery

Training Details

| Field | Value |
| --- | --- |
| Base model | DDColor (piddnad/ddcolor_modelscope) |
| Framework | PyTorch + BasicSR |
| Input size | 256 × 256 (L channel) |
| Output | ab channels, combined into LAB → RGB |
| Dataset | Comic Books Images (Kaggle) |
| Total images | 52,156 RGB images across 86 classes |
| Training images | ~41,725 (80% split per class) |
| Test images | ~10,431 (20% split per class) |
| Original resolution | 1988 × 3056 px |
| Resized resolution | 288 × 432 px (via OpenCV) |
| Total iterations | 200,000 (warmup: 1,000) |
| Batch size | 1 per GPU |
| Optimizer (G) | AdamW, lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
| Optimizer (D) | Adam, lr 1e-4, betas (0.9, 0.99) |
| LR scheduler | MultiStepLR, milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
| Training input size | 128 × 128 (gt_size) |
| Validation input size | 256 × 256 (gt_size) |
| Hardware | 1× NVIDIA GeForce RTX 4060 |
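The learning-rate schedule in the table can be reproduced with PyTorch's MultiStepLR; the single dummy parameter below is a stand-in for the actual DDColor generator:

```python
import torch

# Stand-in parameter; the real optimizer wraps the DDColor generator.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4, weight_decay=0.01, betas=(0.9, 0.99))

# Halve the LR at 50k / 75k / 100k / 125k / 150k iterations (gamma = 0.5).
sched = torch.optim.lr_scheduler.MultiStepLR(
    opt, milestones=[50_000, 75_000, 100_000, 125_000, 150_000], gamma=0.5
)

for _ in range(200_000):
    opt.step()    # the training step would go here
    sched.step()

# After all five milestones the LR is 1e-4 * 0.5**5 = 3.125e-6.
print(opt.param_groups[0]["lr"])
```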

Dataset

The model was fine-tuned on the Comic Books Images dataset published on Kaggle by Cenk Bircanoglu. It contains 52,156 RGB images spanning 86 comic book classes (covers and interior pages). The original images at 1988 × 3056 px were downscaled to 288 × 432 px using OpenCV before training. The dataset was split per class with an 80/20 train/test ratio, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.

Loss Functions

| Loss | Weight |
| --- | --- |
| L1 Pixel Loss | 0.1 |
| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
| GAN Loss (vanilla) | 1.0 |
| Colorfulness Loss | 0.5 |

A color enhancement factor of 1.2 was applied during training to boost output vibrancy.
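The colorfulness term pushes the generator toward saturated outputs. Its exact definition lives in the DDColor code; a widely used image-colorfulness measure it resembles (Hasler & Süsstrunk, 2003) can be sketched as:

```python
import numpy as np

def colorfulness(rgb):
    """Hasler-Süsstrunk colorfulness of an RGB image.

    Higher means more colorful; a pure grayscale image scores 0.
    Illustrative metric only, not DDColor's exact loss term.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    rg = r - g                 # red-green opponent channel
    yb = 0.5 * (r + g) - b     # yellow-blue opponent channel
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std + 0.3 * mean

gray = np.full((16, 16, 3), 128, dtype=np.uint8)
print(colorfulness(gray))  # 0.0 for a grayscale image
```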


Requirements

```
pip install torch torchvision
pip install modelscope
pip install basicsr
```

Inference

```python
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

colorizer = pipeline(
    Tasks.image_colorization,
    model="AlejandroParody/PID_Proyect"
)

result = colorizer("path/to/grayscale_comic_page.jpg")
cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
```

Or load the weights manually with PyTorch:

```python
import torch
from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

model = DDColor(...)  # use the same config as the base model
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))
model.eval()
```

Limitations

  • Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
  • The model inherits DDColor's fixed input resolution of 256 × 256; output is then upsampled to the original image dimensions.
  • Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.

Citation

If you use this work, please cite the original DDColor paper:

@inproceedings{kang2023ddcolor,
  title     = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
  author    = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}

Authors

Developed by students of the Universidad de Sevilla.


Acknowledgements

This project was developed for the Procesamiento de Imágenes Digitales (PID) course at the Universidad de Sevilla.
Base model and training code: piddnad/DDColor.
Training dataset: Comic Books Images by Cenk Bircanoglu on Kaggle.
