# DDColor — Comic Book Colorization
Fine-tuned weights of DDColor for automatic colorization of grayscale comic book pages and covers.
Developed as part of the Digital Image Processing (PID) course at the Universidad de Sevilla.
## Model Description
DDColor is a state-of-the-art image colorization model presented at ICCV 2023 ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a ConvNeXt backbone extracts image features, a pixel decoder restores spatial resolution at multiple scales, and a transformer-based color decoder learns semantic-aware color queries via cross-attention. The two decoders are fused to produce vivid, photo-realistic colorized outputs in the LAB color space.
This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.
## Intended Uses
- Automatic colorization of black-and-white comic pages and covers
- Restoration or re-colorization of vintage comics
- Research and education in digital image processing
## Out-of-scope Uses
- General natural photo colorization (use the original DDColor weights for that)
- High-resolution medical or satellite imagery
## Training Details
| Field | Value |
|---|---|
| Base model | DDColor (piddnad/ddcolor_modelscope) |
| Framework | PyTorch + BasicSR |
| Input size | 256 × 256 (L channel) |
| Output | ab channels → combined LAB → RGB |
| Dataset | Comic Books Images (Kaggle) |
| Total images | 52,156 RGB images across 86 classes |
| Training images | ~41,725 (80% split per class) |
| Test images | ~10,431 (20% split per class) |
| Original resolution | 1988 × 3056 px |
| Resized resolution | 288 × 432 px (via OpenCV) |
| Total iterations | 200,000 (warmup: 1,000) |
| Batch size | 1 per GPU |
| Optimizer (G) | AdamW — lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
| Optimizer (D) | Adam — lr 1e-4, betas (0.9, 0.99) |
| LR scheduler | MultiStepLR — milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
| Training input size | 128 × 128 (gt_size) |
| Validation input size | 256 × 256 (gt_size) |
| Hardware | 1× NVIDIA GeForce RTX 4060 |
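The L → ab → LAB flow described in the table above can be sketched as a simple recombination step. This is an illustrative helper (the name `combine_lab` is not part of the released code), assuming the float LAB convention of L in [0, 100] and ab in [-128, 127]:

```python
import numpy as np

def combine_lab(l_channel, ab_pred):
    """Stack a grayscale L channel (H, W) with predicted ab channels
    (H, W, 2) into a single LAB image (H, W, 3), clipped to valid ranges."""
    l = np.clip(l_channel, 0.0, 100.0)[..., None]
    ab = np.clip(ab_pred, -128.0, 127.0)
    return np.concatenate([l, ab], axis=-1).astype(np.float32)

# A real pipeline would finish with an OpenCV conversion, e.g.:
# rgb = cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)
```

With zero ab channels the result is a neutral (gray) LAB image, which is exactly what the network's input looks like before colorization.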
## Dataset
The model was fine-tuned on the Comic Books Images dataset published on Kaggle by Cenk Bircanoglu. It contains 52,156 RGB images spanning 86 comic book classes (covers and interior pages). The original images at 1988 × 3056 px were downscaled to 288 × 432 px using OpenCV before training. The dataset was split per class with an 80/20 train/test ratio, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
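The per-class 80/20 split described above can be sketched as follows. This is a minimal stand-in, not the project's actual preprocessing script; the resize to 288 × 432 would be done separately with `cv2.resize`:

```python
import random

def split_per_class(class_to_files, train_ratio=0.8, seed=42):
    """Split each class's file list into train/test with the same ratio,
    so every class is represented proportionally in both splits."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(class_to_files):
        files = sorted(class_to_files[cls])
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)
        train.extend(files[:cut])
        test.extend(files[cut:])
    return train, test
```

Splitting per class (rather than over the pooled file list) keeps rare comic styles from ending up entirely in one split.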
## Loss Functions
| Loss | Weight |
|---|---|
| L1 Pixel Loss | 0.1 |
| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
| GAN Loss (vanilla) | 1.0 |
| Colorfulness Loss | 0.5 |
A color enhancement factor of 1.2 was applied during training to boost output vibrancy.
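The color enhancement factor can be read as a simple chroma scaling in LAB space: the ab channels are scaled around the neutral point and clipped back into range. A minimal sketch (the function name `enhance_ab` is illustrative):

```python
import numpy as np

def enhance_ab(ab, factor=1.2):
    """Scale the ab (chroma) channels around the neutral point (a = b = 0)
    and clip back to the valid signed range, boosting color vibrancy."""
    return np.clip(np.asarray(ab, dtype=np.float32) * factor, -128.0, 127.0)
```

Because L is left untouched, brightness is preserved; only saturation increases.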
## Requirements
```shell
pip install torch torchvision
pip install modelscope
pip install basicsr
```
## Inference
```python
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

colorizer = pipeline(
    Tasks.image_colorization,
    model="AlejandroParody/PID_Proyect",
)

result = colorizer("path/to/grayscale_comic_page.jpg")
cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
```
Or load the weights manually with PyTorch:
```python
import torch

from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

model = DDColor(...)  # use the same config as the base model
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))
model.eval()
```
## Limitations
- Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
- The model inherits DDColor's fixed input resolution of 256 × 256; output is then upsampled to the original image dimensions.
- Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.
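The fixed 256 × 256 limitation is commonly worked around by predicting ab at the network resolution and upsampling only the chroma before recombining with the full-resolution L channel, since chroma tolerates interpolation far better than luminance. A numpy sketch under those assumptions (`predict_ab` is a placeholder for the network; nearest-neighbour resizing stands in for `cv2.resize`):

```python
import numpy as np

def nn_resize(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array via index maps."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def colorize_full_res(l_full, predict_ab, net_size=256):
    """Run colorization at the network's fixed input size, then upsample
    only the predicted ab channels back to the original resolution."""
    l_small = nn_resize(l_full[..., None], net_size, net_size)[..., 0]
    ab_small = predict_ab(l_small)                    # (net, net, 2)
    ab_full = nn_resize(ab_small, *l_full.shape[:2])  # chroma upsampled
    return np.concatenate([l_full[..., None], ab_full], axis=-1)
```

The output keeps the original page's sharp line art (carried entirely by L) while the smoother, lower-resolution color fills in around it.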
## Citation
If you use this work, please cite the original DDColor paper:
```bibtex
@inproceedings{kang2023ddcolor,
  title     = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
  author    = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}
```
## Authors
Developed by students of the Universidad de Sevilla.
## Acknowledgements
This project was developed for the Procesamiento de Imágenes Digitales (PID) course at the Universidad de Sevilla.
Base model and training code: piddnad/DDColor.
Training dataset: Comic Books Images by Cenk Bircanoglu on Kaggle.