---
license: other
tags:
- image-colorization
- ddcolor
- comic-books
- computer-vision
- pytorch
datasets:
- cenkbircanoglu/comic-books-classification
base_model: piddnad/ddcolor_modelscope
pipeline_tag: image-to-image
---
# DDColor — Comic Book Colorization
Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.
Developed as part of the **Digital Image Processing** course (*Procesamiento de Imágenes Digitales*, PID) at the **Universidad de Sevilla**.
---
## Model Description
DDColor is an image colorization model presented at **ICCV 2023** ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It pairs one encoder with two decoders: a **ConvNeXt backbone** extracts image features, a **pixel decoder** restores spatial resolution at multiple scales, and a **transformer-based color decoder** learns semantic-aware color queries via cross-attention. The outputs of the two decoders are fused to predict the ab (chroma) channels in the LAB color space, which are then combined with the input L (lightness) channel to form the colorized image.
This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.
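To make the L / ab split concrete, here is a minimal pure-Python sketch of the standard sRGB to CIELAB conversion (D65 white point) for a single pixel. It is only an illustration: the function name `srgb_to_lab` is hypothetical, and in practice a library routine such as OpenCV's `cv2.cvtColor` would be applied to whole images.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (D65). Illustrative helper only."""

    def linearize(c):
        # Undo the sRGB gamma curve to obtain linear-light values in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with the linear segment CIELAB uses near black.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)

    lightness = 116 * fy - 16   # L: the only channel a grayscale page provides
    a = 500 * (fx - fy)         # a: green-red axis, predicted by the model
    b_chroma = 200 * (fy - fz)  # b: blue-yellow axis, predicted by the model
    return lightness, a, b_chroma
```

A neutral gray pixel maps to a and b values close to zero, which is why a grayscale page carries information only in L and the model has to infer both chroma channels.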
---
## Intended Uses
- Automatic colorization of black-and-white comic pages and covers
- Restoration or re-colorization of vintage comics
- Research and education in digital image processing
### Out-of-scope Uses
- General natural photo colorization (use the original DDColor weights for that)
- High-resolution medical or satellite imagery
---
## Training Details
| Field | Value |
|---|---|
| **Base model** | DDColor (piddnad/ddcolor_modelscope) |
| **Framework** | PyTorch + BasicSR |
| **Input size** | 256 × 256 (L channel) |
| **Output** | ab channels → combined LAB → RGB |
| **Dataset** | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
| **Total images** | 52,156 RGB images across 86 classes |
| **Training images** | ~41,725 (80% split per class) |
| **Test images** | ~10,431 (20% split per class) |
| **Original resolution** | 1988 × 3056 px |
| **Resized resolution** | 288 × 432 px (via OpenCV) |
| **Total iterations** | 200,000 (warmup: 1,000) |
| **Batch size** | 1 per GPU |
| **Optimizer (G)** | AdamW — lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
| **Optimizer (D)** | Adam — lr 1e-4, betas (0.9, 0.99) |
| **LR scheduler** | MultiStepLR — milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
| **Training input size** | 128 × 128 (gt_size) |
| **Validation input size** | 256 × 256 (gt_size) |
| **Hardware** | 1× NVIDIA GeForce RTX 4060 |
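The learning-rate schedule in the table can be sketched in plain Python. This mirrors the step-decay rule of a MultiStepLR scheduler (multiply by γ each time a milestone is reached) and deliberately ignores the 1,000-iteration warmup, whose exact shape is not stated here. The function name `lr_at` is hypothetical.

```python
MILESTONES = (50_000, 75_000, 100_000, 125_000, 150_000)


def lr_at(iteration, base_lr=1e-4, milestones=MILESTONES, gamma=0.5):
    """Learning rate after step decay at the given iteration (warmup ignored)."""
    # Count how many milestones have been reached so far.
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    for it in (0, 50_000, 100_000, 150_000, 200_000):
        print(it, lr_at(it))
```

With five halvings, the rate for the last 50,000 iterations is 1/32 of the initial value.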
### Dataset
The model was fine-tuned on the **Comic Books Images** dataset published on Kaggle by Cenk Bircanoglu. It contains **52,156 RGB images** spanning **86 comic book classes** (covers and interior pages). The original images at 1988 × 3056 px were downscaled to **288 × 432 px** using OpenCV before training. The dataset was split per class with an **80/20 train/test ratio**, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
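A per-class 80/20 split like the one described above could be produced along these lines. This is a sketch under assumptions: the source does not say how images were assigned to each side, so the shuffling, the seed, the rounding rule, and the name `split_per_class` are all illustrative.

```python
import random
from collections import defaultdict


def split_per_class(samples, train_ratio=0.8, seed=0):
    """Split (path, class_name) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_ratio)  # assumed rounding rule
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


if __name__ == "__main__":
    # Toy data: two hypothetical classes with ten images each.
    demo = [(f"{c}/{i}.jpg", c) for c in ("class_a", "class_b") for i in range(10)]
    train, test = split_per_class(demo)
    print(len(train), len(test))
```

Splitting inside each class keeps the class proportions of the training and test sets aligned, which a single global shuffle does not guarantee.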
### Loss Functions
| Loss | Weight |
|---|---|
| L1 Pixel Loss | 0.1 |
| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
| GAN Loss (vanilla) | 1.0 |
| Colorfulness Loss | 0.5 |
A **color enhancement factor of 1.2** was applied during training to boost output vibrancy.
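Assuming the generator objective is the usual weighted sum of the terms in the table (the weights are listed above, but the combination rule is an assumption here), the total can be sketched as follows. The dictionary keys and the function name are hypothetical, and the per-term values in the demo are placeholders, not measured losses.

```python
# Weights copied from the loss table; key names are illustrative.
LOSS_WEIGHTS = {
    "pixel_l1": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def total_generator_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of per-term loss values (plain floats in this sketch)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


if __name__ == "__main__":
    # Placeholder values only, chosen to show the relative scaling.
    example = {"pixel_l1": 1.0, "perceptual": 1.0, "gan": 1.0, "colorfulness": 1.0}
    print(total_generator_loss(example))
```

With equal raw values, the perceptual term contributes 50 times as much as the L1 pixel term.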
---
## Usage
### Requirements
```bash
pip install torch torchvision
pip install modelscope
pip install basicsr
```
### Inference
```python
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the colorization pipeline from the fine-tuned weights.
colorizer = pipeline(
    Tasks.image_colorization,
    model="AlejandroParody/PID_Proyect",
)

# The pipeline returns a dict; OUTPUT_IMG holds the colorized image.
result = colorizer("path/to/grayscale_comic_page.jpg")
cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
```
Or load the weights manually with PyTorch:
```python
import torch
from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

model = DDColor(...)  # use the same config as the base model
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))
model.eval()  # switch to inference mode
```
---
## Limitations
- Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
- Images are resized to DDColor's 256 × 256 input resolution before colorization, and the output is then upsampled to the original image dimensions, so color detail finer than the 256 × 256 grid may be lost.
- Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.
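The resolution handling noted above can be illustrated with a small pure-Python sketch. It assumes, as in the reference DDColor inference code, that only the predicted ab channels are upsampled and that the original full-resolution L channel is kept. Nearest-neighbour resizing is used purely for brevity, and both function names are hypothetical.

```python
def upsample_nearest(channel, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of values (rows of pixels)."""
    in_h, in_w = len(channel), len(channel[0])
    return [
        [channel[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def merge_l_ab(l_full, a_small, b_small):
    """Combine a full-resolution L channel with low-resolution a/b predictions."""
    h, w = len(l_full), len(l_full[0])
    a_full = upsample_nearest(a_small, h, w)
    b_full = upsample_nearest(b_small, h, w)
    # Each output pixel keeps its own L value but shares chroma with neighbours.
    return [
        [(l_full[y][x], a_full[y][x], b_full[y][x]) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    l_full = [[10 * y + x for x in range(6)] for y in range(4)]  # 4 x 6 "page"
    a_small = [[1, 2], [3, 4]]                                   # 2 x 2 prediction
    b_small = [[-1, -2], [-3, -4]]
    lab = merge_l_ab(l_full, a_small, b_small)
    print(len(lab), len(lab[0]), lab[0][0], lab[3][5])
```

Because lightness stays at full resolution, line art remains sharp even though the chroma is effectively predicted on a coarser grid.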
---
## Citation
If you use this work, please cite the original DDColor paper:
```bibtex
@inproceedings{kang2023ddcolor,
title = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
author = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2023}
}
```
---
## Authors
Developed by students of the **Universidad de Sevilla**:
- [@grnln](https://github.com/grnln)
- [@AlejandroParody](https://github.com/AlejandroParody)
- [@josmorlop10](https://github.com/josmorlop10)
---
## Acknowledgements
This project was developed for the **Procesamiento de Imágenes Digitales (PID)** course at the **Universidad de Sevilla**.
Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).
Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle.