---

license: other
tags:
  - image-colorization
  - ddcolor
  - comic-books
  - computer-vision
  - pytorch
datasets:
  - cenkbircanoglu/comic-books-classification
base_model: piddnad/ddcolor_modelscope
pipeline_tag: image-to-image
---


# DDColor — Comic Book Colorization

Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.  
Developed as part of the **Digital Image Processing (PID)** course at the **Universidad de Sevilla**.

---

## Model Description

DDColor is a state-of-the-art image colorization model presented at **ICCV 2023** ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a **ConvNeXt backbone** extracts image features, a **pixel decoder** restores spatial resolution at multiple scales, and a **transformer-based color decoder** learns semantic-aware color queries via cross-attention. The two decoders are fused to produce vivid, photo-realistic colorized outputs in the LAB color space.
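To make the role of the LAB color space concrete, here is a minimal single-pixel sketch of the standard sRGB to CIE Lab conversion (D65 white point). It is illustrative only and is not taken from the DDColor code base; the function name `srgb_to_lab` is hypothetical. The point is that L carries lightness (the model's input) while a and b carry color (what the model predicts), so any neutral gray pixel has a and b close to zero.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIE Lab (D65 white)."""

    def to_linear(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        # Piecewise cube root used by the CIE Lab definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)

    lightness = 116 * fy - 16  # L: lightness, the grayscale input
    a_axis = 500 * (fx - fy)   # a: green-red axis
    b_axis = 200 * (fy - fz)   # b: blue-yellow axis
    return lightness, a_axis, b_axis


print(srgb_to_lab(0.5, 0.5, 0.5))  # neutral gray: a and b are near zero
print(srgb_to_lab(1.0, 0.0, 0.0))  # saturated red: strongly positive a
```

In practice a library routine such as OpenCV's color conversion would be used on whole images; the per-pixel version above only shows what the L and ab channels represent.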

This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.

---

## Intended Uses

- Automatic colorization of black-and-white comic pages and covers
- Restoration or re-colorization of vintage comics
- Research and education in digital image processing

### Out-of-scope Uses

- General natural photo colorization (use the original DDColor weights for that)
- High-resolution medical or satellite imagery

---

## Training Details

| Field | Value |
|---|---|
| **Base model** | DDColor (piddnad/ddcolor_modelscope) |
| **Framework** | PyTorch + BasicSR |
| **Input size** | 256 × 256 (L channel) |
| **Output** | ab channels → combined LAB → RGB |
| **Dataset** | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
| **Total images** | 52,156 RGB images across 86 classes |
| **Training images** | ~41,725 (80% split per class) |
| **Test images** | ~10,431 (20% split per class) |
| **Original resolution** | 1988 × 3056 px |
| **Resized resolution** | 288 × 432 px (via OpenCV) |
| **Total iterations** | 200,000 (warmup: 1,000) |
| **Batch size** | 1 per GPU |
| **Optimizer (G)** | AdamW, lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
| **Optimizer (D)** | Adam, lr 1e-4, betas (0.9, 0.99) |
| **LR scheduler** | MultiStepLR, milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
| **Training input size** | 128 × 128 (gt_size) |
| **Validation input size** | 256 × 256 (gt_size) |
| **Hardware** | 1× NVIDIA GeForce RTX 4060 |
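
As a rough illustration of the learning-rate schedule in the table, the sketch below computes the rate at a given iteration from the listed base rate, milestones, and γ. The helper name `lr_at` is hypothetical, and the linear shape of the warmup is an assumption: the table gives only its length (1,000 iterations).

```python
from bisect import bisect_right

BASE_LR = 1e-4
GAMMA = 0.5
MILESTONES = [50_000, 75_000, 100_000, 125_000, 150_000]
WARMUP_ITERS = 1_000  # warmup length from the table; linear ramp is assumed


def lr_at(iteration):
    """Learning rate at a given training iteration (hypothetical helper)."""
    if iteration < WARMUP_ITERS:
        # Assumed linear warmup from 0 up to the base rate.
        return BASE_LR * iteration / WARMUP_ITERS
    # MultiStepLR: multiply by gamma once for every milestone already reached.
    return BASE_LR * GAMMA ** bisect_right(MILESTONES, iteration)


for it in (500, 1_000, 50_000, 100_000, 199_999):
    print(it, lr_at(it))
```

With five halvings, the rate in the final 50k iterations is 1e-4 × 0.5⁵ = 3.125e-6.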



### Dataset



The model was fine-tuned on the **Comic Books Images** dataset published on Kaggle by Cenk Bircanoglu. It contains **52,156 RGB images** spanning **86 comic book classes** (covers and interior pages). The original images at 1988 × 3056 px were downscaled to **288 × 432 px** using OpenCV before training. The dataset was split per class with an **80/20 train/test ratio**, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
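
The per-class 80/20 split described above could be sketched as follows. This is not the authors' script: the function name `split_per_class`, the fixed seed, and the rounding rule are all assumptions made for illustration.

```python
import random


def split_per_class(paths_by_class, train_fraction=0.8, seed=0):
    """Split each class independently into train/test lists (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(paths_by_class):
        paths = sorted(paths_by_class[label])
        rng.shuffle(paths)
        # Assumed rounding rule; the card only reports approximate totals.
        cut = round(len(paths) * train_fraction)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


demo = {
    "class_a": [f"a_{i}.jpg" for i in range(10)],
    "class_b": [f"b_{i}.jpg" for i in range(5)],
}
train_set, test_set = split_per_class(demo)
print(len(train_set), len(test_set))  # 12 3
```

Splitting inside each class, rather than over the pooled images, keeps the class proportions the same in both subsets.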



### Loss Functions



| Loss | Weight |
|---|---|
| L1 Pixel Loss | 0.1 |
| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
| GAN Loss (vanilla) | 1.0 |
| Colorfulness Loss | 0.5 |
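
Assuming the generator objective is the usual weighted sum of these terms, the weights combine as in the sketch below. The dictionary keys and the function name are hypothetical, and the individual loss values in the example are made-up placeholders, not measured results.

```python
# Weights copied from the table above; the key names are hypothetical.
LOSS_WEIGHTS = {
    "l1_pixel": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def total_generator_loss(terms):
    """Weighted sum of the individual loss terms (assumed combination rule)."""
    missing = set(LOSS_WEIGHTS) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * terms[name] for name in LOSS_WEIGHTS)


# Placeholder values, only to show the arithmetic.
example = {"l1_pixel": 0.2, "perceptual": 0.1, "gan": 0.7, "colorfulness": 0.3}
print(total_generator_loss(example))
```

For the placeholder values this evaluates 0.1·0.2 + 5.0·0.1 + 1.0·0.7 + 0.5·0.3, which shows how the perceptual term can dominate even when its raw value is small.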



A **color enhancement factor of 1.2** was applied during training to boost output vibrancy.
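
The card does not say how the factor is applied. One common way to implement a saturation factor is to push each pixel away from its own gray value, as in this single-pixel sketch. The function name `enhance_color` and the luma weights are assumptions, not confirmed details of the training code.

```python
def enhance_color(rgb, factor=1.2):
    """Scale a pixel's distance from its gray value (hypothetical sketch).

    rgb: tuple of three floats in [0, 1].
    factor: 1.0 leaves the pixel unchanged; values above 1.0 increase saturation.
    """
    r, g, b = rgb
    # Assumed luma weights (ITU-R BT.601), commonly used for grayscale conversion.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Blend away from gray, then clamp to the valid range.
    return tuple(min(1.0, max(0.0, gray + factor * (c - gray))) for c in rgb)


print(enhance_color((0.5, 0.2, 0.2)))  # a muted red becomes more saturated
print(enhance_color((0.4, 0.4, 0.4)))  # neutral gray stays (almost exactly) gray
```

Under this formulation, neutral pixels are left alone and only colored regions are boosted.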



---



### Requirements



```bash
pip install torch torchvision
pip install modelscope
pip install basicsr
```



### Inference



```python
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the colorization pipeline from this repository's weights.
colorizer = pipeline(
    Tasks.image_colorization,
    model="AlejandroParody/PID_Proyect",
)

# Colorize a grayscale page and save the result.
result = colorizer("path/to/grayscale_comic_page.jpg")
cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
```



Or load the weights manually with PyTorch:



```python
import torch
from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

model = DDColor(...)  # use the same config as the base model
model.load_state_dict(torch.load("pytorch_model.pt", map_location="cpu"))
model.eval()
```

---

## Limitations

- Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
- The model inherits DDColor's fixed input resolution of 256 × 256; the predicted color (ab) channels are then upsampled to the original image dimensions, so color boundaries may look soft on large pages.
- Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.

---

## Citation

If you use this work, please cite the original DDColor paper:

```bibtex
@inproceedings{kang2023ddcolor,
  title     = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
  author    = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}
```

---

## Authors

Developed by students of the **Universidad de Sevilla**:

- [@grnln](https://github.com/grnln)
- [@AlejandroParody](https://github.com/AlejandroParody)
- [@josmorlop10](https://github.com/josmorlop10)

---

## Acknowledgements

This project was developed for the **Procesamiento de Imágenes Digitales (PID)** course at the **Universidad de Sevilla**.  
Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).  
Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle.