Aleparqui committed on
Commit
8b5b8cd
·
verified ·
1 Parent(s): 26fe2d4

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +158 -0
  2. net_g_latest.pth +3 -0
README.md ADDED
@@ -0,0 +1,158 @@
1
+ ---
2
+ license: other
3
+ tags:
4
+ - image-colorization
5
+ - ddcolor
6
+ - comic-books
7
+ - computer-vision
8
+ - pytorch
9
+ datasets:
10
+ - cenkbircanoglu/comic-books-classification
11
+ base_model: piddnad/ddcolor_modelscope
12
+ pipeline_tag: image-to-image
13
+ ---
14
+
15
+ # DDColor — Comic Book Colorization
16
+
17
+ Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.
18
+ Developed as part of the **Digital Image Processing (PID)** course at the **Universidad de Sevilla**.
19
+
20
+ ---
21
+
22
+ ## Model Description
23
+
24
+ DDColor is an image colorization model presented at **ICCV 2023** ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a **ConvNeXt backbone** extracts image features, a **pixel decoder** restores spatial resolution at multiple scales, and a **transformer-based color decoder** learns semantic-aware color queries via cross-attention. The outputs of the two decoders are fused to predict the ab color channels, which are combined with the input L channel in the LAB color space and converted to RGB.
25
+
26
+ This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.
27
+
28
+ ---
29
+
30
+ ## Intended Uses
31
+
32
+ - Automatic colorization of black-and-white comic pages and covers
33
+ - Restoration or re-colorization of vintage comics
34
+ - Research and education in digital image processing
35
+
36
+ ### Out-of-scope Uses
37
+
38
+ - General natural photo colorization (use the original DDColor weights for that)
39
+ - High-resolution medical or satellite imagery
40
+
41
+ ---
42
+
43
+ ## Training Details
44
+
45
+ | Field | Value |
46
+ |---|---|
47
+ | **Base model** | DDColor (piddnad/ddcolor_modelscope) |
48
+ | **Framework** | PyTorch + BasicSR |
49
+ | **Input size** | 256 × 256 (L channel) |
50
+ | **Output** | ab channels → combined LAB → RGB |
51
+ | **Dataset** | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
52
+ | **Total images** | 52,156 RGB images across 86 classes |
53
+ | **Training images** | ~41,725 (80% split per class) |
54
+ | **Test images** | ~10,431 (20% split per class) |
55
+ | **Original resolution** | 1988 × 3056 px |
56
+ | **Resized resolution** | 288 × 432 px (via OpenCV) |
57
+ | **Total iterations** | 200,000 (warmup: 1,000) |
58
+ | **Batch size** | 1 per GPU |
59
+ | **Optimizer (G)** | AdamW — lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
60
+ | **Optimizer (D)** | Adam — lr 1e-4, betas (0.9, 0.99) |
61
+ | **LR scheduler** | MultiStepLR — milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
62
+ | **Training input size** | 128 × 128 (gt_size) |
63
+ | **Validation input size** | 256 × 256 (gt_size) |
64
+ | **Hardware** | 1× NVIDIA GeForce RTX 4060 |
65
+
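The learning-rate rows in the table above can be read as a piecewise-constant schedule. The following is a minimal sketch for the generator, assuming the 1,000 warmup iterations ramp the rate linearly from zero (a common BasicSR behavior that this README does not state explicitly). `lr_at` is a hypothetical helper for illustration, not part of the training code.

```python
from bisect import bisect_right

BASE_LR = 1e-4                                            # generator lr from the table
MILESTONES = [50_000, 75_000, 100_000, 125_000, 150_000]  # MultiStepLR milestones
GAMMA = 0.5                                               # decay factor per milestone
WARMUP_ITERS = 1_000                                      # assumed linear warmup


def lr_at(iteration: int) -> float:
    """Return the learning rate at `iteration` under the assumed schedule."""
    # A milestone counts as passed once iteration >= milestone,
    # which mirrors the closed form of a multi-step schedule.
    decays = bisect_right(MILESTONES, iteration)
    lr = BASE_LR * GAMMA ** decays
    # Assumption: linear warmup from 0 to the base rate.
    if iteration < WARMUP_ITERS:
        lr *= iteration / WARMUP_ITERS
    return lr


if __name__ == "__main__":
    for it in (0, 500, 1_000, 50_000, 100_000, 199_999):
        print(f"iter {it:>7}: lr = {lr_at(it):.3e}")
```

After the last milestone the rate has been halved five times, so the final 50k iterations run at 1e-4 × 0.5⁵ = 3.125e-6.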
66
+ ### Dataset
67
+
68
+ The model was fine-tuned on the **Comic Books Images** dataset published on Kaggle by Cenk Bircanoglu. It contains **52,156 RGB images** spanning **86 comic book classes** (covers and interior pages). The original images at 1988 × 3056 px were downscaled to **288 × 432 px** using OpenCV before training. The dataset was split per class with an **80/20 train/test ratio**, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
69
+
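The per-class 80/20 split described above could be reproduced along these lines. This is only a sketch: the shuffling, the fixed seed, and the rounding rule are assumptions, since the README does not say how the split was drawn, and `split_per_class` is a hypothetical helper.

```python
import random
from collections import defaultdict


def split_per_class(items, train_ratio=0.8, seed=0):
    """Split (path, label) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in items:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        # Assumption: round to the nearest image within each class.
        cut = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


if __name__ == "__main__":
    demo = [(f"a/{i}.jpg", "a") for i in range(10)]
    demo += [(f"b/{i}.jpg", "b") for i in range(5)]
    train, test = split_per_class(demo)
    print(len(train), len(test))
```

Because the cut is taken inside each class, the overall totals only approximate 80% and 20%, which is consistent with the "~" in the reported counts.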
70
+ ### Loss Functions
71
+
72
+ | Loss | Weight |
73
+ |---|---|
74
+ | L1 Pixel Loss | 0.1 |
75
+ | Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
76
+ | GAN Loss (vanilla) | 1.0 |
77
+ | Colorfulness Loss | 0.5 |
78
+
79
+ A **color enhancement factor of 1.2** was applied during training to boost output vibrancy.
80
+
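The weights in the loss table are presumably combined as a weighted sum for the generator objective. The sketch below uses plain floats to show the arithmetic only; in the real BasicSR training loop each term is a tensor, and the names `LOSS_WEIGHTS` and `total_generator_loss` are illustrative, not taken from the training code.

```python
# Weights copied from the loss table; the dictionary keys are made up here.
LOSS_WEIGHTS = {
    "l1_pixel": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def total_generator_loss(terms):
    """Weighted sum of already-computed scalar loss terms."""
    missing = LOSS_WEIGHTS.keys() - terms.keys()
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * terms[name] for name in LOSS_WEIGHTS)


if __name__ == "__main__":
    example = {"l1_pixel": 0.2, "perceptual": 0.05, "gan": 0.7, "colorfulness": 0.3}
    print(total_generator_loss(example))
```

With these weights the perceptual term counts 50 times as much as the L1 pixel term for equal raw values.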
81
+ ---
82
+
83
+ ### Requirements
84
+
85
+ ```bash
86
+ pip install torch torchvision
87
+ pip install modelscope
88
+ pip install basicsr
89
+ ```
90
+
91
+ ### Inference
92
+
93
+ ```python
94
+ import cv2
95
+ from modelscope.outputs import OutputKeys
96
+ from modelscope.pipelines import pipeline
97
+ from modelscope.utils.constant import Tasks
98
+
99
+ colorizer = pipeline(
100
+ Tasks.image_colorization,
101
+ model="AlejandroParody/PID_Proyect"
102
+ )
103
+
104
+ result = colorizer("path/to/grayscale_comic_page.jpg")
105
+ cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
106
+ ```
107
+
108
+ Or load the weights manually with PyTorch:
109
+
110
+ ```python
111
+ import torch
112
+ from ddcolor.ddcolor_arch import DDColor # from the original DDColor repo
113
+
114
+ model = DDColor(...) # use the same config as base model
115
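+ state = torch.load("net_g_latest.pth", map_location="cpu")  # checkpoint file added in this commit
+ model.load_state_dict(state.get("params", state))  # BasicSR checkpoints usually nest the weights under "params"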
116
+ model.eval()
117
+ ```
118
+
119
+ ---
120
+
121
+ ## Limitations
122
+
123
+ - Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
124
+ - The model inherits DDColor's fixed input resolution of 256 × 256; the prediction is then upsampled to the original image dimensions, so fine color detail may be lost on large pages.
125
+ - Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.
126
+
127
+ ---
128
+
129
+ ## Citation
130
+
131
+ If you use this work, please cite the original DDColor paper:
132
+
133
+ ```bibtex
134
+ @inproceedings{kang2023ddcolor,
135
+ title = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
136
+ author = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
137
+ booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
138
+ year = {2023}
139
+ }
140
+ ```
141
+
142
+ ---
143
+
144
+ ## Authors
145
+
146
+ Developed by students of the **Universidad de Sevilla**:
147
+
148
+ - [@grnln](https://github.com/grnln)
149
+ - [@AlejandroParody](https://github.com/AlejandroParody)
150
+ - [@josmorlop10](https://github.com/josmorlop10)
151
+
152
+ ---
153
+
154
+ ## Acknowledgements
155
+
156
+ This project was developed for the **Digital Image Processing (Procesamiento de Imágenes Digitales, PID)** course at the **Universidad de Sevilla**.
157
+ Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).
158
+ Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle.
net_g_latest.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dbf10952e5c0fcf97bbf2f9cc46dc8196bd929df221c9813117eeab14c1665b3
3
+ size 911915115