Upload folder using huggingface_hub
Files changed:
- README.md +158 -0
- net_g_latest.pth +3 -0
README.md
ADDED
---
license: other
tags:
- image-colorization
- ddcolor
- comic-books
- computer-vision
- pytorch
datasets:
- cenkbircanoglu/comic-books-classification
base_model: piddnad/ddcolor_modelscope
pipeline_tag: image-to-image
---

# DDColor: Comic Book Colorization

Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.
Developed as part of the **Digital Image Processing (PID)** course at the **Universidad de Sevilla**.

---

## Model Description

DDColor is an image colorization model presented at **ICCV 2023** ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a **ConvNeXt backbone** extracts image features, a **pixel decoder** restores spatial resolution at multiple scales, and a **transformer-based color decoder** learns semantic-aware color queries via cross-attention. The outputs of the two decoders are fused to predict the color channels in the LAB color space, producing vivid, photo-realistic results.

This checkpoint fine-tunes the base DDColor weights on comic book imagery (covers and interior pages), so that the model can handle the flat shading, high-contrast line art, and stylized aesthetics typical of comics.
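
The fusion step can be illustrated with a few lines of NumPy. This is only a sketch, assuming the fusion is a dot product between color queries and per-pixel embeddings followed by a 1×1 convolution, as described in the DDColor paper. The sizes `K`, `C`, `H`, `W` and the random tensors are illustrative placeholders, not values read from this checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
K, C, H, W = 100, 256, 64, 64  # illustrative sizes, not taken from this checkpoint

color_queries = rng.standard_normal((K, C)).astype(np.float32)   # stands in for the color decoder output
pixel_embed = rng.standard_normal((C, H, W)).astype(np.float32)  # stands in for the pixel decoder output

# Fusion: each color query scores every pixel via a dot product over channels.
fused = np.einsum("kc,chw->khw", color_queries, pixel_embed)     # (K, H, W)

# A 1x1 convolution is a per-pixel linear map; here it reduces K maps to a and b.
conv_weight = rng.standard_normal((2, K)).astype(np.float32)
ab = np.einsum("ok,khw->ohw", conv_weight, fused)                # (2, H, W)

print(fused.shape, ab.shape)
```

The result has two channels (a and b) at the pixel decoder's resolution; the L channel comes from the grayscale input and is never predicted.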

---

## Intended Uses

- Automatic colorization of black-and-white comic pages and covers
- Restoration or re-colorization of vintage comics
- Research and education in digital image processing

### Out-of-scope Uses

- General natural photo colorization (use the original DDColor weights for that)
- High-resolution medical or satellite imagery

---

## Training Details

| Field | Value |
|---|---|
| **Base model** | DDColor (piddnad/ddcolor_modelscope) |
| **Framework** | PyTorch + BasicSR |
| **Input size** | 256 × 256 (L channel) |
| **Output** | ab channels → combined LAB → RGB |
| **Dataset** | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
| **Total images** | 52,156 RGB images across 86 classes |
| **Training images** | ~41,725 (80% split per class) |
| **Test images** | ~10,431 (20% split per class) |
| **Original resolution** | 1988 × 3056 px |
| **Resized resolution** | 288 × 432 px (via OpenCV) |
| **Total iterations** | 200,000 (warmup: 1,000) |
| **Batch size** | 1 per GPU |
| **Optimizer (G)** | AdamW, lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
| **Optimizer (D)** | Adam, lr 1e-4, betas (0.9, 0.99) |
| **LR scheduler** | MultiStepLR, milestones at 50k / 75k / 100k / 125k / 150k iters, γ = 0.5 |
| **Training input size** | 128 × 128 (gt_size) |
| **Validation input size** | 256 × 256 (gt_size) |
| **Hardware** | 1× NVIDIA GeForce RTX 4060 |
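
To make the learning-rate schedule in the table concrete, the sketch below computes the generator learning rate at a given iteration. The milestone rule follows the standard MultiStepLR definition (halve the rate at each milestone). The linear ramp during warmup is an assumption, since the table only gives the warmup length; `learning_rate` is a hypothetical helper, not part of the training code.

```python
from bisect import bisect_right

BASE_LR = 1e-4
MILESTONES = [50_000, 75_000, 100_000, 125_000, 150_000]
GAMMA = 0.5
WARMUP_ITERS = 1_000


def learning_rate(iteration: int) -> float:
    """Return the generator learning rate at a given training iteration."""
    # MultiStepLR: multiply by GAMMA once for every milestone already reached.
    lr = BASE_LR * GAMMA ** bisect_right(MILESTONES, iteration)
    # Assumption: linear warmup from 0 to the scheduled rate over WARMUP_ITERS.
    if iteration < WARMUP_ITERS:
        lr *= iteration / WARMUP_ITERS
    return lr


for it in (500, 1_000, 50_000, 100_000, 200_000):
    print(f"iter {it:>7}: lr = {learning_rate(it):.3e}")
```

After the last milestone the rate has been halved five times, so training finishes at 1e-4 × 0.5⁵ ≈ 3.1e-6.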

### Dataset

The model was fine-tuned on the **Comic Books Images** dataset published on Kaggle by Cenk Bircanoglu. It contains **52,156 RGB images** spanning **86 comic book classes** (covers and interior pages). The original images (1988 × 3056 px) were downscaled to **288 × 432 px** using OpenCV before training. The dataset was split per class with an **80/20 train/test ratio**, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
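
A per-class 80/20 split can be sketched as follows. This is a minimal illustration, not the script used for this project: the shuffle, the fixed seed, the rounding rule, and the `split_per_class` helper are all assumptions, and the class names in the toy example are made up.

```python
import random
from collections import defaultdict


def split_per_class(samples, train_ratio=0.8, seed=0):
    """Split (path, class_name) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Toy example with two hypothetical classes of 10 and 5 images.
samples = [(f"batman/{i}.jpg", "batman") for i in range(10)]
samples += [(f"tintin/{i}.jpg", "tintin") for i in range(5)]
train, test = split_per_class(samples)
print(len(train), len(test))
```

Splitting inside each class, rather than over the whole pool, keeps every class represented in both subsets at roughly the same ratio.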

### Loss Functions

| Loss | Weight |
|---|---|
| L1 Pixel Loss | 0.1 |
| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
| GAN Loss (vanilla) | 1.0 |
| Colorfulness Loss | 0.5 |

A **color enhancement factor of 1.2** was applied during training to boost output vibrancy.
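
The weights in the table are typically combined as a weighted sum to form the generator objective. The sketch below shows that arithmetic only, assuming a plain weighted sum; `generator_loss` and the per-term values are hypothetical and chosen purely for illustration.

```python
LOSS_WEIGHTS = {
    "pixel_l1": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def generator_loss(terms: dict) -> float:
    """Weighted sum of the individual loss terms for one training step."""
    missing = LOSS_WEIGHTS.keys() - terms.keys()
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * terms[name] for name in LOSS_WEIGHTS)


# Hypothetical per-term values, chosen only to show the arithmetic.
example = {"pixel_l1": 0.2, "perceptual": 0.04, "gan": 0.7, "colorfulness": 0.3}
print(generator_loss(example))
```

With these weights the perceptual term dominates unless its raw value is much smaller than the others, which is worth keeping in mind when reading training logs.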

---

### Requirements

```bash
pip install torch torchvision
pip install modelscope
pip install basicsr
```

### Inference

```python
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the colorization pipeline from the fine-tuned checkpoint.
colorizer = pipeline(
    Tasks.image_colorization,
    model="AlejandroParody/PID_Proyect",
)

# Colorize one page and write the result to disk.
result = colorizer("path/to/grayscale_comic_page.jpg")
cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
```

Alternatively, load the weights manually with PyTorch:

```python
import torch
from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

model = DDColor(...)  # use the same config as the base model

# net_g_latest.pth is the checkpoint file uploaded with this model card.
state = torch.load("net_g_latest.pth", map_location="cpu")
# BasicSR checkpoints usually nest the weights under "params"; fall back to the raw dict otherwise.
model.load_state_dict(state.get("params", state))
model.eval()
```

---

## Limitations

- Performance may degrade on images with very dense text or complex panel layouts that are not well represented in the training data.
- The model inherits DDColor's fixed input resolution of 256 × 256; the output is then upsampled to the original image dimensions.
- Color choices are learned from the training distribution, so unusual or highly stylized comic styles may yield inconsistent results.

---

## Citation

If you use this work, please cite the original DDColor paper:

```bibtex
@inproceedings{kang2023ddcolor,
  title     = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
  author    = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}
```

---

## Authors

Developed by students of the **Universidad de Sevilla**:

- [@grnln](https://github.com/grnln)
- [@AlejandroParody](https://github.com/AlejandroParody)
- [@josmorlop10](https://github.com/josmorlop10)

---

## Acknowledgements

This project was developed for the **Procesamiento de Imágenes Digitales (PID)** (Digital Image Processing) course at the **Universidad de Sevilla**.
Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).
Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle.

net_g_latest.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:dbf10952e5c0fcf97bbf2f9cc46dc8196bd929df221c9813117eeab14c1665b3
size 911915115