| ---
|
| license: other
|
| tags:
|
| - image-colorization
|
| - ddcolor
|
| - comic-books
|
| - computer-vision
|
| - pytorch
|
| datasets:
|
| - cenkbircanoglu/comic-books-classification
|
| base_model: piddnad/ddcolor_modelscope
|
| pipeline_tag: image-to-image
|
| ---
|
|
|
| # DDColor — Comic Book Colorization
|
|
|
| Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.
|
| Developed as part of the **Digital Image Processing (PID)** course at the **Universidad de Sevilla**.
|
|
|
| ---
|
|
|
| ## Model Description
|
|
|
| DDColor is a state-of-the-art image colorization model presented at **ICCV 2023** ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a **ConvNeXt backbone** extracts image features, a **pixel decoder** restores spatial resolution at multiple scales, and a **transformer-based color decoder** learns semantic-aware color queries via cross-attention. The outputs of the two decoders are fused to predict the ab color channels, which are combined with the input L channel in the LAB color space to produce vivid, photo-realistic colorized images.
|
|
|
| This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.
|
|
|
| ---
|
|
|
| ## Intended Uses
|
|
|
| - Automatic colorization of black-and-white comic pages and covers
|
| - Restoration or re-colorization of vintage comics
|
| - Research and education in digital image processing
|
|
|
| ### Out-of-scope Uses
|
|
|
| - General natural photo colorization (use the original DDColor weights for that)
|
| - High-resolution medical or satellite imagery
|
|
|
| ---
|
|
|
| ## Training Details
|
|
|
| | Field | Value |
|
| |---|---|
|
| | **Base model** | DDColor (piddnad/ddcolor_modelscope) |
|
| | **Framework** | PyTorch + BasicSR |
|
| | **Inference input size** | 256 × 256 (L channel) |
|
| | **Output** | ab channels → combined LAB → RGB |
|
| | **Dataset** | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
|
| | **Total images** | 52,156 RGB images across 86 classes |
|
| | **Training images** | ~41,725 (80% split per class) |
|
| | **Test images** | ~10,431 (20% split per class) |
|
| | **Original resolution** | 1988 × 3056 px |
|
| | **Resized resolution** | 288 × 432 px (via OpenCV) |
|
| | **Total iterations** | 200,000 (warmup: 1,000) |
|
| | **Batch size** | 1 per GPU |
|
| | **Optimizer (G)** | AdamW — lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
|
| | **Optimizer (D)** | Adam — lr 1e-4, betas (0.9, 0.99) |
|
| | **LR scheduler** | MultiStepLR — milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
|
| | **Training input size** | 128 × 128 (gt_size) |
|
| | **Validation input size** | 256 × 256 (gt_size) |
|
| | **Hardware** | 1× NVIDIA GeForce RTX 4060 |
|
|
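The learning-rate schedule in the table can be sketched in a few lines of plain Python. This is only an illustration of step decay using the listed milestones and γ. The function name `lr_at` is made up for this example, and the 1,000-iteration warmup is deliberately ignored because the table does not say how it is shaped.

```python
from bisect import bisect_right

# Values taken from the training table above.
BASE_LR = 1e-4
MILESTONES = [50_000, 75_000, 100_000, 125_000, 150_000]
GAMMA = 0.5


def lr_at(iteration: int) -> float:
    """Learning rate after step decay (warmup is ignored in this sketch)."""
    # Count how many milestones have already been reached.
    decays = bisect_right(MILESTONES, iteration)
    return BASE_LR * GAMMA ** decays


for it in (0, 50_000, 100_000, 199_999):
    print(f"iter {it:>7}: lr = {lr_at(it):.3e}")
```

Under these assumptions the rate halves five times, so the last 50,000 iterations run at 1/32 of the initial learning rate.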
|
| ### Dataset
|
|
|
| The model was fine-tuned on the **Comic Books Images** dataset published on Kaggle by Cenk Bircanoglu. It contains **52,156 RGB images** spanning **86 comic book classes** (covers and interior pages). The original images at 1988 × 3056 px were downscaled to **288 × 432 px** using OpenCV before training. The dataset was split per class with an **80/20 train/test ratio**, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
|
|
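A per-class 80/20 split like the one described above could look roughly like the following. This is a minimal sketch, not the script used for this project: the function name `split_per_class`, the fixed seed, and the file names in the toy example are all assumptions.

```python
import random
from collections import defaultdict


def split_per_class(items, train_ratio=0.8, seed=0):
    """Split (path, class_name) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in items:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Toy example with hypothetical file names: 2 classes of 10 images each.
items = [(f"{c}/{i:03d}.jpg", c) for c in ("class_a", "class_b") for i in range(10)]
train, test = split_per_class(items)
print(len(train), len(test))  # 16 4
```

Splitting inside each class keeps the class proportions the same in both subsets, which a single global shuffle would not guarantee.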
|
| ### Loss Functions
|
|
|
| | Loss | Weight |
|
| |---|---|
|
| | L1 Pixel Loss | 0.1 |
|
| | Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
|
| | GAN Loss (vanilla) | 1.0 |
|
| | Colorfulness Loss | 0.5 |
|
|
|
| A **color enhancement factor of 1.2** was applied during training to boost output vibrancy.
|
|
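As a rough illustration of how the weights in the table combine, the generator objective can be read as a weighted sum of the four terms. The sketch below only shows that arithmetic. The dictionary keys, the function name, and the per-term values are invented for the example, and the individual loss terms themselves are not implemented here.

```python
# Weights copied from the loss table above.
LOSS_WEIGHTS = {
    "l1_pixel": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def total_generator_loss(terms: dict) -> float:
    """Weighted sum of the individual loss terms."""
    missing = LOSS_WEIGHTS.keys() - terms.keys()
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * terms[name] for name in LOSS_WEIGHTS)


# Made-up per-term values, only to show the arithmetic.
example = {"l1_pixel": 0.2, "perceptual": 0.1, "gan": 0.7, "colorfulness": 0.4}
print(total_generator_loss(example))
```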
|
| ---
|
|
|
| ## Usage
|
|
|
| ### Requirements
|
|
|
| ```bash
|
| pip install torch torchvision
|
| pip install modelscope
|
| pip install basicsr
|
| pip install opencv-python  # provides cv2, used in the inference example below
|
| ```
|
|
|
| ### Inference
|
|
|
| ```python
|
| import cv2
|
| from modelscope.outputs import OutputKeys
|
| from modelscope.pipelines import pipeline
|
| from modelscope.utils.constant import Tasks
|
|
|
| colorizer = pipeline(
|
| Tasks.image_colorization,
|
| model="AlejandroParody/PID_Proyect"
|
| )
|
|
|
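| # Run colorization on a single grayscale page
|
| result = colorizer("path/to/grayscale_comic_page.jpg")
|
| # The colorized image is returned as an array under OutputKeys.OUTPUT_IMG
|
| cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])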
|
| ```
|
|
|
| Or load the weights manually with PyTorch:
|
|
|
| ```python
|
| import torch
|
| from ddcolor.ddcolor_arch import DDColor # from the original DDColor repo
|
|
|
| model = DDColor(...) # use the same config as base model
|
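| state = torch.load("pytorch_model.pt", map_location="cpu")
|
| # Assumption: BasicSR-style checkpoints may nest the weights under "params"; unwrap if present
|
| model.load_state_dict(state.get("params", state))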
|
| model.eval()
|
| ```
|
|
|
| ---
|
|
|
| ## Limitations
|
|
|
| - Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
|
| - The model inherits DDColor's fixed input resolution of 256 × 256; the output is then upsampled to the original image dimensions, so fine color detail on large pages may be lost.
|
| - Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.
|
|
|
| ---
|
|
|
| ## Citation
|
|
|
| If you use this work, please cite the original DDColor paper:
|
|
|
| ```bibtex
|
| @inproceedings{kang2023ddcolor,
|
| title = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
|
| author = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
|
| booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
|
| year = {2023}
|
| }
|
| ```
|
|
|
| ---
|
|
|
| ## Authors
|
|
|
| Developed by students of the **Universidad de Sevilla**:
|
|
|
| - [@grnln](https://github.com/grnln)
|
| - [@AlejandroParody](https://github.com/AlejandroParody)
|
| - [@josmorlop10](https://github.com/josmorlop10)
|
|
|
| ---
|
|
|
| ## Acknowledgements
|
|
|
| This project was developed for the **Procesamiento de Imágenes Digitales (PID)** course at the **Universidad de Sevilla**.
|
| Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).
|
| Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle. |