	---
	license: other
	tags:
	- image-colorization
	- ddcolor
	- comic-books
	- computer-vision
	- pytorch
	datasets:
	- cenkbircanoglu/comic-books-classification
	base_model: piddnad/ddcolor_modelscope
	pipeline_tag: image-to-image
	---

	# DDColor — Comic Book Colorization

	Fine-tuned weights of [DDColor](https://github.com/piddnad/DDColor) for automatic colorization of grayscale comic book pages and covers.
	Developed as part of the Digital Image Processing (PID) course at the Universidad de Sevilla.

	---

	## Model Description

	DDColor is a state-of-the-art image colorization model presented at ICCV 2023 ("DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders", Kang et al.). It uses an encoder–dual-decoder architecture: a ConvNeXt backbone extracts image features, a pixel decoder restores spatial resolution at multiple scales, and a transformer-based color decoder learns semantic-aware color queries via cross-attention. The outputs of the two decoders are fused to predict the ab color channels, which are combined with the input L channel to produce vivid, photo-realistic colorized outputs in the LAB color space.
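The fusion step can be pictured as a dot product between each color query and the feature vector at every pixel, followed by a 1×1 projection down to the two ab channels. The sketch below illustrates that idea with plain Python lists and toy sizes. The function names `fuse_queries_with_pixels` and `project_to_ab`, the shapes, and the numbers are all assumptions made for illustration and are not taken from the DDColor codebase.

```python
def fuse_queries_with_pixels(color_queries, pixel_features):
    """Dot each color query (length C) with the C-dim feature at every pixel.

    color_queries:  K lists of length C
    pixel_features: C maps, each H x W
    returns:        K maps, each H x W
    """
    channels = len(pixel_features)
    height, width = len(pixel_features[0]), len(pixel_features[0][0])
    return [
        [
            [
                sum(query[c] * pixel_features[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for query in color_queries
    ]


def project_to_ab(fused, weights):
    """1x1 projection: mix the K fused maps into 2 output maps (a and b).

    weights: 2 lists of length K (hypothetical values, normally learned)
    """
    height, width = len(fused[0]), len(fused[0][0])
    return [
        [
            [
                sum(w[k] * fused[k][y][x] for k in range(len(fused)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for w in weights
    ]


# Toy example: K=2 queries, C=2 feature channels, a 1 x 2 image.
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[[0.5, 1.0]], [[2.0, 4.0]]]
fused = fuse_queries_with_pixels(queries, features)
ab = project_to_ab(fused, weights=[[1.0, 1.0], [1.0, -1.0]])
```

In the real model these operations run on batched tensors with learned weights; the nested lists here only make the shapes explicit.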

	This checkpoint fine-tunes the base DDColor weights specifically on comic book imagery — covers and interior pages — enabling the model to handle the distinctive flat shading, high-contrast line art, and stylized aesthetics typical of comics.

	---

	## Intended Uses

	- Automatic colorization of black-and-white comic pages and covers
	- Restoration or re-colorization of vintage comics
	- Research and education in digital image processing

	### Out-of-scope Uses

	- General natural photo colorization (use the original DDColor weights for that)
	- High-resolution medical or satellite imagery

	---

	## Training Details

	| Field | Value |
	|---|---|
	| Base model | DDColor (piddnad/ddcolor_modelscope) |
	| Framework | PyTorch + BasicSR |
	| Input size | 256 × 256 (L channel) |
	| Output | ab channels → combined LAB → RGB |
	| Dataset | [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) (Kaggle) |
	| Total images | 52,156 RGB images across 86 classes |
	| Training images | ~41,725 (80% split per class) |
	| Test images | ~10,431 (20% split per class) |
	| Original resolution | 1988 × 3056 px |
	| Resized resolution | 288 × 432 px (via OpenCV) |
	| Total iterations | 200,000 (warmup: 1,000) |
	| Batch size | 1 per GPU |
	| Optimizer (G) | AdamW: lr 1e-4, weight decay 0.01, betas (0.9, 0.99) |
	| Optimizer (D) | Adam: lr 1e-4, betas (0.9, 0.99) |
	| LR scheduler | MultiStepLR: milestones at 50k / 75k / 100k / 125k / 150k iters, γ 0.5 |
	| Training input size | 128 × 128 (gt_size) |
	| Validation input size | 256 × 256 (gt_size) |
	| Hardware | 1× NVIDIA GeForce RTX 4060 |
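To make the learning-rate schedule in the table concrete, here is a minimal sketch that computes the rate at a given iteration from the listed base rate, milestones, and γ. The linear shape of the warmup is an assumption (the table only gives its length), and the helper name `learning_rate` is hypothetical.

```python
from bisect import bisect_right

BASE_LR = 1e-4
MILESTONES = [50_000, 75_000, 100_000, 125_000, 150_000]
GAMMA = 0.5
WARMUP_ITERS = 1_000


def learning_rate(iteration, linear_warmup=True):
    """Approximate learning rate at a given training iteration."""
    # Assumption: warmup ramps linearly from 0 up to BASE_LR.
    if linear_warmup and iteration < WARMUP_ITERS:
        return BASE_LR * iteration / WARMUP_ITERS
    # MultiStepLR: multiply by GAMMA once for every milestone already reached.
    return BASE_LR * GAMMA ** bisect_right(MILESTONES, iteration)


for it in (500, 1_000, 50_000, 100_000, 200_000):
    print(f"iter {it:>7,}: lr = {learning_rate(it):.3e}")
```

With these values the rate settles at 1e-4 × 0.5⁵ = 3.125e-6 after the last milestone, so the final 50,000 iterations run at roughly 3% of the initial rate.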

	### Dataset

	The model was fine-tuned on the Comic Books Images dataset published on Kaggle by Cenk Bircanoglu. It contains 52,156 RGB images spanning 86 comic book classes (covers and interior pages). The original images at 1988 × 3056 px were downscaled to 288 × 432 px using OpenCV before training. The dataset was split per class with an 80/20 train/test ratio, yielding ~41,725 training images and ~10,431 test images. Images were converted to grayscale (L channel) at training time; the original color images served as ground truth for the ab channels.
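A per-class 80/20 split of this kind could be sketched as follows. The exact procedure used for this checkpoint is not documented here, so the seed, the rounding rule, the function name `split_per_class`, and the toy file names are all assumptions. Rounding per class also explains why the reported totals are approximate: 0.8 × 52,156 = 41,724.8 and 0.2 × 52,156 = 10,431.2.

```python
import random


def split_per_class(files_by_class, train_ratio=0.8, seed=0):
    """Shuffle each class independently and cut it at train_ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(files_by_class):
        files = sorted(files_by_class[label])  # stable order before shuffling
        rng.shuffle(files)
        cut = round(len(files) * train_ratio)
        train += [(path, label) for path in files[:cut]]
        test += [(path, label) for path in files[cut:]]
    return train, test


# Toy stand-in for the real dataset: two hypothetical classes.
toy = {
    "class_a": [f"class_a/{i:03d}.jpg" for i in range(10)],
    "class_b": [f"class_b/{i:03d}.jpg" for i in range(5)],
}
train, test = split_per_class(toy)
print(len(train), len(test))
```

Splitting inside each class, rather than over the pooled file list, keeps every comic class represented in both subsets at the same ratio.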

	### Loss Functions

	| Loss | Weight |
	|---|---|
	| L1 Pixel Loss | 0.1 |
	| Perceptual Loss (VGG16-BN, layers conv1–conv5) | 5.0 |
	| GAN Loss (vanilla) | 1.0 |
	| Colorfulness Loss | 0.5 |

	A color enhancement factor of 1.2 was applied during training to boost output vibrancy.
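The weights in the table combine into a single generator objective as a weighted sum. The sketch below shows that arithmetic only; the dictionary keys, the helper name `generator_loss`, and the example loss values are made up for illustration and are not measurements from this training run.

```python
LOSS_WEIGHTS = {
    "l1_pixel": 0.1,
    "perceptual": 5.0,
    "gan": 1.0,
    "colorfulness": 0.5,
}


def generator_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Made-up per-batch values, for illustration only.
example_terms = {
    "l1_pixel": 0.2,
    "perceptual": 0.03,
    "gan": 0.7,
    "colorfulness": 0.4,
}
total = generator_loss(example_terms)
print(f"total generator loss: {total:.4f}")
```

Because the perceptual term carries a weight 50 times larger than the pixel term, it can dominate the objective even when its raw value is small.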

	---

	## Usage

	### Requirements

	```bash
	pip install torch torchvision
	pip install modelscope
	pip install basicsr
	pip install opencv-python  # provides cv2, used to save the output image
	```

	### Inference

	```python
	import cv2
	from modelscope.outputs import OutputKeys
	from modelscope.pipelines import pipeline
	from modelscope.utils.constant import Tasks

	colorizer = pipeline(
	    Tasks.image_colorization,
	    model="AlejandroParody/PID_Proyect",
	)

	result = colorizer("path/to/grayscale_comic_page.jpg")
	cv2.imwrite("colorized_output.png", result[OutputKeys.OUTPUT_IMG])
	```

	Or load the weights manually with PyTorch:

	```python
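	import torch
	from ddcolor.ddcolor_arch import DDColor  # from the original DDColor repo

	model = DDColor(...)  # use the same config as the base model
	# BasicSR-style checkpoints often nest the weights under a "params" key;
	# unwrap it if present (an assumption about how this file was saved).
	state_dict = torch.load("pytorch_model.pt", map_location="cpu")
	if isinstance(state_dict, dict) and "params" in state_dict:
	    state_dict = state_dict["params"]
	model.load_state_dict(state_dict)
	model.eval()  # switch to inference mode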
	```

	---

	## Limitations

	- Performance may degrade on images with very dense text or complex panel layouts not well represented in the training data.
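	- The network runs at a fixed input resolution of 256 × 256. The predicted color channels are then upsampled to the original image dimensions, so fine color detail in high-resolution pages is interpolated rather than predicted.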
	- Color choices are learned from the training distribution — unusual or highly stylized comic styles may yield inconsistent results.

	---

	## Citation

	If you use this work, please cite the original DDColor paper:

	```bibtex
	@inproceedings{kang2023ddcolor,
	  title = {DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders},
	  author = {Kang, Xiaoyang and Yang, Tao and Ouyang, Wenqi and Ren, Peiran and Li, Lingzhi and Xie, Xuansong},
	  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
	  year = {2023}
	}
	```

	---

	## Authors

	Developed by students of the Universidad de Sevilla:

	- [@grnln](https://github.com/grnln)
	- [@AlejandroParody](https://github.com/AlejandroParody)
	- [@josmorlop10](https://github.com/josmorlop10)

	---

	## Acknowledgements

	This project was developed for the Procesamiento de Imágenes Digitales (PID, "Digital Image Processing") course at the Universidad de Sevilla.
	Base model and training code: [piddnad/DDColor](https://github.com/piddnad/DDColor).
	Training dataset: [Comic Books Images](https://www.kaggle.com/datasets/cenkbircanoglu/comic-books-classification) by Cenk Bircanoglu on Kaggle.