---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov12
- comic-books
- comic
- computer-vision
- ultralytics
- pytorch
widget:
- modelId: mosesb/best-comic-panel-detection
  title: YOLOv12 Comic Panel Detection
  url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
datasets:
- Custom-Object-Detection
metrics:
- mAP50
- mAP50-95
---

# YOLOv12 for Comic Panel Detection

This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model predicts a bounding box for each panel, which makes it useful for digitizing comics, extracting content, or building datasets for downstream analysis.

The model was trained in PyTorch with the `ultralytics` library and reaches an mAP50-95 of 0.985 on the validation split of a custom-annotated dataset of comic pages.

*Try the model in this Space: [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).*

## Model Details

* **Architecture:** `YOLOv12x` (the extra-large variant)
* **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
* **Classes:** `Comic Panel`
* **Frameworks:** PyTorch, Ultralytics

## How to Get Started

You can use this model with the `ultralytics` library. It requires the model file `best.pt` from this repository.

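
If `best.pt` is not on disk yet, one way to fetch it is the `huggingface_hub` package. The sketch below assumes the weights file sits at the repository root under the name `best.pt` and that `huggingface_hub` is installed; the helper name `download_weights` is made up for illustration.

```python
REPO_ID = "mosesb/best-comic-panel-detection"  # repository named in this model card
WEIGHTS_FILE = "best.pt"  # assumed to sit at the repository root


def download_weights(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILE) -> str:
    """Download the weights file and return its local cache path."""
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    # Needs network access on first use; later calls reuse the local cache.
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

The returned path can be passed straight to `YOLO(...)` in place of `'best.pt'`.
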
```python
# 1. Install Ultralytics first (run in a shell, or prefix with "!" in a notebook):
#    pip install ultralytics

from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # Plotting will draw the bounding boxes on the image
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
```
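
Detections are not guaranteed to come back in reading order. The sketch below is one possible way to order panels row by row, left to right; it is not part of the `ultralytics` API. The helper name `sort_reading_order` and the `row_tolerance` heuristic are assumptions, and the approach presumes a left-to-right layout with panels roughly aligned in rows (manga would need the horizontal order reversed).

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixels


def sort_reading_order(boxes: List[Box], row_tolerance: float = 0.5) -> List[Box]:
    """Group boxes into rows by top edge, then sort each row left to right.

    A box joins the current row when its top edge lies within
    row_tolerance * (height of the row's first box) of that row's top edge.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # top to bottom
        if rows:
            anchor = rows[-1][0]
            anchor_height = anchor[3] - anchor[1]
            if box[1] - anchor[1] <= row_tolerance * anchor_height:
                rows[-1].append(box)
                continue
        rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]


# Hypothetical 2x2 page layout, listed out of order
panels = [
    [310, 12, 600, 400],   # top right
    [10, 410, 300, 800],   # bottom left
    [10, 10, 300, 400],    # top left
    [310, 410, 600, 800],  # bottom right
]
ordered = sort_reading_order(panels)
print(ordered)
```

With real predictions, `results[0].boxes.xyxy.tolist()` yields the list of boxes to pass in.
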

## Training Procedure

The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.

### Training Hyperparameters

* **Image Size:** 640x640
* **Batch Size:** 16
* **Optimizer:** AdamW (lr=0.002)
* **Epochs:** 200
* **Patience:** 100 epochs for early stopping

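
The hyperparameters above map onto arguments of the Ultralytics `train` method roughly as follows. This is a sketch, not the exact command from the training notebook: the base checkpoint name `yolo12x.pt` and the dataset path `data.yaml` are assumptions, and any setting not listed above is left at its library default.

```python
TRAIN_ARGS = {
    "data": "data.yaml",   # hypothetical path to the dataset config
    "imgsz": 640,          # image size
    "batch": 16,           # batch size
    "optimizer": "AdamW",
    "lr0": 0.002,          # initial learning rate
    "epochs": 200,
    "patience": 100,       # early-stopping patience in epochs
}


def train(base_weights: str = "yolo12x.pt"):
    """Fine-tune from a pre-trained checkpoint (the file name is an assumption)."""
    # Imported lazily so the configuration can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(**TRAIN_ARGS)
```
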



## Evaluation

The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.

### Key Performance Metrics

| Metric       | Value | Description                                                             |
| :----------- | :---- | :---------------------------------------------------------------------- |
| **mAP50**    | 0.991 | Mean Average Precision at IoU threshold 0.50.                           |
| **mAP50-95** | 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |

The model achieves near-perfect precision and recall on the validation data, indicating a strong ability to correctly identify comic panels within the styles present in the dataset.

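
Both metrics rest on intersection over union (IoU) between a predicted box and a ground-truth box. As a simplified illustration (a full mAP computation also ranks predictions by confidence and integrates the precision-recall curve), the sketch below computes IoU for two made-up boxes and counts how many of the ten thresholds from 0.50 to 0.95 the prediction would satisfy. The function names and coordinates are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def thresholds_met(pred, truth):
    """Number of IoU thresholds at which pred counts as a match for truth."""
    score = iou(pred, truth)
    return sum(score >= t for t in THRESHOLDS)


truth = (0, 0, 100, 100)
pred = (10, 0, 110, 100)   # shifted 10 px to the right
print(round(iou(pred, truth), 3))   # 0.818
print(thresholds_met(pred, truth))  # 7
```

A prediction that is only slightly offset still matches at IoU 0.50 but fails the stricter thresholds, which is why mAP50-95 is the harder of the two metrics.
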



### Qualitative Results

The model correctly identifies panels of various sizes and layouts in the validation set.



## Intended Use and Limitations

This model is intended for applications that need to split comic book pages into their constituent panels. This can be a pre-processing step for:

- Creating structured digital reading experiences.
- Extracting text or characters from individual panels.
- Analyzing comic book layouts and artistic styles.

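
As a sketch of such a pre-processing step, a detected box can be turned into an integer crop rectangle with a small margin, clamped to the page bounds. The helper name `crop_rect` and the margin value are illustrative assumptions; the returned tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects.

```python
import math


def crop_rect(box, page_width, page_height, margin=0):
    """Expand an (x1, y1, x2, y2) box by margin pixels and clamp it to the page."""
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - margin)
    upper = max(0, math.floor(y1) - margin)
    right = min(page_width, math.ceil(x2) + margin)
    lower = min(page_height, math.ceil(y2) + margin)
    return left, upper, right, lower


# Made-up detection near the top-left corner of a 1000x1500 page
print(crop_rect((3.2, 7.9, 480.4, 620.1), 1000, 1500, margin=5))  # (0, 2, 486, 626)
```

With Pillow, `page.crop(crop_rect(box.xyxy[0].tolist(), page.width, page.height))` would then cut one panel out of a loaded page image.
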

**The model has been tested in real-world applications and has shown promising results.**

### Limitations

* **Non-Rectangular Panels:** The model predicts rectangular bounding boxes, so it may struggle with highly irregular or overlapping panel shapes.

## Acknowledgements

* **Ultralytics** for the amazing [YOLOv12 model](https://github.com/ultralytics/ultralytics) and library.
* **Roboflow** for their dataset hosting platform, and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating this incredible dataset.

*This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*