---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov12
- comic-books
- comic
- computer-vision
- ultralytics
- pytorch
widget:
- modelId: mosesb/best-comic-panel-detection
title: YOLOv12 Comic Panel Detection
url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
datasets:
- Custom-Object-Detection
metrics:
- mAP50
- mAP50-95
---
# YOLOv12 for Comic Panel Detection
This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model identifies the bounding boxes for each panel, making it a valuable tool for digitizing comics, extracting content, or building datasets for downstream analysis.
The model was trained in PyTorch with the `ultralytics` library and reaches an mAP50-95 of 0.985 on the validation split of a custom-annotated dataset of comic pages.
*Visit this space to try out the model right now: [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).*
## Model Details
* **Architecture:** `YOLOv12x` (the extra-large variant)
* **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
* **Classes:** `Comic Panel`
* **Frameworks:** PyTorch, Ultralytics
## How to Get Started
You can easily use this model with the `ultralytics` library. The model file `best.pt` from this repository is required.
```python
# 1. Install Ultralytics first (run in a shell, or prefix with "!" in a notebook):
#    pip install ultralytics
from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # plot() draws the bounding boxes and returns the image as a BGR NumPy array
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
```
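If `best.pt` is not on disk yet, one way to fetch it is through the `huggingface_hub` package. The sketch below takes the repository ID from this card's metadata and assumes that the weights are stored as `best.pt` at the root of the repository and that `huggingface_hub` is installed. The helper name `download_weights` is hypothetical.

```python
# Sketch: fetch the weights from the Hub instead of copying them by hand.
# Assumptions: `pip install huggingface_hub`, and 'best.pt' sits at the repo root.
REPO_ID = "mosesb/best-comic-panel-detection"
WEIGHTS_FILENAME = "best.pt"


def download_weights(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILENAME) -> str:
    """Download the weights file into the local Hub cache and return its path."""
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Example usage (requires network access):
# weights_path = download_weights()
# model = YOLO(weights_path)
```

The returned path can be passed to `YOLO(...)` in place of the literal `'best.pt'`.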
## Training Procedure
The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.
### Training Hyperparameters
* **Image Size:** 640x640
* **Batch Size:** 16
* **Optimizer:** AdamW (lr=0.002)
* **Epochs:** 200
* **Patience:** 100 epochs for early stopping
![Training and Validation Metrics](results.png)
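For readers who want to run a comparable fine-tuning job, the hyperparameters above map onto Ultralytics `train()` keyword arguments roughly as follows. This is a minimal sketch, not the author's exact invocation: the dataset YAML path and the base checkpoint name `yolov12x.pt` are assumptions, and the helper `fine_tune` is hypothetical.

```python
# The listed hyperparameters expressed as Ultralytics train() keyword arguments.
TRAIN_ARGS = {
    "imgsz": 640,          # Image Size: 640x640
    "batch": 16,           # Batch Size
    "optimizer": "AdamW",  # Optimizer
    "lr0": 0.002,          # Initial learning rate
    "epochs": 200,         # Maximum number of epochs
    "patience": 100,       # Early-stopping patience, in epochs
}


def fine_tune(data_yaml: str, base_weights: str = "yolov12x.pt"):
    """Fine-tune from a pre-trained checkpoint on the dataset described by data_yaml.

    Both arguments are placeholders: point data_yaml at your exported dataset
    config and base_weights at whichever pre-trained checkpoint you start from.
    """
    # Imported lazily so the settings above can be inspected without Ultralytics.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)


# Example usage (starts a long training run):
# fine_tune("path/to/data.yaml")
```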
## Evaluation
The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.
### Key Performance Metrics
| Metric | Value | Description |
| :---------- | :---- | :--------------------------------------------------- |
| **mAP50** | 0.991 | Mean Average Precision at IoU threshold 0.50. |
| **mAP50-95**| 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |
The near-perfect mAP scores on the validation data indicate a strong ability to correctly identify comic panels within the styles present in the dataset.
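Both metrics rest on Intersection over Union (IoU), the overlap ratio between a predicted box and a ground-truth box. The sketch below implements the standard IoU definition for `xyxy` boxes and counts how many of the ten thresholds from 0.50 to 0.95 a single prediction would satisfy. It is an illustration only, not the Ultralytics evaluation code, and the two boxes are made-up values; a full mAP computation also integrates precision over recall.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Made-up example: the prediction is shifted 10 px to the left of the true panel.
prediction = (10, 10, 110, 210)
ground_truth = (20, 10, 120, 210)
iou = box_iou(prediction, ground_truth)

# IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
matched = sum(iou >= t for t in thresholds)

print(f"IoU = {iou:.3f}, counts as a match at {matched} of {len(thresholds)} thresholds")
```

Here the IoU is about 0.818, so the prediction counts as correct for mAP50 but fails the three strictest thresholds. This is why mAP50-95 is the harder of the two metrics to score well on.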
![Confusion Matrix](confusion_matrix.png)
### Qualitative Results
The model correctly identifies panels of various sizes and layouts in the validation set.
![Validation Predictions](val_batch0_pred.jpg)
## Intended Use and Limitations
This model is intended for applications that need to split comic book pages into their constituent panels. This can be a pre-processing step for:
- Creating structured digital reading experiences.
- Extracting text or characters from individual panels.
- Analyzing comic book layouts and artistic styles.
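As a sketch of such a pre-processing step, the helper below crops one image per detected panel from a page. It uses only Pillow. The function name `crop_panels` is hypothetical, and the page and boxes are synthetic stand-ins for a real comic page and real model output.

```python
from PIL import Image


def crop_panels(page, boxes_xyxy):
    """Return one cropped image per (x1, y1, x2, y2) box, clamped to the page."""
    width, height = page.size
    crops = []
    for x1, y1, x2, y2 in boxes_xyxy:
        # Round to whole pixels and keep the box inside the page bounds.
        left = max(0, int(round(x1)))
        top = max(0, int(round(y1)))
        right = min(width, int(round(x2)))
        bottom = min(height, int(round(y2)))
        if right > left and bottom > top:  # skip empty boxes
            crops.append(page.crop((left, top, right, bottom)))
    return crops


# Synthetic stand-ins: a blank 800x1200 page and two made-up detections.
page = Image.new("RGB", (800, 1200), "white")
boxes = [(20.4, 30.2, 390.7, 580.1), (410.0, 30.0, 820.0, 580.0)]
panels = crop_panels(page, boxes)
print([panel.size for panel in panels])
```

With real predictions, the boxes could come from `results[0].boxes.xyxy.tolist()` and the page from `Image.open(image_path)`.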
**The model has also been tested in real-world applications, where it has shown promising results.**
### Limitations
* **Non-Rectangular Panels:** The model predicts axis-aligned rectangular bounding boxes, so it may struggle with highly irregular or overlapping panel shapes.
## Acknowledgements
* **Ultralytics** for the amazing [YOLOv12 model](https://github.com/ultralytics/ultralytics) and library.
* **Roboflow** for their dataset hosting platform, and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating this incredible dataset.
*This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*