---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov12
- comic-books
- comic
- computer-vision
- ultralytics
- pytorch
widget:
- modelId: mosesb/best-comic-panel-detection
  title: YOLOv12 Comic Panel Detection
  url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
datasets:
- Custom-Object-Detection
metrics:
- mAP50
- mAP50-95
---

# YOLOv12 for Comic Panel Detection

This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model predicts a bounding box for each panel, which makes it useful for digitizing comics, extracting panel content, or building datasets for downstream analysis.

The model was trained in PyTorch with the `ultralytics` library and reaches a validation mAP50-95 of 0.985 on a custom-annotated dataset of comic pages.

*Visit this space to try out the model right now: [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).*

## Model Details
*   **Architecture:** `YOLOv12x` (the extra-large variant)
*   **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
*   **Classes:** `Comic Panel`
*   **Frameworks:** PyTorch, Ultralytics

## How to Get Started

You can easily use this model with the `ultralytics` library. The model file `best.pt` from this repository is required.

```python
# 1. Install Ultralytics (run in a shell; in a notebook use: %pip install ultralytics)
# pip install ultralytics

from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory. Alternatively, download it
# from this repository with huggingface_hub (pip install huggingface_hub):
#   from huggingface_hub import hf_hub_download
#   weights_path = hf_hub_download(repo_id="mosesb/best-comic-panel-detection", filename="best.pt")
#   model = YOLO(weights_path)
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# Each Results object holds the bounding boxes, classes, and confidence scores for one image
for result in results:
    # plot() returns a BGR numpy array with the bounding boxes drawn on the image
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls.item())])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
```
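A common next step is to cut each detected panel out of the page. The helper below is a minimal sketch, not part of this repository: `crop_panels` and its `pad` margin are hypothetical names, and it assumes boxes in the `[x1, y1, x2, y2]` pixel format printed above. Coordinates are rounded and clamped to the page so that slightly out-of-bounds predictions still produce valid crops.

```python
from PIL import Image


def crop_panels(page, boxes_xyxy, pad=0):
    """Crop each detected panel out of a page image (hypothetical helper).

    page:       a PIL.Image of the full comic page
    boxes_xyxy: list of [x1, y1, x2, y2] pixel boxes, e.g. box.xyxy[0].tolist()
    pad:        optional extra margin in pixels around each box (assumption)
    """
    width, height = page.size
    crops = []
    for x1, y1, x2, y2 in boxes_xyxy:
        # Round to whole pixels, add the margin, and clamp to the page bounds
        left = max(0, int(round(x1)) - pad)
        top = max(0, int(round(y1)) - pad)
        right = min(width, int(round(x2)) + pad)
        bottom = min(height, int(round(y2)) + pad)
        # Skip degenerate boxes with zero width or height
        if right > left and bottom > top:
            crops.append(page.crop((left, top, right, bottom)))
    return crops
```

To use it with the inference example, pass `Image.open(image_path)` as `page` and `[box.xyxy[0].tolist() for box in results[0].boxes]` as `boxes_xyxy`, then save each returned image.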

## Training Procedure

The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.

### Training Hyperparameters
*   **Image Size:** 640x640
*   **Batch Size:** 16
*   **Optimizer:** AdamW (lr=0.002)
*   **Epochs:** 200
*   **Patience:** 100 epochs for early stopping
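The hyperparameters above correspond to arguments of Ultralytics' `model.train()`. The sketch below is a reconstruction under assumptions, not the author's exact code: the dataset config path `data.yaml` (a Roboflow export in YOLO format) and the COCO-pretrained checkpoint name `yolo12x.pt` are placeholders, and the training notebook linked at the end of this card remains the authoritative reference. Setting `optimizer="AdamW"` explicitly matters, because with `optimizer="auto"` Ultralytics chooses its own learning rate and ignores `lr0`.

```python
# Hyperparameters from the list above, expressed as Ultralytics train() arguments
TRAIN_ARGS = {
    "imgsz": 640,         # Image Size: 640x640
    "batch": 16,          # Batch Size
    "optimizer": "AdamW", # explicit optimizer so lr0 is honored
    "lr0": 0.002,         # initial learning rate
    "epochs": 200,
    "patience": 100,      # epochs without improvement before early stopping
}


def train(data_yaml="data.yaml", weights="yolo12x.pt"):
    """Fine-tune a COCO-pretrained YOLOv12x checkpoint.

    data_yaml and weights are placeholder names (assumptions), not files
    shipped with this repository.
    """
    # Imported lazily so TRAIN_ARGS can be inspected without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```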

![Training and Validation Metrics](results.png)

## Evaluation

The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.
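For context, Ultralytics decides which epoch to save as `best.pt` using a fitness score. In the library's default detection metrics (as of recent versions, and subject to change), precision and recall get zero weight and mAP50-95 carries most of it, which is why the selected checkpoint tracks mAP50-95 so closely. Applied to the mAP values reported in the table below:

$$
\text{fitness} = 0.1 \cdot \text{mAP}_{50} + 0.9 \cdot \text{mAP}_{50\text{-}95} = 0.1(0.991) + 0.9(0.985) = 0.0991 + 0.8865 = 0.9856
$$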

### Key Performance Metrics
| Metric      | Value | Description                                          |
| :---------- | :---- | :--------------------------------------------------- |
| **mAP50**   | 0.991 | Mean Average Precision at IoU threshold 0.50.        |
| **mAP50-95**| 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. |

The model achieves near-perfect precision and recall on the validation data, which indicates that it reliably detects comic panels drawn in the styles present in the dataset.
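Both metrics in the table rest on intersection over union (IoU): a predicted panel box counts as a match for an annotated panel only if their IoU reaches the threshold. The sketch below illustrates that matching criterion for `[x1, y1, x2, y2]` boxes (the `xyxy` format printed in the inference example) and lists the ten thresholds that mAP50-95 averages over. It is illustrative only, with hypothetical helper names: it does not compute average precision, which additionally requires ranking predictions by confidence and integrating the precision-recall curve.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds mAP50-95 averages over: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Count how many mAP50-95 thresholds a single prediction would clear."""
    score = iou(pred, truth)
    return sum(score >= t for t in THRESHOLDS)
```

For example, a prediction that covers the top 72% of a ground-truth panel has an IoU of 0.72: it counts as a hit at the 0.50 threshold used by mAP50, but only at 5 of the 10 thresholds used by mAP50-95. This is why mAP50-95 rewards tight box boundaries more strongly than mAP50.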

![Confusion Matrix](confusion_matrix.png)

### Qualitative Results

The model correctly identifies panels of various sizes and layouts in the validation set.

![Validation Predictions](val_batch0_pred.jpg)

## Intended Use and Limitations
This model is intended for applications that need to split comic book pages into their constituent panels. It can serve as a pre-processing step for:
-   Creating structured digital reading experiences.
-   Extracting text or characters from individual panels.
-   Analyzing comic book layouts and artistic styles.

**The model has been tested in real-world applications and has shown promising results.**
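For structured digital reading, detected panels usually also need a reading order, which the model itself does not provide: its boxes are not returned in reading sequence. The function below is a minimal, hypothetical heuristic, not part of this model: it groups boxes into rows from top to bottom, then sorts each row left to right (or right to left, e.g. for manga). It assumes roughly grid-like layouts and can misorder pages where a tall panel spans several rows.

```python
def reading_order(boxes_xyxy, right_to_left=False):
    """Sort [x1, y1, x2, y2] panel boxes into an approximate reading order.

    Heuristic (assumption): a box joins the current row if its vertical
    centre lies within that row's vertical extent; otherwise it starts a new row.
    """
    rows = []
    for box in sorted(boxes_xyxy, key=lambda b: b[1]):  # top edge, top to bottom
        centre_y = (box[1] + box[3]) / 2
        if rows and rows[-1]["top"] <= centre_y <= rows[-1]["bottom"]:
            row = rows[-1]
            row["boxes"].append(box)
            row["bottom"] = max(row["bottom"], box[3])
        else:
            rows.append({"top": box[1], "bottom": box[3], "boxes": [box]})

    ordered = []
    for row in rows:
        # Within a row, order by the left edge; reverse for right-to-left reading
        ordered.extend(sorted(row["boxes"], key=lambda b: b[0], reverse=right_to_left))
    return ordered
```

Combined with a cropping step, this gives an ordered list of panel images for a panel-by-panel reader; pages with irregular layouts may still need manual correction.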

### Limitations
*   **Non-Rectangular Panels:** The model is trained to detect rectangular bounding boxes and may struggle with highly irregular or overlapping panel shapes.

## Acknowledgements

*   **Ultralytics** for the [`ultralytics` library](https://github.com/ultralytics/ultralytics), which provides the YOLOv12 implementation used for training.
*   **Roboflow** for their dataset hosting platform, and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating the dataset.


*This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*