---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov12
- comic-books
- comic
- computer-vision
- ultralytics
- pytorch
widget:
- modelId: mosesb/best-comic-panel-detection
  title: YOLOv12 Comic Panel Detection
  url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
datasets:
- Custom-Object-Detection
metrics:
- mAP50
- mAP50-95
---

# YOLOv12 for Comic Panel Detection

This repository contains a YOLOv12x object detection model fine-tuned to detect individual panels in comic book pages. The model predicts a bounding box for each panel, which makes it useful for digitizing comics, extracting panel content, or building datasets for downstream analysis.

This model was trained in PyTorch with the `ultralytics` library and reaches an mAP50 of 0.991 and an mAP50-95 of 0.985 on the validation split of a custom-annotated dataset of comic pages.

You can try the model in the Hugging Face Space [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).

## Model Details
* Architecture: `YOLOv12x` (the extra-large variant)
* Fine-tuned on: a custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1"
* Classes: `Comic Panel`
* Frameworks: PyTorch, Ultralytics

## How to Get Started

You can use this model with the `ultralytics` library. You need the model weights file `best.pt` from this repository.

```python
# 1. Install Ultralytics (run in a shell): pip install ultralytics
#    In a Jupyter notebook, use: !pip install ultralytics

from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # Plotting draws the bounding boxes on the image and returns a BGR numpy array
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
```
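To crop individual panels, the float `xyxy` coordinates printed above have to become integer pixel boxes inside the image bounds. The following is a minimal, framework-free sketch under stated assumptions: `to_crop_boxes` is a hypothetical helper, it takes plain `(xyxy, confidence)` pairs such as those built from `box.xyxy[0].tolist()` and `box.conf.item()`, and the 0.5 confidence cutoff is an arbitrary illustrative value, not one tuned for this model.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_crop_boxes(
    detections: Sequence[Tuple[Sequence[float], float]],
    image_width: int,
    image_height: int,
    min_conf: float = 0.5,  # illustrative threshold (assumption), not tuned for this model
) -> List[Box]:
    """Convert (xyxy, confidence) pairs into integer crop boxes.

    Detections below min_conf are dropped; coordinates are expanded outward to
    whole pixels and clamped to the image so every box is a valid crop region.
    """
    crops: List[Box] = []
    for (x1, y1, x2, y2), conf in detections:
        if conf < min_conf:
            continue
        left = max(0, math.floor(x1))
        top = max(0, math.floor(y1))
        right = min(image_width, math.ceil(x2))
        bottom = min(image_height, math.ceil(y2))
        if right > left and bottom > top:  # skip boxes that collapse after clamping
            crops.append((left, top, right, bottom))
    return crops


if __name__ == "__main__":
    # Made-up detections for a 1000x1500 page, for illustration only.
    dets = [
        ([10.4, 12.7, 495.2, 700.9], 0.97),
        ([505.0, -3.2, 1004.6, 701.1], 0.95),  # slightly outside the page
        ([20.0, 720.0, 60.0, 760.0], 0.20),    # low confidence, dropped
    ]
    print(to_crop_boxes(dets, image_width=1000, image_height=1500))
    # [(10, 12, 496, 701), (505, 0, 1000, 702)]
```

Each resulting tuple matches the `(left, upper, right, lower)` box that Pillow's `Image.crop` expects, so a panel can be cut out with `im.crop(box)`.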

## Training Procedure

The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.

### Training Hyperparameters
* Image Size: 640x640
* Batch Size: 16
* Optimizer: AdamW (lr=0.002)
* Epochs: 200
* Patience: 100 epochs for early stopping
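For reference, these hyperparameters map onto keyword arguments of the Ultralytics `model.train()` method (`imgsz`, `batch`, `optimizer`, `lr0`, `epochs`, `patience`). This is a reconstruction, not the author's training script (the training notebook linked at the end of this card is the authoritative source): the dataset path `data.yaml` is a hypothetical placeholder, and `yolo12x.pt` is assumed to be the name of the COCO-pretrained YOLOv12x checkpoint in Ultralytics.

```python
# Hyperparameters from the list above, as Ultralytics train() keyword arguments.
TRAIN_ARGS = {
    "data": "data.yaml",  # hypothetical: path to the dataset's YOLO-format data.yaml
    "imgsz": 640,         # Image Size: 640x640
    "batch": 16,          # Batch Size
    "optimizer": "AdamW",
    "lr0": 0.002,         # initial learning rate used by AdamW
    "epochs": 200,
    "patience": 100,      # stop early after 100 epochs without validation improvement
}


def train(weights: str = "yolo12x.pt"):
    """Fine-tune a COCO-pretrained YOLOv12x checkpoint (weight name is an assumption)."""
    from ultralytics import YOLO  # imported lazily so this file loads without ultralytics

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Setting `optimizer` explicitly matters here: with Ultralytics' default `optimizer="auto"`, the `lr0` value is not used as given.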

![Training and Validation Metrics](results.png)

## Evaluation

The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest mAP50-95.

### Key Performance Metrics

| Metric | Value | Description |
| :------- | :---- | :--------------------------------------------------- |
| mAP50 | 0.991 | Mean Average Precision at an IoU threshold of 0.50. |
| mAP50-95 | 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. |
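Both metrics count a predicted panel as a true positive only when its Intersection over Union (IoU) with a ground-truth panel reaches the threshold, and mAP50-95 = (AP@0.50 + AP@0.55 + ... + AP@0.95) / 10. The plain-Python sketch below, an illustration rather than the Ultralytics evaluation code, computes IoU for two `xyxy` boxes and lists the thresholds at which a given match would count. Computing AP itself additionally requires integrating precision over recall at each threshold, which is omitted here.

```python
from typing import List, Sequence


def iou_xyxy(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds that mAP50-95 averages over: 0.50, 0.55, ..., 0.95.
THRESHOLDS: List[float] = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(iou: float) -> List[float]:
    """Thresholds at which a prediction with this IoU counts as a true positive."""
    return [t for t in THRESHOLDS if iou >= t]


if __name__ == "__main__":
    ground_truth = [0, 0, 100, 100]
    prediction = [10, 0, 100, 100]  # shifted 10 px: IoU = 9000 / 10000 = 0.9
    iou = iou_xyxy(ground_truth, prediction)
    print(iou, thresholds_passed(iou))  # 0.9 passes 9 of the 10 thresholds
```

A box that is only 10 pixels off on a 100-pixel panel still fails the 0.95 threshold, which is why mAP50-95 is a stricter measure of localization quality than mAP50.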

The model achieves near-perfect precision and recall on the validation data, indicating a strong ability to correctly identify comic panels within the styles present in the dataset.

![Confusion Matrix](confusion_matrix.png)

### Qualitative Results

The model correctly identifies panels of various sizes and layouts in the validation set.

![Validation Predictions](val_batch0_pred.jpg)

## Intended Use and Limitations
This model is intended for applications that need to split comic book pages into their constituent panels. This can be a pre-processing step for:
- Creating structured digital reading experiences.
- Extracting text or characters from individual panels.
- Analyzing comic book layouts and artistic styles.
The model has been tested in real-world applications with promising results, although this card reports quantitative metrics only for the validation set.

### Limitations
* Non-Rectangular Panels: The model is trained to predict axis-aligned rectangular bounding boxes and may struggle with highly irregular or overlapping panel shapes.
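Because the model outputs axis-aligned rectangles, irregular or overlapping panels can produce boxes that cover each other. One practical mitigation, assumed here rather than described by the author, is to flag pages for manual review when two predicted boxes overlap heavily. This sketch measures overlap as the intersection divided by the smaller box's area, so a box nested inside another scores 1.0. The 0.3 cutoff is an arbitrary illustrative value.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def overlap_ratio(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection area divided by the smaller box's area (boxes are x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return inter_w * inter_h / smaller if smaller > 0 else 0.0


def overlapping_pairs(
    boxes: List[Sequence[float]], max_ratio: float = 0.3  # illustrative cutoff
) -> List[Tuple[int, int]]:
    """Index pairs of boxes whose overlap ratio exceeds max_ratio."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if overlap_ratio(boxes[i], boxes[j]) > max_ratio
    ]
```

A page whose `overlapping_pairs` result is non-empty is a candidate for manual inspection; pages with an empty result still are not guaranteed to be correct.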

## Acknowledgements

* Ultralytics, for the [`ultralytics`](https://github.com/ultralytics/ultralytics) library and its YOLOv12 support.
* Roboflow, for its dataset hosting platform, and `custom-workflow-3-object-detection-g24r5-fmfkb`, for compiling and annotating the dataset.

This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).