---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov12
- comic-books
- comic
- computer-vision
- ultralytics
- pytorch
widget:
- modelId: mosesb/best-comic-panel-detection
  title: YOLOv12 Comic Panel Detection
  url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
datasets:
- Custom-Object-Detection
metrics:
- mAP50
- mAP50-95
---

# YOLOv12 for Comic Panel Detection

This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model predicts a bounding box for each panel, which makes it useful for digitizing comics, extracting content, or building datasets for downstream analysis.

The model was trained in PyTorch with the `ultralytics` library and reaches an mAP50-95 of 0.985 on the validation set of a custom-annotated dataset of comic pages.

*Visit this space to try out the model right now: [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).*

## Model Details

* **Architecture:** `YOLOv12x` (the extra-large variant)
* **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
* **Classes:** `Comic Panel`
* **Frameworks:** PyTorch, Ultralytics

## How to Get Started

You can use this model with the `ultralytics` library. The model file `best.pt` from this repository is required.

```python
# 1. Install Ultralytics first (run in a shell, or prefix with "!" in a notebook):
#    pip install ultralytics

from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # plot() draws the bounding boxes and returns a BGR NumPy array
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
```

## Training Procedure

The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.

### Training Hyperparameters

* **Image Size:** 640x640
* **Batch Size:** 16
* **Optimizer:** AdamW (lr=0.002)
* **Epochs:** 200
* **Patience:** 100 epochs for early stopping

![Training and Validation Metrics](results.png)

## Evaluation

The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.

### Key Performance Metrics

| Metric       | Value | Description                                                            |
| :----------- | :---- | :--------------------------------------------------------------------- |
| **mAP50**    | 0.991 | Mean Average Precision at IoU threshold 0.50.                          |
| **mAP50-95** | 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |

The model achieves near-perfect precision and recall on the validation data, indicating a strong ability to correctly identify comic panels within the styles present in the dataset.

![Confusion Matrix](confusion_matrix.png)

### Qualitative Results

The model correctly identifies panels of various sizes and layouts in the validation set.

![Validation Predictions](val_batch0_pred.jpg)

## Intended Use and Limitations

This model is intended for applications requiring the segmentation of comic book pages into their constituent panels.
This can be a pre-processing step for:

- Creating structured digital reading experiences.
- Extracting text or characters from individual panels.
- Analyzing comic book layouts and artistic styles.

**The model has been tested in real-world applications and has shown promising results.**

### Limitations

* **Non-Rectangular Panels:** The model predicts rectangular bounding boxes, so it may struggle with highly irregular or overlapping panel shapes.

## Acknowledgements

* **Ultralytics:** for the [YOLOv12 model](https://github.com/ultralytics/ultralytics) and library.
* **Roboflow:** for their dataset hosting platform, and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating the dataset.

*This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*
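
## Example: Turning Detections into Panel Crops

The intended uses above mention extracting content from individual panels. The following is a minimal sketch of one way to prepare for that step, not part of the original training notebook: it converts `xyxy` coordinates and confidence scores into integer crop rectangles clamped to the page size. The helper name `panel_crop_boxes`, the `min_conf` cutoff of 0.5, and the example values are all hypothetical and chosen only for illustration.

```python
import math


def panel_crop_boxes(detections, image_size, min_conf=0.5):
    """Convert (xyxy, confidence) pairs into integer crop rectangles.

    detections: iterable of ((x1, y1, x2, y2), confidence) pairs in pixels.
    image_size: (width, height) of the page image.
    min_conf:   illustrative cutoff (an assumption, not a tuned value).
    """
    width, height = image_size
    crops = []
    for (x1, y1, x2, y2), conf in detections:
        if conf < min_conf:
            continue  # drop low-confidence detections
        # Expand to whole pixels, then clamp to the page bounds.
        left = max(0, math.floor(x1))
        top = max(0, math.floor(y1))
        right = min(width, math.ceil(x2))
        bottom = min(height, math.ceil(y2))
        if right > left and bottom > top:  # skip degenerate boxes
            crops.append((left, top, right, bottom))
    return crops


# Hand-written example values standing in for real model output.
example = [
    ((10.4, 20.6, 300.2, 410.9), 0.97),
    ((-5.0, 400.0, 650.0, 900.0), 0.88),
    ((50.0, 50.0, 60.0, 60.0), 0.20),  # below the cutoff, so it is skipped
]
print(panel_crop_boxes(example, image_size=(640, 880)))
```

With real predictions, the input pairs could be built from the attributes used in the quick-start code, for example `[(box.xyxy[0].tolist(), box.conf.item()) for box in results[0].boxes]`, and each returned tuple could then be passed to Pillow's `Image.crop` to save or further process a single panel.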