Commit bbab115 by mosesb (verified) · 1 Parent(s): ab746d2

Update README.md

Files changed (1)
  1. README.md +124 -122
README.md CHANGED
@@ -1,123 +1,125 @@
1
- ---
2
- license: apache-2.0
3
- library_name: ultralytics
4
- tags:
5
- - object-detection
6
- - yolo
7
- - yolov12
8
- - comic-books
9
- - comic
10
- - computer-vision
11
- - ultralytics
12
- - pytorch
13
- widget:
14
- - modelId: mosesb/best-comic-panel-detection
15
-   title: YOLOv12 Comic Panel Detection
16
-   url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
17
- datasets:
18
- - Custom-Object-Detection
19
- metrics:
20
- - mAP50
21
- - mAP50-95
22
- ---
23
-
24
- # YOLOv12 for Comic Panel Detection
25
-
26
- This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model identifies the bounding boxes for each panel, making it a valuable tool for digitizing comics, extracting content, or building datasets for downstream analysis.
27
-
28
- This model was trained in PyTorch using the powerful `ultralytics` library and demonstrates high performance on a custom-annotated dataset of comic pages.
29
-
30
- ## Model Details
31
- * **Architecture:** `YOLOv12x` (the extra-large variant)
32
- * **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
33
- * **Classes:** `Comic Panel`
34
- * **Frameworks:** PyTorch, Ultralytics
35
-
36
- ## How to Get Started
37
-
38
- You can easily use this model with the `ultralytics` library. The model file `best.pt` from this repository is required.
39
-
40
- ```python
41
- # 1. Install Ultralytics
42
- !pip install ultralytics
43
-
44
- from ultralytics import YOLO
45
- from PIL import Image
46
-
47
- # 2. Load the fine-tuned model
48
- # Make sure 'best.pt' is in your current directory
49
- model = YOLO('best.pt')
50
-
51
- # 3. Run inference on an image
52
- image_path = 'path/to/your/comic_page.jpg'
53
- results = model.predict(source=image_path)
54
-
55
- # 4. Process and visualize results
56
- # The 'results' object contains bounding boxes, classes, and confidence scores
57
- for result in results:
58
-     # Plotting will draw the bounding boxes on the image
59
-     im_array = result.plot()
60
-     im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
61
-     im.show()  # Display the image
62
-     # or
63
-     # im.save('prediction_result.jpg')
64
-
65
- # You can also access bounding box data directly
66
- for box in results[0].boxes:
67
-     print("Class:", model.names[int(box.cls)])
68
-     print("Confidence:", box.conf.item())
69
-     print("Coordinates (xyxy):", box.xyxy[0].tolist())
70
-     print("-" * 20)
71
- ```
72
-
73
- ## Training Procedure
74
-
75
- The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.
76
-
77
- ### Training Hyperparameters
78
- * **Image Size:** 640x640
79
- * **Batch Size:** 16
80
- * **Optimizer:** AdamW (lr=0.002)
81
- * **Epochs:** 200
82
- * **Patience:** 100 epochs for early stopping
83
-
84
- ![Training and Validation Metrics](results.png)
85
-
86
- ## Evaluation
87
-
88
- The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.
89
-
90
- ### Key Performance Metrics
91
- | Metric | Value | Description |
92
- | :---------- | :---- | :--------------------------------------------------- |
93
- | **mAP50** | 0.991 | Mean Average Precision at IoU threshold 0.50. |
94
- | **mAP50-95**| 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |
95
-
96
- The model achieves near-perfect precision and recall on the validation data, indicating a strong ability to correctly identify comic panels within the styles present in the dataset.
97
-
98
- ![Confusion Matrix](confusion_matrix.png)
99
-
100
- ### Qualitative Results
101
-
102
- The model correctly identifies panels of various sizes and layouts in the validation set.
103
-
104
- ![Validation Predictions](val_batch0_pred.jpg)
105
-
106
- ## Intended Use and Limitations
107
- This model is intended for applications requiring the segmentation of comic book pages into their constituent panels. This can be a pre-processing step for:
108
- - Creating structured digital reading experiences.
109
- - Extracting text or characters from individual panels.
110
- - Analyzing comic book layouts and artistic styles.
111
-
112
- **The model has been tested in real world applications and has shown promising results.**
113
-
114
- ### Limitations
115
- * **Non-Rectangular Panels:** The model is trained to detect rectangular bounding boxes and may struggle with highly irregular or overlapping panel shapes.
116
-
117
- ## Acknowledgements
118
-
119
- * **Ultralytics** for the amazing [YOLOv12 model](https://github.com/ultralytics/ultralytics) and library.
120
- * **Roboflow:** for their dataset hosting platform and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating this incredible dataset.
121
-
122
-
 
 
123
  *This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*
 
1
+ ---
2
+ license: apache-2.0
3
+ library_name: ultralytics
4
+ tags:
5
+ - object-detection
6
+ - yolo
7
+ - yolov12
8
+ - comic-books
9
+ - comic
10
+ - computer-vision
11
+ - ultralytics
12
+ - pytorch
13
+ widget:
14
+ - modelId: mosesb/best-comic-panel-detection
15
+   title: YOLOv12 Comic Panel Detection
16
+   url: https://huggingface.co/mosesb/best-comic-panel-detection/blob/main/prediction.jpg
17
+ datasets:
18
+ - Custom-Object-Detection
19
+ metrics:
20
+ - mAP50
21
+ - mAP50-95
22
+ ---
23
+
24
+ # YOLOv12 for Comic Panel Detection
25
+
26
+ This repository contains a **YOLOv12x** object detection model fine-tuned to detect individual panels in comic book pages. The model identifies the bounding boxes for each panel, making it a valuable tool for digitizing comics, extracting content, or building datasets for downstream analysis.
27
+
28
+ This model was trained in PyTorch with the `ultralytics` library and reaches 0.991 mAP50 and 0.985 mAP50-95 on the validation split of a custom-annotated dataset of comic pages.
29
+
30
+ *Visit this space to try out the model right now: [`The_Best_Comic_Panel_Detection`](https://huggingface.co/spaces/mosesb/best-comic-panel-detection).*
31
+
32
+ ## Model Details
33
+ * **Architecture:** `YOLOv12x` (the extra-large variant)
34
+ * **Fine-tuned on:** A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
35
+ * **Classes:** `Comic Panel`
36
+ * **Frameworks:** PyTorch, Ultralytics
37
+
38
+ ## How to Get Started
39
+
40
+ Load the model with the `ultralytics` library. You need the `best.pt` weights file from this repository.
41
+
42
+ ```python
43
+ # 1. Install Ultralytics
44
+ # In a notebook: !pip install ultralytics  (in a shell: pip install ultralytics)
45
+
46
+ from ultralytics import YOLO
47
+ from PIL import Image
48
+
49
+ # 2. Load the fine-tuned model
50
+ # Make sure 'best.pt' is in your current directory
51
+ model = YOLO('best.pt')
52
+
53
+ # 3. Run inference on an image
54
+ image_path = 'path/to/your/comic_page.jpg'
55
+ results = model.predict(source=image_path)
56
+
57
+ # 4. Process and visualize results
58
+ # The 'results' object contains bounding boxes, classes, and confidence scores
59
+ for result in results:
60
+     # Plotting will draw the bounding boxes on the image
61
+     im_array = result.plot()
62
+     im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
63
+     im.show()  # Display the image
64
+     # or
65
+     # im.save('prediction_result.jpg')
66
+
67
+ # You can also access bounding box data directly
68
+ for box in results[0].boxes:
69
+     print("Class:", model.names[int(box.cls)])
70
+     print("Confidence:", box.conf.item())
71
+     print("Coordinates (xyxy):", box.xyxy[0].tolist())
72
+     print("-" * 20)
73
+ ```
74
+
75
+ ## Training Procedure
76
+
77
+ The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.
78
+
79
+ ### Training Hyperparameters
80
+ * **Image Size:** 640x640
81
+ * **Batch Size:** 16
82
+ * **Optimizer:** AdamW (lr=0.002)
83
+ * **Epochs:** 200
84
+ * **Patience:** 100 epochs for early stopping
85
+
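The hyperparameters listed above could be passed to Ultralytics roughly as in the sketch below. This is an illustration, not the author's training script: the weights file name `yolov12x.pt` and the dataset file `data.yaml` are placeholder paths, and mapping "lr=0.002" to the `lr0` argument is an assumption. The card does not say whether these values were set explicitly or chosen by the library.

```python
def build_train_args(data_yaml="data.yaml"):
    """Collect the hyperparameters from the list above as keyword arguments.

    `data_yaml` is a hypothetical path to the dataset description file.
    """
    return {
        "data": data_yaml,
        "imgsz": 640,          # Image Size: 640x640
        "batch": 16,           # Batch Size
        "optimizer": "AdamW",  # Optimizer
        "lr0": 0.002,          # assumed to be the initial learning rate
        "epochs": 200,         # Epochs
        "patience": 100,       # early-stopping patience, in epochs
    }


def finetune(weights="yolov12x.pt", data_yaml="data.yaml"):
    """Fine-tune from a pre-trained checkpoint (placeholder file names)."""
    from ultralytics import YOLO  # imported lazily so the helper above has no dependencies

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))
```

Calling `finetune()` starts a full training run, so it is only worth doing with the dataset in place and a GPU available.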
86
+ ![Training and Validation Metrics](results.png)
87
+
88
+ ## Evaluation
89
+
90
+ The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest **mAP50-95**.
91
+
92
+ ### Key Performance Metrics
93
+ | Metric | Value | Description |
94
+ | :---------- | :---- | :--------------------------------------------------- |
95
+ | **mAP50** | 0.991 | Mean Average Precision at IoU threshold 0.50. |
96
+ | **mAP50-95**| 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |
97
+
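Both metrics in the table are defined through Intersection over Union (IoU), the overlap between a predicted box and a ground-truth box. A minimal sketch of that computation for boxes in `xyxy` format follows; the function name `box_iou_xyxy` is illustrative and not part of the `ultralytics` API.

```python
def box_iou_xyxy(a, b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]
```

At mAP50 a prediction counts as a match when its IoU with a ground-truth panel is at least 0.50. mAP50-95 repeats the evaluation at each threshold in `IOU_THRESHOLDS` and averages the results, so it rewards tighter boxes.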
98
+ The model achieves near-perfect precision and recall on the validation data, indicating a strong ability to correctly identify comic panels within the styles present in the dataset.
99
+
100
+ ![Confusion Matrix](confusion_matrix.png)
101
+
102
+ ### Qualitative Results
103
+
104
+ The model correctly identifies panels of various sizes and layouts in the validation set.
105
+
106
+ ![Validation Predictions](val_batch0_pred.jpg)
107
+
108
+ ## Intended Use and Limitations
109
+ This model is intended for applications that need to split comic book pages into their constituent panels. Panel detection can be a pre-processing step for:
110
+ - Creating structured digital reading experiences.
111
+ - Extracting text or characters from individual panels.
112
+ - Analyzing comic book layouts and artistic styles.
113
+
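For the pre-processing uses listed above, the detected boxes usually need to be cropped out and put into reading order. The sketch below is one simple heuristic, not part of this repository: it assumes left-to-right, top-to-bottom reading order and groups panels into rows by their top edge. The names `sort_reading_order`, `crop_panels`, and `row_tolerance` are hypothetical.

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Sort (x1, y1, x2, y2) boxes top-to-bottom, then left-to-right within a row.

    A box joins the current row when its top edge lies within
    `row_tolerance` times the height of the row's first box.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows:
            ref = rows[-1][0]  # topmost box of the current row
            if box[1] - ref[1] <= row_tolerance * (ref[3] - ref[1]):
                rows[-1].append(box)
                continue
        rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def crop_panels(page, boxes):
    """Crop each panel from `page`, a PIL.Image.Image, in reading order."""
    ordered = sort_reading_order(boxes)
    return [page.crop(tuple(int(round(v)) for v in b)) for b in ordered]
```

With the inference code from "How to Get Started", the boxes could be obtained as `results[0].boxes.xyxy.tolist()` and passed to `crop_panels(Image.open(image_path), boxes)`. Pages with a right-to-left layout, such as manga, would need the within-row sort reversed.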
114
+ **The model has been tested in real-world applications and has shown promising results.**
115
+
116
+ ### Limitations
117
+ * **Non-Rectangular Panels:** The model predicts axis-aligned rectangular bounding boxes, so it may fit highly irregular or overlapping panel shapes poorly.
118
+
119
+ ## Acknowledgements
120
+
121
+ * **Ultralytics** for the amazing [YOLOv12 model](https://github.com/ultralytics/ultralytics) and library.
122
+ * **Roboflow** for their dataset hosting platform, and **custom-workflow-3-object-detection-g24r5-fmfkb** for compiling and annotating this incredible dataset.
123
+
124
+
125
  *This model card is based on the training notebook [`YOLOV12-Comic-Panel-Detection`](https://github.com/mosesab/YOLOV12-Comic-Panel-Detection).*