anirudhvyas committed on
Commit
da0e658
·
verified ·
1 Parent(s): 188eda8

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +64 -119
README.md CHANGED
@@ -1,152 +1,97 @@
1
  ---
2
  license: other
3
- license_name: bsl-1.1
4
- license_link: https://mariadb.com/bsl11/
5
- library_name: ultralytics
6
  tags:
7
- - onnx
8
  - yolo
9
- - yolov8
10
  - object-detection
11
  - whiteboard
12
  - diagram
13
- - shapes
14
- pipeline_tag: object-detection
 
15
  ---
16
 
17
- # Whiteboard Detector
18
-
19
- **Detects hand-drawn shapes on whiteboards.**
20
 
21
- YOLOv8-nano fine-tuned to recognize 30 diagram shape classes.
22
 
23
- ## Quick Stats
24
 
25
- | Spec | Value |
26
- |------|-------|
27
- | Architecture | YOLOv8-nano |
28
- | Format | ONNX |
29
- | Size | ~12 MB |
30
- | Input | 640×640 RGB |
31
- | Classes | 30 |
32
- | Training | 100 epochs, 211 images |
33
- | Hardware | M3 Max, 1.4 hours |
34
-
35
- ## Classes (30)
36
 
37
- ```
38
- rectangle, rounded_rectangle, oval, circle, diamond, hexagon,
39
- parallelogram, triangle, star, cloud, cylinder, stick_figure,
40
- arrow_box, document_shape, database_icon, square, ellipse,
41
- pentagon, cross, heart, lightning, banner, callout, bracket,
42
- solid_arrow, dashed_arrow, bidirectional_arrow, dotted_line,
43
- curved_arrow, curved_line
44
- ```
45
-
46
- ## Usage
47
 
48
- ### Python (ultralytics)
 
 
 
49
 
50
- ```python
- from ultralytics import YOLO

- model = YOLO("best.onnx")
- results = model("whiteboard.jpg")

- for box in results[0].boxes:
-     cls = int(box.cls[0])
-     conf = float(box.conf[0])
-     x1, y1, x2, y2 = box.xyxy[0].tolist()
-     print(f"{model.names[cls]}: {conf:.2f} at ({x1:.0f}, {y1:.0f})")
 
 
 
61
  ```
62
 
63
- ### Python (onnxruntime)
64
-
65
  ```python
  import onnxruntime as ort
  import numpy as np
- from PIL import Image

  # Load model
  session = ort.InferenceSession("best.onnx")

- # Preprocess
- img = Image.open("whiteboard.jpg").resize((640, 640))
- input_tensor = np.array(img).transpose(2, 0, 1).astype(np.float32) / 255.0
- input_tensor = input_tensor[np.newaxis, ...]
-
- # Inference
- outputs = session.run(None, {"images": input_tensor})
-
- # outputs[0] shape: [1, 34, 8400]
- # 34 = 4 (xywh) + 30 (class scores)
- # 8400 = detection candidates
84
  ```
85
 
86
- ### CLI (ultralytics)
87
-
88
- ```bash
89
- yolo predict model=best.onnx source=whiteboard.jpg
90
- ```
91
-
92
- ## Output Format
93
-
94
- YOLO outputs tensor `[1, 34, 8400]`:
95
-
96
- ```
97
- For each of 8400 candidates:
-   [0] x_center (0-640)
-   [1] y_center (0-640)
-   [2] width
-   [3] height
-   [4-33] confidence per class (30 classes)
103
- ```
104
-
105
- Post-process with confidence threshold (0.25) and NMS (0.45 IoU).
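The post-processing step mentioned above (confidence threshold, then NMS) could be sketched roughly as follows. This is a minimal NumPy-only illustration, assuming the `[1, 34, 8400]` layout described in the README. The names `postprocess` and `iou_one_to_many` are hypothetical helpers, not part of the repository, and the class-aware suppression is an assumption rather than something the README specifies.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one xyxy box and an (N, 4) array of xyxy boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def postprocess(output, conf_thres=0.25, iou_thres=0.45):
    """Decode a [1, 34, 8400] tensor into (xyxy_box, class_id, score) tuples."""
    preds = output[0].T                    # (8400, 34): one row per candidate
    scores = preds[:, 4:]                  # 30 per-class scores
    class_ids = scores.argmax(axis=1)
    confs = scores.max(axis=1)

    # 1. Confidence threshold
    keep = confs > conf_thres
    preds, class_ids, confs = preds[keep], class_ids[keep], confs[keep]

    # 2. Convert centre-based xywh to corner-based xyxy
    xy, wh = preds[:, :2], preds[:, 2:4]
    boxes = np.concatenate([xy - wh / 2, xy + wh / 2], axis=1)

    # 3. Greedy NMS, suppressing only same-class overlaps (assumption)
    order = confs.argsort()[::-1]
    results = []
    while order.size > 0:
        i = order[0]
        results.append((boxes[i], int(class_ids[i]), float(confs[i])))
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[i], boxes[rest])
        suppress = (ious > iou_thres) & (class_ids[rest] == class_ids[i])
        order = rest[~suppress]
    return results
```

With the onnxruntime snippet, this would be called as `postprocess(outputs[0])`, and each returned `class_id` indexes into the class list.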
106
-
107
- ## Training Performance
108
-
109
- | Class | mAP50 | Notes |
110
- |-------|-------|-------|
111
- | cloud | 0.993 | Excellent |
112
- | rounded_rectangle | 0.995 | Excellent |
113
- | stick_figure | 0.895 | Good |
114
- | oval | 0.849 | Good |
115
- | rectangle | 0.716 | Good |
116
- | text_label | 0.664 | Fair |
117
- | solid_arrow | 0.368 | Needs more data |
118
- | triangle | 0.316 | Needs more data |
119
- | cylinder | 0.045 | Needs more data |
120
-
121
  ## Files
122
 
123
- ```
124
- whiteboard-detector/
125
- ├── best.onnx # Model (use this)
126
- ├── best.pt # PyTorch weights
127
- ├── classes.txt # Class names
128
- ├── README.md # This file
129
- └── SKILL.md # Manifest
130
- ```
131
-
132
- ## Training Data
133
-
134
- - 211 annotated whiteboard images
135
- - Hand-drawn diagrams, varying styles
136
- - Augmentation: rotation, blur, noise
137
-
138
- ## Limitations
139
-
140
- - Best with clear contrast (dark ink on white)
141
- - Small shapes (<20px) may be missed
142
- - Overlapping shapes can confuse detection
143
- - Some classes undertrained (cylinder, triangle)
144
 
145
  ## License
146
 
147
- **Business Source License 1.1 (BSL-1.1)**
148
-
149
- Copyright (c) 2024 Block Xaero Inc.
150
-
151
- - ✅ Free for non-production use
152
- - ⚠️ Production use requires license
 
1
  ---
2
  license: other
3
+ license_name: business-source-license
4
+ license_link: LICENSE
 
5
  tags:
 
6
  - yolo
 
7
  - object-detection
8
  - whiteboard
9
  - diagram
10
+ - flowchart
11
+ - onnx
12
+ library_name: onnxruntime
13
  ---
14
 
15
+ # Cyan Sketch - Whiteboard Shape Detector
 
 
16
 
17
+ YOLOv8n model for detecting shapes and connectors in whiteboard/flowchart images.
18
 
19
+ ## Model Details
20
 
21
+ - **Architecture**: YOLOv8n (nano)
22
+ - **Format**: ONNX
23
+ - **Input Size**: 640x640
24
+ - **Classes**: 30 shape types
25
 
26
+ ## Performance
27
 
28
+ | Metric | Value |
29
+ |--------|-------|
30
+ | mAP50 | 0.592 |
31
+ | mAP50-95 | 0.339 |
32
 
33
+ ### Per-Class Performance (Top 10)
 
34
 
35
+ | Class | mAP50 |
36
+ |-------|-------|
37
+ | rounded_rectangle | 0.995 |
38
+ | stick_figure | 0.995 |
39
+ | cloud | 0.980 |
40
+ | rectangle | 0.857 |
41
+ | sticky_note | 0.857 |
42
+ | cylinder | 0.823 |
43
+ | text_label | 0.774 |
44
+ | circle | 0.738 |
45
+ | oval | 0.735 |
46
+ | diamond | 0.713 |
47
 
48
+ ## Classes (30)
49
+ ```
50
+ rectangle, rounded_rectangle, oval, circle, diamond, triangle,
51
+ cylinder, cloud, hexagon, parallelogram, sticky_note, stick_figure,
52
+ solid_arrow, dashed_arrow, bidirectional_arrow, line, curved_arrow,
53
+ start_dot, end_dot, text_label, ellipse, square,
54
+ curved_bidirectional_arrow, dashed_line, dotted_line, dotted_arrow,
55
+ solid_circle, double_solid_line, dashed_oval, curved_line
56
  ```
57
 
58
+ ## Usage
 
59
  ```python
  import onnxruntime as ort
+ import cv2
  import numpy as np

  # Load model
  session = ort.InferenceSession("best.onnx")

+ # Load classes
+ with open("classes.txt") as f:
+     classes = [l.strip() for l in f]
+
+ # Preprocess image
+ img = cv2.imread("whiteboard.jpg")
+ resized = cv2.resize(img, (640, 640))
+ blob = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
+ blob = np.transpose(blob, (2, 0, 1))[None, ...]
+
+ # Run inference
+ outputs = session.run(None, {"images": blob})[0]
+
+ # Parse detections (conf > 0.3)
+ for i in range(outputs.shape[2]):
+     scores = outputs[0, 4:, i]
+     class_id = np.argmax(scores)
+     conf = scores[class_id]
+     if conf > 0.3:
+         print(f"{classes[class_id]}: {conf:.2f}")
87
  ```
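The usage snippet resizes the image straight to 640x640, so any box coordinates read from `outputs[0, :4, i]` would be in that resized frame. Below is a minimal sketch of mapping boxes back to the original image. It assumes plain (non-letterboxed) resizing as in the snippet and boxes already converted to `x1, y1, x2, y2`; `scale_boxes_to_original` is a hypothetical helper name, not something shipped with the model.

```python
import numpy as np


def scale_boxes_to_original(boxes_xyxy, orig_width, orig_height, input_size=640):
    """Map xyxy boxes from the resized input frame back to original pixels."""
    boxes = np.asarray(boxes_xyxy, dtype=np.float32).reshape(-1, 4).copy()
    sx = orig_width / input_size    # horizontal stretch undone per axis,
    sy = orig_height / input_size   # since the resize ignored aspect ratio
    boxes[:, [0, 2]] *= sx
    boxes[:, [1, 3]] *= sy
    # Clamp to the image bounds in case a box spills past the edge
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, orig_width)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, orig_height)
    return boxes
```

With OpenCV, `img.shape[:2]` is `(height, width)`, so a call would look like `scale_boxes_to_original(boxes, img.shape[1], img.shape[0])`.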
88
 
89
  ## Files
90
 
91
+ - `best.onnx` - ONNX model (6MB)
92
+ - `classes.txt` - Class names
93
+ - `ocr_dictionary.json` - Domain terms for OCR correction
94
 
95
  ## License
96
 
97
+ Business Source License - See LICENSE file