dbagcal0 committed on
Commit 4938351 · verified · 1 Parent(s): f15ef13

Update README.md

Files changed (1)
  1. README.md +38 -1
README.md CHANGED
@@ -133,4 +133,41 @@ and recall with a mAP50 of 0.992.
133
  | seasoner | 302 | 572 | 0.986 | 0.965 | 0.993 | 0.849 |
134
  | stationery | 162 | 300 | 0.986 | 0.957 | 0.972 | 0.785 |
135
  | tissue | 482 | 978 | 0.999 | 0.994 | 0.995 | 0.909 |
136
-
136
+
137
+ ### Visual Examples of Classes
138
+
139
+ TODO: add example images for each class.
140
+
141
+ ### Key Visualizations
142
+
143
+ #### Confusion Matrix
144
+ ![Confusion Matrix](confusion_matrix_normalized.png)
145
+
146
+ #### F1 Confidence Curve
147
+ ![F1 Curve](F1_curve.png)
148
+
149
+ #### Training & Validation Loss Curves
150
+ ![Results](results.png)
151
+
152
+ ### Performance Analysis
153
+
154
+ The model performs consistently well across all 17 classes on the validation
155
+ set: even the lowest per-class mAP50, for **stationery**, is 0.972. The
156
+ strongest classes are **tissue** and **puffed_food** (mAP50-95 of 0.909 and 0.907),
157
+ likely because of their distinct packaging shapes and high training sample
158
+ counts. The weakest class is **stationery**
159
+ (mAP50: 0.972, mAP50-95: 0.785), which is also the
160
+ smallest class at 1,466 training images, suggesting that its performance
161
+ is partly limited by sample size.
162
+
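The ranking described in the Performance Analysis can be reproduced from the per-class table. The sketch below is illustrative only: it uses just the three rows visible in this diff, and the column order (class, images, instances, precision, recall, mAP50, mAP50-95) is an assumption based on the values quoted in the text. `rank_by` is a hypothetical helper, not part of the repository.

```python
# Per-class validation metrics copied from the three table rows visible in this diff.
# Assumed column order: class, images, instances, precision, recall, mAP50, mAP50-95.
ROWS = [
    ("seasoner", 302, 572, 0.986, 0.965, 0.993, 0.849),
    ("stationery", 162, 300, 0.986, 0.957, 0.972, 0.785),
    ("tissue", 482, 978, 0.999, 0.994, 0.995, 0.909),
]

MAP50, MAP50_95 = 5, 6  # column indices under the assumed layout


def rank_by(rows, column):
    """Return (class, value) pairs sorted from weakest to strongest."""
    return sorted(((r[0], r[column]) for r in rows), key=lambda pair: pair[1])


ranked = rank_by(ROWS, MAP50_95)
weakest, strongest = ranked[0], ranked[-1]
print(f"weakest: {weakest[0]} ({weakest[1]:.3f})")
print(f"strongest: {strongest[0]} ({strongest[1]:.3f})")
```

Among these three rows, **stationery** ranks lowest and **tissue** highest on mAP50-95, which matches the analysis above. Running the same ranking over all 17 rows would be needed to confirm the full claim.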
163
+ ## Limitations and Biases
164
+
165
+ When tested on the **D2S dataset** (in-the-wild images),
166
+ performance dropped significantly. The model missed entire
167
+ objects, produced low-confidence detections, and misclassified items.
168
+ For example, it labeled a water bottle as `instant_noodles`. This
169
+ may mean that the model overfit to the specific visual patterns
170
+ of the training data, or it may reflect a domain gap between
171
+ Asian grocery packaging (training data) and the European products in D2S.
172
+ Both explanations are plausible, and further testing on diverse datasets
173
+ would be needed to distinguish between them.
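
The three failure modes named in the Limitations section (missed objects, low-confidence detections, misclassifications) could be tallied per image to make the out-of-domain drop measurable. This is a minimal sketch under stated assumptions: `tally_failures` is a hypothetical helper, the 0.5 threshold is arbitrary, predictions are assumed to be already matched to ground-truth objects (for example by IoU), and the sample input is made up for illustration rather than taken from D2S results.

```python
from collections import Counter

LOW_CONF = 0.5  # hypothetical cutoff for "low-confidence"; not from the README


def tally_failures(ground_truth, predictions, low_conf=LOW_CONF):
    """Count failure modes for one image.

    ground_truth: list of true class names, one per object.
    predictions: list of (gt_index, predicted_class, confidence) tuples, where
        gt_index is the ground-truth object the box was matched to
        (the matching step itself is assumed to have been done already).
    """
    counts = Counter()
    matched = set()
    for gt_index, predicted_class, confidence in predictions:
        matched.add(gt_index)
        if predicted_class != ground_truth[gt_index]:
            counts["misclassified"] += 1
        elif confidence < low_conf:
            counts["low_confidence"] += 1
        else:
            counts["correct"] += 1
    # Ground-truth objects with no matched prediction were missed entirely.
    counts["missed"] = len(ground_truth) - len(matched)
    return counts


# Made-up toy input for illustration only; "water" is a hypothetical label.
truth = ["water", "tissue", "seasoner"]
preds = [(0, "instant_noodles", 0.61), (1, "tissue", 0.32)]
result = tally_failures(truth, preds)
print(dict(result))
```

Comparing these counts between the validation set and an external dataset would show which failure mode dominates, although it would not by itself separate overfitting from a domain gap.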