cvtechniques
/

GroceryCheckoutDetection

dbagcal0 committed on Mar 16

Commit 4938351 · verified · 1 Parent(s): f15ef13

Update README.md

Files changed (1)

README.md +38 -1

@@ -133,4 +133,41 @@ and recall with a mAP50 of 0.992.
 | seasoner         | 302         | 572       | 0.986     | 0.965  | 0.993 | 0.849    |
 | stationery       | 162         | 300       | 0.986     | 0.957  | 0.972 | 0.785    |
 | tissue           | 482         | 978       | 0.999     | 0.994  | 0.995 | 0.909    |

+### Visual Examples of Classes
+TODO: add example images for each class.
+### Key Visualizations
+#### Confusion Matrix
+![Confusion Matrix](confusion_matrix_normalized.png)
+#### F1 Confidence Curve
+![F1 Curve](F1_curve.png)
+#### Training & Validation Loss Curves
+![Results](results.png)
+### Performance Analysis
+The model performs consistently well across all 17 classes on the validation
+dataset; even the lowest per-class mAP50, **stationery**, is 0.972. The strongest
+classes by mAP50-95 were **tissue** (0.909) and **puffed_food** (0.907),
+likely because of their distinct packaging shapes and high training sample
+counts. The weakest class was **stationery** (mAP50-95: 0.785), which is
+also the smallest class at 1,466 training images, suggesting that its
+performance is partly limited by sample size.
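
The ranking described above can be checked directly against the per-class table. The following is a minimal sketch that uses only the three rows visible in this excerpt (seasoner, stationery, tissue), not the full 17-class table. The `rows` layout, the variable names, and the names given to the first four numeric columns are illustrative assumptions.

```python
# Per-class validation metrics, copied from the three table rows shown above.
# Assumed column order: class, images, instances, precision, recall,
# mAP50, mAP50-95 (only the last two are confirmed by the analysis text).
rows = [
    ("seasoner",   302, 572, 0.986, 0.965, 0.993, 0.849),
    ("stationery", 162, 300, 0.986, 0.957, 0.972, 0.785),
    ("tissue",     482, 978, 0.999, 0.994, 0.995, 0.909),
]

# Sort from weakest to strongest by mAP50-95 (the last column).
ranked = sorted(rows, key=lambda row: row[6])

for name, images, instances, precision, recall, map50, map50_95 in ranked:
    # A larger gap between mAP50 and mAP50-95 means precision falls off
    # more quickly as the IoU threshold becomes stricter.
    gap = map50 - map50_95
    print(f"{name:<12} instances={instances:<4} mAP50={map50:.3f} "
          f"mAP50-95={map50_95:.3f} gap={gap:.3f}")

weakest = ranked[0][0]
strongest = ranked[-1][0]
```

Among these three rows, the class with the fewest instances is also the one with the lowest mAP50-95, which is consistent with the sample-size explanation but does not prove it.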
+## Limitations and Biases
+When tested on in-the-wild images from the **D2S dataset**,
+performance dropped significantly: the model missed entire
+objects, produced low-confidence detections, and misclassified items.
+For example, it labeled a water bottle as `instant_noodles`. This
+suggests either that the model has overfit to the specific visual patterns
+of the training data, or that there is a domain gap between the
+Asian grocery packaging in the training data and the European products in D2S.
+Both explanations are plausible, and further testing on more diverse datasets
+would be needed to distinguish between them.
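
One way to make the three failure modes above measurable is to tally them per test set. The sketch below is hypothetical: the record format, the `categorize` helper, the 0.5 confidence cutoff, the `bottle` label, and the sample records are all illustrative assumptions, not outputs of this model.

```python
from collections import Counter

CONF_THRESHOLD = 0.5  # assumed cutoff for calling a detection "low confidence"


def categorize(true_label, pred_label, confidence):
    """Classify one ground-truth object by how the detector handled it.

    pred_label is None when no predicted box matched the object.
    """
    if pred_label is None:
        return "missed"
    if pred_label != true_label:
        return "misclassified"
    if confidence < CONF_THRESHOLD:
        return "low_confidence"
    return "correct"


# Made-up records for illustration only:
# (true label, predicted label, confidence).
records = [
    ("bottle", "instant_noodles", 0.41),  # mirrors the water-bottle example
    ("bottle", None, 0.0),                # object not detected at all
    ("tissue", "tissue", 0.32),           # right class, weak confidence
    ("tissue", "tissue", 0.88),           # right class, confident
]

tally = Counter(categorize(t, p, c) for t, p, c in records)
for mode, count in tally.most_common():
    print(f"{mode}: {count}")
```

Running the same tally on a held-out split from the training domain and on D2S would show which failure mode grows most on the new data, which is one input for telling overfitting apart from a domain gap.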