| --- |
| license: mit |
| tags: |
| - object-detection |
| - yolov8 |
| - grocery |
| - retail |
| - onnx |
| datasets: |
| - custom |
| pipeline_tag: object-detection |
| --- |
| |
| # NM i AI 2026 — NorgesGruppen Object Detection |
|
|
| Multi-class YOLOv8x detector for 356 grocery product categories on store shelf images. |
|
|
| ## Performance |
|
|
| | Method | Leaderboard Score | |
| |--------|------------------| |
| | Multi-scale TTA (640+960+1280 + flip) | **0.9230** | |
| | Single inference | 0.8922 | |
|
|
| Competition scoring: |
|
|
| ## Model Details |
|
|
| - **Architecture:** YOLOv8x (68.5M parameters) |
| - **Classes:** 356 grocery product categories |
| - **Training data:** 248 shelf images, 22,731 annotations in COCO format |
| - **Training resolution:** 1280px |
| - **Export format:** ONNX (dynamic input, 262 MB) |
| - **Inference:** Multi-scale TTA at 640/960/1280px with horizontal flip, fused with Weighted Boxes Fusion (WBF) |
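Before boxes from different TTA passes can be fused, they have to share one coordinate frame. The sketch below shows one way to do that. It assumes each pass resizes the image directly to a square `input_size` with no letterbox padding, which the card does not confirm. The function name `to_normalized_original_frame` is hypothetical. The normalized `[x1, y1, x2, y2]` output matches the input layout that `weighted_boxes_fusion` from `ensemble-boxes` expects.

```python
from typing import List, Sequence

Box = List[float]  # [x1, y1, x2, y2]


def to_normalized_original_frame(
    boxes: Sequence[Sequence[float]],
    input_size: int,
    flipped: bool,
) -> List[Box]:
    """Map boxes from one TTA pass to normalized [0, 1] coordinates.

    Assumes a plain square resize (no letterbox padding), so dividing by
    the input size is enough to remove the dependence on the TTA scale.
    """
    out: List[Box] = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1, nx2, ny2 = (v / input_size for v in (x1, y1, x2, y2))
        if flipped:
            # Undo the horizontal flip: x -> 1 - x, swapping the two
            # x coordinates so that x1 <= x2 still holds.
            nx1, nx2 = 1.0 - nx2, 1.0 - nx1
        # Clamp to the unit square, since fusion expects values in [0, 1].
        out.append([min(max(v, 0.0), 1.0) for v in (nx1, ny1, nx2, ny2)])
    return out
```

Running this once per scale and flip gives one box list per pass. Those lists, with the matching score and label lists, are what a fusion step would consume. If the real pipeline letterboxes its inputs, the padding offset must be subtracted before normalizing.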
|
|
| ## Training |
|
|
| - Pretrained on COCO (YOLOv8x), fine-tuned on competition data |
| - Optimizer: AdamW (lr=0.01, weight_decay=0.0005, cosine LR) |
| - Augmentation: mosaic, mixup (0.2), copy-paste (0.15), perspective, rotation (±15°) |
| - 300 epochs at 1280px, batch=2 on NVIDIA A100 40GB |
| - Model soup: weights averaged across checkpoints from epochs 240-290 for better generalization |
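The model soup step can be illustrated with a minimal sketch. It assumes a uniform (unweighted) average and represents each checkpoint as a dict of flattened float lists. Neither the averaging weights nor the choice of checkpoints within epochs 240-290 is stated here, and `uniform_soup` is a hypothetical name.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def uniform_soup(checkpoints: Sequence[Weights]) -> Weights:
    """Average several checkpoints parameter by parameter."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")

    n = len(checkpoints)
    soup: Weights = {}
    for name in keys:
        # zip(*...) walks the same position of this parameter in every
        # checkpoint, so each column holds n values to average.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / n for col in columns]
    return soup
```

With real checkpoints the values would be tensors rather than lists. Integer buffers such as BatchNorm's `num_batches_tracked` would also need separate handling instead of being averaged as floats.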
| |
| ## Submission Contents |
| |
| contains: |
| - — YOLOv8x model soup, dynamic input (262 MB) |
| - — YOLO class → COCO category_id mapping |
| - — Multi-scale TTA inference pipeline |
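The class mapping is needed because YOLO class indices are contiguous from 0, while COCO `category_id` values are arbitrary integers. As a rough sketch, the conversion from fused detections to COCO-style result records might look like the following. The function name, the detection tuple layout, and the example mapping are all hypothetical, and the competition's required output format may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, score, yolo_class_index)
Detection = Tuple[float, float, float, float, float, int]


def to_coco_results(
    image_id: int,
    detections: Iterable[Detection],
    class_to_category: Dict[int, int],
) -> List[dict]:
    """Convert corner-format detections into COCO result records."""
    results = []
    for x1, y1, x2, y2, score, cls in detections:
        results.append({
            "image_id": image_id,
            # Translate the contiguous YOLO index to the dataset's id.
            "category_id": class_to_category[int(cls)],
            # COCO results store boxes as [x, y, width, height].
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": float(score),
        })
    return results
```

A call such as `to_coco_results(7, [(10.0, 20.0, 50.0, 80.0, 0.9, 0)], {0: 17})` yields one record with `category_id` 17 and `bbox` `[10.0, 20.0, 40.0, 60.0]`.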
|
|
| ## Usage |
|
|
|
|
|
|
| ## Sandbox Environment |
|
|
| - GPU: NVIDIA L4, 24 GB VRAM |
| - Runtime: ~113s for test set (300s timeout) |
| - Dependencies: onnxruntime-gpu, opencv, numpy, ensemble-boxes |
|
|
| ## Key Learnings |
|
|
| 1. Multi-class YOLO (detect + classify in one step) outperformed a two-stage pipeline (detector + kNN classifier) by a wide margin |
| 2. Multi-scale TTA added +0.031 to the leaderboard score (0.8922 → 0.9230), attributed to better detection of small products |
| 3. Model soup (weight averaging) improves generalization |
| 4. Higher validation mAP does NOT predict a better leaderboard score when training on all data, since the validation images are then part of the training set |
| 5. Dynamic-shape ONNX export is required for multi-scale inference; a fixed-shape export accepts only one input resolution |
|
|
| ## License |
|
|
| MIT |
|
|