NorgesGruppen Grocery Shelf Detection — NM i AI 2026

Competition Score: 0.9216 | 3-model YOLOv8x ensemble with Weighted Box Fusion

Built for the NM i AI 2026 Norwegian AI Championship (March 19-22, 2026).

Task

Detect and classify products on grocery store shelf images. 248 training images, 357 product categories.

Scoring: 0.7 × detection_mAP@0.5 + 0.3 × classification_mAP@0.5
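The weighting means detection quality dominates the final score. A one-line helper makes the trade-off concrete (function name is illustrative):

```python
def combined_score(detection_map50: float, classification_map50: float) -> float:
    """Competition metric: 0.7 * detection mAP@0.5 + 0.3 * classification mAP@0.5."""
    return 0.7 * detection_map50 + 0.3 * classification_map50

# Perfect detection with mediocre classification still scores well:
print(combined_score(1.0, 0.5))  # -> 0.85
```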

Approach

Architecture

  • 3x YOLOv8x models trained with different seeds, resolutions, and optimizers
  • Weighted Box Fusion (WBF) to merge predictions from all 3 models
  • FP16 ONNX export for inference
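The FP16 ONNX export can be sketched with ultralytics' export API (the checkpoint path here is illustrative; see scripts/export_fp16.py for the actual script):

```python
from ultralytics import YOLO

# Export a trained checkpoint to ONNX with FP16 weights,
# roughly halving file size (see Key Learnings #4).
model = YOLO("runs/detect/run11/weights/best.pt")  # illustrative path
model.export(format="onnx", half=True, imgsz=1280)
```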

Training

  • Trained on 100% of data (no validation split) — 248 shelf images
  • 100 epochs per model on NVIDIA L4 and A100 GPUs (GCP)
  • Used AIDE (AI-Driven Exploration) for initial hyperparameter search, then switched to direct training
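A minimal sketch of what one of these training runs might look like with the ultralytics API (the data.yaml path and the exact hyperparameter set are assumptions; see scripts/train_fulldata.py for the real script):

```python
from ultralytics import YOLO

# Train YOLOv8x from pretrained weights on the full 248-image dataset.
model = YOLO("yolov8x.pt")
model.train(
    data="data.yaml",   # dataset config pointing at all 248 images (illustrative)
    epochs=100,
    imgsz=1280,         # model1 used 1280px; models 2-3 used 1024px
    seed=42,            # varied per model for ensemble diversity
    optimizer="SGD",    # AdamW for model3
    val=False,          # no validation split — 100% of data for training
)
```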

Ensemble Models

Model            Seed  Resolution  Optimizer  Local mAP@0.5
model1 (run11)     42  1280px      SGD        0.9547
model2 (a100-2)   789  1024px      SGD        0.9434
model3 (run6)     456  1024px      AdamW      0.9328

Inference (0.9216 config)

  • WBF with iou_thr=0.7, equal model weights
  • conf=0.005, iou=0.5 NMS per model
  • Hard cap at 49,000 predictions (competition limit: 50,000)
  • Runtime: ~54s on NVIDIA L4 GPU

Score Progression

Score   What changed
0.4630  Detection-only baseline
0.7834  Combined detection + classification
0.8710  20 epochs
0.9027  50 epochs
0.9032  100 epochs, single model
0.9140  3-model ensemble (95% of data)
0.9185  3-model ensemble (100% of data)
0.9213  Optimized model combination
0.9216  Tuned NMS: iou=0.5, conf=0.005

Key Learnings

  1. 100% data training gave the biggest single improvement (+0.015)
  2. Ensemble diversity > individual model quality — different seeds matter more than higher single-model mAP
  3. WBF > simple NMS for merging ensemble predictions
  4. FP16 ONNX halves model size with negligible accuracy loss
  5. More epochs have diminishing returns — 100 epochs worked well; 200 epochs overfit on the 248-image dataset
  6. RT-DETR failed on this small dataset — transformers need more data
  7. Local eval doesn't predict competition deltas — small local improvements can go either way
  8. Lower NMS iou (0.5 vs 0.7) keeps more overlapping detections for WBF to merge — helped in dense shelf scenes

Files

ensemble/
  model1_run11_s42_1280_SGD.onnx     # Best individual model (FP16)
  model2_a100-2_s789_1024_SGD.onnx   # Second model (FP16)
  model3_run6_s456_1024_AdamW.onnx   # Third model (FP16)
  run.py                              # Inference script with WBF ensemble

scripts/
  train_fulldata.py      # Train YOLOv8x on 100% data
  train_100ep.py         # Direct training script (configurable)
  train_experiment.py    # Experiment runner (augmentation, LR, freeze)
  train_with_products.py # Training with product reference images
  export_fp16.py         # Export to FP16 ONNX
  benchmark.py           # Benchmark ensemble configurations
  patch_aide.py          # AIDE patches for Gemini compatibility
  check_aide_status.py   # VM training status checker

submission-best0.9216.zip  # Ready-to-submit competition zip

Usage

from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion

# Load the 3 FP16 ONNX models
models = [YOLO(f"model{i}.onnx", task="detect") for i in [1, 2, 3]]

# Run each model on the image, normalize boxes to [0, 1], then merge
# with weighted_boxes_fusion (iou_thr=0.7, equal weights).
# See ensemble/run.py for the full implementation.

Infrastructure

  • 14 GPU VMs on GCP (12x NVIDIA L4, 2x NVIDIA A100)
  • Gemini 3.1 Pro for AIDE code generation (free, unlimited)
  • Claude Code for orchestration, monitoring, and submission
  • Total compute: ~69 hours across all VMs

Competition

  • NM i AI 2026 — Norwegian Championship in AI
  • Task 1: NorgesGruppen Data — Grocery shelf object detection
  • Prize Pool: 1,000,000 NOK
  • Sandbox: Python 3.11, PyTorch 2.6, ultralytics 8.1.0, NVIDIA L4 GPU, 300s timeout