NorgesGruppen Grocery Shelf Detection — NM i AI 2026

Competition Score: 0.9216 | 3-model YOLOv8x ensemble with Weighted Box Fusion

Built for the NM i AI 2026 Norwegian AI Championship (March 19-22, 2026).

Task

Detect and classify products on grocery store shelf images. 248 training images, 357 product categories.

Scoring: 0.7 × detection_mAP@0.5 + 0.3 × classification_mAP@0.5
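The weighting means detection quality dominates the final score. A one-line helper makes the trade-off concrete (function name is illustrative):

```python
def combined_score(detection_map50: float, classification_map50: float) -> float:
    """Competition metric: 0.7 * detection mAP@0.5 + 0.3 * classification mAP@0.5."""
    return 0.7 * detection_map50 + 0.3 * classification_map50

# Perfect detection with mediocre classification still scores well:
print(combined_score(1.0, 0.5))  # -> 0.85
```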

Approach

Architecture

  • 3x YOLOv8x models trained with different seeds, resolutions, and optimizers
  • Weighted Box Fusion (WBF) to merge predictions from all 3 models
  • FP16 ONNX export for inference
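The FP16 ONNX export can be sketched with ultralytics' export API (the checkpoint path here is illustrative; see scripts/export_fp16.py for the actual script):

```python
from ultralytics import YOLO

# Export a trained checkpoint to ONNX with FP16 weights,
# roughly halving file size (see Key Learnings #4).
model = YOLO("runs/detect/run11/weights/best.pt")  # illustrative path
model.export(format="onnx", half=True, imgsz=1280)
```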

Training

  • Trained on 100% of data (no validation split) — 248 shelf images
  • 100 epochs per model on NVIDIA L4 and A100 GPUs (GCP)
  • Used AIDE (AI-Driven Exploration) for initial hyperparameter search, then switched to direct training
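A minimal sketch of what one of these training runs might look like with the ultralytics API (the data.yaml path and the exact hyperparameter set are assumptions; see scripts/train_fulldata.py for the real script):

```python
from ultralytics import YOLO

# Train YOLOv8x from pretrained weights on the full 248-image dataset.
model = YOLO("yolov8x.pt")
model.train(
    data="data.yaml",   # dataset config pointing at all 248 images (illustrative)
    epochs=100,
    imgsz=1280,         # model1 used 1280px; models 2-3 used 1024px
    seed=42,            # varied per model for ensemble diversity
    optimizer="SGD",    # AdamW for model3
    val=False,          # no validation split — 100% of data for training
)
```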

Ensemble Models

Model            Seed  Resolution  Optimizer  Local mAP@0.5
model1 (run11)     42  1280px      SGD        0.9547
model2 (a100-2)   789  1024px      SGD        0.9434
model3 (run6)     456  1024px      AdamW      0.9328

Inference (0.9216 config)

  • WBF with iou_thr=0.7, equal model weights
  • conf=0.005, iou=0.5 NMS per model
  • Hard cap at 49,000 predictions (competition limit: 50,000)
  • Runtime: ~54s on NVIDIA L4 GPU

Score Progression

Score   What changed
0.4630  Detection-only baseline
0.7834  Combined detection + classification
0.8710  20 epochs
0.9027  50 epochs
0.9032  100 epochs, single model
0.9140  3-model ensemble (95% of data)
0.9185  3-model ensemble (100% of data)
0.9213  Optimized model combination
0.9216  Tuned NMS: iou=0.5, conf=0.005

Key Learnings

  1. 100% data training gave the biggest single improvement (+0.015)
  2. Ensemble diversity > individual model quality — different seeds matter more than higher single-model mAP
  3. WBF > simple NMS for merging ensemble predictions
  4. FP16 ONNX halves model size with negligible accuracy loss
  5. More epochs have diminishing returns — 100 epochs worked well; 200 epochs overfit on the 248-image dataset
  6. RT-DETR failed on this small dataset — transformers need more data
  7. Local eval doesn't predict competition deltas — small local improvements can go either way
  8. Lower NMS iou (0.5 vs 0.7) keeps more overlapping detections for WBF to merge — helped in dense shelf scenes

Files

ensemble/
  model1_run11_s42_1280_SGD.onnx     # Best individual model (FP16)
  model2_a100-2_s789_1024_SGD.onnx   # Second model (FP16)
  model3_run6_s456_1024_AdamW.onnx   # Third model (FP16)
  run.py                              # Inference script with WBF ensemble

scripts/
  train_fulldata.py      # Train YOLOv8x on 100% data
  train_100ep.py         # Direct training script (configurable)
  train_experiment.py    # Experiment runner (augmentation, LR, freeze)
  train_with_products.py # Training with product reference images
  export_fp16.py         # Export to FP16 ONNX
  benchmark.py           # Benchmark ensemble configurations
  patch_aide.py          # AIDE patches for Gemini compatibility
  check_aide_status.py   # VM training status checker

submission-best0.9216.zip  # Ready-to-submit competition zip

Usage

from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion

# Load the 3 FP16 ONNX models
models = [YOLO(f"model{i}.onnx", task="detect") for i in [1, 2, 3]]

# Run each model on the image, normalize boxes to [0, 1], then merge
# with weighted_boxes_fusion (iou_thr=0.7, equal weights).
# See ensemble/run.py for the full implementation.

Infrastructure

  • 14 GPU VMs on GCP (12x NVIDIA L4, 2x NVIDIA A100)
  • Gemini 3.1 Pro for AIDE code generation (free, unlimited)
  • Claude Code for orchestration, monitoring, and submission
  • Total compute: ~69 hours across all VMs

Competition

  • NM i AI 2026 — Norwegian Championship in AI
  • Task 1: NorgesGruppen Data — Grocery shelf object detection
  • Prize Pool: 1,000,000 NOK
  • Sandbox: Python 3.11, PyTorch 2.6, ultralytics 8.1.0, NVIDIA L4 GPU, 300s timeout