NorgesGruppen Grocery Shelf Detection: NM i AI 2026
Competition Score: 0.9216 | 3-model YOLOv8x ensemble with Weighted Boxes Fusion
Built for the NM i AI 2026 Norwegian AI Championship (March 19-22, 2026).
Task
Detect and classify products on grocery store shelf images. 248 training images, 357 product categories.
Scoring: 0.7 × detection_mAP@0.5 + 0.3 × classification_mAP@0.5
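As a quick illustration of how the two components combine, the weighting can be written as a small function. The function and argument names are illustrative only and are not part of the competition tooling.

```python
def competition_score(detection_map: float, classification_map: float) -> float:
    """Combine the two mAP@0.5 values with the 0.7 / 0.3 weights from the scoring rule."""
    return 0.7 * detection_map + 0.3 * classification_map


# Hypothetical inputs, chosen only to show the weighting.
print(round(competition_score(0.95, 0.85), 4))
```

Under this formula, a submission with perfect detection but zero classification mAP would top out at 0.7.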
Approach
Architecture
- 3x YOLOv8x models trained with different seeds, resolutions, and optimizers
- Weighted Boxes Fusion (WBF) to merge predictions from all 3 models
- FP16 ONNX export for inference
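To make the merging step concrete, here is a minimal pure-Python sketch of the WBF idea for a single class with equal model weights. It is a simplification written for illustration, not the code in `ensemble/run.py` and not the `ensemble_boxes` implementation; the function names and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(boxes_per_model, scores_per_model, iou_thr=0.7):
    """Simplified single-class WBF. Returns a list of (fused_box, confidence)."""
    n_models = len(boxes_per_model)
    # Pool every model's predictions and visit them from most to least confident.
    pooled = [
        (score, list(box))
        for boxes, scores in zip(boxes_per_model, scores_per_model)
        for box, score in zip(boxes, scores)
    ]
    pooled.sort(key=lambda item: item[0], reverse=True)

    clusters = []  # members of each cluster as (score, box)
    fused = []     # current fused box of each cluster
    for score, box in pooled:
        # Find the fused box that overlaps this prediction the most.
        best_idx, best_iou = -1, iou_thr
        for idx, candidate in enumerate(fused):
            overlap = iou(candidate, box)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx < 0:
            clusters.append([(score, box)])
            fused.append(box)
        else:
            members = clusters[best_idx]
            members.append((score, box))
            total = sum(s for s, _ in members)
            # Coordinates become the confidence-weighted mean of all members.
            fused[best_idx] = [
                sum(s * b[k] for s, b in members) / total for k in range(4)
            ]

    results = []
    for members, box in zip(clusters, fused):
        mean_score = sum(s for s, _ in members) / len(members)
        # Boxes found by fewer models than the ensemble size are down-weighted.
        results.append((box, mean_score * min(len(members), n_models) / n_models))
    return results
```

Unlike NMS, which keeps one box and discards the rest, this approach averages the coordinates of agreeing boxes and lowers the confidence of boxes that only one model produced.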
Training
- Trained on 100% of the data (no validation split): 248 shelf images
- 100 epochs per model on NVIDIA L4 and A100 GPUs (GCP)
- Used AIDE (AI-Driven Exploration) for initial hyperparameter search, then switched to direct training
Ensemble Models
| Model | Seed | Resolution | Optimizer | Local mAP@0.5 |
|---|---|---|---|---|
| model1 (run11) | 42 | 1280px | SGD | 0.9547 |
| model2 (a100-2) | 789 | 1024px | SGD | 0.9434 |
| model3 (run6) | 456 | 1024px | AdamW | 0.9328 |
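The table can be read as three training configurations that differ only in seed, resolution, and optimizer. The sketch below encodes them as plain dictionaries and shows one way they might be passed to the Ultralytics training API. The `data.yaml` path, the `yolov8x.pt` starting weights, and the `train_member` helper are assumptions for illustration; the repository's own scripts may differ.

```python
# Seed, image size, and optimizer for each ensemble member, taken from the table.
ENSEMBLE_CONFIGS = [
    {"name": "model1", "seed": 42, "imgsz": 1280, "optimizer": "SGD"},
    {"name": "model2", "seed": 789, "imgsz": 1024, "optimizer": "SGD"},
    {"name": "model3", "seed": 456, "imgsz": 1024, "optimizer": "AdamW"},
]


def train_member(cfg, data_yaml="data.yaml", weights="yolov8x.pt", epochs=100):
    """Train one ensemble member (hypothetical helper, paths are placeholders)."""
    from ultralytics import YOLO  # imported lazily so the configs load without it

    model = YOLO(weights)
    return model.train(
        data=data_yaml,
        epochs=epochs,
        imgsz=cfg["imgsz"],
        seed=cfg["seed"],
        optimizer=cfg["optimizer"],
        name=cfg["name"],
    )
```

Calling `train_member(cfg)` for each entry would produce three runs that differ in the same ways as the table rows.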
Inference (0.9216 config)
- WBF with `iou_thr=0.7`, equal model weights
- `conf=0.005`, `iou=0.5` NMS per model
- Hard cap at 49,000 predictions (competition limit: 50,000)
- Runtime: ~54s on NVIDIA L4 GPU
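With a confidence threshold as low as 0.005, the number of predictions can grow large, so a cap is needed to stay under the competition limit. One plausible way to apply it is to keep the highest-scoring predictions. The sketch assumes each prediction is a dict with a `"score"` key, which is a hypothetical format rather than the one used in `ensemble/run.py`.

```python
def cap_predictions(predictions, limit=49_000):
    """Keep at most `limit` predictions, preferring the highest scores."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return ranked[:limit]
```

Dropping the lowest-confidence predictions first should cost the least mAP, since those entries sit at the tail of the precision-recall curve.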
Score Progression
| Score | What changed |
|---|---|
| 0.4630 | Detection-only baseline |
| 0.7834 | Combined detection + classification |
| 0.8710 | 20 epochs |
| 0.9027 | 50 epochs |
| 0.9032 | 100 epochs single model |
| 0.9140 | 3-model ensemble (95% data) |
| 0.9185 | 3-model ensemble (100% data) |
| 0.9213 | Optimized model combo |
| 0.9216 | Tuned NMS iou=0.5, conf=0.005 |
Key Learnings
- 100% data training gave the biggest single improvement (+0.015)
- Ensemble diversity > individual model quality: different seeds matter more than higher single-model mAP
- WBF > simple NMS for merging ensemble predictions
- FP16 ONNX halves model size with negligible accuracy loss
- More epochs have diminishing returns: 100 epochs worked well, 200 epochs overfit on 248 images
- RT-DETR failed on this small dataset; transformers need more data
- Local eval doesn't predict competition deltas; small local improvements can go either way
- Lower per-model NMS iou (0.5 vs 0.7) suppresses more overlapping duplicates before WBF merges the models' predictions, which helped in dense shelf scenes
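To make the effect of the NMS `iou` threshold concrete, the following sketch runs a plain greedy NMS on two overlapping boxes at both thresholds. It is a textbook implementation for illustration, not the Ultralytics one, and the box coordinates are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr):
    """Return indices of kept boxes; a box is dropped if it overlaps a kept one by more than iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


# Two unit squares shifted by 0.25, which gives an IoU of 0.6.
boxes = [(0.0, 0.0, 1.0, 1.0), (0.25, 0.0, 1.25, 1.0)]
scores = [0.9, 0.8]
kept_at_07 = greedy_nms(boxes, scores, iou_thr=0.7)
kept_at_05 = greedy_nms(boxes, scores, iou_thr=0.5)
```

With these two boxes, `iou_thr=0.7` keeps both while `iou_thr=0.5` keeps only the higher-scoring one. Whether the stricter setting helps the final score is an empirical question that depends on the dataset.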
Files
ensemble/
  model1_run11_s42_1280_SGD.onnx   # Best individual model (FP16)
  model2_a100-2_s789_1024_SGD.onnx # Second model (FP16)
  model3_run6_s456_1024_AdamW.onnx # Third model (FP16)
  run.py                           # Inference script with WBF ensemble
scripts/
  train_fulldata.py                # Train YOLOv8x on 100% data
  train_100ep.py                   # Direct training script (configurable)
  train_experiment.py              # Experiment runner (augmentation, LR, freeze)
  train_with_products.py           # Training with product reference images
  export_fp16.py                   # Export to FP16 ONNX
  benchmark.py                     # Benchmark ensemble configurations
  patch_aide.py                    # AIDE patches for Gemini compatibility
  check_aide_status.py             # VM training status checker
submission-best0.9216.zip          # Ready-to-submit competition zip
Usage
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion
# Load 3 models
models = [YOLO(f"model{i}.onnx", task="detect") for i in [1, 2, 3]]
# Run on image, merge with WBF
# See ensemble/run.py for full implementation
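One detail worth spelling out is that `weighted_boxes_fusion` from `ensemble_boxes` expects box coordinates normalized to the [0, 1] range, so pixel boxes need scaling before fusion and again afterwards. The sketch below shows that round trip. The helper names are hypothetical, and the per-model inputs are assumed to be plain lists, for example obtained from `result.boxes.xyxy.tolist()`, `result.boxes.conf.tolist()`, and `result.boxes.cls.tolist()` on an Ultralytics result. This is not the code in `ensemble/run.py`.

```python
def to_normalized(boxes_xyxy, width, height):
    """Scale pixel (x1, y1, x2, y2) boxes to the [0, 1] range."""
    return [
        [x1 / width, y1 / height, x2 / width, y2 / height]
        for x1, y1, x2, y2 in boxes_xyxy
    ]


def fuse_image(per_model_boxes, per_model_scores, per_model_labels,
               width, height, iou_thr=0.7):
    """Fuse one image's predictions from several models (hypothetical helper)."""
    from ensemble_boxes import weighted_boxes_fusion  # lazy import

    boxes_list = [to_normalized(b, width, height) for b in per_model_boxes]
    boxes, scores, labels = weighted_boxes_fusion(
        boxes_list,
        per_model_scores,
        per_model_labels,
        weights=None,      # equal model weights
        iou_thr=iou_thr,
        skip_box_thr=0.0,  # keep low-confidence boxes; filtering happens elsewhere
    )
    # Scale the fused boxes back to pixel coordinates.
    pixel_boxes = [
        [x1 * width, y1 * height, x2 * width, y2 * height]
        for x1, y1, x2, y2 in boxes
    ]
    return pixel_boxes, list(scores), list(labels)
```

Each `per_model_*` argument holds one list per model, in the same model order.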
Infrastructure
- 14 GPU VMs on GCP (12x NVIDIA L4, 2x NVIDIA A100)
- Gemini 3.1 Pro for AIDE code generation (free, unlimited)
- Claude Code for orchestration, monitoring, and submission
- Total compute: ~69 hours across all VMs
Competition
- NM i AI 2026: Norwegian Championship in AI
- Task 1: NorgesGruppen Data, grocery shelf object detection
- Prize Pool: 1,000,000 NOK
- Sandbox: Python 3.11, PyTorch 2.6, ultralytics 8.1.0, NVIDIA L4 GPU, 300s timeout