punnerud/ainm-object-detection

	---
	license: mit
	tags:
	- object-detection
	- yolov8
	- grocery
	- retail
	- onnx
	datasets:
	- custom
	pipeline_tag: object-detection
	---

	# NM i AI 2026 — NorgesGruppen Object Detection

	Multi-class YOLOv8x detector for 356 grocery product categories on store shelf images.

	## Performance

	| Method | Leaderboard Score |
	|--------|------------------|
	| Multi-scale TTA (640+960+1280 + flip) | 0.9230 |
	| Single inference | 0.8922 |

	Competition scoring:

	## Model Details

	- Architecture: YOLOv8x (68.5M parameters)
	- Classes: 356 grocery product categories
	- Training data: 248 shelf images, 22,731 COCO annotations
	- Training resolution: 1280px
	- Export format: ONNX (dynamic input, 262 MB)
	- Inference: Multi-scale TTA at 640/960/1280px with horizontal flip + WBF fusion
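
The multi-scale TTA bullet implies six forward passes per image (three scales, each with and without a horizontal flip). Boxes from a flipped pass have to be mirrored back before fusion. The sketch below shows one way to do that, under the assumption that each pass yields pixel `[x1, y1, x2, y2]` boxes in the frame it was run on. The function names are hypothetical and not taken from the submission code. Coordinates are normalized to `[0, 1]` so that passes at different resolutions become directly comparable.

```python
def tta_passes(scales=(640, 960, 1280)):
    """Enumerate (scale, flipped) combinations for multi-scale TTA."""
    return [(scale, flipped) for scale in scales for flipped in (False, True)]


def unflip_and_normalize(boxes, width, height, flipped):
    """Map pixel [x1, y1, x2, y2] boxes from one TTA pass to normalized coords.

    width/height describe the frame the boxes are expressed in.
    """
    normalized = []
    for x1, y1, x2, y2 in boxes:
        if flipped:
            # Mirror around the vertical axis; swapping keeps x1 <= x2.
            x1, x2 = width - x2, width - x1
        coords = [x1 / width, y1 / height, x2 / width, y2 / height]
        # Clamp so slightly out-of-frame predictions stay inside [0, 1].
        normalized.append([min(max(c, 0.0), 1.0) for c in coords])
    return normalized


# Hypothetical example: one box predicted on a flipped 1000x500 frame.
restored = unflip_and_normalize([[100, 50, 300, 150]], 1000, 500, flipped=True)
```

Normalizing before fusion avoids having to rescale every pass back to the original resolution separately; the fused boxes are multiplied by the original image width and height once at the end.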

	## Training

	- Pretrained on COCO (YOLOv8x), fine-tuned on competition data
	- Optimizer: AdamW (lr=0.01, weight_decay=0.0005, cosine LR)
	- Augmentation: mosaic, mixup (0.2), copy-paste (0.15), perspective, rotation (±15°)
	- 300 epochs at 1280px, batch=2 on NVIDIA A100 40GB
	- Model soup: weight averaging of epochs 240-290 for better generalization
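
The model soup step can be illustrated with a uniform average over checkpoints. This is a minimal sketch, not the training code: the card does not say which epochs in the 240-290 range were averaged or whether the average was uniform, so both are assumptions here. Checkpoints are represented as plain dictionaries of NumPy arrays with made-up parameter names.

```python
import numpy as np


def uniform_soup(state_dicts):
    """Average several checkpoints parameter by parameter (uniform model soup)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    soup = {}
    for key in state_dicts[0]:
        # Stack the same parameter from every checkpoint and take the mean.
        stacked = np.stack(
            [np.asarray(sd[key], dtype=np.float64) for sd in state_dicts]
        )
        soup[key] = stacked.mean(axis=0)
    return soup


# Toy checkpoints with hypothetical parameter names.
checkpoints = [
    {"conv.weight": np.array([1.0, 2.0]), "conv.bias": np.array([0.0])},
    {"conv.weight": np.array([3.0, 4.0]), "conv.bias": np.array([1.0])},
]
soup = uniform_soup(checkpoints)
```

With real PyTorch checkpoints the same loop would run over `state_dict()` tensors. Integer buffers such as BatchNorm's `num_batches_tracked` need separate handling, since averaging them in floating point changes their dtype.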

	## Submission Contents

	The submission contains:
	- ONNX model: YOLOv8x model soup, dynamic input (262 MB)
	- Class mapping: YOLO class → COCO category_id
	- Inference script: multi-scale TTA inference pipeline
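
The class mapping is needed because YOLO uses contiguous class indices while COCO annotations carry their own `category_id` values. The sketch below shows how such a mapping might be applied when writing predictions. It assumes the standard COCO results layout (`image_id`, `category_id`, `bbox` as `[x, y, width, height]`, `score`); the exact output format the competition requires is not stated in this card, and the mapping values are invented for illustration.

```python
import json


def to_coco_results(image_id, detections, class_to_category):
    """Convert (class_idx, score, [x1, y1, x2, y2]) tuples to COCO-style dicts."""
    results = []
    for class_idx, score, (x1, y1, x2, y2) in detections:
        results.append({
            "image_id": image_id,
            "category_id": class_to_category[class_idx],
            # COCO boxes are [x, y, width, height], not corner pairs.
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": float(score),
        })
    return results


# Hypothetical mapping; JSON object keys are strings, so convert them to int.
class_to_category = {
    int(k): v for k, v in json.loads('{"0": 101, "1": 205}').items()
}
example = to_coco_results(
    7, [(1, 0.9, (10.0, 20.0, 50.0, 80.0))], class_to_category
)
```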

	## Usage
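
The following is a hedged sketch of single-scale inference with ONNX Runtime, not the author's pipeline. It assumes the model was exported with default Ultralytics settings, so the input is a `(1, 3, H, W)` float32 RGB tensor scaled to `[0, 1]` and the single output has shape `(1, 4 + 356, N)` holding `cx, cy, w, h` followed by per-class scores, with no NMS applied. The model path and function names are placeholders.

```python
import numpy as np

NUM_CLASSES = 356  # from the model card


def decode_yolov8_output(raw, conf_thr=0.25):
    """Decode a raw output of shape (1, 4 + nc, N) into boxes, scores, classes.

    Boxes are returned as [x1, y1, x2, y2] in input-pixel coordinates.
    Overlapping boxes are NOT suppressed here; that is left to NMS or fusion.
    """
    preds = raw[0].T  # (N, 4 + nc)
    cxcywh, cls_scores = preds[:, :4], preds[:, 4:]
    class_ids = cls_scores.argmax(axis=1)
    scores = cls_scores.max(axis=1)
    keep = scores >= conf_thr
    cx, cy, w, h = cxcywh[keep].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, scores[keep], class_ids[keep]


def run_model(onnx_path, image_tensor):
    """Run the ONNX model on a preprocessed (1, 3, H, W) float32 tensor."""
    import onnxruntime as ort  # imported lazily so the decoder works alone

    session = ort.InferenceSession(
        onnx_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_name = session.get_inputs()[0].name
    raw = session.run(None, {input_name: image_tensor})[0]
    return decode_yolov8_output(raw)
```

Because the export uses dynamic input, the same session can be called with tensors of different spatial sizes, which is what multi-scale TTA relies on.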



	## Sandbox Environment

	- GPU: NVIDIA L4, 24 GB VRAM
	- Runtime: ~113s for test set (300s timeout)
	- Dependencies: onnxruntime-gpu, opencv, numpy, ensemble-boxes

	## Key Learnings

	1. A single multi-class YOLO model (detection and classification in one step) massively outperformed a two-stage pipeline (detector + kNN classifier)
	2. Multi-scale TTA improved the leaderboard score by +0.031 (0.8922 → 0.9230) by detecting small products better
	3. Model soup (weight averaging) improves generalization
	4. Higher validation mAP does NOT predict a better leaderboard score when training on all data
	5. Dynamic ONNX export is required for multi-scale inference
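
To make the fusion step behind the TTA gain concrete, here is a simplified weighted box fusion in pure Python. It is a sketch of the idea only: the submission lists `ensemble-boxes` as a dependency and presumably uses that library, whereas this version joins the first matching cluster instead of the best one and treats all passes as equally weighted. The function names and the `iou_thr` default are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(detections, num_passes, iou_thr=0.55):
    """Fuse (box, score, label) detections pooled from all TTA passes."""
    clusters = []
    # Visit detections from most to least confident.
    for box, score, label in sorted(detections, key=lambda d: -d[1]):
        match = None
        for cluster in clusters:
            if cluster["label"] == label and iou(cluster["box"], box) > iou_thr:
                match = cluster
                break
        if match is None:
            match = {"label": label, "members": [], "box": list(box)}
            clusters.append(match)
        match["members"].append((box, score))
        # Fused box = score-weighted average of all member coordinates.
        total = sum(s for _, s in match["members"])
        match["box"] = [
            sum(b[i] * s for b, s in match["members"]) / total for i in range(4)
        ]
    fused = []
    for cluster in clusters:
        n = len(cluster["members"])
        mean_score = sum(s for _, s in cluster["members"]) / n
        # Down-weight boxes that only a few passes agreed on.
        agreement = min(n, num_passes) / num_passes
        fused.append((cluster["box"], mean_score * agreement, cluster["label"]))
    return fused


# Hypothetical detections from two passes: one agreed box and one lone box.
pooled = [
    ([0.0, 0.0, 10.0, 10.0], 0.8, 1),
    ([0.0, 0.0, 10.0, 12.0], 0.4, 1),
    ([50.0, 50.0, 60.0, 60.0], 0.9, 3),
]
fused = fuse_boxes(pooled, num_passes=2)
```

Unlike NMS, which keeps only the top-scoring box in each overlap group, this approach averages the coordinates of all agreeing boxes, and a box seen by a single pass has its score reduced.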

	## License

	MIT