---
language: en
license: mit
tags:
- object-detection
- oriented-bounding-box
- yolo
- trading-cards
- webgpu-ready
- onnx
metrics:
- mAP50
- mAP50-95
---
# Trading Card Detector · YOLO11m OBB (v3)
![preview](assets/validation_predictions.png)
## 🚀 Summary
- **Architecture:** YOLO11m with an oriented bounding box (OBB) head (7-value outputs: `cx, cy, w, h, conf, class, angle`)
- **Input resolution:** 1088 × 1088
- **Targets:** Single class (`trading_card`) with arbitrary rotations
- **Exports:** PyTorch checkpoint + ONNX (built-in NMS, WebGPU/WebAssembly ready)
- **Training script:** `training_app/src/training/train_yolo_obb.py`
## 📦 Artifacts
| File | Description |
|------|-------------|
| `weights/cardcaptor_v3_best.pt` | Full precision PyTorch checkpoint (best epoch) |
| `onnx/cardcaptor_v3_best.onnx` | Optimized ONNX export (80 MB) with built-in NMS |
## 📚 Dataset
- **Name:** `merged_dataset_50x3`
- **Composition:** ~60K synthetic renders plus ~220 hand-crafted captures, each expanded by a deterministic 50× photometric multiplier (~11K effective hand-crafted samples)
- **Validation:** 55 hand-crafted frames (210 labeled cards), weighted 90% hand-crafted to match real-world deployment
- **Label format:** YOLO OBB (four polygon corners per box)
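In the YOLO OBB label format, each line holds a class id followed by the four polygon corners as normalized `x y` pairs. A minimal parser for one label line (an illustration, not part of the released tooling; the function name is hypothetical):

```python
import numpy as np

def parse_obb_label(line: str, img_w: int, img_h: int) -> tuple[int, np.ndarray]:
    """Parse one YOLO OBB label line: `class x1 y1 x2 y2 x3 y3 x4 y4`.

    Coordinates are normalized to [0, 1]; returns them scaled to pixels
    as a (4, 2) array of corner points.
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = np.array(parts[1:9], dtype=np.float32).reshape(4, 2)
    corners = coords * np.array([img_w, img_h], dtype=np.float32)
    return class_id, corners

# Example: an axis-aligned card covering the center of a 1088x1088 frame
cls, pts = parse_obb_label("0 0.3 0.3 0.7 0.3 0.7 0.7 0.3 0.7", 1088, 1088)
```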
## 🎨 Augmentation Highlights
- Deterministic photometric pre-augmentation for hand-crafted oversampling (brightness/contrast/gamma/noise/blur/HSV shifts)
- YOLO runtime augments: ±60° rotation, ±0.15 translation, ±0.6 scale, ±20° shear, perspective jitter, flipud 0.3, fliplr 0.5
- Composition augments: mosaic (1.0), mixup (0.1), copy-paste (0.15), RandAugment, random erasing (0.4)
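The deterministic photometric pre-augmentation above can be sketched as follows (a minimal illustration, not the project's actual pipeline; the function name and parameter ranges are assumptions). Seeding the RNG per variant makes every (image, seed) pair reproducible, so the 50× expansion is stable across runs:

```python
import numpy as np

def photometric_variant(img: np.ndarray, seed: int) -> np.ndarray:
    """Produce one deterministic photometric variant of a uint8 RGB image.

    The same (image, seed) pair always yields the same output.
    """
    rng = np.random.default_rng(seed)
    out = img.astype(np.float32) / 255.0
    out = out * rng.uniform(0.7, 1.3)                        # brightness
    out = (out - 0.5) * rng.uniform(0.8, 1.2) + 0.5          # contrast
    out = np.clip(out, 0.0, 1.0) ** rng.uniform(0.8, 1.25)   # gamma
    out = out + rng.normal(0.0, 0.02, size=out.shape)        # Gaussian noise
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)

# 50 seeds -> 50 deterministic variants per hand-crafted capture
base = np.full((4, 4, 3), 128, np.uint8)
variants = [photometric_variant(base, s) for s in range(50)]
```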
## πŸ‹οΈ Training Recipe
- **Batch size:** 8
- **Epochs:** 50 (early stopping, patience 5)
- **Optimizer:** Ultralytics default (SGD w/ warmup)
- **Loss head:** OBB-specific DFL + cls losses
- **Validation cadence:** Every 5 epochs with periodic visualization dumps
- **Hardware:** NVIDIA RTX 5090, CUDA 12.8
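The recipe and augmentation settings above map onto Ultralytics training arguments roughly like this (a hedged sketch, not the project's actual invocation; the dataset YAML name is an assumption). With `ultralytics` installed, the dict would be passed as `YOLO("yolo11m-obb.pt").train(**hyp)`:

```python
# Hypothetical hyperparameter set mirroring the recipe in this card.
hyp = {
    "data": "merged_dataset_50x3.yaml",  # assumed dataset config name
    "imgsz": 1088,
    "batch": 8,
    "epochs": 50,
    "patience": 5,       # early stopping
    "degrees": 60.0,     # ±60° rotation
    "translate": 0.15,
    "scale": 0.6,
    "shear": 20.0,
    "flipud": 0.3,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "mixup": 0.1,
    "copy_paste": 0.15,
    "erasing": 0.4,
}
```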
## 📊 Evaluation (best epoch)
| Metric | Value |
|--------|-------|
| mAP@50 | **0.983** |
| mAP@50-95 | **0.963** |
| Precision | 0.943 |
| Recall | 0.986 |
| Validation set | 55 hand-crafted photos / 210 cards |
## πŸ” Example Predictions
| Example 1 | Example 2 |
|-----------|-----------|
| ![Example 1](assets/examples/ex_1.jpg) | ![Example 2](assets/examples/ex_2.jpg) |
| Example 3 |
|-----------|
| ![Example 3](assets/examples/ex_3.jpg) |
## 🧪 Inference
```python
from ultralytics import YOLO

model = YOLO("weights/cardcaptor_v3_best.pt")
results = model("card_photo.jpg", conf=0.25)
print(results[0].obb)  # oriented boxes: centers, sizes, angles, confidences
```
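For downstream use (e.g. cropping or perspective-warping a detected card), each oriented box can be converted from its `(cx, cy, w, h, angle)` parameterization to four corner points. A small numpy helper (an illustration, not part of the released code; angle assumed in radians):

```python
import numpy as np

def obb_to_corners(cx: float, cy: float, w: float, h: float, angle: float) -> np.ndarray:
    """Return the four corners of an oriented box as a (4, 2) array."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=np.float64)
    # Half-extent offsets of an axis-aligned box, rotated then translated.
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]], dtype=np.float64) / 2.0
    return half @ rot.T + np.array([cx, cy])

# Axis-aligned case: corners span [444, 644] in x and [394, 694] in y
corners = obb_to_corners(544.0, 544.0, 200.0, 300.0, 0.0)
```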
ONNX (WebGPU / WebAssembly) example:
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession(
    "onnx/cardcaptor_v3_best.onnx", providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name
# image_tensor: float32 NCHW, shape (1, 3, 1088, 1088), values scaled to [0, 1]
image_tensor = np.zeros((1, 3, 1088, 1088), dtype=np.float32)  # replace with a real image
outputs = session.run(None, {input_name: image_tensor})
```
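Since NMS is built into the export, post-processing reduces to thresholding the 7-value detections (`cx, cy, w, h, conf, class, angle`, per the summary above). A sketch under that assumption; the exact output tensor layout (rows vs. columns) should be checked against `session.get_outputs()`:

```python
import numpy as np

def filter_detections(raw: np.ndarray, conf_thresh: float = 0.25) -> np.ndarray:
    """Keep rows of an (N, 7) detection array whose confidence meets the threshold.

    Columns assumed: cx, cy, w, h, conf, class, angle.
    """
    return raw[raw[:, 4] >= conf_thresh]

# Toy example: two detections, one below the default threshold
raw = np.array([
    [544.0, 544.0, 200.0, 300.0, 0.91, 0.0, 0.12],
    [100.0, 120.0,  50.0,  80.0, 0.10, 0.0, 0.00],
])
kept = filter_detections(raw)  # only the 0.91-confidence row survives
```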
## ✅ Responsible Use
This model only predicts bounding boxes for trading cards. It does not read card text or assess authenticity.
## 📎 Resources
- Training script reference: `training_app/src/training/train_yolo_obb.py`
- Detailed training log & analyzer outputs stored under `trading_cards_obb/yolo11m_obb_merged_50x3/`
- Claude design doc: https://claude.ai/share/2886b94f-64a3-4b46-a42c-ba308f106902
---
Questions or improvements? Please open an issue or PR!