---
language: en
license: mit
tags:
- object-detection
- oriented-bounding-box
- yolo
- trading-cards
- webgpu-ready
- onnx
metrics:
- mAP50
- mAP50-95
---
# Trading Card Detector · YOLO11m OBB (v3)
![preview](assets/validation_predictions.png)
## 🚀 Summary
- **Architecture:** YOLO11m with an oriented bounding box (OBB) head (7-value outputs: `cx, cy, w, h, conf, class, angle`)
- **Input resolution:** 1088 × 1088
- **Targets:** Single class (`trading_card`) with arbitrary rotations
- **Exports:** PyTorch checkpoint + ONNX (built-in NMS, WebGPU/WebAssembly ready)
- **Training script:** `training_app/src/training/train_yolo_obb.py`
## 📦 Artifacts
| File | Description |
|------|-------------|
| `weights/cardcaptor_v3_best.pt` | Full precision PyTorch checkpoint (best epoch) |
| `onnx/cardcaptor_v3_best.onnx` | Optimized ONNX export (80 MB) with built-in NMS |
## 📚 Dataset
- **Name:** `merged_dataset_50x3`
- **Composition:** ~60K synthetic renders plus ~220 hand-crafted captures, each expanded by a deterministic 50× photometric multiplier (~11K effective hand-crafted samples)
- **Validation:** 55 hand-crafted frames (210 labeled cards), weighted 90% hand-crafted to match real-world deployment
- **Label format:** YOLO OBB (four polygon corners per box)
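In the YOLO OBB label format, each line holds a class id followed by the four polygon corners as normalized `x y` pairs. A minimal parser for one label line (an illustration, not part of the released tooling; the function name is hypothetical):

```python
import numpy as np

def parse_obb_label(line: str, img_w: int, img_h: int) -> tuple[int, np.ndarray]:
    """Parse one YOLO OBB label line: `class x1 y1 x2 y2 x3 y3 x4 y4`.

    Coordinates are normalized to [0, 1]; returns them scaled to pixels
    as a (4, 2) array of corner points.
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = np.array(parts[1:9], dtype=np.float32).reshape(4, 2)
    corners = coords * np.array([img_w, img_h], dtype=np.float32)
    return class_id, corners

# Example: an axis-aligned card covering the center of a 1088x1088 frame
cls, pts = parse_obb_label("0 0.3 0.3 0.7 0.3 0.7 0.7 0.3 0.7", 1088, 1088)
```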
## 🎨 Augmentation Highlights
- Deterministic photometric pre-augmentation for hand-crafted oversampling (brightness/contrast/gamma/noise/blur/HSV shifts)
- YOLO runtime augments: ±60° rotation, ±0.15 translation, ±0.6 scale, ±20° shear, perspective jitter, flipud 0.3, fliplr 0.5
- Composition augments: mosaic (1.0), mixup (0.1), copy-paste (0.15), RandAugment, random erasing (0.4)
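The deterministic photometric pre-augmentation above can be sketched as follows (a minimal illustration, not the project's actual pipeline; the function name and parameter ranges are assumptions). Seeding the RNG per variant makes every (image, seed) pair reproducible, so the 50× expansion is stable across runs:

```python
import numpy as np

def photometric_variant(img: np.ndarray, seed: int) -> np.ndarray:
    """Produce one deterministic photometric variant of a uint8 RGB image.

    The same (image, seed) pair always yields the same output.
    """
    rng = np.random.default_rng(seed)
    out = img.astype(np.float32) / 255.0
    out = out * rng.uniform(0.7, 1.3)                        # brightness
    out = (out - 0.5) * rng.uniform(0.8, 1.2) + 0.5          # contrast
    out = np.clip(out, 0.0, 1.0) ** rng.uniform(0.8, 1.25)   # gamma
    out = out + rng.normal(0.0, 0.02, size=out.shape)        # Gaussian noise
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)

# 50 seeds -> 50 deterministic variants per hand-crafted capture
base = np.full((4, 4, 3), 128, np.uint8)
variants = [photometric_variant(base, s) for s in range(50)]
```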
## πŸ‹οΈ Training Recipe
- **Batch size:** 8
- **Epochs:** 50 (early stopping, patience 5)
- **Optimizer:** Ultralytics default (SGD w/ warmup)
- **Loss head:** OBB-specific DFL + cls losses
- **Validation cadence:** Every 5 epochs with periodic visualization dumps
- **Hardware:** NVIDIA RTX 5090, CUDA 12.8
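The recipe and augmentation settings above map onto Ultralytics training arguments roughly like this (a hedged sketch, not the project's actual invocation; the dataset YAML name is an assumption). With `ultralytics` installed, the dict would be passed as `YOLO("yolo11m-obb.pt").train(**hyp)`:

```python
# Hypothetical hyperparameter set mirroring the recipe in this card.
hyp = {
    "data": "merged_dataset_50x3.yaml",  # assumed dataset config name
    "imgsz": 1088,
    "batch": 8,
    "epochs": 50,
    "patience": 5,       # early stopping
    "degrees": 60.0,     # ±60° rotation
    "translate": 0.15,
    "scale": 0.6,
    "shear": 20.0,
    "flipud": 0.3,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "mixup": 0.1,
    "copy_paste": 0.15,
    "erasing": 0.4,
}
```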
## 📊 Evaluation (best epoch)
| Metric | Value |
|--------|-------|
| mAP@50 | **0.983** |
| mAP@50-95 | **0.963** |
| Precision | 0.943 |
| Recall | 0.986 |
| Validation set | 55 hand-crafted photos / 210 cards |
## πŸ” Example Predictions
| Example 1 | Example 2 |
|-----------|-----------|
| ![Example 1](assets/examples/ex_1.jpg) | ![Example 2](assets/examples/ex_2.jpg) |
| Example 3 |
|-----------|
| ![Example 3](assets/examples/ex_3.jpg) |
## 🧪 Inference
```python
from ultralytics import YOLO

model = YOLO("weights/cardcaptor_v3_best.pt")
results = model("card_photo.jpg", conf=0.25)
print(results[0].obb)  # oriented boxes: centers, sizes, angles, confidences
```
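For downstream use (e.g. cropping or perspective-warping a detected card), each oriented box can be converted from its `(cx, cy, w, h, angle)` parameterization to four corner points. A small numpy helper (an illustration, not part of the released code; angle assumed in radians):

```python
import numpy as np

def obb_to_corners(cx: float, cy: float, w: float, h: float, angle: float) -> np.ndarray:
    """Return the four corners of an oriented box as a (4, 2) array."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=np.float64)
    # Half-extent offsets of an axis-aligned box, rotated then translated.
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]], dtype=np.float64) / 2.0
    return half @ rot.T + np.array([cx, cy])

# Axis-aligned case: corners span [444, 644] in x and [394, 694] in y
corners = obb_to_corners(544.0, 544.0, 200.0, 300.0, 0.0)
```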
ONNX (WebGPU / WebAssembly) example:
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession(
    "onnx/cardcaptor_v3_best.onnx", providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name
# image_tensor: float32 NCHW, shape (1, 3, 1088, 1088), values scaled to [0, 1]
image_tensor = np.zeros((1, 3, 1088, 1088), dtype=np.float32)  # replace with a real image
outputs = session.run(None, {input_name: image_tensor})
```
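Since NMS is built into the export, post-processing reduces to thresholding the 7-value detections (`cx, cy, w, h, conf, class, angle`, per the summary above). A sketch under that assumption; the exact output tensor layout (rows vs. columns) should be checked against `session.get_outputs()`:

```python
import numpy as np

def filter_detections(raw: np.ndarray, conf_thresh: float = 0.25) -> np.ndarray:
    """Keep rows of an (N, 7) detection array whose confidence meets the threshold.

    Columns assumed: cx, cy, w, h, conf, class, angle.
    """
    return raw[raw[:, 4] >= conf_thresh]

# Toy example: two detections, one below the default threshold
raw = np.array([
    [544.0, 544.0, 200.0, 300.0, 0.91, 0.0, 0.12],
    [100.0, 120.0,  50.0,  80.0, 0.10, 0.0, 0.00],
])
kept = filter_detections(raw)  # only the 0.91-confidence row survives
```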
## ✅ Responsible Use
This model only predicts bounding boxes for trading cards. It does not read card text or assess authenticity.
## 📎 Resources
- Training script reference: `training_app/src/training/train_yolo_obb.py`
- Detailed training log & analyzer outputs stored under `trading_cards_obb/yolo11m_obb_merged_50x3/`
- Claude design doc: https://claude.ai/share/2886b94f-64a3-4b46-a42c-ba308f106902
---
Questions or improvements? Please open an issue or PR!