Release Notes - MTG Card Pose Estimation: Round 4b
Date: 2026-05-19
Training run: runs/run_20260519_012809
Published artifacts:
- runs/run_20260519_012809/: training checkpoints, config, and logs
- exports/mtg_4kp_s_r4b_fp16.onnx: production inference model (WebGPU / web demo, recommended)
- exports/mtg_4kp_s.onnx: baseline FP32 reference model (r4a, CPU-compatible)
Resources
Pose Model: https://huggingface.co/dhvazquez/mtg_card_pose_estimation
Pose Train: https://github.com/diegovazquez/mtg_train_card_pose_estimation
Embedding Model: https://huggingface.co/dhvazquez/mtg_card_identification_embeddings
Embedding Train: https://github.com/diegovazquez/mtg_train_card_identification_embeddings
Dataset: https://huggingface.co/datasets/dhvazquez/mtg_synthetic_large_dataset
Dataset Generator: https://github.com/diegovazquez/mtg_synthetic_dataset_generator
Model Overview
Task: Single-class MTG card corner detection, 4 keypoints (TL, TR, BR, BL) per card.
Architecture: DETRPose-S with HGNetV2-B0 backbone (fork: SebastianJanampa/DETRPose, branch mtg-4kp).
Input: 640×640 RGB, normalized to [0, 1] (mean=0, std=1; no ImageNet normalization).
Output: (scores, boxes, keypoints). Boxes are derived from the keypoints; there is no dedicated bbox head.
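As a minimal sketch of the input and output conventions above, the following Python helpers scale 8-bit pixel values to [0, 1], pick the highest-scoring query, and derive an axis-aligned box from four corner keypoints. The helper names, the per-query list layout, the 0.5 score threshold, and the min/max enclosing-box rule are all assumptions made for illustration; the export's actual tensor shapes and box derivation may differ.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def normalize_pixels(values: Sequence[int]) -> List[float]:
    """Scale 8-bit channel values to [0, 1]; no mean/std shift is applied."""
    return [v / 255.0 for v in values]


def box_from_keypoints(keypoints: Sequence[Point]) -> Box:
    """Assumed rule: tightest axis-aligned box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))


def best_detection(
    scores: Sequence[float],
    keypoints: Sequence[Sequence[Point]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the release
) -> Optional[List[Point]]:
    """Return the corners (TL, TR, BR, BL) of the top-scoring query, or None."""
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None
    return list(keypoints[best])
```

With a single card per frame, taking only the top-scoring query avoids any non-maximum suppression step, which suits a small query count.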
Architecture Changes vs Upstream DETRPose-S
Round 4b shrinks the transformer 3-4× relative to upstream, targeting single-card inference latency on mobile WebGPU.
| Parameter | Upstream DETRPose-S | Round 4a | Round 4b (this release) |
|---|---|---|---|
| hidden_dim | 256 | 256 | 128 |
| dim_feedforward | 1024 | 1024 | 512 |
| nhead | 8 | 8 | 4 |
| num_decoder_layers | 6 | 3 | 1 |
| num_queries | 60 | 10 | 4 |
| dec_n_points | 4 | 2 | 2 |
| Total parameters | ~11.4 M | ~8 M | 3.6 M |
The backbone (HGNetV2-B0) is unchanged. The transformer accounts for the rest of the parameter budget, while the backbone dominates latency on low-end hardware.
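The Round 4b column can be read as a set of overrides on the upstream values. The sketch below uses plain Python dicts rather than the real config file, whose structure is not shown in these notes; the dict names and the `head_dim` helper are hypothetical. It also checks one property that follows directly from the table: the per-head width stays at 32 because hidden_dim and nhead are halved together.

```python
# Values copied from the comparison table; the dict layout is illustrative only.
UPSTREAM = {
    "hidden_dim": 256,
    "dim_feedforward": 1024,
    "nhead": 8,
    "num_decoder_layers": 6,
    "num_queries": 60,
    "dec_n_points": 4,
}

R4B_OVERRIDES = {
    "hidden_dim": 128,
    "dim_feedforward": 512,
    "nhead": 4,
    "num_decoder_layers": 1,
    "num_queries": 4,
    "dec_n_points": 2,
}


def head_dim(cfg: dict) -> int:
    """Width of each attention head; hidden_dim must divide evenly by nhead."""
    if cfg["hidden_dim"] % cfg["nhead"] != 0:
        raise ValueError("hidden_dim must be divisible by nhead")
    return cfg["hidden_dim"] // cfg["nhead"]


# Merge the overrides on top of the upstream defaults.
r4b_config = {**UPSTREAM, **R4B_OVERRIDES}
```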
Training Configuration
| Setting | Value |
|---|---|
| Config | detrpose/configs/mtg_card_4kp.py (r4b overrides) |
| Epochs configured | 150 |
| Epochs completed | 2 (saturated; see Results) |
| Batch size | 64 (32/GPU × 2× RTX 3090) |
| LR (head / backbone) | 2e-4 / 2e-5 (√-scaled from bs=16 spec) |
| Optimizer | AdamW, weight_decay=1e-4, grad_clip=0.1 |
| AMP | FP16 |
| Val cap (max_eval_samples) | 10,000 (deterministic subset, seed=42) |
| OKS sigmas | [0.025, 0.025, 0.025, 0.025] (uniform 4-corner) |
| Augmentation onset | Mosaic @ epoch 5, HSVJitter, ColorJitter, horizontal flip with corner swap |
| Flip pairs | [[0,1],[2,3]]: TL↔TR, BL↔BR |
| Normalization | mean=[0,0,0], std=[1,1,1] |
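The horizontal flip with corner swap can be sketched as follows: mirroring the x coordinates turns the top-left corner into the visual top-right, so the labels in each flip pair are exchanged to keep the (TL, TR, BR, BL) order meaningful. This is an illustrative sketch, not the training pipeline's augmentation code; the function name and the `image_width - x` mirroring convention (rather than `image_width - 1 - x`) are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Keypoint order is (TL, TR, BR, BL), so indices 0/1 and 2/3 trade places.
FLIP_PAIRS = [[0, 1], [2, 3]]


def hflip_keypoints(
    keypoints: Sequence[Point],
    image_width: float,
    flip_pairs: Sequence[Sequence[int]] = FLIP_PAIRS,
) -> List[Point]:
    """Mirror keypoints horizontally, then swap paired labels."""
    flipped = [(image_width - x, y) for x, y in keypoints]
    for a, b in flip_pairs:
        flipped[a], flipped[b] = flipped[b], flipped[a]
    return flipped


# Axis-aligned card in a 640-wide image, listed as TL, TR, BR, BL.
card = [(10.0, 20.0), (110.0, 20.0), (110.0, 160.0), (10.0, 160.0)]
flipped_card = hflip_keypoints(card, image_width=640.0)
```

Without the swap, index 0 would hold the visually top-right corner after the flip and the model would be trained on inconsistent labels.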
Results
Training saturated within the first two epochs (0-indexed): AP was 0.9901 after epoch 0 and 0.9990 after epoch 1. The epoch 1 checkpoint is saved as checkpoint_best_regular.pth.
| Epoch | Train Loss | AP | AP@0.5 | Mean L2 (normalized) |
|---|---|---|---|---|
| 0 | 12.54 | 0.9901 | 0.9999 | 0.000816 |
| 1 | 2.99 | 0.9990 | 1.0000 | 0.000673 |
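The AP columns presumably rest on Object Keypoint Similarity (OKS) computed with the uniform sigmas listed in the training configuration. A minimal sketch of the COCO-style OKS formula follows, assuming all four corners are labeled and that `area` is the ground-truth object area in squared pixels. The function name is illustrative, and this is not the evaluation code used for the run.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

SIGMAS = [0.025, 0.025, 0.025, 0.025]  # uniform 4-corner sigmas from the config


def oks(
    pred: Sequence[Point],
    gt: Sequence[Point],
    area: float,
    sigmas: Sequence[float] = SIGMAS,
) -> float:
    """COCO-style OKS: mean over keypoints of exp(-d^2 / (2 * area * (2*sigma)^2))."""
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # per-keypoint constant, squared
        total += math.exp(-d2 / (2.0 * area * k2))
    return total / len(sigmas)


# A 100 x 140 px card with every predicted corner off by one pixel in x.
gt_corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 140.0), (0.0, 140.0)]
pred_corners = [(x + 1.0, y) for x, y in gt_corners]
example_oks = oks(pred_corners, gt_corners, area=100.0 * 140.0)
```

Small sigmas make the score fall off quickly with distance, so a tight value such as 0.025 penalizes corner errors more strongly than looser sigmas would.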
Per-corner L2 at epoch 1 (best):
| Corner | L2 (normalized) |
|---|---|
| TL | 0.000650 |
| TR | 0.000677 |
| BR | 0.000681 |
| BL | 0.000685 |
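To read these errors in pixels, one can multiply by the image side length. The sketch below assumes the L2 values are normalized by the 640-pixel input side, which these notes do not state explicitly; under that assumption every per-corner error is below one pixel.

```python
INPUT_SIDE = 640  # assumed normalization length, taken from the model input size

# Per-corner values copied from the table above.
PER_CORNER_L2 = {"TL": 0.000650, "TR": 0.000677, "BR": 0.000681, "BL": 0.000685}


def to_pixels(normalized_l2: float, side: int = INPUT_SIDE) -> float:
    """Convert a normalized L2 distance to pixels at the given side length."""
    return normalized_l2 * side


per_corner_px = {name: to_pixels(value) for name, value in PER_CORNER_L2.items()}
mean_px = sum(per_corner_px.values()) / len(per_corner_px)
```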
Epoch time: ~2 h 17 min/epoch on 2× RTX 3090. Peak VRAM: 11.2 GB/GPU.
Published Artifacts
exports/mtg_4kp_s_r4b_fp16.onnx - Production model (WebGPU)
| Property | Value |
|---|---|
| Source checkpoint | runs/run_20260519_012809/checkpoint_best_regular.pth (epoch 1) |
| Format | ONNX opset 17, FP16 weights |
| Size | 7.2 MB |
| Target runtime | ORT Web ≥ 1.26, WebGPU EP (primary), WASM fallback |
| Web demo symlink | web/model/mtg_4kp_s_fp16.onnx → exports/mtg_4kp_s_r4b_fp16.onnx |
exports/mtg_4kp_s.onnx - Baseline FP32 reference (r4a)
| Property | Value |
|---|---|
| Source | Round 4a best checkpoint (runs/r4a_best_epoch0_AP997.pth, epoch 0) |
| Format | ONNX opset 17, FP32, external data file (mtg_4kp_s.onnx.data) |
| Size | 45 MB + .data file |
| AP | 0.9970 |
| Target runtime | ORT ≥ 1.14, CPU EP; reference for accuracy comparison |