Release Notes — MTG Card Pose Estimation: Round 4b

Date: 2026-05-19
Training run: runs/run_20260519_012809
Published artifacts:

runs/run_20260519_012809/ — training checkpoints + config + logs
exports/mtg_4kp_s_r4b_fp16.onnx — production inference model (WebGPU / web demo) (recommended)
exports/mtg_4kp_s.onnx — baseline FP32 reference model (r4a, CPU-compatible)

Resources

Pose Model: https://huggingface.co/dhvazquez/mtg_card_pose_estimation
Pose Train: https://github.com/diegovazquez/mtg_train_card_pose_estimation

Embedding Model: https://huggingface.co/dhvazquez/mtg_card_identification_embeddings
Embedding Train: https://github.com/diegovazquez/mtg_train_card_identification_embeddings

Dataset: https://huggingface.co/datasets/dhvazquez/mtg_synthetic_large_dataset
Dataset Generator: https://github.com/diegovazquez/mtg_synthetic_dataset_generator

Model Overview

Task: Single-class MTG card corner detection, 4 keypoints (TL, TR, BR, BL) per card.
Architecture: DETRPose-S with HGNetV2-B0 backbone (fork: SebastianJanampa/DETRPose, branch mtg-4kp).
Input: 640×640 RGB, scaled to [0, 1] (mean=0, std=1; no ImageNet normalization).
Output: (scores, boxes, keypoints); boxes are derived from keypoints, with no dedicated bbox head.
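
As a minimal sketch of the input contract above, the snippet below turns an already-resized 640×640 RGB image into a model input tensor. The function name `preprocess`, the NCHW layout, and the float32 dtype are assumptions for illustration. The released exports may expect a different layout or dtype (the FP16 export in particular), so it is worth checking the model's input signature first.

```python
import numpy as np

INPUT_SIZE = 640  # from the model overview: 640x640 RGB input


def preprocess(image_rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image (already 640x640) to a 1x3x640x640 tensor."""
    expected = (INPUT_SIZE, INPUT_SIZE, 3)
    if image_rgb.shape != expected:
        raise ValueError(f"expected shape {expected}, got {image_rgb.shape}")
    # Scale to [0, 1]; mean=0 and std=1 mean no further normalization is applied.
    x = image_rgb.astype(np.float32) / 255.0
    # HWC -> CHW, then add a batch dimension (NCHW layout is an assumption).
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]
```

Resizing is left out on purpose: whether the pipeline letterboxes or stretches the image to 640×640 is not stated here, and the choice changes how predicted corners map back to the original image.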

Architecture Changes vs Upstream DETRPose-S

Round 4b shrinks the transformer 3–4× relative to upstream, targeting single-card inference latency on mobile WebGPU.

Parameter	Upstream DETRPose-S	Round 4a	Round 4b (this release)
`hidden_dim`	256	256	128
`dim_feedforward`	1024	1024	512
`nhead`	8	8	4
`num_decoder_layers`	6	3	1
`num_queries`	60	10	4
`dec_n_points`	4	2	2
Total parameters	~11.4 M	~8 M	3.6 M

The HGNetV2-B0 backbone is unchanged, so all parameter savings come from the transformer. The backbone still dominates latency on low-end hardware.
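
To make the table easier to compare, the sketch below restates the three configurations as plain dictionaries and derives two quantities from them: the per-head width and the feed-forward expansion ratio. The dictionary layout and the `derived` helper are illustrative only and do not mirror the actual structure of the config file.

```python
# Values copied from the table above; the dict layout itself is illustrative.
CONFIGS = {
    "upstream": {"hidden_dim": 256, "dim_feedforward": 1024, "nhead": 8,
                 "num_decoder_layers": 6, "num_queries": 60, "dec_n_points": 4},
    "r4a": {"hidden_dim": 256, "dim_feedforward": 1024, "nhead": 8,
            "num_decoder_layers": 3, "num_queries": 10, "dec_n_points": 2},
    "r4b": {"hidden_dim": 128, "dim_feedforward": 512, "nhead": 4,
            "num_decoder_layers": 1, "num_queries": 4, "dec_n_points": 2},
}


def derived(cfg):
    """Per-head width and FFN expansion ratio implied by a configuration."""
    return {
        "head_dim": cfg["hidden_dim"] // cfg["nhead"],
        "ffn_ratio": cfg["dim_feedforward"] // cfg["hidden_dim"],
    }


for name, cfg in CONFIGS.items():
    print(name, derived(cfg))
```

All three configurations come out at 32 dimensions per attention head and a 4× feed-forward expansion, so Round 4b narrows and shortens the decoder without changing its proportions.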

Training Configuration

Setting	Value
Config	`detrpose/configs/mtg_card_4kp.py` (r4b overrides)
Epochs configured	150
Epochs completed	2 (saturated — see Results)
Batch size	64 (32/GPU × 2× RTX 3090)
LR (head / backbone)	2e-4 / 2e-5 (√-scaled from the bs=16 spec: ×√(64/16) = ×2)
Optimizer	AdamW, weight_decay=1e-4, grad_clip=0.1
AMP	FP16
Val cap (`max_eval_samples`)	10,000 (deterministic subset, seed=42)
OKS sigmas	[0.025, 0.025, 0.025, 0.025] (uniform 4-corner)
Augmentation onset	Mosaic @ epoch 5, HSVJitter, ColorJitter, horizontal flip with corner swap
Flip pairs	`[[0,1],[2,3]]` — TL↔TR, BL↔BR
Normalization	mean=[0,0,0], std=[1,1,1]
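
The flip pairs matter because, after mirroring, the corner that appears top-left is the original top-right one. Below is a minimal sketch of a horizontal flip with corner swap, assuming keypoints are stored in (TL, TR, BR, BL) order as continuous pixel coordinates. `hflip_keypoints` is a hypothetical helper, not the function used in the training repository.

```python
FLIP_PAIRS = [[0, 1], [2, 3]]  # from the training configuration


def hflip_keypoints(keypoints, width, flip_pairs=FLIP_PAIRS):
    """Mirror (x, y) keypoints horizontally and restore the corner ordering."""
    # Mirror x around the vertical centre line (continuous-coordinate convention).
    flipped = [(width - x, y) for x, y in keypoints]
    # Swap paired indices so that index 0 is again the top-left corner, and so on.
    for a, b in flip_pairs:
        flipped[a], flipped[b] = flipped[b], flipped[a]
    return flipped


card = [(10, 20), (110, 20), (110, 160), (10, 160)]  # TL, TR, BR, BL
print(hflip_keypoints(card, width=640))
```

Without the swap, the labels would stay attached to the mirrored points and the model would be trained with TL and TR (and BR and BL) exchanged on every flipped sample.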

Results

Training saturated after 2 epochs (epochs 0 and 1, zero-indexed). The epoch 1 checkpoint was saved as checkpoint_best_regular.pth.

Epoch	Train Loss	AP	AP@0.5	Mean L2 (normalized)
0	12.54	0.9901	0.9999	0.000816
1	2.99	0.9990	1.0000	0.000673
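
For readers unfamiliar with the metric, the AP figures are based on Object Keypoint Similarity (OKS). The sketch below follows the COCO keypoint definition with the uniform sigmas from the training configuration. Whether the evaluator in this run matches COCO in every detail is an assumption, as are the helper name `oks` and the use of the card's box area as the scale term.

```python
import math

SIGMAS = [0.025, 0.025, 0.025, 0.025]  # from the training configuration


def oks(pred, gt, area, sigmas=SIGMAS):
    """COCO-style OKS for one instance; all keypoints are treated as visible."""
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # COCO uses k = 2 * sigma
        total += math.exp(-d2 / (2.0 * area * k2))
    return total / len(gt)


gt = [(100, 100), (300, 100), (300, 380), (100, 380)]  # TL, TR, BR, BL
pred = [(x + 5, y) for x, y in gt]                     # every corner off by 5 px
print(oks(pred, gt, area=200 * 280))
```

With sigmas this small the score drops quickly as corners drift, which is why a near-perfect AP implies sub-pixel to few-pixel corner accuracy at this card size.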

Per-corner L2 at epoch 1 (best):

Corner	L2 (normalized)
TL	0.000650
TR	0.000677
BR	0.000681
BL	0.000685
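
As a quick sanity check on these figures, the snippet below averages the per-corner values and converts the result to pixels. The conversion assumes the error is normalized by the 640-pixel input side, which the table does not state explicitly.

```python
PER_CORNER_L2 = {"TL": 0.000650, "TR": 0.000677, "BR": 0.000681, "BL": 0.000685}
INPUT_SIDE = 640  # assumption: errors are normalized by the 640-px input side

mean_l2 = sum(PER_CORNER_L2.values()) / len(PER_CORNER_L2)
mean_l2_px = mean_l2 * INPUT_SIDE

print(f"mean normalized L2: {mean_l2:.6f}")
print(f"approx. pixel error at 640 px: {mean_l2_px:.2f}")
```

The average agrees with the epoch 1 Mean L2 in the results table, and under the stated assumption it corresponds to roughly 0.43 px on a 640-pixel input.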

Epoch time: ~2 h 17 min on 2× RTX 3090. Peak VRAM: 11.2 GB per GPU.

Published Artifacts

`exports/mtg_4kp_s_r4b_fp16.onnx` — Production model (WebGPU)

Property	Value
Source checkpoint	`runs/run_20260519_012809/checkpoint_best_regular.pth` (epoch 1)
Format	ONNX opset 17, FP16 weights
Size	7.2 MB
Target runtime	ORT Web ≥ 1.26, WebGPU EP (primary), WASM fallback
Web demo symlink	`web/model/mtg_4kp_s_fp16.onnx → exports/mtg_4kp_s_r4b_fp16.onnx`

`exports/mtg_4kp_s.onnx` — Baseline FP32 reference (r4a)

Property	Value
Source	Round 4a best checkpoint (`runs/r4a_best_epoch0_AP997.pth`, epoch 0)
Format	ONNX opset 17, FP32, external data file (`mtg_4kp_s.onnx.data`)
Size	45 MB + `.data` file
AP	0.9970
Target runtime	ORT ≥ 1.14, CPU EP; reference for accuracy comparison
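
A rough sketch of running the FP32 reference model with ONNX Runtime on CPU and keeping the highest-scoring detection. Several details are assumptions: that the export takes a single image input, that it returns exactly (scores, boxes, keypoints) in that order, and that each query's keypoints flatten to 8 values as (x, y) pairs in TL, TR, BR, BL order. `pick_best` and `run_reference` are hypothetical helpers.

```python
import numpy as np


def pick_best(scores, keypoints):
    """Return (score, 4x2 corners) for the highest-scoring query of one image."""
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    # Assumed layout: 8 values per query, reshaped to 4 corners of (x, y).
    corners = np.asarray(keypoints, dtype=np.float32).reshape(scores.shape[0], 4, 2)
    best = int(np.argmax(scores))
    return float(scores[best]), corners[best]


def run_reference(model_path, input_tensor):
    """Run the FP32 model on the CPU execution provider (batch size 1)."""
    import onnxruntime as ort  # imported lazily so pick_best works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Assumes a single input; extra inputs would need entries in the feed dict.
    input_name = session.get_inputs()[0].name
    scores, boxes, keypoints = session.run(None, {input_name: input_tensor})
    return pick_best(scores, keypoints)
```

The external data file `mtg_4kp_s.onnx.data` has to sit next to `mtg_4kp_s.onnx` for the session to load. If the export turns out to take additional inputs, `session.get_inputs()` lists their names and shapes.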
