Release Notes - MTG Card Pose Estimation: Round 4b

Date: 2026-05-19
Training run: runs/run_20260519_012809
Published artifacts:

  • runs/run_20260519_012809/ - training checkpoints, config, and logs
  • exports/mtg_4kp_s_r4b_fp16.onnx - production inference model (WebGPU / web demo), recommended
  • exports/mtg_4kp_s.onnx - baseline FP32 reference model (r4a, CPU-compatible)

Resources

Pose Model: https://huggingface.co/dhvazquez/mtg_card_pose_estimation
Pose Train: https://github.com/diegovazquez/mtg_train_card_pose_estimation

Embedding Model: https://huggingface.co/dhvazquez/mtg_card_identification_embeddings
Embedding Train: https://github.com/diegovazquez/mtg_train_card_identification_embeddings

Dataset: https://huggingface.co/datasets/dhvazquez/mtg_synthetic_large_dataset
Dataset Generator: https://github.com/diegovazquez/mtg_synthetic_dataset_generator

Model Overview

Task: Single-class MTG card corner detection, 4 keypoints (TL, TR, BR, BL) per card.
Architecture: DETRPose-S with HGNetV2-B0 backbone (fork: SebastianJanampa/DETRPose, branch mtg-4kp).
Input: 640×640 RGB, normalized to [0, 1] (mean=0, std=1; no ImageNet normalization).
Output: (scores, boxes, keypoints). Boxes are derived from the keypoints; there is no dedicated bbox head.
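
The input and output conventions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository. The function names, the interleaved RGB byte layout, the channel-first result, and the choice of an axis-aligned enclosing box in (x_min, y_min, x_max, y_max) form are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def preprocess_rgb(pixels: bytes, height: int = 640, width: int = 640) -> List[List[List[float]]]:
    """Scale interleaved 8-bit RGB bytes to [0, 1] floats in CHW order.

    Dividing by 255 is the whole normalization: with mean=0 and std=1
    there is no ImageNet-style mean subtraction or std division.
    The byte layout and CHW output are assumptions, not a documented API.
    """
    if len(pixels) != height * width * 3:
        raise ValueError("expected height * width * 3 bytes of RGB data")
    planes = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y in range(height):
        for x in range(width):
            base = (y * width + x) * 3
            for c in range(3):
                planes[c][y][x] = pixels[base + c] / 255.0
    return planes


def box_from_keypoints(corners: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box enclosing the 4 corners (TL, TR, BR, BL order).

    One plausible way to derive a box from keypoints; the exact rule
    used by the model is not stated in these notes.
    """
    if len(corners) != 4:
        raise ValueError("expected 4 corners in TL, TR, BR, BL order")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

For a tilted card with corners (0.2, 0.1), (0.8, 0.15), (0.75, 0.9), (0.25, 0.85), `box_from_keypoints` returns the extremes of those coordinates, so the box always contains the full quadrilateral.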

Architecture Changes vs Upstream DETRPose-S

Round 4b shrinks the transformer 3–4× relative to upstream, targeting single-card inference latency on mobile WebGPU.

| Parameter | Upstream DETRPose-S | Round 4a | Round 4b (this release) |
|---|---|---|---|
| hidden_dim | 256 | 256 | 128 |
| dim_feedforward | 1024 | 1024 | 512 |
| nhead | 8 | 8 | 4 |
| num_decoder_layers | 6 | 3 | 1 |
| num_queries | 60 | 10 | 4 |
| dec_n_points | 4 | 2 | 2 |
| Total parameters | ~11.4 M | ~8 M | 3.6 M |

Backbone (HGNetV2-B0) is unchanged. The transformer accounts for the remaining parameter budget; backbone dominates latency on low-end hardware.
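
A quick way to read the comparison is to compute two derived quantities from the listed values: the per-head attention width (hidden_dim / nhead) and the ratio of total parameter counts. The sketch below only restates numbers from the comparison; the dictionary layout and helper names are hypothetical, and the parameter totals for upstream and Round 4a are approximate in the source.

```python
# Values copied from the architecture comparison; the layout is illustrative only.
CONFIGS = {
    "upstream": {"hidden_dim": 256, "dim_feedforward": 1024, "nhead": 8,
                 "num_decoder_layers": 6, "num_queries": 60, "dec_n_points": 4,
                 "params_millions": 11.4},  # approximate in the source
    "r4a": {"hidden_dim": 256, "dim_feedforward": 1024, "nhead": 8,
            "num_decoder_layers": 3, "num_queries": 10, "dec_n_points": 2,
            "params_millions": 8.0},  # approximate in the source
    "r4b": {"hidden_dim": 128, "dim_feedforward": 512, "nhead": 4,
            "num_decoder_layers": 1, "num_queries": 4, "dec_n_points": 2,
            "params_millions": 3.6},
}


def head_dim(cfg: dict) -> int:
    """Width of a single attention head."""
    return cfg["hidden_dim"] // cfg["nhead"]


def param_ratio(bigger: str, smaller: str) -> float:
    """How many times more total parameters `bigger` has than `smaller`."""
    return CONFIGS[bigger]["params_millions"] / CONFIGS[smaller]["params_millions"]


for name, cfg in CONFIGS.items():
    print(f"{name}: head_dim={head_dim(cfg)}, "
          f"params vs r4b = {param_ratio(name, 'r4b'):.2f}x")
```

The per-head width stays at 32 in all three configurations, because hidden_dim and nhead are halved together. On total parameters, which include the unchanged backbone, Round 4b is roughly 3.2 times smaller than upstream.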


Training Configuration

| Setting | Value |
|---|---|
| Config | detrpose/configs/mtg_card_4kp.py (r4b overrides) |
| Epochs configured | 150 |
| Epochs completed | 2 (saturated; see Results) |
| Batch size | 64 (32/GPU × 2× RTX 3090) |
| LR (head / backbone) | 2e-4 / 2e-5 (√-scaled from bs=16 spec) |
| Optimizer | AdamW, weight_decay=1e-4, grad_clip=0.1 |
| AMP | FP16 |
| Val cap (max_eval_samples) | 10,000 (deterministic subset, seed=42) |
| OKS sigmas | [0.025, 0.025, 0.025, 0.025] (uniform 4-corner) |
| Augmentation onset | Mosaic @ epoch 5, HSVJitter, ColorJitter, horizontal flip with corner swap |
| Flip pairs | [[0,1],[2,3]] (TL↔TR, BL↔BR) |
| Normalization | mean=[0,0,0], std=[1,1,1] |
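
Two of these settings are easy to illustrate: square-root learning-rate scaling and the horizontal flip with corner swap. The sketch below is an assumption-laden example, not the training code. The helper names are hypothetical, corners are assumed to be normalized to [0, 1], and the base rates of 1e-4 and 1e-5 at batch size 16 are inferred by dividing the listed rates by sqrt(64 / 16) = 2; they are not stated in these notes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Flip pairs as listed in the settings; corner order is TL, TR, BR, BL.
FLIP_PAIRS = [[0, 1], [2, 3]]


def sqrt_scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Scale a learning rate by the square root of the batch-size ratio."""
    return base_lr * math.sqrt(batch / base_batch)


def hflip_corners(corners: Sequence[Point],
                  pairs: Sequence[Sequence[int]] = FLIP_PAIRS) -> List[Point]:
    """Mirror normalized corners left-to-right, then swap paired indices.

    Without the swap, the corner stored at index 0 (TL) would end up on
    the right-hand side of the flipped image and the labels would be wrong.
    """
    mirrored = [(1.0 - x, y) for x, y in corners]
    for a, b in pairs:
        mirrored[a], mirrored[b] = mirrored[b], mirrored[a]
    return mirrored


# Inferred base rates (assumption) scaled from batch size 16 to 64.
head_lr = sqrt_scaled_lr(1e-4, 16, 64)
backbone_lr = sqrt_scaled_lr(1e-5, 16, 64)
```

After `hflip_corners`, index 0 still holds the top-left corner of the mirrored card, which is what keeps the keypoint order consistent across augmented samples.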

Results

Training converged by epoch 1, the second of the two completed epochs (epochs are zero-indexed). The epoch 1 checkpoint is saved as checkpoint_best_regular.pth.

| Epoch | Train Loss | AP | AP@0.5 | Mean L2 (normalized) |
|---|---|---|---|---|
| 0 | 12.54 | 0.9901 | 0.9999 | 0.000816 |
| 1 | 2.99 | 0.9990 | 1.0000 | 0.000673 |
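
For context on the AP columns, here is a minimal sketch of object keypoint similarity (OKS) in the COCO style, using the uniform sigmas from the training configuration. It assumes the AP values follow the COCO keypoint protocol, that all four corners are labeled, and that coordinates and area share the same units; the function name and signature are illustrative and not taken from the training code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Uniform per-corner sigmas from the training configuration.
SIGMAS = [0.025, 0.025, 0.025, 0.025]


def oks(pred: Sequence[Point], gt: Sequence[Point], area: float,
        sigmas: Sequence[float] = SIGMAS) -> float:
    """COCO-style OKS, assuming every keypoint is labeled.

    Per keypoint: exp(-d^2 / (2 * area * (2 * sigma)^2)), averaged over
    keypoints, where d is the distance between prediction and ground truth
    and area is the ground-truth object area.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
    return total / len(sigmas)
```

A perfect prediction scores 1.0, and the score decays as corners drift. AP@0.5 counts a detection as correct when its OKS is at least 0.5, while AP averages over a range of stricter thresholds, which is why it is the more sensitive of the two columns.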

Per-corner L2 at epoch 1 (best):

| Corner | L2 (normalized) |
|---|---|
| TL | 0.000650 |
| TR | 0.000677 |
| BR | 0.000681 |
| BL | 0.000685 |
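
To get a feel for these numbers, the normalized errors can be converted to pixels. The sketch below assumes the L2 values are expressed as a fraction of the 640-pixel input side; the notes do not state the normalization basis, so treat the result as indicative only. The names are hypothetical.

```python
INPUT_SIZE = 640  # model input side in pixels

# Per-corner normalized L2 at epoch 1, copied from the results.
PER_CORNER_L2 = {"TL": 0.000650, "TR": 0.000677, "BR": 0.000681, "BL": 0.000685}


def to_pixels(normalized_l2: float, size: int = INPUT_SIZE) -> float:
    """Convert a normalized distance to pixels, assuming it is a fraction of `size`."""
    return normalized_l2 * size


pixel_error = {corner: to_pixels(value) for corner, value in PER_CORNER_L2.items()}

for corner, err in pixel_error.items():
    print(f"{corner}: {err:.3f} px")
```

Under that assumption every corner lands within about 0.42 to 0.44 pixels of the ground truth at 640×640, so the average error is below half a pixel.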

Epoch time: ~2 h 17 min/epoch on 2× RTX 3090. Peak VRAM: 11.2 GB/GPU.

Published Artifacts

exports/mtg_4kp_s_r4b_fp16.onnx - Production model (WebGPU)

| Property | Value |
|---|---|
| Source checkpoint | runs/run_20260519_012809/checkpoint_best_regular.pth (epoch 1) |
| Format | ONNX opset 17, FP16 weights |
| Size | 7.2 MB |
| Target runtime | ORT Web ≥ 1.26, WebGPU EP (primary), WASM fallback |
| Web demo symlink | web/model/mtg_4kp_s_fp16.onnx → exports/mtg_4kp_s_r4b_fp16.onnx |
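
As a sanity check, the 7.2 MB size matches what 3.6 M parameters occupy at two bytes each. The small calculation below assumes decimal megabytes and ignores graph structure and metadata stored in the ONNX file; the helper name is hypothetical.

```python
def weights_megabytes(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Round 4b: 3.6 M parameters stored as FP16 (2 bytes per weight).
r4b_fp16_mb = weights_megabytes(3.6e6, 2)
print(f"{r4b_fp16_mb:.1f} MB")
```

The same weights stored as FP32 would take about twice the space, which is the main reason the FP16 export is the one shipped to the browser.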

exports/mtg_4kp_s.onnx - Baseline FP32 reference (r4a)

| Property | Value |
|---|---|
| Source | Round 4a best checkpoint (runs/r4a_best_epoch0_AP997.pth, epoch 0) |
| Format | ONNX opset 17, FP32, external data file (mtg_4kp_s.onnx.data) |
| Size | 45 MB + .data file |
| AP | 0.9970 |
| Target runtime | ORT ≥ 1.14, CPU EP; reference for accuracy comparison |