MCTrack Re-ID Model

Person re-identification model trained on Market-1501 as part of the MSML 640 (Computer Vision) Final Project at the University of Maryland. This model is the appearance backbone used in our multi-camera object tracking system.

Model Variants

This repo contains two checkpoints:

best_60ep.pth (primary) - trained for 60 epochs. Used as the deployed Re-ID model in our final cross-camera demos due to qualitatively cleaner cluster outputs.
best_120ep.pth (ablation) - same recipe, trained for 120 epochs. Slightly higher Re-ID accuracy (+1.42 mAP, +1.34 Rank-1 on Market-1501), but MOT17 tracking metrics are essentially unchanged and the Wildtrack IDF1 gain is small (see "Performance" below).

Both checkpoints contain only the model state_dict; no optimizer or scheduler state is included.

Architecture

Backbone: ResNet-50 (ImageNet-initialized; final stride changed from 2 to 1 to retain higher-resolution features)
Pooling: Global average pooling
Neck: BNNeck (Batch Normalization neck) - separates triplet-loss features from classification features
Embedding dimension: 256
Total parameters: ~25M
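
To illustrate what the last-stride change does, the sketch below computes the spatial size of the final feature map at the 256x128 training resolution. It assumes the standard ResNet-50 stride layout (stem convolution 2, max pool 2, then 1, 2, 2 for the first three residual stages); the function name is illustrative and not part of the project code.

```python
def final_feature_map(height, width, last_stride):
    """Spatial size of the last ResNet-50 feature map before global pooling."""
    # Standard ResNet-50 strides: stem conv, max pool, layer1, layer2, layer3,
    # followed by the configurable stride of the last stage (layer4).
    strides = [2, 2, 1, 2, 2, last_stride]
    total = 1
    for s in strides:
        total *= s
    # Integer division is exact for inputs divisible by the total stride,
    # which holds for 256x128.
    return height // total, width // total


default_size = final_feature_map(256, 128, last_stride=2)
modified_size = final_feature_map(256, 128, last_stride=1)
print("last stride 2:", default_size)
print("last stride 1:", modified_size)
```

With the stride left at 2 the map is 8x4; with stride 1 it is 16x8, so global average pooling aggregates four times as many spatial positions.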

Training Recipe

Setting	Value
Dataset	Market-1501 (12,936 train images, 751 train IDs)
Identity sampler	P=16 IDs x K=4 instances per batch
Batch size	64
Optimizer	Adam, lr=3.5e-4, weight_decay=5e-4
LR schedule	Linear warmup (10 epochs), step decay (x0.1 at epochs 40 and 70)
Loss	Combined CE (label smoothing 0.1) + triplet (soft margin)
Image size	256x128
Augmentation	Random horizontal flip, random erasing
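
Two rows of the table, the P x K identity sampler and the LR schedule, can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's training code: the warmup start factor (0.1), the convention that a decay applies from the milestone epoch onward, the sampling-with-replacement fallback for identities with fewer than K images, and the names lr_at_epoch and pk_batch are all hypothetical.

```python
import random

BASE_LR = 3.5e-4
WARMUP_EPOCHS = 10
MILESTONES = (40, 70)
GAMMA = 0.1


def lr_at_epoch(epoch, warmup_start_factor=0.1):
    """Learning rate for a 0-indexed epoch.

    warmup_start_factor is an assumption; the recipe only says "linear warmup".
    """
    if epoch < WARMUP_EPOCHS:
        alpha = epoch / WARMUP_EPOCHS
        factor = warmup_start_factor * (1 - alpha) + alpha
    else:
        factor = 1.0
    # Assumed convention: each decay applies from its milestone epoch onward.
    decays = sum(1 for m in MILESTONES if epoch >= m)
    return BASE_LR * factor * GAMMA ** decays


def pk_batch(indices_by_id, p=16, k=4, rng=None):
    """Draw P identities, then K image indices per identity (P*K per batch)."""
    rng = rng or random.Random()
    batch = []
    for pid in rng.sample(sorted(indices_by_id), p):
        pool = indices_by_id[pid]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))
        else:
            # Assumed fallback: sample with replacement for small identities.
            batch.extend(rng.choices(pool, k=k))
    return batch


# Toy index: 20 identities with 6 image indices each.
toy_index = {pid: list(range(pid * 10, pid * 10 + 6)) for pid in range(20)}
batch = pk_batch(toy_index, rng=random.Random(0))
print("batch size:", len(batch))
for epoch in (0, 5, 10, 40, 70):
    print(epoch, lr_at_epoch(epoch))
```

Grouping K images per identity guarantees that every anchor in the batch has at least one positive, which the triplet term needs.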

Performance

Standalone Re-ID (Market-1501)

Variant	mAP (%)	Rank-1 (%)	Rank-5 (%)	Rank-10 (%)
60-epoch	73.73	89.32	96.04	97.74
120-epoch	75.15	90.66	96.79	98.04
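
For readers unfamiliar with these metrics, the sketch below shows how mAP and Rank-k are typically computed from per-query ranked gallery lists. It is a simplified illustration, not the evaluation script used for the numbers above: it assumes each query's gallery has already been sorted by similarity and filtered according to the benchmark protocol, and the function names are hypothetical.

```python
def average_precision(ranked_is_match):
    """AP for one query, given match flags in ranked order (best first)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def rank_k_hit(ranked_is_match, k):
    """True if at least one correct match appears in the top k results."""
    return any(ranked_is_match[:k])


def evaluate(queries, ks=(1, 5, 10)):
    """Mean AP and Rank-k accuracy over a list of per-query match flags."""
    n = len(queries)
    mean_ap = sum(average_precision(q) for q in queries) / n
    cmc = {k: sum(rank_k_hit(q, k) for q in queries) / n for k in ks}
    return mean_ap, cmc


# Toy data: two queries, four gallery items each.
toy_queries = [
    [True, False, True, False],
    [False, True, False, False],
]
mean_ap, cmc = evaluate(toy_queries)
print("mAP:", mean_ap)
print("Rank-k:", cmc)
```

Rank-k only asks whether a correct match appears near the top, while mAP rewards ranking all correct matches early, which is why mAP is the lower of the two in the table.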

Downstream - Single-camera tracking (MOT17, with DeepSORT)

Variant	HOTA (%)	MOTA (%)	IDF1 (%)	IDSW (count)
60-epoch	41.1	36.5	48.6	260
120-epoch	41.0	36.6	48.8	246

Downstream - Cross-camera tracking (Wildtrack, with ground-plane filter)

Variant	IDF1 (%)	IDP (%)	IDR (%)
60-epoch	16.99	23.80	13.21
120-epoch	18.72	26.41	14.50
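
IDF1 is the harmonic mean of ID precision (IDP) and ID recall (IDR), so the three columns can be cross-checked against each other. A short worked check, using only the values from the table (the function name is illustrative):

```python
def idf1_from(idp, idr):
    """Harmonic mean of ID precision and ID recall."""
    return 2 * idp * idr / (idp + idr)


rows = [("60-epoch", 23.80, 13.21), ("120-epoch", 26.41, 14.50)]
for name, idp, idr in rows:
    print(name, round(idf1_from(idp, idr), 2))
```

Both rows reproduce the reported IDF1 (16.99 and 18.72). Because the harmonic mean is pulled toward the smaller term, the low ID recall is what limits the cross-camera score.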

Usage

Download and load with huggingface_hub:

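from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="blank4hd/mctrack-reid",
    filename="best_60ep.pth",  # or "best_120ep.pth" for the ablation checkpoint
)

# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
# The model weights are stored under the "state_dict" key.
model_state = state["state_dict"]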

To use this with the model architecture, you need the ReIDModel class from the project repository. A minimal standalone loader (load_reid.py) is provided alongside this model card with the architecture definition inlined, so the model can be used without cloning the full project.

Intended Use

This model produces 256-dimensional appearance embeddings for cropped person images. Two crops of the same person are expected to produce embeddings with high cosine similarity; crops of different people produce embeddings with low similarity.
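
As a minimal sketch of how such embeddings are compared, the snippet below computes cosine similarity and picks the closest gallery entry for a query. The 4-dimensional vectors are toy stand-ins for the model's 256-dimensional embeddings, and the function names are illustrative rather than part of the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, gallery):
    """Index and score of the gallery embedding most similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings; real ones would come from the Re-ID model.
query = [0.9, 0.1, 0.0, 0.4]
gallery = [
    [0.0, 1.0, 0.2, 0.0],
    [0.8, 0.2, 0.1, 0.5],
    [-0.5, 0.3, 0.9, 0.0],
]
index, score = best_match(query, gallery)
print("best gallery index:", index, "similarity:", score)
```

In a tracking pipeline the similarity would usually be thresholded or converted to a distance (1 - similarity) before association; a suitable threshold depends on the deployment domain.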

Suitable for:

Person re-identification in research / academic settings
Appearance feature extraction in tracking pipelines (e.g., DeepSORT)
Educational demonstration of metric learning

Not suitable for:

Surveillance applications without explicit consent
Identification of individuals across populations (high false-positive rate in cross-domain settings)
Any use where reliability is safety-critical

Limitations

Domain gap. Trained on Market-1501 (collected on the Tsinghua University campus with six cameras). Performance degrades on outdoor pedestrian-square scenes (e.g., Wildtrack), where IDF1 drops to ~17-19% in cross-camera matching.
Person crops only. Expects the input to be a tightly-cropped person image. Whole scenes or non-person inputs produce meaningless embeddings.
Resolution sensitive. Trained at 256x128. Significantly different input resolutions will degrade quality.
No fairness audit. Not evaluated for performance disparities across demographic groups.

Training Details (compute and time)

Hardware: Apple M4 Pro (MPS backend)
Per-epoch time: ~46 seconds
Total training time: 60-epoch ~46 min; 120-epoch ~92 min
Memory usage: ~3 GB unified memory at batch size 64

Citation

This work was completed for the MSML 640 final project, Spring 2026.

Group 9 - MSML 640 Final Project
Multi-Camera Object Tracking with Person Re-Identification

Acknowledgments

Architecture inspired by Luo et al. ("Bag of Tricks and a Strong Baseline for Deep Person Re-Identification", CVPR Workshop 2019)
BNNeck design from the same paper
Triplet loss formulation from Hermans et al. ("In Defense of the Triplet Loss for Person Re-Identification", arXiv 2017)
Market-1501 dataset from Zheng et al. ("Scalable Person Re-Identification: A Benchmark", ICCV 2015)
