---
license: mit
library_name: pytorch
tags:
- person-reidentification
- person-reid
- computer-vision
- pytorch
- resnet
- bnneck
- triplet-loss
- multi-camera-tracking
datasets:
- Market-1501
metrics:
- mAP
- Rank-1
---
# MCTrack Re-ID Model
Person re-identification model trained on Market-1501 as part of the
**MSML 640 (Computer Vision) Final Project** at the University of Maryland.
This model is the appearance backbone used in our multi-camera object tracking
system.
## Model Variants
This repo contains two checkpoints:
- **best_60ep.pth** (primary) - trained for 60 epochs.
Used as the deployed Re-ID model in our final cross-camera demos due to
qualitatively cleaner cluster outputs.
- **best_120ep.pth** (ablation) - same recipe, trained for 120 epochs.
Slightly higher Re-ID accuracy (+1.4 mAP, +1.3 Rank-1 on Market-1501) but
only marginal improvement on downstream tracking tasks (see "Performance"
below).

Both checkpoints contain only the model weights, stored under the
`state_dict` key. No optimizer or scheduler state is included.
## Architecture
- **Backbone:** ResNet-50 (ImageNet-initialized; stride of the last stage
changed from 2 to 1 to retain higher-resolution features)
- **Pooling:** Global average pooling
- **Neck:** BNNeck (batch-normalization neck) - a BN layer placed before the
classifier, so that triplet loss is applied to the pre-BN feature and
cross-entropy to the post-BN feature
- **Embedding dimension:** 256
- **Total parameters:** ~25M
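
To make the stride change concrete, the sketch below computes the spatial size of the final feature map for a 256x128 input. It assumes the standard ResNet-50 layout (stem convolution, max-pool, and stages 2 to 4 each downsampling by 2); the function name is illustrative and not taken from the project code.

```python
def final_feature_map(height, width, last_stride):
    """Spatial size of the last ResNet-50 stage for a given input size.

    Assumes the standard layout: stem conv (stride 2), max-pool (stride 2),
    stage 1 (stride 1), stages 2 and 3 (stride 2 each), and a last stage
    whose stride is `last_stride`.
    """
    total_stride = 2 * 2 * 1 * 2 * 2 * last_stride
    return height // total_stride, width // total_stride


print(final_feature_map(256, 128, last_stride=2))  # (8, 4)
print(final_feature_map(256, 128, last_stride=1))  # (16, 8)
```

With the last stride set to 1, global average pooling runs over a 16x8 grid instead of 8x4, which is the "higher-resolution features" effect described above.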
## Training Recipe
| Setting | Value |
| ---------------- | ---------------------------------------------------------------- |
| Dataset | Market-1501 (12,936 train images, 751 train IDs) |
| Identity sampler | P=16 IDs x K=4 instances per batch |
| Batch size | 64 |
| Optimizer | Adam, lr=3.5e-4, weight_decay=5e-4 |
| LR schedule | Linear warmup (10 epochs), step decay (x0.1 at epochs 40 and 70) |
| Loss | Combined CE (label smoothing 0.1) + triplet (soft margin) |
| Image size | 256x128 |
| Augmentation | Random horizontal flip, random erasing |
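
The learning-rate schedule in the table can be sketched as a plain function of the epoch index. This is one possible reading, not the project's training code: epochs are assumed to be 0-indexed, and the warmup is assumed to start at 10% of the base rate (`warmup_start_factor` is a hypothetical parameter; the table does not state the starting value).

```python
def lr_at_epoch(epoch, base_lr=3.5e-4, warmup_epochs=10,
                milestones=(40, 70), gamma=0.1, warmup_start_factor=0.1):
    """Learning rate for a 0-indexed epoch.

    warmup_start_factor is an assumption: the model card does not state
    where the linear warmup ramp begins.
    """
    if epoch < warmup_epochs:
        # Linear ramp from warmup_start_factor * base_lr up to base_lr.
        progress = epoch / warmup_epochs
        factor = warmup_start_factor + (1.0 - warmup_start_factor) * progress
    else:
        factor = 1.0
    # Multiply by gamma once for every milestone already reached.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor * gamma ** decays


for epoch in (0, 5, 10, 39, 40, 70):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

Under this reading, a 60-epoch run would only reach the first decay milestone (epoch 40); the second (epoch 70) would apply to the 120-epoch run alone.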
## Performance
### Standalone Re-ID (Market-1501)
| Variant | mAP | Rank-1 | Rank-5 | Rank-10 |
| --------- | ----- | ------ | ------ | ------- |
| 60-epoch | 73.73 | 89.32 | 96.04 | 97.74 |
| 120-epoch | 75.15 | 90.66 | 96.79 | 98.04 |
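
For readers unfamiliar with the metrics, here is a minimal sketch of how Rank-k and mAP are typically computed from ranked gallery lists. It is a simplified illustration on toy data, not the evaluation script used for the table: the standard Market-1501 protocol also removes gallery images of the query identity taken by the query's own camera, which is omitted here.

```python
def rank_k_hit(matches, k):
    """1.0 if any of the top-k ranked gallery entries is a correct match."""
    return 1.0 if any(matches[:k]) else 0.0


def average_precision(matches):
    """AP for one query, given match flags in ranked order (best first)."""
    hits = 0
    precision_sum = 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            # Precision measured at each position where a match occurs.
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


# Toy ranked lists for two queries (True = same identity as the query).
queries = [
    [True, False, True, False],
    [False, True, False, False],
]
rank1 = sum(rank_k_hit(m, 1) for m in queries) / len(queries)
mean_ap = sum(average_precision(m) for m in queries) / len(queries)
print(f"Rank-1: {rank1:.2f}, mAP: {mean_ap:.4f}")
```

Rank-1 only rewards the top result, while mAP accounts for the position of every correct match, which is why the two numbers in the table differ so much.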
### Downstream - Single-camera tracking (MOT17, with DeepSORT)
| Variant | HOTA | MOTA | IDF1 | IDSW |
| --------- | ---- | ---- | ---- | ---- |
| 60-epoch | 41.1 | 36.5 | 48.6 | 260 |
| 120-epoch | 41.0 | 36.6 | 48.8 | 246 |
### Downstream - Cross-camera tracking (Wildtrack, with ground-plane filter)
| Variant | IDF1 | IDP | IDR |
| --------- | ----- | ----- | ----- |
| 60-epoch | 16.99 | 23.80 | 13.21 |
| 120-epoch | 18.72 | 26.41 | 14.50 |
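
IDF1 is the harmonic mean of ID precision (IDP) and ID recall (IDR), so the three columns above can be cross-checked against each other. The short sketch below does this for both rows; the helper name is illustrative.

```python
def idf1_from_idp_idr(idp, idr):
    """IDF1 as the harmonic mean of ID precision and ID recall."""
    return 2.0 * idp * idr / (idp + idr)


# (IDP, IDR, reported IDF1) taken from the Wildtrack table above.
reported = {
    "60-epoch": (23.80, 13.21, 16.99),
    "120-epoch": (26.41, 14.50, 18.72),
}
for variant, (idp, idr, idf1) in reported.items():
    recomputed = round(idf1_from_idp_idr(idp, idr), 2)
    print(variant, "recomputed:", recomputed, "reported:", idf1)
```

Both rows are consistent. In each case IDR is well below IDP, so recall is the factor pulling IDF1 down.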
## Usage
Download and load with `huggingface_hub`:
```python
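import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="blank4hd/mctrack-reid",
    filename="best_60ep.pth",
)

# weights_only=False unpickles arbitrary Python objects, so only load
# checkpoints from a source you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# The model weights are stored under the "state_dict" key.
model_state = state["state_dict"]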
```
To use this with the model architecture, you need the `ReIDModel` class from
the project repository. A minimal standalone loader (`load_reid.py`) is
provided alongside this model card with the architecture definition inlined,
so the model can be used without cloning the full project.
## Intended Use
This model produces 256-dimensional appearance embeddings for cropped person
images. Two crops of the same person are expected to produce embeddings with
high cosine similarity; crops of different people produce embeddings with low
similarity.
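
As a rough illustration of how such embeddings are compared, the sketch below matches a query embedding against a small gallery by cosine similarity. The 4-D vectors and the `track_a` / `track_b` keys are made-up stand-ins for real 256-D model outputs, and `best_match` is a hypothetical helper rather than part of the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, gallery):
    """Return (key, similarity) of the gallery embedding closest to query."""
    scored = ((key, cosine_similarity(query, emb)) for key, emb in gallery.items())
    return max(scored, key=lambda item: item[1])


# Toy 4-D vectors standing in for real 256-D embeddings.
query = [0.9, 0.1, 0.0, 0.4]
gallery = {
    "track_a": [0.8, 0.2, 0.1, 0.5],
    "track_b": [-0.3, 0.9, 0.4, 0.0],
}
key, score = best_match(query, gallery)
print(key, round(score, 3))
```

In a tracking pipeline the same comparison is usually thresholded, and a suitable threshold depends on the target domain; none is specified here.
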

**Suitable for:**
- Person re-identification in research / academic settings
- Appearance feature extraction in tracking pipelines (e.g., DeepSORT)
- Educational demonstration of metric learning

**Not suitable for:**
- Surveillance applications without explicit consent
- Identification of individuals across populations (high false-positive rate
in cross-domain settings)
- Any use where reliability is safety-critical
## Limitations
- **Domain gap.** Trained on Market-1501 (Tsinghua University campus, 6
surveillance cameras). Performance degrades on outdoor pedestrian-square
scenes (e.g., Wildtrack), where IDF1 drops to ~17-19% in cross-camera
matching.
- **Person crops only.** Expects a tightly cropped image of a single person
as input. Whole scenes or non-person inputs produce meaningless embeddings.
- **Resolution sensitive.** Trained at 256x128. Significantly different input
resolutions will degrade quality.
- **No fairness audit.** Not evaluated for performance disparities across
demographic groups.
## Training Details (compute and time)
- **Hardware:** Apple M4 Pro (MPS backend)
- **Per-epoch time:** ~46 seconds
- **Total training time:** 60-epoch ~46 min; 120-epoch ~92 min
- **Memory usage:** ~3 GB unified memory at batch size 64
## Citation
This work was completed for the MSML 640 final project, Spring 2026.
```
Group 9 - MSML 640 Final Project
Multi-Camera Object Tracking with Person Re-Identification
```
## Acknowledgments
- Architecture inspired by Luo et al. ("Bag of Tricks and a Strong Baseline
for Deep Person Re-Identification", CVPR Workshop 2019)
- BNNeck design from the same paper
- Triplet loss formulation from Hermans et al. ("In Defense of the Triplet
Loss for Person Re-Identification", arXiv 2017)
- Market-1501 dataset from Zheng et al. ("Scalable Person Re-Identification:
A Benchmark", ICCV 2015)