---
license: mit
library_name: pytorch
tags:
  - person-reidentification
  - person-reid
  - computer-vision
  - pytorch
  - resnet
  - bnneck
  - triplet-loss
  - multi-camera-tracking
datasets:
  - Market-1501
metrics:
  - mAP
  - Rank-1
---

# MCTrack Re-ID Model

Person re-identification model trained on Market-1501 as part of the
**MSML 640 (Computer Vision) Final Project** at the University of Maryland.
This model is the appearance backbone used in our multi-camera object tracking
system.

## Model Variants

This repo contains two checkpoints:

- **best_60ep.pth** (primary) - trained for 60 epochs.
  Used as the deployed Re-ID model in our final cross-camera demos due to
  qualitatively cleaner cluster outputs.
- **best_120ep.pth** (ablation) - same recipe, trained for 120 epochs.
  Slightly higher Re-ID accuracy (+1.42 mAP, +1.34 Rank-1) but little change
  downstream: MOT17 metrics are nearly flat and Wildtrack IDF1 rises from
  16.99 to 18.72 (see "Performance" below).

Both checkpoints contain only the model `state_dict`, with no optimizer or
scheduler state.

## Architecture

- **Backbone:** ResNet-50 (ImageNet-initialized; stride of the last stage
  changed from 2 to 1 to retain a higher-resolution feature map)
- **Pooling:** Global average pooling
- **Neck:** BNNeck (Batch Normalization neck) - separates triplet-loss
  features from classification features
- **Embedding dimension:** 256
- **Total parameters:** ~25M
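
As a rough illustration of how the BNNeck separates the two feature streams, the head can be sketched as below. This is not the project's `ReIDModel`: the class name `BNNeckHead`, the linear 2048-to-256 projection, and its position before the batch-norm layer are assumptions made for the example, and a random tensor stands in for the ResNet-50 feature map.

```python
import torch
import torch.nn as nn


class BNNeckHead(nn.Module):
    """Hypothetical head: pooled features -> 256-d embedding -> BNNeck -> ID logits."""

    def __init__(self, in_dim: int = 2048, embed_dim: int = 256, num_ids: int = 751):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.reduce = nn.Linear(in_dim, embed_dim)   # assumed 2048 -> 256 projection
        self.bnneck = nn.BatchNorm1d(embed_dim)
        self.bnneck.bias.requires_grad_(False)       # Luo et al. freeze the BN shift
        self.classifier = nn.Linear(embed_dim, num_ids, bias=False)

    def forward(self, feature_map):
        pooled = self.pool(feature_map).flatten(1)
        feat_triplet = self.reduce(pooled)   # before BN: fed to the triplet loss
        feat_bn = self.bnneck(feat_triplet)  # after BN: fed to the classifier
        logits = self.classifier(feat_bn)
        return feat_triplet, feat_bn, logits


head = BNNeckHead()
# Stand-in for the backbone output: a 256x128 crop at overall stride 16 gives 16x8.
fmap = torch.randn(4, 2048, 16, 8)
feat_triplet, feat_bn, logits = head(fmap)
```

In the Luo et al. baseline the post-BN feature is also the one used for retrieval at inference; whether this project does the same is not stated here.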

## Training Recipe

| Setting          | Value                                                            |
| ---------------- | ---------------------------------------------------------------- |
| Dataset          | Market-1501 (12,936 train images, 751 train IDs)                 |
| Identity sampler | P=16 IDs x K=4 instances per batch                               |
| Batch size       | 64                                                               |
| Optimizer        | Adam, lr=3.5e-4, weight_decay=5e-4                               |
| LR schedule      | Linear warmup (10 epochs), step decay (x0.1 at epochs 40 and 70) |
| Loss             | Combined CE (label smoothing 0.1) + triplet (soft margin)        |
| Image size       | 256x128                                                          |
| Augmentation     | Random horizontal flip, random erasing                           |
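
The identity sampler and the learning-rate schedule from the table can be sketched in plain Python. The function names, the 0-indexed epoch convention, and the warmup starting at one tenth of the base rate are assumptions; the project's own training code may differ.

```python
import random
from collections import defaultdict


def pk_batch(labels, p=16, k=4, rng=random):
    """Draw one batch of dataset indices: p identities, k images each."""
    by_id = defaultdict(list)
    for index, pid in enumerate(labels):
        by_id[pid].append(index)
    batch = []
    for pid in rng.sample(sorted(by_id), p):
        pool = by_id[pid]
        # Identities with fewer than k images are sampled with replacement.
        picks = rng.sample(pool, k) if len(pool) >= k else rng.choices(pool, k=k)
        batch.extend(picks)
    return batch


def learning_rate(epoch, base_lr=3.5e-4, warmup_epochs=10, milestones=(40, 70), gamma=0.1):
    """Learning rate for a 0-indexed epoch: linear warmup, then step decay."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** drops


# Toy label list: 20 identities with 6 images each.
labels = [i // 6 for i in range(120)]
batch = pk_batch(labels, rng=random.Random(0))  # 16 x 4 = 64 indices
```

Under this schedule a 60-epoch run would only reach the first decay milestone (epoch 40); the second (epoch 70) would apply only to the 120-epoch run.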

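The two loss terms in the recipe can be written out in pure Python for small inputs. This is a sketch under assumptions: batch-hard mining (as in Hermans et al.) and an unweighted sum of the two terms are not confirmed by the recipe table, and the function names are illustrative.

```python
import math


def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]
    n = len(logits)
    # The true class gets 1 - eps + eps/n; every other class gets eps/n.
    return -sum(((1 - eps) * (i == target) + eps / n) * lp
                for i, lp in enumerate(log_probs))


def soft_margin_triplet(embeddings, labels):
    """Batch-hard triplet loss with the soft margin ln(1 + exp(d_ap - d_an))."""
    losses = []
    for i, anchor in enumerate(embeddings):
        d = [math.dist(anchor, other) for other in embeddings]
        hardest_pos = max(d[j] for j in range(len(labels)) if labels[j] == labels[i])
        hardest_neg = min(d[j] for j in range(len(labels)) if labels[j] != labels[i])
        x = hardest_pos - hardest_neg
        # Numerically stable softplus of x.
        losses.append(max(x, 0.0) + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses)


# Toy batch: two identities, two 2-d embeddings each.
embeddings = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.0, 0.2)]
labels = [0, 0, 1, 1]
# Assumed equal weighting of the two terms.
total = smoothed_cross_entropy([2.0, 0.5, -1.0], target=0) + soft_margin_triplet(embeddings, labels)
```

The soft margin replaces the hinge `max(0, m + d_ap - d_an)` with a smooth function, so triplets that are already well separated still contribute a small gradient.
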
## Performance

### Standalone Re-ID (Market-1501)

| Variant   | mAP   | Rank-1 | Rank-5 | Rank-10 |
| --------- | ----- | ------ | ------ | ------- |
| 60-epoch  | 73.73 | 89.32  | 96.04  | 97.74   |
| 120-epoch | 75.15 | 90.66  | 96.79  | 98.04   |
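
For readers unfamiliar with these metrics, the per-query computation can be sketched as follows. It is a simplification: the standard Market-1501 protocol also removes gallery images that share both identity and camera with the query before ranking, which is omitted here. Rank-k and mAP are then averages of these values over all queries.

```python
def rank_k_hit(ranked_matches, k):
    """True if a correct gallery match appears within the top k results."""
    return any(ranked_matches[:k])


def average_precision(ranked_matches):
    """Mean of the precision values measured at each correct match."""
    hits, precisions = 0, []
    for position, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy query: gallery sorted by descending similarity, True marks the same identity.
ranked = [False, True, False, True]
ap = average_precision(ranked)  # (1/2 + 2/4) / 2
```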

### Downstream - Single-camera tracking (MOT17, with DeepSORT)

| Variant   | HOTA | MOTA | IDF1 | IDSW |
| --------- | ---- | ---- | ---- | ---- |
| 60-epoch  | 41.1 | 36.5 | 48.6 | 260  |
| 120-epoch | 41.0 | 36.6 | 48.8 | 246  |

### Downstream - Cross-camera tracking (Wildtrack, with ground-plane filter)

| Variant   | IDF1  | IDP   | IDR   |
| --------- | ----- | ----- | ----- |
| 60-epoch  | 16.99 | 23.80 | 13.21 |
| 120-epoch | 18.72 | 26.41 | 14.50 |
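
IDF1 is the harmonic mean of identification precision (IDP) and identification recall (IDR), so the IDF1 column can be checked against the other two. The helper name `idf1` is only for this example.

```python
def idf1(idp, idr):
    """Harmonic mean of identification precision and recall."""
    return 2 * idp * idr / (idp + idr)


for variant, idp, idr in [("60-epoch", 23.80, 13.21), ("120-epoch", 26.41, 14.50)]:
    print(f"{variant}: IDF1 = {idf1(idp, idr):.2f}")
# 60-epoch: IDF1 = 16.99
# 120-epoch: IDF1 = 18.72
```

In both rows IDR is well below IDP, so recall is the main factor holding IDF1 down.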

## Usage

Download and load with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download
import torch

ckpt_path = hf_hub_download(
    repo_id="blank4hd/mctrack-reid",
    filename="best_60ep.pth",
)

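# weights_only=False unpickles arbitrary objects, so load only checkpoints you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
# The model weights are stored under the "state_dict" key.
model_state = state["state_dict"]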
```

To use these weights, load `model_state` into the `ReIDModel` class from the
project repository. A minimal standalone loader (`load_reid.py`) with the
architecture definition inlined is provided alongside this model card, so the
model can be used without cloning the full project.

## Intended Use

This model produces 256-dimensional appearance embeddings for cropped person
images. Two crops of the same person are expected to produce embeddings with
high cosine similarity; crops of different people produce embeddings with low
similarity.
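
A minimal sketch of comparing two embeddings is shown below. The 4-dimensional vectors stand in for real 256-dimensional outputs, and the `threshold` value is a placeholder, not a calibrated operating point for this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(a, b, threshold=0.5):
    """Hypothetical decision rule; tune the threshold on held-out data."""
    return cosine_similarity(a, b) >= threshold


crop_a = [0.9, 0.1, 0.0, 0.4]
crop_b = [0.8, 0.2, 0.1, 0.5]   # similar direction to crop_a
crop_c = [-0.1, 0.9, 0.4, -0.2]  # different direction

match_ab = same_person(crop_a, crop_b)
match_ac = same_person(crop_a, crop_c)
```

In a tracking pipeline the similarity is usually turned into a cost (for example `1 - cosine_similarity`) for the association step.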

**Suitable for:**

- Person re-identification in research / academic settings
- Appearance feature extraction in tracking pipelines (e.g., DeepSORT)
- Educational demonstration of metric learning

**Not suitable for:**

- Surveillance applications without explicit consent
- Identification of individuals across populations (high false-positive rate
  in cross-domain settings)
- Any use where reliability is safety-critical

## Limitations

- **Domain gap.** Trained on Market-1501 (Tsinghua University campus, 6
  surveillance cameras). Performance degrades on outdoor pedestrian-square
  scenes (e.g., Wildtrack), where cross-camera IDF1 drops to ~17-19%.
- **Person crops only.** Expects the input to be a tightly cropped person
  image. Whole scenes or non-person inputs produce meaningless embeddings.
- **Resolution sensitive.** Trained at 256x128. Significantly different input
  resolutions will degrade quality.
- **No fairness audit.** Not evaluated for performance disparities across
  demographic groups.

## Training Details (compute and time)

- **Hardware:** Apple M4 Pro (MPS backend)
- **Per-epoch time:** ~46 seconds
- **Total training time:** 60-epoch ~46 min; 120-epoch ~92 min
- **Memory usage:** ~3 GB unified memory at batch size 64

## Citation

This work was completed for the MSML 640 final project, Spring 2026.

```
Group 9 - MSML 640 Final Project
Multi-Camera Object Tracking with Person Re-Identification
```

## Acknowledgments

- Architecture inspired by Luo et al. ("Bag of Tricks and a Strong Baseline
  for Deep Person Re-Identification", CVPR Workshop 2019)
- BNNeck design from the same paper
- Triplet loss formulation from Hermans et al. ("In Defense of the Triplet
  Loss for Person Re-Identification", arXiv 2017)
- Market-1501 dataset from Zheng et al. ("Scalable Person Re-Identification:
  A Benchmark", ICCV 2015)