---
license: mit
library_name: pytorch
tags:
- person-reidentification
- person-reid
- computer-vision
- pytorch
- resnet
- bnneck
- triplet-loss
- multi-camera-tracking
datasets:
- Market-1501
metrics:
- mAP
- Rank-1
---
# MCTrack Re-ID Model
Person re-identification model trained on Market-1501 as part of the
**MSML 640 (Computer Vision) Final Project** at the University of Maryland.
This model is the appearance backbone used in our multi-camera object tracking
system.
## Model Variants
This repo contains two checkpoints:
- **best_60ep.pth** (primary) - trained for 60 epochs.
Used as the deployed Re-ID model in our final cross-camera demos due to
qualitatively cleaner cluster outputs.
- **best_120ep.pth** (ablation) - same recipe, trained for 120 epochs.
Higher standalone Re-ID accuracy (+1.4 mAP, +1.3 Rank-1 on Market-1501) but
only marginal improvement on downstream tasks (see "Performance" below).
Both checkpoints contain only the model weights, stored under the
`state_dict` key, with no optimizer or scheduler state.
## Architecture
- **Backbone:** ResNet-50 (ImageNet-initialized; final stride changed from
2 to 1 to retain higher-resolution features)
- **Pooling:** Global average pooling
- **Neck:** BNNeck (batch-normalization neck) - the triplet loss is applied
to the feature before the BN layer and the classification loss to the
feature after it
- **Embedding dimension:** 256
- **Total parameters:** ~25M
## Training Recipe
| Setting | Value |
| ---------------- | ---------------------------------------------------------------- |
| Dataset | Market-1501 (12,936 train images, 751 train IDs) |
| Identity sampler | P=16 IDs x K=4 instances per batch |
| Batch size | 64 |
| Optimizer | Adam, lr=3.5e-4, weight_decay=5e-4 |
| LR schedule | Linear warmup (10 epochs), step decay (x0.1 at epochs 40 and 70) |
| Loss | Combined CE (label smoothing 0.1) + triplet (soft margin) |
| Image size | 256x128 |
| Augmentation | Random horizontal flip, random erasing |
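
The LR schedule row can be read as the following per-epoch function. This is a sketch under assumptions: epochs are 0-indexed, and the warmup starts at 10% of the base rate (`warmup_start` is a hypothetical parameter; the table does not state the starting value).

```python
def learning_rate(epoch, base_lr=3.5e-4, warmup_epochs=10,
                  milestones=(40, 70), gamma=0.1, warmup_start=0.1):
    """Learning rate for a 0-indexed epoch (warmup_start is an assumption)."""
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_start * base_lr up to base_lr.
        factor = warmup_start + (1.0 - warmup_start) * epoch / warmup_epochs
        return base_lr * factor
    # Multiply by gamma once for every milestone already reached.
    passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    for e in (0, 5, 10, 39, 40, 69, 70):
        print(e, learning_rate(e))
```

If both variants use these milestones unchanged, the 60-epoch run only ever reaches the first decay step (epoch 40), while the 120-epoch run sees both.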
## Performance
### Standalone Re-ID (Market-1501)
| Variant | mAP | Rank-1 | Rank-5 | Rank-10 |
| --------- | ----- | ------ | ------ | ------- |
| 60-epoch | 73.73 | 89.32 | 96.04 | 97.74 |
| 120-epoch | 75.15 | 90.66 | 96.79 | 98.04 |
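
For readers unfamiliar with the metrics in this table, the sketch below shows how mAP and Rank-k are typically computed from ranked gallery lists. It is illustrative only and is not the evaluation script used for the numbers above: the function names are hypothetical, and the Market-1501 protocol's removal of same-camera and junk gallery images is omitted.

```python
def average_precision(ranked_matches):
    """AP for one query.

    ranked_matches[i] is True if the i-th gallery item (sorted by descending
    similarity) has the same identity as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def evaluate(queries, ks=(1, 5, 10)):
    """Return (mAP, {k: Rank-k accuracy}) over a list of ranked match lists."""
    n = len(queries)
    mean_ap = sum(average_precision(q) for q in queries) / n
    # Rank-k counts a query as correct if any of its top-k results match.
    cmc = {k: sum(any(q[:k]) for q in queries) / n for k in ks}
    return mean_ap, cmc


if __name__ == "__main__":
    toy = [[True, False, True], [False, True, False]]
    print(evaluate(toy))
```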
### Downstream - Single-camera tracking (MOT17, with DeepSORT)
| Variant | HOTA | MOTA | IDF1 | IDSW |
| --------- | ---- | ---- | ---- | ---- |
| 60-epoch | 41.1 | 36.5 | 48.6 | 260 |
| 120-epoch | 41.0 | 36.6 | 48.8 | 246 |
### Downstream - Cross-camera tracking (Wildtrack, with ground-plane filter)
| Variant | IDF1 | IDP | IDR |
| --------- | ----- | ----- | ----- |
| 60-epoch | 16.99 | 23.80 | 13.21 |
| 120-epoch | 18.72 | 26.41 | 14.50 |
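
IDF1 is the harmonic mean of ID precision (IDP) and ID recall (IDR), so the rows above can be checked directly. The helper name `idf1` is just for illustration.

```python
def idf1(idp, idr):
    """Harmonic mean of ID precision and ID recall (same units as the inputs)."""
    return 2.0 * idp * idr / (idp + idr)


if __name__ == "__main__":
    print(round(idf1(23.80, 13.21), 2))  # 60-epoch row
    print(round(idf1(26.41, 14.50), 2))  # 120-epoch row
```

Both rows reproduce the reported IDF1 values to within rounding. In both variants IDR is well below IDP, so missed identity matches weigh on the score more than wrong ones.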
## Usage
Download and load with `huggingface_hub`:
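```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="blank4hd/mctrack-reid",
    filename="best_60ep.pth",
)

# weights_only=False unpickles arbitrary Python objects; only use it with
# checkpoints you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
model_state = state["state_dict"]  # parameter name -> tensor
```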
To use this with the model architecture, you need the `ReIDModel` class from
the project repository. A minimal standalone loader (`load_reid.py`) is
provided alongside this model card with the architecture definition inlined,
so the model can be used without cloning the full project.
## Intended Use
This model produces 256-dimensional appearance embeddings for cropped person
images. Two crops of the same person are expected to produce embeddings with
high cosine similarity; crops of different people produce embeddings with low
similarity.
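
The comparison described above can be sketched in plain Python. The toy 4-dimensional vectors stand in for real 256-dimensional embeddings, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [0.9, 0.1, 0.0, 0.4]
    gallery = [
        [0.0, 1.0, 0.2, 0.0],  # points in a different direction
        [0.8, 0.2, 0.1, 0.5],  # close to the query
    ]
    print(rank_gallery(query, gallery))
```

In a tracking pipeline, the ranked similarities would usually be combined with a match threshold or an assignment step; suitable thresholds depend on the target domain and are not specified here.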
**Suitable for:**
- Person re-identification in research / academic settings
- Appearance feature extraction in tracking pipelines (e.g., DeepSORT)
- Educational demonstration of metric learning
**Not suitable for:**
- Surveillance applications without explicit consent
- Identification of individuals across populations (high false-positive rate
in cross-domain settings)
- Any use where reliability is safety-critical
## Limitations
- **Domain gap.** Trained on Market-1501 (Tsinghua University campus, 6
cameras). Performance degrades on outdoor pedestrian-square
scenes (e.g., Wildtrack), where IDF1 drops to ~17-19% in cross-camera
matching.
- **Person crops only.** Expects the input to be a tightly cropped person
image. Whole scenes or non-person inputs produce meaningless embeddings.
- **Resolution sensitive.** Trained at 256x128. Significantly different input
resolutions will degrade quality.
- **No fairness audit.** Not evaluated for performance disparities across
demographic groups.
## Training Details (compute and time)
- **Hardware:** Apple M4 Pro (MPS backend)
- **Per-epoch time:** ~46 seconds
- **Total training time:** 60-epoch ~46 min; 120-epoch ~92 min
- **Memory usage:** ~3 GB unified memory at batch size 64
## Citation
This work was completed for the MSML 640 final project, Spring 2026.
```
Group 9 - MSML 640 Final Project
Multi-Camera Object Tracking with Person Re-Identification
```
## Acknowledgments
- Architecture inspired by Luo et al. ("Bag of Tricks and a Strong Baseline
for Deep Person Re-Identification", CVPR Workshop 2019)
- BNNeck design from the same paper
- Triplet loss formulation from Hermans et al. ("In Defense of the Triplet
Loss for Person Re-Identification", arXiv 2017)
- Market-1501 dataset from Zheng et al. ("Scalable Person Re-Identification:
A Benchmark", ICCV 2015)