ljchang committed · Commit 1e2637b · verified · 1 Parent(s): ce3cddf

Upload README.md with huggingface_hub

  1. README.md +65 -233
README.md CHANGED
@@ -1,262 +1,94 @@
1
  ---
2
  license: other
3
  license_name: insightface-non-commercial-research
4
  license_link: https://github.com/deepinsight/insightface/blob/master/LICENSE
5
- tags:
6
- - face-recognition
7
- - arcface
8
- - face-identification
9
- - face-embeddings
10
- - pytorch
11
- - safetensors
12
- - py-feat
13
- library_name: pytorch
14
- pipeline_tag: image-feature-extraction
15
- language:
16
- - en
17
  ---
18
 
19
- # ArcFace ResNet50 (WebFace600K) for py-feat
20
 
21
- PyTorch port of the **ArcFace ResNet50** face-recognition model
22
- distributed by InsightFace as part of the
23
- [`buffalo_l`](https://github.com/deepinsight/insightface/tree/master/python-package#use-the-buffalo_l-pack)
24
- pack (file `w600k_r50.onnx`). Repackaged as a `.safetensors` file for
25
- use as a drop-in identity detector inside
26
- [`py-feat`](https://github.com/cosanlab/py-feat) ≥ 0.7.
27
 
28
- This repo distributes **only** the recognition (face-embedding) model
29
- from `buffalo_l`. The face-detection / landmarks models in `buffalo_l`
30
- are not used by py-feat (we have our own RetinaFace-R34 detector).
31
 
32
- ## Quick start
33
 
34
- ```python
35
- from feat import Detector
36
 
37
- # ArcFace is the default identity model in py-feat 0.7
38
- detector = Detector() # implicitly uses arcface
39
- # or explicitly:
40
- detector = Detector(identity_model="arcface")
41
 
42
- fex = detector.detect("video.mp4")
43
- # fex.identity_embeddings is a [n_faces, 512] embedding table.
44
- # fex.Identity is a connected-components cluster label per face.
45
  ```
46
-
47
- ## Model details
48
-
49
- | Property | Value |
50
- |---|---|
51
- | Architecture | IResNet-50 (Improved ResNet, fused-BN form) |
52
- | Parameters | 43.6 M |
53
- | Input | RGB face crop, 112×112, pixel range `[0, 1]` (the wrapper rescales to `[-1, 1]`) |
54
- | Output | 512-dim embedding vector (L2-normalized at use time, not at output) |
55
- | File size | 166 MB (safetensors, fp32) |
56
- | Loss | ArcFace additive angular margin softmax (Deng et al., 2019) |
57
- | Training data | WebFace600K — Tsinghua's curated subset of WebFace260M (≈600K identities) |
58
- | Trainer / source | InsightFace (Guo, Deng et al.) — from `buffalo_l` v0.7 release |
59
-
60
- ## Reported benchmarks
61
-
62
- Benchmark numbers below are *as reported by InsightFace* on the original
63
- ONNX model. The PyTorch port we ship is verified bit-equivalent (max
64
- absolute output difference 5.9e-6, per-row cosine similarity = 1.0 on
65
- random inputs), so these numbers carry over.
66
-
67
- | Benchmark | Score |
68
- |---|---|
69
- | LFW (verification accuracy) | 99.83 % |
70
- | CFP-FP | 98.86 % |
71
- | AgeDB-30 | 98.13 % |
72
- | **IJB-C** (TAR @ FAR = 1e-4) | **96.18 %** |
73
- | IJB-B (TAR @ FAR = 1e-4) | 95.05 % |
74
-
75
- For the FEAT use case (clustering identities across video frames with
76
- varied pose and expression), IJB-C is the most relevant benchmark. The
77
- prior default (FaceNet/VGGFace2) reaches roughly 80 % TAR @ FAR=1e-4 on
78
- IJB-C — ArcFace is ~16 percentage points better, which is the gap
79
- that drives FEAT's clustering quality lift.
80
-
81
- ## In py-feat: empirical comparison
82
-
83
- On `multi_face.jpg` (5 different people, retinaface_r34 face crops):
84
-
85
- | Identity model | Min off-diag pairwise cos | Max off-diag pairwise cos | Mean off-diag pairwise cos |
86
- |---|---|---|---|
87
- | FaceNet (prior default) | -0.07 | **0.76** | 0.23 |
88
- | ArcFace R50 (this model) | 0.05 | **0.35** | 0.13 |
89
-
90
- At a typical clustering threshold of 0.5, the FaceNet's max cross-similarity
91
- of 0.76 falsely merges two different people into one identity. ArcFace's
92
- max of 0.35 leaves comfortable margin and keeps all 5 identities distinct.
93
-
94
- Inference cost change is small: 13.0 → 13.5 ms/frame (+4 %) on M5 MBP MPS,
95
- batch=16, full retinaface_r34 + svm AU + identity pipeline on a 472-frame
96
- video. ArcFace's larger 43.6M-param IR-50 backbone is offset by its
97
- smaller 112×112 input (vs. FaceNet's 160×160).
98
-
99
- ## Intended use
100
-
101
- - Face-identity clustering in videos / image batches for affective-science
102
- research, where multiple subjects need to be tracked across frames.
103
- - Retrieval / verification (cosine similarity over the 512-dim embedding).
104
- - Building feature tables that downstream analyses can group on.
105
-
106
- ### Out of scope
107
-
108
- - **Surveillance / law-enforcement identification** of unconsented subjects.
109
- The training data (WebFace600K) was assembled by web scraping; subjects
110
- did not opt in to face-recognition training. Use that informs your
111
- consent / IRB process accordingly.
112
- - **High-stakes identity verification** (e.g., access control, financial
113
- authentication) where false-positive costs are real. Demographic
114
- fairness was not measured by us; the InsightFace authors document
115
- some cross-group disparity.
116
- - **Biometric identification of children.** WebFace600K is dominated by
117
- adult subjects.
118
-
119
- ## Limitations and biases
120
-
121
- - **Demographic bias.** Web-scraped face datasets like WebFace260M /
122
- WebFace600K are known to over-represent young / Asian / male subjects
123
- relative to the global population. Verification accuracy varies
124
- measurably across demographic groups; downstream studies should
125
- validate per-group performance before publishing claims.
126
- - **Pose extremes.** The model was trained primarily on ~frontal faces.
127
- Profile views (yaw > 60°) produce noticeably noisier embeddings.
128
- - **Occlusion / image quality.** Heavily occluded faces (masks, hands,
129
- partial visibility) and very small / blurry crops will return
130
- embeddings, but those embeddings should be treated as low-confidence.
131
- py-feat has no built-in quality filtering yet (planned for a future
132
- release; see `docs/superpowers/specs/2026-05-02-identity-detection-roadmap.md`
133
- in the py-feat repo).
134
- - **Children.** Performance on faces under ~12 years old is not
135
- characterized.
136
-
137
- ## License
138
-
139
- This model is distributed under the **InsightFace non-commercial
140
- research license**:
141
- [https://github.com/deepinsight/insightface/blob/master/LICENSE](https://github.com/deepinsight/insightface/blob/master/LICENSE).
142
-
143
- Two layers to be aware of:
144
-
145
- 1. **Model code (architecture and training scripts):** MIT-licensed by
146
- InsightFace.
147
- 2. **Pretrained weights (this repo's `.safetensors`):** Non-commercial
148
- research only. The underlying training data (WebFace600K, derived
149
- from WebFace260M assembled by Tsinghua University) is also research-
150
- only.
151
-
152
- py-feat itself is MIT-licensed, but the MIT license does **not** override the
153
- upstream weight license. Commercial users must independently validate
154
- that their use case is compatible with the InsightFace and WebFace260M
155
- terms — py-feat does not grant you commercial rights to these weights.
156
-
157
- ## Reproducing the conversion
158
-
159
- ```bash
160
- # 1. Pull the buffalo_l pack (~275 MB) from InsightFace's GitHub release
161
- curl -L -o /tmp/buffalo_l.zip \
162
- https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
163
- unzip -o /tmp/buffalo_l.zip w600k_r50.onnx -d /tmp/
164
-
165
- # 2. Convert + verify against the ONNX runtime output
166
- python scripts/convert_arcface_onnx_to_safetensors.py \
167
- --onnx /tmp/w600k_r50.onnx \
168
- --out /tmp/arcface_r50.safetensors \
169
- --backbone r50 \
170
- --verify
171
- # Expected: max |diff| < 1e-5, cosine similarity = 1.0 across two random inputs.
172
- ```
173
-
174
- The converter walks the ONNX graph in topological order to map Conv
175
- weights, Conv biases (which are stored under numeric ONNX ids), PReLU
176
- slopes, and BN params onto the matching PyTorch parameter names. The
177
- fused-BN ONNX export collapses most BatchNorm layers into the adjacent
178
- Conv via `W' = γW/σ`; the architecture in
179
- `feat/identity_detectors/arcface/iresnet.py` mirrors that fused
180
- structure so the load is bit-exact (modulo float32 rounding).
181
-
182
- ## Citation and attribution
183
-
184
- If you use this model, please cite both the **method paper** and the
185
- **InsightFace project** (which trained and distributes the weights):
186
-
187
- ```bibtex
188
  @inproceedings{deng2019arcface,
189
- title = {ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
190
- author = {Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
191
- booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
192
- pages = {4690--4699},
193
- year = {2019}
194
  }
195
  ```
196
 
197
- ```bibtex
198
  @misc{insightface2018,
199
- author = {Guo, Jia and Deng, Jiankang and An, Xiang and Yu, Jack},
200
- title = {InsightFace: 2D and 3D Face Analysis Project},
201
- year = {2018--},
202
- howpublished = {\url{https://github.com/deepinsight/insightface}}
203
  }
204
  ```
205
 
206
- The training dataset (WebFace260M / WebFace600K) is from Tsinghua
207
- University:
208
-
209
- ```bibtex
210
  @inproceedings{zhu2021webface260m,
211
- title = {WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition},
212
- author = {Zhu, Zheng and Huang, Guan and Deng, Jiankang and Ye, Yun and Huang, Junjie and Chen, Xinze and Zhu, Jiagang and Yang, Tian and Lu, Jiwen and Du, Dalong and Zhou, Jie},
213
- booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
214
- pages = {10492--10502},
215
- year = {2021}
216
  }
217
  ```
218
 
219
- The IResNet (Improved ResNet) backbone follows the deep residual
220
- learning framework:
221
 
222
- ```bibtex
223
- @inproceedings{he2016deep,
224
- title = {Deep Residual Learning for Image Recognition},
225
- author = {He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
226
- booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
227
- pages = {770--778},
228
- year = {2016}
229
- }
230
- ```
231
 
232
- If you cite py-feat itself:
233
 
234
- ```bibtex
235
- @article{cheong2023pyfeat,
236
- title = {Py-Feat: Python Facial Expression Analysis Toolbox},
237
- author = {Cheong, Jin Hyun and Jolly, Eshin and Xie, Tiankang and Byrne, Sophie and Kenney, Matthew and Chang, Luke J.},
238
- journal = {Affective Science},
239
- year = {2023},
240
- publisher = {Springer},
241
- doi = {10.1007/s42761-023-00191-4}
242
- }
243
  ```
244
-
245
- ## Provenance summary
246
-
247
- | Layer | Authors / source | License |
248
- |---|---|---|
249
- | ArcFace loss | Deng, Guo, Xue, Zafeiriou (2019) | Method paper, freely usable |
250
- | IResNet architecture | He et al. (2016) + InsightFace's fused-BN export | Method paper / MIT (InsightFace code) |
251
- | Pretrained weights | InsightFace, trained on WebFace600K | Non-commercial research |
252
- | Training data (WebFace260M / 600K) | Zhu et al. / Tsinghua University (2021) | Non-commercial research |
253
- | ONNX → safetensors conversion | py-feat maintainers | MIT (the script) — but the weights it produces inherit the upstream license |
254
- | py-feat integration code | cosanlab / py-feat contributors | MIT |
255
-
256
- ## Contact
257
-
258
- For issues with the **conversion / py-feat integration**:
259
- [github.com/cosanlab/py-feat/issues](https://github.com/cosanlab/py-feat/issues).
260
-
261
- For issues with the **underlying model** (training, accuracy, behavior):
262
- [github.com/deepinsight/insightface/issues](https://github.com/deepinsight/insightface/issues).
 
1
  ---
2
+ tags:
3
+ - model_hub_mixin
4
+ - pytorch_model_hub_mixin
5
+ library_name: py-feat
6
+ pipeline_tag: image-feature-extraction
7
  license: other
8
  license_name: insightface-non-commercial-research
9
  license_link: https://github.com/deepinsight/insightface/blob/master/LICENSE
10
  ---
11
 
12
+ # ArcFace ResNet50
13
 
14
+ ## Model Description
15
+ ArcFace is a face-recognition model trained with an additive angular margin softmax loss that constrains identities to disjoint angular regions of the embedding sphere. The result is much tighter intra-identity clusters under pose and expression variation than prior triplet-loss embeddings (e.g., FaceNet) — exactly the regime FEAT video data hits. This py-feat distribution is the ResNet50 backbone (`w600k_r50.onnx`) from InsightFace's `buffalo_l` pack, trained on WebFace600K, repackaged as a PyTorch `safetensors` checkpoint. Available in py-feat ≥ 0.7 as the default `identity_model='arcface'` for both `Detector` and `MPDetector`. Empirically on `multi_face.jpg`, FaceNet's max off-diagonal cosine similarity between *different* people is 0.76 (false-merging at typical thresholds); ArcFace's is 0.35 (clean separation).
16
 
17
+ ## Model Details
18
+ - **Model Type**: Convolutional Neural Network (CNN)
19
+ - **Architecture**: IResNet-50 (Improved ResNet, fused-BN form matching InsightFace's ONNX export)
20
+ - **Input Size**: 112 x 112 pixels, RGB, pixel range `[0, 1]` (the wrapper rescales to `[-1, 1]`)
21
+ - **Output**: 512-dim face embedding (unnormalized; cosine similarity normalizes downstream)
22
+ - **Framework**: PyTorch
23
+ - **Training data**: WebFace600K — Tsinghua-curated subset of WebFace260M (Zhu et al., 2021)
24
+ - **Reported benchmarks**: LFW 99.83 %, IJB-C 96.18 % TAR @ FAR=1e-4 (InsightFace upstream)
25
 
26
+ ## Model Sources
27
+ - **Repository (training code, MIT)**: [deepinsight/insightface](https://github.com/deepinsight/insightface)
28
+ - **Distribution (`buffalo_l` pack, ONNX source)**: [InsightFace v0.7 release](https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip)
29
+ - **Paper (ArcFace)**: [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)
30
+ - **Paper (WebFace260M)**: [WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition](https://arxiv.org/abs/2103.04098)
31
 
32
+ ## Citation
33
+ If you use this model in your research or application, please cite the ArcFace paper and the InsightFace project:
34
 
35
+ J. Deng, J. Guo, N. Xue, S. Zafeiriou. ArcFace: Additive Angular Margin Loss for Deep Face Recognition, CVPR, 2019, arXiv:1801.07698.
 
 
 
36
 
 
 
 
37
  ```
38
  @inproceedings{deng2019arcface,
39
+ title={ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
40
+ author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
41
+ booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
42
+ pages={4690--4699},
43
+ year={2019}
44
  }
45
  ```
46
 
47
+ ```
48
  @misc{insightface2018,
49
+ author={Guo, Jia and Deng, Jiankang and An, Xiang and Yu, Jack},
50
+ title={InsightFace: 2D and 3D Face Analysis Project},
51
+ year={2018--},
52
+ howpublished={\url{https://github.com/deepinsight/insightface}}
53
  }
54
  ```
55
 
56
+ ```
57
  @inproceedings{zhu2021webface260m,
58
+ title={WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition},
59
+ author={Zhu, Zheng and Huang, Guan and Deng, Jiankang and Ye, Yun and Huang, Junjie and Chen, Xinze and Zhu, Jiagang and Yang, Tian and Lu, Jiwen and Du, Dalong and Zhou, Jie},
60
+ booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
61
+ pages={10492--10502},
62
+ year={2021}
63
  }
64
  ```
65
 
66
+ ## Acknowledgements
67
+ We thank the InsightFace project (Guo, Deng et al.) for releasing the architecture and training scripts under the MIT license, and Tsinghua University (Zhu et al.) for the WebFace260M / WebFace600K dataset under non-commercial-research terms.
68
 
69
+ ## License Note
70
+ The InsightFace **code** is MIT-licensed. The **pretrained weights distributed in this repo** are released under InsightFace's non-commercial-research license, and the underlying training data (WebFace600K / WebFace260M) is also non-commercial-research only. Py-Feat's conversion code and integration are MIT-licensed, but the converted weights inherit the upstream restriction. Commercial users must independently validate license compatibility and may need to substitute a different identity model.
71
 
72
+ ## Example Usage
73
 
74
+ ```python
75
+ import torch
76
+ from huggingface_hub import hf_hub_download
77
+ from safetensors.torch import load_file
78
+ from feat.identity_detectors.arcface.arcface_model import ArcFace
79
+ from feat.utils.io import get_resource_path
80
+
81
+ device = 'cpu'
82
+ identity_detector = ArcFace(backbone='r50')
83
+ arcface_file = hf_hub_download(
84
+     repo_id="py-feat/arcface_r50",
85
+     filename="arcface_r50.safetensors",
86
+     cache_dir=get_resource_path(),
87
+ )
88
+ identity_detector.net.load_state_dict(load_file(arcface_file), strict=False)
89
+ identity_detector.eval()
90
+ identity_detector.to(device)
91
+
92
+ # Forward through a batch of [N, 3, H, W] face crops in [0, 1] range:
93
+ # embeddings = identity_detector(face_crops) # [N, 512] unnormalized
94
  ```
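
The model card says the 512-dim embeddings are unnormalized and that cosine similarity handles the normalization downstream. The removed README also described identity labels as connected-components clusters and mentioned a typical threshold of 0.5. The following is a minimal sketch of that post-processing, assuming the embeddings have already been moved to a NumPy array (for example via `embeddings.detach().cpu().numpy()`). The function names `cosine_similarity_matrix` and `cluster_by_threshold` are hypothetical and are not part of py-feat's API; py-feat's own clustering code may differ.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an [N, D] array of unnormalized embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # L2-normalize each row
    return unit @ unit.T


def cluster_by_threshold(similarity: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Label faces by connected components of the graph whose edges join
    pairs with similarity >= threshold (illustrative only)."""
    n = similarity.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i, j] >= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy 3-dim "embeddings": rows 0 and 1 point in nearly the same direction.
toy = np.array([[2.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 3.0, 0.0]])
labels = cluster_by_threshold(cosine_similarity_matrix(toy), threshold=0.5)
print(labels)  # [0, 0, 1]
```

Because edges are transitive under connected components, a single spuriously high cross-identity similarity can merge two people into one cluster, which is why the maximum off-diagonal similarity between different people matters more than the mean.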