---
license: mit
language:
- en
---
# Model Card for FaceTransformerOctupletLoss
This is a face recognition model that extracts a facial feature vector from an aligned facial image.
## Model Details
### Model Description
- **Developed by:** Martin Knoche
- **Funded by:** Technical University of Munich
- **Shared by:** Martin Knoche
- **Model type:** Vision Transformer
- **License:**
Original Work:
MIT License
Copyright (c) 2022 Zhong Yaoyao
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Changes to the code, fine-tuning, etc. are also released under the MIT License:
MIT License
Copyright (c) 2023 Martin Knoche
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- **Finetuned from model:** [FaceTransformer](https://github.com/zhongyy/Face-Transformer) by [zhongyy](https://github.com/zhongyy)
### Model Sources
- **Repository:** [GitHub](https://github.com/martlgap/octuplet-loss)
- **Paper:** [IEEExplore](https://ieeexplore.ieee.org/document/10042669)
## Uses
Use the model to extract a facial feature vector from an arbitrary aligned facial image. You can then compare this vector to other facial feature vectors to decide whether two images show the same person.
### Direct Use
The model can be used within an ONNX Runtime environment:
```python
import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]
```
`input_image` variable:
- Dimensions: 112×112×3 per image (transposed to N×3×112×112 before inference, as in the example below)
- Channels: RGB
- Type: float32
- Values: between 0 and 255

`embedding` variable:
- Dimension: 512
- Type: float32
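For illustration, here is a minimal, self-contained sketch of running the model on a dummy input that matches this specification. The random image stands in for a real aligned face, so the resulting embedding is meaningless, but it demonstrates the expected shapes and types:

```python
import numpy as np
import onnxruntime as rt

# Load the model published with this card
model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())

# Dummy aligned face image: 112x112, RGB, float32, values in [0, 255]
img_aligned = np.random.uniform(0.0, 255.0, (112, 112, 3)).astype(np.float32)

# Add a batch dimension and reorder HWC -> CHW before inference
input_image = img_aligned[None, ...].transpose(0, 3, 1, 2)

embedding = model.run(None, {"input_image": input_image})[0][0]
print(embedding.shape)  # expected: (512,)
```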
## Bias, Risks, and Limitations
The model was originally trained and also fine-tuned on the [MS1M](https://exposing.ai/msceleb/) dataset. Please refer to the MS1M dataset for potential biases and risks.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform
# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS
# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)
# Initialize the face detector (for example, MediaPipe Face Mesh)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)
# Initialize the Face Recognition Model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE
# Capture a frame with your Webcam and store it on disk
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(1)  # open webcam (adjust the index if needed)
    time.sleep(2)  # wait for camera to warm up
    if not cap.isOpened():
        raise IOError("Cannot open webcam")
    ret, img = cap.read()  # capture a frame
    cap.release()  # release the webcam again
    if ret:
        cv2.imwrite("img.jpg", img)  # save the frame
    else:
        raise IOError("Cannot read frame from webcam")
else:
    img = cv2.imread("img.jpg")  # read the frame from disk
# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION
# Process the image with the face detector (MediaPipe expects RGB input, while OpenCV loads BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
if result.multi_face_landmarks:
    # Select 5 landmarks (eye centers, nose tip, left mouth corner, right mouth corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]
    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )
    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]
    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]
else:
    print("No faces detected")
    exit()
# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT
# Align Image with the 5 Landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)
# save the aligned image to disk
cv2.imwrite("img2_aligned.jpg", img_aligned)
# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION
# Inference face embeddings with onnxruntime
# The model expects RGB channel order (see above), while OpenCV uses BGR, so convert before inference
img_aligned_rgb = cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)
input_image = (np.asarray([img_aligned_rgb]).astype(np.float32)).clip(0.0, 255.0).transpose(0, 3, 1, 2)
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]
print("Embedding:", embedding)
# If you have embeddings for several facial images, you can compute the cosine distance between them and decide
# "same person" vs. "different person" based on a threshold, e.g. same person if the cosine distance is below 0.5.
# The cosine distance lies between 0 and 2: 0 means the embeddings are identical, 2 means they are opposite;
# the lower the distance, the more similar the two faces.
# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION
# Draw the bounding box on a copy of the image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)
# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)
# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)
cv2.destroyAllWindows()  # close the display windows
```
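Building on the comparison described in the comments at the end of the example, here is a minimal sketch of the cosine-distance decision. The random vectors below are placeholders for real embeddings, and the 0.5 threshold is the example value from above, which should be tuned per application:

```python
import numpy as np

def cosine_distance(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Cosine distance in [0, 2]: 0 = identical direction, 2 = opposite direction."""
    return 1.0 - float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

# Placeholder embeddings; in practice, use two outputs of FACE_RECOGNIZER.run(...)
emb1, emb2 = np.random.randn(512), np.random.randn(512)
print("Same person:", cosine_distance(emb1, emb2) < 0.5)  # example threshold
```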
See also `main.py` in the repository to get started with the model.
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- [LFW](http://vis-www.cs.umass.edu/lfw/)
- [CALFW](http://whdeng.cn/CALFW/)
- [CPLFW](http://whdeng.cn/CPLFW/)
- [MLFW](http://whdeng.cn/mlfw/)
- [XQLFW](https://martlgap.github.io/xqlfw/)
#### Metrics
Accuracy [%]
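The values below are pair-based verification accuracies. As a rough, hypothetical sketch of how such a metric can be computed (simplified; these benchmarks are typically evaluated with a 10-fold cross-validation protocol, and the function name here is illustrative):

```python
import numpy as np

def verification_accuracy(distances: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy at the best distance threshold ('same person' if distance < threshold).

    distances: pairwise cosine distances; labels: 1 = same person, 0 = different person.
    """
    best = 0.0
    for t in np.sort(distances):
        acc = float(np.mean((distances < t).astype(int) == labels))
        best = max(best, acc)
    return best

# Toy usage with made-up pairs
dists = np.array([0.3, 0.9, 0.4, 1.2])
labels = np.array([1, 0, 1, 0])
print(verification_accuracy(dists, labels))  # 1.0 for this toy example
```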
### Results
| [LFW](http://vis-www.cs.umass.edu/lfw/) | [CALFW](http://whdeng.cn/CALFW/) | [CPLFW](http://whdeng.cn/CPLFW/) | [MLFW](http://whdeng.cn/mlfw/) | [XQLFW](https://martlgap.github.io/xqlfw/) |
|---|---|---|---|---|
| 99.73 | 94.93 | 91.58 | 85.63 | 95.12 |
## Citation
**BibTeX:**
```bibtex
@inproceedings{knoche2023octuplet,
  title={Octuplet loss: Make face recognition robust to image resolution},
  author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard},
  booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}
```
## Model Card Author
Martin Knoche
## Model Card Contact
Martin.Knoche@tum.de