---
license: mit
language:
- en
---

# Model Card for FaceTransformerOctupletLoss

This is a face recognition model, which extracts a facial feature vector from an aligned facial image.

This model card was generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

- **Developed by:** Martin Knoche
- **Funded by:** Technical University of Munich
- **Shared by:** Martin Knoche
- **Model type:** Convolutional Neural Network
- **License:** MIT

Original work:

MIT License

Copyright (c) 2022 Zhong Yaoyao

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Changes in code, fine-tuning, etc. are also covered by the MIT License:

MIT License

Copyright (c) 2023 Martin Knoche

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

- **Finetuned from model:** [FaceTransformer](https://github.com/zhongyy/Face-Transformer) by [zhongyy](https://github.com/zhongyy)

### Model Sources

- **Repository:** [GitHub](https://github.com/martlgap/octuplet-loss)
- **Paper:** [IEEExplore](https://ieeexplore.ieee.org/document/10042669)

## Uses

Use the model to extract a facial feature vector from an aligned facial image. You can then compare that vector with other facial feature vectors to decide whether two images show the same person.
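
A minimal sketch of such a comparison using cosine distance (the 0.5 threshold is only illustrative; tune it on your own validation data):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance in [0, 2]: 0 = identical direction, 2 = opposite."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb1 and emb2 would be 512-d embeddings produced by the model; random stand-ins here
emb1, emb2 = np.random.randn(512), np.random.randn(512)
print("Same person?", cosine_distance(emb1, emb2) < 0.5)  # illustrative threshold
```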

### Direct Use

The model can be used within an ONNX Runtime environment.

```python
import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]
```

`input_image` variable:

- Dimensions: 112x112x3 per face; batched and transposed to NCHW layout (1x3x112x112) before inference, as in the full example below
- Channels: RGB
- Type: float32
- Values: between 0 and 255 (not scaled to [0, 1])

`embedding` variable:

- Dimension: 512
- Type: float32
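
A quick sanity check of the expected input and output shapes (a sketch; the random values merely stand in for a real aligned face):

```python
import numpy as np
import onnxruntime as rt

# Random 112x112 RGB "image" in NCHW layout with values in [0, 255]
dummy = np.random.uniform(0.0, 255.0, (1, 3, 112, 112)).astype(np.float32)

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": dummy})[0][0]
print(embedding.shape)  # expected: (512,)
```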

## Bias, Risks, and Limitations

The model was originally trained and then fine-tuned on the [MS1M](https://exposing.ai/msceleb/) dataset. Please review the MS1M dataset for potential biases and risks.

## How to Get Started with the Model

Use the code below to get started with the model: 

```python
import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform


# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS

# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)

# Initialize Face Detector (For Example Mediapipe)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)

# Initialize the Face Recognition Model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())


# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE

# Capture a frame with your Webcam and store it on disk
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(1)  # open webcam (adjust the device index, e.g. 0, if needed)
    time.sleep(2)  # wait for the camera to warm up
    
    if not cap.isOpened():
        raise IOError("Cannot open webcam")
    
    ret, img = cap.read() # capture a frame
    if ret:
        cv2.imwrite("img.jpg", img) # save the frame
else:
    img = cv2.imread("img.jpg") # read the frame from disk


# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION

# Process the image with the face detector (MediaPipe expects RGB, while OpenCV loads BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # Select 5 Landmarks (Eye Centers, Nose Tip, Left Mouth Corner, Right Mouth Corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]

    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )

    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]

    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]

else:
    print("No faces detected")
    exit()


# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT

# Align Image with the 5 Landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)

# Save the aligned image to disk
cv2.imwrite("img_aligned.jpg", img_aligned)


# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION

# Inference face embeddings with onnxruntime (the model expects RGB input, so convert from OpenCV's BGR)
img_aligned_rgb = cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)
input_image = np.asarray([img_aligned_rgb]).astype(np.float32).clip(0.0, 255.0).transpose(0, 3, 1, 2)  # (1, 3, 112, 112)
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]

print("Embedding:", embedding)

# Given embeddings for several facial images, you can compute the cosine distance between them and decide
# "same person" vs. "different person" with a threshold (e.g. cosine distance < 0.5 -> same person). Cosine
# distance ranges from 0 (identical) to 2 (opposite); the lower the distance, the more similar the two faces.

# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION

# Draw Boundingbox on a copy of image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)

# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)

# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)
```

See also `main.py` in the repository to get started with the model.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- [LFW](http://vis-www.cs.umass.edu/lfw/)
- [CALFW](http://whdeng.cn/CALFW/)
- [CPLFW](http://whdeng.cn/CPLFW/)
- [MLFW](http://whdeng.cn/mlfw/)
- [XQLFW](https://martlgap.github.io/xqlfw/)

#### Metrics

Accuracy [%]
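
Accuracy here is pairwise face-verification accuracy: the fraction of image pairs correctly classified as "same" or "different" person. A simplified sketch of the computation (the benchmarks themselves use predefined pairs and 10-fold cross-validation):

```python
import numpy as np

def verification_accuracy(distances: np.ndarray, labels: np.ndarray) -> float:
    """Best pair-classification accuracy over all distance thresholds.

    distances: cosine distances between embedding pairs
    labels: 1 = same person, 0 = different person
    """
    best = 0.0
    for t in np.unique(distances):
        pred = (distances <= t).astype(int)
        best = max(best, float(np.mean(pred == labels)))
    return best

# Toy data: 4 hypothetical pairs
print(verification_accuracy(np.array([0.31, 0.62, 0.44, 1.20]), np.array([1, 0, 1, 0])))  # 1.0
```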

### Results

| [LFW](http://vis-www.cs.umass.edu/lfw/) | [CALFW](http://whdeng.cn/CALFW/) | [CPLFW](http://whdeng.cn/CPLFW/) | [MLFW](http://whdeng.cn/mlfw/) | [XQLFW](https://martlgap.github.io/xqlfw/) |
|---|---|---|---|---|
| 99.73 | 94.93 | 91.58 | 85.63 | 95.12 | 

## Citation

**BibTeX:**

```bibtex
@inproceedings{knoche2023octuplet,
  title={Octuplet loss: Make face recognition robust to image resolution},
  author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard},
  booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}
```

## Model Card Author

Martin Knoche

## Model Card Contact

Martin.Knoche@tum.de