---
license: mit
language:
- en
---
|
|
|
|
|
# Model Card for FaceTransformerOctupletLoss
|
|
|
|
|
This is a face recognition model that extracts a facial feature vector from an aligned facial image.
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** Martin Knoche |
|
|
- **Funded by:** Technical University of Munich
|
|
- **Shared by:** Martin Knoche
|
|
- **Model type:** Vision Transformer
|
|
- **License:** |
|
|
Original Work: |
|
|
|
|
|
MIT License

Copyright (c) 2022 Zhong Yaoyao

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
|
|
|
Changes to the code, finetuning, etc. are also released under the MIT License:
|
|
|
|
|
MIT License

Copyright (c) 2023 Martin Knoche

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
|
|
|
- **Finetuned from model:** [FaceTransformer](https://github.com/zhongyy/Face-Transformer) by [zhongyy](https://github.com/zhongyy) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** [GitHub](https://github.com/martlgap/octuplet-loss)
|
|
- **Paper:** [IEEExplore](https://ieeexplore.ieee.org/document/10042669) |
|
|
|
|
|
## Uses |
|
|
|
|
|
Use the model to extract a facial feature vector from an aligned facial image. You can then compare that vector to other facial feature vectors to decide whether two images show the same person.
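
A common decision rule is cosine distance with a fixed threshold; the 0.5 threshold in the sketch below matches the one suggested later in this card. `emb1` and `emb2` stand for two hypothetical 512-dimensional embeddings of two images:

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance in [0, 2]: 0 = identical direction, 2 = opposite direction."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical usage with two embeddings emb1 and emb2:
# same_person = cosine_distance(emb1, emb2) < 0.5
```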
|
|
|
|
|
### Direct Use |
|
|
|
|
|
The model can be used within an ONNX Runtime environment:
|
|
|
|
|
```python
import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]  # input_image: see variable description below
```
|
|
|
|
|
The `input_image` variable:
|
|
|
|
|
- Dimensions: 112x112x3 per aligned face image; batch and transpose to NCHW (1x3x112x112) before inference, as shown in the sketch below
- Channels: RGB format
- Type: float32
- Values: between 0 and 255
|
|
|
|
|
The `embedding` variable:
|
|
|
|
|
- Dimension: 512
- Type: float32
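
A minimal, hedged sketch of preparing a valid `input_image` from an already aligned face crop (the filename `aligned.jpg` is a placeholder; face detection and alignment are covered in the full example below):

```python
import numpy as np
import cv2

# Assumption: "aligned.jpg" is an already aligned 112x112 facial crop on disk
img = cv2.cvtColor(cv2.imread("aligned.jpg"), cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
input_image = img.astype(np.float32).clip(0.0, 255.0)[None, ...].transpose(0, 3, 1, 2)  # shape (1, 3, 112, 112)
```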
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
The model was originally trained and also finetuned on the [MS1M](https://exposing.ai/msceleb/) dataset. Please check the MS1M dataset for potential biases and risks.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Use the code below to get started with the model: |
|
|
|
|
|
```python
import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform


# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS

# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)

# Initialize the face detector (for example, Mediapipe)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)

# Initialize the face recognition model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())


# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE

# Capture a frame with your webcam and store it on disk
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(1)  # open webcam (adjust the device index, e.g. 0, if needed)
    time.sleep(2)  # wait for the camera to warm up

    if not cap.isOpened():
        raise IOError("Cannot open webcam")

    ret, img = cap.read()  # capture a frame
    if ret:
        cv2.imwrite("img.jpg", img)  # save the frame
    else:
        raise IOError("Cannot capture frame from webcam")
else:
    img = cv2.imread("img.jpg")  # read the frame from disk


# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION

# Process the image with the face detector (Mediapipe expects RGB, while OpenCV delivers BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # Select 5 landmarks (eye centers, nose tip, left mouth corner, right mouth corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]

    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )

    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]

    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]
else:
    print("No faces detected")
    exit()


# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT

# Align the image with the 5 landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)

# save to disk
cv2.imwrite("img2_aligned.jpg", img_aligned)


# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION

# Infer the face embedding with onnxruntime; the model expects an RGB float32 NCHW tensor with values in [0, 255]
img_aligned_rgb = cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)  # convert BGR (OpenCV) to RGB
input_image = np.asarray([img_aligned_rgb]).astype(np.float32).clip(0.0, 255.0).transpose(0, 3, 1, 2)
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]

print("Embedding:", embedding)

# If you have embeddings for several facial images, you can compute the cosine distance between them and decide
# between same or different person based on a threshold. For example, if the cosine distance is less than 0.5, the
# two images show the same person; otherwise they show different people. The lower the cosine distance, the more
# similar the two images are. The cosine distance lies between 0 and 2, where 0 means the two embeddings are
# identical and 2 means they are completely different.


# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION

# Draw the bounding box on a copy of the image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)

# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)

# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)
```
|
|
|
|
|
See also `main.py` to get started with the model.
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
- [LFW](http://vis-www.cs.umass.edu/lfw/) |
|
|
- [CALFW](http://whdeng.cn/CALFW/) |
|
|
- [CPLFW](http://whdeng.cn/CPLFW/) |
|
|
- [MLFW](http://whdeng.cn/mlfw/) |
|
|
- [XQLFW](https://martlgap.github.io/xqlfw/) |
|
|
|
|
|
#### Metrics |
|
|
|
|
|
Accuracy [%] |
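
All five benchmarks are face verification sets, so accuracy means pair verification accuracy: image pairs are classified as same or different person by thresholding the embedding distance. A hedged sketch of this protocol (simplified; the official LFW protocol additionally averages over 10 folds; `dists` and `is_same` are hypothetical arrays of pair distances and ground-truth labels):

```python
import numpy as np


def verification_accuracy(dists: np.ndarray, is_same: np.ndarray) -> float:
    """Best pair-classification accuracy (in percent) over a sweep of distance thresholds."""
    thresholds = np.linspace(0.0, 2.0, 400)  # cosine distance lies in [0, 2]
    accuracies = [np.mean((dists < t) == is_same) for t in thresholds]
    return 100.0 * float(max(accuracies))
```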
|
|
|
|
|
### Results |
|
|
|
|
|
| [LFW](http://vis-www.cs.umass.edu/lfw/) | [CALFW](http://whdeng.cn/CALFW/) | [CPLFW](http://whdeng.cn/CPLFW/) | [MLFW](http://whdeng.cn/mlfw/) | [XQLFW](https://martlgap.github.io/xqlfw/) |
|---|---|---|---|---|
| 99.73 | 94.93 | 91.58 | 85.63 | 95.12 |
|
|
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
~~~tex |
|
|
@inproceedings{knoche2023octuplet, |
|
|
title={Octuplet loss: Make face recognition robust to image resolution}, |
|
|
author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard}, |
|
|
booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)}, |
|
|
pages={1--8}, |
|
|
year={2023}, |
|
|
organization={IEEE} |
|
|
} |
|
|
~~~ |
|
|
|
|
|
## Model Card Author |
|
|
|
|
|
Martin Knoche |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
Martin.Knoche@tum.de |