---
license: mit
language:
- en
---

# Model Card for FaceTransformerOctupletLoss

This is a face recognition model, which extracts a facial feature vector from an aligned facial image.

This model card was generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

- **Developed by:** Martin Knoche
- **Funded by:** Technical University of Munich
- **Shared by:** Martin Knoche
- **Model type:** Convolutional Neural Network
- **License:** MIT

Original work:

MIT License

Copyright (c) 2022 Zhong Yaoyao

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Changes in code, fine-tuning, etc. are also covered by the MIT License:

MIT License

Copyright (c) 2023 Martin Knoche

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

- **Finetuned from model:** [FaceTransformer](https://github.com/zhongyy/Face-Transformer) by [zhongyy](https://github.com/zhongyy)

### Model Sources

- **Repository:** [GitHub](https://github.com/martlgap/octuplet-loss)
- **Paper:** [IEEExplore](https://ieeexplore.ieee.org/document/10042669)

## Uses

Use the model to extract a facial feature vector from an aligned facial image. You can then compare that vector with other facial feature vectors to decide whether two images show the same person.
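
A minimal sketch of such a comparison using cosine distance (the 0.5 threshold is only illustrative; tune it on your own validation data):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance in [0, 2]: 0 = identical direction, 2 = opposite."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb1 and emb2 would be 512-d embeddings produced by the model; random stand-ins here
emb1, emb2 = np.random.randn(512), np.random.randn(512)
print("Same person?", cosine_distance(emb1, emb2) < 0.5)  # illustrative threshold
```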

### Direct Use

The model can be used within an ONNX Runtime environment.

```python
import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]
```

`input_image` variable:

- Dimensions: 112x112x3 per face; batched and transposed to NCHW layout (1x3x112x112) before inference, as in the full example below
- Channels: RGB
- Type: float32
- Values: between 0 and 255 (not scaled to [0, 1])

`embedding` variable:

- Dimension: 512
- Type: float32
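
A quick sanity check of the expected input and output shapes (a sketch; the random values merely stand in for a real aligned face):

```python
import numpy as np
import onnxruntime as rt

# Random 112x112 RGB "image" in NCHW layout with values in [0, 255]
dummy = np.random.uniform(0.0, 255.0, (1, 3, 112, 112)).astype(np.float32)

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": dummy})[0][0]
print(embedding.shape)  # expected: (512,)
```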

## Bias, Risks, and Limitations

The model was originally trained and then fine-tuned on the [MS1M](https://exposing.ai/msceleb/) dataset. Please review the MS1M dataset for potential biases and risks.

## How to Get Started with the Model

Use the code below to get started with the model: 

```python
import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform


# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS

# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)

# Initialize Face Detector (For Example Mediapipe)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)

# Initialize the Face Recognition Model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())


# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE

# Capture a frame with your Webcam and store it on disk
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(1)  # open webcam (adjust the device index, e.g. 0, if needed)
    time.sleep(2)  # wait for the camera to warm up
    
    if not cap.isOpened():
        raise IOError("Cannot open webcam")
    
    ret, img = cap.read() # capture a frame
    if ret:
        cv2.imwrite("img.jpg", img) # save the frame
else:
    img = cv2.imread("img.jpg") # read the frame from disk


# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION

# Process the image with the face detector (MediaPipe expects RGB, while OpenCV loads BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # Select 5 Landmarks (Eye Centers, Nose Tip, Left Mouth Corner, Right Mouth Corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]

    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )

    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]

    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]

else:
    print("No faces detected")
    exit()


# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT

# Align Image with the 5 Landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)

# Save the aligned image to disk
cv2.imwrite("img_aligned.jpg", img_aligned)


# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION

# Inference face embeddings with onnxruntime (the model expects RGB input, so convert from OpenCV's BGR)
img_aligned_rgb = cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)
input_image = np.asarray([img_aligned_rgb]).astype(np.float32).clip(0.0, 255.0).transpose(0, 3, 1, 2)  # (1, 3, 112, 112)
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]

print("Embedding:", embedding)

# Given embeddings for several facial images, you can compute the cosine distance between them and decide
# "same person" vs. "different person" with a threshold (e.g. cosine distance < 0.5 -> same person). Cosine
# distance ranges from 0 (identical) to 2 (opposite); the lower the distance, the more similar the two faces.

# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION

# Draw Boundingbox on a copy of image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)

# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)

# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)
```

See also `main.py` in the repository to get started with the model.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- [LFW](http://vis-www.cs.umass.edu/lfw/)
- [CALFW](http://whdeng.cn/CALFW/)
- [CPLFW](http://whdeng.cn/CPLFW/)
- [MLFW](http://whdeng.cn/mlfw/)
- [XQLFW](https://martlgap.github.io/xqlfw/)

#### Metrics

Accuracy [%]
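
Accuracy here is pairwise face-verification accuracy: the fraction of image pairs correctly classified as "same" or "different" person. A simplified sketch of the computation (the benchmarks themselves use predefined pairs and 10-fold cross-validation):

```python
import numpy as np

def verification_accuracy(distances: np.ndarray, labels: np.ndarray) -> float:
    """Best pair-classification accuracy over all distance thresholds.

    distances: cosine distances between embedding pairs
    labels: 1 = same person, 0 = different person
    """
    best = 0.0
    for t in np.unique(distances):
        pred = (distances <= t).astype(int)
        best = max(best, float(np.mean(pred == labels)))
    return best

# Toy data: 4 hypothetical pairs
print(verification_accuracy(np.array([0.31, 0.62, 0.44, 1.20]), np.array([1, 0, 1, 0])))  # 1.0
```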

### Results

| [LFW](http://vis-www.cs.umass.edu/lfw/) | [CALFW](http://whdeng.cn/CALFW/) | [CPLFW](http://whdeng.cn/CPLFW/) | [MLFW](http://whdeng.cn/mlfw/) | [XQLFW](https://martlgap.github.io/xqlfw/) |
|---|---|---|---|---|
| 99.73 | 94.93 | 91.58 | 85.63 | 95.12 | 

## Citation

**BibTeX:**

```bibtex
@inproceedings{knoche2023octuplet,
  title={Octuplet loss: Make face recognition robust to image resolution},
  author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard},
  booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}
```

## Model Card Author

Martin Knoche

## Model Card Contact

Martin.Knoche@tum.de