---
license: mit
language:
- en
---
|
|
|
|
|
# Model Card for FaceTransformerOctupletLoss
|
|
|
|
|
This is a face recognition model that extracts a facial feature vector from an aligned facial image.
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** Martin Knoche |
|
|
- **Funded by:** Technical University of Munich
|
|
- **Shared by:** Martin Knoche
|
|
- **Model type:** Vision Transformer
|
|
- **License:** |
|
|
Original Work: |
|
|
|
|
|
MIT License

Copyright (c) 2022 Zhong Yaoyao

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
|
|
|
Changes to the code, finetuning, etc. are also released under the MIT License:
|
|
|
|
|
MIT License

Copyright (c) 2023 Martin Knoche

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
|
|
|
- **Finetuned from model:** [FaceTransformer](https://github.com/zhongyy/Face-Transformer) by [zhongyy](https://github.com/zhongyy) |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** [GitHub](https://github.com/martlgap/octuplet-loss)
|
|
- **Paper:** [IEEExplore](https://ieeexplore.ieee.org/document/10042669) |
|
|
|
|
|
## Uses |
|
|
|
|
|
Use the model to extract a facial feature vector from an aligned facial image. You can then compare that vector to other facial feature vectors to decide whether two images show the same person.
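
A common decision rule is cosine distance with a fixed threshold; the 0.5 threshold in the sketch below matches the one suggested later in this card. `emb1` and `emb2` stand for two hypothetical 512-dimensional embeddings of two images:

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance in [0, 2]: 0 = identical direction, 2 = opposite direction."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical usage with two embeddings emb1 and emb2:
# same_person = cosine_distance(emb1, emb2) < 0.5
```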
|
|
|
|
|
### Direct Use |
|
|
|
|
|
The model can be used within an ONNX Runtime environment:
|
|
|
|
|
```python
import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]  # input_image: see variable description below
```
|
|
|
|
|
The `input_image` variable:
|
|
|
|
|
- Dimensions: 112x112x3 per aligned face image; batch and transpose to NCHW (1x3x112x112) before inference, as shown in the sketch below
- Channels: RGB format
- Type: float32
- Values: between 0 and 255
|
|
|
|
|
The `embedding` variable:
|
|
|
|
|
- Dimension: 512
- Type: float32
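
A minimal, hedged sketch of preparing a valid `input_image` from an already aligned face crop (the filename `aligned.jpg` is a placeholder; face detection and alignment are covered in the full example below):

```python
import numpy as np
import cv2

# Assumption: "aligned.jpg" is an already aligned 112x112 facial crop on disk
img = cv2.cvtColor(cv2.imread("aligned.jpg"), cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
input_image = img.astype(np.float32).clip(0.0, 255.0)[None, ...].transpose(0, 3, 1, 2)  # shape (1, 3, 112, 112)
```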
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
The model was originally trained and also finetuned on the [MS1M](https://exposing.ai/msceleb/) dataset. Please check the MS1M dataset for potential biases and risks.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Use the code below to get started with the model: |
|
|
|
|
|
```python
import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform


# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS

# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)

# Initialize the face detector (for example, Mediapipe)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)

# Initialize the face recognition model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())


# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE

# Capture a frame with your webcam and store it on disk
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(1)  # open webcam (adjust the device index, e.g. 0, if needed)
    time.sleep(2)  # wait for the camera to warm up

    if not cap.isOpened():
        raise IOError("Cannot open webcam")

    ret, img = cap.read()  # capture a frame
    if ret:
        cv2.imwrite("img.jpg", img)  # save the frame
    else:
        raise IOError("Cannot capture frame from webcam")
else:
    img = cv2.imread("img.jpg")  # read the frame from disk


# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION

# Process the image with the face detector (Mediapipe expects RGB, while OpenCV delivers BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # Select 5 landmarks (eye centers, nose tip, left mouth corner, right mouth corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]

    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )

    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]

    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]
else:
    print("No faces detected")
    exit()


# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT

# Align the image with the 5 landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)

# save to disk
cv2.imwrite("img2_aligned.jpg", img_aligned)


# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION

# Infer the face embedding with onnxruntime; the model expects an RGB float32 NCHW tensor with values in [0, 255]
img_aligned_rgb = cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)  # convert BGR (OpenCV) to RGB
input_image = np.asarray([img_aligned_rgb]).astype(np.float32).clip(0.0, 255.0).transpose(0, 3, 1, 2)
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]

print("Embedding:", embedding)

# If you have embeddings for several facial images, you can compute the cosine distance between them and decide
# between same or different person based on a threshold. For example, if the cosine distance is less than 0.5, the
# two images show the same person; otherwise they show different people. The lower the cosine distance, the more
# similar the two images are. The cosine distance lies between 0 and 2, where 0 means the two embeddings are
# identical and 2 means they are completely different.


# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION

# Draw the bounding box on a copy of the image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)

# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)

# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)
```
|
|
|
|
|
See also `main.py` to get started with the model.
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
- [LFW](http://vis-www.cs.umass.edu/lfw/) |
|
|
- [CALFW](http://whdeng.cn/CALFW/) |
|
|
- [CPLFW](http://whdeng.cn/CPLFW/) |
|
|
- [MLFW](http://whdeng.cn/mlfw/) |
|
|
- [XQLFW](https://martlgap.github.io/xqlfw/) |
|
|
|
|
|
#### Metrics |
|
|
|
|
|
Accuracy [%] |
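
All five benchmarks are face verification sets, so accuracy means pair verification accuracy: image pairs are classified as same or different person by thresholding the embedding distance. A hedged sketch of this protocol (simplified; the official LFW protocol additionally averages over 10 folds; `dists` and `is_same` are hypothetical arrays of pair distances and ground-truth labels):

```python
import numpy as np


def verification_accuracy(dists: np.ndarray, is_same: np.ndarray) -> float:
    """Best pair-classification accuracy (in percent) over a sweep of distance thresholds."""
    thresholds = np.linspace(0.0, 2.0, 400)  # cosine distance lies in [0, 2]
    accuracies = [np.mean((dists < t) == is_same) for t in thresholds]
    return 100.0 * float(max(accuracies))
```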
|
|
|
|
|
### Results |
|
|
|
|
|
| [LFW](http://vis-www.cs.umass.edu/lfw/) | [CALFW](http://whdeng.cn/CALFW/) | [CPLFW](http://whdeng.cn/CPLFW/) | [MLFW](http://whdeng.cn/mlfw/) | [XQLFW](https://martlgap.github.io/xqlfw/) |
|---|---|---|---|---|
| 99.73 | 94.93 | 91.58 | 85.63 | 95.12 |
|
|
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
~~~tex |
|
|
@inproceedings{knoche2023octuplet, |
|
|
title={Octuplet loss: Make face recognition robust to image resolution}, |
|
|
author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard}, |
|
|
booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)}, |
|
|
pages={1--8}, |
|
|
year={2023}, |
|
|
organization={IEEE} |
|
|
} |
|
|
~~~ |
|
|
|
|
|
## Model Card Author |
|
|
|
|
|
Martin Knoche |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
Martin.Knoche@tum.de |