SecureAttendAI / comparative_analysis.md

Nishant Katiyar

Deploy biometric node to HF Spaces

b561839 23 days ago

Comparative Analysis: Biometric Models for Face Attendance

This document evaluates the biometric models currently used in the SecureAttend AI system versus the alternative methods mentioned in your request, focusing on their suitability for lightweight CPU-only hardware.

1. Executive Summary

The current system uses YuNet (for detection) and SFace (for recognition), both of which are distributed in the OpenCV Model Zoo and run through OpenCV's built-in DNN module. None of the alternative models mentioned in your request are currently in use.

A live benchmark executed on your local machine shows that the current YuNet + SFace stack runs in real time on a standard CPU:

Face Detection (YuNet): 19.53 ms per frame (~51.2 FPS)
Face Recognition (SFace): 11.50 ms per face crop (~87.0 FPS)
Embedding Comparison: 0.0079 ms per match

Based on these results and architectural trade-offs, retaining the current YuNet + SFace stack is the recommended path: it requires no additional library dependencies, and detection plus recognition of one face (~31 ms combined) fits within the 33.3 ms frame budget of a 30 FPS stream.

2. Baseline Benchmark Results (Local CPU)

We ran a diagnostic benchmark (benchmark_current.py) directly on your machine's CPU to measure the baseline speed of the current implementation:

| Pipeline Step | Latency | Throughput | Resource Footprint |
| --- | --- | --- | --- |
| YuNet Face Detection | 19.53 ms per frame | 51.2 FPS | 232 KB model file |
| SFace Feature Extraction | 11.50 ms per face crop | 87.0 FPS | 38.6 MB model file |
| Cosine Similarity Match | 0.0079 ms per match | ~126,500 matches/sec | Virtually zero |
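
The benchmark_current.py script is not reproduced in this document. As a rough illustration of how per-step figures like those in the table could be collected, the sketch below times an arbitrary callable with `time.perf_counter`. The `benchmark` helper, its `warmup` and `runs` parameters, and the stand-in workload are assumptions made for illustration, not the script's actual contents.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Return (mean latency in ms, throughput in calls/sec) for fn()."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls so one-off setup cost is excluded
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(samples_ms)
    per_sec = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return mean_ms, per_sec


if __name__ == "__main__":
    # Stand-in workload (assumption). In a real benchmark, fn would wrap a
    # single detection or feature-extraction call on a fixed test frame.
    mean_ms, per_sec = benchmark(lambda: sum(range(10_000)))
    print(f"{mean_ms:.4f} ms per call, {per_sec:.1f} calls/sec")
```

Averaging over many runs after a warm-up phase makes the figure less sensitive to one-off costs such as model loading or cache misses.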

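These numbers show that the core AI processing (detection plus recognition of one face) takes ~31 ms combined (19.53 ms + 11.50 ms). This fits within a single frame budget (33.3 ms at 30 FPS) even without frame-skipping, though with little headroom: SFace runs once per face crop, so each additional face in the frame adds about 11.5 ms.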
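
The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that assumes one YuNet pass per frame plus one SFace pass per detected face, run sequentially on a single thread; the function names are illustrative, and the sub-microsecond cosine match is ignored.

```python
FRAME_BUDGET_MS = 1000.0 / 30  # 33.3 ms per frame at 30 FPS

DETECT_MS = 19.53     # YuNet, per frame (benchmark figure above)
RECOGNIZE_MS = 11.50  # SFace, per face crop (benchmark figure above)


def fps(latency_ms):
    """Convert a per-call latency in milliseconds to calls per second."""
    return 1000.0 / latency_ms


def frame_latency_ms(num_faces):
    """Latency of one frame: one detection pass plus one SFace pass per face."""
    return DETECT_MS + num_faces * RECOGNIZE_MS


for n in (1, 2, 3):
    total = frame_latency_ms(n)
    verdict = "within" if total <= FRAME_BUDGET_MS else "over"
    print(f"{n} face(s): {total:.2f} ms ({verdict} the {FRAME_BUDGET_MS:.1f} ms budget)")
```

Under these assumptions, one face costs 31.03 ms and stays within budget, while two faces cost 42.53 ms and exceed it. That is the case where frame skipping or downscaled detection becomes useful.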

3. Face Detection Model Comparison

The table below compares the active detector (YuNet) against the requested alternatives:

| Model | Size | CPU Latency (640x480) | Landmarks | Library Dependency | Architectural Suitability |
| --- | --- | --- | --- | --- | --- |
| YuNet (Current) | ~232 KB | ~19.5 ms | ✅ Yes (5 points) | None (OpenCV Native) | Excellent (Recommended). Extremely fast, lightweight, and specifically designed for real-time edge CPU workloads. |
| Qualcomm LWFD | ~3.4 MB | ~30 - 50 ms | ❌ No | Qualcomm QNN / AI Hub | Poor. Vendor-locked. Highly optimized for Snapdragon NPUs/DSPs, but runs slower on generic x86/ARM CPUs. Lacks landmarks. |
| BlazeFace | ~100-200 KB | ~10 - 15 ms | ✅ Yes (6 points) | Google MediaPipe | Moderate. Excellent for close-up phone/selfie range, but suffers from low accuracy for distant or multi-person detections. |
| RetinaFace | ~1.7 MB (MobileNet) to ~104 MB (ResNet) | ~60 - 500+ ms | ✅ Yes (5 points) | PyTorch / InsightFace | Poor. High-accuracy powerhouse, but far too heavy for real-time CPU deployment. Leads to laggy frame rates. |
| YOLOv8-face | ~6 MB (nano) | ~40 - 80 ms | ✅ Yes (5 points) | PyTorch / Ultralytics | Moderate. Strong multi-face detection, but carries a heavy PyTorch dependency and higher CPU latency. |

Why YuNet is best for our case:

Landmarks & Alignment: YuNet outputs 5 facial landmarks natively, which SFace requires to crop and align the face. Without landmarks, we cannot feed aligned inputs to the recognition engine.
Zero Overhead: YuNet ships with OpenCV and is exposed as cv2.FaceDetectorYN, so inference runs inside OpenCV's C++ DNN module. This avoids loading a heavy deep learning framework (such as PyTorch or TensorFlow), which would consume large amounts of RAM.
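
To make the landmark output concrete, here is a hedged sketch of reading YuNet detections. `cv2.FaceDetectorYN` returns one 15-value row per face: box x, y, width, height, then five (x, y) landmark pairs (right eye, left eye, nose tip, right and left mouth corners), then the score. The `Face` tuple, `unpack_yunet_row`, `detect_faces`, and the `model_path` argument are illustrative names, not code taken from face_engine.py.

```python
from typing import NamedTuple

LANDMARK_NAMES = ("right_eye", "left_eye", "nose_tip",
                  "right_mouth_corner", "left_mouth_corner")


class Face(NamedTuple):
    box: tuple       # (x, y, width, height) in pixels
    landmarks: dict  # landmark name -> (x, y)
    score: float


def unpack_yunet_row(row):
    """Split one 15-value YuNet detection row into box, landmarks, and score."""
    values = [float(v) for v in row]
    if len(values) != 15:
        raise ValueError(f"expected 15 values, got {len(values)}")
    points = values[4:14]
    landmarks = {name: (points[2 * i], points[2 * i + 1])
                 for i, name in enumerate(LANDMARK_NAMES)}
    return Face(tuple(values[0:4]), landmarks, values[14])


def detect_faces(image_bgr, model_path, score_threshold=0.9):
    """Run YuNet on a BGR image; model_path is a placeholder for the local ONNX file."""
    import cv2  # imported lazily so the parsing helper works without OpenCV
    height, width = image_bgr.shape[:2]
    detector = cv2.FaceDetectorYN.create(
        model_path, "", (width, height), score_threshold=score_threshold)
    _, rows = detector.detect(image_bgr)
    return [] if rows is None else [unpack_yunet_row(r) for r in rows]
```

In a live loop the detector would normally be created once and reused across frames; it is constructed inside the function here only to keep the sketch short.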

4. Face Recognition Model Comparison

The table below compares the active vectorizer (SFace) against the requested alternatives:

| Model | Size | CPU Latency | Embedding Dim | Library Dependency | Threshold Metric | Architectural Suitability |
| --- | --- | --- | --- | --- | --- | --- |
| SFace (Current) | ~38.6 MB | ~11.5 ms | 128-d | None (OpenCV Native) | Cosine (0.363) | Excellent (Recommended). Tailored for 112x112 YuNet crops. High speed-to-accuracy ratio. |
| MobileFaceNet | ~4.0 MB | ~15 - 25 ms | 128-d | ONNX Runtime / TF Lite | Cosine (0.40) | Good. Extreme storage efficiency. Great if disk/memory space is highly constrained (e.g. microcontrollers). |
| ArcFace (R50) | ~100+ MB | ~120 - 200 ms | 512-d | PyTorch / InsightFace | Cosine (0.65) | Poor (for CPU). SOTA accuracy, but the ResNet-50 backbone is too heavy for live 30 FPS CPU matching. |
| FaceNet | ~90 MB | ~100 - 150 ms | 128/512-d | PyTorch / TensorFlow | Euclidean (1.1) | Poor. Legacy Inception-ResNet architecture. Highly resource-intensive and slow on CPU. |
| Dlib (ResNet-34) | ~100+ MB | ~100 - 150 ms | 128-d | dlib (C++ build tools) | Euclidean (0.6) | Very Poor. Difficult to install on Windows (requires CMake and C++ compiler setup). Sluggish CPU performance. |
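
The Threshold Metric column mixes two conventions: with cosine similarity a pair matches when the score is at or above the threshold, while with Euclidean distance it matches when the distance is at or below it. For embeddings normalized to unit length the two are related by d² = 2 − 2·cos. The sketch below shows the conversion; it assumes unit-length vectors, which is not guaranteed for every model in the table, and the function names are illustrative.

```python
import math


def cosine_to_euclidean(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos_sim))


def euclidean_to_cosine(dist):
    """Cosine similarity of two unit vectors separated by the given distance."""
    return 1.0 - dist * dist / 2.0


print(f"cosine 0.363 -> distance {cosine_to_euclidean(0.363):.3f}")
```

Under the unit-length assumption, SFace's cosine threshold of 0.363 corresponds to a distance of about 1.13. Thresholds are tuned per model, so values from different rows should not be compared directly.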

Why SFace is best for our case:

Tight Integration: SFace takes 112x112 aligned crops computed from YuNet's landmarks, so the two models operate as a cohesive two-stage pipeline inside face_engine.py.
Inference Speed: At 11.50 ms per face crop, embedding extraction is fast enough for rapid check-in/out kiosk streams, and the subsequent cosine match adds only 0.0079 ms.
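
The recognition stage can be sketched as follows: align and embed a face with `cv2.FaceRecognizerSF`, then compare the embedding against enrolled ones using the 0.363 cosine threshold from the table. This is an illustration under assumptions, not the contents of face_engine.py: `embed_face`, `best_match`, the `enrolled` dictionary, and `model_path` are hypothetical names.

```python
import math

COSINE_THRESHOLD = 0.363  # SFace cosine threshold from the comparison table


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(probe, enrolled, threshold=COSINE_THRESHOLD):
    """Return (name, score) of the closest enrolled embedding, or (None, score) if below threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


def embed_face(image_bgr, face_row, model_path):
    """Align a face using its raw YuNet row, then extract a 128-d SFace embedding."""
    import cv2  # imported lazily so the matching helpers work without OpenCV
    recognizer = cv2.FaceRecognizerSF.create(model_path, "")
    aligned = recognizer.alignCrop(image_bgr, face_row)  # 112x112 aligned crop
    return recognizer.feature(aligned).flatten().tolist()
```

For example, with toy 3-d vectors, `best_match([0.9, 0.1, 0.0], {"alice": [1, 0, 0], "bob": [0, 1, 0]})` selects "alice", while a probe orthogonal to every enrolled vector scores 0.0 and is rejected. As with the detector, a real pipeline would create the recognizer once rather than on every call.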

5. Summary Matrix & Recommendation

```mermaid
graph TD
    A[Biometric System Architecture] --> B{Hardware Constraints}
    B -->|Lightweight CPU / Mini PC| C[YuNet + SFace Stack]
    B -->|Heavy GPU Server / SOTA Accuracy| D[RetinaFace + ArcFace R50 Stack]
    B -->|Ultra-low Memory <10MB RAM| E[BlazeFace + MobileFaceNet Stack]

    style C fill:#2ecc71,stroke:#27ae60,stroke-width:2px,color:#fff
    style D fill:#e74c3c,stroke:#c0392b,stroke-width:1px,color:#fff
    style E fill:#f39c12,stroke:#d35400,stroke-width:1px,color:#fff
```

Final Verdict: Keep YuNet + SFace

For your local webcam attendance system running on lightweight hardware:

Qualcomm LWFD is rejected due to vendor lock-in and its lack of landmark output.
RetinaFace, YOLOv8-face, ArcFace, FaceNet, and Dlib are rejected because their CPU latency exceeds the 33.3 ms frame budget (from ~40 ms for YOLOv8-face up to 500+ ms for RetinaFace) and because they bring heavy dependencies (PyTorch, TensorFlow, or a dlib C++ build).
BlazeFace and MobileFaceNet are viable alternatives if you need to run on extremely low-spec hardware such as microcontrollers, but BlazeFace has a shorter detection range and both require extra runtime libraries (MediaPipe, ONNX Runtime, or TF Lite), which would complicate the otherwise dependency-free codebase.

Recommendation: Retain YuNet and SFace. Instead of replacing them, apply the quick optimizations highlighted in your ARCHITECTURE.md (e.g. Frame Skipping and Downscaled Detection) to drop CPU usage by up to 50% without altering the model pipeline.
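
ARCHITECTURE.md is not reproduced here, so the following is only a generic sketch of what frame skipping and downscaled detection usually mean, not the project's actual implementation. `FrameSkipper` runs an expensive step on every Nth frame and reuses the last result in between; `rescale_detection` maps a YuNet row found on a shrunken frame back to full-resolution coordinates. Both names and their parameters are assumptions.

```python
class FrameSkipper:
    """Run an expensive function on every Nth frame and reuse the last result in between."""

    def __init__(self, process, every_n=2):
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self.process = process
        self.every_n = every_n
        self._count = 0
        self._last = None

    def __call__(self, frame):
        if self._count % self.every_n == 0:
            self._last = self.process(frame)  # fresh result on frames 0, N, 2N, ...
        self._count += 1
        return self._last


def rescale_detection(row, scale):
    """Map a 15-value YuNet row from a downscaled frame back to full-resolution pixels.

    scale is the factor the frame was shrunk by (0.5 means half width and height).
    The first 14 values are pixel coordinates; the final score is left unchanged.
    """
    values = [float(v) for v in row]
    return [v / scale for v in values[:14]] + values[14:]
```

With `every_n=2` the wrapped step runs on half of the frames. The trade-off is that results on skipped frames lag by up to N - 1 frames, which is usually acceptable for a kiosk where faces move slowly.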