SecureAttendAI / comparative_analysis.md
Nishant Katiyar

Comparative Analysis: Biometric Models for Face Attendance

This document evaluates the biometric models currently used in the SecureAttend AI system versus the alternative methods mentioned in your request, focusing on their suitability for lightweight CPU-only hardware.


1. Executive Summary

The current system implements YuNet (for detection) and SFace (for recognition), which are OpenCV's official DNN-optimized models. None of the alternative models mentioned in your request are currently in use.

A live benchmark executed on your local machine shows that the current YuNet + SFace stack delivers exceptional real-time performance on a standard CPU:

  • Face Detection (YuNet): 19.53 ms per frame (~51.2 FPS)
  • Face Recognition (SFace): 11.50 ms per face crop (~87.0 FPS)
  • Embedding Comparison: 0.0079 ms per match

Based on these results and the architectural trade-offs, we recommend retaining the current YuNet + SFace stack. It requires no extra library dependencies, each stage runs well under the 33.3 ms frame budget for 30 FPS on its own, and detection plus recognition of one face (~31 ms combined) still fits within that budget.
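Per-call latencies like the ones quoted above are usually gathered by timing many repeated calls after a short warm-up. The following is a minimal sketch of such a harness, not the contents of benchmark_current.py; the function name `measure_latency_ms` and its `warmup` and `runs` parameters are assumptions made for illustration.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Call fn() repeatedly and return mean/median latency in ms plus derived FPS."""
    for _ in range(warmup):
        fn()  # untimed calls so one-off initialization does not skew the result

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.fmean(samples)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would pass something like
    # lambda: detector.detect(frame) instead (hypothetical names).
    print(measure_latency_ms(lambda: sum(range(10_000))))
```

Reporting the median alongside the mean helps spot runs where a few slow outliers inflate the average.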


2. Baseline Benchmark Results (Local CPU)

We ran a diagnostic benchmark (benchmark_current.py) directly on your machine's CPU to measure the baseline speed of the current implementation:

| Pipeline Step | Latency | Throughput | Resource Footprint |
|---|---|---|---|
| YuNet Face Detection | 19.53 ms | 51.2 FPS | 232 KB model file |
| SFace Feature Extraction | 11.50 ms | 87.0 FPS | 38.6 MB model file |
| Cosine Similarity Match | 0.0079 ms | 126,500 matches/sec | Virtually zero |

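These numbers show that the core AI processing (detection plus recognition of a single face) takes ~31 ms combined. This fits within a single frame budget (33.3 ms at 30 FPS) even without frame skipping, with about 2 ms of headroom. Because SFace runs once per face crop, each additional face in the frame adds roughly 11.5 ms.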
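The frame-budget arithmetic can be checked with a few lines of Python. This sketch uses the measured latencies from the table and assumes that recognition runs sequentially, once per detected face; the function names are illustrative only.

```python
FRAME_BUDGET_MS = 1000.0 / 30.0   # 33.3 ms per frame at 30 FPS
DETECT_MS = 19.53                 # YuNet, once per frame (measured above)
RECOGNIZE_MS = 11.50              # SFace, once per face crop (measured above)


def pipeline_ms(faces: int) -> float:
    """Estimated latency for one frame with `faces` faces, assuming sequential recognition."""
    return DETECT_MS + faces * RECOGNIZE_MS


def max_faces_within_budget(budget_ms: float = FRAME_BUDGET_MS) -> int:
    """Largest number of faces that can be recognized within one frame budget."""
    return max(0, int((budget_ms - DETECT_MS) // RECOGNIZE_MS))


if __name__ == "__main__":
    for n in (1, 2, 3):
        verdict = "fits" if pipeline_ms(n) <= FRAME_BUDGET_MS else "exceeds budget"
        print(f"{n} face(s): {pipeline_ms(n):.2f} ms ({verdict})")
    print("Faces per frame within budget:", max_faces_within_budget())
```

Under these assumptions only one face fits in a 30 FPS frame budget, which is why frame skipping matters once several people stand in front of the camera at the same time.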


3. Face Detection Model Comparison

The table below compares the active detector (YuNet) against the requested alternatives:

| Model | Size | CPU Latency (640x480) | Landmarks | Library Dependency | Architectural Suitability |
|---|---|---|---|---|---|
| YuNet (Current) | ~232 KB | ~19.5 ms | ✅ Yes (5 points) | None (OpenCV Native) | Excellent (Recommended). Extremely fast, lightweight, and specifically designed for real-time edge CPU workloads. |
| Qualcomm LWFD | ~3.4 MB | ~30 - 50 ms | ❌ No | Qualcomm QNN / AI Hub | Poor. Vendor-locked. Highly optimized for Snapdragon NPUs/DSPs, but runs slower on generic x86/ARM CPUs. Lacks landmarks. |
| BlazeFace | ~100-200 KB | ~10 - 15 ms | ✅ Yes (6 points) | Google MediaPipe | Moderate. Excellent for close-up phone/selfie range, but suffers from low accuracy for distant or multi-person detections. |
| RetinaFace | ~1.7 MB (MobileNet) to ~104 MB (ResNet) | ~60 - 500+ ms | ✅ Yes (5 points) | PyTorch / InsightFace | Poor. High-accuracy powerhouse, but far too heavy for real-time CPU deployment. Leads to laggy frame rates. |
| YOLOv8-face | ~6 MB (nano) | ~40 - 80 ms | ✅ Yes (5 points) | PyTorch / Ultralytics | Moderate. Strong multi-face detection, but carries a heavy PyTorch dependency and higher CPU latency. |

Why YuNet is best for our case:

  1. Landmarks & Alignment: YuNet outputs 5 facial landmarks natively, which SFace requires to crop and align the face. Without landmarks, we cannot feed aligned inputs to the recognition engine.
  2. Zero Extra Dependencies: YuNet runs through OpenCV's built-in DNN module (cv2.FaceDetectorYN), so it avoids loading heavy deep-learning frameworks (like PyTorch or TensorFlow), which consume large amounts of RAM.
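To illustrate both points, here is a minimal sketch of running YuNet through cv2.FaceDetectorYN and unpacking the five landmarks from each detection row. It is not taken from face_engine.py: the `Face` dataclass, the helper names, and the `model_path` argument are assumptions, and the row layout (box, five landmark pairs, score) follows OpenCV's documented output for this detector.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Face:
    box: Tuple[float, float, float, float]   # x, y, width, height
    landmarks: List[Tuple[float, float]]     # right eye, left eye, nose tip, mouth corners
    score: float


def parse_yunet_row(row: Sequence[float]) -> Face:
    """Convert one 15-value YuNet detection row into a Face."""
    if len(row) != 15:
        raise ValueError(f"expected 15 values per detection, got {len(row)}")
    vals = [float(v) for v in row]
    box = (vals[0], vals[1], vals[2], vals[3])
    landmarks = [(vals[i], vals[i + 1]) for i in range(4, 14, 2)]
    return Face(box=box, landmarks=landmarks, score=vals[14])


def detect_faces(image_bgr, model_path: str, score_threshold: float = 0.9) -> List[Face]:
    """Run YuNet on a BGR image. model_path is a hypothetical path to the ONNX file."""
    import cv2  # imported lazily so the parser above works without OpenCV installed

    height, width = image_bgr.shape[:2]
    detector = cv2.FaceDetectorYN.create(model_path, "", (width, height), score_threshold)
    _, rows = detector.detect(image_bgr)     # rows is None when no face is found
    return [] if rows is None else [parse_yunet_row(r) for r in rows]
```

The only import needed for detection is cv2 itself, which is the dependency argument made in the second point above.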

4. Face Recognition Model Comparison

The table below compares the active vectorizer (SFace) against the requested alternatives:

| Model | Size | CPU Latency | Embedding Dim | Library Dependency | Threshold Metric | Architectural Suitability |
|---|---|---|---|---|---|---|
| SFace (Current) | ~38.6 MB | ~11.5 ms | 128-d | None (OpenCV Native) | Cosine (0.363) | Excellent (Recommended). Tailored for 112x112 YuNet crops. High speed-to-accuracy ratio. |
| MobileFaceNet | ~4.0 MB | ~15 - 25 ms | 128-d | ONNX Runtime / TF Lite | Cosine (0.40) | Good. Extreme storage efficiency. Great if disk/memory space is highly constrained (e.g. microcontrollers). |
| ArcFace (R50) | ~100+ MB | ~120 - 200 ms | 512-d | PyTorch / InsightFace | Cosine (0.65) | Poor (for CPU). SOTA accuracy, but the ResNet-50 backbone is too heavy for live 30 FPS CPU matching. |
| FaceNet | ~90 MB | ~100 - 150 ms | 128/512-d | PyTorch / TensorFlow | Euclidean (1.1) | Poor. Legacy Inception-ResNet architecture. Highly resource-intensive and slow on CPU. |
| Dlib (ResNet-34) | ~100+ MB | ~100 - 150 ms | 128-d | dlib (C++ build tools) | Euclidean (0.6) | Very Poor. Difficult to install on Windows (requires CMake and C++ compiler setup). Sluggish CPU performance. |

Why SFace is best for our case:

  1. Tight Integration: SFace consumes 112x112 aligned crops that are derived from YuNet's landmarks, so the two models operate as a cohesive two-stage pipeline inside face_engine.py.
  2. Inference Speed: Feature extraction takes 11.50 ms per face crop and each embedding comparison adds only ~0.008 ms, which makes the stack well suited to rapid check-in/check-out kiosk streams.
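The matching step itself is small enough to sketch in plain Python. The example below assumes embeddings are simple sequences of floats and uses the cosine threshold of 0.363 listed for SFace in the table; the `best_match` helper and the gallery layout are hypothetical, not the structure used in face_engine.py.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

COSINE_THRESHOLD = 0.363  # SFace cosine threshold from the comparison table


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = COSINE_THRESHOLD,
) -> Tuple[Optional[str], float]:
    """Return (name, score) of the closest enrolled embedding, or (None, score) below threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

OpenCV's cv2.FaceRecognizerSF also exposes a match method that computes this score directly; the pure-Python version is shown only to make the threshold logic explicit.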

5. Summary Matrix & Recommendation

graph TD
    A[Biometric System Architecture] --> B{Hardware Constraints}
    B -->|Lightweight CPU / Mini PC| C[YuNet + SFace Stack]
    B -->|Heavy GPU Server / SOTA Accuracy| D[RetinaFace + ArcFace R50 Stack]
    B -->|Ultra-low Memory <10MB RAM| E[BlazeFace + MobileFaceNet Stack]
    
    style C fill:#2ecc71,stroke:#27ae60,stroke-width:2px,color:#fff
    style D fill:#e74c3c,stroke:#c0392b,stroke-width:1px,color:#fff
    style E fill:#f39c12,stroke:#d35400,stroke-width:1px,color:#fff
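
The same decision flow can be written as a small function. This is only a sketch of the diagram: the function name, its parameters, and the order in which the constraints are checked are assumptions, while the 10 MB cutoff and the stack names come from the graph above.

```python
def recommend_stack(has_gpu_server: bool, ram_mb: float) -> str:
    """Map hardware constraints to the stack named in the decision graph."""
    if ram_mb < 10:
        # Ultra-low memory branch of the graph
        return "BlazeFace + MobileFaceNet"
    if has_gpu_server:
        # Heavy GPU server / SOTA accuracy branch
        return "RetinaFace + ArcFace R50"
    # Lightweight CPU / mini PC branch (the recommended default)
    return "YuNet + SFace"


if __name__ == "__main__":
    print(recommend_stack(has_gpu_server=False, ram_mb=8192))
```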

Final Verdict: Keep YuNet + SFace

For your local webcam attendance system running on lightweight hardware:

  • Qualcomm LWFD is rejected due to vendor lock-in and its lack of landmark output.
  • RetinaFace, YOLOv8-face, ArcFace, FaceNet, and Dlib are rejected because their CPU latency exceeds the 33.3 ms frame budget (roughly 40 ms to 500+ ms in the tables above) and because they bring heavy dependencies (PyTorch/TensorFlow, or a C++ build toolchain in the case of Dlib).
  • BlazeFace and MobileFaceNet are viable alternatives if you need to run on extremely low-spec hardware, but BlazeFace has a shorter detection range, and both require extra runtime libraries (MediaPipe/ONNX Runtime), which would compromise the simplicity of the current codebase.

Recommendation: Retain YuNet and SFace. Instead of replacing them, apply the quick optimizations highlighted in your ARCHITECTURE.md (e.g. Frame Skipping and Downscaled Detection) to drop CPU usage by up to 50% without altering the model pipeline.
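
As a rough illustration of these two optimizations, the sketch below runs a detector only on every Nth frame and reuses the previous result in between, and maps boxes found on a downscaled frame back to full-resolution coordinates. It is not the code from ARCHITECTURE.md; the function names, the `every_n` parameter, and the box format are assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Detection = TypeVar("Detection")
Box = Tuple[float, float, float, float]  # x, y, width, height


def detect_with_frame_skipping(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Detection]],
    every_n: int = 2,
) -> Iterator[Tuple[Frame, List[Detection]]]:
    """Yield (frame, detections), running detect() only on every Nth frame."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    last: List[Detection] = []
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last = detect(frame)   # fresh detection on this frame
        yield frame, last          # skipped frames reuse the previous result


def upscale_box(box: Box, scale: float) -> Box:
    """Map a box found on a frame resized by `scale` back to original coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    x, y, w, h = box
    return (x / scale, y / scale, w / scale, h / scale)
```

With `every_n=2` the detector runs on half of the frames, which is the kind of saving the CPU reduction above relies on; the actual gain depends on how much of the per-frame cost the detector accounts for.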