Comparative Analysis: Biometric Models for Face Attendance
This document evaluates the biometric models currently used in the SecureAttend AI system versus the alternative methods mentioned in your request, focusing on their suitability for lightweight CPU-only hardware.
1. Executive Summary
The current system implements YuNet (for detection) and SFace (for recognition), which are OpenCV's official DNN-optimized models. None of the alternative models mentioned in your request are currently in use.
A live benchmark executed on your local machine shows that the current YuNet + SFace stack delivers exceptional real-time performance on a standard CPU:
- Face Detection (YuNet): 19.53 ms per frame (~51.2 FPS)
- Face Recognition (SFace): 11.50 ms per face crop (~87.0 FPS)
- Embedding Comparison: 0.0079 ms per match
Based on these results and architectural trade-offs, we recommend retaining the current YuNet + SFace stack: it requires no extra library dependencies, and detection plus recognition for a single face (~31 ms combined) fits within the 33.3 ms frame budget for 30 FPS.
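The figures above can be cross-checked with a few lines of arithmetic. The sketch below converts each reported latency to throughput and estimates the per-frame cost against a 30 FPS budget. The `frame_cost_ms` helper and its `gallery_size` parameter are illustrative assumptions (the stages are assumed to run one after another on a single core); they are not part of the SecureAttend AI codebase.

```python
# Latencies reported in this document, in milliseconds.
DETECT_MS = 19.53      # YuNet, per frame
RECOGNIZE_MS = 11.50   # SFace, per face crop
MATCH_MS = 0.0079      # one embedding comparison

FRAME_BUDGET_MS = 1000.0 / 30  # 33.3 ms per frame at 30 FPS


def fps(latency_ms: float) -> float:
    """Convert a per-item latency in milliseconds to items per second."""
    return 1000.0 / latency_ms


def frame_cost_ms(num_faces: int, gallery_size: int = 0) -> float:
    """Estimate the cost of one frame (hypothetical helper).

    One detection pass, then one embedding extraction plus
    `gallery_size` comparisons for every detected face, assuming
    the stages run sequentially.
    """
    per_face = RECOGNIZE_MS + gallery_size * MATCH_MS
    return DETECT_MS + num_faces * per_face


if __name__ == "__main__":
    print(f"detection:   {fps(DETECT_MS):.1f} FPS")
    print(f"recognition: {fps(RECOGNIZE_MS):.1f} FPS")
    for faces in (1, 2, 3):
        cost = frame_cost_ms(faces, gallery_size=100)
        verdict = "within" if cost <= FRAME_BUDGET_MS else "over"
        print(f"{faces} face(s): {cost:.2f} ms ({verdict} the 30 FPS budget)")
```

Under these assumptions a single face fits the budget, while a second face in the same frame pushes the estimate past 33.3 ms, which is where optimizations such as frame skipping become relevant.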
2. Baseline Benchmark Results (Local CPU)
We ran a diagnostic benchmark (benchmark_current.py) directly on your machine's CPU to measure the baseline speed of the current implementation:
| Pipeline Step | Latency (ms) | Throughput (FPS) | Resource Footprint |
|---|---|---|---|
| YuNet Face Detection | 19.53 ms | 51.2 FPS | 232 KB model file |
| SFace Feature Extraction | 11.50 ms | 87.0 FPS | 38.6 MB model file |
| Cosine Similarity Match | 0.0079 ms | 126,500 matches/sec | Virtually zero |
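These numbers show that the core AI processing for one face (detection + recognition) takes ~31 ms combined (19.53 ms + 11.50 ms). This fits within a single frame budget (33.3 ms for 30 FPS) even without frame-skipping, though with only about 2 ms of headroom. Because recognition is measured per face crop, each additional face in the frame adds roughly another 11.5 ms.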
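The contents of benchmark_current.py are not reproduced in this document. As a rough illustration of how per-step latencies like those in the table are typically collected, here is a minimal timing harness. The `benchmark` function, its parameters, and the stand-in workload are assumptions for illustration only; in practice the callable would wrap a detection or recognition call.

```python
import statistics
import time
from typing import Callable


def benchmark(step: Callable[[], object], warmup: int = 5, runs: int = 50) -> dict:
    """Time `step` and return latency statistics in milliseconds."""
    # Warm-up iterations are excluded so one-time setup costs
    # (lazy initialisation, cache fills) do not skew the mean.
    for _ in range(warmup):
        step()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.fmean(samples_ms)
    return {
        "runs": runs,
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the pipeline step being measured.
    result = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"{result['mean_ms']:.3f} ms mean, {result['fps']:.1f} calls/sec")
```

Reporting the median alongside the mean helps spot runs that were disturbed by background load on the machine.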
3. Face Detection Model Comparison
The table below compares the active detector (YuNet) against the requested alternatives:
| Model | Size | CPU Latency (640x480) | Landmarks | Library Dependency | Architectural Suitability |
|---|---|---|---|---|---|
| YuNet (Current) | ~232 KB | ~19.5 ms | ✅ Yes (5 points) | None (OpenCV Native) | Excellent (Recommended). Extremely fast, lightweight, and specifically designed for real-time edge CPU workloads. |
| Qualcomm LWFD | ~3.4 MB | ~30 - 50 ms | ❌ No | Qualcomm QNN / AI Hub | Poor. Vendor-locked. Highly optimized for Snapdragon NPUs/DSPs, but runs slower on generic x86/ARM CPUs. Lacks landmarks. |
| BlazeFace | ~100-200 KB | ~10 - 15 ms | ✅ Yes (6 points) | Google MediaPipe | Moderate. Excellent for close-up phone/selfie range, but suffers from low accuracy for distant or multi-person detections. |
| RetinaFace | ~1.7 MB (MobileNet) to ~104 MB (ResNet) | ~60 - 500+ ms | ✅ Yes (5 points) | PyTorch / InsightFace | Poor. High-accuracy powerhouse, but far too heavy for real-time CPU deployment. Leads to laggy frame rates. |
| YOLOv8-face | ~6 MB (nano) | ~40 - 80 ms | ✅ Yes (5 points) | PyTorch / Ultralytics | Moderate. Strong multi-face detection, but carries a heavy PyTorch dependency and higher CPU latency. |
Why YuNet is best for our case:
- Landmarks & Alignment: YuNet outputs 5 facial landmarks natively, which SFace requires to crop and align the face. Without landmarks, we cannot feed aligned inputs to the recognition engine.
- Zero Overhead: YuNet runs through OpenCV's built-in DNN module (cv2.FaceDetectorYN), so it avoids loading heavy deep-learning frameworks (such as PyTorch or TensorFlow), which consume large amounts of RAM.
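To make the landmark hand-off concrete, the sketch below shows one way the two OpenCV classes can be chained. It is not the code from face_engine.py. It assumes opencv-python is installed and that the caller supplies paths to the YuNet and SFace ONNX files; `parse_yunet_row`, `embed_faces`, and the `Face` type are hypothetical names. The 15-value row layout (box, five landmark points, score) follows the OpenCV documentation for `FaceDetectorYN.detect`.

```python
from typing import NamedTuple, Sequence

# Landmark order in a YuNet detection row.
LANDMARK_NAMES = ("right_eye", "left_eye", "nose_tip", "mouth_right", "mouth_left")


class Face(NamedTuple):
    box: tuple        # (x, y, w, h) in pixels
    landmarks: dict   # landmark name -> (x, y)
    score: float      # detection confidence


def parse_yunet_row(row: Sequence[float]) -> Face:
    """Split one 15-value YuNet row into box, landmarks and score."""
    if len(row) != 15:
        raise ValueError("expected 15 values per detection row")
    box = tuple(float(v) for v in row[0:4])
    landmarks = {
        name: (float(row[4 + 2 * i]), float(row[5 + 2 * i]))
        for i, name in enumerate(LANDMARK_NAMES)
    }
    return Face(box, landmarks, float(row[14]))


def embed_faces(image, detector_path: str, recognizer_path: str,
                score_threshold: float = 0.9) -> list:
    """Detect, align and embed every face in a BGR image (sketch).

    Models are created on every call for brevity; a real pipeline
    would create them once and reuse them.
    """
    import cv2  # imported lazily so parse_yunet_row works without OpenCV

    height, width = image.shape[:2]
    detector = cv2.FaceDetectorYN.create(
        detector_path, "", (width, height), score_threshold)
    recognizer = cv2.FaceRecognizerSF.create(recognizer_path, "")

    _, rows = detector.detect(image)  # rows is None when no face is found
    results = []
    for row in (rows if rows is not None else []):
        aligned = recognizer.alignCrop(image, row)  # uses the 5 landmarks
        results.append((parse_yunet_row(row), recognizer.feature(aligned)))
    return results
```

A detector without landmark output could not feed `alignCrop` directly, which is the practical reason the landmark column matters in the comparison table.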
4. Face Recognition Model Comparison
The table below compares the active vectorizer (SFace) against the requested alternatives:
| Model | Size | CPU Latency | Embedding Dim | Library Dependency | Threshold Metric | Architectural Suitability |
|---|---|---|---|---|---|---|
| SFace (Current) | ~38.6 MB | ~11.5 ms | 128-d | None (OpenCV Native) | Cosine (0.363) | Excellent (Recommended). Tailored for 112x112 YuNet crops. High speed-to-accuracy ratio. |
| MobileFaceNet | ~4.0 MB | ~15 - 25 ms | 128-d | ONNX Runtime / TF Lite | Cosine (0.40) | Good. Extreme storage efficiency. Great if disk/memory space is highly constrained (e.g. microcontrollers). |
| ArcFace (R50) | ~100+ MB | ~120 - 200 ms | 512-d | PyTorch / InsightFace | Cosine (0.65) | Poor (for CPU). SOTA accuracy, but the ResNet-50 backbone is too heavy for live 30 FPS CPU matching. |
| FaceNet | ~90 MB | ~100 - 150 ms | 128/512-d | PyTorch / TensorFlow | Euclidean (1.1) | Poor. Legacy Inception-ResNet architecture. Highly resource-intensive and slow on CPU. |
| Dlib (ResNet-34) | ~22 MB (recognition model only) | ~100 - 150 ms | 128-d | dlib (C++ build tools) | Euclidean (0.6) | Very Poor. Difficult to install on Windows (requires CMake and C++ compiler setup). Sluggish CPU performance. |
Why SFace is best for our case:
- Perfect Integration: It consumes the 112x112 aligned crops derived from YuNet's boxes and landmarks, so the two models operate as a cohesive two-stage pipeline inside face_engine.py.
- Inference Speed: At 11.50 ms per face crop, embedding extraction is fast enough for rapid check-in/out kiosk streams.
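The matching step listed in the table (cosine similarity against a 0.363 threshold for SFace) can be sketched in plain Python as follows. The `best_match` helper and the dictionary-based gallery are illustrative assumptions, not the project's storage format, and the three-element vectors in the usage example stand in for real 128-d embeddings.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

COSINE_THRESHOLD = 0.363  # SFace cosine threshold from the table above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(probe: Sequence[float],
               gallery: Dict[str, Sequence[float]],
               threshold: float = COSINE_THRESHOLD) -> Tuple[Optional[str], float]:
    """Return (name, score) for the closest enrolled embedding.

    The name is None when the best score falls below the threshold.
    """
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(best_match([0.9, 0.1, 0.0], gallery))  # closest to "alice"
    print(best_match([0.0, 0.0, 1.0], gallery))  # below threshold, no match
```

Models that use a Euclidean threshold (FaceNet, Dlib in the table) would need a distance function and a "smaller is better" comparison instead.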
5. Summary Matrix & Recommendation
graph TD
A[Biometric System Architecture] --> B{Hardware Constraints}
B -->|Lightweight CPU / Mini PC| C[YuNet + SFace Stack]
B -->|Heavy GPU Server / SOTA Accuracy| D[RetinaFace + ArcFace R50 Stack]
B -->|Ultra-low Memory <10MB RAM| E[BlazeFace + MobileFaceNet Stack]
style C fill:#2ecc71,stroke:#27ae60,stroke-width:2px,color:#fff
style D fill:#e74c3c,stroke:#c0392b,stroke-width:1px,color:#fff
style E fill:#f39c12,stroke:#d35400,stroke-width:1px,color:#fff
Final Verdict: Keep YuNet + SFace
For your local webcam attendance system running on lightweight hardware:
- Qualcomm LWFD is rejected due to vendor lock-in and its lack of landmark output.
- RetinaFace, YOLOv8-face, ArcFace, FaceNet, and Dlib are rejected due to high CPU latency (roughly 40 ms to over 500 ms in the tables above, versus ~19.5 ms for YuNet and ~11.5 ms for SFace) and heavy dependencies (PyTorch/TensorFlow, or a C++ toolchain for dlib).
- BlazeFace and MobileFaceNet are viable alternatives if you need to run on extremely low-spec devices such as microcontrollers, but BlazeFace is less accurate for distant or multi-person detection, and both require extra runtime libraries (MediaPipe/ONNX Runtime), which would complicate an otherwise dependency-free codebase.
Recommendation: Retain YuNet and SFace. Instead of replacing them, apply the quick optimizations highlighted in your ARCHITECTURE.md (e.g. Frame Skipping and Downscaled Detection) to drop CPU usage by up to 50% without altering the model pipeline.
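ARCHITECTURE.md is not reproduced here, so the following is only a sketch of what frame skipping and downscaled detection generally look like, not the project's implementation. `SkippingDetector`, `scale_boxes`, and the `interval` parameter are hypothetical names, and the detector is passed in as a plain callable so the logic can be shown without OpenCV.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def scale_boxes(boxes: List[Box], factor: float) -> List[Box]:
    """Map boxes found on a frame resized by `factor` back to full size.

    For example, boxes detected on a half-size frame (factor 0.5)
    are doubled to match the original resolution.
    """
    return [(x / factor, y / factor, w / factor, h / factor)
            for x, y, w, h in boxes]


class SkippingDetector:
    """Run `detect` on every `interval`-th frame and reuse the last
    result for the frames in between."""

    def __init__(self, detect: Callable[[object], List[Box]], interval: int = 2):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.detect = detect
        self.interval = interval
        self.frame_index = 0
        self.last_boxes: List[Box] = []
        self.detector_calls = 0  # kept for illustration and testing

    def process(self, frame: object) -> List[Box]:
        if self.frame_index % self.interval == 0:
            self.last_boxes = self.detect(frame)
            self.detector_calls += 1
        self.frame_index += 1
        return self.last_boxes


if __name__ == "__main__":
    # Fake detector that always reports one box on a half-size frame.
    detector = SkippingDetector(lambda frame: [(10.0, 20.0, 30.0, 40.0)], interval=2)
    for frame_number in range(10):
        boxes = scale_boxes(detector.process(frame_number), factor=0.5)
    print(detector.detector_calls, boxes)
```

With `interval=2` the detector runs on half of the frames, which is consistent with the "up to 50%" figure quoted above; the trade-off is that boxes lag by up to one frame for fast-moving subjects.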