---
license: mit
tags:
  - facial-expression-recognition
  - emotion-recognition
  - computer-vision
  - pytorch
  - mediapipe
  - efficientnet
  - real-time
  - image-classification
pipeline_tag: image-classification
---

![header](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=200&section=header&text=VisageCNN&fontSize=70&fontColor=fff&animation=fadeIn&fontAlignY=38&desc=Real-Time%20Facial%20Expression%20Recognition&descAlignY=60&descAlign=50)

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white) ![PyTorch](https://img.shields.io/badge/PyTorch-2.x-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white) ![MediaPipe](https://img.shields.io/badge/MediaPipe-0.10-00BCD4?style=for-the-badge&logo=google&logoColor=white) ![OpenCV](https://img.shields.io/badge/OpenCV-4.x-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white) ![CUDA](https://img.shields.io/badge/CUDA-11.8+-76B900?style=for-the-badge&logo=nvidia&logoColor=white) ![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge) [![GitHub](https://img.shields.io/badge/GitHub-VisageCNN-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Huuffy/VisageCNN)

---

## What Is This?

**HybridEmotionNet** is a dual-branch neural network for real-time facial emotion recognition that fuses **EfficientNet-B2 appearance features** with **MediaPipe 3D landmark geometry** via bidirectional cross-attention.

It processes webcam frames at **30+ FPS**, extracts **478 3D landmarks**, crops the face to 224×224, and classifies it into 7 emotions, with EMA and sliding-window temporal smoothing.

**Highlights:** 87.9% validation accuracy · Disgust recall 51%→90% · Fear recall 65%→75% · 75k balanced training images · ViT-scored quality filtering

---

## Architecture

![Architecture](https://huggingface.co/Huuffy/VisageCNN/resolve/main/Architecture%20digram.png)

```
Face crop (224×224) ──► EfficientNet-B2 ──► [B, 256] appearance
                        blocks 0-1 frozen
                        blocks 2-8 fine-tuned

478 landmarks (xyz) ──► MLP encoder ──► [B, 256] geometry
                        1434 → 512 → 384 → 256

        Bidirectional Cross-Attention (4 heads each)
        ┌──────────────────────────────────────────┐
        │ coord → CNN (geometry queries appear.)   │
        │ CNN → coord (appear. queries geometry)   │
        └──────────────────────────────────────────┘
                             │
              Fusion MLP: 512 → 384 → 256 → 128
                             │
                Classifier: 128 → 7 emotions
```

| Component | Detail |
|-----------|--------|
| CNN branch | EfficientNet-B2, ImageNet init, blocks 0–1 frozen, gradient checkpointing |
| Coord branch | MLP 1434 → 512 → 384 → 256, BN + Dropout |
| Fusion | Bidirectional cross-attention + MLP |
| Parameters | ~8M total |
| Model size | ~90 MB |

---

## Performance

| Metric | Value |
|--------|-------|
| Validation accuracy | **87.9%** |
| Macro F1 | **0.88** |
| Inference speed | ~12 ms/frame on RTX 3050 |

| Emotion | Precision | Recall | F1 |
|---------|-----------|--------|----|
| Angry | 0.85 | 0.83 | 0.84 |
| Disgust | 0.97 | 0.90 | 0.94 |
| Fear | 0.89 | 0.75 | 0.82 |
| Happy | 0.97 | 0.99 | 0.98 |
| Neutral | 0.85 | 0.91 | 0.88 |
| Sad | 0.78 | 0.88 | 0.83 |
| Surprised | 0.83 | 0.90 | 0.86 |

---

## Files in This Repo

| File | Size | Required |
|------|------|---------|
| `models/weights/hybrid_best_model.pth` | ~90 MB | Yes (best macro F1 checkpoint) |
| `models/weights/hybrid_swa_final.pth` | ~90 MB | Optional (SWA ensemble model) |
| `models/scalers/hybrid_coordinate_scaler.pkl` | 18 KB | Yes (landmark scaler) |

---

## Quick Start

### 1. Clone the code

```bash
git clone https://github.com/Huuffy/VisageCNN.git
cd VisageCNN
# Windows activation shown; on Linux/macOS use: source venv/bin/activate
python -m venv venv && venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

### 2. Download weights

```python
import pathlib
import shutil

from huggingface_hub import hf_hub_download

# Each file keeps the same relative path locally as it has in the Hub repo.
for remote, local in [
    ("models/weights/hybrid_best_model.pth", "models/weights/hybrid_best_model.pth"),
    ("models/weights/hybrid_swa_final.pth", "models/weights/hybrid_swa_final.pth"),
    ("models/scalers/hybrid_coordinate_scaler.pkl", "models/scalers/hybrid_coordinate_scaler.pkl"),
]:
    # Download into the Hugging Face cache, then copy into the project tree.
    src = hf_hub_download(repo_id="Huuffy/VisageCNN", filename=remote)
    pathlib.Path(local).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, local)
```

Or with the HF CLI:

```bash
huggingface-cli download Huuffy/VisageCNN models/weights/hybrid_best_model.pth --local-dir .
huggingface-cli download Huuffy/VisageCNN models/weights/hybrid_swa_final.pth --local-dir .
huggingface-cli download Huuffy/VisageCNN models/scalers/hybrid_coordinate_scaler.pkl --local-dir .
```

### 3. Run inference

```bash
# Standard
python inference/run_hybrid.py

# With SWA ensemble
python inference/run_hybrid.py --ensemble
```

Press **Q** to quit.

---

## Emotion Classes

| Label | Emotion | Key Signals |
|-------|---------|-------------|
| 0 | Angry | Furrowed brows, tightened jaw |
| 1 | Disgust | Raised upper lip, wrinkled nose |
| 2 | Fear | Wide eyes, raised brows, open mouth |
| 3 | Happy | Raised cheeks, open smile |
| 4 | Neutral | Relaxed, no strong deformation |
| 5 | Sad | Lowered brow corners, downturned lips |
| 6 | Surprised | Raised brows, wide eyes, dropped jaw |

---

## Training Dataset

75,376 total images: 10,768 per class × 7 emotions, perfectly balanced.

**Sources:** AffectNet · RAF-DB · FER2013 · AffectNet-Short · ScullyowesHenry · RAF-DB Kaggle

All images passed a two-stage quality filter:

1. MediaPipe FaceMesh (dual confidence: 0.5 normal + 0.2 lenient for extreme expressions)
2. ViT confidence scoring (`dima806/facial_emotions_image_detection`) with per-class asymmetric mislabel thresholds

Final class balance was achieved via ViT-scored capping: the lowest-confidence images were removed first, preserving the highest-quality examples per class.

---

## Training Config

| Setting | Value |
|---------|-------|
| Loss | Focal Loss γ=2.0 + label smoothing 0.12 |
| Optimizer | AdamW, weight decay 0.05 |
| LR | OneCycleLR, CNN 5e-5, fusion 5e-4 |
| Batch | 96 + grad accumulation ×2 (eff. 192) |
| Augmentation | CutMix + noise + rotation + zoom |
| Mixed precision | torch.amp (AMP) |
| Best model saved by | Macro F1 (not val accuracy) |
| SWA | Epochs 30–70, BN update after training |
| Early stopping | patience=15 on macro F1 |

---

## Retrain From Scratch

```bash
# Delete old cache and train (Windows syntax; on Linux/macOS: rm -rf models/cache)
rmdir /s /q models\cache
python scripts/train_hybrid.py
```

Full guide: [GitHub README](https://github.com/Huuffy/VisageCNN)

---
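
## Temporal Smoothing Sketch

The overview mentions EMA plus sliding-window temporal smoothing of the per-frame predictions, but this README does not say how the two are combined. The sketch below shows one plausible arrangement: an exponential moving average over the class probabilities, followed by a majority vote over the most recent labels. The class name `TemporalSmoother` and the `alpha` and `window` values are illustrative assumptions, not the repository's actual code or settings.

```python
from collections import Counter, deque

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprised"]


class TemporalSmoother:
    """Hypothetical helper: EMA over probabilities, then a vote over recent labels."""

    def __init__(self, alpha=0.4, window=5):
        # alpha and window are assumed values chosen for illustration only.
        self.alpha = alpha
        self.ema = None
        self.recent = deque(maxlen=window)

    def update(self, probs):
        """Take one frame's class probabilities; return the smoothed label index."""
        if self.ema is None:
            # First frame: start the average from the raw probabilities.
            self.ema = list(probs)
        else:
            self.ema = [
                self.alpha * p + (1.0 - self.alpha) * e
                for p, e in zip(probs, self.ema)
            ]
        # Arg-max of the smoothed distribution for this frame.
        frame_label = max(range(len(self.ema)), key=self.ema.__getitem__)
        self.recent.append(frame_label)
        # Most frequent label in the window; ties go to the label seen first.
        return Counter(self.recent).most_common(1)[0][0]


smoother = TemporalSmoother()
happy = [0.02, 0.02, 0.02, 0.86, 0.04, 0.02, 0.02]
glitch = [0.02, 0.02, 0.80, 0.06, 0.06, 0.02, 0.02]  # one noisy "Fear" frame
labels = [EMOTIONS[smoother.update(p)] for p in [happy, happy, glitch, happy]]
print(labels)
```

With these assumed settings, a single outlier frame does not change the displayed label, while a sustained change in expression takes over after a few frames. A larger `alpha` or a shorter `window` would react faster at the cost of more flicker.

---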
**Built with curiosity and a lot of training runs**

![footer](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=120&section=footer)