Huuffy committed on
Commit
77bd6fa
·
verified ·
1 Parent(s): e2e2bcd

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +219 -193
README.md CHANGED
@@ -1,193 +1,219 @@
1
- ---
2
- license: mit
3
- tags:
4
- - facial-expression-recognition
5
- - emotion-recognition
6
- - computer-vision
7
- - pytorch
8
- - mediapipe
9
- - efficientnet
10
- - real-time
11
- - image-classification
12
- pipeline_tag: image-classification
13
- ---
14
-
15
- <div align="center">
16
-
17
- ![header](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=200&section=header&text=VisageCNN&fontSize=70&fontColor=fff&animation=fadeIn&fontAlignY=38&desc=Real-Time%20Facial%20Expression%20Recognition&descAlignY=60&descAlign=50)
18
-
19
- <a href="https://git.io/typing-svg"><img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=600&size=22&pause=1000&color=06B6D4&center=true&vCenter=true&width=750&lines=Hybrid+CNN+%2B+MediaPipe+Landmark+Architecture;7+Emotion+Classes+%E2%80%94+Real-Time+at+30+FPS;Bidirectional+Cross-Attention+%7C+EfficientNet-B0+%2B+478+Landmarks;Optimized+for+RTX+3050+%E2%80%94+4GB+VRAM" alt="Typing SVG" /></a>
20
-
21
- <br/>
22
-
23
- ![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
24
- ![PyTorch](https://img.shields.io/badge/PyTorch-2.x-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
25
- ![MediaPipe](https://img.shields.io/badge/MediaPipe-0.10-00BCD4?style=for-the-badge&logo=google&logoColor=white)
26
- ![OpenCV](https://img.shields.io/badge/OpenCV-4.x-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)
27
- ![CUDA](https://img.shields.io/badge/CUDA-11.8+-76B900?style=for-the-badge&logo=nvidia&logoColor=white)
28
- ![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge)
29
- [![GitHub](https://img.shields.io/badge/GitHub-VisageCNN-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Huuffy/VisageCNN)
30
-
31
- </div>
32
-
33
- ---
34
-
35
- ## What Is This?
36
-
37
- **HybridEmotionNet** — a dual-branch neural network for real-time facial emotion recognition that fuses **EfficientNet-B0 appearance features** with **MediaPipe 3D landmark geometry** via bidirectional cross-attention.
38
-
39
- Processes webcam frames at **30+ FPS**, extracts **478 3D landmarks**, crops the face, and classifies into 7 emotions with temporal smoothing.
40
-
41
- ---
42
-
43
- ## Architecture
44
-
45
- ![Architecture](https://huggingface.co/Huuffy/VisageCNN/resolve/main/Architecture%20digram.png)
46
-
47
- ```
48
- Face crop (224×224) ──► EfficientNet-B0 ──► [B, 256] appearance
49
- 478 landmarks (xyz) ──► MLP encoder     ──► [B, 256] geometry
50
-                             │
51
-        Bidirectional Cross-Attention (4 heads each)
52
-   ┌────────────────────────────────────────┐
53
-   │ coord → CNN (geometry queries appear.) │
54
-   │ CNN → coord (appear. queries geometry) │
55
-   └────────────────────────────────────────┘
56
-                             │
57
-           Fusion MLP: 512 → 384 → 256 → 128
58
-                             │
59
-            Classifier: 128 → 7 emotions
60
- ```
61
-
62
- | Component | Detail |
63
- |-----------|--------|
64
- | CNN branch | EfficientNet-B0, ImageNet init, blocks 0–2 frozen |
65
- | Coord branch | MLP 1434 → 512 → 384 → 256, BN + Dropout |
66
- | Fusion | Bidirectional cross-attention + MLP |
67
- | Parameters | 6.2M total / 5.75M trainable |
68
- | Model size | 72 MB |
69
-
70
- ---
71
-
72
- ## Files in This Repo
73
-
74
- | File | Size | Required |
75
- |------|------|---------|
76
- | `models/weights/hybrid_best_model.pth` | 72 MB | Yes — model weights |
77
- | `models/scalers/hybrid_coordinate_scaler.pkl` | 18 KB | Yes — landmark scaler |
78
- | `Architecture digram.png` | — | No — docs only |
79
-
80
- ---
81
-
82
- ## Quick Start
83
-
84
- ### 1 — Clone the code
85
-
86
- ```bash
87
- git clone https://github.com/Huuffy/VisageCNN.git
88
- cd VisageCNN
89
- python -m venv venv && venv\Scripts\activate
90
- pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
91
- pip install -r requirements.txt
92
- ```
93
-
94
- ### 2 — Download weights
95
-
96
- ```python
97
- from huggingface_hub import hf_hub_download
98
- import shutil, pathlib
99
-
100
- for remote, local in [
101
- ("models/weights/hybrid_best_model.pth", "models/weights/hybrid_best_model.pth"),
102
- ("models/scalers/hybrid_coordinate_scaler.pkl", "models/scalers/hybrid_coordinate_scaler.pkl"),
103
- ]:
104
- src = hf_hub_download(repo_id="Huuffy/VisageCNN", filename=remote)
105
- pathlib.Path(local).parent.mkdir(parents=True, exist_ok=True)
106
- shutil.copy(src, local)
107
- ```
108
-
109
- Or with the HF CLI:
110
- ```bash
111
- hf download Huuffy/VisageCNN models/weights/hybrid_best_model.pth --local-dir .
112
- hf download Huuffy/VisageCNN models/scalers/hybrid_coordinate_scaler.pkl --local-dir .
113
- ```
114
-
115
- ### 3 — Run inference
116
-
117
- ```bash
118
- python inference/run_hybrid.py
119
- ```
120
-
121
- Press **Q** to quit.
122
-
123
- ---
124
-
125
- ## Emotion Classes
126
-
127
- | Label | Emotion | Key Signals |
128
- |-------|---------|-------------|
129
- | 0 | Angry | Furrowed brows, tightened jaw |
130
- | 1 | Disgust | Raised upper lip, wrinkled nose |
131
- | 2 | Fear | Wide eyes, raised brows, open mouth |
132
- | 3 | Happy | Raised cheeks, open smile |
133
- | 4 | Neutral | Relaxed, no strong deformation |
134
- | 5 | Sad | Lowered brow corners, downturned lips |
135
- | 6 | Surprised | Raised brows, wide eyes, dropped jaw |
136
-
137
- ---
138
-
139
- ## Training Dataset
140
-
141
- ~30k clean images — FER2013 noise removed across all classes:
142
-
143
- | Class | Images | Sources |
144
- |-------|--------|---------|
145
- | Angry | 6,130 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
146
- | Surprised | 5,212 | RAF-DB + AffectNet |
147
- | Sad | 4,941 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
148
- | Disgust | 3,782 | AffectNet-Short + RAF-DB + CK+ |
149
- | Neutral | 3,475 | RAF-DB + AffectNet |
150
- | Fear | 3,418 | AffectNet-Short + RAF-DB + CK+ |
151
- | Happy | 3,124 | RAF-DB + AffectNet |
152
-
153
- Max class imbalance: **1.97×**
154
-
155
- ---
156
-
157
- ## Training Config
158
-
159
- | Setting | Value |
160
- |---------|-------|
161
- | Loss | Focal Loss γ=2.0 + label smoothing 0.12 |
162
- | Optimizer | AdamW, weight decay 0.05 |
163
- | LR | OneCycleLR — CNN 5e-5, fusion 5e-4 |
164
- | Batch | 128 + grad accumulation ×2 (eff. 256) |
165
- | Augmentation | CutMix + noise + rotation + zoom |
166
- | Mixed precision | torch.amp (AMP) |
167
- | Early stopping | patience=40 on val accuracy |
168
-
169
- ---
170
-
171
- ## Retrain From Scratch
172
-
173
- ```bash
174
- # Build dataset (downloads ~30k clean images from HuggingFace)
175
- pip install datasets
176
- python scripts/prepare_dataset.py
177
-
178
- # Delete old cache and train
179
- rmdir /s /q models\cache
180
- python scripts/train_hybrid.py
181
- ```
182
-
183
- Full training guide: [GitHub README](https://github.com/Huuffy/VisageCNN)
184
-
185
- ---
186
-
187
- <div align="center">
188
-
189
- **Built with curiosity and a lot of training runs**
190
-
191
- ![footer](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=120&section=footer)
192
-
193
- </div>
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - facial-expression-recognition
5
+ - emotion-recognition
6
+ - computer-vision
7
+ - pytorch
8
+ - mediapipe
9
+ - efficientnet
10
+ - real-time
11
+ - image-classification
12
+ pipeline_tag: image-classification
13
+ ---
14
+
15
+ <div align="center">
16
+
17
+ ![header](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=200&section=header&text=VisageCNN&fontSize=70&fontColor=fff&animation=fadeIn&fontAlignY=38&desc=Real-Time%20Facial%20Expression%20Recognition&descAlignY=60&descAlign=50)
18
+
19
+ <a href="https://git.io/typing-svg"><img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=600&size=22&pause=1000&color=06B6D4&center=true&vCenter=true&width=750&lines=Hybrid+CNN+%2B+MediaPipe+Landmark+Architecture;7+Emotion+Classes+%E2%80%94+Real-Time+at+30+FPS;Bidirectional+Cross-Attention+%7C+EfficientNet-B2+%2B+478+Landmarks;87.9%25+Validation+Accuracy+%7C+Disgust+92%25+Recall" alt="Typing SVG" /></a>
20
+
21
+ <br/>
22
+
23
+ ![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
24
+ ![PyTorch](https://img.shields.io/badge/PyTorch-2.x-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
25
+ ![MediaPipe](https://img.shields.io/badge/MediaPipe-0.10-00BCD4?style=for-the-badge&logo=google&logoColor=white)
26
+ ![OpenCV](https://img.shields.io/badge/OpenCV-4.x-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)
27
+ ![CUDA](https://img.shields.io/badge/CUDA-11.8+-76B900?style=for-the-badge&logo=nvidia&logoColor=white)
28
+ ![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge)
29
+ [![GitHub](https://img.shields.io/badge/GitHub-VisageCNN-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Huuffy/VisageCNN)
30
+
31
+ </div>
32
+
33
+ ---
34
+
35
+ ## What Is This?
36
+
37
+ **HybridEmotionNet** — a dual-branch neural network for real-time facial emotion recognition that fuses **EfficientNet-B2 appearance features** with **MediaPipe 3D landmark geometry** via bidirectional cross-attention.
38
+
39
+ Processes webcam frames at **30+ FPS**, extracts **478 3D landmarks**, crops the face at 224×224, and classifies into 7 emotions with EMA + sliding window temporal smoothing.
40
+
41
+ **Highlights:** 87.9% validation accuracy · Disgust recall 51%→90% · Fear recall 65%→75% · 75k balanced training images · ViT-scored quality filtering
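
The EMA + sliding-window smoothing mentioned above can be sketched as follows. This is a minimal illustration, not the repo's actual smoother: the `alpha` and `window` values are assumed, and `TemporalSmoother` is a hypothetical name.

```python
from collections import deque

import numpy as np


class TemporalSmoother:
    """Illustrative EMA + sliding-window smoothing of per-frame class probabilities."""

    def __init__(self, n_classes=7, alpha=0.3, window=10):
        self.alpha = alpha                          # EMA weight for the newest frame
        self.ema = np.full(n_classes, 1.0 / n_classes)
        self.history = deque(maxlen=window)         # sliding window of EMA outputs

    def update(self, probs):
        probs = np.asarray(probs, dtype=float)
        self.ema = self.alpha * probs + (1 - self.alpha) * self.ema
        self.history.append(self.ema.copy())
        # Final prediction: average the EMA outputs over the window
        return np.mean(self.history, axis=0)


smoother = TemporalSmoother()
label = None
for _ in range(30):                                 # 30 frames of a steady "happy" face
    frame_probs = [0.02, 0.02, 0.02, 0.88, 0.02, 0.02, 0.02]
    label = int(np.argmax(smoother.update(frame_probs)))
# label settles on class 3 (Happy)
```

Combining a fast EMA with a window average damps single-frame flicker without adding much latency, which is why this style of smoothing is common in webcam classifiers.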
42
+
43
+ ---
44
+
45
+ ## Architecture
46
+
47
+ ![Architecture](https://huggingface.co/Huuffy/VisageCNN/resolve/main/Architecture%20digram.png)
48
+
49
+ ```
50
+ Face crop (224×224) ──► EfficientNet-B2 ──► [B, 256] appearance
51
+                         blocks 0-1 frozen
52
+                         blocks 2-8 fine-tuned
53
+
54
+ 478 landmarks (xyz) ──► MLP encoder     ──► [B, 256] geometry
55
+                         1434 → 512 → 384 → 256
56
+
57
+        Bidirectional Cross-Attention (4 heads each)
58
+   ┌────────────────────────────────────────┐
59
+   │ coord → CNN (geometry queries appear.) │
60
+   │ CNN → coord (appear. queries geometry) │
61
+   └────────────────────────────────────────┘
62
+                             │
63
+           Fusion MLP: 512 → 384 → 256 → 128
64
+                             │
65
+            Classifier: 128 → 7 emotions
66
+ ```
67
+
68
+ | Component | Detail |
69
+ |-----------|--------|
70
+ | CNN branch | EfficientNet-B2, ImageNet init, blocks 0–1 frozen, gradient checkpointing |
71
+ | Coord branch | MLP 1434 → 512 → 384 → 256, BN + Dropout |
72
+ | Fusion | Bidirectional cross-attention + MLP |
73
+ | Parameters | ~8M total |
74
+ | Model size | ~90 MB |
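
The fusion stage in the table above can be sketched in PyTorch like this. It is a sketch under assumptions: `BidirectionalFusion` is a hypothetical module name, the CNN feature is pooled to a single 256-d token, and the actual repo code may structure the attention differently.

```python
import torch
import torch.nn as nn


class BidirectionalFusion(nn.Module):
    """Sketch: two 256-d feature vectors attend to each other with 4-head
    cross-attention, then a 512 -> 384 -> 256 -> 128 MLP feeds a 7-way classifier."""

    def __init__(self, dim=256, heads=4, n_classes=7):
        super().__init__()
        self.coord_to_cnn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cnn_to_coord = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fusion = nn.Sequential(
            nn.Linear(2 * dim, 384), nn.ReLU(),
            nn.Linear(384, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, appearance, geometry):
        # Treat each 256-d vector as a length-1 token sequence: [B, 1, 256]
        a, g = appearance.unsqueeze(1), geometry.unsqueeze(1)
        g_att, _ = self.coord_to_cnn(g, a, a)   # geometry queries appearance
        a_att, _ = self.cnn_to_coord(a, g, g)   # appearance queries geometry
        fused = torch.cat([a_att, g_att], dim=-1).squeeze(1)  # [B, 512]
        return self.classifier(self.fusion(fused))            # [B, 7]


logits = BidirectionalFusion()(torch.randn(2, 256), torch.randn(2, 256))
# logits has shape [2, 7]
```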
75
+
76
+ ---
77
+
78
+ ## Performance
79
+
80
+ | Metric | Value |
81
+ |--------|-------|
82
+ | Validation accuracy | **87.9%** |
83
+ | Macro F1 | **0.88** |
84
+ | Inference speed | ~12 ms/frame on RTX 3050 |
85
+
86
+ | Emotion | Precision | Recall | F1 |
87
+ |---------|-----------|--------|----|
88
+ | Angry | 0.85 | 0.83 | 0.84 |
89
+ | Disgust | 0.97 | 0.90 | 0.94 |
90
+ | Fear | 0.89 | 0.75 | 0.82 |
91
+ | Happy | 0.97 | 0.99 | 0.98 |
92
+ | Neutral | 0.85 | 0.91 | 0.88 |
93
+ | Sad | 0.78 | 0.88 | 0.83 |
94
+ | Surprised | 0.83 | 0.90 | 0.86 |
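
The macro F1 above is simply the unweighted mean of the per-class F1 column, which you can verify directly:

```python
# Per-class F1 scores from the table above
f1 = {"Angry": 0.84, "Disgust": 0.94, "Fear": 0.82, "Happy": 0.98,
      "Neutral": 0.88, "Sad": 0.83, "Surprised": 0.86}

# Macro F1: average over classes, each class weighted equally
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 2))  # 0.88
```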
95
+
96
+ ---
97
+
98
+ ## Files in This Repo
99
+
100
+ | File | Size | Required |
101
+ |------|------|---------|
102
+ | `models/weights/hybrid_best_model.pth` | ~90 MB | Yes — best macro F1 checkpoint |
103
+ | `models/weights/hybrid_swa_final.pth` | ~90 MB | Optional — SWA ensemble model |
104
+ | `models/scalers/hybrid_coordinate_scaler.pkl` | 18 KB | Yes — landmark scaler |
105
+
106
+ ---
107
+
108
+ ## Quick Start
109
+
110
+ ### 1 — Clone the code
111
+
112
+ ```bash
113
+ git clone https://github.com/Huuffy/VisageCNN.git
114
+ cd VisageCNN
115
+ python -m venv venv && venv\Scripts\activate
116
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
117
+ pip install -r requirements.txt
118
+ ```
119
+
120
+ ### 2 — Download weights
121
+
122
+ ```python
123
+ from huggingface_hub import hf_hub_download
124
+ import shutil, pathlib
125
+
126
+ for remote, local in [
127
+ ("models/weights/hybrid_best_model.pth", "models/weights/hybrid_best_model.pth"),
128
+ ("models/weights/hybrid_swa_final.pth", "models/weights/hybrid_swa_final.pth"),
129
+ ("models/scalers/hybrid_coordinate_scaler.pkl", "models/scalers/hybrid_coordinate_scaler.pkl"),
130
+ ]:
131
+ src = hf_hub_download(repo_id="Huuffy/VisageCNN", filename=remote)
132
+ pathlib.Path(local).parent.mkdir(parents=True, exist_ok=True)
133
+ shutil.copy(src, local)
134
+ ```
135
+
136
+ Or with the HF CLI:
137
+ ```bash
138
+ huggingface-cli download Huuffy/VisageCNN models/weights/hybrid_best_model.pth --local-dir .
139
+ huggingface-cli download Huuffy/VisageCNN models/weights/hybrid_swa_final.pth --local-dir .
140
+ huggingface-cli download Huuffy/VisageCNN models/scalers/hybrid_coordinate_scaler.pkl --local-dir .
141
+ ```
142
+
143
+ ### 3 — Run inference
144
+
145
+ ```bash
146
+ # Standard
147
+ python inference/run_hybrid.py
148
+
149
+ # With SWA ensemble
150
+ python inference/run_hybrid.py --ensemble
151
+ ```
152
+
153
+ Press **Q** to quit.
154
+
155
+ ---
156
+
157
+ ## Emotion Classes
158
+
159
+ | Label | Emotion | Key Signals |
160
+ |-------|---------|-------------|
161
+ | 0 | Angry | Furrowed brows, tightened jaw |
162
+ | 1 | Disgust | Raised upper lip, wrinkled nose |
163
+ | 2 | Fear | Wide eyes, raised brows, open mouth |
164
+ | 3 | Happy | Raised cheeks, open smile |
165
+ | 4 | Neutral | Relaxed, no strong deformation |
166
+ | 5 | Sad | Lowered brow corners, downturned lips |
167
+ | 6 | Surprised | Raised brows, wide eyes, dropped jaw |
168
+
169
+ ---
170
+
171
+ ## Training Dataset
172
+
173
+ 75,376 total images — 10,768 per class × 7 emotions, perfectly balanced.
174
+
175
+ **Sources:** AffectNet · RAF-DB · FER2013 · AffectNet-Short · ScullyowesHenry · RAF-DB Kaggle
176
+
177
+ All images passed a two-stage quality filter:
178
+ 1. MediaPipe FaceMesh (dual confidence: 0.5 normal + 0.2 lenient for extreme expressions)
179
+ 2. ViT confidence scoring (`dima806/facial_emotions_image_detection`) with per-class asymmetric mislabel thresholds
180
+
181
+ Final class balance achieved via ViT-scored capping — lowest-confidence images removed first, preserving the highest-quality examples per class.
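
The capping step above reduces to a small pure function: sort one class's images by ViT confidence and keep the top `cap`. The file names, scores, and cap value here are illustrative; in the actual pipeline the scores come from `dima806/facial_emotions_image_detection`.

```python
def cap_class(items, cap):
    """Keep at most `cap` items for one class, dropping lowest ViT confidence first.

    `items` is a list of (image_path, vit_confidence) tuples.
    """
    return sorted(items, key=lambda x: x[1], reverse=True)[:cap]


# Hypothetical "Angry" examples with ViT confidence scores
angry = [("a.jpg", 0.91), ("b.jpg", 0.42), ("c.jpg", 0.77), ("d.jpg", 0.88)]
kept = cap_class(angry, cap=3)
print([path for path, _ in kept])  # ['a.jpg', 'd.jpg', 'c.jpg']
```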
182
+
183
+ ---
184
+
185
+ ## Training Config
186
+
187
+ | Setting | Value |
188
+ |---------|-------|
189
+ | Loss | Focal Loss γ=2.0 + label smoothing 0.12 |
190
+ | Optimizer | AdamW, weight decay 0.05 |
191
+ | LR | OneCycleLR — CNN 5e-5, fusion 5e-4 |
192
+ | Batch | 96 + grad accumulation ×2 (eff. 192) |
193
+ | Augmentation | CutMix + noise + rotation + zoom |
194
+ | Mixed precision | torch.amp (AMP) |
195
+ | Best model saved by | Macro F1 (not val accuracy) |
196
+ | SWA | Epochs 30–70, BN update after training |
197
+ | Early stopping | patience=15 on macro F1 |
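
The loss in the first row can be sketched as follows. This is one common way to combine focal loss with label smoothing (smooth the targets, then apply the focal modulation per class); the repo's exact formulation may differ.

```python
import torch
import torch.nn.functional as F


def focal_loss_ls(logits, targets, gamma=2.0, smoothing=0.12):
    """Focal loss with label smoothing: smooth the one-hot targets, then
    down-weight easy examples by (1 - p)^gamma per class."""
    n_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    # Smoothed targets: 1 - smoothing on the true class, rest spread uniformly
    with torch.no_grad():
        t = torch.full_like(log_p, smoothing / (n_classes - 1))
        t.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    # Focal modulation applied to the smoothed cross-entropy
    loss = -(t * (1 - p) ** gamma * log_p).sum(dim=-1)
    return loss.mean()


loss = focal_loss_ls(torch.randn(8, 7), torch.randint(0, 7, (8,)))
# loss is a non-negative scalar tensor
```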
198
+
199
+ ---
200
+
201
+ ## Retrain From Scratch
202
+
203
+ ```bash
204
+ # Delete old cache and train
205
+ rmdir /s /q models\cache
206
+ python scripts/train_hybrid.py
207
+ ```
208
+
209
+ Full guide: [GitHub README](https://github.com/Huuffy/VisageCNN)
210
+
211
+ ---
212
+
213
+ <div align="center">
214
+
215
+ **Built with curiosity and a lot of training runs**
216
+
217
+ ![footer](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=120&section=footer)
218
+
219
+ </div>