Huuffy committed (verified) · Commit 743d2ce · Parent: 6040cef

Upload README.md with huggingface_hub

Files changed (1): README.md (+193 lines)
---
license: mit
tags:
- facial-expression-recognition
- emotion-recognition
- computer-vision
- pytorch
- mediapipe
- efficientnet
- real-time
- image-classification
pipeline_tag: image-classification
---

<div align="center">

![header](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=200&section=header&text=VisageCNN&fontSize=70&fontColor=fff&animation=fadeIn&fontAlignY=38&desc=Real-Time%20Facial%20Expression%20Recognition&descAlignY=60&descAlign=50)

<a href="https://git.io/typing-svg"><img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=600&size=22&pause=1000&color=06B6D4&center=true&vCenter=true&width=750&lines=Hybrid+CNN+%2B+MediaPipe+Landmark+Architecture;7+Emotion+Classes+%E2%80%94+Real-Time+at+30+FPS;Bidirectional+Cross-Attention+%7C+EfficientNet-B0+%2B+478+Landmarks;Optimized+for+RTX+3050+%E2%80%94+4GB+VRAM" alt="Typing SVG" /></a>

<br/>

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-2.x-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
![MediaPipe](https://img.shields.io/badge/MediaPipe-0.10-00BCD4?style=for-the-badge&logo=google&logoColor=white)
![OpenCV](https://img.shields.io/badge/OpenCV-4.x-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)
![CUDA](https://img.shields.io/badge/CUDA-11.8+-76B900?style=for-the-badge&logo=nvidia&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge)
[![GitHub](https://img.shields.io/badge/GitHub-VisageCNN-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Huuffy/VisageCNN)

</div>

---

## What Is This?

**HybridEmotionNet** is a dual-branch neural network for real-time facial emotion recognition. It fuses **EfficientNet-B0 appearance features** with **MediaPipe 3D landmark geometry** via bidirectional cross-attention.

The pipeline processes webcam frames at **30+ FPS**: it extracts **478 3D landmarks**, crops the face, and classifies the expression into 7 emotions with temporal smoothing.

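The temporal smoothing step can be sketched as an exponential moving average over per-frame class probabilities, so the on-screen label does not flicker between frames. This is a minimal illustration, not the repo's exact implementation; the `alpha` value is an assumption.

```python
# Sketch: exponential-moving-average smoothing of per-frame class
# probabilities. alpha is illustrative, not the repo's actual setting.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprised"]

def smooth(prev, current, alpha=0.7):
    """Blend the previous smoothed probabilities with the new frame's."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev, current)]

prev = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]    # previous state: confidently Happy
noisy = [0.0, 0.0, 0.6, 0.4, 0.0, 0.0, 0.0]   # one noisy frame leaning Fear
smoothed = smooth(prev, noisy)
label = EMOTIONS[max(range(7), key=lambda i: smoothed[i])]
print(label)  # "Happy": a single noisy frame is damped
```

A lone outlier frame shifts the smoothed distribution only by `1 - alpha`, so the displayed label stays stable.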
---

## Architecture

![Architecture](https://huggingface.co/Huuffy/VisageCNN/resolve/main/Architecture%20digram.png)

```
Face crop (224×224) ──► EfficientNet-B0 ──► [B, 256] appearance
478 landmarks (xyz) ──► MLP encoder    ──► [B, 256] geometry
                          │
      Bidirectional Cross-Attention (4 heads each)
      ┌─────────────────────────────────────────┐
      │ coord → CNN (geometry queries appear.)  │
      │ CNN → coord (appear. queries geometry)  │
      └─────────────────────────────────────────┘
                          │
      Fusion MLP: 512 → 384 → 256 → 128
                          │
      Classifier: 128 → 7 emotions
```

| Component | Detail |
|-----------|--------|
| CNN branch | EfficientNet-B0, ImageNet init, blocks 0–2 frozen |
| Coord branch | MLP 1434 → 512 → 384 → 256, BN + Dropout |
| Fusion | Bidirectional cross-attention + MLP |
| Parameters | 6.2M total / 5.75M trainable |
| Model size | 72 MB |

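The coord branch's 1434-dim input follows directly from flattening 478 landmarks × 3 coordinates. A minimal sketch of that preprocessing step, assuming MediaPipe's usual `(x, y, z)` tuples (the dummy values are placeholders):

```python
# Sketch: flatten 478 MediaPipe (x, y, z) landmarks into the 1434-dim
# vector the coord-branch MLP expects (478 * 3 = 1434).
def flatten_landmarks(landmarks):
    """landmarks: list of 478 (x, y, z) tuples -> flat list of 1434 floats."""
    if len(landmarks) != 478:
        raise ValueError(f"expected 478 landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]

dummy = [(0.5, 0.5, 0.0)] * 478   # placeholder face mesh
vec = flatten_landmarks(dummy)
print(len(vec))  # 1434, matching the MLP input dimension above
```

In the actual pipeline this vector would additionally be normalized with `hybrid_coordinate_scaler.pkl` before entering the MLP.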
---

## Files in This Repo

| File | Size | Required |
|------|------|----------|
| `models/weights/hybrid_best_model.pth` | 72 MB | Yes (model weights) |
| `models/scalers/hybrid_coordinate_scaler.pkl` | 18 KB | Yes (landmark scaler) |
| `Architecture digram.png` | – | No (docs only) |

---

## Quick Start

### 1. Clone the code

```bash
git clone https://github.com/Huuffy/VisageCNN.git
cd VisageCNN
python -m venv venv && venv\Scripts\activate   # Windows; use `source venv/bin/activate` on Linux/macOS
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

### 2. Download weights

```python
from huggingface_hub import hf_hub_download
import shutil, pathlib

# Fetch both required files from the Hub and copy them into the
# paths the inference script expects.
for remote, local in [
    ("models/weights/hybrid_best_model.pth", "models/weights/hybrid_best_model.pth"),
    ("models/scalers/hybrid_coordinate_scaler.pkl", "models/scalers/hybrid_coordinate_scaler.pkl"),
]:
    src = hf_hub_download(repo_id="Huuffy/VisageCNN", filename=remote)
    pathlib.Path(local).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, local)
```

Or with the HF CLI:

```bash
hf download Huuffy/VisageCNN models/weights/hybrid_best_model.pth --local-dir .
hf download Huuffy/VisageCNN models/scalers/hybrid_coordinate_scaler.pkl --local-dir .
```

### 3. Run inference

```bash
python inference/run_hybrid.py
```

Press **Q** to quit.

---

## Emotion Classes

| Label | Emotion | Key Signals |
|-------|---------|-------------|
| 0 | Angry | Furrowed brows, tightened jaw |
| 1 | Disgust | Raised upper lip, wrinkled nose |
| 2 | Fear | Wide eyes, raised brows, open mouth |
| 3 | Happy | Raised cheeks, open smile |
| 4 | Neutral | Relaxed, no strong deformation |
| 5 | Sad | Lowered brow corners, downturned lips |
| 6 | Surprised | Raised brows, wide eyes, dropped jaw |

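Decoding the classifier's 7-way output then reduces to an argmax over this label order. A small sketch (the logits are made-up example values; verify the index order against the repo's own label list before relying on it):

```python
# Sketch: map the classifier's argmax index back to an emotion name,
# using the label order from the table above.
ID2EMOTION = {
    0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
    4: "Neutral", 5: "Sad", 6: "Surprised",
}

logits = [0.1, 0.0, 0.2, 2.3, 0.4, 0.1, 0.9]   # example model output
pred = max(range(len(logits)), key=lambda i: logits[i])
print(ID2EMOTION[pred])  # "Happy"
```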
---

## Training Dataset

~30k clean images; FER2013 noise was removed across all classes:

| Class | Images | Sources |
|-------|--------|---------|
| Angry | 6,130 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
| Surprised | 5,212 | RAF-DB + AffectNet |
| Sad | 4,941 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
| Disgust | 3,782 | AffectNet-Short + RAF-DB + CK+ |
| Neutral | 3,475 | RAF-DB + AffectNet |
| Fear | 3,418 | AffectNet-Short + RAF-DB + CK+ |
| Happy | 3,124 | RAF-DB + AffectNet |

Max class imbalance: **1.97×**

---

## Training Config

| Setting | Value |
|---------|-------|
| Loss | Focal Loss (γ = 2.0) + label smoothing 0.12 |
| Optimizer | AdamW, weight decay 0.05 |
| LR | OneCycleLR (CNN 5e-5, fusion 5e-4) |
| Batch | 128 + gradient accumulation ×2 (effective 256) |
| Augmentation | CutMix + noise + rotation + zoom |
| Mixed precision | torch.amp (AMP) |
| Early stopping | patience=40 on val accuracy |

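The focal loss term from the config is FL(p_t) = -(1 - p_t)^γ · log(p_t) with γ = 2.0: well-classified examples (p_t near 1) are down-weighted, so training focuses on hard ones. A scalar sketch of that effect (label smoothing is applied to targets separately and is omitted here):

```python
import math

# Sketch: focal loss for the true-class probability p_t of one example,
# FL(p_t) = -(1 - p_t)^gamma * log(p_t), with gamma = 2.0 as in the config.
def focal_loss(p_t, gamma=2.0):
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.9)   # confident and correct -> loss shrunk by (0.1)^2
hard = focal_loss(0.2)   # mostly wrong -> nearly full cross-entropy weight
print(easy < hard)       # True: hard examples dominate the gradient
```

This is why the 1.97× class imbalance above can be handled without aggressive resampling: the loss itself re-weights difficult, underrepresented examples.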
170
+
171
+ ## Retrain From Scratch
172
+
173
+ ```bash
174
+ # Build dataset (downloads ~30k clean images from HuggingFace)
175
+ pip install datasets
176
+ python scripts/prepare_dataset.py
177
+
178
+ # Delete old cache and train
179
+ rmdir /s /q models\cache
180
+ python scripts/train_hybrid.py
181
+ ```
182
+
183
+ Full training guide: [GitHub README](https://github.com/Huuffy/VisageCNN)
184
+
185
+ ---
186
+
187
+ <div align="center">
188
+
189
+ **Built with curiosity and a lot of training runs**
190
+
191
+ ![footer](https://capsule-render.vercel.app/api?type=waving&color=gradient&customColorList=6,11,20&height=120&section=footer)
192
+
193
+ </div>