Classroom Reaction Recognition - VGG16
A VGG16 classifier trained to recognize facial reactions of students from cropped person bounding boxes in classroom lecture videos.
Classes
| Index | Label | Description |
|---|---|---|
| 0 | Neutral | No visible expression |
| 1 | Confused | Furrowed brow, squinting |
| 2 | Smiling_Amused | Visible smile, laughter |
| 3 | Surprised | Raised eyebrows, open mouth |
| 4 | Bored_Tired | Yawning, blank stare |
Architecture
- Backbone: VGG16 (ImageNet pre-trained, feature layers frozen)
- Classifier head: nn.Linear(4096, 5)
- Loss: Weighted CrossEntropyLoss to handle class imbalance
- Optimizer: Adam (lr=0.001)
- Input size: 224 x 224 RGB, ImageNet-normalized
Training
Trained on manually annotated person crops, extracted from classroom lecture videos with YOLOv8-nano detection. The data was divided with a stratified 70/10/20 train/val/test split.
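A stratified 70/10/20 split keeps each class's share roughly equal across the three subsets. The sketch below shows one way to do this in plain Python. The function `stratified_split`, the seed, and the toy labels are illustrative assumptions, not the original pipeline.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return (train, val, test) index lists, split class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical labels: 100 crops for each of the 5 classes.
labels = [c for c in range(5) for _ in range(100)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because of rounding, very small classes may deviate slightly from the exact 70/10/20 proportions.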
Usage
import torch
from torchvision import models
model = models.vgg16(weights=None)
model.classifier[6] = torch.nn.Linear(4096, 5)
model.load_state_dict(torch.load("best_vgg16.pth", map_location="cpu"))
model.eval()