SE-AlexNet: Enhancing Face Perception via Dimension Reduction
A collection of 34 fine-tuned convolutional neural networks for studying how Squeeze-and-Excitation (SE) modules affect facial emotion recognition. This model zoo supports the psychophysical analysis pipeline described in the companion paper.
📦 GitHub (Code & Analysis): SE-AlexNet | PsychometricFittingCurve | GradCAM-ROI-SaliencyMapDecoder
1. Model Details
This repository contains weights for 5 model architectures, systematically varied across 3 experimental axes:
| Experimental Axis | Values |
|---|---|
| Architecture | AlexNet, VGG16, SE-AlexNet-L1, SE-AlexNet-L2, SE-AlexNet-L3 |
| SE Reduction Ratio ($r$) | 2, 4, 8, 16, 32 (applies to SE variants) |
| Pre-training Basis | FaceBased (VGGFace2) or ObjectBased (ImageNet) |
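The three axes above multiply out to the 34 checkpoints in this zoo. The sketch below enumerates them; it assumes every SE location uses the `squeeze-{r}` sub-folder naming shown for se-location1 in the Repository Structure section, and `enumerate_model_dirs` is a hypothetical helper, not part of this repository.

```python
from itertools import product

BASES = ["facebased", "objectbased"]
RATIOS = [2, 4, 8, 16, 32]
PLAIN = ["alexnet", "vgg16"]  # no SE block, so no squeeze-{r} level
SE_LOCATIONS = ["se-location1", "se-location2", "se-location3"]


def enumerate_model_dirs():
    """Return the relative directory of every model implied by the table above."""
    dirs = [f"{arch}/{base}" for arch, base in product(PLAIN, BASES)]
    dirs += [
        f"{arch}/{base}/squeeze-{r}"
        for arch, base, r in product(SE_LOCATIONS, BASES, RATIOS)
    ]
    return dirs


model_dirs = enumerate_model_dirs()
print(len(model_dirs))  # 34 = 2*2 plain models + 3*2*5 SE models
```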
Architecture Descriptions
| Model | SE Position | Key Characteristic |
|---|---|---|
| AlexNet | None (baseline) | Standard 5-conv AlexNet, num_classes=11 |
| SE-AlexNet-L1 | After last conv, before FC stack | SE on 256-channel feature maps |
| SE-AlexNet-L2 | Between fc6 and fc7 | SE on 4096-dim feature vector, reduction fixed at 16 |
| SE-AlexNet-L3 | As first element of classifier | Best-performing variant; SELayer before fc6 |
| VGG16 | None (benchmark) | Standard VGG16, num_classes=2 (binary Happy/Sad) |
⚠️ Important note on SE-Location-2: All 10 L2 variants share an identical architecture (reduction=16). The squeeze-{r} label refers to a training/data configuration, not the architectural reduction ratio. This naming is preserved for reproducibility.
2. Intended Use
These models are designed for visual feature interpretability analysis in facial emotion recognition research. Specific use cases:
- Psychophysical model comparison: Compare model perceptual biases (Points of Subjective Equality, PSE) against human observers
- Grad-CAM ROI analysis: Visualize which facial regions (eyes, nose, mouth) different architectures attend to
- Dimension reduction study: Quantify how SE-based channel recalibration affects representational efficiency
- Pre-training effect study: Compare face-based (VGGFace2) vs. object-based (ImageNet) feature transfer
Not intended for: Production emotion recognition systems, clinical diagnosis, or surveillance applications. These are research models trained on controlled lab datasets.
3. Training Data
All models were fine-tuned on a subset of AffectNet, one of the largest publicly available facial expression datasets:
- Training set: 28,000 facial images (modified subset)
- Validation set: 1,000 facial images
- Classes: 8 basic emotions (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger, Contempt); some variants use 11-class or binary labels
- Preprocessing: Standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
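The normalization step above can be sketched in plain PyTorch as follows. This is a minimal illustration that assumes the face image has already been cropped and resized to 224×224 RGB; `normalize_image` is a hypothetical helper, and the repository's own `inference.py` may preprocess differently.

```python
import torch

# Per-channel ImageNet statistics quoted above, shaped for CHW broadcasting.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def normalize_image(img_uint8: torch.Tensor) -> torch.Tensor:
    """Convert an HxWx3 uint8 RGB image into a normalized 1x3xHxW float batch."""
    x = img_uint8.permute(2, 0, 1).float() / 255.0  # HWC [0, 255] -> CHW [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.unsqueeze(0)


dummy = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8)
batch = normalize_image(dummy)
print(batch.shape)  # torch.Size([1, 3, 224, 224])
```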
Pre-training sources:
- FaceBased: VGGFace2 (faces only; captures facial identity features)
- ObjectBased: ImageNet-1K (general objects; captures generic visual features)
Training hyperparameters:
- Batch size: 128
- Learning rate: 1e-4 (Adam optimizer)
- Epochs: 40
- Convolutional layers frozen (transfer learning); only the SE blocks and the classifier are trained
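A minimal sketch of this fine-tuning setup, assuming a PyTorch model with the usual `features`/`classifier` split. `TinyNet` and `freeze_conv_layers` are hypothetical stand-ins for illustration; the actual training scripts live in the companion SE-AlexNet repository.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the features/classifier split used by AlexNet."""

    def __init__(self, num_classes: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))


def freeze_conv_layers(model: nn.Module) -> None:
    """Disable gradients for every Conv2d so only the remaining layers train."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            for p in module.parameters():
                p.requires_grad = False


model = TinyNet()
freeze_conv_layers(model)
# Hand only the still-trainable parameters to Adam (lr from the list above).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Under this scheme an SE block built from `nn.Linear` layers stays trainable, while one built from 1×1 `nn.Conv2d` layers would be frozen as well and would need to be excluded explicitly.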
4. Evaluation & Performance
Key Findings from Ablation Study
| Metric | Finding |
|---|---|
| Best Architecture | SE-AlexNet-L3 (SE block at the classifier input, before fc6) |
| Optimal Reduction | $r=32$ consistently outperformed lower reductions |
| Pre-training Effect | Face-based pre-training improved emotion discrimination by ~12% over object-based |
| ROI Attention | SE models showed more focused attention on mouth and eye regions vs. baseline AlexNet |
Psychophysical Benchmarking
Models were evaluated against human observers ($N=40$) in a 2AFC Happy/Sad discrimination task:
- Metric: RMSE between model PSE and human PSE distribution
- Statistical tests: One-sample t-tests, Cohen's d effect sizes
- Full results: See PsychometricFittingCurve repository
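The metrics listed above can be illustrated with a small standard-library sketch. It makes several assumptions: the PSE is taken as the stimulus level where P("Happy") crosses 0.5 and is found by linear interpolation (the companion repository fits full psychometric curves in MATLAB instead), the RMSE is computed between one model PSE and each individual human PSE, and all numbers are made-up illustration data rather than results from the paper.

```python
import math
import statistics


def pse_by_interpolation(levels, p_happy):
    """Return the stimulus level where P('Happy') crosses 0.5 (linear interpolation)."""
    points = list(zip(levels, p_happy))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("response curve never crosses 0.5")


def rmse_to_humans(model_pse, human_pses):
    """Root-mean-square difference between one model PSE and each human PSE."""
    return math.sqrt(sum((model_pse - h) ** 2 for h in human_pses) / len(human_pses))


def cohens_d_one_sample(values, reference):
    """One-sample Cohen's d: (mean - reference) / sample standard deviation."""
    return (statistics.mean(values) - reference) / statistics.stdev(values)


# Made-up illustration data (assumed scale: 0 = fully Sad, 1 = fully Happy).
morph_levels = [0.0, 0.25, 0.5, 0.75, 1.0]
model_p_happy = [0.05, 0.20, 0.40, 0.80, 0.95]
human_pses = [0.45, 0.50, 0.55]

model_pse = pse_by_interpolation(morph_levels, model_p_happy)
rmse = rmse_to_humans(model_pse, human_pses)
effect = cohens_d_one_sample(human_pses, model_pse)
print(model_pse, rmse, effect)
```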
5. How to Get Started
Installation
pip install -r requirements.txt
Quick Start: Forward Pass in 10 Lines
from inference import SEModelPipeline
# Load the best model (SE-Location3, FaceBased, r=32)
pipe = SEModelPipeline('se-location3/facebased/squeeze-32')
# Run inference
probs = pipe.predict('path/to/face_image.jpg')
print(f'Prediction shape: {probs.shape}') # (1, 11)
List Available Models
import json
import os
# Walk the repository and summarize every model directory that has a config
for root, dirs, files in os.walk('.'):
    if 'config.json' in files:
        with open(os.path.join(root, 'config.json')) as f:
            cfg = json.load(f)
        print(f"{root}: {cfg['model_type']} | {cfg['pretraining']} | {cfg.get('reduction', 'N/A')}")
Load a Specific Model Programmatically
import json
import torch
from modeling import load_model_from_config
# Load config
with open('se-location3/facebased/squeeze-32/config.json') as f:
    config = json.load(f)
# Build model with weights
model = load_model_from_config(
    config,
    weights_path='se-location3/facebased/squeeze-32/model.safetensors',
    device='cpu'
)
model.eval()  # inference mode (disables dropout)
# Forward pass
x = torch.randn(1, 3, 224, 224)  # dummy input
with torch.no_grad():
    output = model(x)
Use for Grad-CAM Visualization
import json
from modeling import load_model_from_config
# Load any model
with open('se-location3/facebased/squeeze-32/config.json') as f:
    config = json.load(f)
model = load_model_from_config(
    config,
    'se-location3/facebased/squeeze-32/model.safetensors'
)
# Target the last conv layer for Grad-CAM
target_layer = model.features[-3]  # Last Conv2d in features
# ... apply standard Grad-CAM pipeline
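The "apply standard Grad-CAM pipeline" step can be sketched as below. This is a minimal, generic Grad-CAM under stated assumptions, not the code from the GradCAM-ROI-SaliencyMapDecoder repository: `grad_cam` is a hypothetical helper, and the toy CNN stands in for a loaded model so that the example runs without any weights.

```python
import torch
import torch.nn.functional as F
from torch import nn


def grad_cam(model, target_layer, x, class_idx):
    """Return a Grad-CAM heatmap scaled to [0, 1] at the input resolution."""
    captured = []
    handle = target_layer.register_forward_hook(
        lambda mod, inp, out: captured.append(out)
    )
    try:
        # Track gradients from the input so this also works with frozen conv weights.
        x = x.detach().clone().requires_grad_(True)
        score = model(x)[0, class_idx]
    finally:
        handle.remove()
    acts = captured[0]                               # (1, C, h, w) activations
    grads = torch.autograd.grad(score, acts)[0]      # d(score) / d(acts)
    weights = grads.mean(dim=(2, 3), keepdim=True)   # one weight per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8)).squeeze().detach()


# Hypothetical toy CNN standing in for a loaded SE-AlexNet model.
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
).eval()
heatmap = grad_cam(toy, toy[0], torch.randn(1, 3, 32, 32), class_idx=1)
print(heatmap.shape)  # torch.Size([32, 32])
```

With a real model, `target_layer` from the snippet above would take the place of `toy[0]`, and the heatmap can then be overlaid on the input face to compare eye, nose, and mouth regions.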
6. Repository Structure
SE-AlexNet/
├── README.md                  # This Model Card
├── requirements.txt           # Python dependencies
├── modeling.py                # Exact model architecture definitions
├── inference.py               # Universal inference pipeline
├── convert_weights.py         # .pth → .safetensors conversion script
├── generate_configs.py        # Config file generator
│
├── alexnet/                   # Standard AlexNet baseline
│   ├── facebased/
│   │   ├── config.json
│   │   └── model.safetensors
│   └── objectbased/
│       ├── config.json
│       └── model.safetensors
│
├── vgg16/                     # VGG16 benchmark
│   ├── facebased/
│   └── objectbased/
│
├── se-location1/              # SE after all convolutions
│   ├── facebased/
│   │   └── squeeze-2/ … squeeze-32/   (5 reduction ratios)
│   └── objectbased/
│       └── squeeze-2/ … squeeze-32/
│
├── se-location2/              # SE between fc6 and fc7
│   ├── facebased/
│   └── objectbased/
│
└── se-location3/              # SE in classifier (BEST)
    ├── facebased/
    └── objectbased/
7. Citation & Related Repositories
Companion Code Repositories
| Repository | Role | Link |
|---|---|---|
| SE-AlexNet | Training code, ablation scripts, raw results | GitHub |
| PsychometricFittingCurve | MATLAB psychometric curve fitting, PSE calculation, human comparison | GitHub |
| GradCAM-ROI-SaliencyMapDecoder | Grad-CAM heatmap generation & ROI statistical decoding | GitHub |
Technical Architecture
SE-AlexNet extends the classic AlexNet by inserting Squeeze-and-Excitation blocks at strategic locations:
- Squeeze: Global Average Pooling compresses spatial information into channel descriptors
- Excitation: A bottleneck MLP learns channel-wise scaling factors
- Recalibration: Feature maps are re-weighted by learned importance scores
The three location variants test the hypothesis that SE blocks are most effective when placed closer to the decision boundary (classifier), where channel-wise feature importance directly impacts classification.
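The three steps above can be sketched as a small PyTorch module. This follows the standard Squeeze-and-Excitation formulation and assumes bias-free linear layers in the bottleneck; the exact definitions used for these checkpoints are in `modeling.py`, and the variant applied to the 4096-dim fc vector has no spatial dimensions to pool.

```python
import torch
from torch import nn


class SELayer(nn.Module):
    """Squeeze-and-Excitation block for (N, C, H, W) feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(             # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * scale                     # recalibration: per-channel re-weighting


se = SELayer(channels=256, reduction=32)
out = se(torch.randn(2, 256, 6, 6))
n_params = sum(p.numel() for p in se.parameters())
print(out.shape, n_params)  # torch.Size([2, 256, 6, 6]) 4096
```

In this bias-free form the block has 2C²/r parameters, so on 256-channel feature maps r=2 gives 65,536 weights while r=32 gives 4,096.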
8. Limitations
- Dataset bias: Trained only on AffectNet (posed + spontaneous expressions). Performance may degrade on in-the-wild or cross-cultural expressions.
- Image resolution: Input fixed at 224×224. Smaller or lower-quality faces may reduce accuracy.
- Binary vs. Multi-class: VGG16 models use binary classification (Happy/Sad); AlexNet/SE variants use 8-11 classes. Not directly comparable without task alignment.
- Transfer learning only: Convolutional layers are frozen. Models are not trained end-to-end from scratch.
- Architecture age: AlexNet and VGG16 are now classic architectures. These models are for scientific comparison, not state-of-the-art emotion recognition.
9. License
MIT License. See companion GitHub repositories for full license details.
Trained weights converted from original PyTorch .pth checkpoints. For questions about the training methodology or experimental design, please refer to the companion paper and GitHub repositories.