SE-AlexNet: Enhancing Face Perception via Dimension Reduction

A collection of 34 fine-tuned convolutional neural networks for studying how Squeeze-and-Excitation (SE) modules affect facial emotion recognition. This model zoo supports the psychophysical analysis pipeline described in the companion paper.

📦 GitHub (Code & Analysis): SE-AlexNet | PsychometricFittingCurve | GradCAM-ROI-SaliencyMapDecoder

1. Model Details

This repository contains weights for 34 models built from 5 architectures and varied along 3 experimental axes:

| Experimental Axis | Values |
|---|---|
| Architecture | AlexNet, VGG16, SE-AlexNet-L1, SE-AlexNet-L2, SE-AlexNet-L3 |
| SE Reduction Ratio ($r$) | 2, 4, 8, 16, 32 (SE-AlexNet-L1 and L3; SE-AlexNet-L2 is fixed at 16, see the note below) |
| Pre-training Basis | FaceBased (VGGFace2) or ObjectBased (ImageNet) |

Architecture Descriptions

| Model | SE Position | Key Characteristic |
|---|---|---|
| AlexNet | None (baseline) | Standard 5-conv AlexNet, num_classes=11 |
| SE-AlexNet-L1 | After the last conv layer, before the FC stack | SE on 256-channel feature maps |
| SE-AlexNet-L2 | Between fc6 and fc7 | SE on the 4096-dim feature vector, reduction fixed at 16 |
| SE-AlexNet-L3 | First element of the classifier (SELayer before fc6) | Best-performing variant |
| VGG16 | None (benchmark) | Standard VGG16, num_classes=2 (binary Happy/Sad) |
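The reduction ratio r controls how strongly each SE block compresses its input. The sketch below counts the weights in an SE excitation MLP (C → C/r → C) for two cases from the table above: the 256-channel SE-AlexNet-L1 block and the 4096-dim SE-AlexNet-L2 block. It assumes bias-free linear layers and integer division C // r. Both are common in reference SE code but are assumptions here, so set `bias=True` if `modeling.py` uses biases.

```python
def se_excitation_params(channels: int, reduction: int, bias: bool = False) -> int:
    """Parameters in an SE excitation MLP: Linear(C, C//r) -> ReLU -> Linear(C//r, C) -> sigmoid.

    Assumption: bias-free linear layers by default (common in reference SE code).
    """
    hidden = channels // reduction                   # bottleneck width
    params = channels * hidden + hidden * channels   # the two weight matrices
    if bias:
        params += hidden + channels                  # bias vectors of both layers
    return params


# SE-AlexNet-L1: SE on 256-channel conv feature maps, r swept over 2..32
for r in (2, 4, 8, 16, 32):
    print(f"C=256,  r={r:>2}: {se_excitation_params(256, r):>7,} params")
# SE-AlexNet-L2: SE on the 4096-dim fc6 output, r fixed at 16
print(f"C=4096, r=16: {se_excitation_params(4096, 16):,} params")
```

Under these assumptions, r=32 shrinks the L1 bottleneck to 8 units and the excitation MLP to 4,096 weights. That is 16 times fewer than with r=2 (65,536).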

⚠️ Important note on SE-AlexNet-L2 (se-location2/): all 10 L2 variants share an identical architecture (reduction=16). For these models the squeeze-{r} directory label refers to a training/data configuration, not the architectural reduction ratio. The original naming is preserved for reproducibility.
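The `squeeze-{r}` label means something different for SE-AlexNet-L2, so analysis scripts should not read the reduction ratio from the directory name alone. Below is a minimal sketch. It assumes the `<location>/<pretraining>/squeeze-<r>` path layout used elsewhere in this card, and `architectural_reduction` is a hypothetical helper, not part of `modeling.py`.

```python
import re
from typing import Optional


def architectural_reduction(model_dir: str) -> Optional[int]:
    """Return the SE reduction ratio actually built into the model at model_dir.

    Hypothetical helper; assumes paths like 'se-location3/facebased/squeeze-32'.
    """
    parts = model_dir.strip("/").split("/")
    location = parts[0]
    if not location.startswith("se-location"):
        return None  # AlexNet / VGG16 baselines contain no SE block
    if location == "se-location2":
        return 16  # all L2 variants are built with reduction=16 regardless of label
    match = re.fullmatch(r"squeeze-(\d+)", parts[-1])
    return int(match.group(1)) if match else None


print(architectural_reduction("se-location2/facebased/squeeze-32"))  # 16
```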

2. Intended Use

These models are designed for visual feature interpretability analysis in facial emotion recognition research. Specific use cases:

Psychophysical model comparison: Compare model perceptual biases (Points of Subjective Equality, PSE) against human observers
Grad-CAM ROI analysis: Visualize which facial regions (eyes, nose, mouth) different architectures attend to
Dimension reduction study: Quantify how SE-based channel recalibration affects representational efficiency
Pre-training effect study: Compare face-based (VGGFace2) vs. object-based (ImageNet) feature transfer
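For the Grad-CAM ROI use case, one simple way to compare architectures is to measure what fraction of the total saliency falls inside each facial region. The sketch below assumes a non-negative 2-D saliency map and hypothetical rectangular ROIs given as (row0, row1, col0, col1) in map coordinates. The GradCAM-ROI-SaliencyMapDecoder repository may define its regions and statistics differently.

```python
import numpy as np


def roi_saliency_fraction(
    cam: np.ndarray, rois: dict[str, tuple[int, int, int, int]]
) -> dict[str, float]:
    """Fraction of total saliency mass inside each rectangular ROI.

    cam: non-negative 2-D saliency map (H, W).
    rois: name -> (row0, row1, col0, col1), half-open slices (hypothetical boxes).
    """
    total = float(cam.sum())
    if total == 0.0:
        return {name: 0.0 for name in rois}  # empty map: no region gets any mass
    return {
        name: float(cam[r0:r1, c0:c1].sum()) / total
        for name, (r0, r1, c0, c1) in rois.items()
    }


# Toy 4x4 map with all saliency in the bottom half (the "mouth" box)
cam = np.zeros((4, 4))
cam[2:, :] = 1.0
rois = {"eyes": (0, 2, 0, 4), "mouth": (2, 4, 0, 4)}
print(roi_saliency_fraction(cam, rois))  # {'eyes': 0.0, 'mouth': 1.0}
```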

Not intended for: Production emotion recognition systems, clinical diagnosis, or surveillance applications. These are research models fine-tuned on a modified subset of a single dataset (AffectNet).

3. Training Data

All models were fine-tuned on AffectNet, one of the largest publicly available in-the-wild facial expression datasets:

Training set: 28,000 facial images (modified subset)
Validation set: 1,000 facial images
Classes: 8 expression categories (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger, Contempt). Some variants instead use 11-class heads (e.g. AlexNet, num_classes=11) or binary Happy/Sad labels (VGG16).
Preprocessing: Standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
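The normalization above is a per-channel standardization. Here is a NumPy sketch. It assumes the face is already cropped and resized to 224×224 RGB, and that the model expects a float32 NCHW batch (the usual PyTorch layout). Check `inference.py` for the exact pipeline.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(rgb_uint8: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB image -> (1, 3, H, W) float32 ImageNet-normalized batch."""
    x = rgb_uint8.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel standardization
    return x.transpose(2, 0, 1)[np.newaxis]    # HWC -> CHW, then add batch dim


batch = normalize_image(np.zeros((224, 224, 3), dtype=np.uint8))
print(batch.shape)  # (1, 3, 224, 224)
```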

Pre-training sources:

FaceBased: VGGFace2 (faces only — captures facial identity features)
ObjectBased: ImageNet-1K (general objects — captures generic visual features)

Training hyperparameters:

Batch size: 128
Learning rate: 1e-4 (Adam optimizer)
Epochs: 40
Frozen conv layers (transfer learning), only SE blocks + classifier trained
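The hyperparameters above imply a fixed training budget. The sketch below records them in a dataclass and derives the number of optimizer steps, assuming the final partial batch is kept. `is_trainable` is a hypothetical illustration of the freezing rule. It assumes convolutional parameters live under a `features.` prefix (as in torchvision-style AlexNet/VGG) and that everything else (SE blocks, classifier) is trained. The real parameter names in `modeling.py` may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    train_images: int = 28_000
    batch_size: int = 128
    learning_rate: float = 1e-4  # Adam
    epochs: int = 40

    @property
    def steps_per_epoch(self) -> int:
        # Assumes the final partial batch is not dropped
        return math.ceil(self.train_images / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs


def is_trainable(param_name: str) -> bool:
    """Hypothetical freezing rule: conv parameters ('features.*') stay frozen."""
    return not param_name.startswith("features.")


cfg = FineTuneConfig()
print(cfg.steps_per_epoch, cfg.total_steps)  # 219 8760
```

In PyTorch the rule would typically be applied by setting `p.requires_grad = is_trainable(name)` for each `(name, p)` in `model.named_parameters()`. Only the trainable parameters would then be passed to `torch.optim.Adam(..., lr=cfg.learning_rate)`. If an SE block is registered inside `features` (possible for SE-AlexNet-L1), the prefix test would need adjusting.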

4. Evaluation & Performance

Key Findings from Ablation Study

| Aspect | Finding |
|---|---|
| Best Architecture | SE-AlexNet-L3 (SE block closest to the classifier) |
| Optimal Reduction | $r=32$ consistently outperformed lower reduction ratios |
| Pre-training Effect | Face-based pre-training improved emotion discrimination by ~12% over object-based |
| ROI Attention | SE models showed more focused attention on mouth and eye regions than baseline AlexNet |

Psychophysical Benchmarking

Models were evaluated against human observers ($N=40$) in a two-alternative forced-choice (2AFC) Happy/Sad discrimination task:

Metric: RMSE between model PSE and human PSE distribution
Statistical tests: One-sample t-tests, Cohen's d effect sizes
Full results: See PsychometricFittingCurve repository
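The comparison metrics can be sketched in a few lines. The block below assumes a logistic psychometric function P(happy | x) = 1 / (1 + exp(-(x - α)/β)) over a stimulus level x scaled to [0, 1]. Its PSE is α, the level where P = 0.5. The function is fitted with a coarse least-squares grid search, a stand-in for the MATLAB fitting in PsychometricFittingCurve. The block then computes the RMSE of human PSEs around the model PSE, a one-sample t statistic, and Cohen's d. How a network's outputs are turned into "happy" proportions is an assumption, and the numbers in the usage lines are toy data, not results.

```python
import math
import statistics


def logistic(x: float, alpha: float, beta: float) -> float:
    """Psychometric function: P('happy' response) at stimulus level x; PSE = alpha."""
    return 1.0 / (1.0 + math.exp(-(x - alpha) / beta))


def fit_pse(levels: list[float], p_happy: list[float]) -> float:
    """Coarse least-squares grid search over (alpha, beta); returns the fitted PSE.

    Assumes levels scaled to [0, 1]; a real fit would use MLE or an optimizer.
    """
    best_alpha, best_err = 0.0, math.inf
    for alpha in (i / 100 for i in range(101)):        # PSE candidates 0.00..1.00
        for beta in (0.01 * k for k in range(1, 31)):   # slope candidates 0.01..0.30
            err = sum((logistic(x, alpha, beta) - p) ** 2 for x, p in zip(levels, p_happy))
            if err < best_err:
                best_alpha, best_err = alpha, err
    return best_alpha


def compare_to_humans(model_pse: float, human_pses: list[float]) -> dict[str, float]:
    """RMSE of human PSEs around the model PSE, one-sample t statistic, and Cohen's d."""
    n = len(human_pses)
    mean = statistics.fmean(human_pses)
    sd = statistics.stdev(human_pses)                  # sample SD (n - 1 denominator)
    rmse = math.sqrt(sum((h - model_pse) ** 2 for h in human_pses) / n)
    d = (mean - model_pse) / sd                        # one-sample Cohen's d
    t = d * math.sqrt(n)                               # = (mean - model_pse) / (sd / sqrt(n))
    return {"rmse": rmse, "t": t, "cohens_d": d}


# Toy illustration data only (not results from this project)
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
p_happy = [logistic(x, 0.42, 0.08) for x in levels]
model_pse = fit_pse(levels, p_happy)
print(model_pse)  # 0.42
print(compare_to_humans(model_pse, [0.45, 0.50, 0.48, 0.52]))
```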

5. How to Get Started

Installation

```bash
pip install -r requirements.txt
```

Quick Start: Forward Pass in 10 Lines

```python
from inference import SEModelPipeline

# Load the best model (SE-Location3, FaceBased, r=32)
pipe = SEModelPipeline('se-location3/facebased/squeeze-32')

# Run inference
probs = pipe.predict('path/to/face_image.jpg')
print(f'Prediction shape: {probs.shape}')  # (1, 11)
```

List Available Models

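```python
import json
import os

# Walk the repository and print a one-line summary for every model config
for root, dirs, files in os.walk('.'):
    dirs.sort()  # visit subdirectories in a deterministic (alphabetical) order
    if 'config.json' in files:
        with open(os.path.join(root, 'config.json')) as f:
            cfg = json.load(f)
        print(f"{root}: {cfg['model_type']} | {cfg['pretraining']} | {cfg.get('reduction', 'N/A')}")
```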

Load a Specific Model Programmatically

```python
import json

import torch

from modeling import load_model_from_config

model_dir = 'se-location3/facebased/squeeze-32'

# Load config
with open(f'{model_dir}/config.json') as f:
    config = json.load(f)

# Build the model and load its weights
model = load_model_from_config(
    config,
    weights_path=f'{model_dir}/model.safetensors',
    device='cpu',
)
model.eval()  # disable dropout for deterministic inference

# Forward pass without tracking gradients
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # dummy input batch
    output = model(x)
```

Use for Grad-CAM Visualization

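```python
import json

import torch

from modeling import load_model_from_config

# Load any model
model_dir = 'se-location3/facebased/squeeze-32'
with open(f'{model_dir}/config.json') as f:
    config = json.load(f)
model = load_model_from_config(
    config,
    weights_path=f'{model_dir}/model.safetensors',
)
model.eval()

# Target the last Conv2d in the convolutional trunk for Grad-CAM.
# Searching by type is safer than a fixed index such as model.features[-3],
# which only hits the last conv when features ends in Conv2d -> ReLU -> MaxPool2d.
target_layer = [m for m in model.features.modules() if isinstance(m, torch.nn.Conv2d)][-1]
# ... apply standard Grad-CAM pipeline
```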
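The remaining Grad-CAM step combines the target layer's activations A (shape K×H×W for one image) with the gradients of the class score with respect to them. Each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. The NumPy sketch below assumes both arrays have already been captured, for example with forward and backward hooks on `target_layer`, converted with `.detach().cpu().numpy()`, and indexed with `[0]` to drop the batch dimension. Upsampling the map to 224×224 is left out.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one image's (K, H, W) activations and gradients, scaled to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))                   # alpha_k: spatial mean of gradients
    cam = np.tensordot(weights, activations, axes=(0, 0))   # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                              # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 has a positive gradient, channel 1 a negative one
acts = np.stack([np.eye(3), np.ones((3, 3))])
grads = np.stack([np.ones((3, 3)), -0.5 * np.ones((3, 3))])
print(grad_cam(acts, grads))  # identity matrix: only the diagonal survives the ReLU
```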

6. Repository Structure

```text
SE-AlexNet/
├── README.md                    # This Model Card
├── requirements.txt             # Python dependencies
├── modeling.py                  # Exact model architecture definitions
├── inference.py                 # Universal inference pipeline
├── convert_weights.py           # .pth → .safetensors conversion script
├── generate_configs.py          # Config file generator
│
├── alexnet/                     # Standard AlexNet baseline
│   ├── facebased/
│   │   ├── config.json
│   │   └── model.safetensors
│   └── objectbased/
│       ├── config.json
│       └── model.safetensors
│
├── vgg16/                       # VGG16 benchmark
│   ├── facebased/
│   └── objectbased/
│
├── se-location1/                # SE after all convolutions
│   ├── facebased/
│   │   ├── squeeze-2/ → squeeze-32/  (5 reduction ratios)
│   └── objectbased/
│       └── squeeze-2/ → squeeze-32/
│
├── se-location2/                # SE between fc6 → fc7
│   ├── facebased/
│   └── objectbased/
│
└── se-location3/                # SE in classifier (BEST)
    ├── facebased/
    └── objectbased/
```
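The three experimental axes combine into the 34 checkpoints in this repository. The sketch below rebuilds the expected list of directories. It assumes that se-location2 and se-location3 use the same `squeeze-{2,4,8,16,32}` subdirectories shown for se-location1; the tree above does not list them.

```python
PRETRAINING = ("facebased", "objectbased")
RATIOS = (2, 4, 8, 16, 32)


def expected_model_dirs() -> list[str]:
    """Relative paths of every checkpoint directory (assumed layout)."""
    dirs = [f"{arch}/{pre}" for arch in ("alexnet", "vgg16") for pre in PRETRAINING]
    dirs += [
        f"se-location{loc}/{pre}/squeeze-{r}"
        for loc in (1, 2, 3)
        for pre in PRETRAINING
        for r in RATIOS
    ]
    return dirs


paths = expected_model_dirs()
print(len(paths))  # 34
```

After a download, comparing this list with the directories that actually contain a `config.json` is a quick completeness check.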

7. Citation & Related Repositories

Companion Code Repositories

| Repository | Role | Link |
|---|---|---|
| SE-AlexNet | Training code, ablation scripts, raw results | GitHub |
| PsychometricFittingCurve | MATLAB psychometric curve fitting, PSE calculation, human comparison | GitHub |
| GradCAM-ROI-SaliencyMapDecoder | Grad-CAM heatmap generation & ROI statistical decoding | GitHub |

Technical Architecture

SE-AlexNet extends the classic AlexNet by inserting a Squeeze-and-Excitation block at one of three locations (L1, L2, or L3 in the architecture table):

Squeeze: Global Average Pooling compresses spatial information into channel descriptors
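Excitation: A bottleneck MLP (C → C/r → C, with a ReLU after the first layer and a sigmoid at the output, where r is the reduction ratio) learns channel-wise scaling factors in (0, 1)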
Recalibration: Feature maps are re-weighted by learned importance scores
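The three steps can be written out directly. The NumPy sketch below implements a standard SE forward pass for one (C, H, W) feature map: global average pooling, a bias-free C → C/r → C MLP with ReLU and sigmoid, and channel-wise rescaling. The weight matrices are random stand-ins and the spatial size is a toy value. The repository's SELayer in `modeling.py` is authoritative and may differ, for example in its bias terms or in how it handles 1-D inputs such as the 4096-dim L2 vector.

```python
import numpy as np


def se_forward(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights (bias-free, assumed).
    """
    z = x.mean(axis=(1, 2))                  # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z, 0.0)              # excitation: bottleneck + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # sigmoid channel scales in (0, 1) -> (C,)
    return x * s[:, None, None]              # recalibration: rescale each channel


rng = np.random.default_rng(0)
C, r = 256, 32                               # SE-AlexNet-L1 width, best-performing r
x = rng.standard_normal((C, 6, 6))           # toy spatial size
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_forward(x, w1, w2)
print(y.shape)  # (256, 6, 6)
```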

The three location variants test the hypothesis that SE blocks are most effective when placed closer to the decision boundary (classifier), where channel-wise feature importance directly impacts classification.

8. Limitations

Dataset bias: Trained only on a modified subset of AffectNet (posed and spontaneous expressions). Performance may degrade on images from other sources or on cross-cultural expressions.
Image resolution: Input fixed at 224×224. Smaller or lower-quality faces may reduce accuracy.
Binary vs. Multi-class: VGG16 models use binary classification (Happy/Sad); AlexNet/SE variants use 8 or 11 classes. Not directly comparable without task alignment.
Transfer learning only: Convolutional layers are frozen. Models are not trained end-to-end from scratch.
Architecture age: AlexNet and VGG16 are now classic architectures. These models are for scientific comparison, not state-of-the-art emotion recognition.

9. License

MIT License. See companion GitHub repositories for full license details.

Trained weights converted from original PyTorch .pth checkpoints. For questions about the training methodology or experimental design, please refer to the companion paper and GitHub repositories.
