SE-AlexNet: Enhancing Face Perception via Dimension Reduction

A collection of 34 fine-tuned convolutional neural networks for studying how Squeeze-and-Excitation (SE) modules affect facial emotion recognition. This model zoo supports the psychophysical analysis pipeline described in the companion paper.

📦 GitHub (Code & Analysis): SE-AlexNet | PsychometricFittingCurve | GradCAM-ROI-SaliencyMapDecoder


1. Model Details

This repository contains weights for 5 model architectures, systematically varied across 3 experimental axes:

| Experimental Axis | Values |
|---|---|
| Architecture | AlexNet, VGG16, SE-AlexNet-L1, SE-AlexNet-L2, SE-AlexNet-L3 |
| SE Reduction Ratio ($r$) | 2, 4, 8, 16, 32 (SE variants only) |
| Pre-training Basis | FaceBased (VGGFace2) or ObjectBased (ImageNet) |

Architecture Descriptions

| Model | SE Position | Key Characteristic |
|---|---|---|
| AlexNet | None (baseline) | Standard 5-conv AlexNet, num_classes=11 |
| SE-AlexNet-L1 | After last conv, before FC stack | SE on 256-channel feature maps |
| SE-AlexNet-L2 | Between fc6 → fc7 | SE on 4096-dim feature vector, reduction fixed at 16 |
| SE-AlexNet-L3 | As first element of classifier | Best-performing variant; SELayer before fc6 |
| VGG16 | None (benchmark) | Standard VGG16, num_classes=2 (binary Happy/Sad) |
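To make the reduction ratio concrete, the sketch below computes the bottleneck width and parameter count of an SE excitation MLP for the 256-channel maps of SE-AlexNet-L1 and the 4096-dim vector of SE-AlexNet-L2. It assumes the usual SE layout (Linear C → C/r, ReLU, Linear C/r → C, sigmoid); whether the released checkpoints use biases in those layers is not stated, so the hypothetical helper `se_bottleneck` exposes a `bias` flag.

```python
def se_bottleneck(channels: int, r: int, bias: bool = True) -> tuple[int, int]:
    """Return (hidden width, parameter count) of a two-layer SE excitation MLP.

    Assumes Linear(C, C // r) -> ReLU -> Linear(C // r, C) -> Sigmoid.
    """
    hidden = channels // r
    params = channels * hidden + hidden * channels  # weights of both linear layers
    if bias:
        params += hidden + channels  # one bias per output unit of each layer
    return hidden, params


# SE-AlexNet-L1: 256-channel conv features, swept over the five reduction ratios
for r in (2, 4, 8, 16, 32):
    hidden, params = se_bottleneck(256, r)
    print(f"C=256, r={r:2d}: hidden={hidden:3d}, params={params:,}")

# SE-AlexNet-L2: 4096-dim fc6 output with the reduction fixed at 16
print("C=4096, r=16:", se_bottleneck(4096, 16))
```

Larger $r$ means a narrower bottleneck: at $r=32$ the 256 channels are squeezed through only 8 hidden units.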

⚠️ Important note on SE-Location-2: All 10 L2 variants share an identical architecture (reduction=16). The squeeze-{r} label refers to a training/data configuration, not the architectural reduction ratio; it is preserved for reproducibility.
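The total of 34 checkpoints follows from 2 baselines × 2 pre-training bases plus 3 SE locations × 2 pre-training bases × 5 squeeze labels. The sketch below enumerates the directories under the assumption that every SE location has `squeeze-{r}` subfolders (the repository tree only expands Location 1, but the Quick Start path `se-location3/facebased/squeeze-32` follows the same pattern). `model_dirs` is a hypothetical helper; for Location 2 the `r` value is only a configuration label.

```python
from itertools import product

PRETRAINING = ("facebased", "objectbased")
REDUCTIONS = (2, 4, 8, 16, 32)


def model_dirs() -> list[str]:
    """List checkpoint directories, assuming the layout shown in Repository Structure."""
    # Baselines have no squeeze-{r} level
    dirs = [f"{arch}/{pre}" for arch, pre in product(("alexnet", "vgg16"), PRETRAINING)]
    # SE variants: location x pre-training x squeeze label
    for loc, pre, r in product((1, 2, 3), PRETRAINING, REDUCTIONS):
        dirs.append(f"se-location{loc}/{pre}/squeeze-{r}")
    return dirs


all_dirs = model_dirs()
print(len(all_dirs))  # 34
print(all_dirs[4])    # se-location1/facebased/squeeze-2
```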


2. Intended Use

These models are designed for visual feature interpretability analysis in facial emotion recognition research. Specific use cases:

  • Psychophysical model comparison: Compare model perceptual biases (Points of Subjective Equality, PSE) against human observers
  • Grad-CAM ROI analysis: Visualize which facial regions (eyes, nose, mouth) different architectures attend to
  • Dimension reduction study: Quantify how SE-based channel recalibration affects representational efficiency
  • Pre-training effect study: Compare face-based (VGGFace2) vs. object-based (ImageNet) feature transfer
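The PSE is the stimulus level at which an observer (or model) is equally likely to give either response. The companion PsychometricFittingCurve repository fits full psychometric curves in MATLAB; the sketch below swaps in a simpler method, linear interpolation of the 50% crossing, to show the idea. The morph-level axis and response proportions are hypothetical, and `pse_by_interpolation` is not part of this repository.

```python
def pse_by_interpolation(levels, p_happy, criterion=0.5):
    """Return the stimulus level at which P('happy') first crosses `criterion`.

    Linear interpolation between neighbouring levels; a simpler stand-in
    for fitting a full psychometric (e.g. logistic) function.
    """
    points = sorted(zip(levels, p_happy))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses = (y0 - criterion) * (y1 - criterion) <= 0
        if crosses and y0 != y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("responses never cross the criterion")


# Hypothetical proportions of 'happy' responses along a sad-to-happy morph (0 to 100 %)
levels = [0, 25, 50, 75, 100]
p_happy = [0.02, 0.10, 0.40, 0.80, 0.98]
print(pse_by_interpolation(levels, p_happy))  # approximately 56.25
```

A PSE above the midpoint (50) would indicate a bias toward responding "sad" for ambiguous faces.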

Not intended for: Production emotion recognition systems, clinical diagnosis, or surveillance applications. These are research models fine-tuned on a modified research subset of AffectNet.


3. Training Data

All models were fine-tuned on AffectNet, a large-scale in-the-wild facial expression dataset:

  • Training set: 28,000 facial images (modified subset)
  • Validation set: 1,000 facial images
  • Classes: 8 expression categories (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger, Contempt); some variants use 11-class or binary labels (see Architecture Descriptions)
  • Preprocessing: Standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
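Per channel, the normalization maps an 8-bit value $x$ to $(x/255 - \mu_c)/\sigma_c$ with the ImageNet mean and standard deviation above. The plain-Python sketch below applies it to a single pixel; in torchvision, `transforms.ToTensor()` (which scales 8-bit values to [0, 1]) followed by `transforms.Normalize(mean, std)` applies the same formula to whole images. Which resize or crop steps precede it in `inference.py` is not stated here, and `normalize_pixel` is a hypothetical helper.

```python
MEAN = (0.485, 0.456, 0.406)  # ImageNet per-channel mean (R, G, B)
STD = (0.229, 0.224, 0.225)   # ImageNet per-channel standard deviation


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to the normalized values the models expect."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


print([round(v, 3) for v in normalize_pixel((255, 128, 0))])  # [2.249, 0.205, -1.804]
```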

Pre-training sources:

  • FaceBased: VGGFace2 (faces only; captures facial identity features)
  • ObjectBased: ImageNet-1K (general objects; captures generic visual features)

Training hyperparameters:

  • Batch size: 128
  • Learning rate: 1e-4 (Adam optimizer)
  • Epochs: 40
  • Convolutional layers frozen (transfer learning); only the SE blocks and the classifier are trained
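Because the convolutional layers are frozen, almost all of the parameters Adam updates sit in the classifier. The sketch below counts frozen versus trainable parameters for the baseline, assuming it follows torchvision's AlexNet layout (64-192-384-256-256 conv channels and a 256 × 6 × 6 input to fc6) with the 11-class head listed above. SE variants would add their excitation-MLP parameters to the trainable side. All names here are hypothetical.

```python
# Assumed torchvision-style AlexNet conv shapes: (in_channels, out_channels, kernel_size)
CONV_LAYERS = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]


def fc_layers(num_classes):
    """(in_features, out_features) of fc6, fc7 and the output layer."""
    return [(256 * 6 * 6, 4096), (4096, 4096), (4096, num_classes)]


def conv_params(layers):
    """Weights (out x in x k x k) plus one bias per output channel."""
    return sum(c_out * c_in * k * k + c_out for c_in, c_out, k in layers)


def fc_params(layers):
    """Weights (in x out) plus one bias per output unit."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


frozen = conv_params(CONV_LAYERS)     # feature extractor, excluded from the optimizer
trainable = fc_params(fc_layers(11))  # classifier, updated by Adam (lr=1e-4)
print(f"frozen={frozen:,}  trainable={trainable:,}  "
      f"share trainable={trainable / (frozen + trainable):.1%}")
```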

4. Evaluation & Performance

Key Findings from the Ablation Study

| Metric | Finding |
|---|---|
| Best Architecture | SE-AlexNet-L3 (SE block closest to classifier) |
| Optimal Reduction | $r=32$ consistently outperformed lower reduction ratios |
| Pre-training Effect | Face-based pre-training improved emotion discrimination by ~12% over object-based |
| ROI Attention | SE models showed more focused attention on mouth and eye regions vs. baseline AlexNet |

Psychophysical Benchmarking

Models were evaluated against human observers ($N=40$) in a two-alternative forced-choice (2AFC) Happy/Sad discrimination task:

  • Metric: RMSE between each model's PSE and the distribution of human PSEs
  • Statistical tests: One-sample t-tests, Cohen's d effect sizes
  • Full results: See PsychometricFittingCurve repository
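The comparison metrics can be sketched with the standard library. This assumes the model PSE plays the role of the reference value in the one-sample t-test and in Cohen's d, and that RMSE is taken over the individual human PSEs; the card does not spell out these details, and the numbers are hypothetical. A p-value would additionally need the t distribution, for example via `scipy.stats.ttest_1samp(human_pses, model_pse)`.

```python
import math
import statistics


def compare_to_humans(model_pse, human_pses):
    """RMSE, one-sample t statistic and Cohen's d of human PSEs vs. one model PSE."""
    n = len(human_pses)
    mean = statistics.mean(human_pses)
    sd = statistics.stdev(human_pses)  # sample SD (n - 1 denominator)
    rmse = math.sqrt(sum((h - model_pse) ** 2 for h in human_pses) / n)
    t = (mean - model_pse) / (sd / math.sqrt(n))  # df = n - 1
    d = (mean - model_pse) / sd                   # effect size in SD units
    return {"rmse": rmse, "t": t, "df": n - 1, "d": d}


# Hypothetical human PSEs (morph-level units) and a hypothetical model PSE
humans = [48.0, 52.0, 50.0, 55.0, 45.0]
print(compare_to_humans(56.25, humans))
```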

5. How to Get Started

Installation

pip install -r requirements.txt

Quick Start: Forward Pass in 10 Lines

from inference import SEModelPipeline

# Load the best model (SE-Location3, FaceBased, r=32)
pipe = SEModelPipeline('se-location3/facebased/squeeze-32')

# Run inference
probs = pipe.predict('path/to/face_image.jpg')
print(f'Prediction shape: {probs.shape}')  # (1, 11)

List Available Models

import os, json

for root, dirs, files in os.walk('.'):
    if 'config.json' in files:
        with open(os.path.join(root, 'config.json')) as f:
            cfg = json.load(f)
        print(f"{root}: {cfg['model_type']} | {cfg['pretraining']} | {cfg.get('reduction', 'N/A')}")
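To select a subset of checkpoints (for example, every face-based model with squeeze label 32), the directory names alone can be parsed, assuming they follow the Repository Structure layout. `parse_model_path` is a hypothetical helper, not part of `inference.py` or `modeling.py`; remember that for `se-location2` the squeeze value is a configuration label rather than the architectural reduction ratio.

```python
import re

PATH_RE = re.compile(
    r"(?P<arch>alexnet|vgg16|se-location[123])/(?P<pretraining>facebased|objectbased)"
    r"(?:/squeeze-(?P<r>\d+))?$"
)


def parse_model_path(path):
    """Split a directory such as 'se-location3/facebased/squeeze-32' into fields."""
    m = PATH_RE.search(path.replace("\\", "/").rstrip("/"))
    if m is None:
        raise ValueError(f"unrecognized model path: {path!r}")
    r = m.group("r")
    return {"arch": m.group("arch"), "pretraining": m.group("pretraining"),
            "squeeze": int(r) if r else None}


# e.g. the `root` values printed by the os.walk loop above
paths = ["./alexnet/facebased", "./se-location3/facebased/squeeze-32",
         "./se-location3/objectbased/squeeze-32"]
selected = []
for p in paths:
    info = parse_model_path(p)
    if info["pretraining"] == "facebased" and info["squeeze"] == 32:
        selected.append(p)
print(selected)  # ['./se-location3/facebased/squeeze-32']
```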

Load a Specific Model Programmatically

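import json

import torch
from modeling import load_model_from_config

# Load config
with open('se-location3/facebased/squeeze-32/config.json') as f:
    config = json.load(f)

# Build model with weights
model = load_model_from_config(
    config,
    weights_path='se-location3/facebased/squeeze-32/model.safetensors',
    device='cpu'
)
model.eval()  # disable dropout in the classifier so outputs are deterministic

# Forward pass
x = torch.randn(1, 3, 224, 224)  # dummy input: (batch, RGB, height, width)
with torch.no_grad():  # inference only, no gradient bookkeeping
    output = model(x)  # class scores, shape (1, num_classes)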
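If `model(x)` returns raw logits (an assumption: the classifier is taken to end in a linear layer without softmax), they still need a softmax to become probabilities; in PyTorch that is `torch.softmax(output, dim=1)`. The plain-Python sketch below spells out the arithmetic and a top-k readout. The label order is hypothetical, so check the checkpoint's `config.json` before mapping indices to emotions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# HYPOTHETICAL label names; the real 11-class order must come from the config
labels = [f"class_{i}" for i in range(11)]
# Stand-in for output[0].tolist()
logits = [0.1, 2.0, -1.0, 0.5, 0.0, -0.3, 1.2, 0.0, -2.0, 0.3, 0.7]
probs = softmax(logits)
print(top_k(probs, labels))
```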

Use for Grad-CAM Visualization

import json

import torch
from modeling import load_model_from_config

# Load any model
with open('se-location3/facebased/squeeze-32/config.json') as f:
    config = json.load(f)
model = load_model_from_config(
    config,
    'se-location3/facebased/squeeze-32/model.safetensors'
)
model.eval()

# Target the last conv layer for Grad-CAM
# (in a torchvision-style layout, features[-1] is MaxPool2d and features[-2] is ReLU)
target_layer = model.features[-3]
assert isinstance(target_layer, torch.nn.Conv2d), type(target_layer)
# ... apply standard Grad-CAM pipeline
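The core Grad-CAM step weights each channel of the target layer's activations by the spatial mean of its gradients, sums the weighted maps, and keeps the positive part. In PyTorch the activations and gradients of `target_layer` are typically captured with `register_forward_hook` and `register_full_backward_hook` (for a 224×224 input, the last torchvision-style AlexNet conv yields 256 × 13 × 13 maps). The sketch below shows only the combination step on nested lists indexed `[channel][row][col]`; upsampling to the input size is omitted, and this is not the GradCAM-ROI-SaliencyMapDecoder implementation.

```python
def grad_cam(activations, gradients):
    """Combine one layer's activations A[k][i][j] and gradients dy/dA[k][i][j] into a CAM.

    alpha_k = spatial mean of the gradients of channel k,
    cam = ReLU(sum_k alpha_k * A_k), rescaled so the peak equals 1.
    """
    h, w = len(activations[0]), len(activations[0][0])
    alphas = [sum(map(sum, g)) / (h * w) for g in gradients]
    cam = [[max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    peak = max(map(max, cam))
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: two 2x2 channels; channel 0 has positive, channel 1 negative gradients
A = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
G = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(A, G))  # [[1.0, 0.0], [0.0, 1.0]]
```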

6. Repository Structure

SE-AlexNet/
├── README.md                    # This Model Card
├── requirements.txt             # Python dependencies
├── modeling.py                  # Exact model architecture definitions
├── inference.py                 # Universal inference pipeline
├── convert_weights.py           # .pth → .safetensors conversion script
├── generate_configs.py          # Config file generator
│
├── alexnet/                     # Standard AlexNet baseline
│   ├── facebased/
│   │   ├── config.json
│   │   └── model.safetensors
│   └── objectbased/
│       ├── config.json
│       └── model.safetensors
│
├── vgg16/                       # VGG16 benchmark
│   ├── facebased/
│   └── objectbased/
│
├── se-location1/                # SE after all convolutions
│   ├── facebased/
│   │   └── squeeze-2/ → squeeze-32/  (5 reduction ratios)
│   └── objectbased/
│       └── squeeze-2/ → squeeze-32/
│
├── se-location2/                # SE between fc6 → fc7
│   ├── facebased/
│   └── objectbased/
│
└── se-location3/                # SE in classifier (BEST)
    ├── facebased/
    └── objectbased/

7. Citation & Related Repositories

Companion Code Repositories

| Repository | Role | Link |
|---|---|---|
| SE-AlexNet | Training code, ablation scripts, raw results | GitHub |
| PsychometricFittingCurve | MATLAB psychometric curve fitting, PSE calculation, human comparison | GitHub |
| GradCAM-ROI-SaliencyMapDecoder | Grad-CAM heatmap generation & ROI statistical decoding | GitHub |
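One simple ROI readout is the share of total saliency that falls inside each facial region. The sketch below computes it for rectangular boxes on a toy map; the actual ROI definitions and statistics used by GradCAM-ROI-SaliencyMapDecoder are not described here, so the boxes, the map, and the `roi_share` helper are all hypothetical.

```python
def roi_share(saliency, rois):
    """Fraction of total saliency mass inside each rectangular ROI.

    `rois` maps a name to (row_start, row_end, col_start, col_end), end-exclusive.
    """
    total = sum(map(sum, saliency))
    shares = {}
    for name, (r0, r1, c0, c1) in rois.items():
        inside = sum(v for row in saliency[r0:r1] for v in row[c0:c1])
        shares[name] = inside / total if total else 0.0
    return shares


# Toy 4x4 normalized Grad-CAM map; ROI boxes are HYPOTHETICAL, not real landmarks
saliency = [
    [0.0, 0.2, 0.2, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.4, 0.4, 0.1],
]
rois = {"eyes": (0, 1, 0, 4), "nose": (1, 3, 1, 3), "mouth": (3, 4, 0, 4)}
print(roi_share(saliency, rois))
```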

Technical Architecture

SE-AlexNet extends the classic AlexNet by inserting a Squeeze-and-Excitation block at one of three locations:

  • Squeeze: Global Average Pooling compresses spatial information into channel descriptors
  • Excitation: A bottleneck MLP, which shrinks the channel dimension by the reduction ratio $r$ and then restores it, learns channel-wise scaling factors
  • Recalibration: Feature maps are re-weighted by the learned importance scores
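The three steps can be traced in a small plain-Python sketch operating on `[channel][row][col]` nested lists. It assumes the standard SE design (ReLU inside the bottleneck, sigmoid gating) and omits biases for brevity; the `SELayer` in `modeling.py` may differ. In PyTorch the same block is usually written with `nn.AdaptiveAvgPool2d(1)`, two `nn.Linear` layers, `nn.ReLU` and `nn.Sigmoid`. For a vector input such as the 4096-dim fc6 output of SE-AlexNet-L2, each "channel" is a single value and the squeeze step reduces to the identity.

```python
import math


def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation on a [C][H][W] nested list (biases omitted).

    w1: (C // r) x C weights of the reduction layer; w2: C x (C // r) of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation: bottleneck MLP, ReLU then sigmoid gives scales in (0, 1)
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    scales = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Recalibration: multiply every position of channel c by its scale s_c
    recalibrated = [[[s * v for v in row] for row in fm] for s, fm in zip(scales, feature_maps)]
    return recalibrated, scales


# Toy example: C = 4 channels of 2x2 maps, r = 2 so the bottleneck has 2 units
fmaps = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[0, 0], [0, 0]], [[4, 0], [0, 0]]]
w1 = [[1, 0, 0, 0], [0, 0, 0, -1]]
w2 = [[1, 0], [-1, 0], [0, 0], [0, 1]]
out, scales = se_recalibrate(fmaps, w1, w2)
print([round(s, 3) for s in scales])
```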

The three location variants test the hypothesis that SE blocks are most effective when placed closer to the decision boundary (classifier), where channel-wise feature importance directly impacts classification.


8. Limitations

  • Dataset bias: Trained only on AffectNet (posed and spontaneous expressions). Performance may degrade on images from other sources or on cross-cultural expressions.
  • Image resolution: Input is fixed at 224×224. Smaller or lower-quality face images may reduce accuracy.
  • Binary vs. Multi-class: VGG16 models use binary classification (Happy/Sad); AlexNet/SE variants use 8-11 classes. Not directly comparable without task alignment.
  • Transfer learning only: Convolutional layers are frozen. Models are not trained end-to-end from scratch.
  • Architecture age: AlexNet and VGG16 are now classic architectures. These models are for scientific comparison, not state-of-the-art emotion recognition.

9. License

MIT License. See companion GitHub repositories for full license details.


Weights were converted from the original PyTorch .pth checkpoints. For questions about the training methodology or experimental design, please refer to the companion paper and GitHub repositories.
