TSAM β€” Temporal Shift Audio-Visual Model for Viewer Emotion Recognition

Pre-trained model weights for the paper "Decoding Viewer Emotions in Video Ads" by Alexey Antonov, Shravan Sampath Kumar, Jiefei Wei, William Headley, Orlando Wood, and Giovanni Montana, published in Scientific Reports.

Code: github.com/gmontana/DecodingViewerEmotions
Dataset: dnamodel/adcumen-viewer-emotions

Model Description

TSAM (Temporal Shift Audio-Visual Model) is a deep learning model that predicts viewer emotional responses to video advertisements. It processes both the visual frames and the audio track of 5-second video clips and classifies the emotional reaction into one of seven categories.
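TSM-style video models typically sample a fixed number of frames uniformly from each clip before feeding them to the backbone. A minimal sketch of segment-based sampling (the 8-segment count is a common TSN/TSM default, not necessarily this model's setting):

```python
def sample_frame_indices(num_frames_in_clip, num_segments=8):
    """Pick the middle frame of each of num_segments equal temporal
    segments (uniform segment-based sampling, as in TSN/TSM pipelines)."""
    seg_len = num_frames_in_clip / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]
```

For a 5-second clip at 25 fps (125 frames), this yields 8 evenly spaced frame indices.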

Architecture

  • Backbone: ResNet50 pre-trained on ImageNet-21K
  • Temporal modeling: Temporal Shift Module (TSM) for efficient video understanding
  • Audio-visual fusion: visual and audio feature streams combined into a joint representation for classification
  • Output: 7-class emotion classification
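The temporal shift operation at the heart of TSM exchanges a slice of channels between neighboring frames, giving the 2D backbone temporal context at zero extra FLOPs. A minimal NumPy sketch; the 1/8 shift fraction is the common TSM default, not necessarily this checkpoint's configuration:

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the time axis (TSM).
    x: array of shape (T, C, H, W); boundaries are zero-padded."""
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                # shift toward the past
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # shift toward the future
    out[:, 2 * fold:] = x[:, 2 * fold:]           # remaining channels unchanged
    return out
```

In TSM this shift is inserted inside each residual block, before the convolutions.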

Emotion Classes

ID Emotion
0 Anger
1 Contempt
2 Disgust
3 Fear
4 Happiness
5 Sadness
6 Surprise
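The class indices above map directly onto label names. A small illustrative helper (not part of the repo) for decoding a model output vector:

```python
# Index-to-label mapping, in the order given by the table above.
EMOTIONS = ["Anger", "Contempt", "Disgust", "Fear",
            "Happiness", "Sadness", "Surprise"]

def decode_prediction(scores):
    """Return the emotion name with the highest score (argmax lookup)."""
    return EMOTIONS[max(range(len(scores)), key=lambda i: scores[i])]
```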

Files

File Description
backbone_weights.tar ResNet50 backbone pre-trained on ImageNet-21K
tsam_weights.tar Trained TSAM model checkpoint (best balanced accuracy)

Usage

Download weights

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dnamodel/tsam-viewer-emotions",
    local_dir="./tsam-weights"
)

Inference

See the code repository for full training and inference instructions.

# 1. Clone the code repo
git clone https://github.com/gmontana/DecodingViewerEmotions.git
cd DecodingViewerEmotions

# 2. Install dependencies
pip install -r requirements.txt

# 3. Download dataset and model weights
# 4. Run setup_data.py to extract frames and audio
# 5. Run predict.py for inference
python predict.py
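The downloaded .tar checkpoints are assumed here to be standard torch.save files (common for TSM-based repos) rather than tar archives; `load_checkpoint` and the `state_dict` nesting check are assumptions for illustration, and predict.py in the code repo has the authoritative loading logic.

```python
import torch

def load_checkpoint(path):
    """Load a checkpoint saved with torch.save (the .tar extension
    does not imply a tar archive here)."""
    ckpt = torch.load(path, map_location="cpu")
    # Some training scripts nest the weights under a 'state_dict' key.
    if isinstance(ckpt, dict):
        return ckpt.get("state_dict", ckpt)
    return ckpt
```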

Requirements

  • Python 3.10+
  • PyTorch 2.5+
  • FFmpeg
  • CUDA-capable GPU

Training Details

  • Training data: 21,392 five-second video clips from video advertisements
  • Validation data: 2,856 clips
  • Test data: 2,387 clips
  • Annotation: Each original advertisement annotated by ~75 viewers using System1's "Test Your Ad" tool
  • Selection criterion: Best balanced accuracy on the validation set
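Balanced accuracy, the selection criterion above, is the mean of per-class recalls, so a model cannot score well by over-predicting the most frequent emotions. A minimal sketch of the metric:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```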

Citation

@article{antonov2024decoding,
  title={Decoding viewer emotions in video ads},
  author={Antonov, Alexey and Kumar, Shravan Sampath and Wei, Jiefei and Headley, William and Wood, Orlando and Montana, Giovanni},
  journal={Scientific Reports},
  volume={14},
  pages={25680},
  year={2024},
  publisher={Nature Publishing Group},
  doi={10.1038/s41598-024-76968-9}
}

License

The TSAM software and associated weights are provided under a custom license from the University of Warwick. Use is permitted solely for academic research and non-commercial evaluation. See the LICENSE file for full terms.
