|
|
--- |
|
|
license: apache-2.0 |
|
|
metrics: |
|
|
- accuracy |
|
|
base_model: |
|
|
- openai/clip-vit-base-patch32 |
|
|
--- |
|
|
# EmotionCLIP Model |
|
## Project Overview |
|
|
|
|
|
EmotionCLIP is an open-domain multimodal emotion perception model built on CLIP. It performs broad emotion recognition on visual inputs such as faces, scenes, and photographs, supporting the analysis of emotional attributes in images, scene layouts, and even artworks.
|
|
|
|
|
## Datasets |
|
|
|
|
|
The model is trained using the following datasets: |
|
|
|
|
|
1. **EmoSet**: |
|
|
- Citation: |
|
|
```
@inproceedings{yang2023emoset,
  title={EmoSet: A Large-Scale Visual Emotion Dataset with Rich Attributes},
  author={Yang, Jingyuan and Huang, Qirui and Ding, Tingting and Lischinski, Dani and Cohen-Or, Danny and Huang, Hui},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={20383--20394},
  year={2023}
}
```
|
|
- This dataset provides rich emotion labels and visual attributes, forming the foundation for emotion perception. This model is trained on the EmoSet-118K subset.
|
|
|
|
|
2. **Open Human Facial Emotion Recognition Dataset**: |
|
|
- Contains nearly 10,000 emotion-labeled images collected from in-the-wild scenes, used to enhance the model's facial emotion recognition capability.
|
|
|
|
|
## Training Method
|
|
|
|
|
The model is fine-tuned with prefix-tuning: the pretrained CLIP weights stay frozen, and only a small set of continuous prefix vectors prepended to the input sequence is trained, as sketched below.
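
The exact prefix design is not documented on this card, so the following is only a minimal sketch of the general technique in plain PyTorch, with illustrative names (`PrefixTunedEncoder`, `prefix_len`, and the toy backbone are not from this repository): the pretrained encoder is frozen, and only a short sequence of learnable prefix embeddings, concatenated in front of the input tokens, receives gradients.

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Frozen transformer encoder with a short, learnable prefix (illustrative only)."""

    def __init__(self, backbone: nn.TransformerEncoder, d_model: int, prefix_len: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # Learnable prefix of shape (prefix_len, d_model), broadcast over the batch
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        prefix = self.prefix.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.backbone(torch.cat([prefix, x], dim=1))

# Toy usage: only the prefix parameters are trainable
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = PrefixTunedEncoder(nn.TransformerEncoder(layer, num_layers=2), d_model=512)
out = encoder(torch.randn(4, 16, 512))  # -> (4, 24, 512): 8 prefix slots + 16 tokens
print(sum(p.numel() for p in encoder.parameters() if p.requires_grad))  # 8 * 512 = 4096
```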
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Fine-tuning Weights |
|
|
|
|
|
This repository provides one set of fine-tuned weights:
|
|
|
|
|
1. **EmotionCLIP Weights** |
|
|
- Fine-tuned on the EmoSet-118K dataset, without additional training specifically for facial emotion recognition.
|
|
- Final evaluation results (a short metric-computation sketch follows this list):
|
|
- Accuracy: 0.8042 |
|
|
- Recall: 0.8042 |
|
|
- F1: 0.8057 |
|
|
|
|
|
|
|
|
|
|
|
## Usage Instructions |
|
|
|
|
|
```bash
git clone https://huggingface.co/jiangchengchengNLP/EmotionCLIP
cd EmotionCLIP
# Create your own ./test folder of .jpg images, or reuse the test images from the repository
# The EmotionCLIP weights are loaded by default. Run the following Python script from the repository root.
```
|
|
|
|
|
```python
from EmotionCLIP import model, preprocess, tokenizer
from PIL import Image
import torch
import matplotlib.pyplot as plt
import os
from torch.nn import functional as F

# Folder of test images (sample images are included in the EmotionCLIP repo: jiangchengchengNLP/EmotionCLIP)
image_folder = './test'
image_files = [os.path.join(image_folder, f) for f in os.listdir(image_folder) if f.lower().endswith('.jpg')]

# Emotion label mapping
consist_json = {
    'amusement': 0,
    'anger': 1,
    'awe': 2,
    'contentment': 3,
    'disgust': 4,
    'excitement': 5,
    'fear': 6,
    'sadness': 7,
    'neutral': 8
}
reversal_json = {v: k for k, v in consist_json.items()}

# Build one prompt per emotion and tokenize the prompts once; they are reused for every image
text_list = [f"This picture conveys a sense of {key}" for key in consist_json.keys()]
text_input = tokenizer(text_list)

# Fixed 3x3 grid of subplots; at most rows * cols images are shown
rows, cols = 3, 3
num_images = min(len(image_files), rows * cols)
fig, axes = plt.subplots(rows, cols, figsize=(15, 10))
axes = axes.flatten()  # flatten the subplot grid to a 1D array
title_fontsize = 20

# Iterate through each image
for idx, img_path in enumerate(image_files[:num_images]):
    # Load the image; convert to RGB so grayscale or RGBA files also pass through the CLIP preprocessor
    img = Image.open(img_path).convert('RGB')
    img_input = preprocess(img)

    # Predict emotion: softmax over the image-text similarity logits, then take the top-1 label
    with torch.no_grad():
        logits_per_image, _ = model(
            img_input.unsqueeze(0).to(device=model.device, dtype=model.dtype),
            text_input.to(device=model.device),
        )
        softmax_logits_per_image = F.softmax(logits_per_image, dim=-1)
        top_k_values, top_k_indexes = torch.topk(softmax_logits_per_image, k=1, dim=-1)
        predicted_emotion = reversal_json[top_k_indexes.item()]

    # Display the image with its predicted emotion
    ax = axes[idx]
    ax.imshow(img)
    ax.set_title(f"Predicted: {predicted_emotion}", fontsize=title_fontsize)
    ax.axis('off')

# Hide any unused subplots
for idx in range(num_images, rows * cols):
    axes[idx].axis('off')

plt.tight_layout()
plt.show()
```
|
|
|
|
|
|
|
|
--- |