---
license: apache-2.0
metrics:
- accuracy
base_model:
- openai/clip-vit-base-patch32
---
# EmotionCLIP Model


![image/png](https://cdn-uploads.huggingface.co/production/uploads/662f655a02d973f5970ccbd3/Po7ZsuwjpDM1v3T6HMCnW.png)



## Project Overview

EmotionCLIP is an open-domain multimodal emotion perception model built on CLIP. This model aims to perform broad emotion recognition through multimodal inputs such as faces, scenes, and photos, supporting the analysis of emotional attributes in images, scene layouts, and even artworks.

## Datasets

The model is trained using the following datasets:

1. **EmoSet**:  
   - Citation:
     ```
     @inproceedings{yang2023emoset,
       title={EmoSet: A Large-Scale Visual Emotion Dataset with Rich Attributes},
       author={Yang, Jingyuan and Huang, Qirui and Ding, Tingting and Lischinski, Dani and Cohen-Or, Danny and Huang, Hui},
       booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
       pages={20383--20394},
       year={2023}
     }
     ```
   - This dataset contains rich emotional labels and visual features, providing a foundation for emotion perception. This model uses the EmoSet-118K dataset.

2. **Open Human Facial Emotion Recognition Dataset**:  
   - Contains nearly 10,000 emotion-labeled images collected in the wild, used to strengthen the model's facial emotion recognition.

## Training Method

The model is trained with prefix-tuning.




## Fine-tuning Weights

This repository provides one set of fine-tuned weights:

1. **EmotionCLIP Weights**
   - Fine-tuned on the EmoSet 118K dataset, without additional training specifically for facial emotion recognition.
   - Final evaluation results:
     - Accuracy: 0.8042
     - Recall: 0.8042
     - F1: 0.8057
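
The card does not say how recall and F1 were averaged across the emotion classes. Identical accuracy and recall values are what support-weighted averaging produces, so the following minimal sketch assumes that convention. The function names and toy labels are hypothetical and are not taken from the repository's evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((n / total) * (hits[c] / n) for c, n in support.items())


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for c, n in support.items():
        precision = hits[c] / predicted[c] if predicted[c] else 0.0
        recall = hits[c] / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score


# Toy labels, made up for illustration only
y_true = ["awe", "awe", "fear", "sadness", "sadness", "sadness"]
y_pred = ["awe", "fear", "fear", "sadness", "sadness", "awe"]

print("accuracy:       ", accuracy(y_true, y_pred))
print("weighted recall:", weighted_recall(y_true, y_pred))
print("weighted F1:    ", weighted_f1(y_true, y_pred))
```

Under this convention the weighted recall always equals the accuracy, while the weighted F1 can differ slightly in either direction.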



## Usage Instructions

```bash
git clone https://huggingface.co/jiangchengchengNLP/EmotionCLIP

cd EmotionCLIP
# Create your own test folder with images ending in .jpg, or use the test images from the repository
# By default, MixCLIP weights are used. Run the following Python code from the current folder.
```

```python
from EmotionCLIP import model, preprocess, tokenizer
from PIL import Image
import torch
import matplotlib.pyplot as plt
import os
from torch.nn import functional as F

# Image folder path
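image_folder = r'./test'  # test images are in the EmotionCLIP repo: jiangchengchengNLP/EmotionCLIP
# Sort for a stable display order and match the extension case-insensitively (.jpg / .JPG)
image_files = sorted(os.path.join(image_folder, f) for f in os.listdir(image_folder) if f.lower().endswith('.jpg'))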

# Emotion label mapping
consist_json = {
    'amusement': 0,
    'anger': 1,
    'awe': 2,
    'contentment': 3,
    'disgust': 4,
    'excitement': 5,
    'fear': 6,
    'sadness': 7,
    'neutral': 8
}
reversal_json = {v: k for k, v in consist_json.items()}
text_list = [f"This picture conveys a sense of {key}" for key in consist_json.keys()]
text_input = tokenizer(text_list)

# Create subplots
num_images = len(image_files)
rows = 3  # 3 rows
cols = 3  # 3 columns
fig, axes = plt.subplots(rows, cols, figsize=(15, 10))  # Adjust the canvas size
axes = axes.flatten()  # Flatten the subplots to a 1D array
title_fontsize = 20

# Iterate through each image
for idx, img_path in enumerate(image_files[:rows * cols]):  # the grid holds at most rows * cols images
    # Load image
    img = Image.open(img_path)
    img_input = preprocess(img)

    # Predict emotion
    with torch.no_grad():
        logits_per_image, _ = model(img_input.unsqueeze(0).to(device=model.device, dtype=model.dtype), text_input.to(device=model.device))
    softmax_logits_per_image = F.softmax(logits_per_image, dim=-1)
    top_k_values, top_k_indexes = torch.topk(softmax_logits_per_image, k=1, dim=-1)
    predicted_emotion = reversal_json[top_k_indexes.item()]

    # Display image and prediction result
    ax = axes[idx]
    ax.imshow(img)
    ax.set_title(f"Predicted: {predicted_emotion}", fontsize=title_fontsize)
    ax.axis('off')

# Hide any extra subplots
for idx in range(num_images, rows * cols):
    axes[idx].axis('off')

plt.tight_layout()
plt.show()
```
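
The script above keeps only the top-1 label for each image. To inspect the runner-up emotions as well, the softmax-and-rank step can be sketched without PyTorch, as below. The `rank_emotions` helper and the example logits are hypothetical illustrations. Only the label order comes from `consist_json`.

```python
import math

# Same label order as consist_json in the script above
EMOTIONS = ['amusement', 'anger', 'awe', 'contentment', 'disgust',
            'excitement', 'fear', 'sadness', 'neutral']


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_emotions(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(EMOTIONS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for one image, one value per emotion prompt
example_logits = [1.2, 0.3, 4.1, 2.0, -0.5, 3.3, 0.0, 0.7, 1.0]
top3 = rank_emotions(example_logits, k=3)
for label, prob in top3:
    print(f"{label}: {prob:.3f}")
```

With real model output, a call such as `rank_emotions(logits_per_image[0].tolist())` should give the ranked emotions for a single image, assuming the logits have one row per image as in the script above.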


---