ChristophSchuhmann committed (verified) · commit efff45a · 1 parent: 09e6170

Update README.md

Files changed (1): README.md (+286 −1)
---
license: cc-by-4.0
---
# Empathic-Insight-Face-Large

**Empathic-Insight-Face-Large** is a set of 40 emotion regression models trained on the EMoNet-FACE benchmark suite. Each model predicts the intensity of one specific fine-grained emotion from facial expressions. The models are built on top of SigLIP2 image embeddings followed by MLP regression heads.

This work is based on the research paper:
**"EMONET-FACE: An Expert-Annotated Benchmark for Synthetic Emotion Recognition"**
*Authors: Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby, Maurice Kraus, Felix Friedrich, Huu Nguyen, Krishna Kalyan, Kourosh Nadi, Kristian Kersting, Sören Auer.*
*(Please refer to the full paper for the complete list of authors and affiliations.)*
*Paper link: (Insert ArXiv/Conference link here when available)*

The models and datasets are released under the **CC-BY-4.0 license**.

## Model Description

The Empathic-Insight-Face-Large suite consists of 40 individual MLP models. Each model takes a 1152-dimensional SigLIP2 image embedding as input and outputs a continuous score (typically in the 0–7 range, optionally mean-subtracted) for one of the 40 emotion categories defined in the EMoNet-FACE taxonomy.

The models were pre-trained on the EMoNet-FACE BIG dataset (over 203k synthetic images with generated labels) and fine-tuned on the EMoNet-FACE BINARY dataset (nearly 20k synthetic images with over 65k human expert binary annotations).

**Key Features:**
* **Fine-grained Emotions:** Covers a novel 40-category emotion taxonomy.
* **High Performance:** Achieves human-expert-level performance on the EMoNet-FACE HQ benchmark.
* **Synthetic Data:** Trained on AI-generated, demographically balanced, full-face expressions.
* **Open:** Publicly released models, datasets, and taxonomy.

## Intended Use

These models are intended for research purposes in affective computing, human-AI interaction, and emotion recognition. They can be used to:
* Analyze and predict fine-grained emotional expressions in synthetic facial images.
* Serve as a baseline for developing more advanced emotion recognition systems.
* Facilitate research into nuanced emotional understanding in AI.

**Out-of-Scope Use:**
These models are trained on synthetic faces and may not generalize well to real-world, in-the-wild images without further adaptation. They should not be used for making critical decisions about individuals, for surveillance, or in any manner that could lead to discriminatory outcomes.

## How to Use

The repository contains individual `.pth` files, each corresponding to one emotion classifier. To use them, you will typically:

1. **Obtain SigLIP2 Embeddings:**
   * Use a pre-trained SigLIP2 model (e.g., `google/siglip2-so400m-patch16-384`).
   * Extract the 1152-dimensional image embedding for your target facial image.
2. **Load an MLP Model:**
   * Each `.pth` file (e.g., `model_elation_best.pth`) is a PyTorch state dictionary for an MLP.
   * The MLP architecture used for Empathic-Insight-Face-Large (the big models) is:
     * Input: 1152 features
     * Hidden Layer 1: 1024 neurons, ReLU, Dropout (0.2)
     * Hidden Layer 2: 512 neurons, ReLU, Dropout (0.2)
     * Hidden Layer 3: 256 neurons, ReLU, Dropout (0.2)
     * Output Layer: 1 neuron (continuous score)
3. **Perform Inference:**
   * Pass the SigLIP2 embedding through the loaded MLP model(s).
4. **(Optional) Mean Subtraction:**
   * The raw output scores can be adjusted by subtracting the model's mean score on neutral faces. The `neutral_stats_cache-_human-binary-big-mlps_v8_two_stage_higher_lr_stage2_5_200+` file in this repository contains these mean values for each emotion model.
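
The four steps can be sketched for a single emotion model. This is a minimal, hypothetical sketch: the randomly initialized network stands in for weights loaded from a checkpoint such as `model_elation_best.pth`, the random tensor stands in for a real SigLIP2 embedding, and `neutral_mean` is a placeholder, so the printed score is meaningless. Note that the real checkpoints expect the `MLP` module structure defined in the full example below, so the commented `load_state_dict` line is illustrative only.

```python
import torch
import torch.nn as nn

# Architecture from the list above: 1152 -> 1024 -> 512 -> 256 -> 1
mlp = nn.Sequential(
    nn.Linear(1152, 1024), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(1024, 512),  nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(512, 256),   nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 1),
).eval()
# With real weights you would load the state dict into the matching module, e.g.:
# mlp.load_state_dict(torch.load("model_elation_best.pth", map_location="cpu"))

embedding = torch.randn(1, 1152)  # stand-in for a normalized SigLIP2 embedding
with torch.no_grad():
    raw_score = mlp(embedding).item()  # one continuous score per model

neutral_mean = 0.0  # placeholder; read the real value from the neutral stats file
score = raw_score - neutral_mean
print(f"raw = {raw_score:.4f}, mean-subtracted = {score:.4f}")
```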

**Example (Conceptual PyTorch for all 40 emotions):**

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoProcessor
from PIL import Image
import numpy as np
import json
from pathlib import Path

# --- 1. Define MLP Architecture (Big Model) ---
class MLP(nn.Module):
    def __init__(self, input_size=1152, output_size=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 1024),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, output_size)
        )

    def forward(self, x):
        return self.layers(x)

# --- 2. Load Models and Processor ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# === IMPORTANT: Set this to the directory where the .pth models are downloaded ===
# If you've cloned the repo, it might be "./" or the name of the cloned folder.
MODEL_DIRECTORY = Path("./Empathic-Insight-Face-Large")  # ADJUST THIS PATH
# ================================================================================

# Load SigLIP2 (ensure it is the correct checkpoint for 1152-dim embeddings)
siglip_model_id = "google/siglip2-so400m-patch16-384"  # Produces 1152-dim embeddings
siglip_processor = AutoProcessor.from_pretrained(siglip_model_id)
siglip_model = AutoModel.from_pretrained(siglip_model_id).to(device).eval()

# Load neutral stats
neutral_stats_filename = "neutral_stats_cache-_human-binary-big-mlps_v8_two_stage_higher_lr_stage2_5_200+"
neutral_stats_path = MODEL_DIRECTORY / neutral_stats_filename
neutral_stats_all = {}
if neutral_stats_path.exists():
    with open(neutral_stats_path, "r") as f:
        neutral_stats_all = json.load(f)
else:
    print(f"Warning: Neutral stats file not found at {neutral_stats_path}. Mean subtraction will use 0.0.")

# Load all emotion MLP models
emotion_mlps = {}
print(f"Loading emotion MLP models from: {MODEL_DIRECTORY}")
for pth_file in MODEL_DIRECTORY.glob("model_*_best.pth"):
    model_key_name = pth_file.stem  # e.g., "model_elation_best"
    try:
        mlp_model = MLP().to(device)
        mlp_model.load_state_dict(torch.load(pth_file, map_location=device))
        mlp_model.eval()
        emotion_mlps[model_key_name] = mlp_model
    except Exception as e:
        print(f"Error loading {model_key_name}: {e}")

if not emotion_mlps:
    print(f"Error: No MLP models loaded. Check MODEL_DIRECTORY: {MODEL_DIRECTORY}")
else:
    print(f"Successfully loaded {len(emotion_mlps)} emotion MLP models.")

# --- 3. Prepare Image and Get Embedding ---
def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

# === Replace with your actual image path ===
# image_path = "path/to/your/image.jpg"
# try:
#     image = Image.open(image_path).convert("RGB")
#     inputs = siglip_processor(images=[image], return_tensors="pt").to(device)
#     with torch.no_grad():
#         image_features = siglip_model.get_image_features(**inputs)
#     embedding_numpy_normalized = normalized(image_features.cpu().numpy())
#     embedding_tensor = torch.from_numpy(embedding_numpy_normalized).to(device).float()
# except FileNotFoundError:
#     print(f"Error: Image not found at {image_path}")
#     embedding_tensor = None  # Or handle the error as appropriate
# ==========================================

# --- For demonstration, use a random embedding if no image is processed ---
print("\nUsing a random embedding for demonstration purposes.")
embedding_tensor = torch.randn(1, 1152).to(device).float()
# ==========================================================================

# --- 4. Inference for all loaded models ---
results = {}
if embedding_tensor is not None and emotion_mlps:
    with torch.no_grad():
        for model_key_name, mlp_model_instance in emotion_mlps.items():
            raw_score = mlp_model_instance(embedding_tensor).item()
            neutral_mean = neutral_stats_all.get(model_key_name, {}).get("mean", 0.0)
            mean_subtracted_score = raw_score - neutral_mean

            emotion_name = model_key_name.replace("model_", "").replace("_best", "").replace("_", " ").title()
            results[emotion_name] = {
                "raw_score": raw_score,
                "neutral_mean": neutral_mean,
                "mean_subtracted_score": mean_subtracted_score,
            }

    # Print results
    print("\n--- Emotion Scores ---")
    for emotion, scores in sorted(results.items()):
        print(f"{emotion:<35}: Mean-Subtracted = {scores['mean_subtracted_score']:.4f} "
              f"(Raw = {scores['raw_score']:.4f}, Neutral Mean = {scores['neutral_mean']:.4f})")
else:
    print("Skipping inference: either the embedding tensor is None or no MLP models were loaded.")
```

## Performance on EMoNet-FACE HQ Benchmark

The Empathic-Insight-Face models demonstrate strong performance, achieving near human-expert-level agreement on the EMoNet-FACE HQ benchmark.

**Key Metric: Weighted Kappa (κ<sub>w</sub>) Agreement with Human Annotators**
(Aggregated pairwise agreement between model predictions and individual human expert annotations on the EMoNet-FACE HQ dataset.)

| Annotator Group | Mean κ<sub>w</sub> (vs. Humans) |
|---|---|
| Human Annotators (vs. Humans) | ~0.20 – 0.26* |
| Empathic-Insight-Face LARGE | ~0.18 |
| Empathic-Insight-Face SMALL | ~0.14 |
| Proprietary Models (e.g., HumeFace) | ~0.11 |
| VLMs (Multi-Shot Prompt) | Highly Variable |
| VLMs (Zero-Shot Prompt) | Highly Variable |
| Random Baseline | ~0.00 |

\*Human inter-annotator agreement (pairwise κ<sub>w</sub>) varies per annotator; this is an approximate range from Table 6 in the paper.

**Interpretation (from paper Figure 3 & Table 6):**

Empathic-Insight-Face LARGE (our big models) achieves agreement scores that are statistically very close to human inter-annotator agreement, and it significantly outperforms the other evaluated systems, such as proprietary models and general-purpose VLMs, on this benchmark.

This performance indicates that, with focused dataset construction and careful fine-tuning, specialized models can approach human-level reliability on synthetic facial emotion recognition tasks for fine-grained emotions.

For more detailed benchmark results, including per-emotion performance and comparisons with other models using Spearman's Rho, please refer to the full EMoNet-FACE paper (Figures 3, 4, and 9 and Table 6 in particular).
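
For intuition about the headline metric, here is a small self-contained sketch of quadratic-weighted Cohen's kappa on ordinal ratings. This illustrates the metric in general, not the paper's exact evaluation protocol; the default `n_levels=8` is an assumption based on the 0–7 score range described above.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_levels=8):
    """Quadratic-weighted Cohen's kappa between two raters' ordinal scores in {0, ..., n_levels-1}."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    observed = np.zeros((n_levels, n_levels))
    for i, j in zip(a, b):
        observed[i, j] += 1
    # Expected matrix from the marginals, scaled to the same total count
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic disagreement weights: zero on the diagonal, growing with distance
    levels = np.arange(n_levels)
    weights = (levels[:, None] - levels[None, :]) ** 2 / (n_levels - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Identical ratings give kappa = 1; chance-level agreement gives kappa near 0
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_levels=4))  # 1.0
```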

## Taxonomy

The 40 emotion categories are:
Affection, Amusement, Anger, Astonishment/Surprise, Awe, Bitterness, Concentration, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Helplessness, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States of Consciousness, Jealousy & Envy, Longing, Malevolence/Malice, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Sexual Lust, Shame, Sourness, Teasing, Thankfulness/Gratitude, Triumph.

(See Table 4 in the paper for the associated descriptive words for each category.)
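
For programmatic use, the taxonomy can be written as a Python list. The names are taken verbatim from the list above; how each name maps to a `model_*_best.pth` filename is not specified here, so any filename derivation from these strings is an assumption to check against the repository contents.

```python
# The 40 EMoNet-FACE emotion categories, verbatim from the taxonomy above
EMONET_FACE_EMOTIONS = [
    "Affection", "Amusement", "Anger", "Astonishment/Surprise", "Awe",
    "Bitterness", "Concentration", "Confusion", "Contemplation", "Contempt",
    "Contentment", "Disappointment", "Disgust", "Distress", "Doubt",
    "Elation", "Embarrassment", "Emotional Numbness", "Fatigue/Exhaustion", "Fear",
    "Helplessness", "Hope/Enthusiasm/Optimism", "Impatience and Irritability", "Infatuation", "Interest",
    "Intoxication/Altered States of Consciousness", "Jealousy & Envy", "Longing", "Malevolence/Malice", "Pain",
    "Pleasure/Ecstasy", "Pride", "Relief", "Sadness", "Sexual Lust",
    "Shame", "Sourness", "Teasing", "Thankfulness/Gratitude", "Triumph",
]

assert len(EMONET_FACE_EMOTIONS) == 40
```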

## Limitations

* **Synthetic Data:** The models are trained on synthetic faces. Generalization to real-world, diverse, in-the-wild images is not guaranteed and requires further investigation.
* **Static Faces:** Analysis is restricted to static facial expressions, without broader contextual or multimodal cues.
* **Cultural Universality:** The 40-category taxonomy, while expert-validated, is one perspective; its universality across cultures is an open research question.
* **Subjectivity:** Emotion perception is inherently subjective.

## Ethical Considerations

The EMoNet-FACE suite was developed with ethical considerations in mind, including:

* **Mitigating Bias:** Efforts were made to create demographically diverse synthetic datasets, and prompts were manually filtered.
* **No PII:** All images are synthetic, and no personally identifiable information was used.
* **Responsible Use:** These models are released for research. Users are urged to consider the ethical implications of their applications and to avoid misuse, such as emotional manipulation or uses that could lead to unfair or harmful outcomes.

Please refer to the "Ethical Considerations" and "Data Integrity, Safety, and Fairness" sections in the EMoNet-FACE paper for a comprehensive discussion.

## Citation

If you use these models or the EMoNet-FACE benchmark in your research, please cite the original paper:

```bibtex
@inproceedings{schuhmann2025emonetface,
  title={{EMONET-FACE: An Expert-Annotated Benchmark for Synthetic Emotion Recognition}},
  author={Schuhmann, Christoph and Kaczmarczyk, Robert and Rabby, Gollam and Kraus, Maurice and Friedrich, Felix and Nguyen, Huu and Kalyan, Krishna and Nadi, Kourosh and Kersting, Kristian and Auer, Sören},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year={2025} % Or actual year of publication
  % TODO: Add URL/DOI when available
}
```

## Acknowledgements

We thank all the expert annotators for their invaluable contributions to the EMoNet-FACE datasets.

This README is based on the EMoNet-FACE paper; for full details, please refer to the publication.