jiangchengchengNLP committed on
Commit a7f3432 · verified · 1 Parent(s): a857c64

Update README.md

Files changed (1)
  1. README.md +9 -10
README.md CHANGED
@@ -38,13 +38,16 @@ The model is trained using the following datasets:
 
 ## training method
 
-Combining the fine-tuning methods of layer_norm tuning, prefix tuning, and prompt tuning, practical results show that a mixture of the three can match or even exceed the performance of full fine-tuning on generalized visual emotion recognition while introducing only a small number of parameters. In addition, thanks to the layer norm adjustment, it converges faster than prefix tuning or prompt tuning alone, achieving higher performance than EmotionCLIP-V1.
+Prefix-Tuning
+
+
+
 
 ## Fine-tuning Weights
 
 This repository provides one set of fine-tuned weights:
 
-1. **EmotionCLIP-V2 Weights**
+1. **EmotionCLIP Weights**
    - Fine-tuned on the EmoSet 118K dataset, without additional training specifically for facial emotion recognition.
    - Final evaluation results:
      - Loss: 1.5465
@@ -125,13 +128,9 @@ for idx in range(num_images, rows * cols):
 plt.tight_layout()
 plt.show()
 ```
-
-## Existing Issues
-The hybrid fine-tuning method improved the model by 2% on the prediction task after the neutral category was introduced, but this introduction still brings noise that interferes with emotion recognition in other scenes. Introducing prompt tuning is the key to surpassing full fine-tuning, and introducing layer norm tuning makes training converge faster. This also has a drawback: after mixing so many fine-tuning methods, the generalization performance of the model declines seriously. At the same time, recognition of the difficult categories disgust and anger has not improved; although I deliberately added some images of human disgust, the effect is still below expectations. It is therefore still necessary to build a high-quality, large-scale visual emotion dataset, since the model's performance is limited by training data far smaller than the pre-training dataset. Seeking breakthroughs in model structure would also help with this problem.
-
-
-### Summary
-I proposed a hybrid layer_norm + prefix_tuning + prompt_tuning training method for efficiently fine-tuning CLIP, which makes the model converge faster while reaching performance comparable to full fine-tuning. However, the loss of generalization performance remains a serious problem. I released EmosetCLIP-V2 trained with this method; compared to EmosetCLIP-V1 it adds a neutral category and performs slightly better. Future work aims to expand the training data for difficult categories and optimize the model architecture.
-
+This repository provides two fine-tuned weights:
+- Accuracy: 0.8042
+- Recall: 0.8042
+- F1: 0.8057
 
 ---
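
The hybrid recipe that the removed "training method" paragraph describes (unfreezing only the LayerNorm parameters while prepending learnable prefix/prompt tokens) can be sketched as follows. This is a minimal illustration on a toy block, assuming PyTorch; names such as `ToyBlock`, `prefix_len`, and `prompt_len` are illustrative and not taken from this repository's code.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one transformer sub-block: LayerNorm followed by a projection."""
    def __init__(self, dim=32):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(self.ln(x))

model = ToyBlock()

# layer_norm tuning: freeze every backbone parameter, then re-enable
# gradients only for the LayerNorm weight and bias.
for p in model.parameters():
    p.requires_grad = False
for name, p in model.named_parameters():
    if "ln" in name:
        p.requires_grad = True

# prefix / prompt tuning: small sets of learnable tokens that are
# prepended to the input sequence instead of updating backbone weights.
prefix_len, prompt_len, dim = 4, 4, 32
prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)
prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

x = torch.randn(10, dim)                   # toy "token" sequence
x = torch.cat([prefix, prompt, x], dim=0)  # prepend learnable tokens
out = model(x)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)    # ['ln.weight', 'ln.bias']
print(out.shape)    # torch.Size([18, 32])
```

An optimizer for this setup would receive only the LayerNorm parameters plus `prefix` and `prompt`, which is why the parameter count introduced by the method stays small relative to full fine-tuning.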