# Multimodal Sentiment Model with Augmentation

A DeBERTa-v3-Large-based multimodal sentiment analysis model trained on CMU-MOSEI.

## Model Description

This model was trained for multimodal sentiment analysis on the CMU-MOSEI dataset.
It was developed as part of the IITP "Butterfly Effect" (나비효과) research project.

### Architecture

- **Text Encoder**: DeBERTa-v3-Large (microsoft/deberta-v3-large)
- **Audio Encoder**: Transformer Encoder (2 layers)
- **Video Encoder**: Transformer Encoder (2 layers)
- **Fusion**: Cross-modal attention + Multi-head self-attention

### Key Features

- Cross-modal attention between text, audio, and video
- Mixup augmentation for the audio/video modalities
- Multi-task learning with auxiliary classifiers (T, A, V branches)
- First 20 DeBERTa layers frozen for training efficiency

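Mixup for the audio/video branches can be sketched as follows. This is a minimal NumPy illustration using the alpha=0.4 setting listed under Training Details; the function and variable names are illustrative, not taken from the training script:

```python
import numpy as np

def mixup(features, labels, alpha=0.4, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch.

    features: (batch, timesteps, dim) modality features (e.g. audio/video)
    labels:   (batch,) regression targets (sentiment scores)
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)           # mixing coefficient ~ Beta(alpha, alpha)
    perm = rng.permutation(len(features))  # random partner for each sample
    mixed_x = lam * features + (1 - lam) * features[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_x, mixed_y, lam

# Toy batch: 4 samples, 500 timesteps, 74-dim COVAREP-like features
x = np.ones((4, 500, 74))
y = np.array([-2.0, -1.0, 1.0, 2.0])
mx, my, lam = mixup(x, y)
```

In training, mixup would typically be applied with probability `prob=0.5` per batch, as listed under Training Details.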
## Performance

| Metric | Score |
|--------|-------|
| **Mult_acc_7** | **56.17%** |
| Mult_acc_5 | 57.83% |
| Has0_acc_2 | ~84% |
| MAE | - |
| Corr | - |

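For reference, Mult_acc_7 is the standard CMU-MOSEI 7-class accuracy: continuous sentiment predictions in [-3, 3] are clipped and rounded into seven integer bins and compared to the similarly binned ground truth. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def mult_acc_7(preds, truths):
    """7-class accuracy: clip regression outputs to [-3, 3], round to integer bins."""
    p = np.round(np.clip(preds, -3.0, 3.0))
    t = np.round(np.clip(truths, -3.0, 3.0))
    return float(np.mean(p == t))

preds = np.array([-2.6, -0.2, 0.4, 1.7, 3.4])
truths = np.array([-3.0, 0.0, 1.0, 2.0, 3.0])
acc = mult_acc_7(preds, truths)  # 4 of 5 bins match -> 0.8
```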
### Comparison with Baselines

| Model | Mult_acc_7 |
|-------|-----------|
| MulT (2020) | 50.7% |
| MMML (2023) | 54.95% |
| **Ours** | **56.17%** |

## Training Details

- **Dataset**: CMU-MOSEI (unaligned_50.pkl)
- **Batch Size**: 16
- **Learning Rate**: 2e-5 (other parameters), 5e-6 (DeBERTa)
- **Epochs**: 50 (early stopping: 15)
- **Optimizer**: AdamW
- **Scheduler**: Cosine with warmup
- **Mixup**: alpha=0.4, prob=0.5
- **Loss weights**: cls=0.7, aux=0.1

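A hedged sketch of how the cls/aux loss weights above might combine with the auxiliary T/A/V branch losses; the exact combination in the training script may differ (e.g. whether auxiliary losses are summed or averaged):

```python
def combined_loss(cls_loss, aux_losses, w_cls=0.7, w_aux=0.1):
    """Weighted multi-task objective: fused-classifier loss plus per-branch
    auxiliary losses (T, A, V unimodal classifiers)."""
    return w_cls * cls_loss + w_aux * sum(aux_losses)

total = combined_loss(0.5, [0.2, 0.3, 0.1])  # 0.7*0.5 + 0.1*0.6 = 0.41
```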
## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the checkpoint (CPU-safe even if it was saved on GPU).
# On PyTorch >= 2.6 you may also need weights_only=False to unpickle 'args'.
checkpoint = torch.load('best_model.pt', map_location='cpu')
args = checkpoint['args']

# Initialize the model (class defined in the training script)
from train_deberta_multimodal import DeBERTaMultimodalModel

model = DeBERTaMultimodalModel(
    model_name='microsoft/deberta-v3-large',
    hidden_size=512,
    num_heads=8,
    dropout=0.2,
    freeze_deberta_layers=20
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-large')
```

## Input Format

- **Text**: Raw text string (tokenized by the DeBERTa tokenizer)
- **Audio**: COVAREP features (74-dim, 500 timesteps)
- **Video**: OpenFace features (35-dim, 500 timesteps)

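Since the audio/video features are fixed at 500 timesteps, variable-length sequences presumably need padding or truncation at load time. A minimal sketch (the actual preprocessing in the training pipeline may differ):

```python
import numpy as np

def pad_or_truncate(seq, target_len=500):
    """Zero-pad (at the end) or truncate a (timesteps, dim) feature sequence."""
    t, d = seq.shape
    if t >= target_len:
        return seq[:target_len]
    out = np.zeros((target_len, d), dtype=seq.dtype)
    out[:t] = seq
    return out

audio = pad_or_truncate(np.ones((120, 74)))  # COVAREP: pad 120 -> 500 steps
video = pad_or_truncate(np.ones((620, 35)))  # OpenFace: truncate 620 -> 500 steps
```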
## Citation

If you use this model, please cite:

```bibtex
@misc{iknow2024mosei,
  title={Multimodal Sentiment Analysis with DeBERTa and Cross-Modal Attention},
  author={iKnow Lab},
  year={2024},
  publisher={Hugging Face}
}
```

## Acknowledgements

This work was supported by an IITP (Institute of Information & Communications Technology Planning & Evaluation) grant funded by the Korea government (MSIT).

- Project: Development of Human-Centered Artificial Intelligence Core Source Technology
- Task: An omni-data-based abductive reasoning framework for understanding complex causal relationships
- Grant Number: RS-2022-II220680

## License

This model is released under the MIT License.