---
license: unknown
base_model:
- BleachNick/MMICL-Instructblip-T5-xxl
---

# Model Card

[**🌐 Homepage**](https://github.com/DreamMr/WisdoM) | [**📖 Paper**](https://dl.acm.org/doi/abs/10.1145/3664647.3681403)

We fine-tuned MMICL on the MSED training set for research in multimodal sentiment analysis.

## Training Details

Our training code is adapted from [this script](https://github.com/HaozheZhao/MIC/blob/master/run_script/flickr/deep_speed_instructblip_t5xxl.sh).

- Data format

```
{
    "text": "Sentence: \"An overweight Hispanic woman and a young mixed race Hispanic and Caucasian man exercising together outdoors in an urban setting, running or jogging. They are smiling, looking at each other as they exercise.\". Use the image 0: 图图图图图图图图图图图图图图图图图图图图图图图图图图图图图图图图 as a visual aids to help you answer the question. Question: according to the image 0 and sentence, what is the sentiment polarity? Choose from the following options:\nA).positive\nB).neutral\nC).negative\n\nAnswer: ",
    "image": "xxx/1.jpg",
    "aspect": "",
    "label": "neutral"
}
```

- Hyperparameters

| Model      | batch size | learning rate | epochs |
| ---------- | ---------- | ------------- | ------ |
| MMICL-MSED | 4          | 1e-4          | 3      |

## Evaluation

The evaluation code is available 👉 [here](https://github.com/DreamMr/WisdoM).

## Citation

```
@inproceedings{wang2024wisdom,
  title={WisdoM: Improving multimodal sentiment analysis by fusing contextual world knowledge},
  author={Wang, Wenbin and Ding, Liang and Shen, Li and Luo, Yong and Hu, Han and Tao, Dacheng},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={2282--2291},
  year={2024}
}
```
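As an illustration of the training data format shown above, here is a minimal Python sketch that assembles one such record. The `build_msed_record` helper name, the option list, and the 32-token `图` image-placeholder count are assumptions inferred from the example record, not part of the released code:

```python
import json

# Number of visual placeholder tokens per image in the prompt
# (assumed: the example record repeats "图" 32 times).
NUM_IMAGE_TOKENS = 32
OPTIONS = ["positive", "neutral", "negative"]


def build_msed_record(sentence: str, image_path: str, label: str) -> dict:
    """Assemble one training record in the format above (hypothetical helper)."""
    placeholder = "图" * NUM_IMAGE_TOKENS
    # Render options as "A).positive\nB).neutral\nC).negative"
    options = "\n".join(
        f"{chr(ord('A') + i)}).{opt}" for i, opt in enumerate(OPTIONS)
    )
    text = (
        f'Sentence: "{sentence}". '
        f"Use the image 0: {placeholder} as a visual aids to help you answer the question. "
        "Question: according to the image 0 and sentence, what is the sentiment polarity? "
        f"Choose from the following options:\n{options}\n\nAnswer: "
    )
    return {"text": text, "image": image_path, "aspect": "", "label": label}


record = build_msed_record("Two friends jogging in a park.", "images/1.jpg", "positive")
print(json.dumps(record, ensure_ascii=False)[:120])
```

One record per sample is emitted this way; the `aspect` field stays empty for MSED, matching the example above.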