Add pipeline tag and library name (#1)
by nielsr (HF Staff) - opened

README.md CHANGED
---
license: mit
pipeline_tag: any-to-any
library_name: pytorch
---

# File information

The repository contains the following file information:

<div align="center">
<h2 class="papername"> Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence </h2>
<div>
<div>
<a href="https://scholar.google.com/citations?user=Em5FqXYAAAAJ" target="_blank">Xiang He*</a>,
<a href="https://scholar.google.com/citations?user=2E9Drq8AAAAJ" target="_blank">Dongcheng Zhao*</a>,
<a href="https://scholar.google.com/citations?user=3QpRLTgAAAAJ" target="_blank">Yang Li</a>,
<a href="https://ieeexplore.ieee.org/author/37085719247" target="_blank">Qingqun Kong†</a>,
<a href="https://ieeexplore.ieee.org/author/37401423300" target="_blank">Xin Yang†</a>,
<a href="https://scholar.google.com/citations?user=Rl-YqPEAAAAJ" target="_blank">Yi Zeng†</a>
</div>
Institute of Automation, Chinese Academy of Sciences, Beijing<br>
*Equal contribution
†Corresponding author

\[[arxiv](https://arxiv.org/abs/2505.10176)\] \[[paper]()\] \[[code](https://github.com/Brain-Cog-Lab/IEMF)\]

</div>
<br>

</div>

Here is the PyTorch implementation of our paper.

If you find this work useful for your research, please kindly cite our paper and star our repo.

## Method

We propose an inverse effectiveness driven multimodal fusion (IEMF) method, which dynamically adjusts the updates of the multimodal fusion module according to the relationship between the strength of the individual modality cues and the strength of the fused multimodal signal.




## Usage

```
+--- Audio Visual Classification
+--- Audio Visual Continual Learning
\--- Audio Visual Question Answering
```

The three folders correspond to the three tasks. Each contains detailed **run scripts for the task, plotting programs, and instructions for downloading the corresponding dataset.**

## Well-trained Models

We also upload the weights of the trained models, together with the log files from training, to ensure that the results in the paper can be reproduced. You can find them at [https://huggingface.co/xianghe/IEMF/tree/main](https://huggingface.co/xianghe/IEMF/tree/main).

## Dataset Download

Instructions for downloading each dataset are provided in the folder for the corresponding task.
In particular, because the Kinetics-Sounds dataset is complex to process, you can download our packaged raw video-audio dataset [here](https://pan.baidu.com/s/1NHmpyhpPaXJVgtwFPkKHcw) (extraction code: bauh).
In addition to the raw dataset, we also provide processed data in HDF5 format, ready for network model input, which you can access [here](https://pan.baidu.com/s/1v28Pt9HUKHUv8JCagdGuTQ) (extraction code: jzbg).
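For readers unfamiliar with HDF5, the snippet below shows one way such a file can be written and read back with `h5py`. The dataset keys (`"audio"`, `"video"`, `"label"`) and array shapes are illustrative assumptions; the actual layout of the provided files may differ.

```python
import h5py
import numpy as np

# Write a tiny stand-in file with assumed keys; the real files
# distributed with this repo may use a different layout.
with h5py.File("ks_demo.h5", "w") as f:
    f.create_dataset("audio", data=np.random.randn(2, 128).astype(np.float32))
    f.create_dataset("video", data=np.random.randn(2, 3, 224, 224).astype(np.float32))
    f.create_dataset("label", data=np.array([0, 1], dtype=np.int64))

# Read it back, the way a Dataset's __getitem__ typically would.
with h5py.File("ks_demo.h5", "r") as f:
    audio = f["audio"][:]
    video = f["video"][:]
    label = f["label"][:]

print(audio.shape, video.shape, label.shape)
```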

## Citation

If our paper is useful for your research, please consider citing it:

```bibtex
@misc{he2025incorporatingbraininspiredmechanismsmultimodal,
      title={Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence},
      author={Xiang He and Dongcheng Zhao and Yang Li and Qingqun Kong and Xin Yang and Yi Zeng},
      year={2025},
      eprint={2505.10176},
      archivePrefix={arXiv},
      primaryClass={cs.NE},
      url={https://arxiv.org/abs/2505.10176},
}
```

## Acknowledgements

The code for the three tasks builds on [OGM_GE](https://github.com/GeWu-Lab/OGM-GE_CVPR2022), [AV-CIL_ICCV2023](https://github.com/weiguoPian/AV-CIL_ICCV2023), and [MUSIC_AVQA](https://github.com/GeWu-Lab/MUSIC-AVQA). Thanks for their excellent work!

If you have questions about using the code, or other feedback and comments, please feel free to contact us at hexiang2021@ia.ac.cn. Have a good day!