zhoujiaming777 committed on
Commit 8aea612 · verified · 1 Parent(s): cceff89

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -14,4 +14,6 @@ tags:
 [![🤗 Hugging Face](https://img.shields.io/badge/🤗Hugging%20Face-DIFFA-FFEB3B)](https://huggingface.co/zhoujiaming777/DIFFA-2)
 [![GitHub](https://img.shields.io/badge/Github-DIFFA-blue)](https://github.com/NKU-HLT/DIFFA)
 
- In this paper, we introduce DIFFA-2, a practical diffusion-based LALM for general audio understanding. DIFFA-2 upgrades the speech encoder, employs dual semantic and acoustic adapters, and is trained with a four-stage curriculum that combines semantic and acoustic alignment, large-scale supervised fine-tuning, and variance-reduced preference optimization, using only fully open-source corpora. Experiments on MMSU, MMAU, and MMAR show that DIFFA-2 consistently improves over DIFFA and is competitive with strong AR LALMs under practical training budgets, supporting the view that diffusion-based modeling is a viable backbone for large-scale audio understanding.
+ In this paper, we introduce DIFFA-2, a practical diffusion-based LALM for general audio understanding. DIFFA-2 upgrades the speech encoder, employs dual semantic and acoustic adapters, and is trained with a four-stage curriculum that combines semantic and acoustic alignment, large-scale supervised fine-tuning, and variance-reduced preference optimization, using only fully open-source corpora. Experiments on MMSU, MMAU, and MMAR show that DIFFA-2 consistently improves over DIFFA and is competitive with strong AR LALMs under practical training budgets, supporting the view that diffusion-based modeling is a viable backbone for large-scale audio understanding.
+
+ We have open-sourced the checkpoints for stage 1 and stage 4. The stage 4 files are in the root directory of the repository, and the stage 1 files are in the stage1 folder.
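Given the layout described in the added README line (stage 4 in the repository root, stage 1 under `stage1/`), a single stage can be fetched selectively with `huggingface_hub.snapshot_download` and its `allow_patterns` / `ignore_patterns` filters. This is a minimal sketch, not an official loader: the helper names `checkpoint_patterns` and `download_checkpoint` are hypothetical, and it assumes that no other folder in the repository needs to be excluded when downloading stage 4.

```python
import os
from typing import Dict, List

# Repository id taken from the Hugging Face badge link in the README.
REPO_ID = "zhoujiaming777/DIFFA-2"


def checkpoint_patterns(stage: int) -> Dict[str, List[str]]:
    """Return snapshot_download filter arguments for a released stage.

    Hypothetical helper. Assumes the layout stated in the README:
    stage 4 files in the repository root, stage 1 files under "stage1/".
    """
    if stage == 1:
        # Fetch only the files inside the stage1 folder.
        return {"allow_patterns": ["stage1/*"]}
    if stage == 4:
        # Fetch the root-level files but skip the stage1 folder.
        return {"ignore_patterns": ["stage1/*"]}
    raise ValueError(f"only stages 1 and 4 are released, got {stage}")


def download_checkpoint(stage: int) -> str:
    """Download one stage and return the directory holding its files."""
    # Imported lazily so checkpoint_patterns() works without the package.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id=REPO_ID, **checkpoint_patterns(stage))
    # Stage 1 files sit in the "stage1" subfolder of the local snapshot.
    return os.path.join(local_dir, "stage1") if stage == 1 else local_dir
```

For example, `download_checkpoint(1)` would return the local `stage1` directory, while `download_checkpoint(4)` returns the snapshot root. Filtering at download time avoids pulling both checkpoints when only one is needed.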