---
license: cc-by-nc-sa-4.0
---

# DIFFA: Large Language Diffusion Models Can Listen and Understand

[Paper (arXiv)](https://arxiv.org/abs/2507.18452)
[Model (Hugging Face)](https://huggingface.co/zhoujiaming777/DIFFA)
[Code (GitHub)](https://github.com/NKU-HLT/DIFFA)

**DIFFA** is the first **diffusion-based large audio-language model** for spoken language understanding.

It combines a frozen diffusion LLM with **dual adapters** (semantic + acoustic) to enhance **audio perception and reasoning**.