---
license: cc-by-nc-sa-4.0
---
# DIFFA: Large Language Diffusion Models Can Listen and Understand
[![arXiv](https://img.shields.io/badge/Paper-arXiv-red.svg)](https://arxiv.org/abs/2507.18452)
[![deploy](https://img.shields.io/badge/Hugging%20Face-DIFFA-FFEB3B)](https://huggingface.co/zhoujiaming777/DIFFA)
[![Github](https://img.shields.io/badge/Github-DIFFA-blue)](https://github.com/NKU-HLT/DIFFA)


**DIFFA** is the first **diffusion-based large audio-language model** for spoken language understanding.  
It combines a frozen diffusion LLM with **dual adapters** (semantic + acoustic) to enhance **audio perception and reasoning**.
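To make the dual-adapter idea concrete, here is a minimal, dependency-free sketch of how two adapters could turn audio features into input tokens for a frozen LLM. It is **not** DIFFA's implementation. Everything in it is a hypothetical assumption for illustration: each adapter is taken to be a bias-free linear projection, both adapters are assumed to receive frame-level features from some audio encoder, and the fused sequence is assumed to be `[semantic tokens] + [acoustic tokens] + [text tokens]`. All names (`linear`, `adapter`, `build_llm_input`, the weights, and the toy sizes) are invented for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: (out_dim, in_dim)


def linear(weight: Matrix, x: Vector) -> Vector:
    """Bias-free linear projection y = W @ x (hypothetical adapter layer)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


def adapter(weight: Matrix, frames: List[Vector]) -> List[Vector]:
    """Project every audio frame into the LLM embedding space."""
    return [linear(weight, frame) for frame in frames]


def build_llm_input(
    semantic_frames: List[Vector],
    acoustic_frames: List[Vector],
    text_embeddings: List[Vector],
    w_semantic: Matrix,
    w_acoustic: Matrix,
) -> List[Vector]:
    """Hypothetical fusion: [semantic tokens] + [acoustic tokens] + [text tokens].

    In this sketch only w_semantic and w_acoustic would be trainable; the
    diffusion LLM that consumes the returned sequence stays frozen.
    """
    semantic_tokens = adapter(w_semantic, semantic_frames)
    acoustic_tokens = adapter(w_acoustic, acoustic_frames)
    return semantic_tokens + acoustic_tokens + text_embeddings


# Toy usage: audio features of dim 3, LLM embeddings of dim 2.
w_sem = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_ac = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
seq = build_llm_input(
    semantic_frames=[[1.0, 2.0, 3.0]],
    acoustic_frames=[[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]],
    text_embeddings=[[0.5, 0.5]],
    w_semantic=w_sem,
    w_acoustic=w_ac,
)
print(seq)  # [[1.0, 2.0], [3.0, 6.0], [1.0, 1.0], [0.5, 0.5]]
```

Keeping the two adapters separate lets each one specialize (content vs. acoustic cues) while the LLM weights are untouched; how DIFFA actually sources and fuses these features is described in the paper linked above.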