---
license: cc-by-nc-4.0
---
# DSpAST: Disentangled Spatial Audio Spectrogram Transformer

[arXiv](https://arxiv.org/abs/2509.13927) | [GitHub](https://github.com/wilkinghoff/DSpAST)

This repository contains the checkpoints for [DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models](https://arxiv.org/abs/2509.13927).

***

## Performance

On our system, the provided checkpoints achieve the following results:

| Binaural Encoder | mAP (↑) | ER20° (↓) | MAE (↓) | DER (↓) |
| :---: | :---: | :---: | :---: | :---: |
| [SpatialAST](https://huggingface.co/datasets/zhisheng01/SpatialAudio/blob/main/SpatialAST/finetuned.pth) | 49.90 | 24.43 | 17.87 | 32.50 |
| [DSpAST (stage 1)](https://huggingface.co/kwilk90/DSpAST/blob/main/DSpAST-stage1) | 53.05 | 98.56 | 95.57 | 97.58 |
| [DSpAST (stage 2)](https://huggingface.co/kwilk90/DSpAST/blob/main/DSpAST-stage2) | 52.64 | 20.31 | **14.44** | 28.35 |
| [DSpAST (stage 3)](https://huggingface.co/kwilk90/DSpAST/blob/main/DSpAST-stage3) | **54.53** | **20.28** | **14.44** | **28.03** |

Similar improvements are also observed when DSpAST is used as the binaural encoder for spatial audio reasoning with LLMs. Please see our [paper](https://arxiv.org/abs/2509.13927) for further details.
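The checkpoints linked in the table can be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed and derives the repository IDs and file names from the table links (`kwilk90/DSpAST` for DSpAST, and the `zhisheng01/SpatialAudio` dataset repository for the SpatialAST baseline). The helper names `dspast_filename`, `download_dspast` and `download_spatialast` are hypothetical and are not part of the DSpAST code base.

```python
# Hypothetical helpers for downloading the checkpoints listed in the table.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).

# Repository IDs and file names taken from the checkpoint links in the table.
DSPAST_REPO = "kwilk90/DSpAST"
SPATIALAST_REPO = "zhisheng01/SpatialAudio"  # hosted as a dataset repository
SPATIALAST_FILE = "SpatialAST/finetuned.pth"


def dspast_filename(stage: int) -> str:
    """Return the file name of the DSpAST checkpoint for training stage 1, 2 or 3."""
    if stage not in (1, 2, 3):
        raise ValueError(f"stage must be 1, 2 or 3, got {stage}")
    return f"DSpAST-stage{stage}"


def download_dspast(stage: int = 3) -> str:
    """Download a DSpAST checkpoint and return the local path of the cached file."""
    # Imported inside the function so this module loads even without the dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=DSPAST_REPO, filename=dspast_filename(stage))


def download_spatialast() -> str:
    """Download the SpatialAST baseline checkpoint and return its local path."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=SPATIALAST_REPO,
        filename=SPATIALAST_FILE,
        repo_type="dataset",  # the baseline lives in a dataset repo, not a model repo
    )
```

For example, `path = download_dspast(stage=3)` returns the local path of the stage 3 checkpoint. Building the encoder that consumes these weights requires the model code from the [GitHub](https://github.com/wilkinghoff/DSpAST) repository; it is assumed, not verified here, that the files are regular PyTorch checkpoints readable with `torch.load(path, map_location="cpu")`.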

***

## References

If you use these checkpoints in your work, we kindly ask you to cite the following papers:

```bibtex
@article{wilkinghoff2025dspast,
    author     = {Wilkinghoff, Kevin and
                  Tan, Zheng-Hua},
    title      = {{DSpAST:} Disentangled Representations for Spatial Audio Reasoning with Large Language Models},
    journal    = {arXiv:2509.13927},
    year       = {2025}
}
```

and the original [BAT](https://zhishengzheng.com/bat/) paper, which is the foundation of this work:

```bibtex
@inproceedings{zheng2024bat,
  author       = {Zheng, Zhisheng and
                  Peng, Puyuan and
                  Ma, Ziyang and
                  Chen, Xie and
                  Choi, Eunsol and
                  Harwath, David},
  title        = {{BAT:} Learning to Reason about Spatial Sounds with Large Language Models},
  booktitle    = {Proc. ICML},
  year         = {2024}
}
```