kwilk90 committed on
Commit 1b677bd · verified · 1 Parent(s): b7e19cf

initial readme

Files changed (1)
  1. README.md +50 -3
README.md CHANGED
@@ -1,3 +1,50 @@
- ---
- license: cc-by-nd-4.0
- ---
+ # DSpAST: Disentangled Spatial Audio Spectrogram Transformer
+
+ [arXiv](https://arxiv.org/abs/2509.13927) | [GitHub](https://github.com/wilkinghoff/DSpAST)
+
+ Checkpoints for [DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models](https://arxiv.org/abs/2509.13927).
+
+ ***
+
+ ## Performance
+
+ To run inference, use the script `scripts/inf.sh`. On our system, the provided checkpoints yield the following results:
+
+ | Binaural Encoder | mAP (↑) | ER20° (↓) | MAE (↓) | DER (↓) |
+ | :---: | :---: | :---: | :---: | :---: |
+ | [SpatialAST](https://github.com/zszheng147/Spatial-AST/tree/main) | 49.90 | 24.43 | 17.87 | 32.50 |
+ | DSpAST (stage 1) | 53.05 | 98.56 | 95.57 | 97.58 |
+ | DSpAST (stage 2) | 52.64 | 20.31 | **14.44** | 28.35 |
+ | DSpAST (stage 3) | **54.53** | **20.28** | **14.44** | **28.03** |
+
+ Similar performance improvements can also be observed when using DSpAST as a binaural encoder for spatial audio reasoning with LLMs. Please have a look at our [paper](https://arxiv.org/abs/2509.13927) for further information.
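The localization columns of the table above can be illustrated with a small sketch. It is not the evaluation code behind these numbers. It assumes BAT-style definitions: ER20° as the percentage of clips whose angular error exceeds 20°, MAE as the mean angular error in degrees, and DER as the percentage of clips whose distance estimate is off by more than a tolerance (0.5 m here, which is an assumption). The angular error is taken as the great-circle angle between predicted and true direction, and the function names are hypothetical.

```python
import math


def angular_error_deg(az_pred, el_pred, az_true, el_true):
    """Great-circle angle (degrees) between two directions given as
    azimuth/elevation pairs in degrees."""
    az_p, el_p, az_t, el_t = map(math.radians, (az_pred, el_pred, az_true, el_true))
    cos_angle = (math.sin(el_p) * math.sin(el_t)
                 + math.cos(el_p) * math.cos(el_t) * math.cos(az_p - az_t))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def localization_metrics(preds, targets, angle_tol_deg=20.0, dist_tol_m=0.5):
    """Compute ER20, MAE and DER from lists of
    (azimuth_deg, elevation_deg, distance_m) tuples.

    The tolerances are assumptions of this sketch, not values taken
    from the DSpAST repository.
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and of equal length")
    n = len(preds)
    errors = [angular_error_deg(p[0], p[1], t[0], t[1]) for p, t in zip(preds, targets)]
    er = 100.0 * sum(e > angle_tol_deg for e in errors) / n
    mae = sum(errors) / n
    der = 100.0 * sum(abs(p[2] - t[2]) > dist_tol_m for p, t in zip(preds, targets)) / n
    return {"ER20": er, "MAE": mae, "DER": der}


# Toy example with two clips: one close prediction, one far off.
preds = [(0.0, 0.0, 1.0), (90.0, 0.0, 2.0)]
targets = [(10.0, 0.0, 1.2), (0.0, 0.0, 3.0)]
metrics = localization_metrics(preds, targets)
print(metrics)
```

Under these definitions all three values are "lower is better", matching the (↓) markers in the table, whereas mAP measures sound event detection quality and is "higher is better".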
+
+ ***
+
+ ## References
+
+ If you use any part of this code for your work, we kindly ask you to cite the following papers:
+
+ ```bibtex
+ @article{wilkinghoff2025dspast,
+   author  = {Wilkinghoff, Kevin and
+              Tan, Zheng-Hua},
+   title   = {{DSpAST:} Disentangled Representations for Spatial Audio Reasoning with Large Language Models},
+   journal = {arXiv:2509.13927},
+   year    = {2025}
+ }
+ ```
+ and the original [BAT](https://zhishengzheng.com/bat/) paper, which is the foundation of this work:
+ ```bibtex
+ @inproceedings{zheng2024bat,
+   author    = {Zheng, Zhisheng and
+                Peng, Puyuan and
+                Ma, Ziyang and
+                Chen, Xie and
+                Choi, Eunsol and
+                Harwath, David},
+   title     = {{BAT:} Learning to Reason about Spatial Sounds with Large Language Models},
+   booktitle = {Proc. ICML},
+   year      = {2024}
+ }
+ ```