kwilk90/DSpAST

kwilk90 committed on Oct 19, 2025

Commit 1b677bd · Parent: b7e19cf

initial readme

Files changed (1): README.md +50 -3

@@ -1,3 +1,50 @@
----
-license: cc-by-nd-4.0
----

+# DSpAST: Disentangled Spatial Audio Spectrogram Transformer
+[arXiv](https://arxiv.org/abs/2509.13927) | [GitHub](https://github.com/wilkinghoff/DSpAST)
+Checkpoints for [DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models](https://arxiv.org/abs/2509.13927).
+***
+## Performance
+For inference, use the script `scripts/inf.sh`. On our system, the provided checkpoints achieve the following performance:
+| Binaural Encoder | mAP (↑) | ER20° (↓) | MAE (↓) | DER (↓) |
+| :---: | :---: | :---: | :---: | :---: |
+| [SpatialAST](https://github.com/zszheng147/Spatial-AST/tree/main) | 49.90 | 24.43 | 17.87 | 32.50 |
+| DSpAST (stage 1) | 53.05 | 98.56 | 95.57 | 97.58 |
+| DSpAST (stage 2) | 52.64 | 20.31 | **14.44** | 28.35 |
+| DSpAST (stage 3) | **54.53** | **20.28** | **14.44** | **28.03** |
+Similar improvements are also observed when DSpAST is used as the binaural encoder for spatial audio reasoning with LLMs. See our [paper](https://arxiv.org/abs/2509.13927) for further details.
+***
+## References
+If you use any part of this code in your work, we kindly ask you to cite the following papers:
+```bibtex
+@article{wilkinghoff2025dspast,
+    author     = {Wilkinghoff, Kevin and
+                  Tan, Zheng-Hua},
+    title      = {{DSpAST:} Disentangled Representations for Spatial Audio Reasoning with Large Language Models},
+    journal    = {arXiv:2509.13927},
+    year       = {2025}
+}
+```
+and the original [BAT](https://zhishengzheng.com/bat/) paper, which is the foundation of this work:
+```bibtex
+@inproceedings{zheng2024bat,
+  author       = {Zheng, Zhisheng and
+                  Peng, Puyuan and
+                  Ma, Ziyang and
+                  Chen, Xie and
+                  Choi, Eunsol and
+                  Harwath, David},
+  title        = {{BAT:} Learning to Reason about Spatial Sounds with Large Language Models},
+  booktitle    = {Proc. ICML},
+  year         = {2024}
+}
+```
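As a minimal sketch, the gains of DSpAST (stage 3) over SpatialAST in the README's performance table can be computed directly from the reported numbers. The snippet only copies values from that table and follows the arrows in its header (mAP higher is better; ER20°, MAE and DER lower is better). The helper name `improvement` and the sign convention (positive means better) are illustrative assumptions, not part of the DSpAST code; absolute differences are in the table's own units.

```python
# Values copied from the README performance table.
# Assumption: mAP is higher-is-better; ER20°, MAE and DER are lower-is-better,
# as indicated by the arrows in the table header.
RESULTS = {
    "SpatialAST": {"mAP": 49.90, "ER20": 24.43, "MAE": 17.87, "DER": 32.50},
    "DSpAST (stage 3)": {"mAP": 54.53, "ER20": 20.28, "MAE": 14.44, "DER": 28.03},
}
HIGHER_IS_BETTER = {"mAP": True, "ER20": False, "MAE": False, "DER": False}


def improvement(baseline: dict, candidate: dict) -> dict:
    """Hypothetical helper: (absolute, relative %) gain per metric; positive means better."""
    out = {}
    for metric, higher in HIGHER_IS_BETTER.items():
        b, c = baseline[metric], candidate[metric]
        # Flip the sign for lower-is-better metrics so a positive delta is always a gain.
        delta = c - b if higher else b - c
        out[metric] = (round(delta, 2), round(100.0 * delta / b, 1))
    return out


gains = improvement(RESULTS["SpatialAST"], RESULTS["DSpAST (stage 3)"])
for metric, (abs_gain, rel_gain) in gains.items():
    print(f"{metric}: {abs_gain:+.2f} ({rel_gain:+.1f}% relative)")
```

With these values, the script reports, for example, an mAP gain of +4.63 (about 9.3% relative) and an MAE reduction of 3.43 (about 19.2% relative).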