update
README.md

@@ -12,8 +12,8 @@ tags:
 - zero-shot
 - audio-text
 ---
-# Mellow
-[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
+# Mellow: a small audio language model for reasoning
+[[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
 
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
 
@@ -96,5 +96,13 @@ With Mellow, we aim to showcase that small audio-language models can engage in r
 
 ## Citation
 ```
-
+@misc{mellow,
+title={Mellow: a small audio language model for reasoning},
+author={Soham Deshmukh and Satvik Dixit and Rita Singh and Bhiksha Raj},
+year={2025},
+eprint={2503.08540},
+archivePrefix={arXiv},
+primaryClass={cs.SD},
+url={https://arxiv.org/abs/2503.08540},
+}
 ```