soham97/mellow


soham97 committed on Mar 10, 2025

Commit f00f693 · 1 Parent(s): 2c6e7ae

first

Files changed (1)

README.md +16 -2

@@ -1,5 +1,19 @@
+---
+license: mit
+tags:
+  - small audio-language model
+  - ALM
+  - audio
+  - music
+  - sound events
+  - audio reasoning
+  - audio captioning
+  - audio question answering
+  - zero-shot
+  - audio-text
+---
 # Mellow
-[[`Paper`]()] [[`Checkpoint`]()]
+[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)]
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
@@ -79,7 +93,7 @@ print(f"\noutput: {response}")
 The composition of the ReasonAQA dataset is shown in Table. The training set is restricted to AudioCaps and Clotho audio files and the testing is performed on 6 tasks - Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ and AudioCaps Detail.
 ![alt text](resource/data.png)
-- The ReasonAQA JSONs can be downloaded from Zenodo: [checkpoint \[drive\]](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
+- The ReasonAQA JSONs can be downloaded from Zenodo: [checkpoint](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
 - The audio files can be downloaded from their respective hosting website: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
 ## Limitation