---
license: mit
tags:
- small audio-language model
- ALM
- audio
- music
- sound events
- audio reasoning
- audio captioning
- audio question answering
- zero-shot
- audio-text
---
# Mellow
[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)]

Mellow is a small audio-language model (ALM) that takes two audio clips and a text prompt as input and produces free-form text as output. It has 167M parameters, was trained on ~155 hours of audio (AudioCaps and Clotho), and achieves state-of-the-art (SoTA) performance on several tasks with 50x fewer parameters.

The composition of the ReasonAQA dataset is shown in the table below. The training set is restricted to AudioCaps and Clotho audio files, and evaluation covers seven tasks: Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ, and AudioCaps Detail.

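Once the ReasonAQA JSONs are downloaded, the task breakdown above can be checked against the files. The sketch below is illustrative only: it assumes each JSON file holds a list of records carrying a hypothetical `task` field whose values match the seven task names listed above. The field name and the helper names are assumptions, not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path

# The seven evaluation tasks named in this README.
EVAL_TASKS = frozenset({
    "Audio Entailment",
    "Audio Difference",
    "ClothoAQA",
    "Clotho MCQ",
    "Clotho Detail",
    "AudioCaps MCQ",
    "AudioCaps Detail",
})


def load_records(path: Path) -> list:
    """Load one JSON file, ASSUMED (hypothetically) to contain a list of records."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def tally_tasks(records: list) -> Counter:
    """Count records per task.

    ASSUMPTION: each record is a dict with a hypothetical "task" key;
    the real ReasonAQA schema may use a different field.
    """
    return Counter(record["task"] for record in records)


def unexpected_tasks(counts: Counter) -> set:
    """Return task names that are not among the seven evaluation tasks."""
    return set(counts) - EVAL_TASKS


# Example with inline records instead of a downloaded file.
demo = [{"task": "Clotho MCQ"}, {"task": "Clotho MCQ"}, {"task": "Audio Difference"}]
counts = tally_tasks(demo)
print(counts)                    # Counter({'Clotho MCQ': 2, 'Audio Difference': 1})
print(unexpected_tasks(counts))  # set()
```

On a real test file, a non-empty result from `unexpected_tasks` would suggest that the assumed field name or task labels do not match the actual schema.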
- The ReasonAQA JSON files can be downloaded from Google Drive: [ReasonAQA JSONs](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
- The audio files can be downloaded from their respective hosting websites: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
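Before training or evaluation, the filenames referenced in the JSONs have to be matched to the audio downloaded from Clotho and AudioCaps. A minimal sketch, assuming hypothetical `audio1`/`audio2` fields holding bare filenames and one local directory per source dataset. The field names, directory paths, and helper names are all illustrative and not part of this repository.

```python
from pathlib import Path
from typing import Optional

# ASSUMPTION: illustrative local download directories, not defined by this README.
AUDIO_ROOTS = [Path("data/clotho"), Path("data/audiocaps")]


def resolve_audio(filename: str, roots: list[Path]) -> Optional[Path]:
    """Return the first existing file called `filename` under any root, else None."""
    for root in roots:
        candidate = root / filename
        if candidate.is_file():
            return candidate
    return None


def missing_audio(records: list[dict], roots: list[Path]) -> list[str]:
    """Return the sorted, de-duplicated audio filenames that cannot be found.

    ASSUMPTION: each record has hypothetical "audio1" and "audio2" keys holding
    bare filenames, mirroring Mellow's two-audio input; the real schema may differ.
    """
    missing = set()
    for record in records:
        for key in ("audio1", "audio2"):
            if resolve_audio(record[key], roots) is None:
                missing.add(record[key])
    return sorted(missing)
```

For example, `missing_audio(records, AUDIO_ROOTS)` returns an empty list once every referenced clip is present locally. Searching the roots in order keeps the lookup simple; if Clotho and AudioCaps filenames could collide, mapping each record to its source dataset explicitly would be safer.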
## Limitation