first
- README.md +7 -11
- resource/data.png +0 -3

README.md CHANGED
@@ -35,8 +35,11 @@ conda create -n mellow python=3.10.14 && \
 conda activate mellow && \
 pip install -r requirements.txt
 ```
-
-
+
+2. To test the setup is complete, run:
+```shell
+python example.py
+```
 
 ## Usage
 The MellowWrapper class allows easy interaction with the model. To use the wrapper, inputs required are:
@@ -63,8 +66,8 @@ mellow = Mellow(config="<choice of config>", model_path="<model weights>", device
 
 # setup mellow
 mellow = MellowWrapper(
-    config="
-    model = "v0
+    config="v0",
+    model = "v0",
     device=device,
     use_cuda=cuda,
 )
@@ -89,13 +92,6 @@ response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperatur
 print(f"\noutput: {response}")
 ```
 
-## ReasonAQA
-The composition of the ReasonAQA dataset is shown in Table. The training set is restricted to AudioCaps and Clotho audio files and the testing is performed on 6 tasks - Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ and AudioCaps Detail.
-
-![data](resource/data.png)
-
-- The ReasonAQA JSONs can be downloaded from Zenodo: [checkpoint](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
-- The audio files can be downloaded from their respective hosting website: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
-
 ## Limitation
 With Mellow, we aim to showcase that small audio-language models can engage in reasoning. As a research prototype, Mellow has not been trained at scale on publicly available audio datasets, resulting in a limited understanding of audio concepts. Therefore, we advise caution when considering its use in production settings. Ultimately, we hope this work inspires researchers to explore small audio-language models for multitask capabilities, complementing ongoing research on general-purpose audio assistants.
 
resource/data.png DELETED (Git LFS)