first
- README.md +7 -11
- resource/data.png +0 -3

README.md CHANGED
@@ -35,8 +35,11 @@ conda create -n mellow python=3.10.14 && \
 conda activate mellow && \
 pip install -r requirements.txt
 ```
-
-
+
+2. To test the setup is complete, run:
+```shell
+python example.py
+```
 
 ## Usage
 The MellowWrapper class allows easy interaction with the model. To use the wrapper, inputs required are:
@@ -63,8 +66,8 @@ mellow = Mellow(config="<choice of config>", model_path="<model weights>", device
 
 # setup mellow
 mellow = MellowWrapper(
-    config="
-    model = "v0
+    config="v0",
+    model = "v0",
     device=device,
     use_cuda=cuda,
 )
@@ -89,13 +92,6 @@ response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperatur
 print(f"\noutput: {response}")
 ```
 
-## ReasonAQA
-The composition of the ReasonAQA dataset is shown in Table. The training set is restricted to AudioCaps and Clotho audio files and the testing is performed on 6 tasks - Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ and AudioCaps Detail.
-
-![data](resource/data.png)
-
-- The ReasonAQA JSONs can be downloaded from Zenodo: [checkpoint](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
-- The audio files can be downloaded from their respective hosting website: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
-
 ## Limitation
 With Mellow, we aim to showcase that small audio-language models can engage in reasoning. As a research prototype, Mellow has not been trained at scale on publicly available audio datasets, resulting in a limited understanding of audio concepts. Therefore, we advise caution when considering its use in production settings. Ultimately, we hope this work inspires researchers to explore small audio-language models for multitask capabilities, complementing ongoing research on general-purpose audio assistants.
 
resource/data.png DELETED (Git LFS)