Commit 9a4a887 · committed by sino
Parent(s) (1): ed14d60

Update README.md

README.md CHANGED
@@ -11,8 +11,15 @@ pipeline_tag: text-generation
 </p>
 <br>
 
-Music tagging is a task to predict the tags of music recordings.
-
+Music tagging is a task to predict the tags of music recordings.
+However, previous music tagging research primarily focuses on close-set music tagging tasks which can not be generalized to new tags.
+In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (**JMLA**) model to address the open-set music tagging problem.
+The **JMLA** model consists of an audio encoder modeled by a pretrained masked autoencoder and a decoder modeled by a Falcon7B.
+We introduce preceiver resampler to convert arbitrary length audio into fixed length embeddings.
+We introduce dense attention connections between encoder and decoder layers to improve the information flow between the encoder and decoder layers.
+We collect a large-scale music and description dataset from the internet.
+We propose to use ChatGPT to convert the raw descriptions into formalized and diverse descriptions to train the **JMLA** models.
+Our proposed **JMLA** system achieves a zero-shot audio tagging accuracy of 64.82% on the GTZAN dataset, outperforming previous zero-shot systems and achieves comparable results to previous systems on the FMA and the MagnaTagATune datasets.
 
 
 ## Requirements
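
Note on the added abstract: the line about converting arbitrary-length audio into fixed-length embeddings (the "preceiver resampler", presumably a perceiver resampler) can be illustrated with a small sketch. The code below is not the JMLA implementation. It is a minimal pure-Python illustration, assuming a single cross-attention step in which a fixed set of latent query vectors attends over however many audio frames are supplied. The names `resample`, `random_vectors`, `DIM`, and `NUM_LATENTS` are hypothetical, the random vectors stand in for learned parameters and real audio features, and the learned projections, multiple heads, and stacked layers a real model would likely use are left out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(frames, latents):
    """Cross-attend fixed latent queries to a variable-length frame sequence.

    frames:  list of T vectors (T may vary), each of dimension d
    latents: list of K query vectors of dimension d (learned in a real model)
    returns: list of K vectors of dimension d, regardless of T
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in latents:
        # Scaled dot-product score between this latent and every frame.
        scores = [scale * sum(q * f for q, f in zip(query, frame))
                  for frame in frames]
        weights = softmax(scores)
        # Attention-weighted average of the frames: one vector per latent.
        pooled = [sum(w * frame[i] for w, frame in zip(weights, frames))
                  for i in range(dim)]
        output.append(pooled)
    return output


def random_vectors(count, dim, rng):
    """Hypothetical helper: Gaussian vectors standing in for real data."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # assumed sizes, not taken from the README
latents = random_vectors(NUM_LATENTS, DIM, rng)  # stand-in for learned queries

short_clip = random_vectors(50, DIM, rng)   # 50 frames of audio features
long_clip = random_vectors(300, DIM, rng)   # 300 frames of audio features

short_out = resample(short_clip, latents)
long_out = resample(long_clip, latents)
print(len(short_out), len(long_out))  # both equal NUM_LATENTS
```

Because the output has one vector per latent query, a 50-frame clip and a 300-frame clip both yield `NUM_LATENTS` embeddings, which is the property that would let a fixed-context decoder consume audio of any length.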