JoshuaW1997/FUTGA

JoshuaW1997 committed on Jul 29, 2024

Commit 0cce61f (verified) · 1 parent: a3703a2

Update README.md


Files changed (1)

README.md +2 -2


@@ -8,7 +8,7 @@ license: apache-2.0
 ## News
-- [07/28] We released [**FUTGA-7B**](https://huggingface.co/JoshuaW1997/FUTGA) and **training/inference code** based on [**SALMONN-7B**](https://huggingface.co/tsinghua-ee/SALMONN) backbone!
+- [07/28] We released [**FUTGA-7B**](https://huggingface.co/JoshuaW1997/FUTGA) and **training/inference code** based on [SALMONN-7B](https://huggingface.co/tsinghua-ee/SALMONN) backbone!
 ## Overview
 FUTGA is an audio LLM with fine-grained music understanding, learning from generative augmentation with temporal compositions. By leveraging existing music caption datasets and large language models (LLMs), we synthesize detailed music captions with structural descriptions and time boundaries for full-length songs. This synthetic dataset enables FUTGA to identify temporal changes at key transition points, their musical functions, and generate dense captions for full-length songs.
@@ -21,7 +21,7 @@ FUTGA is an audio LLM with fine-grained music understanding, learning from gener
 ## How to load the model
-We build FUTGA based on [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN). Follow the instructions from [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN) to load:
+We build **FUTGA-7B** based on SALMONN. Follow the instructions from [SALMONN](https://huggingface.co/tsinghua-ee/SALMONN) to load:
 1. [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```,
 2. [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`
 3. [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```,
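
The three download targets in the README's loading steps can be sanity-checked before starting inference. The following is a minimal sketch, not part of the FUTGA or SALMONN code: only the key names `whisper_path`, `beats_path`, and `vicuna_path` come from the README, while the helper `missing_components` and the example directory layout are hypothetical.

```python
from pathlib import Path

# Key names taken from the README; everything else here is illustrative.
REQUIRED_KEYS = ("whisper_path", "beats_path", "vicuna_path")


def missing_components(paths):
    """Return the required keys whose value is unset or missing on disk."""
    missing = []
    for key in REQUIRED_KEYS:
        value = paths.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical local layout; replace with wherever you saved the files.
    example = {
        "whisper_path": "checkpoints/whisper-large-v2",
        "beats_path": "checkpoints/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
        "vicuna_path": "checkpoints/vicuna-7b-v1.5",
    }
    print("Missing components:", missing_components(example))
```

An empty list means all three components were found; any key that is returned still needs to be downloaded or pointed at the right location.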