English
music
JoshuaW1997 committed
Commit 72067b5 · verified · 1 Parent(s): e6ee21a

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -8,7 +8,7 @@ license: apache-2.0

  ## News

- - [07/28] We released [**model checkpoint**](https://huggingface.co/JoshuaW1997/FUTGA) and **training/inference code** based on [**SALMONN-7B**](https://huggingface.co/tsinghua-ee/SALMONN) backbone!
+ - [07/28] We released [**FUTGA-7B**](https://huggingface.co/JoshuaW1997/FUTGA) and **training/inference code** based on [**SALMONN-7B**](https://huggingface.co/tsinghua-ee/SALMONN) backbone!

  ## Overview
  FUTGA is an audio LLM with fine-grained music understanding, learning from generative augmentation with temporal compositions. By leveraging existing music caption datasets and large language models (LLMs), we synthesize detailed music captions with structural descriptions and time boundaries for full-length songs. This synthetic dataset enables FUTGA to identify temporal changes at key transition points, their musical functions, and generate dense captions for full-length songs.
@@ -27,3 +27,7 @@ We build FUTGA based on [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN
  3. [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```,
  4. [FUTGA-7b](https://huggingface.co/JoshuaW1997/FUTGA/blob/main/salomnn_7b.bin) to ```ckpt_path```.

+
+ ## Datasets
+ We generate dense captions for full-length songs in [MusicCaps](https://huggingface.co/JoshuaW1997/FUTGA/tree/main/Data-MusicCaps) and [SongDescriber](https://huggingface.co/JoshuaW1997/FUTGA/tree/main/Data-SongDescriber),
+ where ''raw captions'' are directly generated from FUTGA-7B and ''seg_captions_features'' contain automatically segmented captions with structures and textual-audio features for each segment.
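
The two data folders linked in the added Datasets section can be fetched on their own, without pulling the model checkpoint. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The repository ID and folder names are taken from the URLs in the diff; the helper names `folder_patterns` and `download_captions` and the `futga_data` target directory are hypothetical and not part of the FUTGA code.

```python
from pathlib import Path

REPO_ID = "JoshuaW1997/FUTGA"  # repository named in the links above
DATA_FOLDERS = ("Data-MusicCaps", "Data-SongDescriber")  # folder names from the URLs


def folder_patterns(folders=DATA_FOLDERS):
    """Build glob patterns that match the files under each data folder."""
    return [f"{name}/*" for name in folders]


def download_captions(local_dir="futga_data", folders=DATA_FOLDERS):
    """Download only the caption folders and return the local root path.

    Assumption: `huggingface_hub` is installed (`pip install huggingface_hub`).
    """
    # Imported lazily so that folder_patterns() stays usable offline.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=folder_patterns(folders),  # skip everything else in the repo
        local_dir=local_dir,
    )
    return Path(local_root)
```

Calling `download_captions()` should place the two folders under `futga_data/`; passing a single folder name, for example `download_captions(folders=["Data-MusicCaps"])`, restricts the download to that dataset.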