earlab committed on
Commit 7c56114 · verified · 1 Parent(s): 99f8fbd

Update README.md

Files changed (1): README.md (+169 −1)
tags:
  - song
  - aesthetics
  - ASAE
---

# **HEAR**: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation

[**Paper**](https://arxiv.org/pdf/2511.18869) | [**Model**](https://huggingface.co/earlab/EAR_HEAR)

Official PyTorch implementation of the ICASSP 2026 paper "HEAR: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation".

This repository contains the training and evaluation code for HEAR, a robust framework designed to address the challenges of multidimensional music aesthetic evaluation under limited labeled data.

![HEAR framework](HEAR.png)

## 🌟 Key Features

* **Excellent Performance**: Ranked 2nd/19 on Track 1 and 5th/17 on Track 2 in the [ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge](https://aslp-lab.github.io/Automatic-Song-Aesthetics-Evaluation-Challenge/).
* **Robustness**: Combines Multi-Source Multi-Scale Representations with Hierarchical Augmentation to learn robust features from limited labeled data.
* **Dual Capability**: Optimized for both exact score prediction and ranking (top-tier song identification).

## 📦 Installation

Clone the repository and install dependencies:

```bash
git clone https://github.com/Eps-Acoustic-Revolution-Lab/EAR_HEAR.git
cd EAR_HEAR
git submodule update --init --recursive

conda create -n hear python=3.10 -y
conda activate hear
pip install -r requirements.txt
```

## 🚀 Quick Start

```bash
# Download pretrained model weights
export HF_ENDPOINT=https://hf-mirror.com  # Needed for Hugging Face downloads from mainland China
hf download earlab/EAR_HEAR --local-dir pretrained_models

# Track 1: Single-Label Inference (Musicality)
python inference.py \
    --input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
    --output_json_path output.json \
    --model_path pretrained_models/track_1.pth \
    --model_config_path config_track_1.yaml

# Track 2: Multi-Label Inference (5 dimensions)
python inference.py \
    --input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
    --output_json_path output.json \
    --model_path pretrained_models/track_2.pth \
    --model_config_path config_track_2.yaml
```
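
The inference script writes its predictions to the path given by `--output_json_path`. The snippet below is a minimal sketch for inspecting that file; it assumes `output.json` is a flat mapping from dimension names to float scores, which you should verify against the actual output.

```python
import json

# Minimal sketch: inspect the inference output.
# Assumption: output.json maps dimension names to float scores;
# adapt the key handling if the real schema differs.
with open("output.json", "r", encoding="utf-8") as f:
    scores = json.load(f)

for dimension, score in scores.items():
    print(f"{dimension}: {score:.3f}")
```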

## 🎯 Training

### Step 1: Data Preparation

First, prepare the dataset by running the data pipeline:

```bash
cd data_pipeline
bash run.sh
```

This script will:

1. **Download Dataset**: Download the [SongEval](https://huggingface.co/datasets/ASLP-lab/SongEval) dataset
2. **Split Dataset**: Split the dataset into training and validation sets based on [the challenge's validation IDs](https://github.com/ASLP-lab/Automatic-Song-Aesthetics-Evaluation-Challenge/blob/main/static/val_ids.txt)
3. **Audio Augmentation**: Apply audio augmentation to the training set
4. **Extract Features**: Extract MuQ and MusicFM features for both the training and test sets
5. **Generate PKL Files**: Generate `train_set.pkl` and `test_set.pkl` for training and evaluation (a quick sanity check on these files is sketched after this list)
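
Before training, you can sanity-check the generated files. The sketch below only assumes each `.pkl` unpickles to a standard Python container at the default output path; the record layout it prints is whatever the pipeline actually produced, since the schema is not documented in this README.

```python
import pickle

# Minimal sanity check for the generated dataset files.
# Assumption: the default path from the Key Parameters section
# below (data_pipeline/dataset_pkl/train_set.pkl).
with open("data_pipeline/dataset_pkl/train_set.pkl", "rb") as f:
    train_set = pickle.load(f)

print(type(train_set), len(train_set))
# Peek at one record to discover the actual fields.
if isinstance(train_set, (list, tuple)):
    print(train_set[0])
```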

### Step 2: Model Training

After data preparation, you can train the HEAR model for either Track 1 (single-label: Musicality) or Track 2 (multi-label: 5 dimensions).

#### Track 1: Single-Label Training (Musicality)

Train the model for musicality prediction:

```bash
python train_track_1.py \
    --experiment_name track1_exp \
    --train-data /path/to/train_set.pkl \
    --test-data /path/to/test_set.pkl \
    --max-epoch 60 \
    --batch-size 8 \
    --lr 1e-5 \
    --weight_decay 1e-3 \
    --accum_steps 4 \
    --lambda 0.15 \
    --workers 8 \
    --seed 0
```

#### Track 2: Multi-Label Training (5 Dimensions)

Train the model for multi-dimensional aesthetic evaluation:

```bash
python train_track_2.py \
    --experiment_name track2_exp \
    --train-data /path/to/train_set.pkl \
    --test-data /path/to/test_set.pkl \
    --max-epoch 60 \
    --batch-size 8 \
    --lr 1e-5 \
    --weight_decay 1e-3 \
    --accum_steps 4 \
    --lambda 0.05 \
    --workers 8 \
    --seed 0
```

#### Key Parameters

* `--max-epoch`: Maximum number of training epochs (default: 60)
* `--batch-size`: Batch size for training (default: 8)
* `--experiment_name`: Name of the experiment, used for saving models and logs
* `--lr`: Learning rate (default: 1e-5)
* `--weight_decay`: Weight decay for the optimizer (default: 1e-3)
* `--accum_steps`: Gradient accumulation steps (default: 4)
* `--lambda`: Weight for the ranking loss term (Track 1: 0.15, Track 2: 0.05); an illustrative sketch of such a combined objective follows this list
* `--workers`: Number of data-loading workers (default: 8)
* `--seed`: Random seed for reproducibility (default: 0)
* `--train-data`: Path to the training-data pkl file (default: `data_pipeline/dataset_pkl/train_set.pkl`)
* `--test-data`: Path to the test-data pkl file (default: `data_pipeline/dataset_pkl/test_set.pkl`)
* `--log-dir`: Path to the TensorBoard log directory (default: `./log/tensorboard_records/{experiment_name}`)
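
As referenced above, `--lambda` trades off score prediction against ranking. The following is only an illustrative sketch of such a weighted combination, not HEAR's actual objective, which is defined in `train_track_1.py` / `train_track_2.py`; the MSE and pairwise hinge terms here are assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, lam=0.15):
    """Illustrative only: a regression loss plus a lambda-weighted
    pairwise ranking loss, mirroring the role of --lambda."""
    reg_loss = F.mse_loss(pred, target)
    # Pairwise hinge: every pair of predictions should be ordered
    # the same way as the corresponding pair of targets.
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)
    rank_loss = F.relu(-diff_pred * torch.sign(diff_true)).mean()
    return reg_loss + lam * rank_loss

# Example: a batch of 8 predicted vs. ground-truth scores.
pred = torch.rand(8, requires_grad=True)
target = torch.rand(8)
loss = combined_loss(pred, target, lam=0.15)
loss.backward()
```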

#### Evaluation Mode

To evaluate a trained model, use the `--eval` flag:

```bash
python train_track_1.py --eval --experiment_name track1_exp
python train_track_2.py --eval --experiment_name track2_exp
```

#### Model Configuration

Model architectures are configured in:

* `config_track_1.yaml` - Configuration for Track 1
* `config_track_2.yaml` - Configuration for Track 2

Trained models are saved to `log/models/{experiment_name}/model.pth`, and training logs are written to TensorBoard under `./log/tensorboard_records/{experiment_name}/` (or the custom path specified by `--log-dir`).
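
To inspect a saved checkpoint outside the training scripts, something like the sketch below should work. Whether `model.pth` holds a bare `state_dict` or a wrapper dictionary is an assumption to verify against the training code.

```python
import torch

# Load a trained checkpoint for offline inspection.
# Assumption: model.pth was written with torch.save(); it may be
# a plain state_dict or a dict wrapping one -- print keys to see.
ckpt = torch.load("log/models/track1_exp/model.pth", map_location="cpu")

if isinstance(ckpt, dict):
    for key in list(ckpt)[:10]:
        print(key)
```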

## 🙏 Acknowledgement

We sincerely thank the authors and contributors of the following open-source projects:

* **[SongEval](https://github.com/ASLP-lab/SongEval)**
* **[SongFormer](https://github.com/ASLP-lab/SongFormer)**
* **[Audiomentations](https://github.com/iver56/audiomentations)**
* **[Wespeaker](https://github.com/wenet-e2e/wespeaker)**
* **[allRank](https://github.com/allegro/allRank)**

We would like to express our special thanks to **Shizhe Chen** from the **Shanghai Conservatory of Music** for his invaluable guidance and insights on music aesthetics.

## 📚 Citation

```bibtex
@misc{liu2025hearhierarchicallyenhancedaesthetic,
  title={Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation},
  author={Shuyang Liu and Yuan Jin and Rui Lin and Shizhe Chen and Junyu Dai and Tao Jiang},
  year={2025},
  eprint={2511.18869},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2511.18869},
}
```