 - song
 - aesthetics
 - ASAE
+---
+# **HEAR**: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation
+[**Paper**](https://arxiv.org/pdf/2511.18869) |
+[**Model**](https://huggingface.co/earlab/EAR_HEAR)
+<br>
+Official PyTorch implementation of the ICASSP 2026 paper "HEAR: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation".
+This repository contains the training and evaluation code for HEAR, a robust framework designed to address the challenges of multidimensional music aesthetic evaluation under limited labeled data.
+![](HEAR.png)
+## 🌟 Key Features
+* **Excellent Performance**: Ranked 2nd/19 on Track 1 and 5th/17 on Track 2 in the [ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge](https://aslp-lab.github.io/Automatic-Song-Aesthetics-Evaluation-Challenge/).
+* **Robustness**: Synergizes Multi-Source Multi-Scale Representations and Hierarchical Augmentation to capture robust features under limited labeled data.
+* **Dual Capability**: Optimized for both exact score prediction and ranking (Top-Tier Song Identification).
+## 📦 Installation
+Clone the repository and install dependencies:
+```bash
+git clone https://github.com/Eps-Acoustic-Revolution-Lab/EAR_HEAR.git
+cd EAR_HEAR
+git submodule update --init --recursive
+conda create -n hear python=3.10 -y
+conda activate hear
+pip install -r requirements.txt
+```
+## 🚀 Quick Start
+```bash
+# Download pretrained model weights
+# Optional: users in mainland China can route Hugging Face downloads through a mirror (skip this line otherwise)
+export HF_ENDPOINT=https://hf-mirror.com
+hf download earlab/EAR_HEAR --local-dir pretrained_models
+# Track 1: Single-Label Inference (Musicality)
+python inference.py \
+    --input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
+    --output_json_path output.json \
+    --model_path pretrained_models/track_1.pth \
+    --model_config_path config_track_1.yaml
+# Track 2: Multi-Label Inference (5 Dimensions)
+python inference.py \
+    --input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
+    --output_json_path output.json \
+    --model_path pretrained_models/track_2.pth \
+    --model_config_path config_track_2.yaml
+```
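To score several files without overwriting `output.json`, the commands above can be generated from a small script. The sketch below only assembles the argument list from the flags shown in the Quick Start commands. The helper name `build_inference_cmd`, the `outputs` directory, and the per-file JSON naming are assumptions for illustration and are not part of the repository.

```python
from pathlib import Path


def build_inference_cmd(audio_path, output_dir, track):
    """Assemble the inference.py command for one audio file.

    Only the flags shown in the Quick Start commands are used. The output
    file name (<audio stem>_track_<n>.json) is an assumption.
    """
    if track not in (1, 2):
        raise ValueError("track must be 1 or 2")
    audio_path = Path(audio_path)
    output_json = Path(output_dir) / f"{audio_path.stem}_track_{track}.json"
    return [
        "python", "inference.py",
        "--input_audio_path", str(audio_path),
        "--output_json_path", str(output_json),
        "--model_path", f"pretrained_models/track_{track}.pth",
        "--model_config_path", f"config_track_{track}.yaml",
    ]


# Hypothetical file names; print the commands instead of running them.
for mp3 in ["a.mp3", "b.mp3"]:
    print(" ".join(build_inference_cmd(mp3, "outputs", track=1)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, after creating the output directory.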
+## 🎯 Training
+### Step 1: Data Preparation
+First, prepare the dataset by running the data pipeline:
+```bash
+cd data_pipeline
+bash run.sh
+```
+This script will:
+1. **Download Dataset**: Download the [SongEval](https://huggingface.co/datasets/ASLP-lab/SongEval) dataset
+2. **Split Dataset**: Split the dataset into training and validation sets based on [the challenge's validation IDs](https://github.com/ASLP-lab/Automatic-Song-Aesthetics-Evaluation-Challenge/blob/main/static/val_ids.txt)
+3. **Audio Augmentation**: Apply audio augmentation to the training set
+4. **Extract Features**: Extract MuQ and MusicFM features for both training and test sets
+5. **Generate PKL Files**: Generate `train_set.pkl` and `test_set.pkl` files for training and evaluation
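The split in step 2 amounts to partitioning sample IDs by membership in the validation list. A minimal sketch, assuming the ID file holds one ID per line and that each sample is identified by a string ID. The function names are illustrative, and `run.sh` may implement the split differently.

```python
def read_ids(path):
    """Read one ID per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def split_by_val_ids(sample_ids, val_ids):
    """Partition sample IDs into (train, validation), preserving input order."""
    val_set = set(val_ids)
    train = [s for s in sample_ids if s not in val_set]
    val = [s for s in sample_ids if s in val_set]
    return train, val


# Toy example with made-up IDs; "x" is listed for validation but absent from the data.
train_ids, val_ids = split_by_val_ids(["a", "b", "c", "d"], ["b", "d", "x"])
print(train_ids, val_ids)
```

Using a set for the validation IDs keeps each membership check constant-time, and IDs that appear in the validation list but not in the dataset are simply ignored.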
+### Step 2: Model Training
+After data preparation, you can train the HEAR model for either Track 1 (single-label: Musicality) or Track 2 (multi-label: 5 dimensions).
+#### Track 1: Single-Label Training (Musicality)
+Train the model for musicality prediction:
+```bash
+python train_track_1.py \
+    --experiment_name track1_exp \
+    --train-data /path/to/train_set.pkl \
+    --test-data /path/to/test_set.pkl \
+    --max-epoch 60 \
+    --batch-size 8 \
+    --lr 1e-5 \
+    --weight_decay 1e-3 \
+    --accum_steps 4 \
+    --lambda 0.15 \
+    --workers 8 \
+    --seed 0
+```
+#### Track 2: Multi-Label Training (5 Dimensions)
+Train the model for multi-dimensional aesthetic evaluation:
+```bash
+python train_track_2.py \
+    --experiment_name track2_exp \
+    --train-data /path/to/train_set.pkl \
+    --test-data /path/to/test_set.pkl \
+    --max-epoch 60 \
+    --batch-size 8 \
+    --lr 1e-5 \
+    --weight_decay 1e-3 \
+    --accum_steps 4 \
+    --lambda 0.05 \
+    --workers 8 \
+    --seed 0
+```
+#### Key Parameters
+* `--max-epoch`: Maximum number of training epochs (default: 60)
+* `--batch-size`: Batch size for training (default: 8)
+* `--experiment_name`: Name of the experiment for saving models and logs
+* `--lr`: Learning rate (default: 1e-5)
+* `--weight_decay`: Weight decay for optimizer (default: 1e-3)
+* `--accum_steps`: Gradient accumulation steps (default: 4)
+* `--lambda`: Weight for ranking loss (Track 1: 0.15, Track 2: 0.05)
+* `--workers`: Number of data loading workers (default: 8)
+* `--seed`: Random seed for reproducibility (default: 0)
+* `--train-data`: Path to training data pkl file (default: `data_pipeline/dataset_pkl/train_set.pkl`)
+* `--test-data`: Path to test data pkl file (default: `data_pipeline/dataset_pkl/test_set.pkl`)
+* `--log-dir`: Path to tensorboard log directory (default: `./log/tensorboard_records/{experiment_name}`)
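Two of these parameters interact with the optimization in ways worth spelling out. With the defaults, `--batch-size 8` and `--accum_steps 4` give an effective batch of 32 samples per optimizer step. The pure-Python sketch below checks, on a toy one-parameter model, that averaging four micro-batch gradients matches the gradient of one batch of 32. It also shows the assumed form of the objective, a regression term plus `--lambda` times a ranking term. The exact losses used by the training scripts are not described here, so `total_loss` and the toy model are illustrative assumptions.

```python
import math


def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error of the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, batch_size, accum_steps):
    """Sum micro-batch gradients, each scaled by 1 / accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        lo = step * batch_size
        hi = lo + batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


def total_loss(regression_loss, ranking_loss, lam):
    """Assumed objective: regression term plus lam times the ranking term."""
    return regression_loss + lam * ranking_loss


batch_size, accum_steps = 8, 4  # defaults listed above
effective_batch = batch_size * accum_steps

# Toy data: one point per sample in the effective batch, on the line y = 3x + 1.
xs = [float(i) for i in range(effective_batch)]
ys = [3.0 * x + 1.0 for x in xs]

full = grad_mse(0.5, xs, ys)  # gradient over the whole effective batch
accum = accumulated_grad(0.5, xs, ys, batch_size, accum_steps)
print(effective_batch, math.isclose(full, accum))
```

The equivalence relies on equally sized micro-batches and a mean-reduced loss. If either parameter changes, the learning rate may need retuning because the effective batch size changes with it.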
+#### Evaluation Mode
+To evaluate a trained model, use the `--eval` flag:
+```bash
+python train_track_1.py --eval --experiment_name track1_exp
+python train_track_2.py --eval --experiment_name track2_exp
+```
+#### Model Configuration
+Model architectures are configured in:
+* `config_track_1.yaml` - Configuration for Track 1
+* `config_track_2.yaml` - Configuration for Track 2
+Trained models are saved to `log/models/{experiment_name}/model.pth`. TensorBoard logs are written to `./log/tensorboard_records/{experiment_name}/`, or to the path given by `--log-dir`.
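The path conventions above can be resolved programmatically, for example to check that a checkpoint exists before running with `--eval`. A small sketch, assuming the scripts are run from the repository root. The helper `experiment_paths` is hypothetical, and whether `--eval` loads the checkpoint from exactly this location is an assumption.

```python
from pathlib import Path


def experiment_paths(experiment_name, log_dir=None):
    """Return the checkpoint and TensorBoard locations for an experiment."""
    if log_dir:
        tensorboard = Path(log_dir)
    else:
        tensorboard = Path("log") / "tensorboard_records" / experiment_name
    return {
        "checkpoint": Path("log") / "models" / experiment_name / "model.pth",
        "tensorboard": tensorboard,
    }


paths = experiment_paths("track1_exp")
if not paths["checkpoint"].exists():
    print(f"No checkpoint at {paths['checkpoint']}; train before evaluating.")
```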
+## 🙏 Acknowledgement
+We sincerely thank the authors and contributors of the following open-source projects:
+* **[SongEval](https://github.com/ASLP-lab/SongEval)**
+* **[SongFormer](https://github.com/ASLP-lab/SongFormer)**
+* **[Audiomentations](https://github.com/iver56/audiomentations)**
+* **[Wespeaker](https://github.com/wenet-e2e/wespeaker)**
+* **[allRank](https://github.com/allegro/allRank)**
+We would like to express our special thanks to **Shizhe Chen** from **Shanghai Conservatory of Music** for his invaluable guidance and insights on music aesthetics.
+## 📚 Citation
+```bibtex
+@misc{liu2025hearhierarchicallyenhancedaesthetic,
+      title={Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation},
+      author={Shuyang Liu and Yuan Jin and Rui Lin and Shizhe Chen and Junyu Dai and Tao Jiang},
+      year={2025},
+      eprint={2511.18869},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD},
+      url={https://arxiv.org/abs/2511.18869},
+}
+```