---
license: apache-2.0
pipeline_tag: audio-classification
tags:
- music
- song
- aesthetics
- ASAE
---
# **HEAR**: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation
[**Paper**](https://arxiv.org/pdf/2511.18869) |
[**Model**](https://huggingface.co/earlab/EAR_HEAR)
<br>
Official PyTorch implementation of the ICASSP 2026 paper "HEAR: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation".

This repository contains the training and evaluation code for HEAR, a robust framework designed to address the challenges of multidimensional music aesthetic evaluation under limited labeled data.
![HEAR framework overview](HEAR.png)
## 🌟 Key Features
* **Excellent Performance**: Ranked 2nd/19 on Track 1 and 5th/17 on Track 2 in the [ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge](https://aslp-lab.github.io/Automatic-Song-Aesthetics-Evaluation-Challenge/).
* **Robustness**: Synergizes Multi-Source Multi-Scale Representations and Hierarchical Augmentation to capture robust features under limited labeled data.
* **Dual Capability**: Optimized for both exact score prediction and ranking (Top-Tier Song Identification).
## πŸ“¦ Installation
Clone the repository and install dependencies:
```bash
git clone https://github.com/Eps-Acoustic-Revolution-Lab/EAR_HEAR.git
cd EAR_HEAR
git submodule update --init --recursive
conda create -n hear python=3.10 -y
conda activate hear
pip install -r requirements.txt
```
## πŸš€ Quick Start
```bash
# Download pretrained model weights
export HF_ENDPOINT=https://hf-mirror.com # Needed for HuggingFace downloads from mainland China
hf download earlab/EAR_HEAR --local-dir pretrained_models

# Track 1: Single-Label Inference (Musicality)
python inference.py \
--input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
--output_json_path output.json \
--model_path pretrained_models/track_1.pth \
--model_config_path config_track_1.yaml

# Track 2: Multi-Label Inference (5 Dimensions)
python inference.py \
--input_audio_path data_pipeline/origin_song_eval_dataset/mp3/0.mp3 \
--output_json_path output.json \
--model_path pretrained_models/track_2.pth \
--model_config_path config_track_2.yaml
```
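To score a whole directory instead of a single file, a small wrapper can shell out to `inference.py` once per clip. This is only a convenience sketch using the flags documented above; the directory and output paths are illustrative, and `inference.py` itself may offer other options.

```python
import subprocess
from pathlib import Path

# Hypothetical batch wrapper around inference.py (CLI flags as documented above).
AUDIO_DIR = Path("data_pipeline/origin_song_eval_dataset/mp3")
OUT_DIR = Path("outputs")
OUT_DIR.mkdir(exist_ok=True)

for mp3 in sorted(AUDIO_DIR.glob("*.mp3")):
    subprocess.run(
        [
            "python", "inference.py",
            "--input_audio_path", str(mp3),
            "--output_json_path", str(OUT_DIR / f"{mp3.stem}.json"),
            "--model_path", "pretrained_models/track_2.pth",
            "--model_config_path", "config_track_2.yaml",
        ],
        check=True,
    )
```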
## 🎯 Training
### Step 1: Data Preparation
First, prepare the dataset by running the data pipeline:
```bash
cd data_pipeline
bash run.sh
```
This script will:
1. **Download Dataset**: Download the [SongEval](https://huggingface.co/datasets/ASLP-lab/SongEval) dataset
2. **Split Dataset**: Split the dataset into training and validation sets based on [the challenge's validation IDs](https://github.com/ASLP-lab/Automatic-Song-Aesthetics-Evaluation-Challenge/blob/main/static/val_ids.txt)
3. **Audio Augmentation**: Apply audio augmentation to the training set (see the sketch after this list)
4. **Extract Features**: Extract MuQ and MusicFM features for both training and test sets
5. **Generate PKL Files**: Generate `train_set.pkl` and `test_set.pkl` files for training and evaluation
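Step 3's augmentation operates on raw waveforms. Below is a minimal illustrative sketch using [Audiomentations](https://github.com/iver56/audiomentations) (acknowledged at the end of this README); the transforms and parameters shown are placeholders, not the pipeline's actual settings, which live in the `data_pipeline` scripts.

```python
import librosa
import soundfile as sf
from audiomentations import AddGaussianNoise, Compose, PitchShift, TimeStretch

# Illustrative chain; the real pipeline's transforms and probabilities differ.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
])

# Load one clip as mono float32 (path reused from the Quick Start example).
samples, sr = librosa.load(
    "data_pipeline/origin_song_eval_dataset/mp3/0.mp3", sr=None, mono=True
)
augmented = augment(samples=samples, sample_rate=int(sr))
sf.write("augmented_0.wav", augmented, int(sr))
```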
### Step 2: Model Training
After data preparation, you can train the HEAR model for either Track 1 (single-label: Musicality) or Track 2 (multi-label: 5 dimensions).
#### Track 1: Single-Label Training (Musicality)
Train the model for musicality prediction:
```bash
python train_track_1.py \
--experiment_name track1_exp \
--train-data /path/to/train_set.pkl \
--test-data /path/to/test_set.pkl \
--max-epoch 60 \
--batch-size 8 \
--lr 1e-5 \
--weight_decay 1e-3 \
--accum_steps 4 \
--lambda 0.15 \
--workers 8 \
--seed 0
```
#### Track 2: Multi-Label Training (5 Dimensions)
Train the model for multi-dimensional aesthetic evaluation:
```bash
python train_track_2.py \
--experiment_name track2_exp \
--train-data /path/to/train_set.pkl \
--test-data /path/to/test_set.pkl \
--max-epoch 60 \
--batch-size 8 \
--lr 1e-5 \
--weight_decay 1e-3 \
--accum_steps 4 \
--lambda 0.05 \
--workers 8 \
--seed 0
```
#### Key Parameters
* `--max-epoch`: Maximum number of training epochs (default: 60)
* `--batch-size`: Batch size for training (default: 8)
* `--experiment_name`: Name of the experiment for saving models and logs
* `--lr`: Learning rate (default: 1e-5)
* `--weight_decay`: Weight decay for optimizer (default: 1e-3)
* `--accum_steps`: Gradient accumulation steps (default: 4)
* `--lambda`: Weight for the ranking loss term (Track 1: 0.15, Track 2: 0.05; see the loss sketch after this list)
* `--workers`: Number of data loading workers (default: 8)
* `--seed`: Random seed for reproducibility (default: 0)
* `--train-data`: Path to training data pkl file (default: `data_pipeline/dataset_pkl/train_set.pkl`)
* `--test-data`: Path to test data pkl file (default: `data_pipeline/dataset_pkl/test_set.pkl`)
* `--log-dir`: Path to tensorboard log directory (default: `./log/tensorboard_records/{experiment_name}`)
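For intuition on `--lambda`: it trades off exact score regression against ranking quality. The snippet below is a minimal illustrative sketch of such a combined objective (plain MSE plus a pairwise hinge term on misordered pairs); the repository's actual ranking loss builds on allRank and may differ.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.15) -> torch.Tensor:
    """Hypothetical combined objective: MSE regression plus a pairwise
    ranking hinge weighted by `lam` (the --lambda flag). pred/target are
    per-batch score tensors of shape (B,) or (B, num_dimensions)."""
    mse = F.mse_loss(pred, target)

    # All pairwise differences within the batch: element [i, j] holds
    # pred[j] - pred[i] (and likewise for the targets).
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)

    # Penalize pairs whose predicted order contradicts the target order.
    sign = torch.sign(diff_true)
    rank = F.relu(-sign * diff_pred).mean()

    return mse + lam * rank
```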
#### Evaluation Mode
To evaluate a trained model, use the `--eval` flag:
```bash
python train_track_1.py --eval --experiment_name track1_exp
python train_track_2.py --eval --experiment_name track2_exp
```
#### Model Configuration
Model architectures are configured in:
* `config_track_1.yaml` - Configuration for Track 1
* `config_track_2.yaml` - Configuration for Track 2
Trained models are saved in `log/models/{experiment_name}/model.pth`, and training logs are saved to TensorBoard in `./log/tensorboard_records/{experiment_name}/` (or custom path specified by `--log-dir`).
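To quickly inspect a saved checkpoint, something like the following works, assuming `model.pth` stores a plain `state_dict` (the actual checkpoint layout may wrap it differently):

```python
import torch

# Path pattern from above; "track1_exp" is the experiment name used earlier.
ckpt = torch.load("log/models/track1_exp/model.pth", map_location="cpu")

# Unwrap a {"state_dict": ...} container if present (an assumption), else use as-is.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape))
```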
## πŸ™ Acknowledgement
We sincerely thank the authors and contributors of the following open-source projects:
* **[SongEval](https://github.com/ASLP-lab/SongEval)**
* **[SongFormer](https://github.com/ASLP-lab/SongFormer)**
* **[Audiomentations](https://github.com/iver56/audiomentations)**
* **[Wespeaker](https://github.com/wenet-e2e/wespeaker)**
* **[allRank](https://github.com/allegro/allRank)**
We would like to express our special thanks to **Shizhe Chen** from **Shanghai Conservatory of Music** for his invaluable guidance and insights on music aesthetics.
## πŸ“š Citation
```bibtex
@misc{liu2025hearhierarchicallyenhancedaesthetic,
      title={HEAR: Hierarchically Enhanced Aesthetic Representations for Multidimensional Music Evaluation},
author={Shuyang Liu and Yuan Jin and Rui Lin and Shizhe Chen and Junyu Dai and Tao Jiang},
year={2025},
eprint={2511.18869},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2511.18869},
}
```