---
license: apache-2.0
tags:
- music
- audio
- popularity-prediction
- aesthetic-quality
- multi-task-learning
- mert
- ai-generated-music
- suno
- udio
language:
- en
library_name: transformers
---

# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX is the first large-scale multi-task learning framework for jointly predicting **popularity** and **aesthetic quality** of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) audio embeddings.

---

## What does APEX predict?

Given any audio file, APEX predicts 7 scores:

**Popularity:**

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |

**Aesthetic Quality (from [SongEval](https://github.com/ASLP-lab/SongEval)):**

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
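
The ranges above can double as a quick sanity check on a prediction. The sketch below is illustrative only: `SCORE_RANGES` and `check_scores` are hypothetical helpers that are not part of APEX, and they assume the results arrive as a flat dictionary keyed by the score names in the tables. A regression output could fall slightly outside a nominal range, so the helper only reports such values and does not clamp them.

```python
# Hypothetical helper, not part of the APEX codebase.
# Ranges are copied from the score tables in this README.
SCORE_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def check_scores(results):
    """Return a list of problems found in a results dict (empty if none)."""
    problems = []
    for name, (low, high) in SCORE_RANGES.items():
        if name not in results:
            problems.append(f"missing score: {name}")
            continue
        value = float(results[name])
        if not low <= value <= high:
            problems.append(f"{name}={value:.2f} outside [{low:g}, {high:g}]")
    return problems
```

An empty list means all seven scores are present and within their documented ranges.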

---

## Architecture



---

## Usage

### Installation

```bash
pip uninstall -y torch torchvision torchaudio transformers -q
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
```
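
To confirm that the pinned versions were actually picked up, a small check along these lines may help. It is a sketch using only the standard library; `EXPECTED` and `report_versions` are hypothetical names, and the pins are simply copied from the install commands above.

```python
from importlib import metadata

# Pins copied from the install commands above; None means "any version".
EXPECTED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "transformers": None,
}


def report_versions(expected=EXPECTED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pin in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu124" when comparing to the pin.
        base = installed.split("+")[0] if installed else None
        matches = installed is not None and (pin is None or base == pin)
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in report_versions().items():
        status = "ok" if ok else "CHECK"
        print(f"{name:12s} {installed or 'not installed':16s} {status}")
```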

### Inference

```python
import torch
from transformers import AutoModel

# trust_remote_code is needed because the repository ships custom model code.
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Score a single audio file; save_json names the file for the saved results.
results = model.predict("/path/to/your/mp3/file", save_json="results.json")

print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score   : {results['score_likes']:.2f}")
print(f"Coherence     : {results['coherence']:.2f}")
print(f"Musicality    : {results['musicality']:.2f}")
print(f"Memorability  : {results['memorability']:.2f}")
print(f"Clarity       : {results['clarity']:.2f}")
print(f"Naturalness   : {results['naturalness']:.2f}")
```
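
For more than one file, the single-file call above can be wrapped in a loop. The following is a minimal sketch, not an official APEX utility: `score_folder` and `SCORE_KEYS` are hypothetical names, and the function takes any callable that maps a file path to a dictionary with the seven score keys, so it makes no assumptions about `model.predict` beyond the call shown above.

```python
import csv
from pathlib import Path

# Score names as used in the inference example above.
SCORE_KEYS = [
    "score_streams", "score_likes", "coherence", "musicality",
    "memorability", "clarity", "naturalness",
]


def score_folder(predict, folder, out_csv, pattern="*.mp3"):
    """Run `predict` on every matching file and write one CSV row per file.

    `predict` is any callable taking a path string and returning a dict
    that contains the keys in SCORE_KEYS. Returns the number of files scored.
    """
    paths = sorted(Path(folder).glob(pattern))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file"] + SCORE_KEYS)
        writer.writeheader()
        for path in paths:
            results = predict(str(path))
            row = {"file": path.name}
            row.update({key: float(results[key]) for key in SCORE_KEYS})
            writer.writerow(row)
    return len(paths)
```

With the model loaded as above, one way to call it is `score_folder(lambda p: model.predict(p, save_json="results.json"), "songs/", "scores.csv")`, where `songs/` and `scores.csv` are placeholder paths. This reuses the exact `predict` call from the example, so `results.json` is overwritten on each file while the CSV keeps every row.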

---

## Citation

```bibtex
@misc{husain2026apexlargescalemultitaskaestheticinformed,
  title={APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music},
  author={Jaavid Aktar Husain and Dorien Herremans},
  year={2026},
  eprint={2605.03395},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2605.03395},
}
```