# Adversarial-MidiBERT

This model card was generated by Grok 3.
## Model Details

- **Model Name**: Adversarial-MidiBERT
- **Model Type**: Transformer-based model for symbolic music understanding
- **Version**: 1.0
- **Release Date**: August 2025
- **Developers**: Zijian Zhao
- **Organization**: SYSU
- **License**: Apache License 2.0
- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025
- **Arxiv**: https://arxiv.org/abs/2407.08306
- **Citation**:

```bibtex
@inproceedings{zhao2025let,
  title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
  author={Zhao, Zijian},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={2128--2132},
  year={2025}
}
```

- **Contact**: zhaozj28@mail2.sysu.edu.cn
- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT
## Model Description

Adversarial-MidiBERT is a transformer-based model designed for symbolic music understanding, leveraging large-scale adversarial pre-training. It builds upon the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training techniques to enhance performance on music-related tasks. The model processes symbolic music data in an octuple format and can be fine-tuned for various downstream tasks such as music generation, classification, and analysis.

- **Architecture**: Transformer-based (based on MidiBERT)
- **Input Format**: Octuple representation of symbolic music, shape `[batch_size, sequence_length, 8]`
- **Output Format**: Hidden states of shape `[batch_size, sequence_length, 768]`
- **Hidden Size**: 768
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning
- **Tasks Supported**: Symbolic music understanding tasks
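The input/output contract above can be illustrated with a small stand-in model. This is a hypothetical sketch, not the released architecture: it assumes each of the 8 octuple fields gets its own embedding table whose outputs are summed into one token embedding before a transformer encoder, mirroring the compound-token pattern used by MidiBERT-style models. All vocabulary sizes and layer counts here are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model (see model.py in the repo).
# Shows only the shape contract: (batch, seq_len, 8) -> (batch, seq_len, 768).
class OctupleEncoderSketch(nn.Module):
    def __init__(self, vocab_sizes=(64,) * 8, hidden=768):
        super().__init__()
        # One embedding table per octuple field; their sum is the token embedding.
        self.embeds = nn.ModuleList(nn.Embedding(v, hidden) for v in vocab_sizes)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):  # x: (batch, seq_len, 8) integer octuple tokens
        h = sum(emb(x[..., i]) for i, emb in enumerate(self.embeds))
        return self.encoder(h)  # (batch, seq_len, 768)

sketch = OctupleEncoderSketch()
out = sketch(torch.randint(0, 64, (2, 16, 8)))
print(out.shape)  # torch.Size([2, 16, 768])
```

Summing per-field embeddings keeps the sequence length equal to the number of compound tokens, which is what makes the octuple format efficient for long MIDI sequences.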
## Training Data

The model was pre-trained and fine-tuned on the following datasets:

- **POP1K7**: A dataset of popular music MIDI files.
- **POP909**: A dataset of 909 pop songs in MIDI format.
- **Pianist8**: A dataset of piano performances.
- **EMOPIA**: A dataset for emotion-based music analysis.
- **GiantMIDI**: A large-scale MIDI dataset.

For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`.
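As a minimal sketch of the expected layout (the source path below is a placeholder; the actual preprocessed file names come from the PianoBART preprocessing pipeline):

```shell
# Create the directory the pre-training scripts expect.
mkdir -p ./Data/output_pretrain
# Then copy your preprocessed files into it, e.g.:
# cp /path/to/preprocessed_output/* ./Data/output_pretrain/
```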
## Usage

### Installation

```shell
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT
```

Please ensure that the `model.py` and `Octuple.pkl` files are located in the same folder.
### Example Code

```python
import torch

from model import Adversarial_MidiBERT

# Load the pre-trained model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")

# Example input: a batch of 2 sequences of 1024 octuple tokens
input_ids = torch.randint(0, 10, (2, 1024, 8))
attention_mask = torch.zeros((2, 1024))

# Forward pass
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape)  # [2, 1024, 768]
```
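For fine-tuning on a downstream task, one common pattern is to pool the 768-dim hidden states into a single vector and attach a small classification head. The head below is a hypothetical sketch (not part of the released checkpoint): it mean-pools over non-padded positions, assuming a mask where 1 marks padding, and the dummy tensor stands in for `y.last_hidden_state`.

```python
import torch
import torch.nn as nn

# Hypothetical fine-tuning head; num_classes=4 is an arbitrary example.
class ClassifierHead(nn.Module):
    def __init__(self, hidden=768, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, hidden_states, pad_mask):
        # pad_mask: (batch, seq_len), 1.0 where the position is padding.
        keep = (1.0 - pad_mask).unsqueeze(-1)  # (batch, seq_len, 1)
        # Mean-pool over kept positions only; clamp avoids division by zero.
        pooled = (hidden_states * keep).sum(1) / keep.sum(1).clamp(min=1.0)
        return self.fc(pooled)  # (batch, num_classes)

head = ClassifierHead()
dummy_hidden = torch.randn(2, 1024, 768)  # stands in for y.last_hidden_state
logits = head(dummy_hidden, torch.zeros(2, 1024))
print(logits.shape)  # torch.Size([2, 4])
```

Masked mean-pooling keeps padded positions from diluting the sequence representation; alternatives such as CLS-token pooling work the same way shape-wise.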