# Adversarial-MidiBERT

This model card was generated by Grok 3.
## Model Details

- **Model Name**: Adversarial-MidiBERT
- **Model Type**: Transformer-based model for symbolic music understanding
- **Version**: 1.0
- **Release Date**: August 2025
- **Developers**: Zijian Zhao
- **Organization**: SYSU
- **License**: Apache License 2.0
- **Paper**: [Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training](https://dl.acm.org/doi/abs/10.1145/3731715.3733483), ACM ICMR 2025
- **Arxiv**: https://arxiv.org/abs/2407.08306
- **Citation**:

```bibtex
@inproceedings{zhao2025let,
  title={Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale Adversarial Pre-training},
  author={Zhao, Zijian},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={2128--2132},
  year={2025}
}
```

- **Contact**: zhaozj28@mail2.sysu.edu.cn
- **Repository**: https://github.com/RS2002/Adversarial-MidiBERT
## Model Description

Adversarial-MidiBERT is a transformer-based model designed for symbolic music understanding, leveraging large-scale adversarial pre-training. It builds upon the [MidiBERT-Piano](https://github.com/wazenmai/MIDI-BERT) framework and extends it with adversarial pre-training techniques to enhance performance on music-related tasks. The model processes symbolic music data in an octuple format and can be fine-tuned for various downstream tasks such as music generation, classification, and analysis.

- **Architecture**: Transformer-based (based on MidiBERT)
- **Input Format**: Octuple representation of symbolic music, shape `[batch_size, sequence_length, 8]`
- **Output Format**: Hidden states of shape `[batch_size, sequence_length, 768]`
- **Hidden Size**: 768
- **Training Objective**: Adversarial pre-training followed by task-specific fine-tuning
- **Tasks Supported**: Symbolic music understanding tasks
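The input/output contract above can be illustrated with a small stand-in model. This is a hypothetical sketch, not the released architecture: it assumes each of the 8 octuple fields gets its own embedding table whose outputs are summed into one token embedding before a transformer encoder, mirroring the compound-token pattern used by MidiBERT-style models. All vocabulary sizes and layer counts here are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model (see model.py in the repo).
# Shows only the shape contract: (batch, seq_len, 8) -> (batch, seq_len, 768).
class OctupleEncoderSketch(nn.Module):
    def __init__(self, vocab_sizes=(64,) * 8, hidden=768):
        super().__init__()
        # One embedding table per octuple field; their sum is the token embedding.
        self.embeds = nn.ModuleList(nn.Embedding(v, hidden) for v in vocab_sizes)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):  # x: (batch, seq_len, 8) integer octuple tokens
        h = sum(emb(x[..., i]) for i, emb in enumerate(self.embeds))
        return self.encoder(h)  # (batch, seq_len, 768)

sketch = OctupleEncoderSketch()
out = sketch(torch.randint(0, 64, (2, 16, 8)))
print(out.shape)  # torch.Size([2, 16, 768])
```

Summing per-field embeddings keeps the sequence length equal to the number of compound tokens, which is what makes the octuple format efficient for long MIDI sequences.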
## Training Data

The model was pre-trained and fine-tuned on the following datasets:

- **POP1K7**: A dataset of popular music MIDI files.
- **POP909**: A dataset of 909 pop songs in MIDI format.
- **Pianist8**: A dataset of piano performances.
- **EMOPIA**: A dataset for emotion-based music analysis.
- **GiantMIDI**: A large-scale MIDI dataset.

For details on dataset preprocessing and dictionary files, refer to the [PianoBART repository](https://github.com/RS2002/PianoBart). Pre-training data should be placed in `./Data/output_pretrain`.
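As a minimal sketch of the expected layout (the source path below is a placeholder; the actual preprocessed file names come from the PianoBART preprocessing pipeline):

```shell
# Create the directory the pre-training scripts expect.
mkdir -p ./Data/output_pretrain
# Then copy your preprocessed files into it, e.g.:
# cp /path/to/preprocessed_output/* ./Data/output_pretrain/
```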
## Usage

### Installation

```shell
git clone https://huggingface.co/RS2002/Adversarial-MidiBERT
```

Please ensure that the `model.py` and `Octuple.pkl` files are located in the same folder.
### Example Code

```python
import torch

from model import Adversarial_MidiBERT

# Load the pre-trained model
model = Adversarial_MidiBERT.from_pretrained("RS2002/Adversarial-MidiBERT")

# Example input: a batch of 2 sequences of 1024 octuple tokens
input_ids = torch.randint(0, 10, (2, 1024, 8))
attention_mask = torch.zeros((2, 1024))

# Forward pass
y = model(input_ids, attention_mask)
print(y.last_hidden_state.shape)  # [2, 1024, 768]
```
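For fine-tuning on a downstream task, one common pattern is to pool the 768-dim hidden states into a single vector and attach a small classification head. The head below is a hypothetical sketch (not part of the released checkpoint): it mean-pools over non-padded positions, assuming a mask where 1 marks padding, and the dummy tensor stands in for `y.last_hidden_state`.

```python
import torch
import torch.nn as nn

# Hypothetical fine-tuning head; num_classes=4 is an arbitrary example.
class ClassifierHead(nn.Module):
    def __init__(self, hidden=768, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, hidden_states, pad_mask):
        # pad_mask: (batch, seq_len), 1.0 where the position is padding.
        keep = (1.0 - pad_mask).unsqueeze(-1)  # (batch, seq_len, 1)
        # Mean-pool over kept positions only; clamp avoids division by zero.
        pooled = (hidden_states * keep).sum(1) / keep.sum(1).clamp(min=1.0)
        return self.fc(pooled)  # (batch, num_classes)

head = ClassifierHead()
dummy_hidden = torch.randn(2, 1024, 768)  # stands in for y.last_hidden_state
logits = head(dummy_hidden, torch.zeros(2, 1024))
print(logits.shape)  # torch.Size([2, 4])
```

Masked mean-pooling keeps padded positions from diluting the sequence representation; alternatives such as CLS-token pooling work the same way shape-wise.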