---
pipeline_tag: other
language: en
library_name: pytorch
license: apache-2.0
tags:
- music
- midi
- mir
- deduplication
- caugbert
model-index:
- name: LMD Deduplication - CAugBERT
  results:
  - task:
      type: representation-learning
      name: symbolic music representation learning
    dataset:
      type: midi
      name: Lakh MIDI Dataset
    metrics:
    - type: F1
      value: 0.493
---

# LMD Deduplication Supplements

This repository provides the pre-trained CAugBERT model checkpoint used in:

**"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)**

[[Paper]](https://ismir2025program.ismir.net/poster_188.html) | [[GitHub Code]](https://github.com/jech2/LMD_Deduplication)

---

# Usage

You can either integrate this checkpoint into the main repository for inference, or load it directly:

```bash
# Option 1: Run inference in the main repo
poetry run python inference.py  # make sure yamls/inference.yaml paths are correct
```

```python
# Option 2: Load checkpoint manually
import torch
from contrastive_musicbert.model.BERT import BERT_Lightning

model = BERT_Lightning(...).to(device)  # see .hydra/config.yaml for arguments
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint['state_dict'])
```

# Note

If you have any questions regarding the checkpoint, please contact: Eunjin Choi (jech@kaist.ac.kr)
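
The manual loading pattern from the Usage section can be tried end to end without the real checkpoint. The sketch below is only an illustration: `TinyEncoder` is a hypothetical stand-in for `BERT_Lightning` (it is not the actual architecture), and `load_lightning_style_checkpoint` is a helper name made up for this example. It assumes the checkpoint is a dictionary whose weights live under the `state_dict` key, as in Option 2, and it reports missing or unexpected keys explicitly so that a mismatch between the model arguments and the checkpoint is easy to spot.

```python
import os
import tempfile

import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for BERT_Lightning; not the real architecture."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.proj = nn.Linear(dim, dim)


def load_lightning_style_checkpoint(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Load weights stored under the "state_dict" key and switch to eval mode."""
    # map_location="cpu" lets the file load on machines without a GPU.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint["state_dict"]

    # strict=False returns the mismatched keys instead of raising immediately,
    # so they can be reported together in one error message.
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys or result.unexpected_keys:
        raise KeyError(
            f"missing={result.missing_keys}, unexpected={result.unexpected_keys}"
        )
    return model.eval()


# Demo: write a checkpoint in the assumed layout, then restore it.
source = TinyEncoder()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ckpt")
    torch.save({"state_dict": source.state_dict()}, path)
    restored = load_lightning_style_checkpoint(TinyEncoder(), path)
```

With the real checkpoint, the same helper would be called with a `BERT_Lightning` instance built from the arguments in `.hydra/config.yaml`; the model can be moved to the target device after loading.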