---
pipeline_tag: other
language: en
library_name: pytorch
license: apache-2.0
tags:
  - music
  - midi
  - mir
  - deduplication
  - caugbert
model-index:
  - name: LMD Deduplication - CAugBERT
    results: 
    - task:
        type: representation-learning
        name: symbolic music representation learning
      dataset:
        type: midi
        name: Lakh MIDI Dataset 
      metrics:
        - type: F1
          value: 0.493
---

# LMD Deduplication Supplements
This repository provides the pre-trained CAugBERT model checkpoint used in: 
**"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)**  
[[Paper]](https://ismir2025program.ismir.net/poster_188.html) | [[GitHub Code]](https://github.com/jech2/LMD_Deduplication)

---

# Usage
You can either integrate this checkpoint into the main repository for inference, or load it directly:
```bash
# Option 1: Run inference in the main repo
poetry run python inference.py  # make sure yamls/inference.yaml paths are correct
```
```python
# Option 2: Load checkpoint manually
import torch
from contrastive_musicbert.model.BERT import BERT_Lightning

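device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint_path = "path/to/checkpoint.ckpt"  # placeholder: point this at your local copy of the checkpoint

model = BERT_Lightning(...)  # see .hydra/config.yaml for arguments
checkpoint = torch.load(checkpoint_path, map_location="cpu")  # load on CPU first
model.load_state_dict(checkpoint['state_dict'])
model = model.to(device).eval()  # move to the target device and switch to inference mode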
```
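
If `load_state_dict` raises an error about missing or unexpected keys, it can help to compare the parameter names the model expects with those stored in the checkpoint. The following is a minimal sketch in plain Python, not part of the main repository: the helper `diff_state_dict_keys`, its `strip_prefix` argument, and the toy key names are all hypothetical and chosen only for illustration.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys, strip_prefix=""):
    """Compare two collections of parameter names.

    Returns (missing, unexpected) as sorted lists:
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not know
    If strip_prefix is given, it is removed from the start of each
    checkpoint key before comparing (hypothetical convenience option).
    """
    expected = set(expected_keys)
    found = set()
    for key in checkpoint_keys:
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        found.add(key)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected


# Toy demonstration with invented key names (not the real CAugBERT parameters).
expected = ["encoder.weight", "encoder.bias", "head.weight"]
saved = ["model.encoder.weight", "model.encoder.bias", "model.proj.weight"]
missing, unexpected = diff_state_dict_keys(expected, saved, strip_prefix="model.")
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, the same helper could be called as `diff_state_dict_keys(model.state_dict().keys(), checkpoint['state_dict'].keys())` before loading.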

# Note
If you have any questions regarding the checkpoint, please contact:
Eunjin Choi (jech@kaist.ac.kr)