jech2/lmd-dedup-caugbert


jech2 committed on Sep 29, 2025

Commit ce225f0 · 1 parent: cd05ac4

update readme

Files changed (1)

README.md +51 -3


@@ -1,3 +1,51 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e2ee0d54961d9df216b42e5bc729def934387e61abf8563eb62a04b552708e3e
-size 1435

+---
+pipeline_tag: symbolic-music-retrieval
+language: en
+library_name: pytorch
+license: apache-2.0
+tags:
+  - music
+  - midi
+  - mir
+  - deduplication
+  - caugbert
+model-index:
+  - name: LMD Deduplication - CAugBERT
+    results:
+    - task:
+        type: representation-learning
+        name: symbolic music representation learning
+      dataset:
+        type: midi
+        name: Lakh MIDI Dataset
+      metrics:
+        - type: F1
+          value: 0.493
+---
+# LMD Deduplication Supplements
+This repository provides the pre-trained CAugBERT model checkpoint used in:
+**"On the De-duplication of the Lakh MIDI Dataset" (ISMIR 2025)**
+[[Paper]](https://ismir2025program.ismir.net/poster_188.html) | [[GitHub Code]](https://github.com/jech2/LMD_Deduplication)
+
+---
+# Usage
+You can either integrate this checkpoint into the main repository for inference, or load it directly:
+```bash
+# Option 1: Run inference in the main repo
+poetry run python inference.py  # make sure yamls/inference.yaml paths are correct
+```
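+```python
+# Option 2: Load checkpoint manually
+import torch
+from contrastive_musicbert.model.BERT import BERT_Lightning
+# Placeholder values: replace checkpoint_path with the location of the downloaded checkpoint
+checkpoint_path = "path/to/checkpoint.ckpt"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = BERT_Lightning(...)  # see .hydra/config.yaml for arguments
+# Load onto CPU first so the checkpoint can be read on machines without a GPU
+checkpoint = torch.load(checkpoint_path, map_location="cpu")
+model.load_state_dict(checkpoint['state_dict'])
+model = model.to(device)
+model.eval()  # inference mode: disables dropout
+```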
+# Note
+If you have any questions regarding the checkpoint, please contact:
+Eunjin Choi (jech@kaist.ac.kr)
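
For Option 2 above, `load_state_dict` raises an error by default when the keys in `checkpoint['state_dict']` do not match the parameter names of the instantiated module. The following is a minimal sketch of how such a mismatch could be inspected and, if it comes only from a wrapper prefix, repaired. The helper names `strip_key_prefix` and `key_mismatch` and the prefix `"model."` are hypothetical and chosen for illustration; whether this checkpoint needs any key renaming is not stated in the README. The sketch uses plain dictionaries as stand-ins, so it runs without PyTorch.

```python
from collections import OrderedDict
from typing import Any, Iterable, List, Mapping, Tuple


def strip_key_prefix(state_dict: Mapping[str, Any], prefix: str) -> "OrderedDict[str, Any]":
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Keys that do not start with `prefix` are copied unchanged. Order is preserved.
    """
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


def key_mismatch(expected_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key lists, both sorted.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


if __name__ == "__main__":
    # Stand-in for checkpoint['state_dict']; values would be tensors in practice.
    fake_state_dict = {"model.encoder.weight": 1, "model.encoder.bias": 2}
    # Stand-in for model.state_dict().keys() of the instantiated module.
    fake_model_keys = ["encoder.weight", "encoder.bias"]

    print(key_mismatch(fake_model_keys, fake_state_dict.keys()))
    cleaned = strip_key_prefix(fake_state_dict, "model.")  # hypothetical prefix
    print(key_mismatch(fake_model_keys, cleaned.keys()))
```

With a real model, the expected keys would come from `model.state_dict().keys()`, and the cleaned dictionary would be passed to `model.load_state_dict(...)` only once both mismatch lists are empty.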