---
license: mit
library_name: pytorch
tags:
- music
- folk-music
- irish-traditional-music
- abc-notation
- symbolic-music
- representation-learning
- self-supervised
- transformer
pipeline_tag: feature-extraction
---
# ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music
This is the official pre-trained ABC2Vec model from the paper:
**"ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music"**
## Model Description
ABC2Vec is a self-supervised Transformer encoder that learns dense, semantically meaningful embeddings from ABC notation, a text-based symbolic music format. It is designed specifically for Irish traditional folk music and was trained on 211,524 tunes.
### Key Features
- 🎵 **Purpose-built for folk music** - Addresses transposition equivalence, modal tonality, and variant detection
- 🔄 **Transposition Invariance** - Novel TI objective for pitch-invariant representations
- 📊 **Bar-level Patchification** - 16× sequence length reduction for efficiency
- 🎯 **Self-supervised** - No text annotations or audio required
- **Efficient** - Trained in ~18 hours on an Apple M4 Mac
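To make the bar-level patchification idea concrete, here is a minimal sketch in plain Python. It is not the repository's `BarPatchifier`: the function name `patchify`, the `PAD` symbol, and the two regular expressions are assumptions made for illustration, and real ABC parsing (inline fields, repeats, endings) is more involved. The sketch drops header lines, splits the tune body on bar lines, and pads each bar to a fixed character length, so the sequence the Transformer attends over is a list of bars rather than a list of characters.

```python
import re

PAD = "<pad>"  # hypothetical padding symbol, not taken from the released vocabulary

# Header fields such as "M:6/8" or "K:D" start with a letter followed by a colon.
HEADER_LINE = re.compile(r"[A-Za-z]:")
# Common ABC bar-line tokens; multi-character forms are listed before the single "|".
BAR_LINE = re.compile(r"::|\|\]|\[\||\|\||\|:|:\||\|")


def patchify(abc_text, max_bars=64, max_bar_length=64):
    """Split an ABC tune into fixed-size, character-level bar patches."""
    body = " ".join(
        line for line in abc_text.splitlines() if not HEADER_LINE.match(line)
    )
    # Split on bar lines, then drop the empty pieces around leading/trailing bars.
    bars = [bar.strip() for bar in BAR_LINE.split(body)]
    bars = [bar for bar in bars if bar][:max_bars]

    patches, char_mask = [], []
    for bar in bars:
        chars = list(bar)[:max_bar_length]
        padding = max_bar_length - len(chars)
        patches.append(chars + [PAD] * padding)
        char_mask.append([True] * len(chars) + [False] * padding)
    return patches, char_mask


tune = "M:6/8\nK:D\n|:A2A ABc|ded cBA|A2A ABc|ded cAG|"
patches, char_mask = patchify(tune, max_bar_length=8)
print(len(patches))  # 4 bars
```

In the released model each character would additionally be mapped to a vocabulary index; the sketch keeps raw characters so the bar grouping stays visible.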
## Model Architecture
- **Layers:** 6
- **Hidden Size (d_model):** 256
- **Attention Heads:** 8
- **FFN Size (d_ff):** 1024
- **Embedding Size:** 128
- **Vocabulary Size:** 98
- **Max Bars:** 64
- **Max Bar Length:** 64
- **Parameters:** ~5M
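As a rough sanity check on the parameter count, the encoder stack can be sized by hand. The sketch below assumes a standard Transformer encoder layer (four attention projections with biases, a two-layer feed-forward network, and two LayerNorms); the helper name `encoder_layer_params` is hypothetical, and the estimate deliberately leaves out embeddings, any bar-level patch encoder, and the projection to the 128-dimensional output, whose exact shapes are not listed here.

```python
def encoder_layer_params(d_model, d_ff):
    """Parameter count of one standard Transformer encoder layer (assumed layout)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # scale and shift for two LayerNorms
    return attention + feed_forward + layer_norms


total = 6 * encoder_layer_params(d_model=256, d_ff=1024)
print(f"{total:,}")  # 4,738,560
```

Under these assumptions the six layers account for about 4.7M parameters, which is in line with the ~5M total once the uncounted components are added.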
## Training Details
- **Dataset:** 211,524 Irish traditional tunes (IrishMAN corpus)
- **Training Objectives:**
  - Masked Music Modeling (MMM)
  - Transposition Invariance (TI) contrastive learning
- **Training Steps:** 40,000 steps (40 epochs)
- **Final Validation Loss:** 2.36
- **Hardware:** Apple M4 Mac (48GB unified memory)
- **Training Time:** ~18 hours
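The contrastive side of the TI objective can be illustrated with a small InfoNCE-style loss. This is a sketch under stated assumptions, not the training code: the model card only says the objective is contrastive, so the cosine similarity, the temperature value, and the function names `cosine` and `info_nce` are all illustrative choices. Each anchor embedding is paired with the embedding of a transposed copy of the same tune, and the other tunes in the batch act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the transposed view of anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: the loss is small when each tune matches its own transposition.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True
```

Minimizing a loss of this shape pulls a tune and its transposition together in embedding space while pushing apart different tunes, which is the behavior a pitch-invariant representation needs.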
## Performance
| Task | Accuracy | Notes |
|------|----------|-------|
| Tune Type Classification | 78.4% ± 1.2% | 6 classes (jig, reel, polka, etc.) |
| Mode Classification | 78.8% ± 1.6% | 4 classes (major, minor, dorian, mixolydian) |
| Key Root (Linear Probe) | 62.3% ± 0.9% | 8 most common keys |
| Tune Length (Linear Probe) | 89.5% ± 0.7% | 3 classes (short, medium, long) |
## Usage
```python
import json

import torch

# The ABC2Vec model code (see the GitHub repository) must be importable
from abc2vec.core.model import ABC2VecModel
from abc2vec.core.model.encoder import ABC2VecConfig
from abc2vec.core.tokenizer import ABCVocabulary, BarPatchifier

# Load model configuration
with open("model_config.json") as f:
    config_dict = json.load(f)

# Initialize model
config = ABC2VecConfig(**config_dict)
model = ABC2VecModel(config)

# Load pre-trained weights
checkpoint = torch.load("best_model.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Load vocabulary and build the bar-level patchifier
vocab = ABCVocabulary.load("vocab.json")
patchifier = BarPatchifier(
    vocab=vocab,
    max_bars=config.max_bars,
    max_bar_length=config.max_bar_length,
)

# Example ABC tune
abc_tune = "M:6/8\nK:D\n|:A2A ABc|ded cBA|A2A ABc|ded cAG|"
patches = patchifier.patchify(abc_tune)

# Get embedding (unsqueeze(0) adds the batch dimension)
with torch.no_grad():
    bar_indices = patches["bar_indices"].unsqueeze(0)
    char_mask = patches["char_mask"].unsqueeze(0)
    bar_mask = patches["bar_mask"].unsqueeze(0)
    embedding = model.get_embedding(bar_indices, char_mask, bar_mask)

# embedding shape: (1, 128)
```
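Once embeddings are extracted, a common next step is ranking tunes by similarity, for example to surface likely variants. The following is a minimal sketch in plain Python: the helper names `l2_normalize` and `rank_by_similarity` are hypothetical, cosine similarity is an assumed choice of metric, and toy 3-dimensional vectors stand in for the 128-dimensional embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_by_similarity(query, candidates):
    """Return (name, cosine similarity) pairs, most similar first."""
    q = l2_normalize(query)
    scored = []
    for name, vec in candidates.items():
        c = l2_normalize(vec)
        scored.append((name, sum(a * b for a, b in zip(q, c))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real tune embeddings.
library = {
    "tune_a": [1.0, 0.0, 0.0],
    "tune_b": [0.9, 0.1, 0.0],
    "tune_c": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity([0.8, 0.2, 0.0], library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With the model loaded as in the snippet above, `embedding[0].tolist()` turns the `(1, 128)` tensor into a list that can be passed as `query` or stored in `library`.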
## Code Repository
Full training code, evaluation scripts, and usage examples:
- **GitHub:** https://github.com/pianistprogrammer/ABC2VEC
## Dataset
The processed dataset with train/validation/test splits:
- **HuggingFace:** https://huggingface.co/datasets/pianistprogrammer/abc2vec-irish-folk-dataset
## Citation
If you use this model, please cite:
```bibtex
@article{abc2vec2025,
title={ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music},
author={[Your Name]},
journal={[Journal Name]},
year={2025},
note={Model: https://huggingface.co/pianistprogrammer/abc2vec-model}
}
```
## License
MIT License
## Acknowledgements
We thank The Session community for curating and maintaining the Irish traditional music archive that made this work possible.
## Model Card Authors
[Your Name]
## Contact
For questions or issues:
- GitHub: https://github.com/pianistprogrammer/ABC2VEC
- HuggingFace: https://huggingface.co/pianistprogrammer