---
license: mit
tags:
- audio
- music
- midi
- drum-transcription
- guitar-transcription
- clone-hero
- yarg
library_name: pytorch
---
# STRUM: Spectral Transcription & Rhythm Understanding Model
End-to-end pipeline that turns a song (`.wav` / `.mp3` / `.ogg`) into a fully
playable Clone Hero / YARG chart package: drums, guitar, bass, vocals (with
lyrics), and keys.
Source: <https://github.com/opria123/strum>
## What's in this repo
| Folder | What it is | Used by |
|--------|------------|---------|
| `drums/drums_v14/` | TwoStageDrumsCRNN onset detector (mel input, 22050 Hz) | `batch_infer_hybrid.py` Stage 1 |
| `drums/drums_mc_onset/` | Multi-class onset head fine-tuned on V14 backbone | Stage-1 alt head |
| `drums/drums_phase3/` | Phase-3 multi-class rescue model | Late-stage rescue / reclassify |
| `drums/drums_cymbal_onset/` | Cymbal-specialist onset head | Cymbal-specific rescue |
| `drums/tom_refinement_demucs/` | Tom vs. cymbal CNN running on Demucs drum stem | Tom/cymbal disambiguation |
| `drums_classifier_ensemble/` | 6-model OnsetClassifier ensemble (V2, V4, V6, V12c, V15, V16) + V17 | Per-onset 8-lane classification |
| `guitar/guitar_v2_onset/` | Guitar onset CRNN (Event F1 0.81) | Hybrid guitar pipeline |
| `guitar/fret_mapper_v4.pt` | Pitch → 5-fret mapper (replaces librosa rule mapper) | Hybrid guitar pipeline |
| `section_classifier/` | Verse/chorus/bridge section labeler | Chart sections |
## Performance
Held-out test set (from 3,299 human-authored Pro Drum charts):
| Component | Metric | Score |
|-----------|--------|-------|
| Drums onset detection (V14) | Frame F1 | 93.9% |
| Drums lane classification (6-ensemble) | Per-onset F1 | 85.2% |
End-to-end vs ground-truth Clone Hero / YARG charts on an **in-envelope
benchmark** of 29 songs sampled from a 3,299-song held-out pool. Songs were
pre-screened with a single audio-feature gate (median Demucs `htdemucs_6s`
drum-stem RMS ≥ 0.018, 1 s windows at 22050 Hz mono). Eval is Expert
difficulty, ±100 ms tolerance, with a per-song global offset search
(±200 ms in 10 ms steps).
| Instrument | F1 | Precision | Recall |
|------------|-------|-----------|--------|
| Drums | 83.8% | 82.4% | 85.4% |
| Guitar | 65.1% | 74.5% | 57.8% |
| Bass | 69.4% | 65.8% | 73.4% |
| Vocals | 53.9% | 63.2% | 47.0% |
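As a rough illustration of the scoring protocol described above (a ±100 ms tolerance combined with a per-song global offset search over ±200 ms in 10 ms steps), here is a minimal sketch. It is not the project's harness: the function names, the greedy one-to-one matching rule, and the tie-breaking (first best offset wins) are assumptions made for this example.

```python
def count_matches(ref, est, tol=0.100):
    """Greedy one-to-one matching of onset times (seconds) within +/- tol."""
    ref, est = sorted(ref), sorted(est)
    i = j = hits = 0
    eps = 1e-9  # guards against float rounding exactly at the tolerance edge
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tol + eps:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early for this and every later reference
        else:
            i += 1  # no remaining estimate is close enough to this reference
    return hits


def precision_recall_f1(ref, est, tol=0.100):
    hits = count_matches(ref, est, tol)
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_offset_score(ref, est, tol=0.100, max_offset=0.200, step=0.010):
    """Try every global offset in [-max_offset, +max_offset] and keep the best F1.

    Returns (offset, precision, recall, f1). Ties keep the first offset tried.
    """
    n_steps = int(round(max_offset / step))
    best = None
    for k in range(-n_steps, n_steps + 1):
        offset = k * step
        shifted = [t + offset for t in est]
        p, r, f1 = precision_recall_f1(ref, shifted, tol)
        if best is None or f1 > best[3]:
            best = (offset, p, r, f1)
    return best
```

For example, `best_offset_score([1.0, 2.0, 3.0, 4.0], [1.15, 2.15, 3.15, 5.0])` finds no matches at zero offset but recovers three of the four notes once the estimates are shifted earlier, which is the kind of constant chart-vs-audio misalignment the offset search is meant to absorb.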
See the source repo's `benchmark_results.json` for per-song breakdown and
`scripts/eval_benchmark.py` for the harness.
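The pre-screening gate mentioned above (median drum-stem RMS of at least 0.018 over 1 s windows at 22050 Hz mono) could be computed along these lines. This is a sketch under assumptions: the drum stem is taken to be already separated and resampled into a flat sequence of float samples, windows are assumed to be non-overlapping with any trailing partial window dropped, and the helper names are hypothetical. Pure standard-library Python is used to keep the example dependency-free; a NumPy version would be the usual choice for real audio.

```python
import math
from statistics import median


def median_window_rms(samples, sr=22050, window_s=1.0):
    """Median RMS over consecutive non-overlapping windows of a mono signal."""
    win = int(sr * window_s)
    rms_values = []
    # Assumption: a trailing partial window is ignored.
    for start in range(0, len(samples) - win + 1, win):
        frame = samples[start:start + win]
        rms_values.append(math.sqrt(sum(x * x for x in frame) / win))
    return median(rms_values) if rms_values else 0.0


def passes_gate(drum_stem, threshold=0.018, sr=22050):
    """True if the drum stem is loud enough to count as in-envelope."""
    return median_window_rms(drum_stem, sr=sr) >= threshold
```

Because the gate uses the median rather than the mean, a song with a short loud drum fill but mostly silent drum stem would still be screened out.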
## Usage
The checkpoints are loaded by the STRUM pipeline scripts. Clone the repo and
download the checkpoints into `checkpoints/` preserving the layout:
```bash
git clone https://github.com/opria123/strum
cd strum
python -m venv .venv && source .venv/bin/activate
pip install -e .
# Pull weights from the Hub
huggingface-cli download opria123/strum --local-dir checkpoints/ \
--local-dir-use-symlinks False
# Run the full pipeline on a folder of audio files
python scripts/batch_pipeline.py /path/to/songs /path/to/charts
```
The pipeline expects this layout (mirrors the `drums/` and `guitar/`
subfolders here, just under `checkpoints/`):
```
checkpoints/
├── drums_v14/best.pt
├── drums_mc_onset/best.pt
├── drums_phase3/best.pt
├── drums_cymbal_onset/best_union_f1.pt
├── tom_refinement_demucs/best.pt
├── onset_classifier/best_f1.pt
├── onset_classifier_v4/best_f1.pt
├── onset_classifier_v6/best_f1.pt
├── onset_classifier_v12_clean/best_f1.pt
├── onset_classifier_v12c_community/best_f1.pt
├── onset_classifier_v15/best_f1.pt
├── onset_classifier_v16/best_f1.pt
├── onset_classifier_v17/best_f1.pt
├── guitar_v2/guitar_v2_onset/best.pt
├── fret_mapper_v4.pt
└── section_classifier/best.pt
```
A small reorganisation script `scripts/sync_from_hf.sh` in the source repo
handles the `drums/` → flat `checkpoints/` mapping.
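Before running the pipeline it can be useful to confirm that the expected layout is in place. The following is a small sketch, not part of the STRUM repo: `EXPECTED_CHECKPOINTS` simply restates the file list from the layout above, and `missing_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

# Relative paths copied from the layout listed in this README.
EXPECTED_CHECKPOINTS = [
    "drums_v14/best.pt",
    "drums_mc_onset/best.pt",
    "drums_phase3/best.pt",
    "drums_cymbal_onset/best_union_f1.pt",
    "tom_refinement_demucs/best.pt",
    "onset_classifier/best_f1.pt",
    "onset_classifier_v4/best_f1.pt",
    "onset_classifier_v6/best_f1.pt",
    "onset_classifier_v12_clean/best_f1.pt",
    "onset_classifier_v12c_community/best_f1.pt",
    "onset_classifier_v15/best_f1.pt",
    "onset_classifier_v16/best_f1.pt",
    "onset_classifier_v17/best_f1.pt",
    "guitar_v2/guitar_v2_onset/best.pt",
    "fret_mapper_v4.pt",
    "section_classifier/best.pt",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoints found.")
```

Run from the repository root, this prints any files that still need to be downloaded or moved into place.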
## License
MIT. See the source repository for full attribution of the underlying
training data (Clone Hero / YARG community charters) and dependencies
(Demucs v4, librosa, OpenAI Whisper, Spotify Basic Pitch).