STRUM: Spectral Transcription & Rhythm Understanding Model
End-to-end pipeline that turns a song (.wav / .mp3 / .ogg) into a fully
playable Clone Hero / YARG chart package: drums, guitar, bass, vocals (with
lyrics), and keys.
Source: https://github.com/opria123/strum
What's in this repo
| Folder | What it is | Used by |
|---|---|---|
| drums/drums_v14/ | TwoStageDrumsCRNN onset detector (mel input, 22050 Hz) | batch_infer_hybrid.py Stage 1 |
| drums/drums_mc_onset/ | Multi-class onset head fine-tuned on V14 backbone | Stage-1 alt head |
| drums/drums_phase3/ | Phase-3 multi-class rescue model | Late-stage rescue / reclassify |
| drums/drums_cymbal_onset/ | Cymbal-specialist onset head | Cymbal-specific rescue |
| drums/tom_refinement_demucs/ | Tom vs. cymbal CNN running on Demucs drum stem | Tom/cymbal disambiguation |
| drums_classifier_ensemble/ | 6-model OnsetClassifier ensemble (V2, V4, V6, V12c, V15, V16) + V17 | Per-onset 8-lane classification |
| guitar/guitar_v2_onset/ | Guitar onset CRNN (Event F1 0.81) | Hybrid guitar pipeline |
| guitar/fret_mapper_v4.pt | Pitch → 5-fret mapper (replaces librosa rule mapper) | Hybrid guitar pipeline |
| section_classifier/ | Verse/chorus/bridge section labeler | Chart sections |
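The table does not say how the classifier outputs are combined. Purely as an illustration, the sketch below assumes each model emits one probability per lane (8 lanes) for an onset, and that the ensemble averages them and applies a fixed threshold. The function name `ensemble_lanes`, the averaging rule, and the 0.5 threshold are hypothetical and not taken from the STRUM code.

```python
def ensemble_lanes(per_model_probs, threshold=0.5):
    """Average per-lane probabilities across models, then threshold.

    per_model_probs: one list of lane probabilities per model (assumed layout).
    Returns (mean_probs, active_lanes).
    """
    if not per_model_probs:
        raise ValueError("need at least one model output")
    n_lanes = len(per_model_probs[0])
    if any(len(probs) != n_lanes for probs in per_model_probs):
        raise ValueError("all models must report the same number of lanes")

    n_models = len(per_model_probs)
    mean_probs = [
        sum(probs[lane] for probs in per_model_probs) / n_models
        for lane in range(n_lanes)
    ]
    # A lane counts as hit when its averaged probability reaches the threshold.
    active_lanes = [lane for lane, p in enumerate(mean_probs) if p >= threshold]
    return mean_probs, active_lanes


if __name__ == "__main__":
    # Three hypothetical models voting on one onset across 8 lanes.
    outputs = [
        [0.9, 0.1, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0],
        [0.8, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
        [0.2, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
    ]
    print(ensemble_lanes(outputs)[1])
```

Averaging is only one option; majority voting or learned weights would fit the same interface.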
Performance
Held-out test set (from 3,299 human-authored Pro Drum charts):
| Component | Metric | Score |
|---|---|---|
| Drums onset detection (V14) | Frame F1 | 93.9% |
| Drums lane classification (6-ensemble) | Per-onset F1 | 85.2% |
End-to-end vs ground-truth Clone Hero / YARG charts on an in-envelope
benchmark of 29 songs sampled from a 3,299-song held-out pool. Songs were
pre-screened with a single audio-feature gate (median Demucs htdemucs_6s
drum-stem RMS ≥ 0.018, 1 s windows at 22050 Hz mono). Eval is Expert
difficulty, ±100 ms tolerance, with a per-song global offset search
(±200 ms / 10 ms steps).
| Instrument | F1 | Precision | Recall |
|---|---|---|---|
| Drums | 83.8% | 82.4% | 85.4% |
| Guitar | 65.1% | 74.5% | 57.8% |
| Bass | 69.4% | 65.8% | 73.4% |
| Vocals | 53.9% | 63.2% | 47.0% |
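The pre-screening gate described above (median drum-stem RMS ≥ 0.018 over 1 s windows at 22050 Hz mono) could look roughly like the sketch below. It assumes the drum stem has already been separated and loaded as a list of mono float samples, that windows do not overlap, and that a trailing partial window is dropped. None of these details are stated in the README, and the function names are hypothetical.

```python
import math
from statistics import median


def windowed_rms(samples, sample_rate=22050, window_seconds=1.0):
    """RMS of each non-overlapping window; a trailing partial window is dropped."""
    window = int(sample_rate * window_seconds)
    if window <= 0:
        raise ValueError("window must cover at least one sample")
    values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        values.append(math.sqrt(sum(x * x for x in chunk) / window))
    return values


def passes_drum_gate(drum_stem, sample_rate=22050, threshold=0.018):
    """True when the median windowed RMS of the drum stem reaches the threshold."""
    rms = windowed_rms(drum_stem, sample_rate)
    return bool(rms) and median(rms) >= threshold
```

Using the median rather than the mean means a song with one loud fill and otherwise near-silent drums would still be screened out.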
See the source repo's benchmark_results.json for the per-song breakdown and
scripts/eval_benchmark.py for the harness.
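For readers who want a feel for the scoring protocol (±100 ms tolerance, global offset search over ±200 ms in 10 ms steps), here is a minimal sketch. It is not the code in scripts/eval_benchmark.py. It assumes greedy one-to-one matching of onset times, that the offset is applied to the predictions, that the best offset is the one maximizing F1, and that ties go to the smallest absolute offset. All function names are hypothetical.

```python
def match_count(reference, predicted, tolerance=0.100):
    """Greedy one-to-one matching of two sorted onset lists (times in seconds)."""
    i = j = hits = 0
    while i < len(reference) and j < len(predicted):
        delta = predicted[j] - reference[i]
        if abs(delta) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif delta < 0:
            j += 1  # prediction is too early for this and all later references
        else:
            i += 1  # reference is too early for this and all later predictions
    return hits


def prf(reference, predicted, tolerance=0.100):
    """Precision, recall and F1 for one song at a fixed alignment."""
    hits = match_count(sorted(reference), sorted(predicted), tolerance)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_offset_score(reference, predicted, max_offset_ms=200, step_ms=10,
                      tolerance=0.100):
    """Try each global offset and keep the one with the highest F1.

    Offsets are visited in order of increasing magnitude, so ties keep
    the smallest shift. Returns (offset_ms, precision, recall, f1).
    """
    offsets = sorted(range(-max_offset_ms, max_offset_ms + 1, step_ms), key=abs)
    best = None
    for offset_ms in offsets:
        shifted = [t + offset_ms / 1000.0 for t in predicted]
        precision, recall, f1 = prf(reference, shifted, tolerance)
        if best is None or f1 > best[3]:
            best = (offset_ms, precision, recall, f1)
    return best
```

The offset search exists because a chart and its audio may be globally misaligned by a constant amount; without it, a uniformly shifted but otherwise correct transcription would score close to zero.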
Usage
The checkpoints are loaded by the STRUM pipeline scripts. Clone the repo and
download the checkpoints into checkpoints/ preserving the layout:
git clone https://github.com/opria123/strum
cd strum
python -m venv .venv && source .venv/bin/activate
pip install -e .
# Pull weights from the Hub
huggingface-cli download opria123/strum --local-dir checkpoints/ \
--local-dir-use-symlinks False
# Run the full pipeline on a folder of audio files
python scripts/batch_pipeline.py /path/to/songs /path/to/charts
The pipeline expects this layout (mirrors the drums/ and guitar/
subfolders here, just under checkpoints/):
checkpoints/
├── drums_v14/best.pt
├── drums_mc_onset/best.pt
├── drums_phase3/best.pt
├── drums_cymbal_onset/best_union_f1.pt
├── tom_refinement_demucs/best.pt
├── onset_classifier/best_f1.pt
├── onset_classifier_v4/best_f1.pt
├── onset_classifier_v6/best_f1.pt
├── onset_classifier_v12_clean/best_f1.pt
├── onset_classifier_v12c_community/best_f1.pt
├── onset_classifier_v15/best_f1.pt
├── onset_classifier_v16/best_f1.pt
├── onset_classifier_v17/best_f1.pt
├── guitar_v2/guitar_v2_onset/best.pt
├── fret_mapper_v4.pt
└── section_classifier/best.pt
A small reorganisation script scripts/sync_from_hf.sh in the source repo
handles the drums/ → flat checkpoints/ mapping.
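Before running the pipeline, it can help to confirm that the flat layout listed above is actually in place. The helper below is a small sketch that is not part of the STRUM repo: the file list is copied from the layout in this README, while `missing_checkpoints` and `EXPECTED_CHECKPOINTS` are hypothetical names.

```python
from pathlib import Path

# Relative paths copied from the expected layout in this README.
EXPECTED_CHECKPOINTS = [
    "drums_v14/best.pt",
    "drums_mc_onset/best.pt",
    "drums_phase3/best.pt",
    "drums_cymbal_onset/best_union_f1.pt",
    "tom_refinement_demucs/best.pt",
    "onset_classifier/best_f1.pt",
    "onset_classifier_v4/best_f1.pt",
    "onset_classifier_v6/best_f1.pt",
    "onset_classifier_v12_clean/best_f1.pt",
    "onset_classifier_v12c_community/best_f1.pt",
    "onset_classifier_v15/best_f1.pt",
    "onset_classifier_v16/best_f1.pt",
    "onset_classifier_v17/best_f1.pt",
    "guitar_v2/guitar_v2_onset/best.pt",
    "fret_mapper_v4.pt",
    "section_classifier/best.pt",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint paths that are not files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("missing checkpoints:")
        for rel in missing:
            print("  " + rel)
    else:
        print("all checkpoints present")
```

If anything is reported missing right after the download, the files are probably still under the Hub's drums/ and guitar/ subfolders and the sync script has not been run yet.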
License
MIT. See the source repository for full attribution of the underlying training data (Clone Hero / YARG community charters) and dependencies (Demucs v4, librosa, OpenAI Whisper, Spotify Basic Pitch).