---
license: mit
tags:
- audio
- music
- midi
- drum-transcription
- guitar-transcription
- clone-hero
- yarg
library_name: pytorch
---

# STRUM: Spectral Transcription & Rhythm Understanding Model

End-to-end pipeline that turns a song (`.wav` / `.mp3` / `.ogg`) into a fully
playable Clone Hero / YARG chart package: drums, guitar, bass, vocals (with
lyrics), and keys.

Source: <https://github.com/opria123/strum>

## What's in this repo

| Folder | What it is | Used by |
|--------|------------|---------|
| `drums/drums_v14/` | TwoStageDrumsCRNN onset detector (mel input, 22050 Hz) | `batch_infer_hybrid.py` Stage 1 |
| `drums/drums_mc_onset/` | Multi-class onset head fine-tuned on V14 backbone | Stage-1 alt head |
| `drums/drums_phase3/` | Phase-3 multi-class rescue model | Late-stage rescue / reclassify |
| `drums/drums_cymbal_onset/` | Cymbal-specialist onset head | Cymbal-specific rescue |
| `drums/tom_refinement_demucs/` | Tom vs. cymbal CNN running on Demucs drum stem | Tom/cymbal disambiguation |
| `drums_classifier_ensemble/` | 6-model OnsetClassifier ensemble (V2, V4, V6, V12c, V15, V16) + V17 | Per-onset 8-lane classification |
| `guitar/guitar_v2_onset/` | Guitar onset CRNN (Event F1 0.81) | Hybrid guitar pipeline |
| `guitar/fret_mapper_v4.pt` | Pitch → 5-fret mapper (replaces librosa rule mapper) | Hybrid guitar pipeline |
| `section_classifier/` | Verse/chorus/bridge section labeler | Chart sections |

## Performance

Held-out test set (from 3,299 human-authored Pro Drum charts):

| Component | Metric | Score |
|-----------|--------|-------|
| Drums onset detection (V14) | Frame F1 | 93.9% |
| Drums lane classification (6-ensemble) | Per-onset F1 | 85.2% |

End-to-end scores against ground-truth Clone Hero / YARG charts on an
**in-envelope benchmark** of 29 songs sampled from a 3,299-song held-out pool.
Songs were pre-screened with a single audio-feature gate: median Demucs
`htdemucs_6s` drum-stem RMS ≥ 0.018, computed over 1 s windows at 22050 Hz mono.
Evaluation uses Expert difficulty and a ±100 ms matching tolerance, with a
per-song global offset search (±200 ms in 10 ms steps).

| Instrument | F1 | Precision | Recall |
|------------|-------|-----------|--------|
| Drums | 83.8% | 82.4% | 85.4% |
| Guitar | 65.1% | 74.5% | 57.8% |
| Bass | 69.4% | 65.8% | 73.4% |
| Vocals | 53.9% | 63.2% | 47.0% |

See the source repo's `benchmark_results.json` for per-song breakdown and
`scripts/eval_benchmark.py` for the harness.

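`scripts/eval_benchmark.py` in the source repo is the actual harness. The sketch below only illustrates the benchmark protocol described above, and it makes several assumptions that this card does not state: the RMS gate uses non-overlapping windows with the trailing partial window dropped, onsets are matched by time only (no lane or fret check), matching is greedy and one-to-one, and ties in the offset search go to the shift closest to zero. Demucs separation and chart parsing are left out; all function names are hypothetical.

```python
import numpy as np

# Values stated in this card; everything else below is an illustrative assumption.
SR = 22050                     # mono sample rate used for the gate
GATE_RMS = 0.018               # median drum-stem RMS threshold
TOL_MS = 100                   # ±100 ms hit tolerance
SEARCH_MS, STEP_MS = 200, 10   # global offset search: ±200 ms in 10 ms steps


def median_window_rms(stem: np.ndarray, sr: int = SR) -> float:
    """Median RMS over non-overlapping 1 s windows (trailing partial window dropped)."""
    n = len(stem) // sr
    if n == 0:
        raise ValueError("stem is shorter than one window")
    frames = stem[: n * sr].astype(np.float64).reshape(n, sr)
    return float(np.median(np.sqrt(np.mean(frames ** 2, axis=1))))


def passes_gate(drum_stem: np.ndarray) -> bool:
    """drum_stem: mono float waveform at SR (e.g. a Demucs drums stem; separation not shown)."""
    return median_window_rms(drum_stem) >= GATE_RMS


def harmonic_f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def count_matches(ref, est, tol):
    """Greedy one-to-one matching of two sorted onset-time lists within ±tol seconds."""
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1
        else:
            i += 1
    return hits


def score_song(ref_times, est_times):
    """Best (f1, precision, recall, offset_ms) over the global offset search."""
    ref, est = sorted(ref_times), sorted(est_times)
    best = (0.0, 0.0, 0.0, 0)
    # Visit offsets nearest to zero first so that ties keep the smallest shift.
    for off in sorted(range(-SEARCH_MS, SEARCH_MS + 1, STEP_MS), key=abs):
        hits = count_matches(ref, [t + off / 1000 for t in est], TOL_MS / 1000)
        p = hits / len(est) if est else 0.0
        r = hits / len(ref) if ref else 0.0
        f = harmonic_f1(p, r)
        if f > best[0]:
            best = (f, p, r, off)
    return best
```

Applying `harmonic_f1` to the Precision and Recall columns of the end-to-end table reproduces the F1 column to within 0.1 percentage points, the rounding of those columns (for example, guitar: 0.745 and 0.578 give 65.1%).
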
## Usage

The checkpoints are loaded by the STRUM pipeline scripts. Clone the source repo
and download the checkpoints into `checkpoints/`, keeping this repo's folder
layout:

```bash
git clone https://github.com/opria123/strum
cd strum
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Pull weights from the Hub
# (--local-dir-use-symlinks is deprecated and ignored since huggingface_hub 0.23)
huggingface-cli download opria123/strum --local-dir checkpoints/ \
  --local-dir-use-symlinks False

# Run the full pipeline on a folder of audio files
python scripts/batch_pipeline.py /path/to/songs /path/to/charts
```

The pipeline expects the following layout, which mirrors the `drums/` and
`guitar/` subfolders of this repo but sits directly under `checkpoints/`:

```
checkpoints/
├── drums_v14/best.pt
├── drums_mc_onset/best.pt
├── drums_phase3/best.pt
├── drums_cymbal_onset/best_union_f1.pt
├── tom_refinement_demucs/best.pt
├── onset_classifier/best_f1.pt
├── onset_classifier_v4/best_f1.pt
├── onset_classifier_v6/best_f1.pt
├── onset_classifier_v12_clean/best_f1.pt
├── onset_classifier_v12c_community/best_f1.pt
├── onset_classifier_v15/best_f1.pt
├── onset_classifier_v16/best_f1.pt
├── onset_classifier_v17/best_f1.pt
├── guitar_v2/guitar_v2_onset/best.pt
├── fret_mapper_v4.pt
└── section_classifier/best.pt
```

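Before running the pipeline, it can help to confirm that every file in the listing is in place. A minimal sketch of such a check follows; `missing_checkpoints` is a hypothetical helper, not part of the source repo, and it checks exactly the relative paths listed in the layout.

```python
from pathlib import Path

# Relative paths copied from the expected checkpoints/ layout.
EXPECTED = [
    "drums_v14/best.pt",
    "drums_mc_onset/best.pt",
    "drums_phase3/best.pt",
    "drums_cymbal_onset/best_union_f1.pt",
    "tom_refinement_demucs/best.pt",
    "onset_classifier/best_f1.pt",
    "onset_classifier_v4/best_f1.pt",
    "onset_classifier_v6/best_f1.pt",
    "onset_classifier_v12_clean/best_f1.pt",
    "onset_classifier_v12c_community/best_f1.pt",
    "onset_classifier_v15/best_f1.pt",
    "onset_classifier_v16/best_f1.pt",
    "onset_classifier_v17/best_f1.pt",
    "guitar_v2/guitar_v2_onset/best.pt",
    "fret_mapper_v4.pt",
    "section_classifier/best.pt",
]


def missing_checkpoints(root: str = "checkpoints") -> list[str]:
    """Return the expected checkpoint files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("layout OK" if not missing else "missing:\n  " + "\n  ".join(missing))
```

Run from the repo root; an empty result means every listed file exists, though it says nothing about whether the files are the right versions.
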
A small reorganisation script in the source repo, `scripts/sync_from_hf.sh`,
handles the mapping from this repo's `drums/` layout to the flat `checkpoints/`
layout.

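The source repo's script is the reference. To show what the reorganisation amounts to, here is a hypothetical Python sketch that covers only the moves that can be read off the two listings in this card. It is not a port of `sync_from_hf.sh`, and it leaves out `drums_classifier_ensemble/`, whose mapping onto the `onset_classifier*/` folders is not spelled out here. The `flatten` function and the `MOVES` table are illustrative assumptions.

```python
import shutil
from pathlib import Path

# (hub path, pipeline path) relative to the download root. A hypothetical
# reconstruction from this card's two listings, NOT the contents of sync_from_hf.sh.
MOVES = [
    ("drums/drums_v14", "drums_v14"),
    ("drums/drums_mc_onset", "drums_mc_onset"),
    ("drums/drums_phase3", "drums_phase3"),
    ("drums/drums_cymbal_onset", "drums_cymbal_onset"),
    ("drums/tom_refinement_demucs", "tom_refinement_demucs"),
    ("guitar/guitar_v2_onset", "guitar_v2/guitar_v2_onset"),
    ("guitar/fret_mapper_v4.pt", "fret_mapper_v4.pt"),
    # drums_classifier_ensemble/ -> onset_classifier*/ is omitted (file names unknown).
    # section_classifier/ already sits where the pipeline expects it.
]


def flatten(root: str = "checkpoints") -> list[str]:
    """Move known hub folders into the flat layout; return the destinations moved."""
    base = Path(root)
    moved = []
    for src_rel, dst_rel in MOVES:
        src, dst = base / src_rel, base / dst_rel
        if not src.exists() or dst.exists():
            continue  # nothing to move, or never overwrite an existing target
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst_rel)
    return moved
```

Skipping existing targets matters because `shutil.move` places a directory *inside* an already existing destination directory instead of replacing it.
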
## License

MIT. See the source repository for full attribution of the underlying
training data (Clone Hero / YARG community charters) and dependencies
(Demucs v4, librosa, OpenAI Whisper, Spotify Basic Pitch).