---
license: mit
tags:
- audio
- music
- midi
- drum-transcription
- guitar-transcription
- clone-hero
- yarg
library_name: pytorch
---

# STRUM: Spectral Transcription & Rhythm Understanding Model

End-to-end pipeline that turns a song (`.wav` / `.mp3` / `.ogg`) into a fully playable Clone Hero / YARG chart package: drums, guitar, bass, vocals (with lyrics), and keys.

Source: <https://github.com/opria123/strum>

## What's in this repo

| Folder | What it is | Used by |
|--------|------------|---------|
| `drums/drums_v14/` | TwoStageDrumsCRNN onset detector (mel input, 22050 Hz) | `batch_infer_hybrid.py` Stage 1 |
| `drums/drums_mc_onset/` | Multi-class onset head fine-tuned on V14 backbone | Stage-1 alt head |
| `drums/drums_phase3/` | Phase-3 multi-class rescue model | Late-stage rescue / reclassify |
| `drums/drums_cymbal_onset/` | Cymbal-specialist onset head | Cymbal-specific rescue |
| `drums/tom_refinement_demucs/` | Tom vs. cymbal CNN running on Demucs drum stem | Tom/cymbal disambiguation |
| `drums_classifier_ensemble/` | 6-model OnsetClassifier ensemble (V2, V4, V6, V12c, V15, V16) + V17 | Per-onset 8-lane classification |
| `guitar/guitar_v2_onset/` | Guitar onset CRNN (Event F1 0.81) | Hybrid guitar pipeline |
| `guitar/fret_mapper_v4.pt` | Pitch → 5-fret mapper (replaces librosa rule mapper) | Hybrid guitar pipeline |
| `section_classifier/` | Verse/chorus/bridge section labeler | Chart sections |

## Performance

Held-out test set (from 3,299 human-authored Pro Drum charts):

| Component | Metric | Score |
|-----------|--------|-------|
| Drums onset detection (V14) | Frame F1 | 93.9% |
| Drums lane classification (6-ensemble) | Per-onset F1 | 85.2% |

End-to-end results against ground-truth Clone Hero / YARG charts on an **in-envelope benchmark** of 29 songs sampled from a 3,299-song held-out pool. Songs were pre-screened with a single audio-feature gate (median Demucs `htdemucs_6s` drum-stem RMS ≥ 0.018, 1 s windows at 22050 Hz mono).
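
The pre-screening gate described above (median drum-stem RMS ≥ 0.018 over 1 s windows at 22050 Hz mono) could be computed roughly as follows. This is a minimal sketch, not the project's actual code. It assumes the Demucs `htdemucs_6s` drum stem has already been separated, downmixed to mono, and resampled, and that windows are non-overlapping with any trailing partial window dropped. The function names `window_rms` and `passes_drum_gate` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

SAMPLE_RATE = 22050     # Hz, mono (from the model card)
WINDOW_SECONDS = 1.0    # 1 s windows (from the model card)
RMS_THRESHOLD = 0.018   # median-RMS gate (from the model card)


def window_rms(samples: Sequence[float], sr: int = SAMPLE_RATE,
               window_s: float = WINDOW_SECONDS) -> List[float]:
    """RMS of each full window.

    Assumption: windows do not overlap and a trailing partial
    window is dropped; the model card does not specify either.
    """
    size = int(sr * window_s)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    values = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        values.append(math.sqrt(sum(x * x for x in window) / size))
    return values


def passes_drum_gate(samples: Sequence[float], sr: int = SAMPLE_RATE,
                     window_s: float = WINDOW_SECONDS,
                     threshold: float = RMS_THRESHOLD) -> bool:
    """True if the median windowed RMS of the drum stem meets the threshold."""
    rms = window_rms(samples, sr, window_s)
    if not rms:
        return False  # shorter than one window: treat as failing the gate
    return statistics.median(rms) >= threshold
```

In practice `samples` would be the separated drum stem for one song. The pure-Python loop keeps the sketch dependency-free; an array library would be faster on full-length tracks.
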
Evaluation uses Expert difficulty, a ±100 ms tolerance, and a per-song global offset search (±200 ms in 10 ms steps).

| Instrument | F1 | Precision | Recall |
|------------|-------|-----------|--------|
| Drums | 83.8% | 82.4% | 85.4% |
| Guitar | 65.1% | 74.5% | 57.8% |
| Bass | 69.4% | 65.8% | 73.4% |
| Vocals | 53.9% | 63.2% | 47.0% |

See the source repo's `benchmark_results.json` for the per-song breakdown and `scripts/eval_benchmark.py` for the harness.

## Usage

The checkpoints are loaded by the STRUM pipeline scripts. Clone the repo and download the checkpoints into `checkpoints/`, preserving the layout:

```bash
git clone https://github.com/opria123/strum
cd strum
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Pull weights from the Hub
huggingface-cli download opria123/strum --local-dir checkpoints/ \
    --local-dir-use-symlinks False

# Run the full pipeline on a folder of audio files
python scripts/batch_pipeline.py /path/to/songs /path/to/charts
```

The pipeline expects this layout (mirrors the `drums/` and `guitar/` subfolders here, just under `checkpoints/`):

```
checkpoints/
├── drums_v14/best.pt
├── drums_mc_onset/best.pt
├── drums_phase3/best.pt
├── drums_cymbal_onset/best_union_f1.pt
├── tom_refinement_demucs/best.pt
├── onset_classifier/best_f1.pt
├── onset_classifier_v4/best_f1.pt
├── onset_classifier_v6/best_f1.pt
├── onset_classifier_v12_clean/best_f1.pt
├── onset_classifier_v12c_community/best_f1.pt
├── onset_classifier_v15/best_f1.pt
├── onset_classifier_v16/best_f1.pt
├── onset_classifier_v17/best_f1.pt
├── guitar_v2/guitar_v2_onset/best.pt
├── fret_mapper_v4.pt
└── section_classifier/best.pt
```

A small reorganisation script, `scripts/sync_from_hf.sh` in the source repo, handles the `drums/` → flat `checkpoints/` mapping.

## License

MIT. See the source repository for full attribution of the underlying training data (Clone Hero / YARG community charters) and dependencies (Demucs v4, librosa, OpenAI Whisper, Spotify Basic Pitch).
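
## Appendix: scoring sketch

The end-to-end protocol in the Performance section (±100 ms tolerance with a per-song global offset search over ±200 ms in 10 ms steps) can be illustrated with the sketch below. It is not the harness in `scripts/eval_benchmark.py`, whose internals are not described here. It assumes onsets are compared by time only (no lane or pitch check), that matching is greedy and one-to-one, that the offset is added to the predicted times, and that ties between offsets go to the smallest absolute shift. The names `count_matches`, `score`, and `best_offset_score` are hypothetical.

```python
from typing import Dict, Sequence

TOLERANCE_S = 0.100  # ±100 ms (from the model card)
# ±200 ms in 10 ms steps, tried from the smallest shift outward
# (assumption: ties are resolved in favour of the smaller shift).
OFFSETS_MS = sorted(range(-200, 201, 10), key=abs)


def count_matches(pred: Sequence[float], truth: Sequence[float],
                  tol: float = TOLERANCE_S) -> int:
    """Greedy one-to-one matching of onset times (seconds) within ±tol."""
    pred, truth = sorted(pred), sorted(truth)
    i = j = hits = 0
    while i < len(pred) and j < len(truth):
        delta = pred[i] - truth[j]
        if abs(delta) <= tol:
            hits += 1
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # prediction too early for this truth onset
        else:
            j += 1  # truth onset has no prediction close enough
    return hits


def score(pred: Sequence[float], truth: Sequence[float],
          tol: float = TOLERANCE_S) -> Dict[str, float]:
    """Precision, recall, and F1 for one alignment."""
    hits = count_matches(pred, truth, tol)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def best_offset_score(pred: Sequence[float], truth: Sequence[float],
                      tol: float = TOLERANCE_S) -> Dict[str, float]:
    """Try every global offset and keep the one with the highest F1."""
    best = None
    for offset_ms in OFFSETS_MS:
        offset_s = offset_ms / 1000.0
        shifted = [t + offset_s for t in pred]
        result = score(shifted, truth, tol)
        if best is None or result["f1"] > best["f1"]:
            best = dict(result, offset_s=offset_s)
    return best
```

Calling `best_offset_score(predicted_times, chart_times)` returns a dictionary with `precision`, `recall`, `f1`, and the chosen `offset_s`. The offset search absorbs a constant timing skew between the audio and the chart, so it does not reward a model for drifting within a song.
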