---
license: mit
tags:
  - audio
  - music
  - midi
  - drum-transcription
  - guitar-transcription
  - clone-hero
  - yarg
library_name: pytorch
---

# STRUM — Spectral Transcription & Rhythm Understanding Model

End-to-end pipeline that turns a song (`.wav` / `.mp3` / `.ogg`) into a fully
playable Clone Hero / YARG chart package: drums, guitar, bass, vocals (with
lyrics), and keys.

Source: <https://github.com/opria123/strum>

## What's in this repo

| Folder | What it is | Used by |
|--------|------------|---------|
| `drums/drums_v14/`              | TwoStageDrumsCRNN onset detector (mel input, 22050 Hz) | `batch_infer_hybrid.py` Stage 1 |
| `drums/drums_mc_onset/`         | Multi-class onset head fine-tuned on V14 backbone | Stage-1 alt head |
| `drums/drums_phase3/`           | Phase-3 multi-class rescue model | Late-stage rescue / reclassify |
| `drums/drums_cymbal_onset/`     | Cymbal-specialist onset head | Cymbal-specific rescue |
| `drums/tom_refinement_demucs/`  | Tom vs. cymbal CNN running on Demucs drum stem | Tom/cymbal disambiguation |
| `drums_classifier_ensemble/`    | 6-model OnsetClassifier ensemble (V2, V4, V6, V12c, V15, V16) + V17 | Per-onset 8-lane classification |
| `guitar/guitar_v2_onset/`       | Guitar onset CRNN (Event F1 0.81) | Hybrid guitar pipeline |
| `guitar/fret_mapper_v4.pt`      | Pitch → 5-fret mapper (replaces librosa rule mapper) | Hybrid guitar pipeline |
| `section_classifier/`           | Verse/chorus/bridge section labeler | Chart sections |

## Performance

Component-level scores on a held-out test set drawn from 3,299 human-authored Pro Drum charts:

| Component | Metric | Score |
|-----------|--------|-------|
| Drums onset detection (V14)            | Frame F1     | 93.9% |
| Drums lane classification (6-ensemble) | Per-onset F1 | 85.2% |

End-to-end scores against ground-truth Clone Hero / YARG charts, measured on an
**in-envelope benchmark** of 29 songs sampled from a 3,299-song held-out pool.
Songs were pre-screened with a single audio-feature gate: median Demucs
`htdemucs_6s` drum-stem RMS ≥ 0.018, computed over 1 s windows at 22050 Hz mono.
Evaluation uses Expert difficulty, a ±100 ms tolerance, and a per-song global
offset search (±200 ms in 10 ms steps).

| Instrument | F1    | Precision | Recall |
|------------|-------|-----------|--------|
| Drums      | 83.8% | 82.4%     | 85.4%  |
| Guitar     | 65.1% | 74.5%     | 57.8%  |
| Bass       | 69.4% | 65.8%     | 73.4%  |
| Vocals     | 53.9% | 63.2%     | 47.0%  |
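
The pre-screening gate used for this benchmark can be approximated with a few lines of NumPy. This is a minimal sketch, not the project's code: it assumes the drum stem has already been separated with Demucs and resampled to 22050 Hz mono, that the 1 s windows are non-overlapping, and that a trailing partial window is dropped. The names `median_window_rms` and `passes_gate` are hypothetical.

```python
import numpy as np

SR = 22050        # sample rate stated in this model card
RMS_GATE = 0.018  # threshold stated in this model card


def median_window_rms(stem, sr=SR, window_s=1.0):
    """Median RMS over non-overlapping windows of a mono signal.

    Assumption: windows do not overlap and a trailing partial
    window is ignored. The real gate may differ.
    """
    stem = np.asarray(stem, dtype=np.float64)
    win = int(sr * window_s)
    n = len(stem) // win
    if n == 0:
        # Shorter than one window: fall back to the RMS of what is there.
        return float(np.sqrt(np.mean(stem ** 2))) if len(stem) else 0.0
    frames = stem[: n * win].reshape(n, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.median(rms))


def passes_gate(stem, sr=SR, threshold=RMS_GATE):
    """True if the drum stem is loud enough to count as in-envelope."""
    return median_window_rms(stem, sr) >= threshold
```

Using the median rather than the mean means a song with long drum-free passages is judged by its typical window, not by a few loud fills.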

See the source repo's `benchmark_results.json` for the per-song breakdown and
`scripts/eval_benchmark.py` for the evaluation harness.
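
To make the scoring protocol concrete, the sketch below shows one plausible way to compute precision, recall, and F1 with a ±100 ms tolerance and a ±200 ms global offset search in 10 ms steps. It is an illustration under assumptions, not a copy of `scripts/eval_benchmark.py`: it matches on timing only (lanes and frets are ignored), uses greedy one-to-one matching, shifts the predictions rather than the reference, and breaks ties in favour of the smallest absolute shift. All function names are hypothetical.

```python
def match_onsets(pred, ref, tol=0.100):
    """Count one-to-one matches between onset times (seconds) within ±tol."""
    pred, ref = sorted(pred), sorted(ref)
    i = j = tp = 0
    while i < len(pred) and j < len(ref):
        d = pred[i] - ref[j]
        if abs(d) <= tol:
            tp += 1
            i += 1
            j += 1
        elif d < 0:
            i += 1  # prediction too early to match any remaining reference
        else:
            j += 1  # reference too early to match any remaining prediction
    return tp


def prf(pred, ref, tol=0.100):
    """Return (precision, recall, F1) for timing-only onset matching."""
    tp = match_onsets(pred, ref, tol)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def best_global_offset(pred, ref, max_shift=0.200, step=0.010, tol=0.100):
    """Search one per-song shift (applied to pred) that maximises F1."""
    n = round(max_shift / step)
    best_shift, best_f1 = 0.0, -1.0
    # Visit shifts in order of increasing |shift| so ties keep the smallest.
    for k in sorted(range(-n, n + 1), key=abs):
        shift = k * step
        f1 = prf([t + shift for t in pred], ref, tol)[2]
        if f1 > best_f1:
            best_shift, best_f1 = shift, f1
    return best_shift, best_f1
```

The offset search compensates for a constant timing difference between the audio and the reference chart, so a song is not penalised for a uniform sync error.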

## Usage

The checkpoints are loaded by the STRUM pipeline scripts. Clone the repo and
download the checkpoints into `checkpoints/`, preserving the layout:

```bash
git clone https://github.com/opria123/strum
cd strum
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Pull weights from the Hub
huggingface-cli download opria123/strum --local-dir checkpoints/ \
    --local-dir-use-symlinks False

# Run the full pipeline on a folder of audio files
python scripts/batch_pipeline.py /path/to/songs /path/to/charts
```

The pipeline expects the following layout under `checkpoints/`. It mirrors the
`drums/` and `guitar/` subfolders of this repo, placed directly under
`checkpoints/`:

```
checkpoints/
├── drums_v14/best.pt
├── drums_mc_onset/best.pt
├── drums_phase3/best.pt
├── drums_cymbal_onset/best_union_f1.pt
├── tom_refinement_demucs/best.pt
├── onset_classifier/best_f1.pt
├── onset_classifier_v4/best_f1.pt
├── onset_classifier_v6/best_f1.pt
├── onset_classifier_v12_clean/best_f1.pt
├── onset_classifier_v12c_community/best_f1.pt
├── onset_classifier_v15/best_f1.pt
├── onset_classifier_v16/best_f1.pt
├── onset_classifier_v17/best_f1.pt
├── guitar_v2/guitar_v2_onset/best.pt
├── fret_mapper_v4.pt
└── section_classifier/best.pt
```

A small reorganisation script in the source repo, `scripts/sync_from_hf.sh`,
handles the mapping from the `drums/` folders here to the flat `checkpoints/`
layout.
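
Before running the pipeline, it can be useful to confirm that every file in the expected layout is present. The following is a minimal sketch of such a check. The path list is copied from the layout shown above, while `missing_checkpoints` is a hypothetical helper that is not part of STRUM.

```python
from pathlib import Path

# Relative paths copied from the expected layout in this model card.
EXPECTED_CHECKPOINTS = [
    "drums_v14/best.pt",
    "drums_mc_onset/best.pt",
    "drums_phase3/best.pt",
    "drums_cymbal_onset/best_union_f1.pt",
    "tom_refinement_demucs/best.pt",
    "onset_classifier/best_f1.pt",
    "onset_classifier_v4/best_f1.pt",
    "onset_classifier_v6/best_f1.pt",
    "onset_classifier_v12_clean/best_f1.pt",
    "onset_classifier_v12c_community/best_f1.pt",
    "onset_classifier_v15/best_f1.pt",
    "onset_classifier_v16/best_f1.pt",
    "onset_classifier_v17/best_f1.pt",
    "guitar_v2/guitar_v2_onset/best.pt",
    "fret_mapper_v4.pt",
    "section_classifier/best.pt",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint paths that are not files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoints found.")
```

If the check reports files under `drums_*` as missing right after a Hub download, the weights are probably still in the nested `drums/` layout and the reorganisation script has not been run yet.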

## License

MIT. See the source repository for full attribution of the underlying
training data (Clone Hero / YARG community charters) and dependencies
(Demucs v4, librosa, OpenAI Whisper, Spotify Basic Pitch).