BebopNet — Weimar Jazz Database (500K steps)

Trained checkpoints for BebopNet (Transformer-XL for melodic jazz improvisation), retrained from scratch on the Weimar Jazz Database.

Used in the model-comparison pipeline of the diploma research project kudrmax/jazz-generation-research, alongside CMT and MINGUS.

Test metrics

Final evaluation on the canonical cross-model test split (40 files, wjazzd_split.json), held out from both training (gradient steps) and best-checkpoint selection:

| Model | Combined ppl | Pitch top-1 | Pitch top-3 | Pitch top-5 | Duration top-1 | Duration top-3 |
|---|---|---|---|---|---|---|
| `paper-default/` (this run) | 46.60 | 32.25% | 52.39% | 67.96% | 43.55% | 75.76% |
| Madaghiele 2021 (paper retrain) | 44.70 | n/a | n/a | n/a | n/a | n/a |

Validation metrics on split.json[eval] (43 files), the split used for best-checkpoint selection by val_loss:

| Run | Combined ppl | Pitch top-1 | Duration top-1 |
|---|---|---|---|
| `paper-default/` | 45.89 | 33.81% | 40.04% |

Test perplexity is within ~4% of the paper's retrained baseline (46.60 vs 44.70). This suggests the retraining pipeline reproduces the original setup.
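
For reference, here is a minimal sketch of the metric definitions, assuming the conventional ones. Perplexity is exp of the mean per-token negative log-likelihood (in nats). Top-k accuracy is the share of tokens whose true class is among the k highest-scoring predictions. This card does not specify how BebopNet's evaluation merges pitch and duration into "combined ppl", so the sketch does not reproduce that column. The helper names `top_k_accuracy` and `perplexity` are hypothetical. The last lines recompute the ~4% gap from the test table.

```python
import math


def top_k_accuracy(logits, targets, k):
    """Fraction of positions whose target index is among the k highest scores."""
    hits = 0
    for scores, target in zip(logits, targets):
        # Indices of the k largest scores in this row (stable sort: ties keep index order).
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in top
    return hits / len(targets)


def perplexity(nll_per_token):
    """exp(mean negative log-likelihood), with NLL measured in nats."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))


# Relative gap between this run's test ppl and the paper retrain.
gap = (46.60 - 44.70) / 44.70
print(f"relative gap: {gap:.1%}")
```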

Files

paper-default/
├── model.pt                       # final weights (step 500_000)
├── model_best.pt                  # best by val_loss — used for inference + final test eval
├── model_best_p_acc.pt            # best by pitch top-1 accuracy
├── model_best_d_acc.pt            # best by duration top-1 accuracy
├── args.json                      # model architecture (passed to MemTransformerLM)
├── converter_and_duration.pkl     # vocabulary; required to load any of the .pt files
├── train_model.yml                # exact training config used
├── optimizer.pt                   # Ranger optimizer state (resume only)
└── train_state.json               # resume metadata: train_step, best_val_loss, ...

model_best.pt is the checkpoint that GeneratorBebopnet and the original BebopNet authors' generate_from_xml.py expect to load. The model_best_p_acc.pt and model_best_d_acc.pt variants are saved automatically by the authors' train.py under different selection criteria and are kept for completeness.
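
The three "best" files differ only in which criterion picks the step. The sketch below illustrates that idea over a list of evaluation records. It is a hypothetical illustration, not the logic in the authors' train.py, which saves the files during training rather than selecting them afterwards. `EvalRecord` and `best_steps` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    step: int
    val_loss: float
    pitch_acc: float     # pitch top-1 accuracy on the eval split
    duration_acc: float  # duration top-1 accuracy on the eval split


def best_steps(records):
    """Map each 'best' checkpoint filename to the eval step it would hold."""
    return {
        # Lowest validation loss: the checkpoint used for inference and test eval.
        "model_best.pt": min(records, key=lambda r: r.val_loss).step,
        # Highest pitch top-1 accuracy.
        "model_best_p_acc.pt": max(records, key=lambda r: r.pitch_acc).step,
        # Highest duration top-1 accuracy.
        "model_best_d_acc.pt": max(records, key=lambda r: r.duration_acc).step,
    }
```

The three criteria need not agree on a single step, which is why separate files are kept.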

Training setup

Architecture: Transformer-XL (MemTransformerLM). 4 layers / 8 heads / d_model=400 / d_inner=1028 / mem_len=64 / tgt_len=64.
Embeddings: pitch=64, duration=64, offset=16.
Conditioning: each token is conditioned on the current chord (root, scale, chord pitches, kind); trained without the chord_bias flag.
Dataset: Weimar Jazz Database, split by wjazzd_split_prep.py as follows:
- split.json[train] (344 files) → train.pkl (gradient steps)
- split.json[eval] (43) → val.pkl (best-checkpoint selection)
- split.json[test] (40) → test_canonical/ (post-training evaluation only)
Training: 500 000 steps, batch 32, BPTT 64, Ranger + cosine LR, dropout 0.3, lr=1e-3.
Compute: ~3.9 hours on a single Colab A100 (14 062 sec).
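
The training budget above can be sanity-checked with a little arithmetic. This minimal sketch rests on two assumptions. First, each step consumes one batch of 32 segments of BPTT=64 target tokens; memory tokens from mem_len are reused and not counted. Second, "cosine LR" means plain cosine annealing from lr=1e-3 to zero over all 500K steps, with no warmup. The fork's actual scheduler settings may differ, and `cosine_lr` is a hypothetical helper.

```python
import math

STEPS = 500_000
BATCH = 32
BPTT = 64             # target tokens per segment (tgt_len)
WALL_SECONDS = 14_062
LR_MAX = 1e-3

# Target tokens processed over the run (repeats across epochs included).
tokens_seen = STEPS * BATCH * BPTT       # 1_024_000_000
steps_per_sec = STEPS / WALL_SECONDS     # ~35.6 steps/s on the A100
hours = WALL_SECONDS / 3600              # ~3.9 h


def cosine_lr(step, total=STEPS, lr_max=LR_MAX, lr_min=0.0):
    """Plain cosine annealing from lr_max to lr_min (assumed: no warmup)."""
    progress = min(step, total) / total
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


print(tokens_seen, round(steps_per_sec, 1), round(hours, 2))
print(cosine_lr(0), cosine_lr(STEPS // 2), cosine_lr(STEPS))
```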

Download

pip install -U huggingface_hub

# Final / best weights for inference
hf download maxkudryashov/bebopnet-1 \
  paper-default/model_best.pt \
  paper-default/converter_and_duration.pkl \
  paper-default/args.json \
  paper-default/train_model.yml \
  --local-dir result

# Resume checkpoints (optional — only if continuing training)
hf download maxkudryashov/bebopnet-1 \
  paper-default/optimizer.pt \
  paper-default/train_state.json \
  paper-default/model.pt \
  --local-dir result
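
The same files can also be fetched from Python. This is a minimal sketch using huggingface_hub's `snapshot_download`, with `allow_patterns` restricting the download to the listed paths, assuming huggingface_hub is installed. The helper `download` and the two list constants are hypothetical; the lists simply mirror the `hf download` commands above.

```python
# Same file selection as the `hf download` commands above.
INFERENCE_FILES = [
    "paper-default/model_best.pt",
    "paper-default/converter_and_duration.pkl",
    "paper-default/args.json",
    "paper-default/train_model.yml",
]
RESUME_FILES = [
    "paper-default/optimizer.pt",
    "paper-default/train_state.json",
    "paper-default/model.pt",
]


def download(include_resume=False, local_dir="result"):
    """Fetch the checkpoint files and return the local directory path."""
    # Imported lazily so the file lists above are usable without the package.
    from huggingface_hub import snapshot_download

    patterns = INFERENCE_FILES + (RESUME_FILES if include_resume else [])
    return snapshot_download(
        repo_id="maxkudryashov/bebopnet-1",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

Call `download()` for inference files only, or `download(include_resume=True)` to also fetch the resume checkpoints. With `local_dir` set, files keep their repo-relative paths, so the weights land in result/paper-default/model_best.pt.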

Reproducibility

Trained with the Colab notebook models/bebopnet-code/training/colab_train.ipynb in our BebopNet fork. The notebook is idempotent: rerunning it resumes from the last train_state.json checkpoint via --restart --restart_dir <work_dir>.
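
The resume behaviour can be pictured as a small command builder. This is a hypothetical sketch, not code from the notebook. Only the `--restart` and `--restart_dir` flags and the train_state.json / train_step names come from this card. The `train_command` helper and its default script name train.py are assumptions, and the sketch omits any other arguments the notebook passes, such as the config file.

```python
import json
import os


def train_command(work_dir, script="train.py"):
    """Build a training command, adding resume flags if train_state.json exists."""
    cmd = ["python", script]
    state_path = os.path.join(work_dir, "train_state.json")
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
        # A previous run left resume metadata: continue from its checkpoint.
        print(f"resuming from step {state.get('train_step')}")
        cmd += ["--restart", "--restart_dir", work_dir]
    return cmd
```

On a fresh work_dir this returns a plain training command. Once a checkpoint exists, the same call appends the resume flags, which is what makes rerunning safe.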

Per-step epochs.csv, summary.json, and log.txt are committed in results/bebopnet-wjazzd-500K/.
