Baseline3 improvements (beats + downbeats)
This document summarizes the changes made in `exp/baseline3` relative to `exp/baseline2` during this session, focusing on improvements intended to increase beat/downbeat F1 and continuity while keeping the training/eval workflow consistent with baseline2.
Scope / goals
- Keep the same overall pipeline as baseline2 (same dataset, same context window, same mel multi-view preprocessing, same peak-picking evaluation).
- Add SE-inspired improvements to the model (baseline3) while preserving the baseline2 ResNet backbone structure.
- Make training and TensorBoard curves comparable to baseline2.
- Support faster iteration when needed (optional), but allow returning to baseline2-style “full” training defaults.
Model improvements (affects both beats + downbeats)
1) Extra SE-inspired gating (temporal excitation)
- File: `exp/baseline3/model.py`
- Added an additional SE-style gating mechanism that is time-dependent (a "temporal excitation" in addition to channel excitation).
- The intent is to help the network emphasize temporally-salient patterns that correspond to rhythmic events, improving peak sharpness and reducing spurious activations.
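As a rough illustration of the idea (not the actual baseline3 implementation; the module and parameter names here are hypothetical), a time-dependent gate can squeeze the channel dimension and emit one sigmoid weight per frame:

```python
import torch
import torch.nn as nn

class TemporalExcitation(nn.Module):
    """Hypothetical sketch of a time-dependent SE-style gate: compute one
    sigmoid weight per frame from the channel vector at that time step."""
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        # Learn the per-frame gate from all channels at each time step.
        self.proj = nn.Conv1d(channels, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        gate = torch.sigmoid(self.proj(x))  # (batch, 1, time), one weight per frame
        return x * gate                     # broadcast the gate over channels
```

Because the gate lies in (0, 1), the block can only attenuate frames, which matches the stated intent of suppressing spurious activations while leaving salient frames mostly intact.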
2) SE block robustness
- File: `exp/baseline3/model.py`
- Made the SE hidden dimension robust for small channel counts (ensuring the intermediate dimension is never zero).
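The fix presumably amounts to clamping the integer-divided bottleneck width; a minimal sketch (function name and default reduction are assumptions, not baseline3's actual code):

```python
def se_hidden_dim(channels: int, reduction: int = 16) -> int:
    """Intermediate SE dimension; max(1, ...) guards against a zero-width
    bottleneck when channels < reduction (e.g. 8 channels, reduction 16)."""
    return max(1, channels // reduction)
```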
Data / sampling improvements (optional; applies to both beats + downbeats)
3) Track capping support (optional)
- File: `exp/baseline3/data.py`
- Added support for limiting the number of tracks used when building indices.
- This was introduced for fast iteration runs (debugging / quick experiments). When not used, training uses the full dataset like baseline2.
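The capping logic is likely a simple deterministic truncation of the track list; a sketch under that assumption (helper name is hypothetical, and 0 means "no cap" as in the CLI flags described later):

```python
def cap_tracks(track_ids: list, max_tracks: int = 0) -> list:
    """Limit the number of tracks for fast iteration; 0 disables the cap,
    so the full dataset is used (baseline2-style behavior)."""
    if max_tracks > 0:
        return track_ids[:max_tracks]
    return track_ids
```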
4) Hard-negative sampling near events (optional)
- File: `exp/baseline3/data.py`
- Added optional "hard negatives" close to ground-truth frames:
  - For each beat/downbeat frame, add negative frames at offsets ±d for d = 2..R.
  - Controlled by `hard_neg_radius` and `hard_neg_fraction`.
- Rationale: random negatives are often too easy; near-event negatives help reduce double-peaks/jitter and can improve continuity.
- Status: kept off by default when running in baseline2-style mode.
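The ±d enumeration described above can be sketched as follows (this is an illustrative reconstruction, not the baseline3 code; subsampling by `hard_neg_fraction` is omitted for brevity):

```python
def hard_negative_frames(event_frames: list, radius: int, num_frames: int) -> list:
    """For each annotated frame, collect negatives at offsets ±d for
    d = 2..radius, skipping out-of-range frames and frames that are
    themselves events."""
    events = set(event_frames)
    negatives = set()
    for f in event_frames:
        for d in range(2, radius + 1):
            for cand in (f - d, f + d):
                if 0 <= cand < num_frames and cand not in events:
                    negatives.add(cand)
    return sorted(negatives)
```

Starting at d = 2 leaves the frames immediately adjacent to an event untouched, which is consistent with tolerating small annotation jitter while still penalizing near-miss peaks.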
Training-loop improvements
5) Output directories fixed to avoid overwriting baseline2
- File: `exp/baseline3/train.py` (and, earlier in the session, the baseline3 eval defaults)
- Baseline3 outputs were adjusted to use baseline3-specific output directories so baseline2 artifacts aren't overwritten.
6) Loss logging parity with baseline2
- File: `exp/baseline3/train.py`
- Baseline2 uses unweighted BCE (`nn.BCELoss`). Baseline3 introduced an optional weighted BCE objective for imbalance experiments.
- A key issue was discovered: TensorBoard curves looked "worse" in baseline3 because it was logging weighted BCE as the main loss.
- Fix: `train/batch_loss` and `train/epoch_loss` are now unweighted BCE (baseline2-comparable). If weighting is enabled, the optimized objective is logged separately as `*_weighted`.
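A minimal sketch of this logging split (function and log-key handling are illustrative, not the exact baseline3 code): the unweighted BCE is always computed and logged, and the weighted loss only replaces the optimized objective when positive weighting is enabled.

```python
import torch
import torch.nn.functional as F

def bce_losses(pred: torch.Tensor, target: torch.Tensor, pos_weight: float = 0.0):
    """Always log unweighted BCE for baseline2-comparable curves; optimize
    a weighted BCE only when pos_weight > 0 (sketch, names hypothetical)."""
    plain = F.binary_cross_entropy(pred, target)
    logs = {"train/batch_loss": plain.item()}  # comparable across baselines
    objective = plain
    if pos_weight > 0.0:
        # Up-weight positive (event) frames in the optimized objective only.
        w = torch.where(target > 0.5,
                        torch.full_like(target, pos_weight),
                        torch.ones_like(target))
        objective = F.binary_cross_entropy(pred, target, weight=w)
        logs["train/batch_loss_weighted"] = objective.item()
    return objective, logs
```

With `pos_weight = 0.0` the weighted branch never runs, so the logged and optimized losses coincide with baseline2's plain BCE.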
7) Optional imbalance-aware objective (pos weighting)
- File: `exp/baseline3/train.py`
- Added an optional weighted BCE objective, controlled by `--pos-weight`.
- Default is `--pos-weight 0.0`, which matches baseline2 behavior.
8) Optional gradient clipping
- File: `exp/baseline3/train.py`
- Added `--grad-clip` support to stabilize training when experimenting.
- For baseline2-style mode, the default was set back to disabled (`--grad-clip 0.0`).
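The guarded clipping step likely looks like the following sketch (the helper name is hypothetical; `clip_grad_norm_` is the standard PyTorch utility):

```python
import torch
import torch.nn as nn

def training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                  loss: torch.Tensor, grad_clip: float = 0.0) -> None:
    """One optimization step; gradient-norm clipping runs only when
    grad_clip > 0, so 0.0 disables it (baseline2-style default)."""
    optimizer.zero_grad()
    loss.backward()
    if grad_clip > 0.0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    optimizer.step()
```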
9) Fast-iteration controls (optional)
- File: `exp/baseline3/train.py`
- Added optional caps for quicker experiments: `--max-train-tracks`, `--max-val-tracks`, `--max-train-steps`, `--max-val-steps`, `--max-steps-total`.
- These are intended only for debugging/iteration. Baseline2-style training leaves them unset (0/unlimited).
10) Back to baseline2-style default training mode
- File: `exp/baseline3/train.py`
- Returned baseline3 defaults to match baseline2 training mode:
  - `--epochs 3`, `--patience 5`
  - objective defaults to unweighted BCE when `--pos-weight 0.0`
  - no gradient clipping by default
Evaluation improvements
11) Mix-and-match beats and downbeats checkpoints
- File: `exp/baseline3/eval.py`
- Added support to evaluate using different model directories for beats vs downbeats: `--beats-model-dir` and `--downbeats-model-dir`.
- This enables workflows like “new beats run + keep downbeats fixed”.
Beats-specific notes
- All model/training/eval improvements above apply to beats.
- A key gotcha found during quick experiments: some runs only saved the checkpoint under a `final/` subfolder. When evaluating, pointing at the correct folder matters.
Latest mixed eval result (beats improved)
Eval command used:
- Beats: `outputs/baseline3_b2mode_full3/beats`
- Downbeats: `outputs/baseline3_smoketest/downbeats`
- Output: `outputs/eval_mix_b3_b2modebeats_smoketestdownbeats`
Key metrics (116 tracks):
- Mean Beat Weighted F1: 0.3531
- Beat continuity: CMLt 0.3567, AMLt 0.3607, CMLc 0.0603, AMLc 0.0624
Summary plot: `outputs/eval_mix_b3_b2modebeats_smoketestdownbeats/evaluation_summary.png`
Downbeats-specific notes
- Downbeats training uses the same dataset/indexing logic, model architecture, and preprocessing as beats.
- The improvements (temporal excitation, loss logging parity, optional hard negatives, optional fast-iteration, mixed-checkpoint evaluation) all apply identically.
- In the mixed eval above, downbeats were held fixed using the baseline3 smoketest checkpoint.
Repro commands
Full baseline2-style training (beats only)
```
uv run -m exp.baseline3.train --target beats --output-dir outputs/baseline3_b2mode_full3
```
Mixed evaluation (beats from a new run + downbeats from baseline3 smoketest)
```
uv run -m exp.baseline3.eval \
  --beats-model-dir outputs/baseline3_b2mode_full3/beats \
  --downbeats-model-dir outputs/baseline3_smoketest/downbeats \
  --output-dir outputs/eval_mix_b3_b2modebeats_smoketestdownbeats \
  --summary-plot
```
Known warnings
- You may see repeated torchaudio warnings like:
  - "At least one mel filterbank has all zero values…"
- This is produced by `torchaudio` mel filterbank construction for some parameter combinations and is not specific to baseline3.