
Baseline3 improvements (beats + downbeats)

This document summarizes the changes made in exp/baseline3 relative to exp/baseline2 during this session. The emphasis is on changes intended to raise beat/downbeat F1 and continuity while keeping the training/eval workflow consistent with baseline2.

Scope / goals

  • Keep the same overall pipeline as baseline2 (same dataset, same context window, same mel multi-view preprocessing, same peak-picking evaluation).
  • Add squeeze-and-excitation (SE) inspired improvements to the model (baseline3) while preserving the baseline2 ResNet backbone structure.
  • Make training and TensorBoard curves comparable to baseline2.
  • Support faster iteration when needed (optional), but allow returning to baseline2-style “full” training defaults.

Model improvements (affects both beats + downbeats)

1) Extra SE-inspired gating (temporal excitation)

  • File: exp/baseline3/model.py
  • Added an additional SE-style gating mechanism that is time-dependent (a “temporal excitation” in addition to channel excitation).
  • The intent is to help the network emphasize temporally-salient patterns that correspond to rhythmic events, improving peak sharpness and reducing spurious activations.
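
As an illustration of what a time-dependent gate can look like, here is a minimal PyTorch sketch. It is not the code in exp/baseline3/model.py: the class name `TemporalExcitation`, the `(batch, channels, freq, time)` layout, the frequency pooling, and the convolution kernel size are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TemporalExcitation(nn.Module):
    """Hypothetical time-dependent gate, meant to sit alongside channel SE."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # A 1D conv over time produces one gate logit per frame.
        # An odd kernel_size with this padding keeps the time length unchanged.
        self.conv = nn.Conv1d(channels, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to be (batch, channels, freq, time).
        pooled = x.mean(dim=2)                    # squeeze frequency -> (B, C, T)
        gate = torch.sigmoid(self.conv(pooled))   # (B, 1, T), values in (0, 1)
        return x * gate.unsqueeze(2)              # broadcast over channels and freq
```

Because the gate lies in (0, 1), it can only attenuate frames, which is one way to suppress activations away from rhythmic events while leaving salient frames mostly untouched.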

2) SE block robustness

  • File: exp/baseline3/model.py
  • Made the SE hidden dimension robust for small channel counts (ensuring the intermediate dimension is never zero).
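
The zero-width case arises when the channel count is smaller than the SE reduction ratio and the hidden size is computed with integer division. A minimal sketch of the clamp, assuming a hypothetical helper name `se_hidden_dim` and a reduction ratio of 16 (the actual ratio used in baseline3 is not stated here):

```python
def se_hidden_dim(channels: int, reduction: int = 16) -> int:
    """Bottleneck width for an SE block, clamped so it is never zero."""
    # Without the clamp, channels=8 and reduction=16 would give 8 // 16 == 0.
    return max(1, channels // reduction)
```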

Data / sampling improvements (optional; applies to both beats + downbeats)

3) Track capping support (optional)

  • File: exp/baseline3/data.py
  • Added support for limiting the number of tracks used when building indices.
  • This was introduced for fast iteration runs (debugging / quick experiments). When not used, training uses the full dataset like baseline2.

4) Hard-negative sampling near events (optional)

  • File: exp/baseline3/data.py
  • Added optional “hard negatives” close to ground-truth frames:
    • For each beat/downbeat frame, add negative frames at offsets ±d for d=2..R.
    • Controlled by hard_neg_radius and hard_neg_fraction.
  • Rationale: random negatives are often too easy; near-event negatives help reduce double-peaks/jitter and can improve continuity.
  • Status: kept off by default when running in baseline2-style mode.
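
The sampling rule above can be sketched as follows. This is an illustration rather than the code in exp/baseline3/data.py: the function name, the `num_frames` and `seed` parameters, the exclusion of frames that are themselves events, and the reading of hard_neg_fraction as "fraction of candidate frames to keep" are all assumptions.

```python
import random


def hard_negative_frames(event_frames, num_frames, radius, fraction, seed=0):
    """Return sorted frame indices near events that are not events themselves."""
    events = set(event_frames)
    candidates = set()
    for f in events:
        for d in range(2, radius + 1):        # offsets +/-d for d = 2..R
            for g in (f - d, f + d):
                # Stay inside the track and never relabel a true event.
                if 0 <= g < num_frames and g not in events:
                    candidates.add(g)
    candidates = sorted(candidates)
    # Assumed meaning of hard_neg_fraction: share of candidates to keep.
    keep = round(len(candidates) * fraction)
    return sorted(random.Random(seed).sample(candidates, keep))
```

Since the offsets start at d=2, the frames immediately adjacent to an event are never selected as negatives.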

Training-loop improvements

5) Output directories fixed to avoid overwriting baseline2

  • File: exp/baseline3/train.py (the baseline3 eval defaults were changed the same way earlier in the session)
  • Baseline3 outputs were adjusted to use baseline3-specific output directories so baseline2 artifacts aren’t overwritten.

6) Loss logging parity with baseline2

  • File: exp/baseline3/train.py
  • Baseline2 uses unweighted BCE (nn.BCELoss). Baseline3 introduced an optional weighted BCE objective for imbalance experiments.
  • Issue found: baseline3's TensorBoard curves looked “worse” than baseline2's because baseline3 logged the weighted BCE as its main loss, so the two curves were not measuring the same quantity.
  • Fix:
    • train/batch_loss and train/epoch_loss are now unweighted BCE (baseline2-comparable).
    • If weighting is enabled, the optimized objective is logged separately as *_weighted.
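
To make the logging split concrete, here is a small pure-Python sketch. It is not the code in exp/baseline3/train.py: the function names are hypothetical, the exact tag `train/batch_loss_weighted` is assumed from the *_weighted pattern, and the weighting form (a factor on the positive-class term, the same convention as `pos_weight` in torch.nn.BCEWithLogitsLoss) is an assumption.

```python
import math


def bce(probs, targets, pos_weight=1.0):
    """Mean binary cross-entropy; pos_weight scales the positive-class term."""
    eps = 1e-7
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def loss_scalars(probs, targets, pos_weight=0.0):
    """Scalars to log for one batch, keyed by TensorBoard tag."""
    # The main tag is always the unweighted BCE, so it stays baseline2-comparable.
    scalars = {"train/batch_loss": bce(probs, targets)}
    if pos_weight > 0.0:
        # Assumed tag name; the optimized objective is logged separately.
        scalars["train/batch_loss_weighted"] = bce(probs, targets, pos_weight)
    return scalars
```

With weighting enabled, the weighted value is larger than the unweighted one whenever positives are imperfectly predicted, which is why logging it under the main tag made the curves look worse without the model actually being worse.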

7) Optional imbalance-aware objective (pos weighting)

  • File: exp/baseline3/train.py
  • Added an optional weighted BCE objective, controlled by --pos-weight.
  • Default is --pos-weight 0.0, which matches baseline2 behavior.

8) Optional gradient clipping

  • File: exp/baseline3/train.py
  • Added --grad-clip support to stabilize training when experimenting.
  • For baseline2-style mode, default was set back to disabled (--grad-clip 0.0).
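
A minimal sketch of how a 0.0-means-disabled clipping option is commonly wired up in PyTorch. Whether baseline3 clips by global norm or by value is not stated here, so the use of `clip_grad_norm_` and the helper name `clip_gradients` are assumptions.

```python
import torch


def clip_gradients(model: torch.nn.Module, grad_clip: float) -> None:
    """Clip the global gradient norm in place; 0.0 (or less) disables clipping."""
    if grad_clip > 0.0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip)
```

In a training step this would be called between `loss.backward()` and `optimizer.step()`.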

9) Fast-iteration controls (optional)

  • File: exp/baseline3/train.py
  • Added optional caps for quicker experiments:
    • --max-train-tracks, --max-val-tracks
    • --max-train-steps, --max-val-steps, --max-steps-total
  • These are intended only for debugging/iteration. Baseline2-style training leaves them unset (0/unlimited).
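
The "0 means unlimited" convention for these caps can be sketched as below. The helper names are hypothetical, and keeping the first N tracks (rather than a random subset) is an assumption made for the example.

```python
def cap_tracks(tracks, max_tracks=0):
    """Keep at most max_tracks tracks; 0 means use the full list."""
    return list(tracks[:max_tracks]) if max_tracks else list(tracks)


def capped_steps(batches, max_steps=0):
    """Yield batches, stopping after max_steps; 0 means no cap."""
    for step, batch in enumerate(batches):
        if max_steps and step >= max_steps:
            break
        yield batch
```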

10) Back to baseline2-style default training mode

  • File: exp/baseline3/train.py
  • Returned baseline3 defaults to match baseline2 training mode:
    • --epochs 3
    • --patience 5
    • objective defaults to unweighted BCE when --pos-weight 0.0
    • no grad clipping by default

Evaluation improvements

11) Mix-and-match beats and downbeats checkpoints

  • File: exp/baseline3/eval.py
  • Added support to evaluate using different model directories for beats vs downbeats:
    • --beats-model-dir
    • --downbeats-model-dir
  • This enables workflows like “new beats run + keep downbeats fixed”.

Beats-specific notes

  • All model/training/eval improvements above apply to beats.
  • A gotcha found during quick experiments: some runs saved the checkpoint only under a final/ subfolder. For those runs, point evaluation at that subfolder rather than at its parent directory.
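
One way to guard against the final/ gotcha is to resolve the model directory before loading. The sketch below is not part of exp/baseline3/eval.py: `resolve_model_dir` is a hypothetical helper, and the marker filename `config.json` is a placeholder for whatever file the checkpoint format actually writes.

```python
from pathlib import Path


def resolve_model_dir(model_dir, marker="config.json"):
    """Return model_dir, or model_dir/final if only the latter holds `marker`."""
    root = Path(model_dir)
    # Prefer the directory itself, then fall back to its final/ subfolder.
    for candidate in (root, root / "final"):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} under {root} or {root / 'final'}")
```

Failing loudly when neither location holds a checkpoint avoids silently evaluating with the wrong folder.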

Latest mixed eval result (beats improved)

Eval inputs and output directory:

  • Beats: outputs/baseline3_b2mode_full3/beats
  • Downbeats: outputs/baseline3_smoketest/downbeats
  • Output: outputs/eval_mix_b3_b2modebeats_smoketestdownbeats

Key metrics (116 tracks):

  • Mean Beat Weighted F1: 0.3531
  • Beat continuity: CMLt 0.3567, AMLt 0.3607, CMLc 0.0603, AMLc 0.0624

Summary plot:

  • outputs/eval_mix_b3_b2modebeats_smoketestdownbeats/evaluation_summary.png

Downbeats-specific notes

  • Downbeats training uses the same dataset/indexing logic, model architecture, and preprocessing as beats.
  • The improvements (temporal excitation, loss logging parity, optional hard negatives, optional fast-iteration, mixed-checkpoint evaluation) all apply identically.
  • In the mixed eval above, downbeats were held fixed using the baseline3 smoketest checkpoint.

Repro commands

Full baseline2-style training (beats only)

uv run -m exp.baseline3.train --target beats --output-dir outputs/baseline3_b2mode_full3

Mixed evaluation (beats from a new run + downbeats from baseline3 smoketest)

uv run -m exp.baseline3.eval \
  --beats-model-dir outputs/baseline3_b2mode_full3/beats \
  --downbeats-model-dir outputs/baseline3_smoketest/downbeats \
  --output-dir outputs/eval_mix_b3_b2modebeats_smoketestdownbeats \
  --summary-plot

Known warnings

  • You may see repeated torchaudio warnings like:
    • “At least one mel filterbank has all zero values…”
  • This is produced by torchaudio mel filterbank construction for some parameter combinations and is not specific to baseline3.