---
license: other
license_name: nscl-a2sb-and-polyform-nc
license_link: https://raw.githubusercontent.com/NVIDIA/diffusion-audio-restoration/refs/heads/main/LICENSE
tags:
  - audio
  - audio-restoration
  - schrodinger-bridge
  - diffusion
  - festival-audio
  - non-commercial
library_name: pytorch
pipeline_tag: audio-to-audio
---

# Soundboard

A Schrödinger Bridge denoiser fine-tuned for restoring music recordings: it
recovers a soundboard-style mix from heavily corrupted audience recordings
(room reverb, audience-mic blend, and lossy codec artifacts).

Fine-tuned from NVIDIA's
[A2SB](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)
(`twosplit_0.5_1.0` split) with a synthetic-corruption training pipeline driven
by **profile-based augmentation**: corruption parameters are calibrated
from real (clean, festival-recording) pairs, and each training example samples
its parameters from the calibrated ranges. See [Locutius](https://github.com/protodotdesign/locutius)
for the full corruption chain, profiling, and training scaffold.

## Quick facts

| | |
|---|---|
| Architecture | AttnUNetF (565.5M params) |
| Audio format | 44.1 kHz, 2-channel, 32-bit float |
| Segment length | 130560 samples (2.96 s) |
| STFT | n_fft=2048, hop=512, window=hann |
| Representation | 3-channel `[mag^0.25, cos(phase), sin(phase)]` |
| Checkpoint step | 50,000 |
| Base checkpoint | NVIDIA A2SB `twosplit_0.5_1.0` |
| Checkpoint size | 2.1 GB |
| Diffusion | Schrödinger Bridge, β_max=1.0 |

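The segment length, STFT settings, and 3-channel representation in the table fit together roughly as in the sketch below. It is illustrative only: it assumes a centered STFT (one frame per hop, plus one) and works on a single complex STFT bin using the standard library. The helper names `encode_bin` and `decode_bin` are hypothetical and are not part of the Locutius API.

```python
import cmath
import math

SAMPLE_RATE = 44_100
SEGMENT_SAMPLES = 130_560
N_FFT = 2048
HOP = 512

duration_s = SEGMENT_SAMPLES / SAMPLE_RATE  # about 2.96 s
n_freq_bins = N_FFT // 2 + 1                # one-sided spectrum: 1025 bins
n_frames = SEGMENT_SAMPLES // HOP + 1       # ASSUMPTION: centered STFT


def encode_bin(z: complex) -> tuple[float, float, float]:
    """Map one complex STFT bin to (mag^0.25, cos(phase), sin(phase))."""
    phase = cmath.phase(z)
    return abs(z) ** 0.25, math.cos(phase), math.sin(phase)


def decode_bin(comp_mag: float, cos_p: float, sin_p: float) -> complex:
    """Invert encode_bin: undo the 0.25 power and rebuild the complex value."""
    return (comp_mag ** 4) * complex(cos_p, sin_p)


z = complex(3.0, -4.0)
restored = decode_bin(*encode_bin(z))
print(f"{duration_s:.2f} s, {n_freq_bins} bins x {n_frames} frames")
print(f"round-trip error: {abs(restored - z):.2e}")
```

The 0.25 power compresses the magnitude's dynamic range, and storing the phase as a cosine/sine pair avoids the discontinuity at ±π that a raw angle would have.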
## Usage

Load with the [Locutius](https://github.com/protodotdesign/locutius)
training package:

```python
import torch
from huggingface_hub import hf_hub_download
from locutius_train.config import TrainConfig
from locutius_train.network import AttnUNetF, SinusoidalTemporalEmbedding
from locutius_train.diffusion import Diffusion
from locutius_train.representation import WaveformToInput, InputToWaveform
from locutius_train.restore import restore_spectrogram

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(repo_id="protodotdesign/Soundboard", filename="model.pt")
# weights_only=False unpickles arbitrary Python objects: only load checkpoints you trust.
sd = torch.load(ckpt_path, map_location="cuda", weights_only=False)

# Build the network from the default TrainConfig, then load the fine-tuned weights.
cfg = TrainConfig()
model = AttnUNetF(
    n_updown_levels=cfg.model.n_updown_levels,
    in_channels=cfg.model.in_channels,
    hidden_channels=list(cfg.model.hidden_channels),
    out_channels=cfg.model.out_channels,
    emb_channels=cfg.diffusion.n_timestep_channels,
    band_embedding_dim=cfg.model.band_embedding_dim,
    n_attn_heads=cfg.model.n_attn_heads,
    attention_levels=list(cfg.model.attention_levels),
    use_attn_input_norm=cfg.model.use_attn_input_norm,
    num_res_blocks=cfg.model.num_res_blocks,
).to("cuda").eval()  # inference mode on the GPU
model.load_state_dict(sd["model"])
```

See `restore.py` in the Locutius repo for a complete CLI that takes a
clean source, applies the calibrated festival-corruption profile, and
runs the reverse Schrödinger Bridge to produce a restored output.
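Because the model operates on fixed 130560-sample segments, a longer recording has to be split before restoration. The sketch below shows one simple way to compute non-overlapping segment boundaries, with zero-padding for the final segment. It is an assumption for illustration, not a description of `restore.py`, which may chunk differently (for example with overlap and crossfading); `segment_bounds` is a hypothetical helper.

```python
SAMPLE_RATE = 44_100
SEGMENT_SAMPLES = 130_560


def segment_bounds(n_samples: int, segment: int = SEGMENT_SAMPLES) -> list[tuple[int, int, int]]:
    """Return (start, end, pad) triples that cover n_samples without overlap.

    `pad` is the number of zero samples needed to bring the slice
    [start, end) up to the full segment length.
    """
    bounds = []
    for start in range(0, n_samples, segment):
        end = min(start + segment, n_samples)
        pad = segment - (end - start)
        bounds.append((start, end, pad))
    return bounds


# Example: a 10-second recording at 44.1 kHz.
bounds = segment_bounds(10 * SAMPLE_RATE)
for start, end, pad in bounds:
    print(f"samples {start:>7}..{end:>7}  pad={pad}")
```

Only the last segment needs padding here; the padded tail would be trimmed again after restoration.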

## Calibrated corruption profile

This model was trained against a single calibrated profile recovered
from a real (studio FLAC, festival M4A) pair via per-kick local
Wiener deconvolution. The profile is bundled in `profile.json`:

```json
{
  "name": "edc_festival",
  "ir_path": "../impulses/EchoThief/Brutalism/San Diego Supercomputer Center Outdoor Patio California.wav",
  "delay_ms_range": [15.0, 25.0],
  "studio_gain_range": [0.6, 0.7],
  "room_gain_range": [0.55, 0.65]
}
```

Each training step draws fresh values from these ranges, so over
50,000 steps the model was exposed to roughly 50,000 distinct delay/blend
combinations, all within the same venue character.
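A minimal sketch of that per-step draw is shown below. It assumes each parameter is sampled independently and uniformly within its range, which the profile format suggests but this card does not confirm; the actual sampler lives in the Locutius corruption chain. `sample_corruption` is a hypothetical helper, and the impulse-response path is omitted because the sketch does not load audio.

```python
import json
import random

SAMPLE_RATE = 44_100

# Numeric ranges copied from profile.json (ir_path omitted in this sketch).
PROFILE_JSON = """
{
  "name": "edc_festival",
  "delay_ms_range": [15.0, 25.0],
  "studio_gain_range": [0.6, 0.7],
  "room_gain_range": [0.55, 0.65]
}
"""


def sample_corruption(profile: dict, rng: random.Random) -> dict:
    """Draw one set of corruption parameters (ASSUMPTION: uniform sampling)."""
    delay_ms = rng.uniform(*profile["delay_ms_range"])
    return {
        "delay_ms": delay_ms,
        # Convert the delay to whole samples at the model's sample rate.
        "delay_samples": round(delay_ms * SAMPLE_RATE / 1000),
        "studio_gain": rng.uniform(*profile["studio_gain_range"]),
        "room_gain": rng.uniform(*profile["room_gain_range"]),
    }


profile = json.loads(PROFILE_JSON)
rng = random.Random(0)  # seeded only to make the example reproducible
params = sample_corruption(profile, rng)
print(params)
```

At 44.1 kHz the 15 to 25 ms delay range corresponds to roughly 660 to 1100 samples between the direct and room-path signals.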

## Training data

Trained on a focused subset of electronic music FLACs. **No festival
recordings or other licensed audio were stored or distributed** —
only the studio source material was used; festival-corrupted versions
were synthesized on-the-fly from the calibrated profile during each
training step.

## Limitations

- **Single profile**: trained against one calibrated venue (`edc_festival`).
  Performance is likely to degrade on festival recordings from very different
  venues or mix chains.
- **Electronic music bias**: the training set was EDM-heavy. Restoration
  quality on rock, classical, or vocal-led material may be uneven.
- **No crowd-noise model**: the calibrated profile does not include
  additive crowd noise (no real crowd recordings were available
  during calibration). Recordings with heavy crowd vocals may retain
  residual artifacts.
- **Non-commercial use only** — see the license below.

## License

Dual non-commercial license:

- [NVIDIA Source Code License for A2SB](LICENSE.NSCL-A2SB) (the upstream
  license inherited from the A2SB base checkpoint)
- [PolyForm Noncommercial 1.0.0](LICENSE.PolyForm-NC) (additional terms
  on top, source-availability + patent retaliation)

You must comply with **both** licenses. Use is restricted to research
and evaluation only — no commercial use is permitted. See
[LICENSING.md](https://github.com/protodotdesign/locutius/blob/main/LICENSING.md)
for the full plain-English breakdown.

## Citation

If you use this model in research, please cite the upstream A2SB paper
and reference this fine-tune:

```bibtex
@misc{soundboard,
  title={Soundboard: festival audio restoration via profile-calibrated Schrödinger Bridge fine-tuning},
  author={Locutius},
  year={2026},
  howpublished={\url{https://huggingface.co/protodotdesign/Soundboard}},
}
```