---
language: en
tags:
- speech-enhancement
- noise-suppression
- incremental-learning
- sepformer
- pytorch
license: mit
---

# LNA — Learning Noise Adapters for Incremental Speech Enhancement

Reimplementation of the paper **"Learning Noise Adapters for Incremental Speech Enhancement"**.

Code: [annkisluk/speech](https://github.com/annkisluk/speech)

---

## Files

| File | Description |
|------|-------------|
| `session0_pretrain/lna_pretrained.pt` | Pretrained LNA backbone (session 0: 40 epochs on 10 NOISEX-92 noise types) |
| `session1_incremental/lna_session1.pt` | Checkpoint after incremental session 1 (adds alarm noise) |
| `session2_incremental/lna_session2.pt` | Checkpoint after incremental session 2 (adds cough noise) |
| `session3_incremental/lna_session3.pt` | Checkpoint after incremental session 3 (adds destroyerops noise) |
| `session4_incremental/lna_session4.pt` | Checkpoint after incremental session 4 (adds machinegun noise) |
| `config.json` | Model architecture and training hyperparameters |

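
The table and the loading examples in Usage follow one pattern: the session-N checkpoint is loaded into a model that already has adapters registered for sessions 1 through N. A minimal sketch of that bookkeeping, assuming the pattern holds for every session; `checkpoint_plan` and `CHECKPOINTS` are hypothetical names, not part of the annkisluk/speech code:

```python
# Hypothetical helper (not part of annkisluk/speech): maps a session index to
# the checkpoint path in this repo and the adapter session IDs that must be
# registered with add_new_session() before load_checkpoint().
CHECKPOINTS = {
    0: "session0_pretrain/lna_pretrained.pt",
    1: "session1_incremental/lna_session1.pt",
    2: "session2_incremental/lna_session2.pt",
    3: "session3_incremental/lna_session3.pt",
    4: "session4_incremental/lna_session4.pt",
}


def checkpoint_plan(session: int) -> tuple[str, list[int]]:
    """Return (filename, adapter_session_ids) for the given session index."""
    if session not in CHECKPOINTS:
        raise ValueError(f"unknown session {session}; expected 0-4")
    # Assumption: session 0 is the backbone only; session N carries adapters 1..N.
    return CHECKPOINTS[session], list(range(1, session + 1))


filename, adapter_ids = checkpoint_plan(4)
print(filename, adapter_ids)  # session4_incremental/lna_session4.pt [1, 2, 3, 4]
```
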
---

## Architecture

- **Backbone**: SepFormer (N=256 encoder basis filters, L=16 kernel size, 2 dual-path transformer (DPT) blocks × 8 layers, 8 attention heads, d_ffn=1024)
- **Adapters**: Noise adapters with bottleneck dimension Ĉ=1, attached to the feed-forward layer (FFL) and multi-head attention (MHA) sublayers of every transformer layer
- **Parameters**: ~25.6M total (~98k per new noise adapter)
- **Input**: 8 kHz mono waveform

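
For readers unfamiliar with bottleneck adapters, here is a generic residual adapter of the kind the Ĉ=1 setting refers to: a down-projection from the model width to Ĉ, a nonlinearity, an up-projection back, and a skip connection. This is an illustrative sketch, not the module used in annkisluk/speech. The class name `BottleneckAdapter`, the ReLU activation, the zero-initialised up-projection, and the model width d=256 (taken to equal N) are all assumptions. Under those assumptions one adapter has 2·256·1 + 1 + 256 = 769 parameters. The ~98k figure above depends on how many adapters are inserted per session, and this sketch does not try to reproduce it.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter (illustrative sketch only)."""

    def __init__(self, d_model: int = 256, bottleneck_dim: int = 1):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck_dim)  # d -> Ĉ
        self.act = nn.ReLU()                            # assumed activation
        self.up = nn.Linear(bottleneck_dim, d_model)    # Ĉ -> d
        # Zero-init the up-projection so a freshly added adapter is an
        # identity mapping and does not disturb the frozen backbone at first.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [..., d_model]; the adapter output is added back residually
        return x + self.up(self.act(self.down(x)))


adapter = BottleneckAdapter(d_model=256, bottleneck_dim=1)
n_params = sum(p.numel() for p in adapter.parameters())
print(n_params)  # 769 = (256*1 + 1) for down + (1*256 + 256) for up

x = torch.randn(2, 100, 256)  # [batch, frames, features]
y = adapter(x)                # equals x at initialisation
```
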
---

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Clone the code repo and run from its root so that `src` is importable:
# git clone https://github.com/annkisluk/speech && cd speech
from src.models.lna_model import LNAModel

# Download the pretrained (session 0) checkpoint
ckpt_path = hf_hub_download(repo_id="Annkisluk/lna-speech",
                            filename="session0_pretrain/lna_pretrained.pt")

# Build the model with the architecture described above
model = LNAModel(
    n_basis=256, kernel_size=16, num_layers=8, num_blocks=2,
    nhead=8, dim_feedforward=1024, dropout=0.1,
    adapter_bottleneck_dim=1, max_sessions=6
)
model.load_checkpoint(ckpt_path)
model.eval()

# Enhance speech (noisy: [1, T] tensor at 8 kHz)
noisy = torch.randn(1, 32000)  # placeholder: 4 s of random samples at 8 kHz
with torch.no_grad():
    enhanced = model(noisy, session_id=None)  # session_id=None → base model only
```

For an incremental session N, build the model, register the adapters for sessions 1 through N with `add_new_session`, and only then load `lna_session{N}.pt`:

```python
import torch
from huggingface_hub import hf_hub_download
from src.models.lna_model import LNAModel  # run from the cloned repo root, as above

# Load the session 4 model (adapters for all 4 incremental noise types)
ckpt_path = hf_hub_download(repo_id="Annkisluk/lna-speech",
                            filename="session4_incremental/lna_session4.pt")

model = LNAModel(n_basis=256, kernel_size=16, num_layers=8, num_blocks=2,
                 nhead=8, dim_feedforward=1024, dropout=0.1,
                 adapter_bottleneck_dim=1, max_sessions=6)
# Register adapters for sessions 1-4 before loading the checkpoint
for sid in range(1, 5):
    model.add_new_session(session_id=sid, bottleneck_dim=1)
model.load_checkpoint(ckpt_path)
model.eval()

# `noisy` is a [1, T] tensor at 8 kHz, as in the previous example
with torch.no_grad():
    enhanced = model(noisy, session_id=2)  # route to session 2 adapter (cough)
```
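
Because `session_id` selects the adapter, the caller needs to know which noise type a recording contains. Here is a minimal lookup that takes the session order from the Files table. It assumes that noise types not learned incrementally are best handled by the backbone alone (`session_id=None`). The helper `session_for_noise` and its lowercase noise keys are hypothetical:

```python
from typing import Optional

# Session order taken from the Files table; the string keys are hypothetical labels.
NOISE_TO_SESSION = {"alarm": 1, "cough": 2, "destroyerops": 3, "machinegun": 4}


def session_for_noise(noise_type: str, latest_session: int = 4) -> Optional[int]:
    """Pick the session_id to pass to model(...) for a given noise type.

    Returns None (backbone only) for noise types that were not learned
    incrementally. Raises if the requested adapter is newer than the
    loaded checkpoint.
    """
    sid = NOISE_TO_SESSION.get(noise_type.lower())
    if sid is not None and sid > latest_session:
        raise ValueError(f"{noise_type!r} needs session {sid}, "
                         f"but only sessions 1-{latest_session} are loaded")
    return sid


print(session_for_noise("cough"))   # 2
print(session_for_noise("babble"))  # None (not one of the incremental noise types)
```

With the session 4 checkpoint loaded as above, this would be used as `model(noisy, session_id=session_for_noise("cough"))`.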

---

## Training Data

| Session | Noise types | Train samples | Test samples |
|---------|-------------|---------------|--------------|
| 0 (pretrain) | 10 NOISEX-92 types | 40,400 | 6,510 |
| 1–4 (incremental) | 1 new noise type per session | 1,212 per session | 651 per session |

SNR levels: −5, 0, 5, and 10 dB. Sample rate: 8 kHz. Speech: LibriSpeech train-clean-100.
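
Noisy mixtures at a target SNR are commonly built by rescaling the noise with a gain α chosen so that 10·log10(P_speech / (α²·P_noise)) equals the target. Below is a minimal PyTorch sketch of that standard recipe. It is not necessarily the exact procedure used to build this dataset, `mix_at_snr` is a hypothetical name, and loading LibriSpeech or NOISEX-92 audio is not shown:

```python
import torch


def mix_at_snr(speech: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Return speech + alpha * noise with 10*log10(P_speech / P_scaled_noise) = snr_db.

    Both inputs are 1-D waveforms at the same sample rate (8 kHz here); the
    noise is tiled and/or truncated to the speech length.
    """
    if noise.numel() < speech.numel():
        reps = -(-speech.numel() // noise.numel())  # ceiling division
        noise = noise.repeat(reps)
    noise = noise[: speech.numel()]

    p_speech = speech.pow(2).mean()
    p_noise = noise.pow(2).mean().clamp_min(1e-12)  # avoid division by zero
    # Solve alpha^2 * p_noise = p_speech / 10^(snr_db / 10) for alpha
    alpha = torch.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + alpha * noise


torch.manual_seed(0)
speech = torch.randn(32000)  # placeholder for a 4 s utterance at 8 kHz
noise = torch.randn(8000)    # 1 s of placeholder noise, tiled to 4 s
mixtures = {snr: mix_at_snr(speech, noise, snr) for snr in (-5, 0, 5, 10)}
```
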