Annkisluk committed on
Commit 9361011 · verified · 1 Parent(s): f6486b1

Upload README.md with huggingface_hub

Files changed (1): README.md +100 −0
README.md ADDED
@@ -0,0 +1,100 @@
---
language: en
tags:
- speech-enhancement
- noise-suppression
- incremental-learning
- sepformer
- pytorch
license: mit
---

# LNA — Learning Noise Adapters for Incremental Speech Enhancement

Reimplementation of the paper **"Learning Noise Adapters for Incremental Speech Enhancement"**.
Code: [annkisluk/speech](https://github.com/annkisluk/speech)

---

## Files

| File | Description |
|------|-------------|
| `session0_pretrain/lna_pretrained.pt` | Pretrained LNA backbone (40 epochs, 10 NOISEX-92 noise types) |
| `session1_incremental/lna_session1.pt` | After incremental session 1 (alarm noise) |
| `session2_incremental/lna_session2.pt` | After incremental session 2 (cough noise) |
| `session3_incremental/lna_session3.pt` | After incremental session 3 (destroyerops noise) |
| `session4_incremental/lna_session4.pt` | After incremental session 4 (machinegun noise) |
| `config.json` | Model architecture and training hyperparameters |

---

## Architecture

- **Backbone**: SepFormer (N=256, L=16, 2 DPT blocks × 8 layers, 8 heads, d_ffn=1024)
- **Adapters**: Noise adapters with bottleneck dim Ĉ=1 (FFL + MHA per transformer layer)
- **Parameters**: ~25.6M total (~98k per new noise adapter)
- **Input**: 8 kHz mono waveform
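
To illustrate why a bottleneck of Ĉ=1 keeps adapters so small, here is a minimal bottleneck-adapter sketch. It assumes the standard down-project → nonlinearity → up-project residual design; the class name `NoiseAdapter` and its exact internals are illustrative, not taken from the repo. Many such modules (FFL and MHA positions across the transformer layers) make up the quoted per-session total.

```python
import torch
import torch.nn as nn

class NoiseAdapter(nn.Module):
    """Hypothetical bottleneck adapter: d_model -> C_hat -> d_model with a residual."""
    def __init__(self, d_model: int = 256, bottleneck_dim: int = 1):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck_dim)  # compress to C_hat
        self.up = nn.Linear(bottleneck_dim, d_model)    # expand back to d_model
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the base model's behavior as the starting point
        return x + self.up(self.act(self.down(x)))

adapter = NoiseAdapter(d_model=256, bottleneck_dim=1)
n_params = sum(p.numel() for p in adapter.parameters())
print(n_params)  # 256*1 + 1 (down) + 1*256 + 256 (up) = 769 parameters
```

At 769 parameters per module, each added noise domain stays tiny relative to the ~25.6M backbone.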

---

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Clone the code repo first:
# git clone https://github.com/annkisluk/speech && cd speech

from src.models.lna_model import LNAModel

# Download checkpoint
ckpt_path = hf_hub_download(repo_id="Annkisluk/lna-speech",
                            filename="session0_pretrain/lna_pretrained.pt")

# Build model
model = LNAModel(
    n_basis=256, kernel_size=16, num_layers=8, num_blocks=2,
    nhead=8, dim_feedforward=1024, dropout=0.1,
    adapter_bottleneck_dim=1, max_sessions=6
)
model.load_checkpoint(ckpt_path)
model.eval()

# Enhance speech (noisy: [1, T] tensor at 8 kHz)
noisy = torch.randn(1, 32000)  # 4-second example
with torch.no_grad():
    enhanced = model(noisy, session_id=None)  # session_id=None → base model only
```

For incremental sessions, load `lna_session{N}.pt` and add the adapters first:

```python
# Load the session 4 model (knows all 4 incremental noise domains)
ckpt_path = hf_hub_download(repo_id="Annkisluk/lna-speech",
                            filename="session4_incremental/lna_session4.pt")

model = LNAModel(n_basis=256, kernel_size=16, num_layers=8, num_blocks=2,
                 nhead=8, dim_feedforward=1024, dropout=0.1,
                 adapter_bottleneck_dim=1, max_sessions=6)
for sid in range(1, 5):
    model.add_new_session(session_id=sid, bottleneck_dim=1)
model.load_checkpoint(ckpt_path)
model.eval()

with torch.no_grad():
    enhanced = model(noisy, session_id=2)  # route to session 2 adapter (cough)
```
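
The `session_id` routing above can be sketched as a per-layer bank of adapters keyed by session, where `None` bypasses all adapters. This is a hedged illustration of the mechanism, not the repo's implementation; the class `AdapterBank` and its methods are hypothetical:

```python
import torch
import torch.nn as nn

class AdapterBank(nn.Module):
    """Hypothetical per-layer routing: one tiny residual adapter per session."""
    def __init__(self, d_model: int = 256, bottleneck_dim: int = 1):
        super().__init__()
        self.adapters = nn.ModuleDict()
        self.d_model = d_model
        self.bottleneck_dim = bottleneck_dim

    def add_session(self, session_id: int) -> None:
        # Register a fresh frozen-backbone-friendly adapter for a new noise domain
        self.adapters[str(session_id)] = nn.Sequential(
            nn.Linear(self.d_model, self.bottleneck_dim),
            nn.ReLU(),
            nn.Linear(self.bottleneck_dim, self.d_model),
        )

    def forward(self, x: torch.Tensor, session_id=None) -> torch.Tensor:
        if session_id is None:
            return x                                      # base model only
        return x + self.adapters[str(session_id)](x)      # residual adapter path

bank = AdapterBank()
for sid in range(1, 5):
    bank.add_session(sid)
x = torch.randn(2, 10, 256)
y_base = bank(x, session_id=None)  # identical to x: no adapter applied
y_s2 = bank(x, session_id=2)       # routed through the session 2 adapter
```

Because each session's parameters live in their own module, earlier sessions are untouched when a new one is trained, which is the point of the incremental setup.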

---

## Training Data

| Session | Noise types | Train samples | Test samples |
|---------|-------------|---------------|--------------|
| 0 (pretrain) | 10 NOISEX-92 | 40,400 | 6,510 |
| 1–4 (incremental) | 1 new noise each | 1,212/session | 651/session |

SNR range: {−5, 0, 5, 10} dB. Sample rate: 8 kHz. Speech: LibriSpeech train-clean-100.
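
Mixing speech and noise at a target SNR from that range can be sketched as follows (the helper `mix_at_snr` is illustrative, not a function from the repo; it uses the standard power-ratio definition of SNR):

```python
import torch

def mix_at_snr(clean: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`."""
    clean_power = clean.pow(2).mean()
    noise_power = noise.pow(2).mean()
    # Gain g such that clean_power / (g^2 * noise_power) == 10^(snr_db / 10)
    gain = torch.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

clean = torch.randn(32000)  # 4 s of 8 kHz "speech" stand-in
noise = torch.randn(32000)
noisy = mix_at_snr(clean, noise, snr_db=0.0)  # equal speech and noise power
```

Sampling `snr_db` uniformly from {−5, 0, 5, 10} per utterance reproduces the range quoted above.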