---
license: apache-2.0
language:
- en
tags:
- audio
- audio-classification
- respiratory-sounds
- healthcare
- medical
- hear
- vit
- lora
- pytorch
datasets:
- SPRSound
metrics:
- accuracy
- f1
- roc_auc
base_model: google/hear-pytorch
pipeline_tag: audio-classification
---

# HeAR-SPRSound: Respiratory Sound Abnormality Classifier

## Model Summary

A fine-tuned respiratory sound classifier built on top of **Google's HeAR** (Health Acoustic Representations) foundation model. The model performs **binary classification**, distinguishing **normal** from **abnormal** respiratory sounds, and is trained on the **SPRSound** dataset spanning BioCAS challenge years 2022–2025.

The architecture combines the HeAR ViT backbone (fine-tuned with LoRA) with a **Gated Attention Pooling** layer that aggregates variable-length audio chunk by chunk, followed by a two-layer MLP classifier.

---

## Architecture

```
Audio Input (16 kHz WAV)
        ↓
HeAR Preprocessing (2-second chunks, log-mel spectrograms [1 × 192 × 128])
        ↓
HeAR ViT Encoder (google/hear-pytorch)
  └─ LoRA adapters on Q & V projections in last 6 transformer blocks
        ↓
Per-chunk CLS Embeddings [B × T × 512]
        ↓
Gated Attention Pooling (length-masked softmax attention over chunks)
        ↓
Pooled Representation [B × 512]
        ↓
MLP Classifier (512 → 256 → 2, GELU, Dropout 0.4)
        ↓
Normal / Abnormal
```

**Key components:**
- **Backbone**: `google/hear-pytorch` (frozen except LoRA layers and LayerNorms)
- **LoRA**: rank=16, alpha=16, dropout=0.3, applied to Q and V projections in the last 6 blocks
- **Pooling**: Gated Attention Pool (dual-path tanh × sigmoid gating, hidden dim 512)
- **Loss**: Focal Loss (γ=2.0) with class-balanced sample weighting
- **Inference**: per-class threshold optimization (one-vs-rest F1 on the validation set)

---
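The gated attention pooling stage can be sketched in a few lines of PyTorch. The layer below is a minimal illustrative implementation (class and variable names are our own, not the repository's actual code) of dual-path tanh × sigmoid gating with a length-masked softmax over chunks:

```python
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    """Length-masked gated attention pooling over per-chunk embeddings."""
    def __init__(self, dim=512, hidden=512):
        super().__init__()
        self.v = nn.Linear(dim, hidden)  # tanh path
        self.u = nn.Linear(dim, hidden)  # sigmoid gate path
        self.w = nn.Linear(hidden, 1)    # scalar attention score per chunk

    def forward(self, x, lengths):
        # x: [B, T, dim] per-chunk embeddings; lengths: [B] valid chunks per clip
        scores = self.w(torch.tanh(self.v(x)) * torch.sigmoid(self.u(x))).squeeze(-1)
        # Mask padded chunk positions so they get zero attention weight
        mask = torch.arange(x.size(1), device=x.device)[None, :] >= lengths[:, None]
        scores = scores.masked_fill(mask, float("-inf"))
        attn = torch.softmax(scores, dim=1)         # [B, T]
        return (attn.unsqueeze(-1) * x).sum(dim=1)  # [B, dim]

pool = GatedAttentionPool(dim=512)
pooled = pool(torch.randn(2, 5, 512), torch.tensor([5, 3]))  # -> shape [2, 512]
```

Because padded positions are masked to `-inf` before the softmax, the pooled output is invariant to whatever values occupy the padding chunks.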

## Training Details

| Hyperparameter | Value |
|---|---|
| Base model | `google/hear-pytorch` |
| Input sample rate | 16,000 Hz |
| Chunk size | 2 seconds (32,000 samples) |
| Max audio duration | 10 seconds (up to 5 chunks) |
| Optimizer | AdamW |
| Learning rate | 5e-5 |
| Weight decay | 0.2 |
| Warmup epochs | 10 |
| Max epochs | 100 |
| Batch size | 96 |
| Early stopping patience | 20 epochs |

---
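The training objective, Focal Loss with class-balanced weights, down-weights easy examples by a factor of (1 − p_t)^γ. A minimal sketch (illustrative code, not the repository's implementation):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, weight=None):
    """Focal loss: scales per-sample cross entropy by (1 - p_t)^gamma.

    `weight` is an optional per-class tensor carrying class-balanced weights;
    with gamma=0 and weight=None this reduces to plain cross entropy.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, weight=weight, reduction="none")
    p_t = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(4, 2), torch.tensor([0, 1, 0, 1]), gamma=2.0)
```

With γ=2.0, confidently correct samples (p_t near 1) contribute almost nothing, so training focuses on the hard, often minority-class, examples.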

## Dataset

**SPRSound**: the multi-year BioCAS challenge respiratory auscultation dataset.

| Year | Split |
|---|---|
| BioCAS 2022 | Train + Inter/Intra test |
| BioCAS 2023 | Test |
| BioCAS 2024 | Test |
| BioCAS 2025 | Test |

All data was **re-split at the patient level** (70% train / 15% val / 15% test) to prevent data leakage; no patient appears in more than one split. Labels were consolidated to a binary scheme:

- **normal**: all event annotations are "Normal"
- **abnormal**: any non-normal respiratory event is present (wheeze, crackle, rhonchus, etc.)

Class imbalance was addressed with a `WeightedRandomSampler` and Focal Loss.

---
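A patient-level re-split like the one described above can be sketched as follows (a hypothetical helper, assuming records of `(patient_id, file_path)` pairs; the actual split code is not part of this card):

```python
import random
from collections import defaultdict

def patient_level_split(records, train=0.70, val=0.15, seed=42):
    """Split (patient_id, path) records so that no patient crosses splits.

    Ratios apply to patients, not files, matching the 70/15/15 scheme.
    """
    by_patient = defaultdict(list)
    for pid, path in records:
        by_patient[pid].append(path)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)  # deterministic shuffle
    n = len(patients)
    n_train, n_val = int(n * train), int(n * val)
    buckets = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {k: [p for pid in v for p in by_patient[pid]] for k, v in buckets.items()}
```

Splitting by patient rather than by recording is what prevents leakage: multiple recordings from the same child are acoustically correlated, so a file-level split would inflate test metrics.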

## Data Augmentation

A custom `PhoneLikeAugment` pipeline was applied during training (p=0.5) to simulate real-world acoustic variability:

- Random gain (−18 to +8 dB)
- Phone band-limiting (HP: 120–200 Hz, LP: 4–8 kHz)
- Fast echo / room simulation (10–80 ms delay taps)
- Colored noise addition (SNR 3–25 dB)
- Soft AGC / tanh compression
- Random time shift (±80 ms)
- Rare clipping (p=0.15)

---
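A few of the simpler transforms above can be sketched in NumPy to show the intended effect (illustrative stand-ins, not the `PhoneLikeAugment` code; the real pipeline uses colored rather than white noise):

```python
import numpy as np

def random_gain(x, lo_db=-18.0, hi_db=8.0, rng=np.random):
    """Scale the waveform by a random gain drawn in dB."""
    return x * 10.0 ** (rng.uniform(lo_db, hi_db) / 20.0)

def add_noise_at_snr(x, snr_db, rng=np.random):
    """Add white noise scaled so the result has the target SNR in dB."""
    noise = rng.randn(*x.shape)
    sig_pow = np.mean(x ** 2) + 1e-12
    scale = np.sqrt(sig_pow / (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    return x + scale * noise

def random_shift(x, sr=16000, max_ms=80, rng=np.random):
    """Circularly shift the waveform by up to +/- max_ms milliseconds."""
    shift = rng.randint(-max_ms, max_ms + 1) * sr // 1000
    return np.roll(x, shift)
```

Applying each transform with probability p=0.5, as described above, exposes the model to gain, noise-floor, and alignment variation it will meet outside the clean SPRSound recording conditions.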

## Usage

```python
import torch

# AdaptiveRespiratoryModel is the custom wrapper class from this repository
# (HeAR ViT backbone + gated attention pooling + MLP classifier head)
model = AdaptiveRespiratoryModel(
    num_classes=2,
    dropout=0.4,
    use_lora=True,
    lora_r=16,
    lora_alpha=16,
    lora_dropout=0.3,
    lora_last_n_blocks=6,
)

checkpoint = torch.load("best_model.pth", map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint["model"], strict=False)
model.eval()

# Audio must be 16 kHz, processed through HeAR's preprocess_audio
# into chunks of shape [T, 1, 192, 128]
```

> ⚠️ Requires `google/hear-pytorch` and the [HeAR](https://github.com/Google-Health/hear) library for audio preprocessing.

---
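As a rough illustration of the chunking step (the real preprocessing lives in the HeAR library, which also produces the log-mel spectrograms), a waveform can be segmented into fixed 2-second windows like this:

```python
import numpy as np

def chunk_audio(wave, sr=16000, chunk_s=2.0, max_chunks=5):
    """Split a 1-D waveform into fixed-length chunks, zero-padding the last.

    Illustrative sketch only: at 16 kHz with 2-second chunks this yields
    arrays of shape [T, 32000] with T <= max_chunks (10 s max duration).
    """
    n = int(sr * chunk_s)
    chunks = [wave[i:i + n] for i in range(0, len(wave), n)][:max_chunks]
    chunks = [np.pad(c, (0, n - len(c))) for c in chunks]
    return np.stack(chunks)
```

The chunk count `T` is what the gated attention pooling layer later masks and aggregates over.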

## Limitations & Intended Use

- **Intended use**: research and prototyping in respiratory sound analysis. **Not validated for clinical use.**
- The model was trained on auscultation recordings from SPRSound; performance may degrade on recordings from different stethoscope types, microphones, or patient populations.
- Binary classification only: the model does not distinguish between specific pathology types (e.g., wheeze vs. crackle).
- Threshold calibration was performed on the validation set; recalibration is recommended when deploying to new domains.

---

## Citation

If you use this model, please cite the SPRSound dataset and the HeAR foundation model:

```bibtex
@misc{sprsound,
  title = {SPRSound: Open-Source SJTU Paediatric Respiratory Sound Database},
  year  = {2022},
  note  = {BioCAS 2022--2025 challenge dataset}
}

@misc{hear2024,
  title  = {HeAR: Health Acoustic Representations},
  author = {Google Health},
  year   = {2024},
  url    = {https://github.com/Google-Health/hear}
}
```

---

## License

This model is released under the **Apache 2.0** license. The HeAR backbone is subject to Google's original license terms, and the SPRSound data is subject to its own terms; please refer to the dataset authors.