Update methodology.md
#3
by mandarmgd-03 - opened
- methodology.md +30 -23
methodology.md
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
# Methodology
|
| 2 |
|
| 3 |
### 1) Problem Framing
|
| 4 |
We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
|
|
@@ -14,12 +14,19 @@ This decouples anomaly detection from mode identification and reduces class conf
|
|
| 14 |
|
| 15 |
- **Source:** Short `.wav` recordings of washing-machine cycles (mono).
|
| 16 |
- **Label Taxonomy:**
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
ββ
|
| 22 |
-
ββ
|
| 23 |
|
| 24 |
- **Granularity:** Each file is a single clip labeled at the folder level.
|
| 25 |
|
|
@@ -31,9 +38,9 @@ This decouples anomaly detection from mode identification and reduces class conf
|
|
| 31 |
|
| 32 |
- **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
|
| 33 |
- **Transform:**
|
| 34 |
-
1. Load mono audio: \( y \in \mathbb{R}^{T} \)
|
| 35 |
-
2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
|
| 36 |
-
3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
|
| 37 |
- **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
|
| 38 |
- **Normalization:** Divide pixel values by `255.0` at model input.
|
| 39 |
|
|
@@ -125,33 +132,33 @@ return {
|
|
| 125 |
```
|
| 126 |
|
| 127 |
### 8) Evaluation
|
| 128 |
-
Per-stage metrics: accuracy, macro-F1, confusion matrices.
|
| 129 |
|
| 130 |
-
End-to-end metric: hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
|
| 131 |
|
| 132 |
-
Calibration: reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
|
| 133 |
|
| 134 |
-
Robustness checks: background noise levels, recording device variance, different drum loads.
|
| 135 |
|
| 136 |
-
Leakage control: ensure clips from the same recording session are in one split only.
|
| 137 |
|
| 138 |
### 9) Deployment Considerations
|
| 139 |
-
App: Gradio front-end calls the same spectrogram + inference pipeline.
|
| 140 |
|
| 141 |
-
Artifacts: saved_models/{stage1,abnormal,normal}.h5 + saved_models/label_meta.json
|
| 142 |
|
| 143 |
-
Reproducibility: fixed audio/spectrogram params and consistent class order.
|
| 144 |
|
| 145 |
-
Latency: spectrogram generation dominates; keep n_fft/hop_length fixed and consider caching frequent uploads.
|
| 146 |
|
| 147 |
### 10) Limitations & Future Work
|
| 148 |
-
Domain shift: different washers/rooms/mics can reduce accuracy; consider domain adaptation / augmentation.
|
| 149 |
|
| 150 |
-
Simple CNN: replace with MobileNetV2/EfficientNet for improved accuracy at similar latency.
|
| 151 |
|
| 152 |
-
Sequence modeling: incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
|
| 153 |
|
| 154 |
-
On-device: quantize models (TFLite) for edge deployment.
|
| 155 |
|
| 156 |
|
| 157 |
|
|
|
|
| 1 |
+
# Hierarchical Audio Classification for Washing Machine Sound Anomaly Detection - Methodology
|
| 2 |
|
| 3 |
### 1) Problem Framing
|
| 4 |
We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
|
|
|
|
| 14 |
|
| 15 |
- **Source:** Short `.wav` recordings of washing-machine cycles (mono).
|
| 16 |
- **Label Taxonomy:**
|
| 17 |
+
|
| 18 |
+
```bash
|
| 19 |
+
00-Abnormal/
|
| 20 |
+
├─ 00-1 - Background noise/
|
| 21 |
+
├─ 00-2 - Dehydration mode noise/
|
| 22 |
+
└─ 00-3 - Wash mode noise/
|
| 23 |
+
|
| 24 |
+
01-Normal/
|
| 25 |
+
├─ 01-1 - Background noise/
|
| 26 |
+
├─ 01-2 - Dehydration mode noise/
|
| 27 |
+
└─ 01-3 - Wash mode noise/
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
|
| 31 |
- **Granularity:** Each file is a single clip labeled at the folder level.
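Because each clip is labeled by the folder it sits in, the Stage-1 and Stage-2 labels can be recovered from the file path alone. A minimal sketch, assuming the folder names follow the `NN-Name/NN-K - Mode/` pattern of the taxonomy; the helper name `labels_from_path` is hypothetical and not taken from the project's code:

```python
import re
from pathlib import PurePosixPath
from typing import Tuple


def labels_from_path(path: str) -> Tuple[str, str]:
    """Return (stage1_label, stage2_label) parsed from a clip's parent folders.

    Assumes a layout of <stage1 folder>/<stage2 folder>/<clip>.wav.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected <stage1>/<stage2>/<file>, got {path!r}")
    stage1_dir, stage2_dir = parts[-3], parts[-2]
    # Strip numeric prefixes: "00-Abnormal" -> "Abnormal"
    stage1 = re.sub(r"^\d+-", "", stage1_dir).strip()
    # "00-2 - Dehydration mode noise" -> "Dehydration mode noise"
    stage2 = re.sub(r"^\d+-\d+\s*-\s*", "", stage2_dir).strip()
    return stage1, stage2
```

Parsing labels from the path keeps the taxonomy and the dataset on disk as the single source of truth, so no separate label file can drift out of sync.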
|
| 32 |
|
|
|
|
| 38 |
|
| 39 |
- **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
|
| 40 |
- **Transform:**
|
| 41 |
+
1. Load mono audio: \( y \in \mathbb{R}^{T} \)
|
| 42 |
+
2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
|
| 43 |
+
3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
|
| 44 |
- **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
|
| 45 |
- **Normalization:** Divide pixel values by `255.0` at model input.
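The log-scaling step can be checked in isolation with NumPy. This is a sketch under stated assumptions rather than the project's actual code: the function names and the `amin` floor are illustrative, and the frame count assumes centered framing (librosa's default). In librosa, the corresponding calls would be `librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)` followed by `librosa.power_to_db(S, ref=np.max)`, which by default also clips values more than 80 dB below the peak; the sketch omits that clipping.

```python
import numpy as np


def power_to_db_rel_max(S, amin=1e-10):
    """Compute S_dB = 10 * log10(S / max(S)) for a power spectrogram.

    `amin` (an assumed floor) guards against log(0) in silent bins.
    """
    S = np.asarray(S, dtype=float)
    ref = max(float(S.max()), amin)
    return 10.0 * np.log10(np.maximum(S, amin) / ref)


def n_frames(n_samples, hop_length=512):
    """Number of STFT frames for a clip, assuming centered framing."""
    return 1 + n_samples // hop_length
```

With these parameters the loudest bin always maps to 0 dB and every other bin is negative. A one-second clip at `sr=22050` yields 44 frames, so its mel spectrogram has shape `(128, 44)` before it is rendered to an image.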
|
| 46 |
|
|
|
|
| 132 |
```
|
| 133 |
|
| 134 |
### 8) Evaluation
|
| 135 |
+
- **Per-stage metrics:** accuracy, macro-F1, confusion matrices.
|
| 136 |
|
| 137 |
+
- **End-to-end metric:** hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
|
| 138 |
|
| 139 |
+
- **Calibration:** reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
|
| 140 |
|
| 141 |
+
- **Robustness checks:** background noise levels, recording device variance, different drum loads.
|
| 142 |
|
| 143 |
+
- **Leakage control:** ensure clips from the same recording session are in one split only.
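The end-to-end and calibration metrics above can be sketched in plain Python. The function names are hypothetical, and the ECE variant shown uses equal-width confidence bins, which is one common choice rather than something the methodology specifies:

```python
def hierarchical_accuracy(y1_true, y1_pred, y2_true, y2_pred):
    """Fraction of samples where both Stage-1 and Stage-2 predictions are correct."""
    n = len(y1_true)
    if n == 0:
        raise ValueError("no samples")
    hits = sum(
        t1 == p1 and t2 == p2
        for t1, p1, t2, p2 in zip(y1_true, y1_pred, y2_true, y2_pred)
    )
    return hits / n


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sample-weighted mean of |accuracy - mean confidence| per bin.

    `confidences` are max-softmax values in [0, 1]; `correct` are booleans.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("no samples")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Note that a Stage-1 error routes the clip to the wrong Stage-2 model, so such a sample counts as a miss for hierarchical accuracy even if the Stage-2 mode name happens to match.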
|
| 144 |
|
| 145 |
### 9) Deployment Considerations
|
| 146 |
+
- **App:** Gradio front-end calls the same spectrogram + inference pipeline.
|
| 147 |
|
| 148 |
+
- **Artifacts:** `saved_models/{stage1,abnormal,normal}.h5` + `saved_models/label_meta.json`
|
| 149 |
|
| 150 |
+
- **Reproducibility:** fixed audio/spectrogram params and consistent class order.
|
| 151 |
|
| 152 |
+
- **Latency:** spectrogram generation dominates; keep `n_fft`/`hop_length` fixed and consider caching frequent uploads.
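One way to cache frequent uploads is to key results on a hash of the raw audio bytes, so an identical file skips spectrogram generation and inference entirely. The class below is a hypothetical sketch, not part of the app: `PredictionCache`, `predict_fn`, and `max_items` are assumed names, and `predict_fn` stands in for whatever function runs the spectrogram + inference pipeline.

```python
import hashlib


class PredictionCache:
    """Small in-memory cache keyed by the SHA-256 of the uploaded bytes."""

    def __init__(self, predict_fn, max_items=128):
        self._predict_fn = predict_fn
        self._max_items = max_items
        self._store = {}  # dicts preserve insertion order, oldest entry first

    def predict(self, audio_bytes: bytes):
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key in self._store:
            return self._store[key]  # cache hit: skip the pipeline
        result = self._predict_fn(audio_bytes)
        if len(self._store) >= self._max_items:
            # Evict the oldest inserted entry to bound memory use.
            self._store.pop(next(iter(self._store)))
        self._store[key] = result
        return result
```

Hashing the content rather than the filename means re-uploads under a different name still hit the cache. The cache is only valid while the audio/spectrogram parameters and model files stay fixed, which matches the reproducibility point above.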
|
| 153 |
|
| 154 |
### 10) Limitations & Future Work
|
| 155 |
+
- **Domain shift:** different washers/rooms/mics can reduce accuracy; consider domain adaptation / augmentation.
|
| 156 |
|
| 157 |
+
- **Simple CNN:** consider replacing it with a pretrained backbone such as MobileNetV2 or EfficientNet, which may improve accuracy at comparable latency.
|
| 158 |
|
| 159 |
+
- **Sequence modeling:** incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
|
| 160 |
|
| 161 |
+
- **On-device:** quantize models (TFLite) for edge deployment.
|
| 162 |
|
| 163 |
|
| 164 |
|