Update methodology.md

#3
by mandarmgd-03 - opened
Files changed (1)
  1. methodology.md +30 -23
methodology.md CHANGED
@@ -1,4 +1,4 @@
- # Methodology
 
  ### 1) Problem Framing
  We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
@@ -14,12 +14,19 @@ This decouples anomaly detection from mode identification and reduces class conf
 
  - **Source:** Short `.wav` recordings of washing-machine cycles (mono).
  - **Label Taxonomy:**
- 00 - Abnormal/
- ├─ Bearing noise/
- └─ Dehydration mode noise/
- 01 - Normal/
- ├─ Wash mode/
- └─ Spin mode/
 
  - **Granularity:** Each file is a single clip labeled at the folder level.
 
@@ -31,9 +38,9 @@ This decouples anomaly detection from mode identification and reduces class conf
 
  - **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
  - **Transform:**
- 1. Load mono audio: \( y \in \mathbb{R}^{T} \)
- 2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
- 3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
  - **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
  - **Normalization:** Divide pixel values by `255.0` at model input.
 
@@ -125,33 +132,33 @@ return {
  ```
 
  ### 8) Evaluation
- Per-stage metrics: accuracy, macro-F1, confusion matrices.
 
- End-to-end metric: hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
 
- Calibration: reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
 
- Robustness checks: background noise levels, recording device variance, different drum loads.
 
- Leakage control: ensure clips from the same recording session are in one split only.
 
  ### 9) Deployment Considerations
- App: Gradio front-end calls the same spectrogram + inference pipeline.
 
- Artifacts: saved_models/{stage1,abnormal,normal}.h5 + saved_models/label_meta.json
 
- Reproducibility: fixed audio/spectrogram params and consistent class order.
 
- Latency: spectrogram generation dominates; keep n_fft/hop_length fixed and consider caching frequent uploads.
 
  ### 10) Limitations & Future Work
- Domain shift: different washers/rooms/mics can reduce accuracy → consider domain adaptation / augmentation.
 
- Simple CNN: replace with MobileNetV2/EfficientNet for improved accuracy at similar latency.
 
- Sequence modeling: incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
 
- On-device: quantize models (TFLite) for edge deployment.
 
 
 
 
+ # Hierarchical Audio Classification for Washing Machine Sound Anomaly Detection - Methodology
 
  ### 1) Problem Framing
  We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
 
 
  - **Source:** Short `.wav` recordings of washing-machine cycles (mono).
  - **Label Taxonomy:**
+
+ ```bash
+ 00-Abnormal/
+ ├─ 00-1 - Background noise/
+ ├─ 00-2 - Dehydration mode noise/
+ └─ 00-3 - Wash mode noise/
+
+ 01-Normal/
+ ├─ 01-1 - Background noise/
+ ├─ 01-2 - Dehydration mode noise/
+ └─ 01-3 - Wash mode noise/
+ ```
+
 
  - **Granularity:** Each file is a single clip labeled at the folder level.
 
 
 
  - **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
  - **Transform:**
+ 1. Load mono audio: \( y \in \mathbb{R}^{T} \)
+ 2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
+ 3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
  - **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
  - **Normalization:** Divide pixel values by `255.0` at model input.
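
Step 3 of the transform above can be checked numerically. The following is a minimal NumPy-only sketch of the log-scaling formula \( S_{dB} = 10 \log_{10}(S / \max(S)) \). The function name `power_to_db_max_ref`, the floor `amin`, and the `top_db` clip are assumptions added for illustration; they resemble how `librosa.power_to_db(S, ref=np.max)` behaves by default, but the methodology does not state which call or settings were actually used.

```python
import numpy as np

def power_to_db_max_ref(S, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram to dB relative to its own maximum."""
    S = np.asarray(S, dtype=np.float64)
    # Floor tiny values so log10 never sees zero (assumed safeguard).
    S = np.maximum(S, amin)
    ref = S.max()
    S_db = 10.0 * np.log10(S / ref)
    # Clip the dynamic range to top_db below the peak (assumed).
    return np.maximum(S_db, -top_db)

# Toy 2x3 "spectrogram" (synthetic values, not real data).
S = np.array([[1.0, 0.1, 0.01],
              [0.0, 1e-3, 1.0]])
S_db = power_to_db_max_ref(S)
```

With this scaling the peak always maps to 0 dB and every other bin is non-positive, which is why the rendered images share a comparable color range regardless of recording loudness.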
 
 
  ```
 
  ### 8) Evaluation
+ - **Per-stage metrics:** accuracy, macro-F1, confusion matrices.
 
+ - **End-to-end metric:** hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
 
+ - **Calibration:** reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
 
+ - **Robustness checks:** background noise levels, recording device variance, different drum loads.
 
+ - **Leakage control:** ensure clips from the same recording session are in one split only.
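
The end-to-end metric and the calibration check listed above can be sketched in plain Python. This is an illustrative sketch, not the PR author's evaluation code: the function names, the equal-width binning with `n_bins=10`, and the toy labels are all assumptions.

```python
def hierarchical_accuracy(s1_true, s1_pred, s2_true, s2_pred):
    """Fraction of samples where both stage predictions are correct."""
    n = len(s1_true)
    if n == 0:
        raise ValueError("no samples")
    hits = sum(
        a == b and c == d
        for a, b, c, d in zip(s1_true, s1_pred, s2_true, s2_pred)
    )
    return hits / n

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over max-softmax confidence."""
    n = len(confidences)
    if n == 0:
        raise ValueError("no samples")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - acc)
    return ece

# Toy labels (hypothetical): sample 2 fails Stage-1, sample 4 fails Stage-2.
s1_true = ["Normal", "Normal", "Abnormal", "Abnormal"]
s1_pred = ["Normal", "Abnormal", "Abnormal", "Abnormal"]
s2_true = ["Wash", "Wash", "Dehydration", "Background"]
s2_pred = ["Wash", "Wash", "Dehydration", "Wash"]
h_acc = hierarchical_accuracy(s1_true, s1_pred, s2_true, s2_pred)

ece = expected_calibration_error(
    confidences=[0.95, 0.95, 0.55, 0.55],
    correct=[True, True, True, False],
)
```

Here `h_acc` counts only samples 1 and 3 as hits, so a Stage-1 error makes the sample wrong end to end even when the Stage-2 label happens to match.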
 
  ### 9) Deployment Considerations
+ - **App:** Gradio front-end calls the same spectrogram + inference pipeline.
 
+ - **Artifacts:** saved_models/{stage1,abnormal,normal}.h5 + saved_models/label_meta.json
 
+ - **Reproducibility:** fixed audio/spectrogram params and consistent class order.
 
+ - **Latency:** spectrogram generation dominates; keep n_fft/hop_length fixed and consider caching frequent uploads.
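
The artifact layout above (one Stage-1 model plus one Stage-2 model per branch) implies a simple routing step at inference time. The sketch below shows that routing with stub callables in place of the saved Keras models. The function name `predict_hierarchical`, the returned keys, and the stub probabilities are hypothetical; the actual return structure of the pipeline is not visible in this diff.

```python
def predict_hierarchical(x, stage1, stage2_by_branch):
    """Route one input through Stage-1, then the matching Stage-2 model.

    stage1 and every value in stage2_by_branch are callables that
    return {label: probability}. All names here are hypothetical.
    """
    p1 = stage1(x)
    branch = max(p1, key=p1.get)          # Stage-1 winner picks the branch
    p2 = stage2_by_branch[branch](x)
    mode = max(p2, key=p2.get)
    return {
        "stage1": branch, "stage1_conf": p1[branch],
        "stage2": mode, "stage2_conf": p2[mode],
    }

# Stub predictors standing in for the saved models (made-up numbers).
def stub_stage1(x):
    return {"Abnormal": 0.2, "Normal": 0.8}

stub_stage2 = {
    "Abnormal": lambda x: {"Background noise": 0.6, "Wash mode noise": 0.4},
    "Normal": lambda x: {"Wash mode noise": 0.7, "Dehydration mode noise": 0.3},
}

result = predict_hierarchical(None, stub_stage1, stub_stage2)
```

Keeping the branch keys identical to the class order stored alongside the models is what makes this lookup safe, which is the point of the reproducibility item in the list.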
 
  ### 10) Limitations & Future Work
+ - **Domain shift:** different washers/rooms/mics can reduce accuracy → consider domain adaptation / augmentation.
 
+ - **Simple CNN:** replace with MobileNetV2/EfficientNet for improved accuracy at similar latency.
 
+ - **Sequence modeling:** incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
 
+ - **On-device:** quantize models (TFLite) for edge deployment.
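
For the domain-shift item in the list above, one simple augmentation is to mix background noise into a clip at a controlled signal-to-noise ratio. The methodology does not specify any augmentation, so the sketch below is an assumption: `mix_at_snr` is a hypothetical helper, and the sine tone and Gaussian noise are synthetic placeholders rather than real recordings.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean so the mixture has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if clean.shape != noise.shape:
        raise ValueError("clean and noise must have the same shape")
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise is silent")
    # Choose gain so that p_clean / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
t = np.arange(22050) / 22050.0           # one second at sr=22050
clean = np.sin(2 * np.pi * 440.0 * t)    # synthetic tone, not real data
noise = rng.standard_normal(clean.shape)
noisy = mix_at_snr(clean, noise, snr_db=10.0)

# Measure the SNR that was actually achieved.
measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Sweeping `snr_db` over a few values gives a controlled version of the "background noise levels" robustness check as well as extra training variety.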
 
 