Spaces: Anvit25/new_audio

Update methodology.md

by mandarmgd-03 - opened Sep 29, 2025

base: refs/heads/main ← from: refs/pr/3

methodology.md +30 -23


@@ -1,4 +1,4 @@
-# Methodology
 ### 1) Problem Framing
 We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
@@ -14,12 +14,19 @@ This decouples anomaly detection from mode identification and reduces class conf
 - **Source:** Short `.wav` recordings of washing-machine cycles (mono).
 - **Label Taxonomy:**
-00 - Abnormal/
-├─ Bearing noise/
-└─ Dehydration mode noise/
-01 - Normal/
-├─ Wash mode/
-└─ Spin mode/
 - **Granularity:** Each file is a single clip labeled at the folder level.
@@ -31,9 +38,9 @@ This decouples anomaly detection from mode identification and reduces class conf
 - **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
 - **Transform:**
-1. Load mono audio: \( y \in \mathbb{R}^{T} \)
-2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
-3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
 - **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
 - **Normalization:** Divide pixel values by `255.0` at model input.
@@ -125,33 +132,33 @@ return {
 ```
 ### 8) Evaluation
-Per-stage metrics: accuracy, macro-F1, confusion matrices.
-End-to-end metric: hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
-Calibration: reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
-Robustness checks: background noise levels, recording device variance, different drum loads.
-Leakage control: ensure clips from the same recording session are in one split only.
 ### 9) Deployment Considerations
-App: Gradio front-end calls the same spectrogram + inference pipeline.
-Artifacts: saved_models/{stage1,abnormal,normal}.h5 + saved_models/label_meta.json
-Reproducibility: fixed audio/spectrogram params and consistent class order.
-Latency: spectrogram generation dominates; keep n_fft/hop_length fixed and consider caching frequent uploads.
 ### 10) Limitations & Future Work
-Domain shift: different washers/rooms/mics can reduce accuracy → consider domain adaptation / augmentation.
-Simple CNN: replace with MobileNetV2/EfficientNet for improved accuracy at similar latency.
-Sequence modeling: incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
-On-device: quantize models (TFLite) for edge deployment.

+# Hierarchical Audio Classification for Washing Machine Sound Anomaly Detection - Methodology
 ### 1) Problem Framing
 We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
 - **Source:** Short `.wav` recordings of washing-machine cycles (mono).
 - **Label Taxonomy:**
+```bash
+00-Abnormal/
+├─ 00-1 - Background noise/
+├─ 00-2 - Dehydration mode noise/
+└─ 00-3 - Wash mode noise/
+01-Normal/
+├─ 01-1 - Background noise/
+├─ 01-2 - Dehydration mode noise/
+└─ 01-3 - Wash mode noise/
+```
 - **Granularity:** Each file is a single clip labeled at the folder level.
 - **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
 - **Transform:**
+  1. Load mono audio: \( y \in \mathbb{R}^{T} \)
+  2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
+  3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
 - **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
 - **Normalization:** Divide pixel values by `255.0` at model input.
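
As a hedged illustration of step 3 and the normalization bullet above, the sketch below applies the log scaling \( S_{dB} = 10 \log_{10}(S / \max(S)) \) with NumPy and rescales 8-bit pixels to `[0, 1]`. The function names `power_to_db_rel_max` and `scale_pixels` and the `amin` floor are assumptions made for this example, not part of the original pipeline. In librosa, `librosa.power_to_db(S, ref=np.max)` performs the same scaling and, by default, also clips values more than 80 dB below the peak.

```python
import numpy as np


def power_to_db_rel_max(S: np.ndarray, amin: float = 1e-10) -> np.ndarray:
    """Log-scale a power spectrogram relative to its maximum (step 3).

    `amin` is an assumed floor that avoids log10(0) on silent bins.
    """
    S = np.asarray(S, dtype=np.float64)
    if S.size == 0 or S.max() <= 0:
        raise ValueError("S must contain at least one positive power value")
    S_floor = np.maximum(S, amin)
    # The peak bin maps to 0 dB; every other bin is negative.
    return 10.0 * np.log10(S_floor / S_floor.max())


def scale_pixels(pixels: np.ndarray) -> np.ndarray:
    """Map 8-bit image pixels to float32 values in [0, 1] for model input."""
    return np.asarray(pixels).astype(np.float32) / 255.0
```

A bin at one tenth of the peak power lands at -10 dB, one hundredth at -20 dB, and so on, so the rendered image encodes relative rather than absolute loudness.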
 ```
 ### 8) Evaluation
+- **Per-stage metrics:** accuracy, macro-F1, confusion matrices.
+- **End-to-end metric:** hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
+- **Calibration:** reliability curves / ECE on max_softmax for Stage-1 and Stage-2; optionally apply temperature scaling.
+- **Robustness checks:** background noise levels, recording device variance, different drum loads.
+- **Leakage control:** ensure clips from the same recording session are in one split only.
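
The end-to-end and calibration metrics above could be computed roughly as follows. This is a minimal standard-library sketch: the function names, the equal-width binning, and the choice of 10 bins are assumptions for illustration, and the methodology does not specify how ECE is binned.

```python
from typing import Hashable, Sequence


def hierarchical_accuracy(
    s1_true: Sequence[Hashable],
    s1_pred: Sequence[Hashable],
    s2_true: Sequence[Hashable],
    s2_pred: Sequence[Hashable],
) -> float:
    """Fraction of samples where both the Stage-1 and Stage-2 labels are correct."""
    n = len(s1_true)
    if n == 0 or not (len(s1_pred) == len(s2_true) == len(s2_pred) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        a == b and c == d
        for a, b, c, d in zip(s1_true, s1_pred, s2_true, s2_pred)
    )
    return hits / n


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE over equal-width bins of the max-softmax confidence."""
    n = len(confidences)
    if n == 0 or len(correct) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += int(ok)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count:
            # Weight each bin's |accuracy - mean confidence| gap by its share of samples.
            ece += (count / n) * abs(hits / count - conf_sum / count)
    return ece
```

Each stage would be scored separately for ECE, passing that stage's max-softmax values and per-sample correctness flags.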
 ### 9) Deployment Considerations
+- **App:** Gradio front-end calls the same spectrogram + inference pipeline.
+- **Artifacts:** `saved_models/{stage1,abnormal,normal}.h5` + `saved_models/label_meta.json`
+- **Reproducibility:** fixed audio/spectrogram params and consistent class order.
+- **Latency:** spectrogram generation dominates; keep n_fft/hop_length fixed and consider caching frequent uploads.
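
One way the caching suggestion above might look is a small in-memory LRU cache keyed by a hash of the uploaded bytes. This is a sketch under assumptions: `UploadCache`, `predict_fn`, and `max_items` are hypothetical names, and `predict_fn` stands in for whatever function runs the spectrogram + inference pipeline.

```python
import hashlib
from collections import OrderedDict
from typing import Any, Callable


class UploadCache:
    """Small LRU cache keyed by the SHA-256 digest of the uploaded audio bytes."""

    def __init__(self, predict_fn: Callable[[bytes], Any], max_items: int = 128) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._predict_fn = predict_fn  # hypothetical pipeline entry point
        self._max_items = max_items
        self._store: "OrderedDict[str, Any]" = OrderedDict()

    def predict(self, audio_bytes: bytes) -> Any:
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key in self._store:
            # Cache hit: mark as most recently used and skip the pipeline.
            self._store.move_to_end(key)
            return self._store[key]
        result = self._predict_fn(audio_bytes)
        self._store[key] = result
        if len(self._store) > self._max_items:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return result
```

Hashing the raw bytes means only byte-identical uploads hit the cache; re-encoded copies of the same recording would still be recomputed.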
 ### 10) Limitations & Future Work
+- **Domain shift:** different washers/rooms/mics can reduce accuracy → consider domain adaptation / augmentation.
+- **Simple CNN:** replace with MobileNetV2/EfficientNet for improved accuracy at similar latency.
+- **Sequence modeling:** incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
+- **On-device:** quantize models (TFLite) for edge deployment.