Create methodology.md
#2
by mandarmgd-03 - opened
- methodology.md +160 -0
methodology.md
ADDED
@@ -0,0 +1,160 @@
# Methodology

### 1) Problem Framing

We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:

1. **Stage-1 (Coarse):** Detect whether a sound is **Abnormal** or **Normal** from its Mel-spectrogram.
2. **Stage-2 (Fine):** If **Abnormal**, classify the failure mode (e.g., *Bearing noise*, *Dehydration mode noise*). If **Normal**, classify the operating mode (e.g., *Wash*, *Spin*).

This decouples anomaly detection from mode identification, so each Stage-2 head only has to separate classes within its own branch, which should reduce class confusion.

---
### 2) Data & Labeling

- **Source:** Short `.wav` recordings of washing-machine cycles (mono).
- **Label Taxonomy:**
  - `00 - Abnormal/`
    - `Bearing noise/`
    - `Dehydration mode noise/`
  - `01 - Normal/`
    - `Wash mode/`
    - `Spin mode/`
- **Granularity:** Each file is a single clip labeled at the folder level.

> To avoid leakage between splits, clips from the **same physical machine / session** should not be split across train and validation sets (group-aware split).
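
A group-aware split can be sketched as follows. This is a minimal illustration, not the project's actual splitting code: it assumes each clip comes with a session or machine identifier (the `group_id` values and file names below are hypothetical), and it applies the validation fraction to groups rather than to individual clips. A per-file random split, such as the `validation_split` option of `image_dataset_from_directory`, does not enforce this constraint on its own.

```python
import random
from collections import defaultdict


def group_split(clips, val_fraction=0.2, seed=42):
    """Split (path, group_id) pairs so that no group appears in both sets."""
    by_group = defaultdict(list)
    for path, group_id in clips:
        by_group[group_id].append(path)

    groups = sorted(by_group)              # fixed order before the seeded shuffle
    random.Random(seed).shuffle(groups)

    # The fraction is taken over groups, so clip counts are only approximate.
    n_val = max(1, round(len(groups) * val_fraction))
    val_groups, train_groups = groups[:n_val], groups[n_val:]

    train = [p for g in train_groups for p in by_group[g]]
    val = [p for g in val_groups for p in by_group[g]]
    return train, val


# Hypothetical clips, each tagged with the session it was recorded in.
clips = [(f"session{s}_clip{i}.wav", f"session{s}")
         for s in range(5) for i in range(4)]
train_paths, val_paths = group_split(clips)
print(len(train_paths), len(val_paths))  # 16 4
```
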

---

### 3) Preprocessing: Mel-Spectrograms

- **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
- **Transform:**
  1. Load mono audio: \( y \in \mathbb{R}^{T} \)
  2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_mels, n\_fft, hop) \)
  3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
- **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
- **Normalization:** Divide pixel values by `255.0` at model input.
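
To make the parameters above concrete, the sketch below works through two pieces of arithmetic in plain Python: the number of spectrogram frames for a clip, and the log scaling from step 3. It is only an illustration under stated assumptions: the 5-second clip length is hypothetical, the frame count assumes a centered STFT (librosa's default), and the small `amin` floor is an added guard against `log10(0)` that the formula itself does not mention.

```python
import math

SR, N_FFT, HOP_LENGTH, N_MELS = 22050, 2048, 512, 128


def num_frames(num_samples, hop_length=HOP_LENGTH):
    """Frame count for a centered STFT: one frame per hop, plus one."""
    return 1 + num_samples // hop_length


def power_to_db_max_ref(S, amin=1e-10):
    """Apply S_dB = 10 * log10(S / max(S)) element-wise to a 2-D list."""
    peak = max(max(row) for row in S)
    ref = max(peak, amin)                  # assumed floor, avoids division by zero
    return [[10.0 * math.log10(max(v, amin) / ref) for v in row] for row in S]


clip_seconds = 5                           # hypothetical clip length
frames = num_frames(SR * clip_seconds)
print((N_MELS, frames))  # (128, 216)

# Each factor of 10 below the peak power corresponds to a 10 dB drop.
S_db = power_to_db_max_ref([[1.0, 0.1], [0.01, 0.001]])
```

The loudest bin always maps to 0 dB, so every rendered image uses the same upper bound regardless of recording level.
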

All scripts use the same constants to ensure train/test consistency.

---

### 4) Dataset Construction

- **Stage-1 dataset:** `MelSpectrograms/` with the two top-level folders (`00 - Abnormal`, `01 - Normal`).
- **Stage-2 datasets:**
  - **Abnormal head:** `MelSpectrograms/00 - Abnormal/*`
  - **Normal head:** `MelSpectrograms/01 - Normal/*`
- **Splits:** `validation_split=0.2`, `seed=42` via `image_dataset_from_directory`.
- **Class Order:** Persisted in `saved_models/label_meta.json` to guarantee a consistent label-to-index mapping at inference.
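
Persisting the class order might look like the sketch below. The JSON layout (keys `stage1`, `abnormal`, `normal`) is an assumption made for illustration, since the actual schema of `label_meta.json` is not given here. The class lists are sorted because `image_dataset_from_directory` infers class names from subfolder names in alphanumeric order; the example writes to a temporary directory rather than the real `saved_models/` folder.

```python
import json
import tempfile
from pathlib import Path


def save_label_meta(path, stage1, abnormal, normal):
    """Write the class order of every head to one JSON file (assumed schema)."""
    meta = {"stage1": list(stage1), "abnormal": list(abnormal), "normal": list(normal)}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta


def load_label_meta(path):
    """Read the class orders back; list position is the model's output index."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "saved_models" / "label_meta.json"
    save_label_meta(
        meta_path,
        stage1=sorted(["01 - Normal", "00 - Abnormal"]),
        abnormal=sorted(["Dehydration mode noise", "Bearing noise"]),
        normal=sorted(["Wash mode", "Spin mode"]),
    )
    restored = load_label_meta(meta_path)

print(restored["stage1"])  # ['00 - Abnormal', '01 - Normal']
```
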

---

### 5) Models & Architecture

Both stages use a compact CNN to keep inference light:

- **Backbone (per head):**
  - `Conv2D(32, 3×3) → ReLU → MaxPool(2×2)`
  - `Conv2D(64, 3×3) → ReLU → MaxPool(2×2)`
  - `Conv2D(128, 3×3) → ReLU → MaxPool(2×2)`
  - `Flatten → Dense(128) → ReLU → Dropout(0.3) → Dense(num_classes) → Softmax`
- **Input:** `224×224×3` spectrogram images
- **Loss:** `SparseCategoricalCrossentropy`
- **Optimizer:** `Adam`
- **Metrics:** `Accuracy`

> Rationale: A simple CNN is sufficient for a strong baseline; the hierarchy offloads fine-grained distinctions to specialized heads.
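
As a rough size check, the layer shapes and parameter count of this backbone can be worked out by hand. The sketch below assumes "valid" padding, stride 1, and bias terms for every layer (the Keras defaults, which are not stated above) and `num_classes=2` as in Stage-1. Under those assumptions nearly all parameters sit in the first `Dense` layer, because `Flatten` produces a 26×26×128 vector.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def count_params(input_size=224, in_channels=3, filters=(32, 64, 128),
                 dense_units=128, num_classes=2):
    size, channels, total = input_size, in_channels, 0
    for f in filters:
        total += (3 * 3 * channels + 1) * f   # 3x3 kernel weights + one bias per filter
        size = pool_out(conv_out(size))
        channels = f
    flat = size * size * channels             # length of the Flatten output
    total += (flat + 1) * dense_units         # Dense(128)
    total += (dense_units + 1) * num_classes  # Dense(num_classes)
    return size, flat, total


size, flat, total = count_params()
print(size, flat, total)  # 26 86528 11169218
```
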

---

### 6) Training Protocol

- **Stage-1:** Train on `Normal` vs `Abnormal` spectrograms.
- **Stage-2 Abnormal:** Train only on abnormal subclasses.
- **Stage-2 Normal:** Train only on normal subclasses.
- **Epochs:** `10` (baseline; tune as needed)
- **Batch size:** `32`
- **Pipelines:** `cache → (shuffle) → prefetch` with `tf.data.AUTOTUNE`
- **Checkpointing:** Save each head to `saved_models/*.h5` and class orders to `label_meta.json`.

Optional (recommended):

- **Augmentations:** time masking, frequency masking, Gaussian noise on spectrograms, random time shifts on audio.
- **Class imbalance:** oversampling minority subclasses or focal loss in Stage-2 heads.
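
The time and frequency masking mentioned above could be sketched like this. It is an illustrative stand-in, not the project's augmentation code: it works on a plain nested list (rows are mel bins, columns are frames) so it runs without extra dependencies, and the mask widths and `fill` value are assumed defaults. In practice the same idea would be applied to the spectrogram array before rendering, or to the image tensor in the input pipeline.

```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, fill=0.0, seed=None):
    """Return a copy of `spec` with one frequency band and one time span
    overwritten by `fill`; the input is left unchanged."""
    rng = random.Random(seed)
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank a random band of consecutive mel bins.
    f_width = rng.randint(0, min(max_freq_width, n_mels))
    f_start = rng.randint(0, n_mels - f_width)
    for m in range(f_start, f_start + f_width):
        out[m] = [fill] * n_frames

    # Time mask: blank a random span of consecutive frames in every row.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for row in out:
        row[t_start:t_start + t_width] = [fill] * t_width
    return out


spec = [[1.0] * 40 for _ in range(16)]   # toy 16 x 40 spectrogram
masked = mask_spectrogram(spec, seed=0)
```
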

---

### 7) Inference Flow (Hierarchical)

**Input:** `.wav` → Mel-spectrogram → `224×224`

1. **Stage-1:** `p_stage1 = f_stage1(img)` → `y1 = class_names_stage1[argmax(p_stage1)]`

2. **Route:**
   - If `y1 == "00 - Abnormal"` → use `abnormal_model`
   - Else → use `normal_model`

3. **Stage-2:** `p_stage2 = f_head(img)` → `y2 = class_names_stage2[argmax(p_stage2)]`

4. **Output:**
   `final = f"{y1.split(' - ')[1]} → {y2}"`
   plus confidences: `max(p_stage1)`, `max(p_stage2)`

**Pseudocode**
```python
import numpy as np

# to_mel_spectrogram, preprocess, the three models, and the class-name lists
# are assumed to be provided by the preprocessing/training scripts
# (class names loaded from label_meta.json).
def predict(wav):
    spec = to_mel_spectrogram(wav)
    img = preprocess(spec)                        # 224x224, /255.0

    p1 = np.asarray(stage1_model(img)).ravel()    # [2]
    y1 = int(np.argmax(p1))
    stage1_class = class_names_stage1[y1]

    # Route to the head that matches the Stage-1 decision.
    if stage1_class == "00 - Abnormal":
        head, class_names_stage2 = abnormal_model, class_names_abnormal
    else:
        head, class_names_stage2 = normal_model, class_names_normal

    p2 = np.asarray(head(img)).ravel()            # [num_subclasses]
    y2 = int(np.argmax(p2))
    stage2_class = class_names_stage2[y2]

    return {
        "stage1_class": stage1_class,
        "stage1_confidence": float(p1.max()),
        "stage2_class": stage2_class,
        "stage2_confidence": float(p2.max()),
        "final_prediction": f"{stage1_class.split(' - ')[1]} → {stage2_class}",
    }
```

### 8) Evaluation

Per-stage metrics: accuracy, macro-F1, confusion matrices.

End-to-end metric: hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
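
The end-to-end metric can be computed as in the sketch below, which assumes each label is a `(stage1, stage2)` pair; the example labels are made up for illustration. Note that a Stage-1 error always counts as an end-to-end error, because the sample is then routed to the wrong Stage-2 head.

```python
def hierarchical_accuracy(y_true, y_pred):
    """Fraction of samples whose (stage1, stage2) pair is entirely correct."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if tuple(t) == tuple(p))
    return hits / len(y_true)


y_true = [
    ("Abnormal", "Bearing noise"),
    ("Abnormal", "Dehydration mode noise"),
    ("Normal", "Wash mode"),
    ("Normal", "Spin mode"),
]
y_pred = [
    ("Abnormal", "Bearing noise"),   # both stages correct
    ("Abnormal", "Bearing noise"),   # Stage-1 correct, Stage-2 wrong
    ("Normal", "Wash mode"),         # both stages correct
    ("Abnormal", "Bearing noise"),   # Stage-1 wrong, so Stage-2 cannot be right
]
print(hierarchical_accuracy(y_true, y_pred))  # 0.5
```
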

Calibration: reliability curves and expected calibration error (ECE) computed on the maximum softmax probability for Stage-1 and Stage-2; optionally apply temperature scaling.
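
For reference, ECE can be computed with a few lines of plain Python. This is a minimal sketch of the common equal-width binning variant; the choice of 10 bins and the half-open bin edges are assumptions, and the example inputs are synthetic. Each bin contributes the gap between its accuracy and its mean confidence, weighted by the share of samples that fall into it.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width confidence bins [k/n_bins, (k+1)/n_bins)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # conf == 1.0 falls in the last bin
        bins[idx].append((conf, bool(ok)))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Synthetic overconfident model: always 90% confident, right half the time.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

Here `confidences` would be `max(p_stage1)` or `max(p_stage2)` per sample and `correct` whether the corresponding argmax matched the label. Temperature scaling operates on pre-softmax logits, so applying it would require access to the logits rather than the softmax outputs.
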

Robustness checks: background noise levels, recording device variance, different drum loads.

Leakage control: ensure clips from the same recording session are in one split only.

### 9) Deployment Considerations

App: Gradio front-end calls the same spectrogram + inference pipeline.

Artifacts: `saved_models/{stage1,abnormal,normal}.h5` + `saved_models/label_meta.json`

Reproducibility: fixed audio/spectrogram params and consistent class order.

Latency: spectrogram generation dominates; keep `n_fft`/`hop_length` fixed and consider caching frequent uploads.
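
One way to cache frequent uploads is to key results by a hash of the uploaded bytes, so an identical file skips spectrogram generation and inference entirely. The sketch below is a hypothetical helper, not part of the app: `PredictionCache`, its `max_items` limit, and the first-in-first-out eviction are all illustrative choices, and `predict_fn` stands for whatever function runs the full pipeline on raw `.wav` bytes.

```python
import hashlib


class PredictionCache:
    """Memoize predictions by the SHA-256 digest of the uploaded bytes."""

    def __init__(self, predict_fn, max_items=128):
        self._predict_fn = predict_fn
        self._max_items = max_items
        self._store = {}          # dicts keep insertion order, used for eviction
        self.hits = 0

    def predict(self, wav_bytes):
        key = hashlib.sha256(wav_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._predict_fn(wav_bytes)
        if len(self._store) >= self._max_items:
            oldest = next(iter(self._store))
            del self._store[oldest]          # evict the oldest entry (FIFO)
        self._store[key] = result
        return result


calls = []

def fake_pipeline(wav_bytes):
    """Stand-in for spectrogram generation plus two-stage inference."""
    calls.append(len(wav_bytes))
    return {"final_prediction": "Normal → Wash mode"}

cache = PredictionCache(fake_pipeline, max_items=2)
first = cache.predict(b"same-bytes")
second = cache.predict(b"same-bytes")    # served from the cache
```

Because the key depends on the exact bytes, re-encoded or trimmed copies of the same recording would still miss the cache.
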

### 10) Limitations & Future Work

Domain shift: different washers/rooms/mics can reduce accuracy; consider domain adaptation / augmentation.

Simple CNN: replacing it with MobileNetV2/EfficientNet may improve accuracy at comparable latency.

Sequence modeling: incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).

On-device: quantize models (TFLite) for edge deployment.