Create methodology.md

#2
by mandarmgd-03 - opened
Files changed (1)
  1. methodology.md +160 -0
methodology.md ADDED
@@ -0,0 +1,160 @@
+ # Methodology
+
+ ### 1) Problem Framing
+ We treat washing-machine sound understanding as a **two-stage hierarchical image classification** task:
+
+ 1. **Stage-1 (Coarse):** Detect whether a sound is **Abnormal** or **Normal** from its Mel-spectrogram.
+ 2. **Stage-2 (Fine):** If **Abnormal**, classify the failure mode (e.g., *Bearing noise*, *Dehydration mode noise*). If **Normal**, classify the operating mode (e.g., *Wash*, *Spin*).
+
+ This decouples anomaly detection from mode identification: each Stage-2 head only has to separate the classes within its own branch, which is intended to reduce confusion between normal and abnormal subclasses.
+
+ ---
+
+ ### 2) Data & Labeling
+
+ - **Source:** Short `.wav` recordings of washing-machine cycles (mono).
+ - **Label Taxonomy:**
+
+       00 - Abnormal/
+       ├─ Bearing noise/
+       └─ Dehydration mode noise/
+       01 - Normal/
+       ├─ Wash mode/
+       └─ Spin mode/
+
+ - **Granularity:** Each file is a single clip labeled at the folder level.
+
+ > To avoid label leakage, clips from the **same physical machine / session** should not be split across train and validation sets (group-aware split).
+
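A minimal sketch of the group-aware split described in the note above, using only the standard library. It assumes each clip can be mapped to a machine or session ID; the `group_of` callback and the `machineA_0.wav` style file names are hypothetical, since the dataset's real naming scheme is not given here.

```python
import random
from collections import defaultdict


def group_split(paths, group_of, val_fraction=0.2, seed=42):
    """Split paths into (train, val) so that no group appears in both.

    group_of is a caller-supplied function mapping a path to its group ID
    (for example a machine or recording-session identifier).
    """
    groups = defaultdict(list)
    for p in paths:
        groups[group_of(p)].append(p)

    # Shuffle group IDs, not individual clips, so whole groups move together.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_val_groups = max(1, round(len(group_ids) * val_fraction))
    val_ids = set(group_ids[:n_val_groups])

    train = [p for g in group_ids if g not in val_ids for p in groups[g]]
    val = [p for g in group_ids if g in val_ids for p in groups[g]]
    return train, val


# Hypothetical naming scheme: "<machine>_<clip>.wav"
clips = [f"machine{m}_{i}.wav" for m in "ABCDE" for i in range(4)]
train, val = group_split(clips, group_of=lambda p: p.split("_")[0])
```

Note that `val_fraction` applies to the number of groups, not the number of clips, so the clip-level ratio can drift from 80/20 when groups differ in size.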
+ ---
+
+ ### 3) Preprocessing → Mel-Spectrograms
+
+ - **Audio params:** `sr=22050`, `n_fft=2048`, `hop_length=512`, `n_mels=128`
+ - **Transform:**
+   1. Load mono audio: \( y \in \mathbb{R}^{T} \)
+   2. Mel power spectrogram: \( S = \text{MelSpec}(y; sr, n\_fft, hop\_length, n\_mels) \)
+   3. Log scaling (dB): \( S_{dB} = 10 \log_{10} \left(\frac{S}{\max(S)}\right) \)
+ - **Rendering:** `librosa.display.specshow(S_db, cmap="magma")`, save to PNG, **no axes**, `224×224` target size.
+ - **Normalization:** Divide pixel values by `255.0` at model input.
+
+ All scripts use the same constants to ensure train/test consistency.
+
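The three transform steps can be sketched as follows. This is an illustration under assumptions, not the project's actual script: the function names `to_db` and `wav_to_log_mel` are made up here, and the `amin` floor is added only to avoid taking the logarithm of zero. The librosa calls are imported lazily so the dB step can be tried without librosa installed.

```python
import numpy as np

# Constants from the "Audio params" bullet above.
SR, N_FFT, HOP_LENGTH, N_MELS = 22050, 2048, 512, 128


def to_db(S, amin=1e-10):
    """Step 3: S_dB = 10 * log10(S / max(S)), with a floor to avoid log(0)."""
    S = np.maximum(np.asarray(S, dtype=np.float64), amin)
    return 10.0 * np.log10(S / S.max())


def wav_to_log_mel(path):
    """Steps 1-3 for one file (requires librosa)."""
    import librosa

    y, _ = librosa.load(path, sr=SR, mono=True)              # step 1
    S = librosa.feature.melspectrogram(                      # step 2
        y=y, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH, n_mels=N_MELS
    )
    return to_db(S)                                          # step 3


# Small check of the dB formula on a toy power matrix.
demo = to_db(np.array([[1.0, 0.1], [0.01, 1.0]]))
```

`librosa.power_to_db(S, ref=np.max)` applies the same formula, but by default it also clips values more than 80 dB below the peak (`top_db=80.0`), so the two can differ in the quietest bins.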
+ ---
+
+ ### 4) Dataset Construction
+
+ - **Stage-1 dataset:** `MelSpectrograms/` with the two top-level folders (`00 - Abnormal`, `01 - Normal`).
+ - **Stage-2 datasets:**
+   - **Abnormal head:** `MelSpectrograms/00 - Abnormal/*`
+   - **Normal head:** `MelSpectrograms/01 - Normal/*`
+ - **Splits:** `validation_split=0.2`, `seed=42` via `image_dataset_from_directory`, with the same seed for the `training` and `validation` subsets so the two do not overlap. This split is per file, so the group-aware requirement from section 2 has to be enforced separately, for example by building the train and validation directories per group beforehand.
+ - **Class Order:** Persisted in `saved_models/label_meta.json` to guarantee consistent label ↔ index mapping at inference.
+
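A sketch of how the class order could be persisted and read back. The JSON layout (one list per model under the keys `stage1`, `abnormal`, `normal`) is an assumption for illustration; the actual schema of `label_meta.json` is not shown in this document.

```python
import json
import tempfile
from pathlib import Path


def save_label_meta(path, stage1, abnormal, normal):
    """Write the class order of each model as plain lists (list index = label id)."""
    meta = {"stage1": list(stage1), "abnormal": list(abnormal), "normal": list(normal)}
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(meta, indent=2), encoding="utf-8")


def load_label_meta(path):
    """Read the class orders back as a dict of lists."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip in a temporary directory; class names follow the taxonomy in section 2.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "saved_models" / "label_meta.json"
    save_label_meta(
        meta_path,
        stage1=["00 - Abnormal", "01 - Normal"],
        abnormal=["Bearing noise", "Dehydration mode noise"],
        normal=["Spin mode", "Wash mode"],
    )
    meta = load_label_meta(meta_path)
```

In Keras, the dataset returned by `image_dataset_from_directory` exposes the inferred order as `class_names`, sorted alphanumerically by folder name, which is why `Spin mode` precedes `Wash mode` in the example.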
+ ---
+
+ ### 5) Models & Architecture
+
+ Both stages use a compact CNN to keep inference light:
+
+ - **Backbone (per head):**
+   - `Conv2D(32, 3×3) → ReLU → MaxPool(2×2)`
+   - `Conv2D(64, 3×3) → ReLU → MaxPool(2×2)`
+   - `Conv2D(128, 3×3) → ReLU → MaxPool(2×2)`
+   - `Flatten → Dense(128) → ReLU → Dropout(0.3) → Dense(num_classes) → Softmax`
+ - **Input:** `224×224×3` spectrogram images
+ - **Loss:** `SparseCategoricalCrossentropy`
+ - **Optimizer:** `Adam`
+ - **Metrics:** `Accuracy`
+
+ > Rationale: A simple CNN is sufficient for a strong baseline; the hierarchy offloads fine-grained distinctions to specialized heads.
+
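To make the size of this network concrete, the arithmetic below traces feature-map sizes and parameter counts through the layers listed above. It assumes stride-1 convolutions with `valid` padding (the Keras default) and floor-division pooling; the document does not state the padding mode, so treat the numbers as conditional on that assumption.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def count_params(input_size=224, in_channels=3, filters=(32, 64, 128),
                 dense_units=128, num_classes=2):
    """Return (final spatial size, flattened features, total trainable parameters)."""
    size, channels, total = input_size, in_channels, 0
    for f in filters:
        total += 3 * 3 * channels * f + f                 # conv weights + biases
        size = pool_out(conv_out(size))
        channels = f
    flat = size * size * channels
    total += flat * dense_units + dense_units             # Dense(128)
    total += dense_units * num_classes + num_classes      # output layer
    return size, flat, total


final_size, flat_features, n_params = count_params()
print(final_size, flat_features, n_params)  # 26 86528 11169218
```

Under these assumptions the spatial size goes 224 → 111 → 54 → 26, and `Dense(128)` on the 86,528 flattened features holds about 11.08M of the roughly 11.17M parameters, so the flatten-to-dense connection, not the convolutions, determines the model size.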
71
+ ---
72
+
73
+ ### 6) Training Protocol
74
+
75
+ - **Stage-1:** Train on `Normal` vs `Abnormal` spectrograms.
76
+ - **Stage-2 Abnormal:** Train only on abnormal subclasses.
77
+ - **Stage-2 Normal:** Train only on normal subclasses.
78
+ - **Epochs:** `10` (baseline; tune as needed)
79
+ - **Batch size:** `32`
80
+ - **Pipelines:** `cache β†’ (shuffle) β†’ prefetch` with `tf.data.AUTOTUNE`
81
+ - **Checkpointing:** Save each head to `saved_models/*.h5` and class orders to `label_meta.json`.
82
+
83
+ Optional (recommended):
84
+ - **Augmentations:** time masking, frequency masking, Gaussian noise on spectrograms, random time shifts on audio.
85
+ - **Class imbalance:** oversampling minority subclasses or focal loss in Stage-2 heads.
86
+
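As an illustration of the time and frequency masking mentioned above, here is a small NumPy sketch that operates on a log-mel array of shape `(n_mels, n_frames)` before it is rendered to an image. The function name, the mask widths, and the choice of the array minimum as fill value are assumptions for this example, not settings taken from the project.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width=16, max_time_width=32, rng=None):
    """Return a float32 copy of spec (n_mels, n_frames) with one frequency
    band and one time span set to the minimum value of the input."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=np.float32, copy=True)
    n_mels, n_frames = out.shape
    fill = out.min()

    # Frequency masking: blank a random horizontal band of mel bins.
    f_width = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f_start = int(rng.integers(0, n_mels - f_width + 1))
    out[f_start:f_start + f_width, :] = fill

    # Time masking: blank a random vertical span of frames.
    t_width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t_start = int(rng.integers(0, n_frames - t_width + 1))
    out[:, t_start:t_start + t_width] = fill
    return out


rng = np.random.default_rng(42)
spec = rng.uniform(-80.0, 0.0, size=(128, 200))   # stand-in for a log-mel array
augmented = mask_spectrogram(spec, rng=rng)
```

Masks like these are normally applied to training examples only, so that validation metrics reflect unmodified inputs.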
+ ---
+
+ ### 7) Inference Flow (Hierarchical)
+
+ **Input:** `.wav` → Mel-spectrogram → `224×224`
+
+ 1. **Stage-1:** `p_stage1 = f_stage1(img)` → `y1 = class_names_stage1[argmax(p_stage1)]`
+
+ 2. **Route:**
+    - If `y1 == "00 - Abnormal"` → use `abnormal_model`
+    - Else → use `normal_model`
+
+ 3. **Stage-2:** `p_stage2 = f_head(img)` → `y2 = class_names_stage2[argmax(p_stage2)]`, where `class_names_stage2` is the class order of the selected head
+
+ 4. **Output:**
+    `final = f"{y1.split(' - ')[1]} → {y2}"`
+    plus confidences: `max(p_stage1)`, `max(p_stage2)`
+
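+ **Pseudocode**
+ ```python
+ import numpy as np
+
+ # Assumed to exist: to_mel_spectrogram/preprocess (section 3), the three loaded
+ # models, and the class-name lists read from saved_models/label_meta.json.
+ def predict(wav):
+     spec = to_mel_spectrogram(wav)
+     img = preprocess(spec)                      # 224x224x3, /255.0
+     batch = img[np.newaxis, ...]                # add batch axis: (1, 224, 224, 3)
+
+     p1 = np.asarray(stage1_model(batch))[0]     # [2]
+     y1 = int(np.argmax(p1))
+     stage1_class = class_names_stage1[y1]
+
+     # Route on the class name so the stored class order decides, not a raw index.
+     if stage1_class == "00 - Abnormal":
+         head, class_names_stage2 = abnormal_model, class_names_abnormal
+     else:
+         head, class_names_stage2 = normal_model, class_names_normal
+
+     p2 = np.asarray(head(batch))[0]             # [num_subclasses]
+     y2 = int(np.argmax(p2))
+     stage2_class = class_names_stage2[y2]
+
+     return {
+         "stage1_class": stage1_class,
+         "stage1_confidence": float(p1.max()),
+         "stage2_class": stage2_class,
+         "stage2_confidence": float(p2.max()),
+         "final_prediction": f"{stage1_class.split(' - ')[1]} → {stage2_class}",
+     }
+ ```
+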
+ ---
+
+ ### 8) Evaluation
+
+ - **Per-stage metrics:** accuracy, macro-F1, confusion matrices.
+ - **End-to-end metric:** hierarchical accuracy = % of samples where both Stage-1 and Stage-2 predictions are correct.
+ - **Calibration:** reliability curves / ECE on the maximum softmax probability for Stage-1 and Stage-2; optionally apply temperature scaling.
+ - **Robustness checks:** background noise levels, recording device variance, different drum loads.
+ - **Leakage control:** ensure clips from the same recording session are in one split only.
+
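The end-to-end metric and the ECE mentioned above can be computed as in the sketch below. It uses plain Python, equal-width confidence bins, and toy labels made up purely for illustration; none of the numbers are results from this project.

```python
def hierarchical_accuracy(y1_true, y1_pred, y2_true, y2_pred):
    """Fraction of samples where both the Stage-1 and Stage-2 labels are correct."""
    pairs = zip(y1_true, y1_pred, y2_true, y2_pred)
    hits = sum(t1 == p1 and t2 == p2 for t1, p1, t2, p2 in pairs)
    return hits / len(y1_true)


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over the max-softmax confidence.

    confidences: max softmax probability per sample, in [0, 1]
    correct:     1/True if the prediction was right, else 0/False
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # conf == 1.0 goes in the last bin
        bins[idx].append((conf, float(ok)))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy labels, made up for illustration only.
y1_true = ["Abnormal", "Abnormal", "Normal", "Normal"]
y1_pred = ["Abnormal", "Normal", "Normal", "Normal"]
y2_true = ["Bearing noise", "Bearing noise", "Wash mode", "Spin mode"]
y2_pred = ["Bearing noise", "Wash mode", "Wash mode", "Wash mode"]
h_acc = hierarchical_accuracy(y1_true, y1_pred, y2_true, y2_pred)

ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

Because a Stage-1 error routes the clip to the wrong head, Stage-2 cannot be correct for that sample, so hierarchical accuracy is bounded above by Stage-1 accuracy.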
+ ---
+
+ ### 9) Deployment Considerations
+
+ - **App:** Gradio front-end calls the same spectrogram + inference pipeline.
+ - **Artifacts:** `saved_models/{stage1,abnormal,normal}.h5` + `saved_models/label_meta.json`
+ - **Reproducibility:** fixed audio/spectrogram params and consistent class order.
+ - **Latency:** spectrogram generation dominates; keep `n_fft`/`hop_length` fixed and consider caching frequent uploads.
+
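One possible way to cache frequent uploads, sketched with the standard library only. The `PredictionCache` class, its size limit, and the `fake_predict` stand-in are hypothetical; the real app would wrap its own spectrogram + inference function instead.

```python
import hashlib


class PredictionCache:
    """In-memory cache keyed by the SHA-256 digest of the uploaded file bytes."""

    def __init__(self, predict_fn, max_items=256):
        self._predict_fn = predict_fn
        self._max_items = max_items
        self._store = {}          # dicts preserve insertion order (Python 3.7+)
        self.hits = 0

    def __call__(self, wav_bytes):
        key = hashlib.sha256(wav_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._predict_fn(wav_bytes)
        if len(self._store) >= self._max_items:
            self._store.pop(next(iter(self._store)))   # evict the oldest entry
        self._store[key] = result
        return result


# Stand-in for the real spectrogram + inference pipeline.
calls = []


def fake_predict(wav_bytes):
    calls.append(len(wav_bytes))
    return {"final_prediction": "Normal → Wash mode"}


cached_predict = PredictionCache(fake_predict)
first = cached_predict(b"RIFF....fake-wav-bytes")
second = cached_predict(b"RIFF....fake-wav-bytes")
```

Hashing the raw bytes means only byte-identical uploads hit the cache; a re-encoded copy of the same recording is treated as a new file.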
+ ---
+
+ ### 10) Limitations & Future Work
+
+ - **Domain shift:** different washers/rooms/mics can reduce accuracy → consider domain adaptation / augmentation.
+ - **Simple CNN:** consider replacing it with a MobileNetV2/EfficientNet backbone, which may improve accuracy at comparable latency.
+ - **Sequence modeling:** incorporate temporal context (e.g., ConvLSTM / Transformer over spectrogram patches).
+ - **On-device:** quantize models (TFLite) for edge deployment.
+