Electrical Outlets & Switches Diagnostic Pipeline
Non-intrusive AI diagnostic system for electrical outlets and switches using image classification and audio analysis with decision-level fusion.
Overview
This pipeline analyzes photos and/or audio recordings of electrical outlets to detect potential safety issues without requiring physical inspection. It uses two independent models fused at the decision level for robust predictions.
Image Model
- Architecture: EfficientNet-B0 backbone (frozen in stage 1, last 2 blocks fine-tuned in stage 2) + MLP head (512 hidden units → 5 classes)
- Classes: burn/overheating, cracked faceplate, loose outlet, normal, water exposed
- Performance: 77.3% accuracy, 66.7% minimum per-class recall
- Training data: 1,299 images across 10 source categories merged into 5 classes
Audio Model
- Architecture: 3-layer spectrogram CNN (32→64→128 channels + adaptive pooling)
- Classes: normal, buzzing, crackling/arcing, arcing pop
- Performance: 100% macro recall on validation
- Training data: 100 WAV files (22050 Hz, mel spectrograms with SpecAugment)
Fusion
- Decision-level fusion combining both modalities
- Safety-first: prefers "uncertain" over "normal" when in doubt
- Severity = max(image_severity, audio_severity)
- Configurable confidence thresholds in `config/thresholds.yaml`
Project Structure
```
CV/
├── config/
│   ├── label_mapping.json        # Class definitions & folder→class mapping
│   ├── image_train_config.yaml   # Image training hyperparameters
│   ├── audio_train_config.yaml   # Audio training hyperparameters
│   ├── thresholds.yaml           # Fusion confidence thresholds
│   └── schema.yaml               # API output schema
├── src/
│   ├── data/
│   │   ├── image_dataset.py      # Image dataset with stratified splits
│   │   └── audio_dataset.py      # Audio dataset with stratified splits
│   ├── models/
│   │   ├── image_model.py        # EfficientNet-B0 + MLP classifier
│   │   └── audio_model.py        # Spectrogram CNN classifier
│   ├── fusion/
│   │   └── fusion_logic.py       # Decision-level fusion
│   └── inference/
│       └── wrapper.py            # End-to-end inference pipeline
├── training/
│   ├── train_image.py            # Image model training (2-stage)
│   └── train_audio.py            # Audio model training
├── api/
│   └── main.py                   # FastAPI endpoint
├── weights/
│   ├── electrical_outlets_image_best.pt   # Trained image model
│   └── electrical_outlets_audio_best.pt   # Trained audio model
├── tests/
│   └── test_fusion.py            # Fusion logic tests
├── test_single_image.py          # Quick single-image testing
├── requirements.txt
└── README.md
```
Setup
Requirements
- Python 3.10+
- NVIDIA GPU with CUDA (recommended: RTX 3090 or better)
Installation
```bash
git clone https://huggingface.co/<your-repo>/electrical-outlets-diagnostic
cd electrical-outlets-diagnostic
pip install -r requirements.txt

# If GPU: install CUDA-enabled PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Also needed on Windows:
pip install soundfile
```
Download Weights
Download the model weights from the Hugging Face repository and place them in `weights/`:

```
weights/
├── electrical_outlets_image_best.pt   (~17 MB)
└── electrical_outlets_audio_best.pt   (~2 MB)
```
Usage
Test a Single Image
```bash
python test_single_image.py --image path/to/outlet_photo.jpg
```

Output:

```
burned_outlet.jpg
  burn_overheating (high severity) | 87.3% confidence | issue_detected

  burn_overheating    87.3%  ██████████████████████████
  cracked_faceplate    5.2%  █
  loose_outlet         3.1%  █
  normal               2.8%  █
  water_exposed        1.6%  █
```
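The report ranks all five class probabilities, which sum to 100%. A minimal sketch of how such a ranking could be produced from raw logits, assuming the script applies a softmax over the five image-class logits and draws a bar proportional to each probability; `format_report` and the bar width are hypothetical and not taken from `test_single_image.py`.

```python
import math

IMAGE_CLASSES = ["burn_overheating", "cracked_faceplate", "loose_outlet",
                 "normal", "water_exposed"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_report(logits, width=30):
    """Hypothetical helper: return (top_class, top_prob, lines), highest class first.

    The bar width and layout are illustrative, not the real script's output format.
    """
    probs = softmax(logits)
    ranked = sorted(zip(IMAGE_CLASSES, probs), key=lambda kv: kv[1], reverse=True)
    lines = []
    for name, p in ranked:
        bar = "█" * round(p * width)
        lines.append(f"{name:<18} {p * 100:5.1f}%  {bar}")
    top_name, top_prob = ranked[0]
    return top_name, top_prob, lines


if __name__ == "__main__":
    # Illustrative logits, not real model output.
    top, prob, lines = format_report([4.0, 1.2, 0.7, 0.6, 0.0])
    print(f"{top}: {prob:.1%} confidence")
    print("\n".join(lines))
```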
### API Server
```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```
Endpoints
POST /v1/diagnose/electrical_outlets
Upload image and/or audio for diagnosis:
```bash
# Image only
curl -X POST http://localhost:8000/v1/diagnose/electrical_outlets \
  -F "image=@outlet_photo.jpg"

# Image + audio
curl -X POST http://localhost:8000/v1/diagnose/electrical_outlets \
  -F "image=@outlet_photo.jpg" \
  -F "audio=@outlet_recording.wav"
```
Response:

```json
{
  "diagnostic_element": "electrical_outlets",
  "result": "issue_detected",
  "issue_type": "burn_overheating",
  "severity": "high",
  "confidence": 0.873,
  "modality_contributions": null,
  "primary_issue": "burn_overheating",
  "secondary_issue": null
}
```
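On the client side, the JSON response can be loaded into a small typed structure. A minimal sketch, assuming every response carries exactly the eight fields in the example (the authoritative schema is `config/schema.yaml`); `DiagnosisResponse` is a hypothetical client-side class, not part of the API package.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical client-side type mirroring the example response fields.
@dataclass
class DiagnosisResponse:
    diagnostic_element: str
    result: str                        # e.g. "issue_detected", "normal", "uncertain"
    issue_type: Optional[str]
    severity: str                      # "low" | "medium" | "high" | "critical"
    confidence: float                  # 0.0-1.0
    modality_contributions: Optional[Any]
    primary_issue: Optional[str]
    secondary_issue: Optional[str]

    @classmethod
    def from_json(cls, text: str) -> "DiagnosisResponse":
        data = json.loads(text)
        # Keyword construction raises TypeError on missing or unexpected fields,
        # so schema drift surfaces immediately instead of failing later.
        return cls(**data)
```

Usage: `DiagnosisResponse.from_json(response_text).severity` gives `"high"` for the example above.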
GET /health: check model availability
Python API
```python
from src.inference.wrapper import run_electrical_outlets_inference

result = run_electrical_outlets_inference(
    image_path="path/to/photo.jpg",
    audio_path="path/to/recording.wav",  # optional
)
print(result)
```
Training
Image Model
```bash
python training/train_image.py --device cuda
```
Two-stage training:
- Stage 1: Frozen EfficientNet-B0 backbone, train MLP head only (80-100 epochs)
- Stage 2: Unfreeze last 2 backbone blocks, fine-tune with low LR (25 epochs)
Audio Model
```bash
python training/train_audio.py --device cuda
```
Single-stage with SpecAugment, class-weighted loss, cosine LR schedule.
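The class-weighted loss and cosine schedule can be sketched in plain Python. This assumes inverse-frequency class weights (`n_total / (n_classes * n_c)`) and a plain cosine annealing curve without warm-up; the real weighting formula and learning rates live in `config/audio_train_config.yaml`, and the per-class clip counts below are hypothetical.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by n_total / (n_classes * n_c), so rarer classes count more."""
    n_total = sum(class_counts)
    n_classes = len(class_counts)
    return [n_total / (n_classes * n) for n in class_counts]


def cosine_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Hypothetical clip counts for normal, buzzing, crackling_arcing, arcing_pop.
counts = [40, 25, 20, 15]
print([round(w, 3) for w in inverse_frequency_weights(counts)])
print([cosine_lr(e, 30, 1e-3) for e in (0, 15, 30)])
```

In PyTorch the weights would typically be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`, and the curve matches `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=total_epochs` and `eta_min=lr_min`.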
Class Mapping
Image Classes (5)
| Class | Issue Type | Severity | Source Folders |
|---|---|---|---|
| 0 | burn_overheating | high | Burn marks (250), Discoloration (100), Sparking damage (150) |
| 1 | cracked_faceplate | medium | Cracked faceplate (150), Damaged switches (50) |
| 2 | loose_outlet | medium | Loose outlet (200), Exposed wiring (150) |
| 3 | normal | low | Normal outlets (50), Normal switches (50) |
| 4 | water_exposed | high | Water intrusion (150) |
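The folder-to-class merge in this table is what `config/label_mapping.json` encodes. A sketch of the same merge as a Python dict, with hypothetical folder keys (the real file's key names and layout may differ), shows how the per-class totals follow from the folder counts: normal (100) and water_exposed (150) end up as the smallest merged classes.

```python
from collections import Counter

# Hypothetical folder names; counts are the approximate values from the table above.
FOLDER_TO_CLASS = {
    "burn_marks": ("burn_overheating", 250),
    "discoloration": ("burn_overheating", 100),
    "sparking_damage": ("burn_overheating", 150),
    "cracked_faceplate": ("cracked_faceplate", 150),
    "damaged_switches": ("cracked_faceplate", 50),
    "loose_outlet": ("loose_outlet", 200),
    "exposed_wiring": ("loose_outlet", 150),
    "normal_outlets": ("normal", 50),
    "normal_switches": ("normal", 50),
    "water_intrusion": ("water_exposed", 150),
}

CLASS_INDEX = {"burn_overheating": 0, "cracked_faceplate": 1, "loose_outlet": 2,
               "normal": 3, "water_exposed": 4}


def images_per_class(mapping):
    """Sum the folder counts for each merged class."""
    totals = Counter()
    for class_name, count in mapping.values():
        totals[class_name] += count
    return dict(totals)


totals = images_per_class(FOLDER_TO_CLASS)
for name in sorted(totals, key=CLASS_INDEX.get):
    print(CLASS_INDEX[name], name, totals[name])
```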
Audio Classes (4)
| Class | Issue Type | Severity |
|---|---|---|
| 0 | normal | low |
| 1 | buzzing | high |
| 2 | crackling_arcing | high |
| 3 | arcing_pop | critical |
Severity Levels
| Level | Action Required |
|---|---|
| low | Monitor; no immediate action |
| medium | Schedule repair |
| high | Shut off circuit immediately |
| critical | Shut off main breaker immediately |
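Because fusion takes the maximum severity, the four levels need an explicit order. A minimal sketch, assuming the order low < medium < high < critical implied by this table; `SEVERITY_ORDER`, `max_severity` and `required_action` are hypothetical names, not the pipeline's own.

```python
# Hypothetical names; the ordering follows the severity table above.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

ACTIONS = {
    "low": "Monitor; no immediate action",
    "medium": "Schedule repair",
    "high": "Shut off circuit immediately",
    "critical": "Shut off main breaker immediately",
}


def max_severity(*levels):
    """Return the most severe level among the non-None arguments."""
    present = [lvl for lvl in levels if lvl is not None]
    if not present:
        raise ValueError("at least one severity level is required")
    return max(present, key=SEVERITY_ORDER.index)


def required_action(level):
    """Look up the recommended action for a severity level."""
    return ACTIONS[level]


# An image "medium" result combined with an audio "critical" result:
print(required_action(max_severity("medium", "critical")))
```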
Fusion Logic
The fusion layer combines image and audio predictions:
- If both agree on an issue → `issue_detected` with the max severity
- If both agree on normal with high confidence → `normal`
- If they disagree → `uncertain` (unless one modality has >92% confidence)
- Safety-first: defaults to `uncertain` over `normal` when confidence is low
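These rules can be sketched as a single function. This is an illustrative reading, not the code in `fusion_logic.py`, and it rests on several assumptions: each modality yields a label, confidence and severity; "agree on an issue" means both predict something other than `normal` (the image and audio label sets differ); "high confidence" for `normal` uses a hypothetical 0.80 threshold standing in for the values in `config/thresholds.yaml`; the more confident modality becomes `primary_issue`; and a single supplied modality is judged on its own.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = ["low", "medium", "high", "critical"]
OVERRIDE_CONFIDENCE = 0.92  # the ">92% confidence" rule
NORMAL_CONFIDENCE = 0.80    # hypothetical; the real value belongs in config/thresholds.yaml


@dataclass
class Prediction:
    label: str         # "normal" or an issue type, e.g. "burn_overheating", "buzzing"
    confidence: float  # 0.0-1.0
    severity: str      # one of SEVERITY_ORDER


def _result(result, severity, confidence, primary=None, secondary=None):
    return {"result": result, "severity": severity, "confidence": confidence,
            "primary_issue": primary, "secondary_issue": secondary}


def _decide_single(p: Prediction) -> dict:
    """One modality: always report an issue; report normal only when confident."""
    if p.label != "normal":
        return _result("issue_detected", p.severity, p.confidence, primary=p.label)
    if p.confidence >= NORMAL_CONFIDENCE:
        return _result("normal", "low", p.confidence)
    return _result("uncertain", p.severity, p.confidence)


def fuse(image: Optional[Prediction], audio: Optional[Prediction]) -> dict:
    """Hypothetical decision-level fusion following the rules listed above."""
    if image is None and audio is None:
        raise ValueError("at least one modality is required")
    if image is None or audio is None:
        return _decide_single(image if image is not None else audio)

    image_issue = image.label != "normal"
    audio_issue = audio.label != "normal"

    if image_issue and audio_issue:
        # Both flag a problem: issue_detected with the worse of the two severities.
        primary, secondary = sorted((image, audio), key=lambda p: p.confidence, reverse=True)
        severity = max(image.severity, audio.severity, key=SEVERITY_ORDER.index)
        return _result("issue_detected", severity, primary.confidence,
                       primary=primary.label, secondary=secondary.label)

    if not image_issue and not audio_issue:
        # Both say normal: accept only when both are confident (safety-first).
        confidence = min(image.confidence, audio.confidence)
        if confidence >= NORMAL_CONFIDENCE:
            return _result("normal", "low", confidence)
        return _result("uncertain", "low", confidence)

    # Disagreement: a single very confident modality decides; otherwise uncertain.
    confident = [p for p in (image, audio) if p.confidence > OVERRIDE_CONFIDENCE]
    if len(confident) == 1:
        return _decide_single(confident[0])
    worst = max(image.severity, audio.severity, key=SEVERITY_ORDER.index)
    return _result("uncertain", worst, max(image.confidence, audio.confidence))
```

Reporting the worse severity on an `uncertain` disagreement is a deliberate safety-first choice in this sketch; the real pipeline may handle that case differently.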
Limitations
- Image model trained on web-sourced images (some watermarked/AI-generated)
- Audio model trained on 100 synthetic clips; use as supporting evidence only
- Water damage and cracked faceplate classes have lower recall (64-67%)
- No GFCI failure detection (no training data available)
- Real-world accuracy will be lower than validation metrics
Evaluation Results
Image Model (V5.1, final)
Dataset: 1,299 images
Validation Split: 194 images
Best Epoch: 76
Overall Metrics
| Metric | Value |
|---|---|
| Accuracy | 77.3% |
| Minimum Per-Class Recall | 66.7% |
| Macro Recall | 77.0% |
| Trainable Parameters | 658,437 (14.1%) |
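The trainable-parameter count is consistent with a stage-1 setup in which only the MLP head is trained. The arithmetic below assumes a head of `Linear(1280, 512) → activation → Dropout(0.5) → Linear(512, 5)` on EfficientNet-B0's 1,280-dimensional pooled features, and a frozen feature extractor equal to torchvision's `efficientnet_b0` (5,288,548 parameters) minus its 1,000-class ImageNet classifier. Both are assumptions about `image_model.py`, not read from it.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer: weights plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


FEATURE_DIM = 1280   # EfficientNet-B0 pooled feature size (assumed head input)
HIDDEN = 512         # head hidden units, from the training configuration
NUM_CLASSES = 5

TORCHVISION_B0_TOTAL = 5_288_548                          # torchvision efficientnet_b0
IMAGENET_CLASSIFIER = linear_params(FEATURE_DIM, 1000)    # replaced by the MLP head
BACKBONE = TORCHVISION_B0_TOTAL - IMAGENET_CLASSIFIER     # frozen feature extractor

# Activation and dropout have no parameters, so only the two linear layers count.
head = linear_params(FEATURE_DIM, HIDDEN) + linear_params(HIDDEN, NUM_CLASSES)
share = head / (BACKBONE + head)

print(f"head parameters: {head:,}")
print(f"trainable share: {share:.1%}")
```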
Per-Class Recall
| Class | Recall | Notes |
|---|---|---|
| burn_overheating | 68% | Confused with dark loose_outlet cases |
| cracked_faceplate | 63% | Limited data (200 images) |
| loose_outlet | 98% | Strong visual pattern |
| normal | 93% | Despite only 100 images |
| water_exposed | 64% | Subtle cues, limited data |
Audio Model
Dataset: 100 WAV files
Validation Recall: 100% macro recall
Converged: Epoch 15
Note: the audio validation dataset is small and partially synthetic; real-world generalization may differ.
Training Configuration
Image Model
- Backbone: EfficientNet-B0 (ImageNet pretrained)
- Stage 1: Frozen backbone, head training (80-100 epochs)
- Stage 2: Partial unfreeze (last 2 blocks), low LR fine-tuning
- Optimizer: AdamW
- Fine-tune LR: 2e-4
- Head: 512 hidden units, 0.5 dropout
- Early stopping: patience=25
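Early stopping with `patience=25` halts training once 25 consecutive epochs pass without a new best validation score. A minimal sketch, assuming the monitored score is one where higher is better (for example minimum per-class recall; the README does not say which metric is monitored); `EarlyStopping` is a hypothetical helper, not the training script's class.

```python
class EarlyStopping:
    """Hypothetical helper: stop when the score has not improved for `patience` epochs."""

    def __init__(self, patience: int = 25, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, score: float) -> bool:
        """Record this epoch's score; return True when training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # A real training loop would save the *_best.pt checkpoint here.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=25)
# Synthetic scores: improve until epoch 10, then plateau.
for epoch in range(100):
    if stopper.step(epoch, 0.3 + 0.03 * min(epoch, 10)):
        print(f"stopped at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
```

With this rule the checkpoint kept is the one from the best epoch (epoch 76 for V5.1), not the last epoch trained.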
Audio Model
- 3-layer CNN
- Mel spectrogram input (22050 Hz)
- SpecAugment enabled
- Cosine LR scheduler
- Class-weighted cross-entropy
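SpecAugment-style augmentation masks random frequency bands and time spans of the mel spectrogram during training. A minimal pure-Python sketch of frequency and time masking on a `[n_mels][n_frames]` list of lists; the mask widths, mask counts and zero fill value are hypothetical defaults, and the time warping step of the original SpecAugment recipe is omitted. With torchaudio, `torchaudio.transforms.FrequencyMasking` and `torchaudio.transforms.TimeMasking` provide the same two operations on tensors.

```python
import random


def spec_augment(spec, freq_mask_param=8, time_mask_param=16,
                 num_freq_masks=1, num_time_masks=1, fill=0.0, rng=None):
    """Return a masked copy of `spec`, a list of n_mels rows with n_frames values each.

    Mask widths are drawn uniformly from [0, mask_param] (clipped to the input size);
    all default values here are hypothetical, not the project's configuration.
    """
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # never modify the caller's data
    n_mels, n_frames = len(out), len(out[0])

    for _ in range(num_freq_masks):
        width = rng.randint(0, min(freq_mask_param, n_mels))
        start = rng.randint(0, n_mels - width)
        for m in range(start, start + width):
            out[m] = [fill] * n_frames

    for _ in range(num_time_masks):
        width = rng.randint(0, min(time_mask_param, n_frames))
        start = rng.randint(0, n_frames - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = fill
    return out


spec = [[1.0] * 100 for _ in range(64)]  # dummy 64-mel x 100-frame spectrogram
masked = spec_augment(spec, rng=random.Random(0))
print(sum(v == 0.0 for row in masked for v in row), "of", 64 * 100, "bins masked")
```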
Model Evolution Summary
| Version | Min Recall | Accuracy | Key Change |
|---|---|---|---|
| V1 | 31.8% | 47% | Baseline |
| V2 | 26.7% | 44% | High LR → overfitting |
| V3 | 27.2% | 52% | Frozen backbone |
| V4 | 0% | n/a | Folder mapping bug |
| V5 | 63.6% | 77.3% | Fixed dataset loading |
| V5.1 | 66.7% | 77.3% | Larger head + improved LR |
Total improvement (V1 → V5.1):
- +35 pts minimum recall (31.8% → 66.7%)
- +30 pts accuracy (47% → 77.3%)
Bias, Risks & Safety Considerations
- Trained on web-sourced images; may not generalize to low-light industrial environments
- Audio dataset is small and partially synthetic
- Some image classes are underrepresented (cracked_faceplate, water_exposed)
- Not certified for electrical compliance decisions
- Should not replace licensed electrical inspection
Recommended use: screening / preliminary diagnostics only
Future Improvements
- Add 100-200 more real cracked/water samples
- Clean watermarked images
- Upgrade backbone to ConvNeXt-Tiny or EfficientNet-B2
- Collect real-world buzzing/arcing audio
License
Proprietary β for use in the Electrical Outlets diagnostic pipeline only.