Electrical Outlets & Switches Diagnostic Pipeline

Non-intrusive AI diagnostic system for electrical outlets and switches using image classification and audio analysis with decision-level fusion.

Overview

This pipeline analyzes photos and/or audio recordings of electrical outlets and switches to flag potential safety issues without physical inspection. An image model and an audio model run independently, and their predictions are combined at the decision level.

Image Model

  • Architecture: EfficientNet-B0 (frozen backbone) + MLP head (512 → 5 classes)
  • Classes: burn/overheating, cracked faceplate, loose outlet, normal, water exposed
  • Performance: 77.3% accuracy, 66.7% minimum per-class recall
  • Training data: 1,299 images across 10 source categories merged into 5 classes

Audio Model

  • Architecture: 3-layer Spectrogram CNN (32→64→128 channels + adaptive pooling)
  • Classes: normal, buzzing, crackling/arcing, arcing pop
  • Performance: 100% macro recall on validation
  • Training data: 100 WAV files (22050 Hz, mel spectrograms with SpecAugment)

Fusion

  • Decision-level fusion combining both modalities
  • Safety-first: prefers "uncertain" over "normal" when in doubt
  • Severity = max(image_severity, audio_severity)
  • Configurable confidence thresholds in config/thresholds.yaml

Project Structure

CV/
├── config/
│   ├── label_mapping.json          # Class definitions & folder→class mapping
│   ├── image_train_config.yaml     # Image training hyperparameters
│   ├── audio_train_config.yaml     # Audio training hyperparameters
│   ├── thresholds.yaml             # Fusion confidence thresholds
│   └── schema.yaml                 # API output schema
├── src/
│   ├── data/
│   │   ├── image_dataset.py        # Image dataset with stratified splits
│   │   └── audio_dataset.py        # Audio dataset with stratified splits
│   ├── models/
│   │   ├── image_model.py          # EfficientNet-B0 + MLP classifier
│   │   └── audio_model.py          # Spectrogram CNN classifier
│   ├── fusion/
│   │   └── fusion_logic.py         # Decision-level fusion
│   └── inference/
│       └── wrapper.py              # End-to-end inference pipeline
├── training/
│   ├── train_image.py              # Image model training (2-stage)
│   └── train_audio.py              # Audio model training
├── api/
│   └── main.py                     # FastAPI endpoint
├── weights/
│   ├── electrical_outlets_image_best.pt   # Trained image model
│   └── electrical_outlets_audio_best.pt   # Trained audio model
├── tests/
│   └── test_fusion.py              # Fusion logic tests
├── test_single_image.py            # Quick single-image testing
├── requirements.txt
└── README.md

Setup

Requirements

  • Python 3.10+
  • NVIDIA GPU with CUDA (recommended: RTX 3090 or better)

Installation

git clone https://huggingface.co/<your-repo>/electrical-outlets-diagnostic
cd electrical-outlets-diagnostic

pip install -r requirements.txt

# If GPU: install CUDA-enabled PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Also needed on Windows:
pip install soundfile

Download Weights

Download the model weights from the HuggingFace repository and place them in weights/:

weights/
├── electrical_outlets_image_best.pt   (~ 17 MB)
└── electrical_outlets_audio_best.pt   (~ 2 MB)

Usage

Test a Single Image

python test_single_image.py --image path/to/outlet_photo.jpg

Output:

burned_outlet.jpg
  → burn_overheating (high severity)
  → 87.3% confidence
  → issue_detected

  burn_overheating   87.3% ██████████████████████████ ◄
  cracked_faceplate   5.2% █
  loose_outlet        3.1% ▊
  normal              2.8% ▊
  water_exposed       1.6% ▍

API Server

uvicorn api.main:app --host 0.0.0.0 --port 8000

Endpoints

POST /v1/diagnose/electrical_outlets

Upload image and/or audio for diagnosis:

# Image only
curl -X POST http://localhost:8000/v1/diagnose/electrical_outlets \
  -F "image=@outlet_photo.jpg"

# Image + Audio
curl -X POST http://localhost:8000/v1/diagnose/electrical_outlets \
  -F "image=@outlet_photo.jpg" \
  -F "audio=@outlet_recording.wav"

Response:

{
  "diagnostic_element": "electrical_outlets",
  "result": "issue_detected",
  "issue_type": "burn_overheating",
  "severity": "high",
  "confidence": 0.873,
  "modality_contributions": null,
  "primary_issue": "burn_overheating",
  "secondary_issue": null
}

GET /health: check model availability

Python API

from src.inference.wrapper import run_electrical_outlets_inference

result = run_electrical_outlets_inference(
    image_path="path/to/photo.jpg",
    audio_path="path/to/recording.wav",  # optional
)
print(result)

Training

Image Model

python training/train_image.py --device cuda

Two-stage training:

  1. Stage 1: Frozen EfficientNet-B0 backbone, train MLP head only (80-100 epochs)
  2. Stage 2: Unfreeze last 2 backbone blocks, fine-tune with low LR (25 epochs)

Audio Model

python training/train_audio.py --device cuda

Single-stage with SpecAugment, class-weighted loss, cosine LR schedule.
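
As a rough illustration of two of the ingredients named above, the sketch below computes inverse-frequency class weights and a cosine learning-rate schedule in plain Python. It is not taken from training/train_audio.py: the function names, the per-class clip counts, the base learning rate, and the epoch count are all assumptions chosen for the example.

```python
import math

def class_weights(counts):
    """Inverse-frequency weights: rarer classes get proportionally larger weights."""
    total = sum(counts)
    return [total / (len(counts) * c) for c in counts]

def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at epoch 0 to min_lr at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Hypothetical split of the 100 clips over the 4 audio classes.
weights = class_weights([40, 20, 20, 20])

# Hypothetical schedule: 30 epochs starting at 1e-3.
schedule = [cosine_lr(e, 30, 1e-3) for e in range(31)]
```

With these weights, a loss that multiplies each sample's cross-entropy by the weight of its true class gives a 20-clip class twice the influence per sample of a 40-clip class.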

Class Mapping

Image Classes (5)

| Class | Issue Type | Severity | Source Folders |
|---|---|---|---|
| 0 | burn_overheating | high | Burn marks (250), Discoloration (100), Sparking damage (150) |
| 1 | cracked_faceplate | medium | Cracked faceplate (150), Damaged switches (50) |
| 2 | loose_outlet | medium | Loose outlet (200), Exposed wiring (150) |
| 3 | normal | low | Normal outlets (50), Normal switches (50) |
| 4 | water_exposed | high | Water intrusion (150) |
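
The folder-to-class merge in the table above can be sketched as a lookup that fails loudly on an unmapped folder. This is only an illustration: the real mapping lives in config/label_mapping.json, whose structure is not shown here, and the folder names below are the table labels, which may not match the directory names on disk. `FOLDER_TO_CLASS` and `class_index` are hypothetical names.

```python
# Class order follows the class indices in the table above.
CLASSES = ["burn_overheating", "cracked_faceplate", "loose_outlet",
           "normal", "water_exposed"]

# Source folder -> merged class (folder names assumed from the table labels).
FOLDER_TO_CLASS = {
    "Burn marks": "burn_overheating",
    "Discoloration": "burn_overheating",
    "Sparking damage": "burn_overheating",
    "Cracked faceplate": "cracked_faceplate",
    "Damaged switches": "cracked_faceplate",
    "Loose outlet": "loose_outlet",
    "Exposed wiring": "loose_outlet",
    "Normal outlets": "normal",
    "Normal switches": "normal",
    "Water intrusion": "water_exposed",
}

def class_index(folder):
    """Return the class index for a source folder, rejecting unknown folders."""
    if folder not in FOLDER_TO_CLASS:
        raise ValueError(f"unmapped source folder: {folder!r}")
    return CLASSES.index(FOLDER_TO_CLASS[folder])
```

Raising on an unknown folder, rather than skipping it or assigning a default class, makes a mapping mistake visible at load time instead of in the validation metrics.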

Audio Classes (4)

| Class | Issue Type | Severity |
|---|---|---|
| 0 | normal | low |
| 1 | buzzing | high |
| 2 | crackling_arcing | high |
| 3 | arcing_pop | critical |

Severity Levels

| Level | Action Required |
|---|---|
| low | Monitor; no immediate action |
| medium | Schedule repair |
| high | Shut off circuit immediately |
| critical | Shut off main breaker immediately |
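
A caller might turn the `severity` field of a diagnosis response into one of the actions above with a small lookup. The sketch below assumes the response is a dict shaped like the JSON example in the API section; `ACTIONS` and `recommended_action` are hypothetical names, not part of the package.

```python
# Action text per severity level, copied from the table above.
ACTIONS = {
    "low": "Monitor; no immediate action",
    "medium": "Schedule repair",
    "high": "Shut off circuit immediately",
    "critical": "Shut off main breaker immediately",
}

def recommended_action(response):
    """Look up the action for a response dict that carries a 'severity' key."""
    severity = response["severity"]
    if severity not in ACTIONS:
        raise ValueError(f"unknown severity level: {severity!r}")
    return ACTIONS[severity]

example = {"result": "issue_detected", "issue_type": "burn_overheating",
           "severity": "high", "confidence": 0.873}
action = recommended_action(example)
```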

Fusion Logic

The fusion layer combines image and audio predictions:

  • If both agree on issue → issue_detected with max severity
  • If both agree on normal with high confidence → normal
  • If they disagree → uncertain (unless one has >92% confidence)
  • Safety-first: defaults to uncertain over normal when confidence is low
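
One possible reading of these rules is sketched below. It is not the code in src/fusion/fusion_logic.py. Several points are assumptions: "agree on issue" is taken to mean that both models predict a non-normal class; the `normal_min=0.80` threshold is invented for the example (the real thresholds are in config/thresholds.yaml); and when both modalities exceed the 92% override while disagreeing, the issue side wins. `Prediction`, `worst`, and `fuse` are hypothetical names.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

@dataclass
class Prediction:
    label: str         # class name, e.g. "burn_overheating" or "normal"
    severity: str      # "low", "medium", "high", or "critical"
    confidence: float  # top-class probability in [0, 1]

def worst(a, b):
    """Return the more severe of two severity levels."""
    return a if SEVERITY_RANK[a] >= SEVERITY_RANK[b] else b

def fuse(image, audio, override=0.92, normal_min=0.80):
    """Return (result, severity) for one image and one audio prediction."""
    image_issue = image.label != "normal"
    audio_issue = audio.label != "normal"
    severity = worst(image.severity, audio.severity)

    if image_issue and audio_issue:
        return "issue_detected", severity
    if not image_issue and not audio_issue:
        # Safety-first: report normal only when both models are confident.
        confident = min(image.confidence, audio.confidence) >= normal_min
        return ("normal" if confident else "uncertain"), severity

    # Disagreement: follow one modality only if it is very confident.
    issue, clear = (image, audio) if image_issue else (audio, image)
    if issue.confidence > override:
        return "issue_detected", issue.severity
    if clear.confidence > override:
        return "normal", clear.severity
    return "uncertain", severity
```

For example, a normal image at 70% confidence combined with a buzzing clip at 80% yields `("uncertain", "high")`, while the same clip at 95% yields `("issue_detected", "high")`.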

Limitations

  • Image model trained on web-sourced images (some watermarked/AI-generated)
  • Audio model trained on 100 synthetic clips; use as supporting evidence only
  • Water damage and cracked faceplate classes have lower recall (64-67%)
  • No GFCI failure detection (no training data available)
  • Real-world accuracy will be lower than validation metrics

Evaluation Results

Image Model (V5.1 – Final)

Dataset: 1,299 images
Validation Split: 194 images
Best Epoch: 76
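
The 194-image validation split is roughly 15% of the 1,299 images. A stratified split of that size could look like the sketch below, which holds out the same fraction of every class. It is an illustration only, not the code in src/data/image_dataset.py; the function name, the 15% fraction, and the seed are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.15, seed=0):
    """Split sample indices into train/val, holding out val_fraction per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation sample for every class.
        n_val = max(1, int(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return train, val
```

Splitting per class keeps small classes such as normal represented in validation, which matters when the headline metric is minimum per-class recall.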

Overall Metrics

| Metric | Value |
|---|---|
| Accuracy | 77.3% |
| Minimum Per-Class Recall | 66.7% |
| Macro Recall | 77.0% |
| Trainable Parameters | 658,437 (14.1%) |
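
The trainable-parameter figure is consistent with a two-layer head (1280 → 512 → 5) on top of EfficientNet-B0's 1280-dimensional pooled features, as the arithmetic below shows. The 1280 width is a property of EfficientNet-B0. The backbone parameter count is an assumption based on torchvision's `efficientnet_b0` with its 1,000-class classifier removed, and the head is assumed to contain only two linear layers with biases.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

BACKBONE_FEATURES = 1280  # EfficientNet-B0 pooled feature width
HIDDEN_UNITS = 512        # head width from the training configuration
NUM_CLASSES = 5

head_params = (linear_params(BACKBONE_FEATURES, HIDDEN_UNITS)
               + linear_params(HIDDEN_UNITS, NUM_CLASSES))

# Assumed: torchvision efficientnet_b0 (5,288,548 parameters) minus its
# Linear(1280, 1000) classifier (1,281,000 parameters).
backbone_params = 5_288_548 - linear_params(1280, 1000)

trainable_share = head_params / (head_params + backbone_params)
print(head_params, f"{trainable_share:.1%}")
```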

Per-Class Recall

| Class | Recall | Notes |
|---|---|---|
| burn_overheating | 68% | Confused with dark loose_outlet cases |
| cracked_faceplate | 63% | Limited data (200 images) |
| loose_outlet | 98% | Strong visual pattern |
| normal | 93% | Despite only 100 images |
| water_exposed | 64% | Subtle cues, limited data |
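
For reference, per-class recall, macro recall, and minimum per-class recall can be computed from true and predicted labels as in the sketch below. The labels are toy values made up for the example, not the project's validation data, and `per_class_recall` is a hypothetical helper name.

```python
def per_class_recall(y_true, y_pred, num_classes):
    """Recall per class: correctly predicted samples / samples of that class."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return [c / n if n else 0.0 for c, n in zip(correct, total)]

# Toy labels for three classes.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

recalls = per_class_recall(y_true, y_pred, num_classes=3)
macro_recall = sum(recalls) / len(recalls)  # unweighted mean over classes
min_recall = min(recalls)                   # recall of the worst class
```

Macro recall weights every class equally regardless of size, and the minimum shows how the weakest class performs, which overall accuracy can hide on an imbalanced dataset.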

Audio Model

Dataset: 100 WAV files
Validation Recall: 100% macro recall
Converged: Epoch 15

⚠ Audio validation dataset is small and partially synthetic; real-world generalization may differ.


Training Configuration

Image Model

  • Backbone: EfficientNet-B0 (ImageNet pretrained)
  • Stage 1: Frozen backbone, head training (80–100 epochs)
  • Stage 2: Partial unfreeze (last 2 blocks), low LR fine-tuning
  • Optimizer: AdamW
  • Fine-tune LR: 2e-4
  • Head: 512 hidden units, 0.5 dropout
  • Early stopping: patience=25
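
Early stopping with a patience value can be sketched as a counter of epochs without improvement. This is a generic illustration, not the logic in training/train_image.py; which validation metric is monitored is not stated here, so the sketch assumes a higher-is-better metric, and `EarlyStopping` is a hypothetical class name.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=25):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, metric):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

A training loop would call `step` once per epoch after validation and restore the checkpoint saved at `best_epoch` when it returns `True`.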

Audio Model

  • 3-layer CNN
  • Mel spectrogram input (22050 Hz)
  • SpecAugment enabled
  • Cosine LR scheduler
  • Class-weighted cross-entropy

Model Evolution Summary

| Version | Min Recall | Accuracy | Key Change |
|---|---|---|---|
| V1 | 31.8% | 47% | Baseline |
| V2 | 26.7% | 44% | High LR → overfitting |
| V3 | 27.2% | 52% | Frozen backbone |
| V4 | 0% | n/a | Folder mapping bug |
| V5 | 63.6% | 77.3% | Fixed dataset loading |
| V5.1 | 66.7% | 77.3% | Larger head + improved LR |

Total improvement from V1 to V5.1:
about +35 pts minimum recall
about +30 pts accuracy


Bias, Risks & Safety Considerations

  • Trained on web-sourced images, so it may not generalize to low-light industrial environments
  • Audio dataset is small and partially synthetic
  • Some image classes are underrepresented (cracked_faceplate, water_exposed)
  • Not certified for electrical compliance decisions
  • Should not replace licensed electrical inspection

Recommended use: screening / preliminary diagnostics only


Future Improvements

  • Add 100–200 more real cracked/water samples
  • Clean watermarked images
  • Upgrade backbone to ConvNeXt-Tiny or EfficientNet-B2
  • Collect real-world buzzing/arcing audio

License

Proprietary. For use in the Electrical Outlets diagnostic pipeline only.
