FLASK_APP / docs /VIT_MODEL_EXPLANATION.md
pranit144's picture
Upload 97 files
e38de99 verified
# ViT Model in DCRM Pipeline - Complete Explanation
## What is `vitResult`?
The `vitResult` is the output from a **Vision Transformer (ViT) + Gemini AI Ensemble Model** that analyzes the DCRM resistance plot image to classify circuit breaker defects.
---
## πŸ“Š Complete Flow (Step-by-Step)
### **Step 1: Generate Resistance Plot**
**File**: `core/models/vit_classifier.py` β†’ `plot_resistance_for_vit()`
```python
# Creates a plot with 3 lines:
# - Green line: Resistance profile
# - Blue line: Current profile
# - Red line: Travel profile
# Saves as temporary PNG file: temp_vit_plot_{phase}_{uuid}.png
```
**Example**: `temp_vit_plot_r_a3f8d2b1.png`
---
### **Step 2: ViT Model Analysis** (Remote API)
**File**: `core/models/vit_classifier.py` β†’ `get_remote_vit_probabilities()`
```python
# Sends image to deployed ViT model API
DEPLOYED_VIT_URL = "http://143.110.244.235/predict"
# ViT is trained on DCRM images to detect 5 defect classes:
CLASSES = [
"Healthy",
"Arcing_Contact_Misalignment",
"Arcing_Contact_Wear",
"Main Contact Misalignment",
"main_contact_wear"
]
# Returns probability distribution for each class
vit_probs = {
"Healthy": 0.507,
"Arcing_Contact_Misalignment": 0.120,
"Arcing_Contact_Wear": 0.044,
"Main Contact Misalignment": 0.142,
"main_contact_wear": 0.186
}
```
**How ViT Works**:
- ViT (Vision Transformer) is a deep learning model trained on DCRM plot images
- It learned visual patterns from thousands of circuit breaker test plots
- Analyzes waveform shapes, spikes, plateaus, and transitions
- Outputs probability for each defect type
---
### **Step 3: Gemini AI Analysis**
**File**: `core/models/vit_classifier.py` β†’ `get_gemini_prediction()`
```python
# Sends same image to Google Gemini 2.0 Flash
# Uses expert prompt with diagnostic heuristics:
Diagnostic Rules:
1. "The Significant Grass" β†’ Main Contact Corrosion
- Jagged, irregular resistance plateau (> 15-20ΞΌΞ© variance)
2. "Big Spikes & Short Wipe" β†’ Arcing Contact Wear
- Large amplitude spikes, shortened arcing zone
3. "The Struggle to Settle" β†’ Main Misalignment
- High-amplitude peaks before plateau (> 3-5ms)
4. "Rough Entry" β†’ Arcing Misalignment
- Erratic spikes during initial entry
5. "Stretched Time" β†’ Slow Mechanism
- Elongated resistance profile on X-axis
# Returns probability distribution
gemini_probs = {
"Healthy": 0.05,
"Arcing_Contact_Misalignment": 0.02,
"Arcing_Contact_Wear": 0.01,
"Main Contact Misalignment": 0.02,
"main_contact_wear": 0.90 # High confidence!
}
```
---
### **Step 4: Ensemble Prediction**
**File**: `core/models/vit_classifier.py` β†’ `predict_dcrm_image()`
```python
# Combines ViT + Gemini predictions
# ensemble_score = vit_prob + gemini_prob
ensemble_scores = {
"Healthy": 0.507 + 0.05 = 0.557,
"Arcing_Contact_Misalignment": 0.120 + 0.02 = 0.140,
"Arcing_Contact_Wear": 0.044 + 0.01 = 0.054,
"Main Contact Misalignment": 0.142 + 0.02 = 0.162,
"main_contact_wear": 0.186 + 0.90 = 1.086 # βœ… HIGHEST!
}
# Selects class with highest ensemble score
predicted_class = "main_contact_wear"
confidence = 0.543 # Normalized confidence
```
---
### **Step 5: Integration into Pipeline**
**File**: `apps/flask_server.py` β†’ `process_single_phase_csv()`
```python
# Lines 155-183
vit_result = None
vit_plot_path = f"temp_vit_plot_{phase_name}_{uuid.uuid4().hex[:8]}.png"
# Generate plot
if plot_resistance_for_vit(df, vit_plot_path):
# Get prediction
vit_class, vit_conf, vit_details = predict_dcrm_image(vit_plot_path, api_key=api_key)
vit_result = {
"class": vit_class, # "main_contact_wear"
"confidence": vit_conf, # 0.5429375439882278
"details": vit_details # Full breakdown below
}
# Cleanup temp file
os.remove(vit_plot_path)
```
---
## πŸ“¦ vitResult Structure Breakdown
```json
{
"class": "main_contact_wear", // βœ… FINAL PREDICTION
"confidence": 0.5429375439882278, // βœ… NORMALIZED CONFIDENCE
"details": {
"vit_probs": { // πŸ€– Vision Transformer probabilities
"Healthy": 0.5076556205749512,
"Arcing_Contact_Misalignment": 0.12034504860639572,
"Arcing_Contact_Wear": 0.04370640590786934,
"Main Contact Misalignment": 0.1424178034067154,
"main_contact_wear": 0.1858750879764557
},
"gemini_probs": { // 🧠 Gemini AI probabilities
"Healthy": 0.05,
"Arcing_Contact_Misalignment": 0.02,
"Arcing_Contact_Wear": 0.01,
"Main Contact Misalignment": 0.02,
"main_contact_wear": 0.9 // Gemini is very confident!
},
"ensemble_scores": { // 🎯 COMBINED SCORES
"Healthy": 0.5576556205749512,
"Arcing_Contact_Misalignment": 0.1403450486063957,
"Arcing_Contact_Wear": 0.05370640590786934,
"Main Contact Misalignment": 0.16241780340671538,
"main_contact_wear": 1.0858750879764556 // βœ… HIGHEST β†’ WINNER
}
}
}
```
---
## πŸ” Why Two Models?
| Model | Strengths | Weaknesses |
|-------|-----------|------------|
| **ViT** | - Trained on real DCRM data<br>- Fast inference<br>- Consistent | - May overfit to training data<br>- Limited to visual patterns |
| **Gemini** | - Expert reasoning<br>- Contextual understanding<br>- Adapts to new cases | - May hallucinate<br>- Slower<br>- Requires API calls |
| **Ensemble** | βœ… **Best of both worlds**<br>- ViT provides baseline<br>- Gemini adds expertise | - Slightly higher computational cost |
---
## 🎯 How It's Used in the Pipeline
The `vitResult` is:
1. **Generated** in `flask_server.py` (lines 155-183)
2. **Passed to** `report_generator.py`
3. **Included in** final JSON output under each phase (r, y, b)
4. **Referenced** in fault summaries for LLM context
**Example Usage**:
```python
# In report_generator.py
if vit_result:
faults_summary += f"\nViT Model Prediction:\n- Class: {vit_result.get('class', 'Unknown')}\n- Confidence: {vit_result.get('confidence', 0)*100:.2f}%\n"
```
---
## πŸ“Š Visual Flow Diagram
```
Input CSV Data
↓
Extract Resistance, Current, Travel
↓
Generate Plot (matplotlib)
↓
temp_vit_plot.png
↓
β”œβ”€β”€β†’ [ViT API] β†’ vit_probs
└──→ [Gemini AI] β†’ gemini_probs
↓
Ensemble Combination
↓
ensemble_scores
↓
Select MAX score β†’ predicted_class
↓
vitResult JSON
↓
Included in final report
```
---
## πŸ› οΈ Configuration
**ViT API Endpoint**:
```python
DEPLOYED_VIT_URL = "http://143.110.244.235/predict"
```
**Gemini Model**:
```python
model = genai.GenerativeModel('gemini-2.0-flash')
```
**API Key** (from environment):
```python
GOOGLE_API_KEY # Main key
GOOGLE_API_KEY_1 # For R phase
GOOGLE_API_KEY_2 # For Y phase
GOOGLE_API_KEY_3 # For B phase
```
---
## 🚨 Error Handling
If ViT or Gemini fails:
```python
if not vit_result:
# Pipeline continues without ViT analysis
# Other components (Rule Engine, AI Agent) still work
print("ViT prediction unavailable, continuing with other analyses...")
```
The pipeline is **resilient** - if ViT fails, analysis still completes using Rule Engine + AI Agent.
---
## πŸ“ Summary
**vitResult provides**:
- βœ… Image-based defect classification
- βœ… Visual pattern recognition (ViT)
- βœ… Expert reasoning (Gemini)
- βœ… Ensemble confidence scoring
- βœ… Detailed probability breakdown
- βœ… Complements KPI-based and time-series analysis
It's a **3rd independent diagnostic method** alongside:
1. Rule Engine (deterministic thresholds)
2. AI Agent (LLM-based fault detection)
3. **ViT Model (image classification)** ← This one!
All three methods are combined to provide comprehensive, multi-faceted circuit breaker diagnostics.