Upload 6 files
Browse filesInitial deployment of Insurance Claims Decision Support System
This is a GOVERNANCE-COMPLIANT reference implementation:
- Classical ML only (XGBoost)
- ADVISORY outputs only
- Human-in-the-loop REQUIRED
- Full explainability (confidence scores, feature importance)
- Decision boundaries FROZEN from decision_spec.yaml
- NO autonomous decision-making
Deliverables:
- train.py: Training pipeline
- evaluate.py: Model evaluation with metrics
- predict.py: Advisory predictions with explainability
- requirements.txt: Dependencies (classical ML only)
- decision_spec.yaml: Frozen decision boundaries
- README.md: Model Card with limitations and governance status
- README.md +393 -3
- decision_spec.yaml +189 -0
- evaluate.py +410 -0
- predict.py +370 -0
- requirements.txt +18 -0
- train.py +301 -0
README.md
CHANGED
|
@@ -1,3 +1,393 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Model Card: Insurance Claims Decision Support System
|
| 2 |
+
|
| 3 |
+
**Model Version**: 1.0.0
|
| 4 |
+
**Last Updated**: 2026-01-04
|
| 5 |
+
**Model Type**: Classical Machine Learning (XGBoost Classifier)
|
| 6 |
+
**Governance Status**: ADVISORY ONLY - Human-in-the-Loop Required
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Model Description
|
| 11 |
+
|
| 12 |
+
### Overview
|
| 13 |
+
This model is a **classical machine learning classifier** designed to provide **advisory suggestions** for insurance claim severity assessment. It uses XGBoost (gradient boosting decision trees) to analyze claim characteristics and suggest severity levels.
|
| 14 |
+
|
| 15 |
+
**CRITICAL: This is NOT an autonomous decision-making system.** All outputs are advisory suggestions that require mandatory human review and confirmation.
|
| 16 |
+
|
| 17 |
+
### Architecture
|
| 18 |
+
- **Algorithm**: XGBoost Classifier (tree-based gradient boosting)
|
| 19 |
+
- **Type**: Classical ML (NOT neural networks, NOT deep learning, NOT LLMs)
|
| 20 |
+
- **Training**: Supervised learning on synthetic insurance claims data
|
| 21 |
+
- **Output**: Three-class classification (Low/Medium/High severity) with confidence scores
|
| 22 |
+
|
| 23 |
+
### Model Characteristics
|
| 24 |
+
- **Deterministic**: Same inputs always produce same outputs
|
| 25 |
+
- **Explainable**: Feature importance and rule signals provided for every prediction
|
| 26 |
+
- **Transparent**: All decision logic is open source and auditable
|
| 27 |
+
- **Non-autonomous**: Cannot make binding decisions without human confirmation
|
| 28 |
+
|
| 29 |
+
---
|
| 30 |
+
|
| 31 |
+
## Intended Use
|
| 32 |
+
|
| 33 |
+
### Primary Use Cases
|
| 34 |
+
✅ **Educational demonstration** of AI governance principles
|
| 35 |
+
✅ **Proof-of-concept** for governed decision support systems
|
| 36 |
+
✅ **Training tool** for insurance professionals learning about AI assistance
|
| 37 |
+
✅ **Research platform** for studying human-in-the-loop AI systems
|
| 38 |
+
✅ **Compliance review** demonstrations for regulatory stakeholders
|
| 39 |
+
|
| 40 |
+
### Target Audience
|
| 41 |
+
- AI governance researchers and practitioners
|
| 42 |
+
- Insurance industry evaluators and trainers
|
| 43 |
+
- Regulatory compliance officers
|
| 44 |
+
- Responsible AI designers
|
| 45 |
+
- Educational institutions
|
| 46 |
+
|
| 47 |
+
### Appropriate Contexts
|
| 48 |
+
- Demonstration environments with synthetic data
|
| 49 |
+
- Educational workshops and training sessions
|
| 50 |
+
- Prototype testing for governance frameworks
|
| 51 |
+
- Academic research on AI decision support
|
| 52 |
+
|
| 53 |
+
---
|
| 54 |
+
|
| 55 |
+
## Non-Intended Use
|
| 56 |
+
|
| 57 |
+
### ❌ DO NOT USE FOR:
|
| 58 |
+
- **Production insurance claims processing** - This is a demonstration system only
|
| 59 |
+
- **Real financial decisions** - Not validated for real-world claims
|
| 60 |
+
- **Autonomous decision-making** - Human oversight is mandatory
|
| 61 |
+
- **Processing real customer data** - Designed for synthetic data only
|
| 62 |
+
- **Regulatory compliance** without human review - No regulatory approval obtained
|
| 63 |
+
- **Replacing human insurance adjusters** - Designed to assist, not replace
|
| 64 |
+
- **High-stakes decisions** without expert review
|
| 65 |
+
- **Any application** where model errors could cause harm
|
| 66 |
+
|
| 67 |
+
### Why These Uses Are Prohibited
|
| 68 |
+
1. **No Real-World Validation**: Trained only on synthetic data
|
| 69 |
+
2. **No Regulatory Approval**: Not certified for insurance operations
|
| 70 |
+
3. **Simplified Rules**: Real insurance claims are far more complex
|
| 71 |
+
4. **Demonstration Quality**: Built for education, not production
|
| 72 |
+
5. **No Liability Coverage**: No guarantees or warranties provided
|
| 73 |
+
|
| 74 |
+
---
|
| 75 |
+
|
| 76 |
+
## Training Data
|
| 77 |
+
|
| 78 |
+
### Dataset Information
|
| 79 |
+
- **Source**: BDR-AI/insurance_decision_boundaries_v1 (Hugging Face Datasets)
|
| 80 |
+
- **Type**: Synthetic/demonstration data
|
| 81 |
+
- **Purpose**: Educational model training only
|
| 82 |
+
- **Size**: [Varies - check model_metadata.json for specific training run]
|
| 83 |
+
|
| 84 |
+
### Data Characteristics
|
| 85 |
+
- **Features**: 4 input features (claim_type, damage_amount, injury_involved, risk_factor)
|
| 86 |
+
- **Target**: 3 severity levels (Low, Medium, High)
|
| 87 |
+
- **Distribution**: Balanced across severity classes
|
| 88 |
+
- **Quality**: Synthetic data generated based on simplified rules
|
| 89 |
+
|
| 90 |
+
### Data Limitations
|
| 91 |
+
⚠ **NOT REAL-WORLD DATA**: This dataset is synthetic and does not represent actual insurance claims
|
| 92 |
+
⚠ **SIMPLIFIED**: Real insurance claims involve hundreds of factors, not just 4
|
| 93 |
+
⚠ **NO BIAS TESTING**: Synthetic data may not reflect real-world demographic patterns
|
| 94 |
+
⚠ **FROZEN BOUNDARIES**: Decision thresholds are fixed and may not match real insurance practices
|
| 95 |
+
|
| 96 |
+
---
|
| 97 |
+
|
| 98 |
+
## Model Performance
|
| 99 |
+
|
| 100 |
+
### Evaluation Metrics
|
| 101 |
+
Performance metrics are available in `evaluation_report.json` after running `evaluate.py`.
|
| 102 |
+
|
| 103 |
+
**Typical Performance** (on synthetic test data):
|
| 104 |
+
- **Accuracy**: ~85-95% (varies by training run)
|
| 105 |
+
- **Precision/Recall**: Balanced across severity classes
|
| 106 |
+
- **Confidence Calibration**: Assessed via log loss metric
|
| 107 |
+
- **Uncertainty Quantification**: Entropy-based uncertainty scores provided
|
| 108 |
+
|
| 109 |
+
### Performance Interpretation
|
| 110 |
+
✓ **High accuracy on synthetic data** - Model learns the simplified rules effectively
|
| 111 |
+
⚠ **Unknown real-world performance** - Not tested on actual insurance claims
|
| 112 |
+
⚠ **Overconfidence risk** - Synthetic data may lead to higher confidence than warranted
|
| 113 |
+
|
| 114 |
+
### Confidence Scores
|
| 115 |
+
- Model provides confidence scores (0.0-1.0) for each prediction
|
| 116 |
+
- Higher confidence does NOT eliminate need for human review
|
| 117 |
+
- Low confidence predictions require extra scrutiny
|
| 118 |
+
- Uncertainty quantification helps prioritize human attention
|
| 119 |
+
|
| 120 |
+
---
|
| 121 |
+
|
| 122 |
+
## Limitations
|
| 123 |
+
|
| 124 |
+
### Technical Limitations
|
| 125 |
+
1. **Simplified Feature Set**: Only 4 input features (real claims need many more)
|
| 126 |
+
2. **Synthetic Training Data**: Not validated on real insurance claims
|
| 127 |
+
3. **Fixed Decision Boundaries**: Cannot adapt to changing insurance standards
|
| 128 |
+
4. **No Contextual Understanding**: Cannot consider claim narratives or special circumstances
|
| 129 |
+
5. **Limited Claim Types**: Only handles 4 predefined claim types
|
| 130 |
+
6. **No Temporal Factors**: Doesn't account for claim timing or seasonal patterns
|
| 131 |
+
|
| 132 |
+
### Governance Limitations
|
| 133 |
+
1. **No Autonomous Operation**: Must have human oversight for every prediction
|
| 134 |
+
2. **No Binding Authority**: All outputs are advisory suggestions only
|
| 135 |
+
3. **No Regulatory Approval**: Not certified by insurance regulators
|
| 136 |
+
4. **Demonstration Quality**: Not built to production standards
|
| 137 |
+
5. **No Safety Guarantees**: Errors and mistakes are expected
|
| 138 |
+
|
| 139 |
+
### Ethical Limitations
|
| 140 |
+
1. **Bias Unknown**: Not tested for fairness across demographic groups
|
| 141 |
+
2. **Explainability Gaps**: Feature importance doesn't capture all reasoning
|
| 142 |
+
3. **No Accountability**: Model cannot be held responsible for decisions
|
| 143 |
+
4. **Limited Transparency**: Internal tree structure can be complex
|
| 144 |
+
5. **No Appeal Process**: No mechanism for disputing model suggestions
|
| 145 |
+
|
| 146 |
+
### Operational Limitations
|
| 147 |
+
1. **Single Model**: No ensemble or backup systems
|
| 148 |
+
2. **No Online Learning**: Cannot improve from new data without retraining
|
| 149 |
+
3. **No A/B Testing**: Not designed for production experimentation
|
| 150 |
+
4. **Limited Monitoring**: Basic evaluation only, no production monitoring
|
| 151 |
+
5. **No SLA Guarantees**: Performance and availability not guaranteed
|
| 152 |
+
|
| 153 |
+
---
|
| 154 |
+
|
| 155 |
+
## Human-in-the-Loop Requirements
|
| 156 |
+
|
| 157 |
+
### MANDATORY Human Oversight
|
| 158 |
+
🔴 **CRITICAL**: This system CANNOT and MUST NOT operate without human supervision.
|
| 159 |
+
|
| 160 |
+
### Human Responsibilities
|
| 161 |
+
1. **Review Every Prediction**: Human must independently evaluate each claim
|
| 162 |
+
2. **Exercise Independent Judgment**: Do not blindly accept model suggestions
|
| 163 |
+
3. **Confirm or Override**: Human decides whether to accept or reject advisory
|
| 164 |
+
4. **Document Rationale**: Human must explain reasoning for final decision
|
| 165 |
+
5. **Maintain Audit Trail**: All decisions and rationales must be logged
|
| 166 |
+
|
| 167 |
+
### Enforcement Mechanisms
|
| 168 |
+
- System outputs clearly marked as "ADVISORY ONLY"
|
| 169 |
+
- No automatic actions taken based on model predictions
|
| 170 |
+
- Human confirmation required before any decision is finalized
|
| 171 |
+
- Override capability provided without restrictions
|
| 172 |
+
- All human decisions logged with timestamps and rationale
|
| 173 |
+
|
| 174 |
+
### Human Authority
|
| 175 |
+
✅ Human decision-maker has **FULL AUTHORITY** to:
|
| 176 |
+
- Accept model suggestions
|
| 177 |
+
- Override model suggestions
|
| 178 |
+
- Request additional information
|
| 179 |
+
- Escalate complex cases
|
| 180 |
+
- Apply contextual judgment
|
| 181 |
+
|
| 182 |
+
The model is a **tool to assist humans**, not a replacement for human expertise.
|
| 183 |
+
|
| 184 |
+
---
|
| 185 |
+
|
| 186 |
+
## Explainability and Transparency
|
| 187 |
+
|
| 188 |
+
### Explainability Features
|
| 189 |
+
1. **Feature Importance**: Shows which factors influenced each prediction
|
| 190 |
+
2. **Rule Signals**: Human-readable explanation of triggered decision rules
|
| 191 |
+
3. **Confidence Scores**: Quantifies model certainty for each prediction
|
| 192 |
+
4. **Uncertainty Assessment**: Identifies predictions requiring extra scrutiny
|
| 193 |
+
5. **Decision Boundaries**: Fixed thresholds documented and transparent
|
| 194 |
+
|
| 195 |
+
### Transparency Measures
|
| 196 |
+
- All code is open source and reviewable
|
| 197 |
+
- Decision logic based on documented rules (decision_spec.yaml)
|
| 198 |
+
- Model architecture is classical ML (not black-box deep learning)
|
| 199 |
+
- Training process fully documented
|
| 200 |
+
- Evaluation metrics publicly available
|
| 201 |
+
|
| 202 |
+
### Limitations of Explainability
|
| 203 |
+
- Feature importance is global, not always case-specific
|
| 204 |
+
- Tree ensemble decisions can be complex to trace
|
| 205 |
+
- Interactions between features may not be obvious
|
| 206 |
+
- Confidence scores can be miscalibrated
|
| 207 |
+
- Uncertainty measures are estimates, not guarantees
|
| 208 |
+
|
| 209 |
+
---
|
| 210 |
+
|
| 211 |
+
## Ethical Considerations
|
| 212 |
+
|
| 213 |
+
### Transparency Commitment
|
| 214 |
+
✓ **No Hidden Logic**: All decision rules are documented and accessible
|
| 215 |
+
✓ **Explicit Uncertainty**: Model communicates when it's uncertain
|
| 216 |
+
✓ **Human Authority**: Human judgment is preserved and required
|
| 217 |
+
✓ **Open Source**: Code and methodology are publicly reviewable
|
| 218 |
+
|
| 219 |
+
### Accountability Framework
|
| 220 |
+
✓ **Human Decision-Maker**: Identified in audit trail for every decision
|
| 221 |
+
✓ **Rationale Required**: Human must document reasoning
|
| 222 |
+
✓ **Clear Ownership**: Human owns the decision, not the model
|
| 223 |
+
✓ **Audit Trail**: Complete record of all decisions maintained
|
| 224 |
+
|
| 225 |
+
### Safety Measures
|
| 226 |
+
✓ **No Autonomous Operation**: System cannot act independently
|
| 227 |
+
✓ **Fail-Safe Defaults**: Errors result in human review, not automatic rejection
|
| 228 |
+
✓ **Explicit Constraints**: System capabilities clearly bounded
|
| 229 |
+
✓ **Override Always Available**: Human can always override suggestions
|
| 230 |
+
|
| 231 |
+
### Fairness Considerations
|
| 232 |
+
⚠ **Bias Testing Not Performed**: Model not evaluated for demographic fairness
|
| 233 |
+
⚠ **Synthetic Data Only**: May not reflect real-world population distributions
|
| 234 |
+
⚠ **Simplified Features**: May miss important fairness-relevant factors
|
| 235 |
+
⚠ **Human Bias Possible**: Human decision-maker may introduce biases
|
| 236 |
+
|
| 237 |
+
**Recommendation**: Any deployment should include fairness auditing and bias testing appropriate to the specific use case.
|
| 238 |
+
|
| 239 |
+
---
|
| 240 |
+
|
| 241 |
+
## Technical Specifications
|
| 242 |
+
|
| 243 |
+
### Environment Requirements
|
| 244 |
+
- **Python Version**: 3.11 or higher
|
| 245 |
+
- **Dependencies**: See requirements.txt
|
| 246 |
+
- scikit-learn >= 1.3.0
|
| 247 |
+
- xgboost >= 2.0.0
|
| 248 |
+
- pandas >= 2.0.0
|
| 249 |
+
- numpy >= 1.24.0
|
| 250 |
+
- shap >= 0.42.0
|
| 251 |
+
- joblib >= 1.3.0
|
| 252 |
+
|
| 253 |
+
### Model Artifacts
|
| 254 |
+
- **Model File**: model.pkl (joblib serialized XGBoost model)
|
| 255 |
+
- **Encoders**: encoders.pkl (label encoders for categorical features)
|
| 256 |
+
- **Metadata**: model_metadata.json (training information and metrics)
|
| 257 |
+
- **Configuration**: decision_spec.yaml (frozen decision boundaries)
|
| 258 |
+
|
| 259 |
+
### Input Specification
|
| 260 |
+
```python
|
| 261 |
+
{
|
| 262 |
+
'claim_type': str, # "Auto", "Property", "Health", or "Liability"
|
| 263 |
+
'damage_amount': float, # USD amount (non-negative)
|
| 264 |
+
'injury_involved': bool, # True or False
|
| 265 |
+
'risk_factor': str # "low", "medium", or "high"
|
| 266 |
+
}
|
| 267 |
+
```
|
| 268 |
+
|
| 269 |
+
### Output Specification
|
| 270 |
+
```python
|
| 271 |
+
{
|
| 272 |
+
'model_suggestion': str, # e.g., "High Severity (Advisory)"
|
| 273 |
+
'confidence_score': float, # 0.0 to 1.0
|
| 274 |
+
'feature_importance': dict, # Feature contributions
|
| 275 |
+
'rule_signals': list, # Human-readable explanations
|
| 276 |
+
'uncertainty_assessment': dict, # Uncertainty level and metrics
|
| 277 |
+
'governance_status': str, # "ADVISORY ONLY"
|
| 278 |
+
'requires_human_review': bool # Always True
|
| 279 |
+
}
|
| 280 |
+
```
|
| 281 |
+
|
| 282 |
+
### Usage Example
|
| 283 |
+
```python
|
| 284 |
+
from predict import predict_claim
|
| 285 |
+
|
| 286 |
+
result = predict_claim(
|
| 287 |
+
claim_type="Auto",
|
| 288 |
+
damage_amount=15000.0,
|
| 289 |
+
injury_involved=True,
|
| 290 |
+
risk_factor="medium"
|
| 291 |
+
)
|
| 292 |
+
|
| 293 |
+
print(f"Advisory Suggestion: {result['model_suggestion']}")
|
| 294 |
+
print(f"Confidence: {result['confidence_score']:.2%}")
|
| 295 |
+
print(f"Human Review Required: {result['requires_human_review']}")
|
| 296 |
+
```
|
| 297 |
+
|
| 298 |
+
---
|
| 299 |
+
|
| 300 |
+
## Maintenance and Updates
|
| 301 |
+
|
| 302 |
+
### Version History
|
| 303 |
+
- **v1.0.0** (2026-01-04): Initial release
|
| 304 |
+
- XGBoost classifier trained on synthetic dataset
|
| 305 |
+
- Advisory-only governance framework
|
| 306 |
+
- Human-in-the-loop enforcement
|
| 307 |
+
- Feature importance and uncertainty quantification
|
| 308 |
+
|
| 309 |
+
### Update Policy
|
| 310 |
+
- Model frozen for demonstration purposes
|
| 311 |
+
- Retraining requires explicit approval
|
| 312 |
+
- Decision boundaries cannot be modified
|
| 313 |
+
- Governance constraints are immutable
|
| 314 |
+
|
| 315 |
+
### Contact and Support
|
| 316 |
+
This is a demonstration model for the BDR Agent Factory governance framework.
|
| 317 |
+
For questions about governance principles or implementation:
|
| 318 |
+
- Review the decision_spec.yaml file
|
| 319 |
+
- Consult the QODER_EXECUTION_BRIEF.md
|
| 320 |
+
- Refer to project documentation
|
| 321 |
+
|
| 322 |
+
---
|
| 323 |
+
|
| 324 |
+
## Governance Compliance Summary
|
| 325 |
+
|
| 326 |
+
### ✅ Compliance Verified
|
| 327 |
+
- [x] Classical ML only (no LLMs, no neural networks)
|
| 328 |
+
- [x] Advisory-only outputs (no autonomous decisions)
|
| 329 |
+
- [x] Human review required for all predictions
|
| 330 |
+
- [x] Only allowed features used (4 features as specified)
|
| 331 |
+
- [x] Decision boundaries documented and frozen
|
| 332 |
+
- [x] Explainability artifacts generated
|
| 333 |
+
- [x] Uncertainty quantification provided
|
| 334 |
+
- [x] Audit trail support implemented
|
| 335 |
+
- [x] Override capability enabled
|
| 336 |
+
- [x] Limitations clearly documented
|
| 337 |
+
|
| 338 |
+
### Governance Framework
|
| 339 |
+
This model operates under the **BDR Agent Factory** governance framework:
|
| 340 |
+
- **No autonomous actions**: System cannot take actions without human approval
|
| 341 |
+
- **Transparency**: All logic is explainable and auditable
|
| 342 |
+
- **Human authority**: Human has final decision-making power
|
| 343 |
+
- **Accountability**: Human decision-maker is logged and responsible
|
| 344 |
+
- **Safety**: System designed with fail-safe constraints
|
| 345 |
+
|
| 346 |
+
---
|
| 347 |
+
|
| 348 |
+
## License and Disclaimer
|
| 349 |
+
|
| 350 |
+
### License
|
| 351 |
+
This model and associated code are provided for educational and research purposes.
|
| 352 |
+
Suggested License: Apache 2.0 or MIT (specify as appropriate for your use case)
|
| 353 |
+
|
| 354 |
+
### Disclaimer
|
| 355 |
+
**THIS MODEL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.**
|
| 356 |
+
|
| 357 |
+
⚠ **IMPORTANT DISCLAIMERS**:
|
| 358 |
+
1. **No Production Use**: This model is for demonstration and education only
|
| 359 |
+
2. **No Accuracy Guarantees**: Performance on real-world data is unknown
|
| 360 |
+
3. **No Regulatory Approval**: Not certified for insurance operations
|
| 361 |
+
4. **No Liability Coverage**: Use at your own risk
|
| 362 |
+
5. **Human Oversight Required**: Must not operate autonomously
|
| 363 |
+
6. **Synthetic Data Only**: Not validated on real insurance claims
|
| 364 |
+
7. **Educational Purpose**: Designed for learning, not production deployment
|
| 365 |
+
|
| 366 |
+
### Responsible Use
|
| 367 |
+
Users of this model are responsible for:
|
| 368 |
+
- Ensuring appropriate human oversight
|
| 369 |
+
- Complying with applicable regulations
|
| 370 |
+
- Conducting their own validation and testing
|
| 371 |
+
- Not deploying in high-stakes scenarios without proper safeguards
|
| 372 |
+
- Maintaining audit trails and accountability
|
| 373 |
+
|
| 374 |
+
---
|
| 375 |
+
|
| 376 |
+
## Conclusion
|
| 377 |
+
|
| 378 |
+
This model demonstrates how classical machine learning can be deployed under strict governance constraints to provide **advisory decision support** while preserving human authority and accountability.
|
| 379 |
+
|
| 380 |
+
**Key Takeaways**:
|
| 381 |
+
✓ Advisory suggestions, not autonomous decisions
|
| 382 |
+
✓ Human-in-the-loop is mandatory
|
| 383 |
+
✓ Transparency and explainability built-in
|
| 384 |
+
✓ Clear documentation of limitations
|
| 385 |
+
✓ Designed for education, not production
|
| 386 |
+
|
| 387 |
+
**Remember**: This is a tool to **assist humans**, not replace them. The final decision authority always rests with qualified human professionals.
|
| 388 |
+
|
| 389 |
+
---
|
| 390 |
+
|
| 391 |
+
**Model Card Version**: 1.0.0
|
| 392 |
+
**Last Reviewed**: 2026-01-04
|
| 393 |
+
**Next Review**: Required before any production consideration (not currently approved)
|
decision_spec.yaml
ADDED
|
@@ -0,0 +1,189 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Insurance Decision Specification
|
| 2 |
+
# Extracted from DecisionBoundaryDemo implementation
|
| 3 |
+
# This specification defines the governance constraints for the insurance decision support system
|
| 4 |
+
|
| 5 |
+
version: "1.0.0"
|
| 6 |
+
name: "Insurance Claims Decision Support System"
|
| 7 |
+
last_updated: "2026-01-04"
|
| 8 |
+
|
| 9 |
+
# GOVERNANCE CONSTRAINTS
|
| 10 |
+
governance:
|
| 11 |
+
# CRITICAL: Auto-action must be disabled
|
| 12 |
+
auto_action: false
|
| 13 |
+
|
| 14 |
+
# CRITICAL: Human review is mandatory
|
| 15 |
+
human_review_required: true
|
| 16 |
+
|
| 17 |
+
# System type: advisory only, non-autonomous
|
| 18 |
+
system_type: "advisory"
|
| 19 |
+
|
| 20 |
+
# Decision authority
|
| 21 |
+
decision_authority: "human"
|
| 22 |
+
|
| 23 |
+
# Autonomous operation
|
| 24 |
+
autonomous_operation: false
|
| 25 |
+
|
| 26 |
+
# DECISION OUTPUTS
|
| 27 |
+
decision_outputs:
|
| 28 |
+
# All outputs are advisory only
|
| 29 |
+
type: "advisory"
|
| 30 |
+
|
| 31 |
+
# No binding decisions
|
| 32 |
+
binding: false
|
| 33 |
+
|
| 34 |
+
# Outputs provided
|
| 35 |
+
outputs:
|
| 36 |
+
- rule_signals
|
| 37 |
+
- model_suggestion
|
| 38 |
+
- uncertainty_level
|
| 39 |
+
- explanation
|
| 40 |
+
- score
|
| 41 |
+
|
| 42 |
+
# All suggestions require human confirmation
|
| 43 |
+
requires_human_confirmation: true
|
| 44 |
+
|
| 45 |
+
# MODEL SPECIFICATION
|
| 46 |
+
model:
|
| 47 |
+
type: "rule-based"
|
| 48 |
+
architecture: "deterministic_heuristic"
|
| 49 |
+
training: "none"
|
| 50 |
+
|
| 51 |
+
# Model constraints
|
| 52 |
+
constraints:
|
| 53 |
+
- "Classical ML only (logistic regression, tree-based)"
|
| 54 |
+
- "No LLMs"
|
| 55 |
+
- "No reinforcement learning"
|
| 56 |
+
- "No automated decisions"
|
| 57 |
+
|
| 58 |
+
# Explainability
|
| 59 |
+
explainability:
|
| 60 |
+
required: true
|
| 61 |
+
methods:
|
| 62 |
+
- "rule_signals"
|
| 63 |
+
- "feature_importance"
|
| 64 |
+
- "confidence_scores"
|
| 65 |
+
|
| 66 |
+
# DECISION BOUNDARIES
|
| 67 |
+
decision_boundaries:
|
| 68 |
+
damage_thresholds:
|
| 69 |
+
low: 5000
|
| 70 |
+
medium: 15000
|
| 71 |
+
high: 50000
|
| 72 |
+
|
| 73 |
+
risk_weights:
|
| 74 |
+
low: 1.0
|
| 75 |
+
medium: 1.5
|
| 76 |
+
high: 2.0
|
| 77 |
+
|
| 78 |
+
injury_multiplier: 1.8
|
| 79 |
+
|
| 80 |
+
severity_thresholds:
|
| 81 |
+
low: 5
|
| 82 |
+
medium: 15
|
| 83 |
+
|
| 84 |
+
# INPUT FEATURES
|
| 85 |
+
input_features:
|
| 86 |
+
- name: "claim_type"
|
| 87 |
+
type: "categorical"
|
| 88 |
+
values: ["Auto", "Property", "Health", "Liability"]
|
| 89 |
+
required: true
|
| 90 |
+
|
| 91 |
+
- name: "damage_amount"
|
| 92 |
+
type: "numeric"
|
| 93 |
+
unit: "USD"
|
| 94 |
+
required: true
|
| 95 |
+
|
| 96 |
+
- name: "injury_involved"
|
| 97 |
+
type: "boolean"
|
| 98 |
+
required: true
|
| 99 |
+
|
| 100 |
+
- name: "risk_factor"
|
| 101 |
+
type: "categorical"
|
| 102 |
+
values: ["low", "medium", "high"]
|
| 103 |
+
required: true
|
| 104 |
+
|
| 105 |
+
# HUMAN-IN-THE-LOOP REQUIREMENTS
|
| 106 |
+
human_in_the_loop:
|
| 107 |
+
mandatory: true
|
| 108 |
+
|
| 109 |
+
requirements:
|
| 110 |
+
- "Human must review all model suggestions"
|
| 111 |
+
- "Human must provide independent judgment"
|
| 112 |
+
- "Human must confirm final decision"
|
| 113 |
+
- "Human must document rationale"
|
| 114 |
+
|
| 115 |
+
enforcement:
|
| 116 |
+
- "No decision finalized without human_confirms=True"
|
| 117 |
+
- "Human must provide non-empty override_reason"
|
| 118 |
+
- "System blocks autonomous operation"
|
| 119 |
+
- "All confirmations logged in audit trail"
|
| 120 |
+
|
| 121 |
+
# AUDIT AND COMPLIANCE
|
| 122 |
+
audit:
|
| 123 |
+
required: true
|
| 124 |
+
|
| 125 |
+
logged_items:
|
| 126 |
+
- "All inputs"
|
| 127 |
+
- "All model outputs"
|
| 128 |
+
- "Human decisions"
|
| 129 |
+
- "Human rationale"
|
| 130 |
+
- "Timestamps"
|
| 131 |
+
- "Decision-maker identity"
|
| 132 |
+
|
| 133 |
+
transparency:
|
| 134 |
+
- "All decision logic is open source"
|
| 135 |
+
- "Explanations provided for every decision"
|
| 136 |
+
- "Governance constraints are explicit"
|
| 137 |
+
- "Audit trail is complete and accessible"
|
| 138 |
+
|
| 139 |
+
# LIMITATIONS
|
| 140 |
+
limitations:
|
| 141 |
+
- "Demonstration system only"
|
| 142 |
+
- "Uses synthetic/generic data"
|
| 143 |
+
- "Not for production use"
|
| 144 |
+
- "No accuracy or performance claims"
|
| 145 |
+
- "Simplified decision rules"
|
| 146 |
+
- "No regulatory approval"
|
| 147 |
+
- "No real-world validation"
|
| 148 |
+
|
| 149 |
+
# ETHICAL CONSIDERATIONS
|
| 150 |
+
ethics:
|
| 151 |
+
transparency:
|
| 152 |
+
- "No hidden logic or black box decisions"
|
| 153 |
+
- "Uncertainty explicitly communicated"
|
| 154 |
+
- "Human judgment preserved and required"
|
| 155 |
+
|
| 156 |
+
accountability:
|
| 157 |
+
- "Human decision-maker identified in audit trail"
|
| 158 |
+
- "Rationale required and logged"
|
| 159 |
+
- "Decision ownership is clear"
|
| 160 |
+
|
| 161 |
+
safety:
|
| 162 |
+
- "System cannot operate autonomously"
|
| 163 |
+
- "Fail-safe defaults (reject on error)"
|
| 164 |
+
- "Explicit capability constraints"
|
| 165 |
+
|
| 166 |
+
# DATASET REFERENCE
|
| 167 |
+
dataset:
|
| 168 |
+
name: "BDR-AI/insurance_decision_boundaries_v1"
|
| 169 |
+
platform: "Hugging Face"
|
| 170 |
+
type: "synthetic"
|
| 171 |
+
purpose: "demonstration"
|
| 172 |
+
|
| 173 |
+
# DEPLOYMENT CONSTRAINTS
|
| 174 |
+
deployment:
|
| 175 |
+
mode: "reference_implementation"
|
| 176 |
+
quality: "educational_institutional"
|
| 177 |
+
production_ready: false
|
| 178 |
+
|
| 179 |
+
allowed_actions:
|
| 180 |
+
- "READ existing Hugging Face dataset"
|
| 181 |
+
- "TRAIN classical ML baseline model"
|
| 182 |
+
- "GENERATE model_card.md"
|
| 183 |
+
- "EXPOSE confidence scores and feature importance"
|
| 184 |
+
|
| 185 |
+
prohibited_actions:
|
| 186 |
+
- "Modify decision logic or thresholds"
|
| 187 |
+
- "Add new features beyond documented inputs"
|
| 188 |
+
- "Implement autonomous actions"
|
| 189 |
+
- "Deploy or publish without approval"
|
evaluate.py
ADDED
|
@@ -0,0 +1,410 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Evaluate Classical ML Model for Insurance Claims Decision Support
|
| 3 |
+
=================================================================
|
| 4 |
+
|
| 5 |
+
GOVERNANCE CONSTRAINTS:
|
| 6 |
+
- Advisory system only (NO autonomous decisions)
|
| 7 |
+
- Human-in-the-loop is MANDATORY
|
| 8 |
+
- All outputs are NON-BINDING suggestions
|
| 9 |
+
- Evaluate confidence calibration and uncertainty quantification
|
| 10 |
+
|
| 11 |
+
Purpose: Comprehensive evaluation of trained model
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import pandas as pd
|
| 15 |
+
import numpy as np
|
| 16 |
+
import joblib
|
| 17 |
+
import json
|
| 18 |
+
from datasets import load_dataset
|
| 19 |
+
from sklearn.model_selection import train_test_split
|
| 20 |
+
from sklearn.metrics import (
|
| 21 |
+
classification_report,
|
| 22 |
+
accuracy_score,
|
| 23 |
+
precision_recall_fscore_support,
|
| 24 |
+
confusion_matrix,
|
| 25 |
+
log_loss
|
| 26 |
+
)
|
| 27 |
+
from sklearn.preprocessing import LabelEncoder
|
| 28 |
+
|
| 29 |
+
def load_test_data():
|
| 30 |
+
"""
|
| 31 |
+
Load test data (same split as training).
|
| 32 |
+
"""
|
| 33 |
+
print("=" * 70)
|
| 34 |
+
print("LOADING TEST DATA")
|
| 35 |
+
print("=" * 70)
|
| 36 |
+
|
| 37 |
+
# Load dataset
|
| 38 |
+
dataset = load_dataset("BDR-AI/insurance_decision_boundaries_v1")
|
| 39 |
+
df = pd.DataFrame(dataset['train'])
|
| 40 |
+
|
| 41 |
+
# Load encoders
|
| 42 |
+
encoders = joblib.load('encoders.pkl')
|
| 43 |
+
|
| 44 |
+
# Prepare features
|
| 45 |
+
allowed_features = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 46 |
+
X = df[allowed_features].copy()
|
| 47 |
+
y = df['severity']
|
| 48 |
+
|
| 49 |
+
# Encode features
|
| 50 |
+
X['claim_type_encoded'] = encoders['claim_type'].transform(X['claim_type'])
|
| 51 |
+
X['risk_factor_encoded'] = encoders['risk_factor'].transform(X['risk_factor'])
|
| 52 |
+
X['injury_involved_encoded'] = X['injury_involved'].astype(int)
|
| 53 |
+
|
| 54 |
+
X_processed = X[['claim_type_encoded', 'damage_amount', 'injury_involved_encoded', 'risk_factor_encoded']].copy()
|
| 55 |
+
X_processed.columns = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 56 |
+
|
| 57 |
+
# Encode target
|
| 58 |
+
y_encoded = encoders['target'].transform(y)
|
| 59 |
+
|
| 60 |
+
# Use same split as training
|
| 61 |
+
_, X_test, _, y_test = train_test_split(
|
| 62 |
+
X_processed, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
|
| 63 |
+
)
|
| 64 |
+
|
| 65 |
+
print(f"✓ Test set loaded: {len(X_test)} samples")
|
| 66 |
+
|
| 67 |
+
return X_test, y_test, encoders
|
| 68 |
+
|
| 69 |
+
def evaluate_classification_performance(model, X_test, y_test, encoders):
|
| 70 |
+
"""
|
| 71 |
+
Evaluate classification metrics.
|
| 72 |
+
"""
|
| 73 |
+
print(f"\n{'='*70}")
|
| 74 |
+
print("CLASSIFICATION PERFORMANCE EVALUATION")
|
| 75 |
+
print(f"{'='*70}")
|
| 76 |
+
|
| 77 |
+
# Make predictions
|
| 78 |
+
y_pred = model.predict(X_test)
|
| 79 |
+
y_pred_proba = model.predict_proba(X_test)
|
| 80 |
+
|
| 81 |
+
# Get class names
|
| 82 |
+
target_names = encoders['target'].classes_
|
| 83 |
+
|
| 84 |
+
# Overall accuracy
|
| 85 |
+
accuracy = accuracy_score(y_test, y_pred)
|
| 86 |
+
print(f"\nOverall Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
|
| 87 |
+
|
| 88 |
+
# Detailed classification report
|
| 89 |
+
print(f"\n{'='*70}")
|
| 90 |
+
print("DETAILED CLASSIFICATION REPORT")
|
| 91 |
+
print(f"{'='*70}")
|
| 92 |
+
report = classification_report(y_test, y_pred, target_names=target_names, digits=4)
|
| 93 |
+
print(report)
|
| 94 |
+
report_dict = classification_report(y_test, y_pred, target_names=target_names, output_dict=True)
|
| 95 |
+
|
| 96 |
+
# Per-class metrics
|
| 97 |
+
precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred, average=None)
|
| 98 |
+
|
| 99 |
+
print(f"{'='*70}")
|
| 100 |
+
print("PER-CLASS METRICS (Advisory Severity Levels)")
|
| 101 |
+
print(f"{'='*70}")
|
| 102 |
+
print(f"{'Class':<15} {'Precision':<12} {'Recall':<12} {'F1-Score':<12} {'Support':<10}")
|
| 103 |
+
print("-" * 70)
|
| 104 |
+
for i, class_name in enumerate(target_names):
|
| 105 |
+
print(f"{class_name:<15} {precision[i]:<12.4f} {recall[i]:<12.4f} {f1[i]:<12.4f} {support[i]:<10}")
|
| 106 |
+
|
| 107 |
+
# Confusion matrix
|
| 108 |
+
cm = confusion_matrix(y_test, y_pred)
|
| 109 |
+
print(f"\n{'='*70}")
|
| 110 |
+
print("CONFUSION MATRIX")
|
| 111 |
+
print(f"{'='*70}")
|
| 112 |
+
print(f" Predicted")
|
| 113 |
+
print(f" {' '.join([f'{name:8s}' for name in target_names])}")
|
| 114 |
+
for i, label in enumerate(target_names):
|
| 115 |
+
values = ' '.join([f'{cm[i][j]:8d}' for j in range(len(target_names))])
|
| 116 |
+
print(f"Actual {label:8s} {values}")
|
| 117 |
+
|
| 118 |
+
# Calculate log loss (confidence calibration indicator)
|
| 119 |
+
logloss = log_loss(y_test, y_pred_proba)
|
| 120 |
+
print(f"\n{'='*70}")
|
| 121 |
+
print("CONFIDENCE CALIBRATION")
|
| 122 |
+
print(f"{'='*70}")
|
| 123 |
+
print(f"Log Loss: {logloss:.4f}")
|
| 124 |
+
print("(Lower is better - indicates better calibrated confidence scores)")
|
| 125 |
+
|
| 126 |
+
return {
|
| 127 |
+
'accuracy': accuracy,
|
| 128 |
+
'precision': precision.tolist(),
|
| 129 |
+
'recall': recall.tolist(),
|
| 130 |
+
'f1_score': f1.tolist(),
|
| 131 |
+
'support': support.tolist(),
|
| 132 |
+
'confusion_matrix': cm.tolist(),
|
| 133 |
+
'log_loss': logloss,
|
| 134 |
+
'classification_report': report_dict
|
| 135 |
+
}
|
| 136 |
+
|
| 137 |
+
def evaluate_confidence_distribution(model, X_test, y_test, encoders):
|
| 138 |
+
"""
|
| 139 |
+
Analyze confidence score distribution.
|
| 140 |
+
"""
|
| 141 |
+
print(f"\n{'='*70}")
|
| 142 |
+
print("CONFIDENCE SCORE DISTRIBUTION ANALYSIS")
|
| 143 |
+
print(f"{'='*70}")
|
| 144 |
+
|
| 145 |
+
y_pred_proba = model.predict_proba(X_test)
|
| 146 |
+
y_pred = model.predict(X_test)
|
| 147 |
+
|
| 148 |
+
# Get max confidence for each prediction
|
| 149 |
+
max_confidence = np.max(y_pred_proba, axis=1)
|
| 150 |
+
|
| 151 |
+
print(f"\nConfidence Statistics:")
|
| 152 |
+
print(f" Mean confidence: {np.mean(max_confidence):.4f}")
|
| 153 |
+
print(f" Median confidence: {np.median(max_confidence):.4f}")
|
| 154 |
+
print(f" Min confidence: {np.min(max_confidence):.4f}")
|
| 155 |
+
print(f" Max confidence: {np.max(max_confidence):.4f}")
|
| 156 |
+
print(f" Std deviation: {np.std(max_confidence):.4f}")
|
| 157 |
+
|
| 158 |
+
# Confidence distribution by bins
|
| 159 |
+
bins = [0.0, 0.5, 0.7, 0.8, 0.9, 1.0]
|
| 160 |
+
bin_labels = ['0.0-0.5', '0.5-0.7', '0.7-0.8', '0.8-0.9', '0.9-1.0']
|
| 161 |
+
|
| 162 |
+
print(f"\n{'='*70}")
|
| 163 |
+
print("CONFIDENCE DISTRIBUTION BY BINS")
|
| 164 |
+
print(f"{'='*70}")
|
| 165 |
+
print(f"{'Confidence Range':<20} {'Count':<10} {'Percentage':<12}")
|
| 166 |
+
print("-" * 70)
|
| 167 |
+
|
| 168 |
+
for i in range(len(bins)-1):
|
| 169 |
+
mask = (max_confidence >= bins[i]) & (max_confidence < bins[i+1])
|
| 170 |
+
if i == len(bins)-2: # Last bin includes 1.0
|
| 171 |
+
mask = (max_confidence >= bins[i]) & (max_confidence <= bins[i+1])
|
| 172 |
+
count = np.sum(mask)
|
| 173 |
+
percentage = (count / len(max_confidence)) * 100
|
| 174 |
+
print(f"{bin_labels[i]:<20} {count:<10} {percentage:>6.2f}%")
|
| 175 |
+
|
| 176 |
+
# Accuracy by confidence level
|
| 177 |
+
print(f"\n{'='*70}")
|
| 178 |
+
print("ACCURACY BY CONFIDENCE LEVEL")
|
| 179 |
+
print(f"{'='*70}")
|
| 180 |
+
print(f"{'Confidence Range':<20} {'Accuracy':<12} {'Sample Count':<15}")
|
| 181 |
+
print("-" * 70)
|
| 182 |
+
|
| 183 |
+
for i in range(len(bins)-1):
|
| 184 |
+
mask = (max_confidence >= bins[i]) & (max_confidence < bins[i+1])
|
| 185 |
+
if i == len(bins)-2:
|
| 186 |
+
mask = (max_confidence >= bins[i]) & (max_confidence <= bins[i+1])
|
| 187 |
+
|
| 188 |
+
if np.sum(mask) > 0:
|
| 189 |
+
acc = accuracy_score(y_test[mask], y_pred[mask])
|
| 190 |
+
print(f"{bin_labels[i]:<20} {acc:<12.4f} {np.sum(mask):<15}")
|
| 191 |
+
|
| 192 |
+
return {
|
| 193 |
+
'mean_confidence': float(np.mean(max_confidence)),
|
| 194 |
+
'median_confidence': float(np.median(max_confidence)),
|
| 195 |
+
'min_confidence': float(np.min(max_confidence)),
|
| 196 |
+
'max_confidence': float(np.max(max_confidence)),
|
| 197 |
+
'std_confidence': float(np.std(max_confidence))
|
| 198 |
+
}
|
| 199 |
+
|
| 200 |
+
def evaluate_feature_importance(model, encoders):
|
| 201 |
+
"""
|
| 202 |
+
Analyze feature importance for explainability.
|
| 203 |
+
"""
|
| 204 |
+
print(f"\n{'='*70}")
|
| 205 |
+
print("FEATURE IMPORTANCE ANALYSIS (Explainability)")
|
| 206 |
+
print(f"{'='*70}")
|
| 207 |
+
|
| 208 |
+
feature_names = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 209 |
+
feature_importance = model.feature_importances_
|
| 210 |
+
|
| 211 |
+
# Sort by importance
|
| 212 |
+
importance_pairs = sorted(zip(feature_names, feature_importance), key=lambda x: x[1], reverse=True)
|
| 213 |
+
|
| 214 |
+
print(f"\n{'Feature':<20} {'Importance':<12} {'Relative %':<12}")
|
| 215 |
+
print("-" * 70)
|
| 216 |
+
|
| 217 |
+
total_importance = sum(feature_importance)
|
| 218 |
+
for name, importance in importance_pairs:
|
| 219 |
+
relative_pct = (importance / total_importance) * 100
|
| 220 |
+
print(f"{name:<20} {importance:<12.4f} {relative_pct:>6.2f}%")
|
| 221 |
+
|
| 222 |
+
print(f"\n{'='*70}")
|
| 223 |
+
print("FEATURE IMPORTANCE INTERPRETATION")
|
| 224 |
+
print(f"{'='*70}")
|
| 225 |
+
print("Higher importance = Greater influence on advisory predictions")
|
| 226 |
+
print("This helps humans understand which factors drive the model's suggestions")
|
| 227 |
+
|
| 228 |
+
return dict(zip(feature_names, feature_importance.tolist()))
|
| 229 |
+
|
| 230 |
+
def evaluate_uncertainty_quantification(model, X_test, encoders):
|
| 231 |
+
"""
|
| 232 |
+
Evaluate uncertainty quantification quality.
|
| 233 |
+
"""
|
| 234 |
+
print(f"\n{'='*70}")
|
| 235 |
+
print("UNCERTAINTY QUANTIFICATION ASSESSMENT")
|
| 236 |
+
print(f"{'='*70}")
|
| 237 |
+
|
| 238 |
+
y_pred_proba = model.predict_proba(X_test)
|
| 239 |
+
|
| 240 |
+
# Calculate entropy as uncertainty measure
|
| 241 |
+
# Higher entropy = More uncertain
|
| 242 |
+
epsilon = 1e-10 # Avoid log(0)
|
| 243 |
+
entropy = -np.sum(y_pred_proba * np.log(y_pred_proba + epsilon), axis=1)
|
| 244 |
+
max_entropy = np.log(y_pred_proba.shape[1]) # Max entropy for uniform distribution
|
| 245 |
+
normalized_entropy = entropy / max_entropy
|
| 246 |
+
|
| 247 |
+
print(f"\nEntropy-based Uncertainty Statistics:")
|
| 248 |
+
print(f" Mean entropy: {np.mean(entropy):.4f}")
|
| 249 |
+
print(f" Mean normalized entropy: {np.mean(normalized_entropy):.4f}")
|
| 250 |
+
print(f" (0.0 = certain, 1.0 = maximum uncertainty)")
|
| 251 |
+
|
| 252 |
+
# Classify uncertainty levels
|
| 253 |
+
low_uncertainty = np.sum(normalized_entropy < 0.3)
|
| 254 |
+
medium_uncertainty = np.sum((normalized_entropy >= 0.3) & (normalized_entropy < 0.6))
|
| 255 |
+
high_uncertainty = np.sum(normalized_entropy >= 0.6)
|
| 256 |
+
|
| 257 |
+
print(f"\n{'='*70}")
|
| 258 |
+
print("UNCERTAINTY LEVEL DISTRIBUTION")
|
| 259 |
+
print(f"{'='*70}")
|
| 260 |
+
print(f"Low uncertainty (<0.3): {low_uncertainty:5d} ({low_uncertainty/len(entropy)*100:>5.1f}%)")
|
| 261 |
+
print(f"Medium uncertainty (0.3-0.6): {medium_uncertainty:5d} ({medium_uncertainty/len(entropy)*100:>5.1f}%)")
|
| 262 |
+
print(f"High uncertainty (≥0.6): {high_uncertainty:5d} ({high_uncertainty/len(entropy)*100:>5.1f}%)")
|
| 263 |
+
|
| 264 |
+
print(f"\n{'='*70}")
|
| 265 |
+
print("GOVERNANCE NOTE: Uncertainty Quantification")
|
| 266 |
+
print(f"{'='*70}")
|
| 267 |
+
print("⚠ High uncertainty predictions should receive EXTRA human scrutiny")
|
| 268 |
+
print("⚠ Human reviewers should prioritize cases with uncertainty ≥ 0.6")
|
| 269 |
+
print("⚠ All predictions require human confirmation regardless of confidence")
|
| 270 |
+
|
| 271 |
+
return {
|
| 272 |
+
'mean_entropy': float(np.mean(entropy)),
|
| 273 |
+
'mean_normalized_entropy': float(np.mean(normalized_entropy)),
|
| 274 |
+
'low_uncertainty_count': int(low_uncertainty),
|
| 275 |
+
'medium_uncertainty_count': int(medium_uncertainty),
|
| 276 |
+
'high_uncertainty_count': int(high_uncertainty)
|
| 277 |
+
}
|
| 278 |
+
|
| 279 |
+
def governance_compliance_check():
|
| 280 |
+
"""
|
| 281 |
+
Verify model complies with governance constraints.
|
| 282 |
+
"""
|
| 283 |
+
print(f"\n{'='*70}")
|
| 284 |
+
print("GOVERNANCE COMPLIANCE VERIFICATION")
|
| 285 |
+
print(f"{'='*70}")
|
| 286 |
+
|
| 287 |
+
# Load metadata
|
| 288 |
+
with open('model_metadata.json', 'r') as f:
|
| 289 |
+
metadata = json.load(f)
|
| 290 |
+
|
| 291 |
+
checks = []
|
| 292 |
+
|
| 293 |
+
# Check 1: Model type
|
| 294 |
+
model_type = metadata.get('model_type', '')
|
| 295 |
+
is_classical = 'XGBoost' in model_type or 'Random Forest' in model_type or 'Logistic' in model_type
|
| 296 |
+
checks.append(('Classical ML model (no neural networks)', is_classical))
|
| 297 |
+
|
| 298 |
+
# Check 2: Advisory status
|
| 299 |
+
is_advisory = metadata.get('governance_status', '').upper().find('ADVISORY') >= 0
|
| 300 |
+
checks.append(('Advisory-only system (no autonomous decisions)', is_advisory))
|
| 301 |
+
|
| 302 |
+
# Check 3: Human review required
|
| 303 |
+
human_required = metadata.get('human_review_required', False)
|
| 304 |
+
checks.append(('Human review required', human_required))
|
| 305 |
+
|
| 306 |
+
# Check 4: Correct features
|
| 307 |
+
features = metadata.get('features', [])
|
| 308 |
+
correct_features = set(features) == {'claim_type', 'damage_amount', 'injury_involved', 'risk_factor'}
|
| 309 |
+
checks.append(('Only allowed features used (4 features)', correct_features))
|
| 310 |
+
|
| 311 |
+
# Check 5: Frozen decision boundaries present
|
| 312 |
+
has_boundaries = 'decision_boundaries' in metadata
|
| 313 |
+
checks.append(('Decision boundaries documented', has_boundaries))
|
| 314 |
+
|
| 315 |
+
# Print results
|
| 316 |
+
all_passed = True
|
| 317 |
+
for check_name, passed in checks:
|
| 318 |
+
status = "✓ PASS" if passed else "✗ FAIL"
|
| 319 |
+
print(f"{status} {check_name}")
|
| 320 |
+
if not passed:
|
| 321 |
+
all_passed = False
|
| 322 |
+
|
| 323 |
+
print(f"\n{'='*70}")
|
| 324 |
+
if all_passed:
|
| 325 |
+
print("✓ ALL GOVERNANCE CHECKS PASSED")
|
| 326 |
+
else:
|
| 327 |
+
print("✗ GOVERNANCE VIOLATIONS DETECTED - REVIEW REQUIRED")
|
| 328 |
+
print(f"{'='*70}")
|
| 329 |
+
|
| 330 |
+
return all_passed
|
| 331 |
+
|
| 332 |
+
def save_evaluation_report(metrics):
|
| 333 |
+
"""
|
| 334 |
+
Save comprehensive evaluation report.
|
| 335 |
+
"""
|
| 336 |
+
print(f"\n{'='*70}")
|
| 337 |
+
print("SAVING EVALUATION REPORT")
|
| 338 |
+
print(f"{'='*70}")
|
| 339 |
+
|
| 340 |
+
with open('evaluation_report.json', 'w') as f:
|
| 341 |
+
json.dump(metrics, f, indent=2)
|
| 342 |
+
|
| 343 |
+
print("✓ Evaluation report saved to: evaluation_report.json")
|
| 344 |
+
|
| 345 |
+
def main():
|
| 346 |
+
"""
|
| 347 |
+
Main evaluation pipeline.
|
| 348 |
+
"""
|
| 349 |
+
print("\n" + "="*70)
|
| 350 |
+
print("INSURANCE DECISION SUPPORT MODEL - EVALUATION PIPELINE")
|
| 351 |
+
print("="*70)
|
| 352 |
+
print("Governance Mode: ADVISORY (Human-in-the-Loop Required)")
|
| 353 |
+
print("Purpose: Evaluate model performance and compliance")
|
| 354 |
+
print("="*70 + "\n")
|
| 355 |
+
|
| 356 |
+
# Load model
|
| 357 |
+
print("Loading trained model...")
|
| 358 |
+
model = joblib.load('model.pkl')
|
| 359 |
+
print("✓ Model loaded successfully\n")
|
| 360 |
+
|
| 361 |
+
# Load test data
|
| 362 |
+
X_test, y_test, encoders = load_test_data()
|
| 363 |
+
|
| 364 |
+
# Evaluate classification performance
|
| 365 |
+
classification_metrics = evaluate_classification_performance(model, X_test, y_test, encoders)
|
| 366 |
+
|
| 367 |
+
# Evaluate confidence distribution
|
| 368 |
+
confidence_metrics = evaluate_confidence_distribution(model, X_test, y_test, encoders)
|
| 369 |
+
|
| 370 |
+
# Evaluate feature importance
|
| 371 |
+
feature_importance = evaluate_feature_importance(model, encoders)
|
| 372 |
+
|
| 373 |
+
# Evaluate uncertainty quantification
|
| 374 |
+
uncertainty_metrics = evaluate_uncertainty_quantification(model, X_test, encoders)
|
| 375 |
+
|
| 376 |
+
# Governance compliance check
|
| 377 |
+
governance_passed = governance_compliance_check()
|
| 378 |
+
|
| 379 |
+
# Compile all metrics
|
| 380 |
+
evaluation_report = {
|
| 381 |
+
'evaluation_date': pd.Timestamp.now().isoformat(),
|
| 382 |
+
'model_file': 'model.pkl',
|
| 383 |
+
'test_samples': len(X_test),
|
| 384 |
+
'classification_metrics': classification_metrics,
|
| 385 |
+
'confidence_metrics': confidence_metrics,
|
| 386 |
+
'feature_importance': feature_importance,
|
| 387 |
+
'uncertainty_metrics': uncertainty_metrics,
|
| 388 |
+
'governance_compliance': governance_passed
|
| 389 |
+
}
|
| 390 |
+
|
| 391 |
+
# Save report
|
| 392 |
+
save_evaluation_report(evaluation_report)
|
| 393 |
+
|
| 394 |
+
print(f"\n{'='*70}")
|
| 395 |
+
print("EVALUATION COMPLETE")
|
| 396 |
+
print(f"{'='*70}")
|
| 397 |
+
print(f"✓ Test accuracy: {classification_metrics['accuracy']*100:.2f}%")
|
| 398 |
+
print(f"✓ Mean confidence: {confidence_metrics['mean_confidence']:.4f}")
|
| 399 |
+
print(f"✓ Governance compliance: {'PASSED' if governance_passed else 'FAILED'}")
|
| 400 |
+
print(f"✓ Report saved: evaluation_report.json")
|
| 401 |
+
print(f"\n{'='*70}")
|
| 402 |
+
print("GOVERNANCE REMINDER")
|
| 403 |
+
print(f"{'='*70}")
|
| 404 |
+
print("⚠ This model produces ADVISORY outputs only")
|
| 405 |
+
print("⚠ Human confirmation is MANDATORY for all decisions")
|
| 406 |
+
print("⚠ High uncertainty cases require EXTRA human scrutiny")
|
| 407 |
+
print(f"{'='*70}\n")
|
| 408 |
+
|
| 409 |
+
if __name__ == "__main__":
|
| 410 |
+
main()
|
predict.py
ADDED
|
@@ -0,0 +1,370 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Make Advisory Predictions with Explainability
|
| 3 |
+
=============================================
|
| 4 |
+
|
| 5 |
+
GOVERNANCE CONSTRAINTS:
|
| 6 |
+
- Advisory system only (NO autonomous decisions)
|
| 7 |
+
- Human-in-the-loop is MANDATORY
|
| 8 |
+
- All outputs are NON-BINDING suggestions
|
| 9 |
+
- Full explainability required (confidence, feature importance, rule signals)
|
| 10 |
+
|
| 11 |
+
Purpose: Generate advisory predictions with complete transparency
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import numpy as np
|
| 15 |
+
import joblib
|
| 16 |
+
import json
|
| 17 |
+
import yaml
|
| 18 |
+
from datetime import datetime
|
| 19 |
+
|
| 20 |
+
# FROZEN DECISION BOUNDARIES - DO NOT MODIFY (from decision_spec.yaml)
|
| 21 |
+
DECISION_BOUNDARIES = {
|
| 22 |
+
'damage_thresholds': {
|
| 23 |
+
'low': 5000,
|
| 24 |
+
'medium': 15000,
|
| 25 |
+
'high': 50000
|
| 26 |
+
},
|
| 27 |
+
'risk_weights': {
|
| 28 |
+
'low': 1.0,
|
| 29 |
+
'medium': 1.5,
|
| 30 |
+
'high': 2.0
|
| 31 |
+
},
|
| 32 |
+
'injury_multiplier': 1.8,
|
| 33 |
+
'severity_thresholds': {
|
| 34 |
+
'low': 5,
|
| 35 |
+
'medium': 15
|
| 36 |
+
}
|
| 37 |
+
}
|
| 38 |
+
|
| 39 |
+
def load_model_artifacts():
|
| 40 |
+
"""
|
| 41 |
+
Load trained model and encoders.
|
| 42 |
+
"""
|
| 43 |
+
model = joblib.load('model.pkl')
|
| 44 |
+
encoders = joblib.load('encoders.pkl')
|
| 45 |
+
|
| 46 |
+
with open('model_metadata.json', 'r') as f:
|
| 47 |
+
metadata = json.load(f)
|
| 48 |
+
|
| 49 |
+
return model, encoders, metadata
|
| 50 |
+
|
| 51 |
+
def generate_rule_signals(claim_type, damage_amount, injury_involved, risk_factor):
|
| 52 |
+
"""
|
| 53 |
+
Generate human-readable rule signals based on frozen decision boundaries.
|
| 54 |
+
|
| 55 |
+
This provides transparent explanation of which rules are triggered.
|
| 56 |
+
"""
|
| 57 |
+
signals = []
|
| 58 |
+
|
| 59 |
+
# Damage threshold signals
|
| 60 |
+
if damage_amount < DECISION_BOUNDARIES['damage_thresholds']['low']:
|
| 61 |
+
signals.append(f"✓ Low damage (<${DECISION_BOUNDARIES['damage_thresholds']['low']:,}): ${damage_amount:,.2f}")
|
| 62 |
+
elif damage_amount < DECISION_BOUNDARIES['damage_thresholds']['medium']:
|
| 63 |
+
signals.append(f"⚠ Medium damage (${DECISION_BOUNDARIES['damage_thresholds']['low']:,}-${DECISION_BOUNDARIES['damage_thresholds']['medium']:,}): ${damage_amount:,.2f}")
|
| 64 |
+
elif damage_amount < DECISION_BOUNDARIES['damage_thresholds']['high']:
|
| 65 |
+
signals.append(f"⚠⚠ High damage (${DECISION_BOUNDARIES['damage_thresholds']['medium']:,}-${DECISION_BOUNDARIES['damage_thresholds']['high']:,}): ${damage_amount:,.2f}")
|
| 66 |
+
else:
|
| 67 |
+
signals.append(f"⚠⚠⚠ Very high damage (≥${DECISION_BOUNDARIES['damage_thresholds']['high']:,}): ${damage_amount:,.2f}")
|
| 68 |
+
|
| 69 |
+
# Injury signal
|
| 70 |
+
if injury_involved:
|
| 71 |
+
signals.append(f"⚠ Injury involved (multiplier: {DECISION_BOUNDARIES['injury_multiplier']}x)")
|
| 72 |
+
else:
|
| 73 |
+
signals.append(f"✓ No injury involved")
|
| 74 |
+
|
| 75 |
+
# Risk factor signal
|
| 76 |
+
risk_weight = DECISION_BOUNDARIES['risk_weights'][risk_factor.lower()]
|
| 77 |
+
if risk_factor.lower() == 'high':
|
| 78 |
+
signals.append(f"⚠⚠ High risk factor (weight: {risk_weight}x)")
|
| 79 |
+
elif risk_factor.lower() == 'medium':
|
| 80 |
+
signals.append(f"⚠ Medium risk factor (weight: {risk_weight}x)")
|
| 81 |
+
else:
|
| 82 |
+
signals.append(f"✓ Low risk factor (weight: {risk_weight}x)")
|
| 83 |
+
|
| 84 |
+
# Claim type signal
|
| 85 |
+
if claim_type == "Liability":
|
| 86 |
+
signals.append(f"⚠ Liability claim (additional multiplier applied)")
|
| 87 |
+
else:
|
| 88 |
+
signals.append(f"Claim type: {claim_type}")
|
| 89 |
+
|
| 90 |
+
return signals
|
| 91 |
+
|
| 92 |
+
def calculate_uncertainty(prediction_proba):
|
| 93 |
+
"""
|
| 94 |
+
Calculate prediction uncertainty using entropy.
|
| 95 |
+
|
| 96 |
+
Returns:
|
| 97 |
+
dict with uncertainty level and metrics
|
| 98 |
+
"""
|
| 99 |
+
# Calculate entropy
|
| 100 |
+
epsilon = 1e-10
|
| 101 |
+
entropy = -np.sum(prediction_proba * np.log(prediction_proba + epsilon))
|
| 102 |
+
max_entropy = np.log(len(prediction_proba))
|
| 103 |
+
normalized_entropy = entropy / max_entropy
|
| 104 |
+
|
| 105 |
+
# Determine uncertainty level
|
| 106 |
+
if normalized_entropy < 0.3:
|
| 107 |
+
level = "Low"
|
| 108 |
+
interpretation = "Model is confident in this prediction"
|
| 109 |
+
elif normalized_entropy < 0.6:
|
| 110 |
+
level = "Medium"
|
| 111 |
+
interpretation = "Model has moderate uncertainty - extra human scrutiny recommended"
|
| 112 |
+
else:
|
| 113 |
+
level = "High"
|
| 114 |
+
interpretation = "Model is uncertain - REQUIRES careful human review"
|
| 115 |
+
|
| 116 |
+
return {
|
| 117 |
+
'level': level,
|
| 118 |
+
'entropy': float(entropy),
|
| 119 |
+
'normalized_entropy': float(normalized_entropy),
|
| 120 |
+
'interpretation': interpretation,
|
| 121 |
+
'confidence_distribution': {
|
| 122 |
+
'Low': float(prediction_proba[0]),
|
| 123 |
+
'Medium': float(prediction_proba[1]) if len(prediction_proba) > 1 else 0.0,
|
| 124 |
+
'High': float(prediction_proba[2]) if len(prediction_proba) > 2 else 0.0
|
| 125 |
+
}
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
def get_feature_importance_for_prediction(model, feature_values):
|
| 129 |
+
"""
|
| 130 |
+
Get feature importance specific to this prediction.
|
| 131 |
+
|
| 132 |
+
Uses the model's global feature importance as a proxy.
|
| 133 |
+
For tree-based models, this represents which features were most influential.
|
| 134 |
+
"""
|
| 135 |
+
feature_names = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 136 |
+
global_importance = model.feature_importances_
|
| 137 |
+
|
| 138 |
+
# Create importance dictionary
|
| 139 |
+
importance_dict = {}
|
| 140 |
+
for name, importance, value in zip(feature_names, global_importance, feature_values):
|
| 141 |
+
importance_dict[name] = {
|
| 142 |
+
'importance_score': float(importance),
|
| 143 |
+
'value': value,
|
| 144 |
+
'relative_percentage': float(importance / np.sum(global_importance) * 100)
|
| 145 |
+
}
|
| 146 |
+
|
| 147 |
+
# Sort by importance
|
| 148 |
+
sorted_features = sorted(importance_dict.items(), key=lambda x: x[1]['importance_score'], reverse=True)
|
| 149 |
+
|
| 150 |
+
return dict(sorted_features)
|
| 151 |
+
|
| 152 |
+
def predict_claim(claim_type, damage_amount, injury_involved, risk_factor):
|
| 153 |
+
"""
|
| 154 |
+
Make advisory prediction for insurance claim.
|
| 155 |
+
|
| 156 |
+
Args:
|
| 157 |
+
claim_type: str - "Auto", "Property", "Health", or "Liability"
|
| 158 |
+
damage_amount: float - Damage amount in USD
|
| 159 |
+
injury_involved: bool - Whether injury is involved
|
| 160 |
+
risk_factor: str - "low", "medium", or "high"
|
| 161 |
+
|
| 162 |
+
Returns:
|
| 163 |
+
dict with complete advisory prediction and explainability
|
| 164 |
+
"""
|
| 165 |
+
# Load model artifacts
|
| 166 |
+
model, encoders, metadata = load_model_artifacts()
|
| 167 |
+
|
| 168 |
+
# Validate inputs
|
| 169 |
+
valid_claim_types = ['Auto', 'Property', 'Health', 'Liability']
|
| 170 |
+
valid_risk_factors = ['low', 'medium', 'high']
|
| 171 |
+
|
| 172 |
+
if claim_type not in valid_claim_types:
|
| 173 |
+
raise ValueError(f"Invalid claim_type. Must be one of: {valid_claim_types}")
|
| 174 |
+
|
| 175 |
+
if risk_factor not in valid_risk_factors:
|
| 176 |
+
raise ValueError(f"Invalid risk_factor. Must be one of: {valid_risk_factors}")
|
| 177 |
+
|
| 178 |
+
if damage_amount < 0:
|
| 179 |
+
raise ValueError("damage_amount must be non-negative")
|
| 180 |
+
|
| 181 |
+
# Encode inputs
|
| 182 |
+
claim_type_encoded = encoders['claim_type'].transform([claim_type])[0]
|
| 183 |
+
risk_factor_encoded = encoders['risk_factor'].transform([risk_factor])[0]
|
| 184 |
+
injury_involved_encoded = int(injury_involved)
|
| 185 |
+
|
| 186 |
+
# Create feature vector
|
| 187 |
+
features = np.array([[
|
| 188 |
+
claim_type_encoded,
|
| 189 |
+
damage_amount,
|
| 190 |
+
injury_involved_encoded,
|
| 191 |
+
risk_factor_encoded
|
| 192 |
+
]])
|
| 193 |
+
|
| 194 |
+
# Make prediction
|
| 195 |
+
prediction = model.predict(features)[0]
|
| 196 |
+
prediction_proba = model.predict_proba(features)[0]
|
| 197 |
+
|
| 198 |
+
# Get severity label
|
| 199 |
+
severity = encoders['target'].inverse_transform([prediction])[0]
|
| 200 |
+
confidence = float(np.max(prediction_proba))
|
| 201 |
+
|
| 202 |
+
# Generate explainability artifacts
|
| 203 |
+
rule_signals = generate_rule_signals(claim_type, damage_amount, injury_involved, risk_factor)
|
| 204 |
+
uncertainty = calculate_uncertainty(prediction_proba)
|
| 205 |
+
feature_importance = get_feature_importance_for_prediction(
|
| 206 |
+
model,
|
| 207 |
+
[claim_type, damage_amount, injury_involved, risk_factor]
|
| 208 |
+
)
|
| 209 |
+
|
| 210 |
+
# Compile advisory output
|
| 211 |
+
advisory_output = {
|
| 212 |
+
# GOVERNANCE: All outputs clearly marked as ADVISORY
|
| 213 |
+
'governance_status': '⚠ ADVISORY ONLY - HUMAN CONFIRMATION REQUIRED',
|
| 214 |
+
'decision_authority': 'HUMAN (not machine)',
|
| 215 |
+
'binding': False,
|
| 216 |
+
'requires_human_review': True,
|
| 217 |
+
|
| 218 |
+
# Model suggestion (NON-BINDING)
|
| 219 |
+
'model_suggestion': f"{severity} Severity (Advisory)",
|
| 220 |
+
'severity_level': severity,
|
| 221 |
+
'confidence_score': confidence,
|
| 222 |
+
|
| 223 |
+
# Input summary
|
| 224 |
+
'input_summary': {
|
| 225 |
+
'claim_type': claim_type,
|
| 226 |
+
'damage_amount': f"${damage_amount:,.2f}",
|
| 227 |
+
'injury_involved': 'Yes' if injury_involved else 'No',
|
| 228 |
+
'risk_factor': risk_factor
|
| 229 |
+
},
|
| 230 |
+
|
| 231 |
+
# Explainability
|
| 232 |
+
'rule_signals': rule_signals,
|
| 233 |
+
'feature_importance': feature_importance,
|
| 234 |
+
'uncertainty_assessment': uncertainty,
|
| 235 |
+
|
| 236 |
+
# Prediction metadata
|
| 237 |
+
'prediction_metadata': {
|
| 238 |
+
'model_type': metadata['model_type'],
|
| 239 |
+
'model_architecture': metadata['model_architecture'],
|
| 240 |
+
'prediction_timestamp': datetime.now().isoformat(),
|
| 241 |
+
'dataset_source': metadata['dataset']
|
| 242 |
+
},
|
| 243 |
+
|
| 244 |
+
# Governance reminders
|
| 245 |
+
'governance_reminders': [
|
| 246 |
+
'⚠ This is an ADVISORY suggestion only',
|
| 247 |
+
'⚠ Human decision-maker has FULL AUTHORITY to accept or override',
|
| 248 |
+
'⚠ Human must independently evaluate the claim',
|
| 249 |
+
'⚠ Human must document rationale for final decision',
|
| 250 |
+
'⚠ All decisions must be logged in audit trail'
|
| 251 |
+
],
|
| 252 |
+
|
| 253 |
+
# Decision boundaries reference
|
| 254 |
+
'decision_boundaries_reference': DECISION_BOUNDARIES
|
| 255 |
+
}
|
| 256 |
+
|
| 257 |
+
return advisory_output
|
| 258 |
+
|
| 259 |
+
def format_advisory_output(output):
|
| 260 |
+
"""
|
| 261 |
+
Format advisory output for human-readable display.
|
| 262 |
+
"""
|
| 263 |
+
print("\n" + "="*70)
|
| 264 |
+
print("INSURANCE CLAIM ADVISORY PREDICTION")
|
| 265 |
+
print("="*70)
|
| 266 |
+
print(f"\n{output['governance_status']}")
|
| 267 |
+
print(f"Decision Authority: {output['decision_authority']}")
|
| 268 |
+
print(f"Binding: {output['binding']}")
|
| 269 |
+
|
| 270 |
+
print(f"\n{'='*70}")
|
| 271 |
+
print("INPUT SUMMARY")
|
| 272 |
+
print(f"{'='*70}")
|
| 273 |
+
for key, value in output['input_summary'].items():
|
| 274 |
+
print(f" {key.replace('_', ' ').title()}: {value}")
|
| 275 |
+
|
| 276 |
+
print(f"\n{'='*70}")
|
| 277 |
+
print("MODEL ADVISORY SUGGESTION (Non-Binding)")
|
| 278 |
+
print(f"{'='*70}")
|
| 279 |
+
print(f" Suggested Severity: {output['model_suggestion']}")
|
| 280 |
+
print(f" Model Confidence: {output['confidence_score']:.4f} ({output['confidence_score']*100:.2f}%)")
|
| 281 |
+
|
| 282 |
+
print(f"\n{'='*70}")
|
| 283 |
+
print("RULE SIGNALS (Transparent Decision Factors)")
|
| 284 |
+
print(f"{'='*70}")
|
| 285 |
+
for signal in output['rule_signals']:
|
| 286 |
+
print(f" {signal}")
|
| 287 |
+
|
| 288 |
+
print(f"\n{'='*70}")
|
| 289 |
+
print("FEATURE IMPORTANCE (What Influenced This Suggestion)")
|
| 290 |
+
print(f"{'='*70}")
|
| 291 |
+
for feature, details in output['feature_importance'].items():
|
| 292 |
+
print(f" {feature}: {details['relative_percentage']:.1f}% importance")
|
| 293 |
+
|
| 294 |
+
print(f"\n{'='*70}")
|
| 295 |
+
print("UNCERTAINTY ASSESSMENT")
|
| 296 |
+
print(f"{'='*70}")
|
| 297 |
+
uncertainty = output['uncertainty_assessment']
|
| 298 |
+
print(f" Uncertainty Level: {uncertainty['level']}")
|
| 299 |
+
print(f" Normalized Entropy: {uncertainty['normalized_entropy']:.4f}")
|
| 300 |
+
print(f" Interpretation: {uncertainty['interpretation']}")
|
| 301 |
+
|
| 302 |
+
print(f"\n Confidence Distribution:")
|
| 303 |
+
for severity, prob in uncertainty['confidence_distribution'].items():
|
| 304 |
+
print(f" {severity}: {prob:.4f} ({prob*100:.2f}%)")
|
| 305 |
+
|
| 306 |
+
print(f"\n{'='*70}")
|
| 307 |
+
print("GOVERNANCE REMINDERS")
|
| 308 |
+
print(f"{'='*70}")
|
| 309 |
+
for reminder in output['governance_reminders']:
|
| 310 |
+
print(f" {reminder}")
|
| 311 |
+
|
| 312 |
+
print(f"\n{'='*70}\n")
|
| 313 |
+
|
| 314 |
+
def main():
|
| 315 |
+
"""
|
| 316 |
+
Example usage with sample claims.
|
| 317 |
+
"""
|
| 318 |
+
print("\n" + "="*70)
|
| 319 |
+
print("ADVISORY PREDICTION SYSTEM - DEMONSTRATION")
|
| 320 |
+
print("="*70)
|
| 321 |
+
print("Model Type: Classical ML (XGBoost)")
|
| 322 |
+
print("Governance: Human-in-the-Loop Required")
|
| 323 |
+
print("="*70 + "\n")
|
| 324 |
+
|
| 325 |
+
# Example 1: Low severity claim
|
| 326 |
+
print("\n" + "="*70)
|
| 327 |
+
print("EXAMPLE 1: Low Damage Auto Claim")
|
| 328 |
+
print("="*70)
|
| 329 |
+
output1 = predict_claim(
|
| 330 |
+
claim_type="Auto",
|
| 331 |
+
damage_amount=2500.0,
|
| 332 |
+
injury_involved=False,
|
| 333 |
+
risk_factor="low"
|
| 334 |
+
)
|
| 335 |
+
format_advisory_output(output1)
|
| 336 |
+
|
| 337 |
+
# Example 2: High severity claim
|
| 338 |
+
print("\n" + "="*70)
|
| 339 |
+
print("EXAMPLE 2: High Damage Liability Claim with Injury")
|
| 340 |
+
print("="*70)
|
| 341 |
+
output2 = predict_claim(
|
| 342 |
+
claim_type="Liability",
|
| 343 |
+
damage_amount=75000.0,
|
| 344 |
+
injury_involved=True,
|
| 345 |
+
risk_factor="high"
|
| 346 |
+
)
|
| 347 |
+
format_advisory_output(output2)
|
| 348 |
+
|
| 349 |
+
# Example 3: Medium severity claim
|
| 350 |
+
print("\n" + "="*70)
|
| 351 |
+
print("EXAMPLE 3: Medium Damage Property Claim")
|
| 352 |
+
print("="*70)
|
| 353 |
+
output3 = predict_claim(
|
| 354 |
+
claim_type="Property",
|
| 355 |
+
damage_amount=12000.0,
|
| 356 |
+
injury_involved=False,
|
| 357 |
+
risk_factor="medium"
|
| 358 |
+
)
|
| 359 |
+
format_advisory_output(output3)
|
| 360 |
+
|
| 361 |
+
print("\n" + "="*70)
|
| 362 |
+
print("DEMONSTRATION COMPLETE")
|
| 363 |
+
print("="*70)
|
| 364 |
+
print("\nTo use this module in your code:")
|
| 365 |
+
print(" from predict import predict_claim")
|
| 366 |
+
print(" result = predict_claim('Auto', 5000.0, False, 'low')")
|
| 367 |
+
print("="*70 + "\n")
|
| 368 |
+
|
| 369 |
+
if __name__ == "__main__":
|
| 370 |
+
main()
|
requirements.txt
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# UI Framework
|
| 2 |
+
gradio>=4.0.0
|
| 3 |
+
|
| 4 |
+
# Data handling
|
| 5 |
+
datasets>=2.14.0
|
| 6 |
+
pandas>=2.0.0
|
| 7 |
+
numpy>=1.24.0
|
| 8 |
+
|
| 9 |
+
# Classical ML (NO deep learning, NO LLMs)
|
| 10 |
+
scikit-learn>=1.3.0
|
| 11 |
+
xgboost>=2.0.0
|
| 12 |
+
joblib>=1.3.0
|
| 13 |
+
|
| 14 |
+
# Explainability (REQUIRED for governance)
|
| 15 |
+
shap>=0.42.0
|
| 16 |
+
|
| 17 |
+
# Configuration
|
| 18 |
+
pyyaml>=6.0
|
train.py
ADDED
|
@@ -0,0 +1,301 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Train Classical ML Model for Insurance Claims Decision Support
|
| 3 |
+
==============================================================
|
| 4 |
+
|
| 5 |
+
GOVERNANCE CONSTRAINTS:
|
| 6 |
+
- Classical ML ONLY (XGBoost used here - NO neural networks, NO LLMs)
|
| 7 |
+
- Advisory system only (NO autonomous decisions)
|
| 8 |
+
- Must align with decision_spec.yaml frozen boundaries
|
| 9 |
+
- Human-in-the-loop is MANDATORY
|
| 10 |
+
- All outputs are NON-BINDING suggestions
|
| 11 |
+
|
| 12 |
+
Dataset: BDR-AI/insurance_decision_boundaries_v1 (Hugging Face)
|
| 13 |
+
Model: XGBoost Classifier
|
| 14 |
+
Purpose: Demonstration of AI governance principles
|
| 15 |
+
"""
|
| 16 |
+
|
| 17 |
+
import pandas as pd
|
| 18 |
+
import numpy as np
|
| 19 |
+
from datasets import load_dataset
|
| 20 |
+
from sklearn.model_selection import train_test_split
|
| 21 |
+
from sklearn.preprocessing import LabelEncoder
|
| 22 |
+
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
|
| 23 |
+
import xgboost as xgb
|
| 24 |
+
import joblib
|
| 25 |
+
import json
|
| 26 |
+
from datetime import datetime
|
| 27 |
+
|
| 28 |
+
# FROZEN DECISION BOUNDARIES - DO NOT MODIFY
|
| 29 |
+
DECISION_BOUNDARIES = {
|
| 30 |
+
'damage_thresholds': {
|
| 31 |
+
'low': 5000,
|
| 32 |
+
'medium': 15000,
|
| 33 |
+
'high': 50000
|
| 34 |
+
},
|
| 35 |
+
'risk_weights': {
|
| 36 |
+
'low': 1.0,
|
| 37 |
+
'medium': 1.5,
|
| 38 |
+
'high': 2.0
|
| 39 |
+
},
|
| 40 |
+
'injury_multiplier': 1.8,
|
| 41 |
+
'severity_thresholds': {
|
| 42 |
+
'low': 5,
|
| 43 |
+
'medium': 15
|
| 44 |
+
}
|
| 45 |
+
}
|
| 46 |
+
|
| 47 |
+
def load_and_prepare_data():
|
| 48 |
+
"""
|
| 49 |
+
Load dataset from Hugging Face and prepare for training.
|
| 50 |
+
|
| 51 |
+
Returns:
|
| 52 |
+
X_train, X_test, y_train, y_test, encoders
|
| 53 |
+
"""
|
| 54 |
+
print("=" * 70)
|
| 55 |
+
print("LOADING DATASET: BDR-AI/insurance_decision_boundaries_v1")
|
| 56 |
+
print("=" * 70)
|
| 57 |
+
|
| 58 |
+
# Load dataset from Hugging Face
|
| 59 |
+
dataset = load_dataset("BDR-AI/insurance_decision_boundaries_v1")
|
| 60 |
+
df = pd.DataFrame(dataset['train'])
|
| 61 |
+
|
| 62 |
+
print(f"\nDataset loaded: {len(df)} samples")
|
| 63 |
+
print(f"Columns: {df.columns.tolist()}")
|
| 64 |
+
print(f"\nFirst few rows:")
|
| 65 |
+
print(df.head())
|
| 66 |
+
|
| 67 |
+
# GOVERNANCE CHECK: Verify only allowed features present
|
| 68 |
+
allowed_features = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 69 |
+
feature_cols = [col for col in df.columns if col != 'severity']
|
| 70 |
+
|
| 71 |
+
print(f"\n{'='*70}")
|
| 72 |
+
print("GOVERNANCE CHECK: Verifying feature compliance")
|
| 73 |
+
print(f"{'='*70}")
|
| 74 |
+
print(f"Allowed features: {allowed_features}")
|
| 75 |
+
print(f"Found features: {feature_cols}")
|
| 76 |
+
|
| 77 |
+
for col in feature_cols:
|
| 78 |
+
if col not in allowed_features:
|
| 79 |
+
raise ValueError(f"GOVERNANCE VIOLATION: Unauthorized feature '{col}' found in dataset!")
|
| 80 |
+
|
| 81 |
+
print("✓ Feature compliance verified - proceeding with training")
|
| 82 |
+
|
| 83 |
+
# Prepare features (4 inputs only - FROZEN)
|
| 84 |
+
X = df[allowed_features].copy()
|
| 85 |
+
y = df['severity']
|
| 86 |
+
|
| 87 |
+
print(f"\n{'='*70}")
|
| 88 |
+
print("TARGET DISTRIBUTION (Advisory Severity Levels)")
|
| 89 |
+
print(f"{'='*70}")
|
| 90 |
+
print(y.value_counts())
|
| 91 |
+
|
| 92 |
+
# Encode categorical features
|
| 93 |
+
encoders = {}
|
| 94 |
+
|
| 95 |
+
# Encode claim_type
|
| 96 |
+
le_claim = LabelEncoder()
|
| 97 |
+
X['claim_type_encoded'] = le_claim.fit_transform(X['claim_type'])
|
| 98 |
+
encoders['claim_type'] = le_claim
|
| 99 |
+
|
| 100 |
+
# Encode risk_factor
|
| 101 |
+
le_risk = LabelEncoder()
|
| 102 |
+
X['risk_factor_encoded'] = le_risk.fit_transform(X['risk_factor'])
|
| 103 |
+
encoders['risk_factor'] = le_risk
|
| 104 |
+
|
| 105 |
+
# Convert injury_involved to int
|
| 106 |
+
X['injury_involved_encoded'] = X['injury_involved'].astype(int)
|
| 107 |
+
|
| 108 |
+
# Create feature matrix with encoded values
|
| 109 |
+
X_processed = X[['claim_type_encoded', 'damage_amount', 'injury_involved_encoded', 'risk_factor_encoded']].copy()
|
| 110 |
+
X_processed.columns = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 111 |
+
|
| 112 |
+
# Encode target
|
| 113 |
+
le_target = LabelEncoder()
|
| 114 |
+
y_encoded = le_target.fit_transform(y)
|
| 115 |
+
encoders['target'] = le_target
|
| 116 |
+
|
| 117 |
+
print(f"\n{'='*70}")
|
| 118 |
+
print("ENCODING SUMMARY")
|
| 119 |
+
print(f"{'='*70}")
|
| 120 |
+
print(f"claim_type mapping: {dict(zip(le_claim.classes_, le_claim.transform(le_claim.classes_)))}")
|
| 121 |
+
print(f"risk_factor mapping: {dict(zip(le_risk.classes_, le_risk.transform(le_risk.classes_)))}")
|
| 122 |
+
print(f"target mapping: {dict(zip(le_target.classes_, le_target.transform(le_target.classes_)))}")
|
| 123 |
+
|
| 124 |
+
# Train-test split (80/20)
|
| 125 |
+
X_train, X_test, y_train, y_test = train_test_split(
|
| 126 |
+
X_processed, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
|
| 127 |
+
)
|
| 128 |
+
|
| 129 |
+
print(f"\n{'='*70}")
|
| 130 |
+
print("TRAIN/TEST SPLIT")
|
| 131 |
+
print(f"{'='*70}")
|
| 132 |
+
print(f"Training samples: {len(X_train)}")
|
| 133 |
+
print(f"Test samples: {len(X_test)}")
|
| 134 |
+
|
| 135 |
+
return X_train, X_test, y_train, y_test, encoders
|
| 136 |
+
|
| 137 |
+
def train_model(X_train, y_train):
|
| 138 |
+
"""
|
| 139 |
+
Train XGBoost classifier (classical ML).
|
| 140 |
+
|
| 141 |
+
GOVERNANCE: XGBoost is a classical ML algorithm (tree-based).
|
| 142 |
+
NO neural networks, NO LLMs, NO reinforcement learning.
|
| 143 |
+
"""
|
| 144 |
+
print(f"\n{'='*70}")
|
| 145 |
+
print("TRAINING XGBOOST CLASSIFIER (Classical ML)")
|
| 146 |
+
print(f"{'='*70}")
|
| 147 |
+
print("Model type: XGBoost (tree-based gradient boosting)")
|
| 148 |
+
print("Governance status: ✓ Classical ML approved")
|
| 149 |
+
print("Autonomous decisions: ✗ DISABLED (advisory only)")
|
| 150 |
+
|
| 151 |
+
# Train XGBoost model
|
| 152 |
+
model = xgb.XGBClassifier(
|
| 153 |
+
objective='multi:softprob',
|
| 154 |
+
num_class=3,
|
| 155 |
+
max_depth=6,
|
| 156 |
+
learning_rate=0.1,
|
| 157 |
+
n_estimators=100,
|
| 158 |
+
random_state=42,
|
| 159 |
+
eval_metric='mlogloss'
|
| 160 |
+
)
|
| 161 |
+
|
| 162 |
+
model.fit(X_train, y_train)
|
| 163 |
+
|
| 164 |
+
print("\n✓ Model training complete")
|
| 165 |
+
|
| 166 |
+
return model
|
| 167 |
+
|
| 168 |
+
def evaluate_model(model, X_test, y_test, encoders):
|
| 169 |
+
"""
|
| 170 |
+
Evaluate model performance on test set.
|
| 171 |
+
"""
|
| 172 |
+
print(f"\n{'='*70}")
|
| 173 |
+
print("MODEL EVALUATION")
|
| 174 |
+
print(f"{'='*70}")
|
| 175 |
+
|
| 176 |
+
# Make predictions
|
| 177 |
+
y_pred = model.predict(X_test)
|
| 178 |
+
y_pred_proba = model.predict_proba(X_test)
|
| 179 |
+
|
| 180 |
+
# Calculate metrics
|
| 181 |
+
accuracy = accuracy_score(y_test, y_pred)
|
| 182 |
+
|
| 183 |
+
print(f"\nTest Set Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
|
| 184 |
+
|
| 185 |
+
# Classification report
|
| 186 |
+
target_names = encoders['target'].classes_
|
| 187 |
+
print(f"\n{'='*70}")
|
| 188 |
+
print("CLASSIFICATION REPORT (Advisory Predictions)")
|
| 189 |
+
print(f"{'='*70}")
|
| 190 |
+
print(classification_report(y_test, y_pred, target_names=target_names))
|
| 191 |
+
|
| 192 |
+
# Confusion matrix
|
| 193 |
+
cm = confusion_matrix(y_test, y_pred)
|
| 194 |
+
print(f"{'='*70}")
|
| 195 |
+
print("CONFUSION MATRIX")
|
| 196 |
+
print(f"{'='*70}")
|
| 197 |
+
print(f" Predicted")
|
| 198 |
+
print(f" Low Medium High")
|
| 199 |
+
for i, label in enumerate(target_names):
|
| 200 |
+
print(f"Actual {label:8s} {cm[i]}")
|
| 201 |
+
|
| 202 |
+
# Feature importance
|
| 203 |
+
feature_importance = model.feature_importances_
|
| 204 |
+
feature_names = ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor']
|
| 205 |
+
|
| 206 |
+
print(f"\n{'='*70}")
|
| 207 |
+
print("FEATURE IMPORTANCE (Explainability)")
|
| 208 |
+
print(f"{'='*70}")
|
| 209 |
+
for name, importance in sorted(zip(feature_names, feature_importance), key=lambda x: x[1], reverse=True):
|
| 210 |
+
print(f"{name:20s}: {importance:.4f}")
|
| 211 |
+
|
| 212 |
+
return {
|
| 213 |
+
'accuracy': accuracy,
|
| 214 |
+
'classification_report': classification_report(y_test, y_pred, target_names=target_names, output_dict=True),
|
| 215 |
+
'confusion_matrix': cm.tolist(),
|
| 216 |
+
'feature_importance': dict(zip(feature_names, feature_importance.tolist()))
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
+
def save_artifacts(model, encoders, metrics):
|
| 220 |
+
"""
|
| 221 |
+
Save trained model, encoders, and metrics.
|
| 222 |
+
"""
|
| 223 |
+
print(f"\n{'='*70}")
|
| 224 |
+
print("SAVING MODEL ARTIFACTS")
|
| 225 |
+
print(f"{'='*70}")
|
| 226 |
+
|
| 227 |
+
# Save model
|
| 228 |
+
joblib.dump(model, 'model.pkl')
|
| 229 |
+
print("✓ Model saved to: model.pkl")
|
| 230 |
+
|
| 231 |
+
# Save encoders
|
| 232 |
+
joblib.dump(encoders, 'encoders.pkl')
|
| 233 |
+
print("✓ Encoders saved to: encoders.pkl")
|
| 234 |
+
|
| 235 |
+
# Save metrics and metadata
|
| 236 |
+
metadata = {
|
| 237 |
+
'model_type': 'XGBoost Classifier',
|
| 238 |
+
'model_architecture': 'Classical ML (tree-based gradient boosting)',
|
| 239 |
+
'governance_status': 'ADVISORY ONLY - NO AUTONOMOUS DECISIONS',
|
| 240 |
+
'human_review_required': True,
|
| 241 |
+
'training_date': datetime.now().isoformat(),
|
| 242 |
+
'dataset': 'BDR-AI/insurance_decision_boundaries_v1',
|
| 243 |
+
'dataset_type': 'synthetic',
|
| 244 |
+
'features': ['claim_type', 'damage_amount', 'injury_involved', 'risk_factor'],
|
| 245 |
+
'target': 'severity (advisory levels: Low/Medium/High)',
|
| 246 |
+
'decision_boundaries': DECISION_BOUNDARIES,
|
| 247 |
+
'metrics': metrics
|
| 248 |
+
}
|
| 249 |
+
|
| 250 |
+
with open('model_metadata.json', 'w') as f:
|
| 251 |
+
json.dump(metadata, f, indent=2)
|
| 252 |
+
print("✓ Metadata saved to: model_metadata.json")
|
| 253 |
+
|
| 254 |
+
print(f"\n{'='*70}")
|
| 255 |
+
print("GOVERNANCE REMINDER")
|
| 256 |
+
print(f"{'='*70}")
|
| 257 |
+
print("⚠ This model produces ADVISORY outputs only")
|
| 258 |
+
print("⚠ Human confirmation is MANDATORY for all decisions")
|
| 259 |
+
print("⚠ All outputs are NON-BINDING suggestions")
|
| 260 |
+
print("⚠ Audit trail must be maintained for all uses")
|
| 261 |
+
|
| 262 |
+
def main():
|
| 263 |
+
"""
|
| 264 |
+
Main training pipeline.
|
| 265 |
+
"""
|
| 266 |
+
print("\n" + "="*70)
|
| 267 |
+
print("INSURANCE DECISION SUPPORT MODEL - TRAINING PIPELINE")
|
| 268 |
+
print("="*70)
|
| 269 |
+
print("Governance Mode: ADVISORY (Human-in-the-Loop Required)")
|
| 270 |
+
print("Model Type: Classical ML (XGBoost)")
|
| 271 |
+
print("Autonomous Decisions: DISABLED")
|
| 272 |
+
print("="*70 + "\n")
|
| 273 |
+
|
| 274 |
+
# Load and prepare data
|
| 275 |
+
X_train, X_test, y_train, y_test, encoders = load_and_prepare_data()
|
| 276 |
+
|
| 277 |
+
# Train model
|
| 278 |
+
model = train_model(X_train, y_train)
|
| 279 |
+
|
| 280 |
+
# Evaluate model
|
| 281 |
+
metrics = evaluate_model(model, X_test, y_test, encoders)
|
| 282 |
+
|
| 283 |
+
# Save artifacts
|
| 284 |
+
save_artifacts(model, encoders, metrics)
|
| 285 |
+
|
| 286 |
+
print(f"\n{'='*70}")
|
| 287 |
+
print("TRAINING COMPLETE")
|
| 288 |
+
print(f"{'='*70}")
|
| 289 |
+
print(f"✓ Model accuracy: {metrics['accuracy']*100:.2f}%")
|
| 290 |
+
print(f"✓ Model saved: model.pkl")
|
| 291 |
+
print(f"✓ Encoders saved: encoders.pkl")
|
| 292 |
+
print(f"✓ Metadata saved: model_metadata.json")
|
| 293 |
+
print(f"\n{'='*70}")
|
| 294 |
+
print("NEXT STEPS:")
|
| 295 |
+
print(" 1. Run evaluate.py for detailed evaluation")
|
| 296 |
+
print(" 2. Run predict.py for advisory predictions")
|
| 297 |
+
print(" 3. Review model_card.md for limitations")
|
| 298 |
+
print(f"{'='*70}\n")
|
| 299 |
+
|
| 300 |
+
if __name__ == "__main__":
|
| 301 |
+
main()
|