# Hybrid ML + Rule-Based Risk Classifier

## Overview
Production-ready risk detection model for financial conversations. It classifies text into one of five financial risk categories and reaches 82.5% overall accuracy on a 160-case test set.

## Performance Metrics
- **Overall Accuracy:** 82.5% (tested on 160 diverse cases)
- **Baseline Improvement:** +22.5 percentage points (60% → 82.5%)
- **vs. Pure ML:** +5% improvement

### Per-Category Accuracy
| Risk Type | Accuracy | Performance |
|-----------|----------|-------------|
| Credit Risk | 84.4% | Strong |
| Market Risk | 90.6% | Excellent (+21% boost) |
| Liquidity Risk | 71.9% | Good |
| Opportunity Risk | 71.9% | Good (+15% improvement) |
| Regulatory Risk | 93.8% | Excellent (+19% improvement) |
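
As a consistency check, the headline numbers can be reproduced with simple arithmetic. This sketch assumes (the README does not state it outright) that the 160 test cases are split evenly, 32 per category. Under that assumption each per-category accuracy corresponds to a whole number of correct predictions, and those counts add up to the 82.5% overall figure.

```python
# Reported per-category accuracies in percent, copied from the table above.
reported = {
    "Credit Risk": 84.4,
    "Market Risk": 90.6,
    "Liquidity Risk": 71.9,
    "Opportunity Risk": 71.9,
    "Regulatory Risk": 93.8,
}

# Assumption: the 160 test cases are split evenly, 32 per category.
CASES_PER_CATEGORY = 32

# Recover the implied number of correct predictions in each category.
correct = {
    name: round(acc / 100 * CASES_PER_CATEGORY) for name, acc in reported.items()
}

total_cases = CASES_PER_CATEGORY * len(reported)
overall = sum(correct.values()) / total_cases

# The baseline gain is a difference of two percentages: percentage points.
baseline = 0.60
gain_points = (overall - baseline) * 100

print(correct)
print(f"Overall: {overall:.1%} over {total_cases} cases")
print(f"Gain over baseline: {gain_points:.1f} percentage points")
```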

## Architecture
- **ML Engine:** Random Forest (200 trees) + Gradient Boosting ensemble
- **Feature Extraction:** TF-IDF vectorizer (1,059 features, n-grams up to trigrams)
- **Detection Method:** hybrid; 94% of predictions come from the rules + ML blend and 6% from ML alone
- **Rules:** category-specific financial keyword patterns
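
The README does not publish the blending logic, so the following is only a minimal sketch of how such a hybrid could work. Assumptions: the Random Forest and Gradient Boosting probabilities are averaged with equal weight; when any keyword rule fires, that average is blended with normalized rule scores using a hypothetical `RULE_WEIGHT` of 0.4; when no rule fires, the ML output is used alone. The function name `hybrid_predict`, the weight, and the `method` labels are illustrative, not the project's API.

```python
from typing import Dict, Tuple

# Hypothetical weight given to keyword-rule evidence when at least one rule fires.
RULE_WEIGHT = 0.4


def hybrid_predict(
    rf_proba: Dict[str, float],
    gb_proba: Dict[str, float],
    rule_hits: Dict[str, int],
) -> Tuple[str, float, str]:
    """Sketch: return (category, score, method) for one text.

    rf_proba / gb_proba: class probabilities from the two ML models.
    rule_hits: number of keyword matches per category (missing means 0).
    """
    categories = rf_proba.keys()
    # Assumed equal-weight average of the two ensemble members.
    ml = {c: (rf_proba[c] + gb_proba[c]) / 2 for c in categories}

    total_hits = sum(rule_hits.get(c, 0) for c in categories)
    if total_hits == 0:
        # No rule fired: fall back to the ML ensemble alone.
        best = max(ml, key=ml.get)
        return best, ml[best], "ml"

    # Turn rule hits into a distribution and blend it with the ML scores.
    rules = {c: rule_hits.get(c, 0) / total_hits for c in categories}
    blended = {
        c: (1 - RULE_WEIGHT) * ml[c] + RULE_WEIGHT * rules[c] for c in categories
    }
    best = max(blended, key=blended.get)
    return best, blended[best], "rules+ml"


# Illustrative probabilities; ML alone slightly favours "market".
rf_example = {"credit": 0.30, "market": 0.45, "liquidity": 0.25}
gb_example = {"credit": 0.40, "market": 0.35, "liquidity": 0.25}

print(hybrid_predict(rf_example, gb_example, {}))             # ML only
print(hybrid_predict(rf_example, gb_example, {"credit": 2}))  # rules shift the result
```

Returning the `method` string alongside the label is one way to provide the "which detection method was used" explainability the README describes.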

## Model Files
- `classifier.pkl` - Random Forest classifier (1.36 MB)
- `classifier_gb.pkl` - Gradient Boosting classifier (0.66 MB)
- `vectorizer.pkl` - TF-IDF vectorizer (0.07 MB)
- `metadata.json` - Metrics and configuration

## Risk Categories

### 1. Credit Risk (84.4%)
- Inability to afford monthly payments
- Loan defaults and payment delinquencies
- Poor credit history and low creditworthiness
- High debt-to-income ratios
- Keywords: "afford", "default", "delinquent", "debt"

### 2. Market Risk (90.6%)
- Stock market crashes and volatility
- Economic downturns affecting portfolios
- Currency fluctuations and losses
- Keywords: "crash", "volatility", "bear market", "downturn"

### 3. Liquidity Risk (71.9%)
- Funds locked in long-term investments
- Cash flow constraints
- Difficulty accessing emergency funds
- Keywords: "locked", "illiquid", "cash", "shortage"

### 4. Opportunity Risk (71.9%)
- Missed investment opportunities
- Poor timing decisions
- Regret about investment choices
- Keywords: "missed", "opportunity", "regret", "timing"

### 5. Regulatory Risk (93.8%)
- Tax compliance requirements
- AML/KYC regulations
- Regulatory approval delays
- Keywords: "tax", "compliance", "regulatory", "legal"
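
A minimal sketch of category-specific keyword rules built only from the keywords listed above. The production rule set is described just as "category-specific financial keyword patterns" and is likely larger; the case-insensitive, whole-word matching and the helper name `keyword_hits` are assumptions for illustration.

```python
import re
from typing import Dict, List

# Keywords copied from the category descriptions above (illustrative subset).
RISK_KEYWORDS: Dict[str, List[str]] = {
    "credit": ["afford", "default", "delinquent", "debt"],
    "market": ["crash", "volatility", "bear market", "downturn"],
    "liquidity": ["locked", "illiquid", "cash", "shortage"],
    "opportunity": ["missed", "opportunity", "regret", "timing"],
    "regulatory": ["tax", "compliance", "regulatory", "legal"],
}

# Precompile one case-insensitive, whole-word pattern per keyword.
PATTERNS = {
    category: [re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in kws]
    for category, kws in RISK_KEYWORDS.items()
}


def keyword_hits(text: str) -> Dict[str, int]:
    """Count how many distinct keywords of each category occur in the text."""
    return {
        category: sum(1 for pattern in patterns if pattern.search(text))
        for category, patterns in PATTERNS.items()
    }


print(keyword_hits("Customer can't afford monthly EMI payments"))
```

Whole-word matching keeps "tax" from firing inside unrelated words, at the cost of missing inflections such as "affordable"; a real rule set would need stems or explicit variants for those.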

## Usage

```python
from huggingface_hub import hf_hub_download
import pickle

# Download the Random Forest classifier and the TF-IDF vectorizer.
# A token is only required if the repository is private or gated.
classifier_path = hf_hub_download(
    repo_id="rohin30n/Armour",
    filename="risk_classifier/classifier.pkl",
    token="YOUR_TOKEN",
)

vectorizer_path = hf_hub_download(
    repo_id="rohin30n/Armour",
    filename="risk_classifier/vectorizer.pkl",
    token="YOUR_TOKEN",
)

# Load models. Unpickling can execute arbitrary code, so only load
# files from a source you trust.
with open(classifier_path, "rb") as f:
    classifier = pickle.load(f)

with open(vectorizer_path, "rb") as f:
    vectorizer = pickle.load(f)

# Predict the risk category. This runs the Random Forest alone; the
# keyword rules and the Gradient Boosting model are not applied here.
text = "Customer can't afford monthly EMI payments"
X = vectorizer.transform([text])
risk_pred = classifier.predict(X)[0]
risk_proba = classifier.predict_proba(X)[0]

# Confidence is the highest class probability from the Random Forest.
print(f"Risk Category: {risk_pred}")
print(f"Confidence: {max(risk_proba):.2%}")
```

## Technical Details

### Training Data
- 152 financial conversation samples
- 32 samples per risk category
- Diverse scenarios and language variations
- Stratified train-test split (80/20)

### Hyperparameters
- Random Forest: 200 trees, `max_depth=20`, `class_weight='balanced'`
- Gradient Boosting: 100 estimators, `max_depth=5`, `learning_rate=0.1`
- TF-IDF: 1,059 features, 1- to 3-grams (up to trigrams), sublinear TF scaling
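
The hyperparameters above map naturally onto scikit-learn estimators. The README does not name the library, so using scikit-learn's `TfidfVectorizer`, `RandomForestClassifier`, and `GradientBoostingClassifier` is an assumption, as are the fixed `random_state` and reading "1,059 features" as a `max_features` cap (it may simply be the resulting vocabulary size). The helper `build_components` is hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def build_components(seed: int = 42):
    """Sketch: build the vectorizer and both classifiers with the documented settings."""
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 3),  # unigrams through trigrams
        sublinear_tf=True,   # replace raw term frequency tf with 1 + log(tf)
        max_features=1059,   # assumption: cap matching the reported feature count
    )
    random_forest = RandomForestClassifier(
        n_estimators=200,
        max_depth=20,
        class_weight="balanced",  # reweight classes inversely to their frequency
        random_state=seed,
    )
    gradient_boosting = GradientBoostingClassifier(
        n_estimators=100,
        max_depth=5,
        learning_rate=0.1,
        random_state=seed,
    )
    return vectorizer, random_forest, gradient_boosting


vectorizer, rf, gb = build_components()
print(rf.get_params()["n_estimators"], gb.get_params()["learning_rate"])
```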

### Evaluation
- 5-fold cross-validation
- Stratified splits for class balance
- Per-category metrics (precision, recall, F1)
- Tested on 160 diverse financial scenarios
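
A sketch of the evaluation protocol with scikit-learn, assuming the TF-IDF vectorizer and Random Forest are chained in a pipeline so the vectorizer is refit inside every fold (this keeps test-fold vocabulary out of training). The tiny generated dataset is purely illustrative and is not the project's data. Per-category precision, recall, and F1 come from `classification_report` on the out-of-fold predictions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Illustrative toy data only: five texts per category, each mentioning two of
# that category's words so held-out texts share vocabulary with training texts.
TOY_WORDS = {
    "credit": ["afford", "default", "delinquent", "debt", "loan"],
    "market": ["crash", "volatility", "bear", "downturn", "stocks"],
    "liquidity": ["locked", "illiquid", "cash", "shortage", "withdraw"],
    "opportunity": ["missed", "opportunity", "regret", "timing", "late"],
    "regulatory": ["tax", "compliance", "regulatory", "legal", "audit"],
}
texts, labels = [], []
for category, words in TOY_WORDS.items():
    for i, word in enumerate(words):
        texts.append(f"client mentions {word} and {words[(i + 1) % len(words)]}")
        labels.append(category)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True),
    RandomForestClassifier(
        n_estimators=200, max_depth=20, class_weight="balanced", random_state=42
    ),
)

# Stratified 5-fold CV: every fold keeps the same class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
predicted = cross_val_predict(model, texts, labels, cv=cv)
print(classification_report(labels, predicted, zero_division=0))
```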

## Integration with Armour AI

This risk classifier is the second stage of the Armour AI financial NLP pipeline:

1. Text → Finance Classification
2. If financial → Risk Analysis (this model)
3. Risk output → Entity Extraction & Action Items
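
The three-stage flow can be sketched as plain function composition. Every function here (`is_financial`, `classify_risk`, `extract_entities`, `run_pipeline`) is a hypothetical stand-in with trivial placeholder logic; in Armour AI each stage would call its own model, and the risk stage would call this classifier.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineResult:
    text: str
    is_financial: bool
    risk_category: Optional[str] = None
    entities: List[str] = field(default_factory=list)


def is_financial(text: str) -> bool:
    # Placeholder for stage 1 (finance classification): word-level term lookup.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & {"loan", "emi", "payments", "market", "tax", "cash"})


def classify_risk(text: str) -> str:
    # Placeholder for stage 2; in practice this would call the risk classifier.
    return "credit" if "afford" in text.lower() else "market"


def extract_entities(text: str) -> List[str]:
    # Placeholder for stage 3: treat all-caps tokens as "entities".
    return [tok for tok in text.split() if tok.isupper()]


def run_pipeline(text: str) -> PipelineResult:
    if not is_financial(text):
        # Non-financial text skips risk analysis and the downstream stage.
        return PipelineResult(text=text, is_financial=False)
    risk = classify_risk(text)
    return PipelineResult(text, True, risk, extract_entities(text))


print(run_pipeline("Customer can't afford monthly EMI payments"))
print(run_pipeline("Let's meet for lunch tomorrow"))
```

Gating on stage 1 means the risk model only sees text already judged financial, which keeps its five-way label space meaningful.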

## Performance Notes

- **Rule boosting:** keyword rules lift the categories where ML alone was weaker; Market risk reaches 90.6% and Regulatory risk 93.8%
- **Hybrid approach:** combines ML predictions with keyword pattern matching
- **Fast inference:** ~50-100 ms per prediction
- **Explainable:** returns which detection method was used (ML vs. rules)

---

**Model Location:** `/risk_classifier/` in rohin30n/Armour

**License:** Apache 2.0

**Tags:** risk-scoring, financial-nlp, hybrid-model, classification