# Hybrid ML + Rule-Based Risk Classifier

## Overview
Production-ready risk detection model for financial conversations. It identifies 5 types of financial risk with 82.5% accuracy on a 160-case test set.

## Performance Metrics
- **Overall accuracy:** 82.5% (tested on 160 diverse cases)
- **Improvement over baseline:** +22.5 percentage points (60% → 82.5%)
- **Improvement over pure ML:** +5%

### Per-Category Accuracy
| Risk Type | Accuracy | Performance |
|-----------|----------|-------------|
| Credit Risk | 84.4% | Strong |
| Market Risk | 90.6% | Excellent (+21% boost) |
| Liquidity Risk | 71.9% | Good |
| Opportunity Risk | 71.9% | Good (+15% improvement) |
| Regulatory Risk | 93.8% | Excellent (+19% improvement) |

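The per-category figures are consistent with 32 test cases per category (160 / 5). Assuming that split, which the README implies but does not state, the overall accuracy and the baseline gain can be reproduced with a few lines of arithmetic:

```python
# Assumption: 32 test cases per category (160 cases / 5 categories).
CASES_PER_CATEGORY = 32

# Correct predictions per category implied by the reported accuracies
correct = {
    "credit": 27,       # 27/32 = 84.4%
    "market": 29,       # 29/32 = 90.6%
    "liquidity": 23,    # 23/32 = 71.9%
    "opportunity": 23,  # 23/32 = 71.9%
    "regulatory": 30,   # 30/32 = 93.8%
}

for category, n in correct.items():
    print(f"{category:<12} {n / CASES_PER_CATEGORY:.1%}")

total_cases = CASES_PER_CATEGORY * len(correct)
overall = sum(correct.values()) / total_cases
print(f"overall      {overall:.1%} ({sum(correct.values())}/{total_cases})")

baseline = 0.60
gain_points = (overall - baseline) * 100         # absolute gain, percentage points
gain_relative = (overall - baseline) / baseline  # gain relative to the baseline
print(f"gain: +{gain_points:.1f} points ({gain_relative:.1%} relative)")
```

The same arithmetic shows that 60% → 82.5% is a 22.5-point absolute gain, which is 37.5% relative to the baseline.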
## Architecture
- **ML engine:** Random Forest (200 trees) + Gradient Boosting ensemble
- **Feature extraction:** TF-IDF vectorizer (1,059 features, 1- to 3-grams)
- **Detection method:** Hybrid; 94% of predictions use the rules + ML blend and 6% use pure ML
- **Rules:** Category-specific financial keyword patterns

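The README does not specify how rule matches and ML probabilities are combined. The sketch below shows one plausible scheme, purely as an assumption: when any category keyword fires, the ML probabilities are mixed with normalized keyword-hit counts using a hypothetical weight `RULE_WEIGHT`; otherwise the ML prediction is used on its own. The returned `method` field mirrors the note that the classifier reports which detection path was taken. `hybrid_predict` and `RULE_WEIGHT` are illustrative names, not part of the published model.

```python
from typing import Dict, Tuple

# Hypothetical blend weight; the model's actual weighting is not documented.
RULE_WEIGHT = 0.5


def hybrid_predict(
    ml_proba: Dict[str, float], rule_hits: Dict[str, int]
) -> Tuple[str, float, str]:
    """Blend ML class probabilities with keyword-rule hit counts.

    Returns (category, score, method): method is "rules+ml" when any keyword
    rule fired and "ml" when the prediction relies on the ML model alone.
    """
    total_hits = sum(rule_hits.values())
    if total_hits == 0:
        # No keyword matched: fall back to the pure ML prediction.
        category = max(ml_proba, key=ml_proba.get)
        return category, ml_proba[category], "ml"

    # Turn hit counts into a distribution over categories, then mix linearly.
    blended = {
        cat: (1 - RULE_WEIGHT) * p + RULE_WEIGHT * rule_hits.get(cat, 0) / total_hits
        for cat, p in ml_proba.items()
    }
    category = max(blended, key=blended.get)
    return category, blended[category], "rules+ml"


ml_proba = {"credit": 0.40, "market": 0.35, "liquidity": 0.10,
            "opportunity": 0.05, "regulatory": 0.10}
print(hybrid_predict(ml_proba, {"market": 2}))  # keyword hits tip the result to "market"
print(hybrid_predict(ml_proba, {}))             # no hits: ML-only prediction ("credit")
```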
## Model Files
- `classifier.pkl` - Random Forest classifier (1.36 MB)
- `classifier_gb.pkl` - Gradient Boosting classifier (0.66 MB)
- `vectorizer.pkl` - TF-IDF vectorizer (0.07 MB)
- `metadata.json` - Metrics and configuration

## Risk Categories

### 1. Credit Risk (84.4%)
- Inability to afford monthly payments
- Loan defaults and payment delinquencies
- Poor credit history and low creditworthiness
- High debt-to-income ratios
- Keywords: "afford", "default", "delinquent", "debt"

### 2. Market Risk (90.6%)
- Stock market crashes and volatility
- Economic downturns affecting portfolios
- Currency fluctuations and losses
- Keywords: "crash", "volatility", "bear market", "downturn"

### 3. Liquidity Risk (71.9%)
- Funds locked in long-term investments
- Cash flow constraints
- Difficulty accessing emergency funds
- Keywords: "locked", "illiquid", "cash", "shortage"

### 4. Opportunity Risk (71.9%)
- Missed investment opportunities
- Poor timing decisions
- Regret about investment choices
- Keywords: "missed", "opportunity", "regret", "timing"

### 5. Regulatory Risk (93.8%)
- Tax compliance requirements
- AML/KYC regulations
- Regulatory approval delays
- Keywords: "tax", "compliance", "regulatory", "legal"

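The keyword lists above can be turned into per-category rule counts. This is a minimal sketch that assumes case-insensitive, whole-word (or whole-phrase) matching on exactly the listed keywords; the production rules are only described as "category-specific financial keyword patterns", so they are likely broader. `KEYWORDS`, `PATTERNS` and `count_keyword_hits` are illustrative names.

```python
import re
from typing import Dict

# Keywords copied from the category descriptions above; the real rules may differ.
KEYWORDS = {
    "credit": ["afford", "default", "delinquent", "debt"],
    "market": ["crash", "volatility", "bear market", "downturn"],
    "liquidity": ["locked", "illiquid", "cash", "shortage"],
    "opportunity": ["missed", "opportunity", "regret", "timing"],
    "regulatory": ["tax", "compliance", "regulatory", "legal"],
}

# One precompiled, case-insensitive, whole-word alternation per category.
PATTERNS = {
    cat: re.compile(r"\b(?:" + "|".join(re.escape(k) for k in kws) + r")\b", re.IGNORECASE)
    for cat, kws in KEYWORDS.items()
}


def count_keyword_hits(text: str) -> Dict[str, int]:
    """Return the number of keyword matches found in text for each risk category."""
    return {cat: len(pattern.findall(text)) for cat, pattern in PATTERNS.items()}


print(count_keyword_hits("Customer can't afford monthly EMI payments"))
```

Whole-word matching means `afford` does not fire on `affordable`; whether the real rules use stems or substrings is not documented.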
## Usage

```python
import pickle

from huggingface_hub import hf_hub_download

# Download the Random Forest classifier and the TF-IDF vectorizer
classifier_path = hf_hub_download(
    repo_id="rohin30n/Armour",
    filename="risk_classifier/classifier.pkl",
    token="YOUR_TOKEN",  # Hugging Face access token
)

vectorizer_path = hf_hub_download(
    repo_id="rohin30n/Armour",
    filename="risk_classifier/vectorizer.pkl",
    token="YOUR_TOKEN",
)

# Load models
with open(classifier_path, "rb") as f:
    classifier = pickle.load(f)

with open(vectorizer_path, "rb") as f:
    vectorizer = pickle.load(f)

# Predict risk category. This is the ML-only path: it uses the Random Forest
# alone, without the keyword rules or the Gradient Boosting model.
text = "Customer can't afford monthly EMI payments"
X = vectorizer.transform([text])
risk_pred = classifier.predict(X)[0]
risk_proba = classifier.predict_proba(X)[0]

print(f"Risk Category: {risk_pred}")
print(f"Confidence: {max(risk_proba):.2%}")
```

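The snippet above uses only the Random Forest. The README describes a Random Forest + Gradient Boosting ensemble but does not say how the two are combined; a common choice, assumed here, is equal-weight soft voting (averaging `predict_proba` outputs). The helper `ensemble_predict` is hypothetical and works with any classifiers exposing scikit-learn-style `predict_proba` and `classes_`, provided they share the same class order. `classifier_gb` in the usage comment is a hypothetical name for the model loaded from `classifier_gb.pkl`.

```python
from typing import List, Sequence, Tuple


def ensemble_predict(models: Sequence, X) -> List[Tuple[str, float]]:
    """Soft voting: average predict_proba over models, return (label, prob) per sample."""
    classes = list(models[0].classes_)
    for m in models[1:]:
        # Averaging is only meaningful if every model uses the same class order.
        if list(m.classes_) != classes:
            raise ValueError("models disagree on class order")

    probas = [m.predict_proba(X) for m in models]
    results = []
    for rows in zip(*probas):  # one probability row per model for the same sample
        avg = [sum(vals) / len(models) for vals in zip(*rows)]
        best = max(range(len(classes)), key=avg.__getitem__)
        results.append((classes[best], avg[best]))
    return results


# Usage with the objects from the Usage snippet (classifier_gb is hypothetical,
# downloaded from risk_classifier/classifier_gb.pkl the same way):
# for label, prob in ensemble_predict([classifier, classifier_gb], X):
#     print(label, f"{prob:.2%}")
```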
## Technical Details

### Training Data
- 152 financial conversation samples
- 32 samples per risk category
- Diverse scenarios and language variations
- Stratified train-test split (80/20)

### Hyperparameters
- Random Forest: 200 trees, max_depth=20, class_weight='balanced'
- Gradient Boosting: 100 estimators, max_depth=5, learning_rate=0.1
- TF-IDF: 1,059 features, 1- to 3-grams (unigrams through trigrams), sublinear TF scaling

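Assuming the pickled models are scikit-learn estimators (the model names and parameter names match scikit-learn's), the hyperparameters above correspond roughly to the configuration below. This is a reconstruction, not the original training script; `max_features` is deliberately left unset because the README does not say whether 1,059 is a cap or simply the size of the fitted vocabulary.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF over unigrams to trigrams; sublinear_tf replaces tf with 1 + log(tf).
# max_features is not set: 1,059 may be the fitted vocabulary size rather than a cap.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True)

# Random Forest: 200 trees, depth capped at 20, class weights inversely
# proportional to class frequencies.
rf = RandomForestClassifier(n_estimators=200, max_depth=20, class_weight="balanced")

# Gradient Boosting: 100 boosting stages of depth-5 trees, learning rate 0.1.
gb = GradientBoostingClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)

# Training would then look like (texts: list[str], labels: list[str]):
# X = vectorizer.fit_transform(texts)
# rf.fit(X, labels)
# gb.fit(X, labels)
```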
### Evaluation
- 5-fold cross-validation
- Stratified splits for class balance
- Per-category metrics (precision, recall, F1)
- Tested on 160 diverse financial scenarios

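Per-category precision, recall and F1 follow directly from the true and predicted labels. Below is a dependency-free sketch of those definitions (scikit-learn's `classification_report` computes the same quantities); the toy labels are made up for illustration and are not evaluation data from this model. `per_class_metrics` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each class label (0.0 when undefined)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels for illustration only
y_true = ["credit", "credit", "market", "liquidity"]
y_pred = ["credit", "market", "market", "credit"]
for label, m in per_class_metrics(y_true, y_pred).items():
    print(label, {k: round(v, 2) for k, v in m.items()})
```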
## Integration with Armour AI

This risk classifier is the second stage of the Armour AI financial NLP pipeline:
1. Text → finance classification
2. If the text is financial → risk analysis (this model)
3. Risk output → entity extraction and action items

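The three stages can be wired together as below. Only the control flow comes from the list above; `is_financial`, `classify_risk`, `extract_entities`, `FINANCE_WORDS` and `PipelineResult` are hypothetical stand-ins for the other Armour AI components, stubbed here so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineResult:
    is_financial: bool
    risk_category: Optional[str] = None
    entities: List[str] = field(default_factory=list)


# Hypothetical stubs: replace with the real finance classifier,
# risk classifier and entity extractor.
FINANCE_WORDS = {"loan", "emi", "payment", "payments", "tax", "stock"}


def is_financial(text: str) -> bool:
    return bool(FINANCE_WORDS & set(text.lower().split()))


def classify_risk(text: str) -> str:
    return "credit"


def extract_entities(text: str, risk_category: str) -> List[str]:
    return [w for w in text.split() if w.isupper()]


def run_pipeline(text: str) -> PipelineResult:
    # Stage 1: finance classification gates everything downstream.
    if not is_financial(text):
        return PipelineResult(is_financial=False)
    # Stage 2: risk analysis runs only on financial text.
    risk = classify_risk(text)
    # Stage 3: entity extraction and action items consume the risk output.
    return PipelineResult(True, risk, extract_entities(text, risk))


print(run_pipeline("Customer can't afford monthly EMI payments"))
print(run_pipeline("What time is the meeting?"))
```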
## Performance Notes

- **Rules boost categories that are weak under pure ML:** Market risk (90.6%), Regulatory risk (93.8%)
- **Hybrid approach:** Combines ML predictions with keyword pattern matching
- **Fast inference:** ~50-100 ms per prediction
- **Explainable:** Reports which detection method was used (ML vs. rules)

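The ~50-100 ms figure depends on hardware. One way to check it locally is to time repeated calls with `time.perf_counter` and report the median. `measure_latency_ms` is a hypothetical helper; `predict` can be any callable, for example a wrapper around `vectorizer.transform` and `classifier.predict` from the Usage section.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(predict: Callable[[str], object], text: str,
                       runs: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of predict(text), in milliseconds."""
    for _ in range(warmup):  # discard first calls (caches, lazy initialization)
        predict(text)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


# Example with the Usage objects (not run here):
# latency = measure_latency_ms(
#     lambda t: classifier.predict(vectorizer.transform([t])),
#     "Customer can't afford monthly EMI payments",
# )
# print(f"median latency: {latency:.1f} ms")
```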
---

- **Model location:** `/risk_classifier/` in rohin30n/Armour
- **License:** Apache 2.0
- **Tags:** risk-scoring, financial-nlp, hybrid-model, classification