IITRohit committed on
Commit 99cb9cf · verified · 1 Parent(s): 845f575

Add final reconciliation break classifier model

Files changed (6)
  1. README.md +22 -10
  2. config.json +43 -0
  3. features.json +10 -0
  4. model.joblib +3 -0
  5. predict.py +17 -0
  6. requirements.txt +4 -0
README.md CHANGED
@@ -1,14 +1,26 @@
- # Reconciliation Break Classifier
+ ---
+ license: mit
+ tags:
+ - fintech
+ - reconciliation
+ - classification
+ ---
 
- This Space hosts a trained **ML-based reconciliation break classifier**.
+ # FinTech Reconciliation Break Classifier (Pair-Level)
 
- ## Artifacts
- - `best_model.joblib` – trained ML pipeline
- - `feature_schema.json` – input feature definition
- - `metrics.json` – evaluation metrics
+ This repository contains a trained model that predicts whether a bank↔broker reconciliation **pair** is a **TRUE_BREAK (1)** or **FALSE_BREAK (0)**.
 
- ## Usage
- Load using `joblib.load()` and apply on canonical reconciliation features.
+ ## Features Used
+ ['large_amt_mismatch', 'amt_diff', 'amt_pct', 'amt_diff_abs', 'amt_ratio', 'settlement_gap_days', 'is_weekend', 'is_settlement_weekend']
 
- ## Domain
- Finance · Banking · Transaction Reconciliation · Fraud Detection
+ ## Best Run (selected by PR-AUC)
+ - Experiment: exp_001_baseline
+ - PR-AUC: 1.000
+ - ROC-AUC: 1.000
+ - F1: 0.998
+
+ ## Files
+ - `model.joblib` : trained sklearn pipeline
+ - `features.json`: selected feature list
+ - `config.json` : training metadata + best metrics
+ - `predict.py` : minimal inference helper
config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "task": "break",
+   "label_column": "recon_break_label",
+   "label_map": {
+     "TRUE_BREAK": 1,
+     "FALSE_BREAK": 0
+   },
+   "selected_features": [
+     "large_amt_mismatch",
+     "amt_diff",
+     "amt_pct",
+     "amt_diff_abs",
+     "amt_ratio",
+     "settlement_gap_days",
+     "is_weekend",
+     "is_settlement_weekend"
+   ],
+   "winning_experiment": "exp_001_baseline",
+   "winning_iteration": 1,
+   "winning_metrics": {
+     "pr_auc": 1.0,
+     "roc_auc": 1.0,
+     "f1": 0.998330550918197,
+     "precision": 0.9966666666666667,
+     "recall": 1.0
+   },
+   "trained_at": "2025-12-26T23:06:16",
+   "model_config": {
+     "model_type": "logreg",
+     "logreg_C": 0.3,
+     "rf_n_estimators": 200,
+     "rf_max_depth": null,
+     "rf_min_samples_leaf": 5,
+     "gb_n_estimators": 100,
+     "gb_learning_rate": 0.05,
+     "gb_max_depth": 3,
+     "mlp_hidden": [
+       128
+     ],
+     "mlp_alpha": 0.0001,
+     "mlp_max_iter": 300
+   }
+ }
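Review note: the metrics stored in `config.json` can be sanity-checked without the model file. The sketch below recomputes F1 from the stored precision and recall using the standard formula F1 = 2PR / (P + R), and then picks out the hyperparameters that share the `model_type` prefix. The prefix rule in `active_hyperparams` is an assumption about how the training script names its keys; the commit does not state it. Both helper names are hypothetical.

```python
import json
import math

# Values copied from config.json in this commit (only the fields used here).
CONFIG_TEXT = """
{
  "winning_metrics": {
    "f1": 0.998330550918197,
    "precision": 0.9966666666666667,
    "recall": 1.0
  },
  "model_config": {
    "model_type": "logreg",
    "logreg_C": 0.3,
    "rf_n_estimators": 200,
    "gb_learning_rate": 0.05,
    "mlp_alpha": 0.0001
  }
}
"""


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def active_hyperparams(model_config: dict) -> dict:
    """Keep keys prefixed with '<model_type>_'.

    Assumption: the prefix marks which settings apply to the selected model.
    """
    prefix = model_config["model_type"] + "_"
    return {k: v for k, v in model_config.items() if k.startswith(prefix)}


config = json.loads(CONFIG_TEXT)
metrics = config["winning_metrics"]
recomputed_f1 = f1_from_precision_recall(metrics["precision"], metrics["recall"])

# True when the stored F1 agrees with the stored precision and recall.
print(math.isclose(recomputed_f1, metrics["f1"], rel_tol=1e-9))
print(active_hyperparams(config["model_config"]))
```

Under that prefix assumption, only `logreg_C` applies to the selected model; the `rf_*`, `gb_*`, and `mlp_*` entries would be settings for model types that were not chosen.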
features.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ "large_amt_mismatch",
3
+ "amt_diff",
4
+ "amt_pct",
5
+ "amt_diff_abs",
6
+ "amt_ratio",
7
+ "settlement_gap_days",
8
+ "is_weekend",
9
+ "is_settlement_weekend"
10
+ ]
model.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dc12c819dcd18d71fe845a521c1dabd2639cb5a771f853b95b2a04a2f6c9f57
+ size 3258
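Review note: the diff for `model.joblib` shows a Git LFS pointer, not the binary itself. A pointer is a small text file of `key value` lines, so a downloaded artifact can be checked against it. The following is a minimal sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of this repository or of Git LFS.

```python
import hashlib

# Pointer text copied from the model.joblib diff in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8dc12c819dcd18d71fe845a521c1dabd2639cb5a771f853b95b2a04a2f6c9f57
size 3258
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then split the oid into algorithm and digest."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when the byte length and hash digest both match the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a real download, read the file with `Path("model.joblib").read_bytes()` and pass the bytes to `matches_pointer` together with the parsed pointer.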
predict.py ADDED
@@ -0,0 +1,17 @@
+ import json
+ import joblib
+ import pandas as pd
+ from pathlib import Path
+
+ HERE = Path(__file__).resolve().parent
+ pipe = joblib.load(HERE / "model.joblib")
+ features = json.loads((HERE / "features.json").read_text(encoding="utf-8"))
+
+ def predict_proba(df: pd.DataFrame):
+     X = df[features].copy()
+     return pipe.predict_proba(X)[:, 1]
+
+ def predict(df: pd.DataFrame, threshold: float = 0.5):
+     proba = predict_proba(df)
+     pred = (proba >= threshold).astype(int)
+     return pred, proba
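Review note: `predict_proba` indexes the frame with `df[features]`, which raises a `KeyError` if any of the eight columns is missing, and `predict` uses an inclusive `>=` comparison against the threshold. The sketch below illustrates both behaviours without loading `model.joblib`. The helpers `missing_features` and `apply_threshold` are hypothetical, and the probabilities are made-up values, not model output.

```python
import numpy as np
import pandas as pd

# Feature names copied from features.json in this commit.
FEATURES = [
    "large_amt_mismatch", "amt_diff", "amt_pct", "amt_diff_abs",
    "amt_ratio", "settlement_gap_days", "is_weekend", "is_settlement_weekend",
]


def missing_features(df: pd.DataFrame, features=FEATURES) -> list:
    """Return the required feature columns that are absent from df."""
    return [name for name in features if name not in df.columns]


def apply_threshold(proba, threshold: float = 0.5) -> np.ndarray:
    """Same rule as predict(): label 1 when proba >= threshold."""
    return (np.asarray(proba) >= threshold).astype(int)


# Hypothetical frame that carries only the first six of the eight columns.
frame = pd.DataFrame({name: [0.0] for name in FEATURES[:6]})
print(missing_features(frame))  # ['is_weekend', 'is_settlement_weekend']

# Made-up probabilities; a value exactly at the threshold is labelled 1.
example_proba = [0.10, 0.50, 0.90]
print(apply_threshold(example_proba).tolist())       # [0, 1, 1]
print(apply_threshold(example_proba, 0.7).tolist())  # [0, 0, 1]
```

Calling a check like `missing_features` before `predict` gives a clearer error than the raw `KeyError`. Raising the threshold trades recall for precision, which may matter if the 0.5 default was not tuned on held-out data.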
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ pandas
+ numpy
+ scikit-learn
+ joblib
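Review note: none of the four requirements is pinned. scikit-learn's documentation cautions that a pickled or joblib-persisted estimator is not guaranteed to load under a different scikit-learn version, so recording the versions used at training time may be worthwhile. The sketch below uses only the standard library to report what is installed. `installed_versions` and `as_pinned_lines` are hypothetical helpers, and the versions printed are those of the local environment, not necessarily those used to train this model.

```python
from importlib import metadata

# Package names copied from requirements.txt in this commit.
REQUIREMENTS = ["pandas", "numpy", "scikit-learn", "joblib"]


def installed_versions(packages) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_pinned_lines(versions: dict) -> list:
    """Format 'name==version' lines, skipping packages that are not installed."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


for line in as_pinned_lines(installed_versions(REQUIREMENTS)):
    print(line)
```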