Add final reconciliation break classifier model

Files changed (6)

README.md CHANGED

@@ -1,14 +1,26 @@
-# Reconciliation Break Classifier
-This Space hosts a trained **ML-based reconciliation break classifier**.
-## Artifacts
-- `best_model.joblib` – trained ML pipeline
-- `feature_schema.json` – input feature definition
-- `metrics.json` – evaluation metrics
-## Usage
-Load using `joblib.load()` and apply on canonical reconciliation features.
-## Domain
-Finance · Banking · Transaction Reconciliation · Fraud Detection

+---
+license: mit
+tags:
+- fintech
+- reconciliation
+- classification
+---
+# FinTech Reconciliation Break Classifier (Pair-Level)
+This repository contains a trained model that predicts whether a bank↔broker reconciliation **pair** is a **TRUE_BREAK (1)** or **FALSE_BREAK (0)**.
+## Features Used
+`large_amt_mismatch`, `amt_diff`, `amt_pct`, `amt_diff_abs`, `amt_ratio`, `settlement_gap_days`, `is_weekend`, `is_settlement_weekend`
+## Best Run (selected by PR-AUC)
+- Experiment: exp_001_baseline
+- PR-AUC: 1.000
+- ROC-AUC: 1.000
+- F1: 0.998
+## Files
+- `model.joblib` : trained sklearn pipeline
+- `features.json`: selected feature list
+- `config.json`  : training metadata + best metrics
+- `predict.py`   : minimal inference helper

config.json ADDED

+{
+  "task": "break",
+  "label_column": "recon_break_label",
+  "label_map": {
+    "TRUE_BREAK": 1,
+    "FALSE_BREAK": 0
+  },
+  "selected_features": [
+    "large_amt_mismatch",
+    "amt_diff",
+    "amt_pct",
+    "amt_diff_abs",
+    "amt_ratio",
+    "settlement_gap_days",
+    "is_weekend",
+    "is_settlement_weekend"
+  ],
+  "winning_experiment": "exp_001_baseline",
+  "winning_iteration": 1,
+  "winning_metrics": {
+    "pr_auc": 1.0,
+    "roc_auc": 1.0,
+    "f1": 0.998330550918197,
+    "precision": 0.9966666666666667,
+    "recall": 1.0
+  },
+  "trained_at": "2025-12-26T23:06:16",
+  "model_config": {
+    "model_type": "logreg",
+    "logreg_C": 0.3,
+    "rf_n_estimators": 200,
+    "rf_max_depth": null,
+    "rf_min_samples_leaf": 5,
+    "gb_n_estimators": 100,
+    "gb_learning_rate": 0.05,
+    "gb_max_depth": 3,
+    "mlp_hidden": [
+      128
+    ],
+    "mlp_alpha": 0.0001,
+    "mlp_max_iter": 300
+  }
+}
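A quick way to sanity-check the metadata above is to recompute F1 from the stored precision and recall and to invert `label_map` for decoding predictions. The sketch below embeds a trimmed copy of the relevant fields rather than reading the real file; `CONFIG_TEXT`, `f1_from`, and `id_to_label` are illustrative names, not part of this commit. Note also that `model_config` carries `rf_*`, `gb_*`, and `mlp_*` settings although `model_type` is `logreg`, so presumably only `logreg_C` applies to the saved model.

```python
import json

# Trimmed copy of the config.json fields shown in the diff (illustrative only).
CONFIG_TEXT = """
{
  "label_map": {"TRUE_BREAK": 1, "FALSE_BREAK": 0},
  "winning_metrics": {
    "pr_auc": 1.0,
    "roc_auc": 1.0,
    "f1": 0.998330550918197,
    "precision": 0.9966666666666667,
    "recall": 1.0
  }
}
"""

config = json.loads(CONFIG_TEXT)

# Invert the label map so integer predictions can be turned back into names.
id_to_label = {value: name for name, value in config["label_map"].items()}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


metrics = config["winning_metrics"]
recomputed_f1 = f1_from(metrics["precision"], metrics["recall"])

# True when the stored F1 agrees with the stored precision and recall.
f1_matches = abs(recomputed_f1 - metrics["f1"]) < 1e-9
```

The stored values are mutually consistent: a precision of 299/300 with a recall of 1.0 gives an F1 of 598/599.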

features.json ADDED

+[
+  "large_amt_mismatch",
+  "amt_diff",
+  "amt_pct",
+  "amt_diff_abs",
+  "amt_ratio",
+  "settlement_gap_days",
+  "is_weekend",
+  "is_settlement_weekend"
+]
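Because the same feature list is stored twice (here and under `selected_features` in `config.json`), the two copies can drift apart. A minimal consistency check, assuming both lists are meant to match in content and order, could look like this; `check_feature_lists` is a hypothetical helper, and the lists are embedded inline rather than read from disk.

```python
import json

# Inline copies of the two lists from this commit (illustrative; a real check
# would read features.json and config.json from the repository).
FEATURES_JSON = """
["large_amt_mismatch", "amt_diff", "amt_pct", "amt_diff_abs",
 "amt_ratio", "settlement_gap_days", "is_weekend", "is_settlement_weekend"]
"""
CONFIG_SELECTED = [
    "large_amt_mismatch", "amt_diff", "amt_pct", "amt_diff_abs",
    "amt_ratio", "settlement_gap_days", "is_weekend", "is_settlement_weekend",
]


def check_feature_lists(features, selected):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(set(features)) != len(features):
        problems.append("features.json contains duplicate names")
    if set(features) != set(selected):
        problems.append("feature names differ between the two files")
    elif features != selected:
        # Column order matters because predict.py selects df[features] in order.
        problems.append("feature order differs between the two files")
    return problems


features = json.loads(FEATURES_JSON)
problems = check_feature_lists(features, CONFIG_SELECTED)
reordered_problems = check_feature_lists(features[::-1], CONFIG_SELECTED)
```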

model.joblib ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:8dc12c819dcd18d71fe845a521c1dabd2639cb5a771f853b95b2a04a2f6c9f57
+size 3258
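The three lines above are a Git LFS pointer, not the model itself: they record the SHA-256 digest and byte size of the real `model.joblib`. A downloaded file can be checked against the pointer before unpickling it. The sketch below is one way to do that; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the demo payload is made up because the actual binary is not part of this page.

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8dc12c819dcd18d71fe845a521c1dabd2639cb5a771f853b95b2a04a2f6c9f57
size 3258
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


pointer = parse_lfs_pointer(POINTER_TEXT)

# Demo with a made-up payload, since the real model bytes are not available here.
demo_bytes = b"not the real model"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(demo_bytes).hexdigest(),
    "size": len(demo_bytes),
}
```

In practice `data` would be `Path("model.joblib").read_bytes()` after the LFS object has been fetched.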

predict.py ADDED

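+"""Minimal inference helper for the reconciliation break classifier."""
+import json
+from pathlib import Path
+
+import joblib
+import pandas as pd
+
+HERE = Path(__file__).resolve().parent
+
+# Load the trained sklearn pipeline and the ordered feature list once, at import time.
+pipe = joblib.load(HERE / "model.joblib")
+features = json.loads((HERE / "features.json").read_text(encoding="utf-8"))
+
+
+def predict_proba(df: pd.DataFrame):
+    """Return the probability of TRUE_BREAK (label 1) for each row of df."""
+    # Select the training features in their stored order; a missing column raises KeyError.
+    X = df[features].copy()
+    # Column 1 is the positive class, assuming the pipeline's classes_ are [0, 1].
+    return pipe.predict_proba(X)[:, 1]
+
+
+def predict(df: pd.DataFrame, threshold: float = 0.5):
+    """Return (labels, probabilities); a label is 1 when its probability >= threshold."""
+    proba = predict_proba(df)
+    pred = (proba >= threshold).astype(int)
+    return pred, proba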
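To show how the helper is meant to be called, the sketch below reproduces its two functions against a stand-in pipeline trained on synthetic data, since the real `model.joblib` cannot be loaded here. Everything about the stand-in is an assumption: the `StandardScaler` step, the synthetic feature values, and the toy label are invented for illustration, and only `C=0.3` and the feature names come from this commit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "large_amt_mismatch", "amt_diff", "amt_pct", "amt_diff_abs",
    "amt_ratio", "settlement_gap_days", "is_weekend", "is_settlement_weekend",
]

# Synthetic pair-level features (made up; not the project's data).
rng = np.random.default_rng(0)
n = 200
amt_diff = rng.normal(0.0, 50.0, n)
train = pd.DataFrame({
    "large_amt_mismatch": (np.abs(amt_diff) > 50).astype(int),
    "amt_diff": amt_diff,
    "amt_pct": amt_diff / 1000.0,
    "amt_diff_abs": np.abs(amt_diff),
    "amt_ratio": 1.0 + amt_diff / 1000.0,
    "settlement_gap_days": rng.integers(0, 5, n),
    "is_weekend": rng.integers(0, 2, n),
    "is_settlement_weekend": rng.integers(0, 2, n),
})
y = train["large_amt_mismatch"].to_numpy()  # toy label: 1 = TRUE_BREAK

# Stand-in for the saved pipeline (the scaler step is an assumption).
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=0.3, max_iter=1000))
pipe.fit(train[FEATURES], y)


def predict_proba(df: pd.DataFrame) -> np.ndarray:
    """Probability of the positive class for each row, as in predict.py."""
    return pipe.predict_proba(df[FEATURES].copy())[:, 1]


def predict(df: pd.DataFrame, threshold: float = 0.5):
    """Labels at the given threshold plus the underlying probabilities."""
    proba = predict_proba(df)
    return (proba >= threshold).astype(int), proba


# Extra columns are ignored because only FEATURES are selected.
batch = train.head(5).assign(pair_id=range(5))
pred, proba = predict(batch)
strict_pred, _ = predict(batch, threshold=0.9)
```

Raising `threshold` can only reduce the number of pairs flagged as breaks, which is the usual lever for trading recall against precision.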

requirements.txt ADDED

+pandas
+numpy
+scikit-learn
+joblib