Ikimina Digital Trust & Reliability Index

A lightweight scikit-learn + LightGBM pipeline that turns an Ikimina member's 12-month contribution history into a Reliability Index in [0, 100], designed for deployment over USSD on a feature phone in rural Rwanda.

Built for the AIMS KTT Fellowship Hackathon 2026 — Challenge T1.1.

Repo: [LINK](https://github.com/Ahmed-5/AIMS_KTT)
4-minute video: <YOUTUBE-UNLISTED-URL-HERE>

Intended use

Primary use case: surface a coarse reliability tier (low_risk / watch / high_risk) that an MFI loan officer can use as a conversation starter with an Ikimina member.
Secondary use case: trigger a Kinyarwanda / French SMS back to the Ikimina secretary after a short USSD session (*654*MEMBER_ID#, ~20 RWF per query).

Out of scope

❌ Not a credit bureau replacement. The index must not be the only signal in a loan decision.
❌ Not calibrated for non-Ikimina data. The feature math assumes a 12-month weekly-contribution context; behaviour on any other substrate is undefined.
❌ Not a deep-identity system. The USSD path transmits only a member_id pseudonym over SS7 — no PII is embedded in the model input.

How the score is produced

Read the member's monthly CSV record and their group's CSV record.
Compute 12 engineered features (see src/features.py::FEATURE_COLS): mean_on_time, on_time_volatility, recency_weighted_miss, max_on_time_streak, total_missed, penalty_paid_per_miss, borrow_to_repay, loan_burden, role_seniority, tenure_months, contrib_group_zscore, urban_flag.
Standard-scale → calibrated Logistic Regression → P(default_within_6m).
score = round(100 · (1 − P(default))) clipped to [0, 100].
Tiers (from the challenge brief): 0–40 high_risk · 41–70 watch · 71–100 low_risk.
Optional: blend with a group reliability index (stretch goal) using alpha=0.2.
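
The last three steps (probability to score, score to tier, optional blending) can be sketched as follows. This is a minimal illustration, not the code in scorer.py: the function names are hypothetical, and the blend is assumed to be a convex combination, since the text above only gives alpha = 0.2.

```python
def reliability_score(p_default: float) -> int:
    """Map P(default_within_6m) to an integer score clipped to [0, 100]."""
    raw = round(100 * (1 - p_default))  # note: Python rounds exact halves to even
    return max(0, min(100, raw))


def tier(score: int) -> str:
    """Tier cut-offs from the challenge brief: 0-40, 41-70, 71-100."""
    if score <= 40:
        return "high_risk"
    if score <= 70:
        return "watch"
    return "low_risk"


def blend(member_score: float, group_score: float, alpha: float = 0.2) -> int:
    """ASSUMPTION: convex blend of member and group index with weight alpha."""
    mixed = (1 - alpha) * member_score + alpha * group_score
    return max(0, min(100, round(mixed)))


score = reliability_score(0.08)   # -> 92
label = tier(score)               # -> "low_risk"
blended = blend(score, 50)        # 0.8 * 92 + 0.2 * 50 = 83.6 -> 84
```

Under this assumed blend, a strong member in a weak group loses a few points but keeps most of their individual score.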

Training data

Fully synthetic — generated by src/generate_data.py with seed 42, following the brief's recipe line-by-line:

500 members across 40 groups, each with 12 months of history.
Monthly missed contribution: Bernoulli(p = base_miss), with base_miss ~ Beta(2, 20) per member and AR(1) correlation ρ = 0.4 across months.
Penalties: 50 % of missed months incur a penalty; 70 % of penalties are paid the following month.
Borrowing: ~30 % of members; LogNormal(mean = 3 · weekly · 8).
Default label: logistic model on (missed_total, unpaid_penalties, borrow/repay, tenure), with the intercept found by bisection so that the positive rate is ~14 %.
Train / test split: last 100 member_ids are the holdout.
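
Two parts of this recipe can be sketched in plain Python: the correlated miss process and the bisection on the intercept. This is not the code in src/generate_data.py. It assumes the AR(1) correlation is realised as a two-state Markov chain (stationary miss probability p, lag-1 autocorrelation ρ), that member IDs run from 1 to 500, and that bisection targets the mean predicted probability; the 0.5-per-miss weight is purely illustrative.

```python
import math
import random


def simulate_misses(n_members=500, n_months=12, rho=0.4, seed=42):
    """Return {member_id: [0/1 miss flag per month]}.

    ASSUMPTION: correlation is modelled as a two-state Markov chain with
    stationary miss probability p and lag-1 autocorrelation rho.
    """
    rng = random.Random(seed)
    history = {}
    for member_id in range(1, n_members + 1):
        p = rng.betavariate(2, 20)            # base_miss ~ Beta(2, 20)
        prev = 1 if rng.random() < p else 0   # first month ~ Bernoulli(p)
        months = [prev]
        for _ in range(n_months - 1):
            # Transition probabilities that keep the marginal at p
            # while making consecutive months correlate at rho.
            p_next = p + rho * (1 - p) if prev else p * (1 - rho)
            prev = 1 if rng.random() < p_next else 0
            months.append(prev)
        history[member_id] = months
    return history


def calibrate_intercept(linear_scores, target_rate=0.14, lo=-20.0, hi=20.0, iters=60):
    """Bisect the intercept b so that mean(sigmoid(z + b)) is close to target_rate."""
    def mean_prob(b):
        total = sum(1 / (1 + math.exp(-(z + b))) for z in linear_scores)
        return total / len(linear_scores)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_prob(mid) < target_rate:
            lo = mid   # rate too low: raise the intercept
        else:
            hi = mid   # rate too high: lower the intercept
    return (lo + hi) / 2


history = simulate_misses()
member_ids = sorted(history)
train_ids, holdout_ids = member_ids[:-100], member_ids[-100:]  # last 100 ids held out

# Hypothetical linear predictor: 0.5 per missed month (illustrative weight only).
linear_scores = [0.5 * sum(months) for months in history.values()]
intercept = calibrate_intercept(linear_scores)
```

Bisection works here because the mean predicted probability increases monotonically with the intercept, so halving the bracket always moves toward the target rate.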

No real member data was used.

Evaluation (deterministic, seed 42)

| Metric | Value |
| --- | --- |
| Holdout ROC-AUC | 0.944 |
| Holdout Brier score | 0.056 |
| CV AUC (Logistic Regression) | 0.854 |
| CV AUC (LightGBM) | 0.828 |
| Chosen model | Logistic Regression + calibration |
| Holdout positive rate | 0.10 |
| Tier mix on holdout | high_risk: 2 · watch: 9 · low_risk: 89 |
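
For readers unfamiliar with the two headline metrics, the sketch below computes them from scratch. The training script presumably relies on scikit-learn's `roc_auc_score` and `brier_score_loss`; the pure-Python versions here are only meant to show what those numbers measure, and the four toy rows are made up, not taken from the holdout.

```python
def brier_score(y_true, p_pred):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, p_pred):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = default, probabilities are P(default).
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]

auc_toy = roc_auc(y_toy, p_toy)        # 3 of 4 positive/negative pairs ordered correctly -> 0.75
brier_toy = brier_score(y_toy, p_toy)
```

As a reference point, always predicting the holdout positive rate of 0.10 would give a Brier score of 0.10 · 0.90 = 0.09, so lower values indicate predictions that are more informative than the base rate.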

Charts (in reports/): roc_curve.png, calibration_curve.png, feature_importance.png, score_distribution.png, district_heatmap.png.

Why Logistic Regression over LightGBM?

Logistic Regression's CV AUC was about 0.03 higher than LightGBM's (0.854 vs 0.828), so the simpler model also scored better. With only N = 400 training rows and a regulator audience that needs to read (and argue with) the coefficients, a calibrated linear model wins on both accuracy and interpretability.

Limitations & known failure modes

Skewed tier mix. Because defaults are rare (about 14 % by design in the generator, 0.10 on the holdout), most members land in low_risk. The brief fixes the tier cut-offs; we do not re-quantile them. Downstream product design (see ussd_flow.md) fails toward watch, never low_risk, whenever anything is uncertain.
Thin histories. Members with < 6 observed months are capped at the watch tier via scorer.py::score_shadow(), and the system returns a widened 80 % band.
Synthetic drift. The model is trained only on synthetic data generated from the brief's recipe. Deploying against real Ikimina records will require retraining and revalidation. The generator is shipped in-repo precisely so this retraining is one command away.
SS7 exposure. The USSD carrier layer is not encrypted. We minimise blast radius by sending only the member_id pseudonym over the wire; see ussd_flow.md for the full privacy trade-off.
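
The thin-history rule can be illustrated with a small helper. This is a sketch, not the body of scorer.py::score_shadow(): the function name is hypothetical, and it assumes "capped at watch" means a low_risk result is downgraded to watch while high_risk is left unchanged. The widened 80 % band is not shown because this card does not say how it is computed.

```python
TIER_ORDER = ["high_risk", "watch", "low_risk"]  # worst to best


def cap_thin_history(tier: str, observed_months: int, min_months: int = 6) -> str:
    """Return the tier, capped at 'watch' when history is shorter than min_months."""
    too_thin = observed_months < min_months
    better_than_watch = TIER_ORDER.index(tier) > TIER_ORDER.index("watch")
    if too_thin and better_than_watch:
        return "watch"   # fail toward watch, never low_risk
    return tier


capped = cap_thin_history("low_risk", observed_months=4)    # -> "watch"
kept = cap_thin_history("low_risk", observed_months=12)     # -> "low_risk"
```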

Files in this model release

| File | Purpose |
| --- | --- |
| `model.pkl` | joblib-pickled dict: `{model, scaler, uses_scaler, feature_cols, chosen_model, cv_auc, holdout}`. ≈ 5 KB. |
| `group_reliability_index.csv` | Per-group aggregate reliability index used by the blending stretch goal. |
| `metrics.json` | Holdout AUC, Brier, CV AUC for both candidate models, tier mix. |
| Chart PNGs | ROC, calibration, feature importance, score distribution, district heatmap. |
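
A minimal sketch of how the `model.pkl` bundle might be consumed, using only the dict keys listed above. The helper names are hypothetical, and it assumes the model exposes scikit-learn's `predict_proba` with class 1 meaning "default"; scorer.py remains the reference implementation.

```python
def load_bundle(path="model.pkl"):
    """Load the joblib-pickled dict described in the file table."""
    import joblib  # imported lazily so the helper below has no hard dependency
    return joblib.load(path)


def predict_default_probability(bundle, features):
    """Return P(default_within_6m) for one member.

    features: dict mapping each name in bundle["feature_cols"] to a number.
    """
    # Order the values exactly as the model saw them during training.
    row = [[features[name] for name in bundle["feature_cols"]]]
    if bundle["uses_scaler"]:
        row = bundle["scaler"].transform(row)
    # ASSUMPTION: class 1 is "default", so column 1 holds its probability.
    return float(bundle["model"].predict_proba(row)[0][1])


# Usage (requires the released file and a full feature dict):
#   bundle = load_bundle("model.pkl")
#   p = predict_default_probability(bundle, member_features)
```

Indexing the feature dict by `feature_cols` keeps the column order tied to the bundle rather than to the caller, which avoids silently scoring with shuffled features.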

Ethics & consent

The USSD flow has a hard consent gate on Screen 1. No score is computed until the secretary confirms.
Consent log is retained 18 months, then purged.
Members can view their own query log and revoke future queries at no cost via *654*0#.
No PII (name, phone, district) ever enters the model's input or crosses SS7.
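
The consent gate and the two USSD codes mentioned in this card can be sketched as a small parser. This is an illustration only: the function names, the action labels, and the assumption that member IDs are plain decimal digits are not taken from the repository, and ussd_flow.md remains the reference for the real flow.

```python
import re

# *654*<digits>#  (ASSUMPTION: member IDs are decimal digits, as in "--member 412")
USSD_PATTERN = re.compile(r"^\*654\*(\d+)#$")


def parse_ussd(code: str):
    """Return (action, member_id) where action is 'query', 'self_service' or 'invalid'."""
    match = USSD_PATTERN.match(code.strip())
    if match is None:
        return ("invalid", None)
    digits = match.group(1)
    if digits == "0":
        # *654*0# is the member-facing code for the query log and revocation.
        return ("self_service", None)
    return ("query", digits)


def handle_request(code: str, consent_given: bool, score_fn):
    """Run score_fn(member_id) only for a valid query with confirmed consent."""
    action, member_id = parse_ussd(code)
    if action != "query":
        return action
    if not consent_given:
        return "consent_required"   # hard gate: nothing is scored before consent
    return score_fn(member_id)


parsed = parse_ussd("*654*412#")    # -> ("query", "412")
```

Only the member_id pseudonym is extracted from the request, which matches the card's rule that no PII crosses SS7.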

How to reproduce

pip install -r requirements.txt
python src/generate_data.py && python src/train_model.py && python scorer.py --member 412 --group 07

Total wall-clock on a free Colab CPU: < 2 min.

Citation

If you use this model or the synthetic generator, please cite:

Ikimina Digital Trust & Reliability Index — AIMS KTT Fellowship Hackathon 2026, Challenge T1.1. MIT licence.
