Ikimina Digital Trust & Reliability Index
A lightweight scikit-learn + LightGBM pipeline that turns an Ikimina member's 12-month contribution history into a Reliability Index in [0, 100], designed for deployment over USSD on a feature phone in rural Rwanda.
Built for the AIMS KTT Fellowship Hackathon 2026, Challenge T1.1.
- Repo: [LINK](https://github.com/Ahmed-5/AIMS_KTT)
- 4-minute video: <YOUTUBE-UNLISTED-URL-HERE>
Intended use
- Primary use case: surface a coarse reliability tier (`low_risk` / `watch` / `high_risk`) that an MFI loan officer can use as a conversation starter with an Ikimina member.
- Secondary use case: trigger a Kinyarwanda / French SMS back to the Ikimina secretary after a short USSD session (`*654*MEMBER_ID#`, ~20 RWF per query).
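As a rough illustration of the USSD entry point, the sketch below extracts the member pseudonym from a dialled string of the form `*654*MEMBER_ID#`. The function name `parse_ussd_query` and the assumption that `MEMBER_ID` is purely numeric are ours for illustration and are not taken from the repository.

```python
import re
from typing import Optional

# Assumption: MEMBER_ID is a non-empty run of digits.
# The short code 654 is the one quoted in the text above.
USSD_QUERY = re.compile(r"\*654\*(\d+)#")

def parse_ussd_query(dialled: str) -> Optional[str]:
    """Return the member pseudonym, or None if the string is not a valid query."""
    match = USSD_QUERY.fullmatch(dialled.strip())
    return match.group(1) if match else None

print(parse_ussd_query("*654*412#"))  # 412
print(parse_ussd_query("*655*412#"))  # None (wrong short code)
```

Note that `*654*0#` is reserved in the consent section for query-log access and revocation, so a real handler would route the value `0` separately rather than treat it as a member.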
Out of scope
- Not a credit bureau replacement. The index must not be the only signal in a loan decision.
- Not calibrated for non-Ikimina data. The feature math assumes a 12-month weekly-contribution context; behaviour on any other substrate is undefined.
- Not a deep-identity system. The USSD path transmits only a `member_id` pseudonym over SS7; no PII is embedded in the model input.
How the score is produced
- Read the member's monthly CSV record and their group's CSV record.
- Compute 12 engineered features (see `src/features.py::FEATURE_COLS`): `mean_on_time`, `on_time_volatility`, `recency_weighted_miss`, `max_on_time_streak`, `total_missed`, `penalty_paid_per_miss`, `borrow_to_repay`, `loan_burden`, `role_seniority`, `tenure_months`, `contrib_group_zscore`, `urban_flag`.
- Standard-scale → calibrated Logistic Regression → `P(default_within_6m)`.
- `score = round(100 · (1 - P(default)))`, clipped to `[0, 100]`.
- Tiers (from the challenge brief): `0–40 high_risk · 41–70 watch · 71–100 low_risk`.
- Optional: blend with a group reliability index (stretch goal) using `alpha=0.2`.
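The scoring arithmetic above can be sketched in a few lines. This is a minimal illustration, not the code in `scorer.py`: the function names are hypothetical, and the blend is assumed to be a convex combination `(1 - alpha) · member + alpha · group`, which the text does not spell out.

```python
def reliability_score(p_default: float) -> int:
    """Map P(default_within_6m) to an integer index clipped to [0, 100]."""
    score = round(100 * (1 - p_default))  # Python rounds exact halves to even
    return max(0, min(100, score))

def tier(score: int) -> str:
    """Tier cut-offs as quoted from the challenge brief."""
    if score <= 40:
        return "high_risk"
    if score <= 70:
        return "watch"
    return "low_risk"

def blend(member_score: float, group_index: float, alpha: float = 0.2) -> int:
    """Assumed blend: weighted average of the member score and the group index."""
    blended = (1 - alpha) * member_score + alpha * group_index
    return max(0, min(100, round(blended)))

score = reliability_score(0.08)
print(score, tier(score))   # 92 low_risk
print(blend(score, 60))     # 86
```

With `alpha=0.2` the member's own score dominates, so a weak group can pull a member down by at most a fifth of the gap between the two indices.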
Training data
Fully synthetic: generated by `src/generate_data.py` with seed 42, following the brief's recipe line by line:
- 500 members × 12 months × 40 groups.
- Monthly missed contribution: Bernoulli(`p = base_miss`), with `base_miss ~ Beta(2, 20)` per member and AR(1) correlation `ρ = 0.4` across months.
- Penalties: 50 % of miss-months get one, 70 % paid next month.
- Borrowing: ~30 % of members; `LogNormal(mean = 3 · weekly · 8)`.
- Default label: logistic on `(missed_total, unpaid_penalties, borrow/repay, tenure)`, with the intercept found by bisection so that the positive rate is ~14 %.
- Train / test split: the last 100 `member_id`s are the holdout.
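Two steps of the recipe above, the autocorrelated miss process and the bisected intercept, can be sketched with the standard library alone. This is not the code in `src/generate_data.py`. It assumes the AR(1) dependence is induced by thresholding a latent Gaussian AR(1) series (so 0.4 is the latent correlation, not exactly the correlation of the binary outcomes), it uses a single-feature label model with a hypothetical weight of 0.6, and it assumes sequential member IDs.

```python
import math
import random
from statistics import NormalDist

random.seed(42)
N_MEMBERS, N_MONTHS, RHO = 500, 12, 0.4
STD_NORMAL = NormalDist()

def simulate_member():
    """Return 12 booleans (True = missed contribution) for one member."""
    base_miss = random.betavariate(2, 20)      # per-member miss probability
    threshold = STD_NORMAL.inv_cdf(base_miss)  # P(z < threshold) = base_miss
    z = random.gauss(0, 1)
    missed = []
    for _ in range(N_MONTHS):
        missed.append(z < threshold)
        # AR(1) update on the latent variable; the scaling keeps unit variance.
        z = RHO * z + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1)
    return missed

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def bisect_intercept(linear_terms, target_rate, lo=-20.0, hi=20.0, iters=60):
    """Find the intercept whose mean predicted probability hits target_rate."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        rate = sum(sigmoid(mid + t) for t in linear_terms) / len(linear_terms)
        if rate < target_rate:
            lo = mid   # rate too low: raise the intercept
        else:
            hi = mid
    return (lo + hi) / 2

history = [simulate_member() for _ in range(N_MEMBERS)]
total_missed = [sum(months) for months in history]

WEIGHT = 0.6  # hypothetical coefficient on total_missed
linear_terms = [WEIGHT * t for t in total_missed]
intercept = bisect_intercept(linear_terms, target_rate=0.14)
p_default = [sigmoid(intercept + t) for t in linear_terms]
labels = [random.random() < p for p in p_default]

# Holdout = last 100 member IDs (IDs assumed to be 0..499 here).
member_ids = list(range(N_MEMBERS))
train_ids, holdout_ids = member_ids[:-100], member_ids[-100:]
print(round(intercept, 3), round(sum(p_default) / N_MEMBERS, 3))
```

Bisection works here because the mean predicted probability increases monotonically with the intercept, so the target rate is bracketed between the two starting bounds.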
No real member data was used.
Evaluation (deterministic, seed 42)
| Metric | Value |
|---|---|
| Holdout ROC-AUC | 0.944 |
| Holdout Brier score | 0.056 |
| CV AUC (Logistic Regression) | 0.854 |
| CV AUC (LightGBM) | 0.828 |
| Chosen model | Logistic Regression + calibration |
| Holdout positive rate | 0.10 |
| Tier mix on holdout | high_risk: 2 · watch: 9 · low_risk: 89 |
Charts (in reports/): roc_curve.png, calibration_curve.png, feature_importance.png, score_distribution.png, district_heatmap.png.
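For readers unfamiliar with the two holdout metrics, the sketch below computes them from scratch on a toy example. The six labels and probabilities are made up purely for illustration and are unrelated to the table above; in practice one would more likely call scikit-learn's `roc_auc_score` and `brier_score_loss`.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def roc_auc(y_true, p_pred):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = defaulted within 6 months, p_pred = predicted P(default).
y_true = [0, 0, 0, 1, 0, 1]
p_pred = [0.05, 0.10, 0.40, 0.35, 0.20, 0.80]

print(roc_auc(y_true, p_pred))                 # 0.875
print(round(brier_score(y_true, p_pred), 4))   # 0.1125
```

AUC measures only ranking quality, while the Brier score also penalises miscalibrated probabilities, which matters here because the index is derived directly from `P(default)`.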
Why Logistic Regression over LightGBM?
LightGBM did not beat Logistic Regression: its CV AUC (0.828) was about 0.03 lower than LR's (0.854). With only N = 400 training rows and a regulator audience that needs to read (and argue with) the coefficients, a calibrated linear model is the better choice on both accuracy and interpretability.
Limitations & known failure modes
- Skewed tier mix. Because the default rate is low (~14 % by design, 10 % on the holdout), most members land in `low_risk`. The brief fixes the tier cut-offs; we do not re-quantile them. Downstream product design (see `ussd_flow.md`) fails toward `watch`, never `low_risk`, whenever anything is uncertain.
- Thin histories. Members with < 6 observed months are capped at the `watch` tier via `scorer.py::score_shadow()`, and the system returns a widened 80 % band.
- Synthetic drift. The model is trained only on synthetic data generated from the brief's recipe. Deploying against real Ikimina records will require retraining and revalidation. The generator is shipped in-repo precisely so this retraining is one command away.
- SS7 exposure. The USSD carrier layer is not encrypted. We minimise blast radius by sending only the `member_id` pseudonym over the wire; see `ussd_flow.md` for the full privacy trade-off.
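The thin-history cap and the fail-toward-`watch` policy can be pictured roughly as follows. This is a hypothetical sketch, not the body of `scorer.py::score_shadow()`: the name `safe_tier`, its arguments, and the use of `None` to signal an unavailable score are assumptions, and the widened 80 % band is not modelled.

```python
from typing import Optional

MIN_MONTHS = 6  # fewer observed months than this counts as a thin history

def base_tier(score: int) -> str:
    """Tier cut-offs as quoted from the challenge brief."""
    if score <= 40:
        return "high_risk"
    if score <= 70:
        return "watch"
    return "low_risk"

def safe_tier(score: Optional[int], observed_months: int) -> str:
    """Never report low_risk when the score is missing or the history is thin."""
    if score is None:
        return "watch"  # uncertain input fails toward watch
    tier = base_tier(score)
    if observed_months < MIN_MONTHS and tier == "low_risk":
        return "watch"  # cap: a thin history cannot rank above watch
    return tier

print(safe_tier(90, observed_months=4))    # watch
print(safe_tier(90, observed_months=12))   # low_risk
print(safe_tier(None, observed_months=12)) # watch
```

The cap only lowers the best tier; a thin-history member who already scores `high_risk` stays there.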
Files in this model release
| File | Purpose |
|---|---|
| `model.pkl` | joblib-pickled dict: `{model, scaler, uses_scaler, feature_cols, chosen_model, cv_auc, holdout}`. About 5 KB. |
| `group_reliability_index.csv` | Per-group aggregate reliability index used by the blending stretch goal. |
| `metrics.json` | Holdout AUC, Brier, CV AUC for both candidate models, tier mix. |
| Chart PNGs | ROC, calibration, feature importance, score distribution, district heatmap. |
Ethics & consent
- The USSD flow has a hard consent gate on Screen 1. No score is computed until the secretary confirms.
- Consent log is retained 18 months, then purged.
- Members can view their own query log and revoke future queries at no cost via `*654*0#`.
- No PII (name, phone, district) ever enters the model's input or crosses SS7.
How to reproduce
pip install -r requirements.txt
python src/generate_data.py && python src/train_model.py && python scorer.py --member 412 --group 07
Total wall-clock on a free Colab CPU: < 2 min.
Citation
If you use this model or the synthetic generator, please cite:
Ikimina Digital Trust & Reliability Index, AIMS KTT Fellowship Hackathon 2026, Challenge T1.1. MIT licence.