---
title: Credit Scoring - Home Credit Default Risk
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.44.1"
python_version: "3.12"
app_file: app.py
pinned: false
---

# OC_P6 - Credit Scoring API (MLOps)

## 🚀 Live demo
https://huggingface.co/spaces/ASI-Engineer/OC_P8_prod
https://huggingface.co/spaces/ASI-Engineer/OC_P8_test

## Step 4 optimization results
- Latency gain: **15.7x** (0.64 ms -> 0.04 ms per request)
- Accuracy: predictions 100% identical
- See the full report: [reports/rapport_optimisation.md](reports/rapport_optimisation.md)
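
As a quick sanity check on these figures, the speedup is the ratio of the two per-request latencies. The sketch below uses the rounded values quoted in this README, so it yields 16x rather than 15.7x; the reported figure presumably comes from the unrounded measurements in the full report. The function name `speedup` is illustrative and not part of the project.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the optimized path is."""
    if after_ms <= 0:
        raise ValueError("after_ms must be positive")
    return before_ms / after_ms


# Rounded per-request latencies quoted in this README (milliseconds).
ratio = speedup(0.64, 0.04)
print(f"{ratio:.1f}x")
```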

## Final architecture
- FastAPI/Gradio + Docker (entrypoint: [app.py](app.py))
- Monitoring: logs + Evidently (drift)
- Optimization: VectorizedPreprocessor (15.7x)

## Completed steps
- Step 2: API + Docker + CI/CD
- Step 3: Storage + production analysis
- Step 4: Performance optimization (completed)

## Project overview (quick audit)
- Raw data and features: [data/raw](data/raw), [data/processed](data/processed)
- Data/model pipeline: [src/load_data.py](src/load_data.py), [src/preprocessing.py](src/preprocessing.py)
- Experiments and artifacts: [mlruns](mlruns), [models](models)
- MLOps notebooks: [notebooks](notebooks)
- Production monitoring: [logs/predictions.jsonl](logs/predictions.jsonl), [reports](reports)
- Tests: [tests](tests)
- Containerization: [Dockerfile](Dockerfile)

## Project structure
```
OC_P6/
├── app.py
├── Dockerfile
├── pyproject.toml
├── requirements.txt
├── requirements-inference.txt
├── data/
│   ├── raw/
│   └── processed/
├── logs/
│   └── predictions.jsonl
├── mlruns/
├── models/
│   ├── export_model.py
│   ├── export_preprocessor.py
│   ├── lightgbm.txt
│   └── preprocessor.joblib
├── notebooks/
│   ├── 01_exploration.ipynb
│   ├── 02_preparation_features.ipynb
│   ├── 03_LGBM.ipynb
│   ├── 04_regression.ipynb
│   ├── 05_model_interpretation.ipynb
│   ├── 06_analyse_logs.ipynb
│   ├── 07_detect_data_drift.ipynb
│   ├── 08_analyze_logs_2.ipynb
│   ├── 09_profiling.ipynb
│   └── 10_optimisation.ipynb
├── reference/
│   ├── reference.csv
│   └── simulate_production_calls.py
├── reports/
│   ├── data_drift_report.html
│   ├── monitoring_study.md
│   └── plots/
├── src/
│   ├── __init__.py
│   ├── load_data.py
│   ├── mlflow_config.py
│   └── preprocessing.py
└── tests/
    ├── conftest.py
    ├── test_predict.py
    └── test_preprocessing.py
```

## Installation (uv recommended)
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
```

## Data
Source: Kaggle Home Credit Default Risk.
Place the files in [data/raw](data/raw):
- application_train.csv
- application_test.csv
- bureau.csv
- bureau_balance.csv
- credit_card_balance.csv
- installments_payments.csv
- POS_CASH_balance.csv
- previous_application.csv
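
Before running the notebooks, it can help to confirm that every file is in place. The following is a minimal sketch, assuming the files keep their original Kaggle names and live directly under `data/raw`; the helper `missing_raw_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names copied from the list in this README.
EXPECTED_FILES = [
    "application_train.csv",
    "application_test.csv",
    "bureau.csv",
    "bureau_balance.csv",
    "credit_card_balance.csv",
    "installments_payments.csv",
    "POS_CASH_balance.csv",
    "previous_application.csv",
]


def missing_raw_files(raw_dir: str = "data/raw") -> list[str]:
    """Return the expected CSV names that are not present in raw_dir."""
    root = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```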

## Notebooks (summary)
- Exploration: [notebooks/01_exploration.ipynb](notebooks/01_exploration.ipynb)
- Feature engineering: [notebooks/02_preparation_features.ipynb](notebooks/02_preparation_features.ipynb)
- LGBM modeling + MLflow: [notebooks/03_LGBM.ipynb](notebooks/03_LGBM.ipynb)
- Regression baseline: [notebooks/04_regression.ipynb](notebooks/04_regression.ipynb)
- Interpretation: [notebooks/05_model_interpretation.ipynb](notebooks/05_model_interpretation.ipynb)
- Monitoring and drift: [notebooks/06_analyse_logs.ipynb](notebooks/06_analyse_logs.ipynb), [notebooks/07_detect_data_drift.ipynb](notebooks/07_detect_data_drift.ipynb)
- Profiling and optimization: [notebooks/09_profiling.ipynb](notebooks/09_profiling.ipynb), [notebooks/10_optimisation.ipynb](notebooks/10_optimisation.ipynb)

## How to test locally
```bash
uv sync
uv run python app.py
```

Docker option:
```bash
docker build -t oc_p6:latest .
docker run --rm -it -p 7860:7860 oc_p6:latest
```

## API usage (local or HF Space)
Minimal JSON example:
```json
{"SK_ID_CURR": 100001, "AMT_INCOME_TOTAL": 202500.0, "AMT_CREDIT": 80000.0, "CODE_GENDER": "M", "DAYS_BIRTH": -12000}
```

Request to the production Space:
```bash
curl -s -X POST "https://huggingface.co/spaces/ASI-Engineer/OC_P8_prod/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data":["{\"SK_ID_CURR\":100001,\"AMT_INCOME_TOTAL\":202500.0,\"AMT_CREDIT\":80000.0,\"CODE_GENDER\":\"M\",\"DAYS_BIRTH\":-12000}"]}'
```
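
Note that the curl body is double-encoded: the feature record is serialized to a JSON string, and that string is then placed inside the `data` list. The sketch below reproduces this wrapping in Python with the standard library only. The helper names `build_payload` and `predict` are hypothetical, and the shape of the response is not assumed; the endpoint URL should be the one that works for the curl command.

```python
import json
import urllib.request


def build_payload(record: dict) -> bytes:
    """Wrap a feature record the same way as the curl command."""
    # First pass: the record becomes a JSON string.
    inner = json.dumps(record, separators=(",", ":"))
    # Second pass: that string is the single element of the "data" list.
    return json.dumps({"data": [inner]}).encode("utf-8")


def predict(url: str, record: dict, timeout: float = 10.0):
    """POST the payload and return the decoded JSON response as-is."""
    request = urllib.request.Request(
        url,
        data=build_payload(record),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


record = {
    "SK_ID_CURR": 100001,
    "AMT_INCOME_TOTAL": 202500.0,
    "AMT_CREDIT": 80000.0,
    "CODE_GENDER": "M",
    "DAYS_BIRTH": -12000,
}
payload = build_payload(record)
# predict() is deliberately not called here, to avoid a network request:
# result = predict("<endpoint URL from the curl command>", record)
```

Building the payload with `json.dumps` twice avoids the hand-written backslash escaping that the shell command requires.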

## Monitoring and data drift
- Monitoring report: [reports/monitoring_study.md](reports/monitoring_study.md)
- Evidently drift report: [reports/data_drift_report.html](reports/data_drift_report.html)
- Latency and score plots: [reports/plots](reports/plots)
- Production call simulation: [reference/simulate_production_calls.py](reference/simulate_production_calls.py)
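
For a quick look at latency without opening the notebooks, the prediction log can be summarized directly. This is a minimal sketch, assuming each line of `logs/predictions.jsonl` is a JSON object with a numeric latency field; the field name `latency_ms` is an assumption, since the log schema is not documented here, and `latency_summary` is a hypothetical helper.

```python
import json
import math
import statistics
from typing import Iterable


def latency_summary(lines: Iterable[str], field: str = "latency_ms") -> dict:
    """Summarize a numeric field across JSONL records, skipping bad lines."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or partially written lines
        value = record.get(field) if isinstance(record, dict) else None
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(float(value))
    if not values:
        return {"count": 0}
    values.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(values)) - 1)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": values[p95_index],
        "max": values[-1],
    }


# Example usage (field name is an assumption, adjust to the real schema):
# with open("logs/predictions.jsonl", encoding="utf-8") as handle:
#     print(latency_summary(handle))
```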

## Tests
```bash
uv run pytest
```

**Date**: 25 February 2026  
**Status**: Project complete, ready for the defense