---
title: Credit Scoring - Home Credit Default Risk
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.44.1"
python_version: "3.12"
app_file: app.py
pinned: false
---
# OC_P6 - Credit Scoring API (MLOps)
## 🚀 Live demo
https://huggingface.co/spaces/ASI-Engineer/OC_P8_prod
https://huggingface.co/spaces/ASI-Engineer/OC_P8_test
## Step 4 optimization results
- Latency gain: **15.7x** (0.64 ms -> 0.04 ms per request)
- Accuracy: 100% identical predictions
- See the full report: [reports/rapport_optimisation.md](reports/rapport_optimisation.md)
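
As an illustration of how a per-request latency figure like the one above could be measured, here is a minimal sketch using only the standard library. The helpers `mean_latency_ms` and `speedup` are hypothetical names, not functions from this repository, and the actual benchmark in the report may be set up differently.

```python
import time


def mean_latency_ms(func, payload, n_calls=1000):
    """Average wall-clock time of func(payload) in milliseconds (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(n_calls):
        func(payload)
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000 / n_calls


def speedup(before_ms, after_ms):
    """Ratio of the old latency to the new one."""
    return before_ms / after_ms


# With the rounded figures quoted above:
print(f"{speedup(0.64, 0.04):.1f}x")  # prints 16.0x
```

The rounded latencies give 16.0x; the reported 15.7x presumably comes from the unrounded measurements.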
## Final architecture
- FastAPI/Gradio + Docker (entrypoint: [app.py](app.py))
- Monitoring: logs + Evidently (drift)
- Optimization: VectorizedPreprocessor (15.7x)
## Completed steps
- Step 2: API + Docker + CI/CD
- Step 3: Storage + production analysis
- Step 4: Performance optimization (complete)
## Project overview (quick audit)
- Raw data and features: [data/raw](data/raw), [data/processed](data/processed)
- Data/model pipeline: [src/load_data.py](src/load_data.py), [src/preprocessing.py](src/preprocessing.py)
- Experiments and artifacts: [mlruns](mlruns), [models](models)
- MLOps notebooks: [notebooks](notebooks)
- Production monitoring: [logs/predictions.jsonl](logs/predictions.jsonl), [reports](reports)
- Tests: [tests](tests)
- Containerization: [Dockerfile](Dockerfile)
## Project structure
```
OC_P6/
├── app.py
├── Dockerfile
├── pyproject.toml
├── requirements.txt
├── requirements-inference.txt
├── data/
│   ├── raw/
│   └── processed/
├── logs/
│   └── predictions.jsonl
├── mlruns/
├── models/
│   ├── export_model.py
│   ├── export_preprocessor.py
│   ├── lightgbm.txt
│   └── preprocessor.joblib
├── notebooks/
│   ├── 01_exploration.ipynb
│   ├── 02_preparation_features.ipynb
│   ├── 03_LGBM.ipynb
│   ├── 04_regression.ipynb
│   ├── 05_model_interpretation.ipynb
│   ├── 06_analyse_logs.ipynb
│   ├── 07_detect_data_drift.ipynb
│   ├── 08_analyze_logs_2.ipynb
│   ├── 09_profiling.ipynb
│   └── 10_optimisation.ipynb
├── reference/
│   ├── reference.csv
│   └── simulate_production_calls.py
├── reports/
│   ├── data_drift_report.html
│   ├── monitoring_study.md
│   └── plots/
├── src/
│   ├── __init__.py
│   ├── load_data.py
│   ├── mlflow_config.py
│   └── preprocessing.py
└── tests/
    ├── conftest.py
    ├── test_predict.py
    └── test_preprocessing.py
```
## Installation (uv recommended)
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
```
## Data
Source: Kaggle Home Credit Default Risk.
Place the files in [data/raw](data/raw):
- application_train.csv
- application_test.csv
- bureau.csv
- bureau_balance.csv
- credit_card_balance.csv
- installments_payments.csv
- POS_CASH_balance.csv
- previous_application.csv
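
Before running the notebooks, it can help to confirm that all eight files are in place. The sketch below is a hypothetical check, not a script shipped with the repository; `missing_raw_files` is an illustrative name and the default `data/raw` path assumes it is run from the project root.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "application_train.csv",
    "application_test.csv",
    "bureau.csv",
    "bureau_balance.csv",
    "credit_card_balance.csv",
    "installments_payments.csv",
    "POS_CASH_balance.csv",
    "previous_application.csv",
]


def missing_raw_files(raw_dir="data/raw"):
    """Return the expected CSV names that are not present in raw_dir."""
    raw = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (raw / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All raw files are present.")
```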
## Notebooks (summary)
- Exploration: [notebooks/01_exploration.ipynb](notebooks/01_exploration.ipynb)
- Feature engineering: [notebooks/02_preparation_features.ipynb](notebooks/02_preparation_features.ipynb)
- LGBM modeling + MLflow: [notebooks/03_LGBM.ipynb](notebooks/03_LGBM.ipynb)
- Regression baseline: [notebooks/04_regression.ipynb](notebooks/04_regression.ipynb)
- Interpretation: [notebooks/05_model_interpretation.ipynb](notebooks/05_model_interpretation.ipynb)
- Monitoring and drift: [notebooks/06_analyse_logs.ipynb](notebooks/06_analyse_logs.ipynb), [notebooks/07_detect_data_drift.ipynb](notebooks/07_detect_data_drift.ipynb)
- Profiling and optimization: [notebooks/09_profiling.ipynb](notebooks/09_profiling.ipynb), [notebooks/10_optimisation.ipynb](notebooks/10_optimisation.ipynb)
## How to test locally
```bash
uv sync
uv run python app.py
```
Docker option:
```bash
docker build -t oc_p6:latest .
docker run --rm -it -p 7860:7860 oc_p6:latest
```
## API usage (local or HF Space)
Minimal JSON example:
```json
{"SK_ID_CURR": 100001, "AMT_INCOME_TOTAL": 202500.0, "AMT_CREDIT": 80000.0, "CODE_GENDER": "M", "DAYS_BIRTH": -12000}
```
Request to the production Space:
```bash
curl -s -X POST "https://huggingface.co/spaces/ASI-Engineer/OC_P8_prod/api/predict" \
-H "Content-Type: application/json" \
-d '{"data":["{\"SK_ID_CURR\":100001,\"AMT_INCOME_TOTAL\":202500.0,\"AMT_CREDIT\":80000.0,\"CODE_GENDER\":\"M\",\"DAYS_BIRTH\":-12000}"]}'
```
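
The curl call above wraps the features twice: the feature object is first serialized to a JSON string, and that string is then placed in a `data` list. A minimal Python sketch of building the same body is shown below; `build_payload` is a hypothetical helper, and the payload shape is taken from the curl example rather than from the application code.

```python
import json


def build_payload(features: dict) -> str:
    """Serialize features to a JSON string and wrap it in {"data": [...]}."""
    inner = json.dumps(features, separators=(",", ":"))
    return json.dumps({"data": [inner]})


features = {
    "SK_ID_CURR": 100001,
    "AMT_INCOME_TOTAL": 202500.0,
    "AMT_CREDIT": 80000.0,
    "CODE_GENDER": "M",
    "DAYS_BIRTH": -12000,
}

body = build_payload(features)
print(body)
```

The resulting string can be sent as the request body with any HTTP client, using the same `Content-Type: application/json` header as the curl example.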
## Monitoring and data drift
- Monitoring report: [reports/monitoring_study.md](reports/monitoring_study.md)
- Evidently drift report: [reports/data_drift_report.html](reports/data_drift_report.html)
- Latency and score plots: [reports/plots](reports/plots)
- Production call simulation: [reference/simulate_production_calls.py](reference/simulate_production_calls.py)
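
As a rough idea of how the prediction log could be summarized outside the notebooks, here is a small sketch that reads a JSON Lines file and computes latency statistics. The field name `latency_ms` is an assumption, since the log schema is not documented here, and `latency_summary` is a hypothetical helper rather than code from this repository.

```python
import json
import math
import statistics
from pathlib import Path


def latency_summary(log_path, field="latency_ms"):
    """Count, mean, median and nearest-rank p95 of a numeric field in a JSONL log.

    The default field name is an assumption about the log schema.
    """
    values = []
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if field in record:
                values.append(float(record[field]))
    if not values:
        return {"count": 0}
    values.sort()
    p95_index = max(0, math.ceil(0.95 * len(values)) - 1)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": values[p95_index],
    }


# Example call, assuming the log exists at this path:
# print(latency_summary("logs/predictions.jsonl"))
```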
## Tests
```bash
uv run pytest
```
**Date**: 25 February 2026
**Status**: Project complete, ready for the defense