Space: ASI-Engineer/oc_p5-dev

ASI-Engineer committed on Jan 1

Commit 7263194 (verified) · 1 parent: f7e79ec

Upload folder using huggingface_hub

Files changed (6)

.gitignore +6 -0
README.md +810 -65
README_HF.md +17 -3
mkdocs.yml +175 -0
src/config.py +1 -1
src/logger.py +1 -1

.gitignore CHANGED

@@ -77,3 +77,9 @@ mlruns/
 # =====================
 .ipynb_checkpoints/
 *.ipynb_checkpoints/

+# =====================
+# MkDocs
+# =====================
+site/
+.cache/

README.md CHANGED

@@ -1,93 +1,447 @@
 ---
-title: Employee Turnover Prediction API
-emoji: 👔
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-pinned: true
-license: mit
-app_port: 7860
 ---
-# Employee Turnover Prediction API 🚀 (v3.2.1)
-Employee turnover prediction API (XGBoost + SMOTE) with batch endpoints, strict validation and up-to-date documentation.
-## 🎯 Features
-- ✅ Turnover prediction (0 = stays, 1 = leaves)
-- 📦 Batch CSV endpoint (3 raw files)
-- 🎛️ Gradio sliders and Pydantic schemas aligned with the actual min/max values
-- 📊 Probabilities and risk level (Low/Medium/High)
-- 🔐 API Key authentication (required)
-- 📝 Structured JSON logs
-- 🛡️ Rate limiting (20 req/min)
-- 📚 OpenAPI/Swagger documentation
-## 🔗 Endpoints
-| Endpoint | Description |
-|----------|-------------|
-| `/docs` | Interactive Swagger documentation |
-| `/health` | API status |
-| `/ui` | Interactive Gradio interface |
-| `/predict` | Single prediction (JSON, real-data constraints) |
-| `/predict/batch` | Batch prediction (3 raw CSV files) |
 ## 🚀 Usage
-### Single prediction (all constraints applied)
 ```bash
-curl -X POST https://asi-engineer-oc-p5-dev.hf.space/predict \
   -H "Content-Type: application/json" \
-  -H "X-API-Key: your-key" \
   -d '{
-    "nombre_participation_pee": 0,
-    "nb_formations_suivies": 2,
-    "nombre_employee_sous_responsabilite": 1,
-    "distance_domicile_travail": 15,
-    "niveau_education": 3,
-    "domaine_etude": "Infra & Cloud",
-    "ayant_enfants": "Y",
-    "frequence_deplacement": "Occasionnel",
-    "annees_depuis_la_derniere_promotion": 2,
-    "annes_sous_responsable_actuel": 5,
-    "satisfaction_employee_environnement": 3,
-    "note_evaluation_precedente": 4,
-    "niveau_hierarchique_poste": 2,
-    "satisfaction_employee_nature_travail": 3,
-    "satisfaction_employee_equipe": 3,
-    "satisfaction_employee_equilibre_pro_perso": 2,
-    "note_evaluation_actuelle": 4,
-    "heure_supplementaires": "Non",
-    "augementation_salaire_precedente": 5.5,
     "age": 35,
     "genre": "M",
     "revenu_mensuel": 4500.0,
-    "statut_marital": "Marié(e)",
-    "departement": "Commercial",
-    "poste": "Manager",
-    "nombre_experiences_precedentes": 3,
-    "nombre_heures_travailless": 80,
-    "annee_experience_totale": 10,
-    "annees_dans_l_entreprise": 5,
-    "annees_dans_le_poste_actuel": 2
   }'
 ```
-### Batch prediction (3 raw CSV files)
 ```bash
-curl -X POST https://asi-engineer-oc-p5-dev.hf.space/predict/batch \
   -H "X-API-Key: your-key" \
-  -F "sondage_file=@extrait_sondage.csv" \
-  -F "eval_file=@extrait_eval.csv" \
-  -F "sirh_file=@extrait_sirh.csv"
 ```
-**Response:**
 ```json
 {
   "total_employees": 1470,
@@ -100,7 +454,398 @@ curl -X POST https://asi-engineer-oc-p5-dev.hf.space/predict/batch \
 }
 ```
-## 📚 Full documentation
-See [docs/API.md](docs/API.md) or the [GitHub Repository](https://github.com/chaton59/OC_P5) for the full documentation and detailed constraints (min/max, enums, etc.).

+<div align="center">
+# 🚀 Employee Turnover Prediction API
+[![Python Version](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+[![FastAPI](https://img.shields.io/badge/FastAPI-0.115.14-009688.svg)](https://fastapi.tiangolo.com)
+[![Code Coverage](https://img.shields.io/badge/coverage-70.26%25-yellow.svg)](htmlcov/index.html)
+[![Tests](https://img.shields.io/badge/tests-97%20passed-success.svg)](tests/)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+**Machine-learning REST API for predicting employee turnover (XGBoost + SMOTE)**
+[🔗 Demo Production](https://asi-engineer-oc-p5.hf.space) · [📚 Documentation](docs/) · [🐛 Report Bug](https://github.com/chaton59/OC_P5/issues) · [💡 Request Feature](https://github.com/chaton59/OC_P5/issues)
+</div>
 ---
+## 📋 Table of Contents
+- [About the Project](#-about-the-project)
+- [Architecture](#-architecture)
+- [Technical Choices](#-technical-choices)
+- [Installation](#-installation)
+- [Usage](#-usage)
+- [Deployment](#-deployment)
+- [Updating](#-updating)
+- [Tests](#-tests)
+- [Documentation](#-documentation)
+- [Changelog](#-changelog)
+- [Authors](#-authors)
+- [License](#-license)
 ---
+## 📊 About the Project
+### Overview
+This project deploys a **machine learning model** to production behind a **modern REST API** that predicts the risk of employees leaving a company. Built for the OpenClassrooms P5 project "Déployez votre modèle de Machine Learning" (Deploy your Machine Learning model), it applies software engineering and MLOps **best practices**.
+### Problem
+Companies lose key talent without being able to anticipate it. This model predicts **turnover risk** (the probability that an employee leaves the company) from 29 HR variables (satisfaction, salary, seniority, etc.).
+### Solution
+A high-performance REST API serving a **tuned XGBoost model**, with:
+- ✅ **Robust data validation** with Pydantic
+- ✅ **Real-time predictions** (<2 s) or batch predictions (CSV)
+- ✅ **Full traceability** through PostgreSQL and JSON logs
+- ✅ Built-in **monitoring** and health checks
+- ✅ **Automated CI/CD** with GitHub Actions
+- ✅ **Cloud deployment** on HuggingFace Spaces
+### Model Performance
+| Metric | Value | Interpretation |
+|----------|--------|----------------|
+| **F1 Score** | 0.85 | Excellent balance between precision and recall |
+| **Recall** | 0.88 | Detects 88% of actual departures |
+| **Precision** | 0.82 | 82% of "leave" predictions are correct |
+| **ROC AUC** | 0.91 | Excellent ability to separate leavers from stayers |
+📊 See [docs/MODEL_TECHNICAL.md](docs/MODEL_TECHNICAL.md) for a detailed analysis.
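As a consistency check on the table above: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the rounded table values (P = 0.82, R = 0.88) and lands on the reported 0.85; it verifies the arithmetic only, not how the metrics were measured.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values from the performance table above.
f1 = f1_score(precision=0.82, recall=0.88)
print(f"F1 from precision/recall: {f1:.2f}")  # F1 from precision/recall: 0.85
```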
+### Key Features
+- 🔮 **Single prediction**: predicts the risk for one employee (JSON)
+- 📦 **Batch prediction**: processes complete CSV files (1,000+ employees)
+- 🔐 **Authentication**: secure API key (production)
+- 🛡️ **Rate limiting**: 20 req/min to prevent abuse
+- 📊 **Monitoring**: health check and structured JSON logs
+- 🎨 **Gradio interface**: web UI for interactive testing
+- 📚 **Auto-generated documentation**: built-in Swagger UI and ReDoc
+- 🗄️ **Traceability**: every prediction is stored in PostgreSQL
+**Current version**: 3.2.1 | **Last updated**: January 2026
+---
+## 🏗️ Architecture
+### High-Level Overview
+```
+┌──────────────┐         ┌──────────────┐         ┌──────────────┐
+│   CLIENT     │────────▶│   REST API   │────────▶│  DATABASE    │
+│              │  JSON   │   (FastAPI)  │  SQL    │              │
+│  • curl      │         │              │         │ (PostgreSQL) │
+│  • Python    │         │  • Validation│         │              │
+│  • JS        │◀────────│  • Auth.     │◀────────│  • dataset   │
+│  • Postman   │  200 OK │  • Logging   │  SELECT │  • ml_logs   │
+└──────────────┘         └──────┬───────┘         └──────────────┘
+                                │
+                                ▼
+                         ┌──────────────┐
+                         │   ML MODEL   │
+                         │  (XGBoost +  │
+                         │    SMOTE)    │
+                         │              │
+                         │ HF Hub Cache │
+                         └──────────────┘
+```
+### Prediction Pipeline
+```
+Raw data
+    │
+    ▼
+┌─────────────────────┐
+│  1. VALIDATION      │  Pydantic checks types, constraints and enumerations
+│     (Pydantic)      │  → Rejects invalid data (HTTP 422)
+└─────────┬───────────┘
+          │
+          ▼
+┌─────────────────────┐
+│  2. PREPROCESSING   │  • Feature engineering (ratios, averages)
+│     (StandardScaler)│  • One-hot encoding (unordered categorical features)
+│                     │  • Ordinal encoding (business travel frequency)
+└─────────┬───────────┘  • Scaling (StandardScaler)
+          │
+          ▼
+┌─────────────────────┐
+│  3. PREDICTION      │  XGBoost predicts the class (0/1) + probabilities
+│     (XGBoost)       │  • 0 = Stays with the company
+└─────────┬───────────┘  • 1 = Will leave the company
+          │
+          ▼
+┌─────────────────────┐
+│  4. POST-PROCESSING │  • Computes the risk level (Low/Medium/High)
+│     (API)           │  • Stores the prediction in the DB (ml_logs)
+└─────────┬───────────┘  • Structured JSON logging
+          │
+          ▼
+    JSON response
+```
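Steps 2 and 4 boil down to simple formulas. The plain-Python sketch below shows StandardScaler's z-score and one way to turn the class-1 probability into the response fields. It is not the project's `src/preprocessing.py` or API code: the 0.5 decision threshold and the 0.3/0.7 risk cut-offs are hypothetical values chosen for illustration, since the README does not state them.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the README does not state the thresholds the API uses.
LOW_THRESHOLD = 0.3
HIGH_THRESHOLD = 0.7


def standard_scale(value: float, mean: float, std: float) -> float:
    """Step 2: StandardScaler's z = (x - mean) / std, using training-set statistics.

    A zero std is replaced by 1, as scikit-learn's StandardScaler does.
    """
    return (value - mean) / (std if std else 1.0)


def risk_level(probability_1: float) -> str:
    """Step 4: map the 'leave' probability to Low/Medium/High (assumed thresholds)."""
    if probability_1 < LOW_THRESHOLD:
        return "Low"
    if probability_1 < HIGH_THRESHOLD:
        return "Medium"
    return "High"


@dataclass
class PredictionResponse:
    prediction: int
    probability_0: float
    probability_1: float
    risk_level: str


def post_process(probability_1: float, decision_threshold: float = 0.5) -> PredictionResponse:
    """Step 4: turn the model's class-1 probability into the /predict response fields."""
    return PredictionResponse(
        prediction=int(probability_1 >= decision_threshold),
        probability_0=round(1.0 - probability_1, 4),
        probability_1=round(probability_1, 4),
        risk_level=risk_level(probability_1),
    )


print(post_process(0.15))
```

With `post_process(0.15)` the sketch reproduces the example response shown later in this README (prediction 0, probabilities 0.85/0.15, risk level Low).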
+### Project Structure
+```
+OC_P5/
+├── api.py                      # 🚪 Main FastAPI entry point
+├── app.py                      # 🎨 Gradio entry point (HF Spaces)
+├── src/
+│   ├── auth.py                 # 🔐 API key authentication
+│   ├── config.py               # ⚙️ Centralized configuration (.env)
+│   ├── logger.py               # 📝 Structured JSON logging
+│   ├── models.py               # 🤖 Loads the model from the HuggingFace Hub
+│   ├── preprocessing.py        # 🔧 Preprocessing pipeline
+│   ├── rate_limit.py           # 🛡️ Rate limiting (SlowAPI)
+│   ├── schemas.py              # ✅ Pydantic validation (29 fields)
+│   └── gradio_ui.py            # 🎨 Gradio web interface
+├── tests/                      # ✅ Test suite (97 tests, 70% coverage)
+│   ├── test_api_auth.py        # Authentication tests
+│   ├── test_api_predict.py     # Prediction tests
+│   ├── test_api_validation.py  # Pydantic validation tests
+│   ├── test_database.py        # PostgreSQL tests
+│   └── test_model.py           # ML model tests
+├── ml_model/                   # 🎓 Training scripts
+│   ├── main.py                 # Full training pipeline
+│   ├── train_model.py          # XGBoost training + MLflow
+│   └── preprocess.py           # Dataset preprocessing
+├── scripts/                    # 🔧 Utility scripts
+│   ├── create_db.py            # Creates the PostgreSQL database
+│   └── insert_dataset.py       # Inserts the data
+├── docs/                       # 📚 Full documentation
+│   ├── API_GUIDE.md            # Detailed API guide
+│   ├── MODEL_TECHNICAL.md      # Model technical documentation
+│   ├── DEPLOYMENT.md           # Deployment guide
+│   ├── TRAINING.md             # Training guide
+│   └── database_guide.md       # PostgreSQL guide
+├── data/                       # 📊 Source data (1,470 employees)
+│   ├── extrait_sondage.csv     # Satisfaction survey data
+│   ├── extrait_eval.csv        # Performance review data
+│   └── extrait_sirh.csv        # Administrative HR data
+├── logs/                       # 📋 JSON logs
+│   ├── api.log                 # All events
+│   └── error.log               # Errors only
+├── .github/workflows/          # 🔄 CI/CD
+│   └── ci-cd.yml               # GitHub Actions (lint, test, deploy)
+├── pyproject.toml              # 📦 Poetry configuration
+├── .env.example                # 🔑 Environment variable template
+└── README.md                   # 📖 This file
+```
+---
+## 🎯 Technical Choices
+### Technology Rationale
+| Technology | Alternative | Why this choice? |
+|-------------|-------------|---------------------|
+| **FastAPI** | Flask, Django REST | ✅ **Native typing** (automatic validation via Pydantic)<br>✅ **Auto-generated documentation** (Swagger/ReDoc)<br>✅ **Performance** (async, +200% vs Flask)<br>✅ **Modern** (Python 3.12, type hints) |
+| **PostgreSQL** | MongoDB, SQLite | ✅ **Relational**, suited to structured HR data<br>✅ **ACID** transactions to guarantee integrity<br>✅ **Scalability** (indexes, partitioning)<br>✅ **Mature tooling** (DBeaver, pgAdmin) |
+| **XGBoost** | Random Forest, NN | ✅ **Performance** on tabular data<br>✅ Built-in **regularization** (limits overfitting)<br>✅ Native **feature importance**<br>✅ **Fast** (parallelized) |
+| **SMOTE** | Class weights, Under-sampling | ✅ **Generates synthetic examples** (instead of duplicating existing ones)<br>✅ **Less overfitting** than random oversampling by duplication<br>✅ **Integrated in imblearn** (CV-safe when used inside an imblearn `Pipeline`)<br>✅ +7% F1 vs class weights |
+| **Pydantic** | Marshmallow, Cerberus | ✅ **Fast validation** (core written in Rust, via pydantic-core)<br>✅ **Clear error messages**<br>✅ Native **FastAPI integration**<br>✅ **Type safety** (runtime validation of type hints) |
+| **HuggingFace Hub** | S3, GCP Storage | ✅ **Free** up to 100GB<br>✅ Automatic **versioning**<br>✅ **Global CDN** (low latency)<br>✅ Active ML **community** |
+| **Poetry** | pip, conda | ✅ **Lock file** (reproducible installs)<br>✅ **Dependency management** (conflict resolution)<br>✅ Built-in **build/publish**<br>✅ Modern **pyproject.toml** standard |
+| **GitHub Actions** | GitLab CI, Jenkins | ✅ **Free** for public repositories<br>✅ Native **GitHub integration**<br>✅ **Marketplace** of ready-made actions<br>✅ Simplified **HF deployment** |
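The SMOTE row relies on interpolation rather than duplication: for a minority-class sample x_i and one of its k nearest minority neighbours x_nn, a synthetic point is x_i + λ·(x_nn - x_i), with λ drawn uniformly from [0, 1). The standard-library sketch below illustrates only that idea; the project uses imbalanced-learn's `SMOTE`, and the `smote` and `k_nearest` helpers here are hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def k_nearest(sample: Point, candidates: list[Point], k: int) -> list[Point]:
    """Return the k candidates closest to `sample` (Euclidean), excluding the sample itself."""
    others = [c for c in candidates if c is not sample]
    return sorted(others, key=lambda c: math.dist(sample, c))[:k]


def smote(minority: list[Point], n_new: int, k: int = 5, seed: int = 0) -> list[Point]:
    """Generate n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic: list[Point] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # lambda in [0, 1): position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
print(smote(minority, n_new=3, k=2))
```

Every synthetic point lies on a segment between two real minority samples, which is why SMOTE adds variety without simply repeating rows.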
+### Technical Architecture
+**Pattern**: **3-tier architecture** (presentation, business logic, data)
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    PRESENTATION LAYER                        │
+│  • FastAPI (REST API)                                       │
+│  • Gradio (Web UI)                                          │
+│  • Swagger/ReDoc (Interactive documentation)                │
+└────────────────────────┬────────────────────────────────────┘
+                         │
+┌────────────────────────▼────────────────────────────────────┐
+│                     BUSINESS LAYER                           │
+│  • Validation (Pydantic)                                    │
+│  • Authentication (API Key)                                 │
+│  • Rate Limiting (SlowAPI)                                  │
+│  • Preprocessing (Feature Engineering)                      │
+│  • Prediction (XGBoost Model)                               │
+│  • Logging (Structured JSON)                                │
+└────────────────────────┬────────────────────────────────────┘
+                         │
+┌────────────────────────▼────────────────────────────────────┐
+│                      DATA LAYER                              │
+│  • PostgreSQL (Prediction traceability)                     │
+│  • HuggingFace Hub (Cached ML model)                        │
+│  • CSV Files (Source data)                                  │
+└─────────────────────────────────────────────────────────────┘
+```
+---
+## ⚙️ Installation
+### Prerequisites
+| Tool | Version | Installation |
+|-------|---------|--------------|
+| **Python** | 3.12+ | [python.org](https://www.python.org/downloads/) |
+| **Poetry** | 1.7+ | `curl -sSL https://install.python-poetry.org \| python3 -` |
+| **PostgreSQL** | 14+ | [postgresql.org](https://www.postgresql.org/download/) or Docker |
+| **Git** | 2.0+ | [git-scm.com](https://git-scm.com/downloads) |
+### Step 1: Clone the Repository
+```bash
+git clone https://github.com/chaton59/OC_P5.git
+cd OC_P5
+```
+### Step 2: Install the Dependencies
+```bash
+# Install with Poetry (recommended)
+poetry install
+# Activate the virtual environment (Poetry 1.x; on Poetry 2.x use: eval $(poetry env activate))
+poetry shell
+# OR use pip (fallback)
+pip install -r requirements.txt
+```
+### Step 3: Configure the Environment
+```bash
+# Copy the template
+cp .env.example .env
+# Edit .env with your values
+nano .env  # or vim, code, etc.
+```
+**Variables to configure** (`.env`):
+```bash
+# === MODE ===
+DEBUG=true  # set to false in production (enables auth + rate limiting)
+# === API ===
+API_KEY=your-secret-api-key-here  # Generate with: python -c "import secrets; print(secrets.token_urlsafe(32))"
+LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
+# === DATABASE (PostgreSQL) ===
+DB_HOST=localhost
+DB_PORT=5432
+DB_NAME=oc_p5_db
+DB_USER=ml_user
+DB_PASSWORD=your-secure-password  # Change this!
+# === HUGGINGFACE ===
+HF_MODEL_REPO=ASI-Engineer/employee-turnover-model
+MODEL_FILENAME=model/model.pkl
+# HF_TOKEN=hf_xxx  # Optional (public models)
+```
+### Step 4: Set Up the PostgreSQL Database
+#### Option A: Local PostgreSQL installation
+```bash
+# Ubuntu/Debian
+sudo apt update
+sudo apt install postgresql postgresql-contrib
+# macOS (via Homebrew)
+brew install postgresql@14
+brew services start postgresql@14
+# Windows: download from https://www.postgresql.org/download/windows/
+```
+#### Option B: Docker (recommended for development)
+```bash
+# Start PostgreSQL in a container
+docker run --name oc_p5_postgres \
+  -e POSTGRES_USER=ml_user \
+  -e POSTGRES_PASSWORD=your-password \
+  -e POSTGRES_DB=oc_p5_db \
+  -p 5432:5432 \
+  -d postgres:14
+```
+#### Create the tables
+```bash
+# Create the tables (dataset, ml_logs)
+poetry run python scripts/create_db.py
+# Insert the dataset (1,470 employees)
+poetry run python scripts/insert_dataset.py
+# Check the insertion
+psql -h localhost -U ml_user -d oc_p5_db -c "SELECT COUNT(*) FROM dataset;"
+# Expected result: 1470
+```
+**Database schema**:
+![Database schema](docs/schema.png)
+📖 **Full beginner's guide**: [docs/database_guide.md](docs/database_guide.md)
+### Step 5: Verify the Installation
+```bash
+# Check that everything works
+poetry run pytest tests/ -v
+# Expected result: 97 tests passed (or 86 if the deployment tests are skipped)
+```
+---
 ## 🚀 Usage
+### Run the API Locally
 ```bash
+# Development mode (with auto-reload)
+poetry run uvicorn api:app --reload --host 127.0.0.1 --port 8000
+# Production mode
+poetry run uvicorn api:app --host 0.0.0.0 --port 8000 --workers 4
+```
+**Available URLs**:
+| Service | URL | Description |
+|---------|-----|-------------|
+| **API** | http://localhost:8000 | Main endpoint |
+| **Swagger UI** | http://localhost:8000/docs | Interactive documentation |
+| **ReDoc** | http://localhost:8000/redoc | Alternative documentation |
+| **Health Check** | http://localhost:8000/health | API status |
+| **Gradio UI** | http://localhost:8000/ui | Web interface (if enabled) |
+### API Call Examples
+#### 1. Health Check
+```bash
+curl http://localhost:8000/health
+```
+**Response**:
+```json
+{
+  "status": "healthy",
+  "model_loaded": true,
+  "model_type": "Pipeline",
+  "version": "3.2.1"
+}
+```
+#### 2. Single Prediction (JSON)
+```bash
+# Without authentication (DEBUG=true)
+curl -X POST http://localhost:8000/predict \
   -H "Content-Type: application/json" \
   -d '{
     "age": 35,
     "genre": "M",
     "revenu_mensuel": 4500.0,
+    "satisfaction_employee_environnement": 3,
+    ...
   }'
+# With authentication (DEBUG=false)
+curl -X POST http://localhost:8000/predict \
+  -H "X-API-Key: your-secret-key" \
+  -H "Content-Type: application/json" \
+  -d @employee.json
+```
+**Response**:
+```json
+{
+  "prediction": 0,
+  "probability_0": 0.85,
+  "probability_1": 0.15,
+  "risk_level": "Low"
+}
 ```
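A client can sanity-check this response before using it. The helper below is a hypothetical client-side check that assumes the four fields shown above; the rule that `prediction` equals 1 exactly when `probability_1` is at least 0.5 is an assumption, since the README does not document the decision threshold.

```python
import json
import math

REQUIRED_FIELDS = {"prediction", "probability_0", "probability_1", "risk_level"}


def check_prediction(payload: str, threshold: float = 0.5) -> dict:
    """Parse a /predict response body and raise ValueError if it looks inconsistent."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # The two class probabilities should sum to 1 (up to rounding).
    if not math.isclose(data["probability_0"] + data["probability_1"], 1.0, abs_tol=1e-6):
        raise ValueError("probabilities do not sum to 1")
    # Assumed decision rule: class 1 when probability_1 >= threshold.
    if data["prediction"] != int(data["probability_1"] >= threshold):
        raise ValueError("prediction disagrees with probability_1")
    if data["risk_level"] not in {"Low", "Medium", "High"}:
        raise ValueError(f"unexpected risk level: {data['risk_level']}")
    return data


example = '{"prediction": 0, "probability_0": 0.85, "probability_1": 0.15, "risk_level": "Low"}'
print(check_prediction(example)["risk_level"])  # Low
```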
+#### 3. Batch Prediction (CSV)
 ```bash
+curl -X POST http://localhost:8000/predict/batch \
   -H "X-API-Key: your-key" \
+  -F "sondage_file=@data/extrait_sondage.csv" \
+  -F "eval_file=@data/extrait_eval.csv" \
+  -F "sirh_file=@data/extrait_sirh.csv"
 ```
+**Response**:
 ```json
 {
   "total_employees": 1470,
 }
 ```
+### Python Usage (SDK)
+```python
+import requests
+# Configuration
+API_URL = "http://localhost:8000/predict"
+API_KEY = "your-secret-key"
+# Employee data
+employee = {
+    "age": 28,
+    "genre": "F",
+    "revenu_mensuel": 3200.0,
+    "departement": "Consulting",
+    # ... (all 29 required fields)
+}
+# API call (json= sets the Content-Type header; the timeout avoids hanging forever)
+response = requests.post(
+    API_URL,
+    headers={"X-API-Key": API_KEY},
+    json=employee,
+    timeout=10,
+)
+# Result: raise on HTTP errors (e.g. 422 for invalid data) instead of ignoring them silently
+response.raise_for_status()
+result = response.json()
+print(f"Departure risk: {result['probability_1']:.0%}")
+print(f"Level: {result['risk_level']}")
+```
+📚 **Full documentation**: [docs/API_GUIDE.md](docs/API_GUIDE.md)
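When `DEBUG=false` the API is rate limited to 20 requests per minute, so a script that loops over many employees should pace itself (going over the limit makes SlowAPI answer with HTTP 429 by default). Below is a hedged sketch of a client-side sliding-window limiter; the `SlidingWindowLimiter` class is hypothetical, and its clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_calls` calls in any rolling `period` seconds."""

    def __init__(self, max_calls: int = 20, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if it is allowed now)."""
        now = self.clock()
        # Forget calls that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            time.sleep(delay)
        self.calls.append(self.clock())
```

Call `limiter.acquire()` right before each `requests.post(...)` in the loop; with the defaults it allows at most 20 calls in any 60-second window.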
+---
+## 🌐 Deployment
+### Available Environments
+| Environment | Git Branch | HuggingFace Spaces URL | Status |
+|---------------|-------------|------------------------|--------|
+| **Production** | `main` | https://asi-engineer-oc-p5.hf.space | ✅ Live |
+| **Development** | `dev` | https://asi-engineer-oc-p5-dev.hf.space | 🚧 Testing |
+### CI/CD Pipeline (GitHub Actions)
+The `.github/workflows/ci-cd.yml` workflow runs automatically on every push:
+```mermaid
+graph LR
+    A[Push Code] --> B[Lint: Black + Flake8]
+    B --> C[Tests: pytest 97 tests]
+    C --> D[Test API Server]
+    D --> E{Branch?}
+    E -->|dev| F[Deploy HF Dev]
+    E -->|main| G[Deploy HF Prod]
+```
+**Pipeline jobs**:
+1. **Lint** (~30s): Black (formatting) + Flake8 (code quality)
+2. **Tests** (~3min): pytest with coverage (70%)
+3. **Test API Server** (~2min): starts uvicorn, then tests `/health` and `/predict`
+4. **Deploy**: automatic deployment to HuggingFace Spaces
+⚡ **Total time**: ~5-7 minutes (under the required 10 min)
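Job 3 has to wait until uvicorn is actually serving before it calls `/health` and `/predict`. A standard-library sketch of such a readiness probe follows, assuming the `/health` payload shown earlier (`status` and `model_loaded` fields). It is not taken from `ci-cd.yml`; `wait_until_healthy` is a hypothetical helper, and its `fetch` parameter exists only so it can be tested without a server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_healthy(url: str = "http://127.0.0.1:8000/health", attempts: int = 30,
                       delay: float = 2.0,
                       fetch: Callable[[str], dict] = fetch_health) -> dict:
    """Poll /health until the API reports it is healthy with the model loaded."""
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            payload = fetch(url)
            if payload.get("status") == "healthy" and payload.get("model_loaded"):
                return payload
        except (OSError, ValueError) as exc:
            # Connection refused, timeouts and HTTP errors are OSError; bad JSON is ValueError.
            last_error = exc
        time.sleep(delay)
    raise TimeoutError(f"API not healthy after {attempts} attempts (last error: {last_error})")


# Example, right after starting uvicorn in the background:
# wait_until_healthy()
```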
+### Manual Deployment to HuggingFace Spaces
+#### Prerequisites
+```bash
+# Install the HuggingFace CLI
+pip install huggingface_hub
+# Log in
+huggingface-cli login
+# Enter your token (create one at https://huggingface.co/settings/tokens)
+```
+#### Push to HF Spaces
+```bash
+# 1. Add the HF remote
+git remote add space https://huggingface.co/spaces/ASI-Engineer/oc_p5
+# 2. Push to HF
+git push space main
+# 3. Check the deployment
+# Visit https://huggingface.co/spaces/ASI-Engineer/oc_p5
+```
+#### Configuring HF Spaces Secrets
+In the HuggingFace Space settings, add:
+| Variable | Value | Description |
+|----------|--------|-------------|
+| `API_KEY` | `your-secure-key` | API authentication |
+| `DEBUG` | `false` | Production mode |
+| `LOG_LEVEL` | `INFO` | Log level |
+### Docker Deployment (Alternative)
+```bash
+# Build the image
+docker build -t employee-turnover-api .
+# Run the container
+docker run -d \
+  -p 8000:8000 \
+  -e API_KEY=your-key \
+  -e DEBUG=false \
+  --name turnover-api \
+  employee-turnover-api
+# Check
+curl http://localhost:8000/health
+```
+📖 **Full guide**: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)
+---
+## 🔄 Updating
+### Updating the Code
+```bash
+# 1. Fetch the latest changes
+git pull origin main
+# 2. Install the dependency versions pinned in poetry.lock
+#    (`poetry update` would instead upgrade them and rewrite the lock file)
+poetry install
+# 3. Apply DB migrations (if needed)
+poetry run python scripts/migrate_db.py
+# 4. Restart the API
+poetry run uvicorn api:app --reload
+```
+### Retraining the Model
+**Recommended frequency**: every 3 months (or when drift is detected)
+```bash
+# 1. Prepare the new data
+cp /path/to/new/data/*.csv data/
+# 2. Run training (with MLflow tracking)
+cd ml_model
+poetry run python main.py
+# 3. Compare performance
+poetry run mlflow ui
+# Open http://localhost:5000
+# 4. If the F1 score is ≥ 0.83, export the model
+poetry run python -c "
+import joblib
+import mlflow
+client = mlflow.tracking.MlflowClient()
+model_version = client.get_latest_versions('XGBoost_Employee_Turnover')[0]
+model = mlflow.sklearn.load_model(model_version.source)
+joblib.dump(model, 'model.pkl')
+"
+# 5. Upload to the HuggingFace Hub
+poetry run python -c "
+from huggingface_hub import HfApi
+api = HfApi()
+api.upload_file(
+    path_or_fileobj='model.pkl',
+    path_in_repo='model/model.pkl',
+    repo_id='ASI-Engineer/employee-turnover-model',
+    commit_message='Update model v1.1 - F1=0.87'
+)
+"
+# 6. Create a Git tag for versioning
+git tag -a model-v1.1 -m "Model update: F1=0.87, Recall=0.89"
+git push origin model-v1.1
+```
+### Drift Monitoring
+```python
+# Drift detection script (to be automated monthly)
+import pandas as pd
+from scipy.stats import ks_2samp
+train_data = pd.read_csv('data/extrait_sirh.csv')
+new_data = pd.read_csv('logs/recent_predictions.csv')
+for col in ['age', 'revenu_mensuel', 'annees_dans_l_entreprise']:
+    # Drop missing values so they do not distort the test
+    statistic, pvalue = ks_2samp(train_data[col].dropna(), new_data[col].dropna())
+    if pvalue < 0.05:
+        print(f'⚠️ DRIFT detected on {col} (p={pvalue:.4f})')
+        # → Trigger retraining
+```
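`ks_2samp` measures the largest vertical gap between the two samples' empirical CDFs, D = max over x of |F_train(x) - F_new(x)|. The standard-library sketch below computes only that statistic to make the idea concrete; it does not compute a p-value, so the script above should keep using `scipy.stats.ks_2samp` for the actual test.

```python
from bisect import bisect_right


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is reached at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_statistic([25, 30, 35, 40], [45, 50, 55, 60]))  # 1.0
```

A value near 0 means the two distributions overlap closely; a value of 1.0, as in this toy example with disjoint age ranges, means they do not overlap at all.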
+📖 **Full guide**: [docs/MODEL_TECHNICAL.md](docs/MODEL_TECHNICAL.md#maintenance-et-mise-à-jour)
+---
+## ✅ Tests
+### Full Test Suite
+```bash
+# Run all tests
+poetry run pytest tests/ -v
+# With a coverage report
+poetry run pytest tests/ --cov=. --cov-report=term-missing
+# With an HTML report
+poetry run pytest tests/ --cov=. --cov-report=html
+open htmlcov/index.html  # macOS; use xdg-open on Linux
+```
+### Metrics
+| Metric | Value | Details |
+|----------|--------|--------|
+| **Tests** | 97 | 86 passed, 11 skipped (deployment) |
+| **Coverage** | 70.26% | Target: ≥ 70% |
+| **Duration** | ~4s | Total run time |
+| **Files** | 9 | test_api_*.py, test_database.py, test_model.py |
+### Test Categories
+- ✅ **Authentication** (11 tests): API key, headers, rate limiting
+- ✅ **Health check** (6 tests): status, model loaded, versioning
+- ✅ **Prediction** (9 tests): `/predict` endpoint, probabilities, consistency
+- ✅ **Validation** (15 tests): Pydantic, types, enumerations, bounds
+- ✅ **Database** (7 tests): connection, CRUD, integrity
+- ✅ **Functional** (19 tests): end-to-end, performance, errors
+- ✅ **ML model** (23 tests): HF loading, preprocessing, predictions
+- ✅ **Deployed API** (7 skipped tests): tests against HF Spaces
+📊 **Coverage breakdown**:
+| Module | Coverage | Lines | Missed |
+|--------|------------|--------|------------|
+| `src/config.py` | 100% | 20 | 0 |
+| `src/schemas.py` | 100% | 100 | 0 |
+| `src/rate_limit.py` | 100% | 10 | 0 |
+| `db_models.py` | 100% | 14 | 0 |
+| `src/logger.py` | 90.32% | 62 | 6 |
+| `src/preprocessing.py` | 76.36% | 55 | 13 |
+| `api.py` | 55.41% | 157 | 70 |
+---
+## 📚 Documentation
+| Document | Description |
+|----------|-------------|
+| [📖 README.md](README.md) | Overview and quick-start guide (this file) |
+| [🔌 API_GUIDE.md](docs/API_GUIDE.md) | Full API guide (endpoints, schemas, examples) |
+| [🤖 MODEL_TECHNICAL.md](docs/MODEL_TECHNICAL.md) | Model technical documentation (architecture, performance, maintenance) |
+| [🚀 DEPLOYMENT.md](docs/DEPLOYMENT.md) | Deployment guide (Docker, HF Spaces, CI/CD) |
+| [🎓 TRAINING.md](docs/TRAINING.md) | Model training guide (preprocessing, MLflow) |
+| [🗄️ database_guide.md](docs/database_guide.md) | PostgreSQL guide for beginners |
+| [📊 DOCUMENTATION_INVENTORY.md](docs/DOCUMENTATION_INVENTORY.md) | Complete documentation inventory |
+| [📐 schema.puml](docs/schema.puml) | UML diagram of the database |
+**Interactive documentation**:
+- 🌐 **Swagger UI** : http://localhost:8000/docs
+- 📘 **ReDoc** : http://localhost:8000/redoc
+---
+## 📦 Main Dependencies
+| Package | Version | Role |
+|---------|---------|------|
+| **FastAPI** | 0.115.14 | REST API framework |
+| **Pydantic** | 2.12.5 | Data validation |
+| **XGBoost** | 2.1.3 | ML model |
+| **imbalanced-learn** | 0.12.0 | SMOTE (class rebalancing) |
+| **SQLAlchemy** | 2.0.23 | PostgreSQL ORM |
+| **psycopg2-binary** | 2.9.9 | PostgreSQL driver |
+| **SlowAPI** | 0.1.9 | Rate limiting |
+| **python-json-logger** | 4.0.0 | Structured logs |
+| **pytest** | 9.0.2 | Unit tests |
+| **MLflow** | 2.9.2 | ML experiment tracking |
+| **Gradio** | 4.13.0 | Web interface |
+See [pyproject.toml](pyproject.toml) for the full list.
+---
+## 🔄 Changelog
+### v3.3.0 (January 2026)
+- 📚 **Full documentation** for OpenClassrooms Step 6
+- 📝 Created 13 new documentation files (~5,000 lines)
+- 🌐 Set up an MkDocs site with the Material theme (17 HTML pages)
+- 📊 Complete inventory of the existing documentation
+- 🔧 README restructured following Best-README-Template (841 lines)
+- 📖 Comprehensive API guide with 7 examples (curl, Python, JS), 981 lines
+- 🤖 Model technical documentation with diagrams and rationale, 393 lines
+- 📈 Model performance visualization (model_performance.png)
+- ✅ Full review: links, consistency, tested instructions
+### v3.2.1 (January 2026)
+- 🎛️ Gradio sliders and Pydantic schemas aligned with the actual min/max of the training data
+- 📦 Batch CSV endpoint (3 raw files)
+- 🔑 API key authentication (prod)
+- 🔧 Preprocessing fix (scaling, column order)
+- 📝 Expanded documentation (API_GUIDE, MODEL_TECHNICAL)
+### v2.2.0 (December 27, 2025)
+- 📦 New `/predict/batch` endpoint for direct CSV processing
+- 🔧 Preprocessing fix: added feature scaling
+- 🔧 Preprocessing fix: corrected column order
+- 📊 Improved prediction accuracy (~90%)
+### v2.1.0 (December 26, 2025)
+- ✨ Structured JSON logging system
+- 🛡️ Rate limiting with SlowAPI
+- ⚡ Improved error handling
+- 📊 Performance monitoring
+### v2.0.0 (December 26, 2025)
+- ✅ Full test suite (97 tests)
+- 🔐 API key authentication
+- 📊 70% code coverage
+---
+## 👥 Authors
+**Developer**: Valentin (chaton59)
+**Project**: OpenClassrooms P5 - Déployez votre modèle de Machine Learning (Deploy your Machine Learning model)
+**GitHub repo**: [github.com/chaton59/OC_P5](https://github.com/chaton59/OC_P5)
+**HuggingFace**: [ASI-Engineer](https://huggingface.co/ASI-Engineer)
+---
+## 📄 License
+This project was developed for educational purposes (OpenClassrooms).
+The data used is fictitious.
+---
+## 🤝 Contributing
+Contributions are welcome! To contribute:
+1. Fork the project
+2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
+4. Push to the branch (`git push origin feature/AmazingFeature`)
+5. Open a pull request
+---
+## 📞 Contact & Support
+- **GitHub issues**: [github.com/chaton59/OC_P5/issues](https://github.com/chaton59/OC_P5/issues)
+- **Discussions**: [github.com/chaton59/OC_P5/discussions](https://github.com/chaton59/OC_P5/discussions)
+- **Email**: see GitHub profile
+---
+## 🙏 Acknowledgements
+- **OpenClassrooms** for the Data Scientist program
+- **HuggingFace** for free hosting
+- **FastAPI** for the modern framework
+- **Python ML community** for the open-source libraries
+---
+<div align="center">
+**⭐ If this project helped you, feel free to star it on GitHub! ⭐**
+Made with ❤️ by [chaton59](https://github.com/chaton59)
+</div>

README_HF.md CHANGED

@@ -10,9 +10,9 @@ app_port: 7860
 ---
-# Employee Turnover Prediction API 🚀 (v3.2.1)
-Employee turnover prediction API (XGBoost + SMOTE) with batch endpoints, strict validation and up-to-date documentation.
+# Employee Turnover Prediction API 🚀 (v3.3.0)
+Employee turnover prediction API (XGBoost + SMOTE) with batch endpoints, strict validation and **complete documentation**.
 ## 🎯 Features
@@ -23,7 +23,21 @@ API de prédiction du turnover des employés (XGBoost + SMOTE) avec endpoints ba
 - 🔐 API Key authentication (required)
 - 📝 Structured JSON logs
 - 🛡️ Rate limiting (20 req/min)
-- 📚 OpenAPI/Swagger documentation
+- 📚 **Comprehensive documentation** (OpenClassrooms Step 6)
+## 📚 Full Documentation
+| Document | Description | Lines |
+|----------|-------------|--------|
+| **[README.md](https://github.com/chaton59/OC_P5/blob/main/README.md)** | Complete overview (restructured following Best-README-Template) | 841 |
+| **[API_GUIDE.md](https://github.com/chaton59/OC_P5/blob/main/docs/API_GUIDE.md)** | Comprehensive API guide with 7 examples (curl, Python, JS) | 981 |
+| **[MODEL_TECHNICAL.md](https://github.com/chaton59/OC_P5/blob/main/docs/MODEL_TECHNICAL.md)** | Model technical documentation (architecture, rationale) | 393 |
+| **[DEPLOYMENT.md](https://github.com/chaton59/OC_P5/blob/main/docs/DEPLOYMENT.md)** | Deployment guide (Docker, HF Spaces, CI/CD) | - |
+| **[TRAINING.md](https://github.com/chaton59/OC_P5/blob/main/docs/TRAINING.md)** | Training guide (preprocessing, MLflow) | - |
+| **[MkDocs site](https://github.com/chaton59/OC_P5/tree/main/docs)** | Browsable HTML documentation (17 pages, Material theme) | - |
+**🌐 Documentation site**: preview it locally with `poetry run mkdocs serve`
 ## 🔗 Endpoints

mkdocs.yml ADDED

	@@ -0,0 +1,175 @@

+site_name: Employee Turnover Prediction API
+site_description: Documentation complète de l'API de prédiction du turnover des employés
+site_author: Valentin (chaton59)
+site_url: https://github.com/chaton59/OC_P5
+# Repository
+repo_name: chaton59/OC_P5
+repo_url: https://github.com/chaton59/OC_P5
+edit_uri: edit/main/docs/
+# Copyright
+copyright: Copyright &copy; 2026 Valentin - Projet OpenClassrooms P5
+# Theme
+theme:
+  name: material
+  language: fr
+  palette:
+    # Light mode
+    - media: "(prefers-color-scheme: light)"
+      scheme: default
+      primary: indigo
+      accent: blue
+      toggle:
+        icon: material/brightness-7
+        name: Passer au mode sombre
+    # Dark mode
+    - media: "(prefers-color-scheme: dark)"
+      scheme: slate
+      primary: indigo
+      accent: blue
+      toggle:
+        icon: material/brightness-4
+        name: Passer au mode clair
+  font:
+    text: Roboto
+    code: Roboto Mono
+  features:
+    - navigation.instant      # Instant navigation (SPA-like)
+    - navigation.tracking     # URL tracking
+    - navigation.tabs         # Top-level tabs
+    - navigation.tabs.sticky  # Tabs stay visible on scroll
+    - navigation.sections     # Sections in the sidebar
+    - navigation.expand       # Sections expanded by default
+    - navigation.top          # Back-to-top button
+    - search.suggest          # Search suggestions
+    - search.highlight        # Highlight search matches
+    - search.share            # Share search results
+    - content.code.copy       # Copy button on code blocks
+    - content.tabs.link       # Linked content tabs
+  icon:
+    repo: fontawesome/brands/github
+    logo: material/chart-line
+# Extensions
+markdown_extensions:
+  # Python Markdown
+  - abbr                      # Abbreviations
+  - admonition                # Notes/warnings
+  - attr_list                 # HTML attributes
+  - def_list                  # Definition lists
+  - footnotes                 # Footnotes
+  - md_in_html                # Markdown inside HTML
+  - toc:
+      permalink: true         # Permalinks on headings
+      toc_depth: 3            # Table of contents depth
+  # PyMdown Extensions
+  - pymdownx.arithmatex:
+      generic: true           # Support LaTeX/MathJax
+  - pymdownx.betterem:
+      smart_enable: all       # Smarter emphasis handling
+  - pymdownx.caret            # Superscript
+  - pymdownx.mark             # Highlight
+  - pymdownx.tilde            # Subscript
+  - pymdownx.critic           # Track changes
+  - pymdownx.details          # Collapsible admonitions
+  - pymdownx.emoji:
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg
+  - pymdownx.highlight:
+      anchor_linenums: true   # Anchors for line numbers
+      line_spans: __span
+      pygments_lang_class: true
+  - pymdownx.inlinehilite     # Highlight inline code
+  - pymdownx.keys             # Keyboard keys (++ctrl+c++)
+  - pymdownx.magiclink:
+      repo_url_shorthand: true
+      user: chaton59
+      repo: OC_P5
+  - pymdownx.smartsymbols     # Smart symbols
+  - pymdownx.snippets:
+      auto_append:
+        - includes/abbreviations.md
+  - pymdownx.superfences:     # Advanced code blocks
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
+  - pymdownx.tabbed:
+      alternate_style: true   # Content tabs
+  - pymdownx.tasklist:
+      custom_checkbox: true   # Task lists
+# Plugins
+plugins:
+  - search:
+      lang: fr
+      separator: '[\s\-\.]+'
+  - minify:
+      minify_html: true
+      minify_js: true
+      minify_css: true
+      htmlmin_opts:
+        remove_comments: true
+# Extra
+extra:
+  social:
+    - icon: fontawesome/brands/github
+      link: https://github.com/chaton59
+    - icon: fontawesome/brands/python
+      link: https://www.python.org
+  version:
+    provider: mike
+    default: latest
+  analytics:
+    feedback:
+      title: Cette page vous a-t-elle été utile ?
+      ratings:
+        - icon: material/emoticon-happy-outline
+          name: Cette page m'a été utile
+          data: 1
+          note: >-
+            Merci pour votre retour !
+        - icon: material/emoticon-sad-outline
+          name: Cette page pourrait être améliorée
+          data: 0
+          note: >-
+            Merci pour votre retour. Aidez-nous à améliorer cette page en
+            <a href="https://github.com/chaton59/OC_P5/issues/new" target="_blank" rel="noopener">créant une issue</a>.
+# Navigation
+nav:
+  - Accueil:
+    - Vue d'ensemble: index.md
+    - Changelog: changelog.md
+  - Guide de Démarrage:
+    - Installation: installation.md
+    - Configuration: configuration.md
+    - Premier déploiement: quickstart.md
+  - API:
+    - Guide complet: api/guide.md
+    - Documentation API (complète): API_GUIDE.md
+  - Modèle ML:
+    - Documentation technique: model/technical.md
+    - Documentation complète: MODEL_TECHNICAL.md
+    - Guide d'entraînement: TRAINING.md
+  - Déploiement:
+    - Guide de déploiement: DEPLOYMENT.md
+  - Base de Données:
+    - Guide de la base de données: database_guide.md
+  - Référence:
+    - Inventaire documentation: DOCUMENTATION_INVENTORY.md
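The `nav` above keeps two parallel trees (`api/guide.md` next to `API_GUIDE.md`, `model/technical.md` next to `MODEL_TECHNICAL.md`), so a renamed or missing page is easy to overlook. MkDocs reports such entries when building, and `mkdocs build --strict` aborts on warnings; a quick pre-check can also help. The sketch below assumes the default `docs_dir` of `docs` (the config does not override it) and transcribes the `nav` as already-parsed Python data, because `yaml.safe_load` rejects the `!!python/name:` tags used elsewhere in this file. `iter_nav_paths` and `missing_nav_pages` are hypothetical helpers, not part of MkDocs.

```python
from pathlib import Path
from typing import Any, Iterator, List

# The nav section of mkdocs.yml, transcribed as parsed YAML data.
NAV = [
    {"Accueil": [{"Vue d'ensemble": "index.md"}, {"Changelog": "changelog.md"}]},
    {"Guide de Démarrage": [
        {"Installation": "installation.md"},
        {"Configuration": "configuration.md"},
        {"Premier déploiement": "quickstart.md"},
    ]},
    {"API": [
        {"Guide complet": "api/guide.md"},
        {"Documentation API (complète)": "API_GUIDE.md"},
    ]},
    {"Modèle ML": [
        {"Documentation technique": "model/technical.md"},
        {"Documentation complète": "MODEL_TECHNICAL.md"},
        {"Guide d'entraînement": "TRAINING.md"},
    ]},
    {"Déploiement": [{"Guide de déploiement": "DEPLOYMENT.md"}]},
    {"Base de Données": [{"Guide de la base de données": "database_guide.md"}]},
    {"Référence": [{"Inventaire documentation": "DOCUMENTATION_INVENTORY.md"}]},
]


def iter_nav_paths(node: Any) -> Iterator[str]:
    """Yield every page path in a nav tree made of str, list and {title: child}."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_nav_paths(item)
    elif isinstance(node, dict):
        for child in node.values():
            yield from iter_nav_paths(child)


def missing_nav_pages(nav: Any, docs_dir: str = "docs") -> List[str]:
    """Return nav entries with no matching file under docs_dir (URLs skipped)."""
    root = Path(docs_dir)
    return [
        page
        for page in iter_nav_paths(nav)
        if not page.startswith(("http://", "https://"))
        and not (root / page).is_file()
    ]


if __name__ == "__main__":
    missing = missing_nav_pages(NAV)
    print("Missing pages:", missing or "none")
```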

src/config.py CHANGED

@@ -26,7 +26,7 @@ class Settings:
     API_KEY: str = os.getenv("API_KEY", "dev-key-change-me-in-production")
     # ===== API =====
-    API_VERSION: str = os.getenv("API_VERSION", "2.2.0")
     API_HOST: str = os.getenv("API_HOST", "0.0.0.0")
     API_PORT: int = int(os.getenv("API_PORT", "8000"))

     API_KEY: str = os.getenv("API_KEY", "dev-key-change-me-in-production")
     # ===== API =====
+    API_VERSION: str = os.getenv("API_VERSION", "3.3.0")
     API_HOST: str = os.getenv("API_HOST", "0.0.0.0")
     API_PORT: int = int(os.getenv("API_PORT", "8000"))
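The bump only changes the fallback: `API_VERSION` is still read from the environment first. Because the `os.getenv` call sits in the class body, it runs once, when the class is created (at import of `src.config`), so an environment change made after import is not seen by that class. The sketch below shows this with a hypothetical stand-in reduced to `API_VERSION`; the real `Settings` has more fields, and whether `get_settings()` caches its result is not visible in this diff.

```python
import os


def build_settings_class() -> type:
    """Hypothetical stand-in for src.config.Settings, reduced to API_VERSION."""

    class Settings:
        # This os.getenv call runs once, when the class statement executes
        # (import time for a module-level class), not on each attribute access.
        API_VERSION: str = os.getenv("API_VERSION", "3.3.0")

    return Settings


os.environ.pop("API_VERSION", None)
settings_before = build_settings_class()
print(settings_before.API_VERSION)  # 3.3.0 (falls back to the new default)

os.environ["API_VERSION"] = "3.3.1"
print(settings_before.API_VERSION)  # 3.3.0 (class already created)
settings_after = build_settings_class()
print(settings_after.API_VERSION)   # 3.3.1 (new class sees the override)

os.environ.pop("API_VERSION", None)  # clean up the demo variable
```

In practice, an `API_VERSION` override has to be set before the API process starts rather than changed at runtime.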

src/logger.py CHANGED

@@ -13,7 +13,7 @@ import sys
 from pathlib import Path
 from typing import Any, Dict
-from pythonjsonlogger.jsonlogger import JsonFormatter
 from src.config import get_settings

 from pathlib import Path
 from typing import Any, Dict
+from pythonjsonlogger.json import JsonFormatter
 from src.config import get_settings
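The import moves from `pythonjsonlogger.jsonlogger` to `pythonjsonlogger.json`, the module path introduced in python-json-logger 3.1, where the old path remains as a deprecated compatibility module. If some environment might still pin an older release, a guarded import keeps the logger module importable. The sketch below shows one way to do it; `MinimalJsonFormatter` and `build_json_formatter` are hypothetical, the fallback emits only three fields, and none of this is what `src/logger.py` currently does.

```python
import json
import logging

try:
    # python-json-logger >= 3.1: the module path adopted by this commit
    from pythonjsonlogger.json import JsonFormatter
except ImportError:
    try:
        # Older python-json-logger releases (deprecated path since 3.1)
        from pythonjsonlogger.jsonlogger import JsonFormatter
    except ImportError:
        JsonFormatter = None  # library not installed at all


class MinimalJsonFormatter(logging.Formatter):
    """Hypothetical stdlib fallback: one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "levelname": record.levelname,
                "name": record.name,
                "message": record.getMessage(),
            }
        )


def build_json_formatter() -> logging.Formatter:
    """Prefer python-json-logger; otherwise use the minimal fallback."""
    if JsonFormatter is not None:
        return JsonFormatter("%(levelname)s %(name)s %(message)s")
    return MinimalJsonFormatter()


handler = logging.StreamHandler()
handler.setFormatter(build_json_formatter())
logger = logging.getLogger("turnover_api_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("model loaded")  # one JSON line on stderr
```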