Spaces:
Sleeping
Sleeping
github-actions committed on
Commit ·
5fa8558
0
Parent(s):
deploy: snapshot
Browse files- .env.example +11 -0
- .github/workflows/ci.yml +28 -0
- .github/workflows/deploy.yml +32 -0
- .gitignore +11 -0
- Dockerfile +19 -0
- README.md +350 -0
- app/__init__.py +0 -0
- app/core/__init__.py +0 -0
- app/core/config.py +35 -0
- app/db/__init__.py +0 -0
- app/db/engine.py +8 -0
- app/db/queries.py +20 -0
- app/main.py +95 -0
- app/ml/__init__.py +0 -0
- app/ml/loader.py +49 -0
- app/ml/predict.py +84 -0
- app/ml/preprocessing.py +18 -0
- app/schemas/__init__.py +0 -0
- app/schemas/prediction.py +39 -0
- app/security/__init__.py +0 -0
- app/security/auth.py +14 -0
- app/services/__init__.py +0 -0
- app/services/audit.py +31 -0
- app/services/features.py +18 -0
- app/services/predict.py +35 -0
- config/threshold.json +1 -0
- db/01_schema.sql +15 -0
- db/02_raw_tables.sql +58 -0
- db/03_load_raw.sql +35 -0
- db/04_staging.sql +119 -0
- db/05_mart.sql +74 -0
- db/06_audit.sql +44 -0
- db/README_SQL.md +167 -0
- encoder/__init__.py +0 -0
- encoder/custom_encoder.py +54 -0
- requirements.txt +15 -0
- tests/conftest.py +22 -0
- tests/test_api.py +108 -0
- tests/test_audit.py +20 -0
- tests/test_engine.py +9 -0
- tests/test_feature.py +24 -0
- tests/test_health.py +9 -0
- tests/test_predict.py +153 -0
.env.example
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Authentification API
|
| 2 |
+
API_KEY=your_api_key_here
|
| 3 |
+
|
| 4 |
+
# Base de données PostgreSQL
|
| 5 |
+
DATABASE_URL=postgresql://user:password@localhost:5432/technova
|
| 6 |
+
|
| 7 |
+
# Modèle ML (local ou Hugging Face)
|
| 8 |
+
MODEL_PATH=path/to/model.joblib
|
| 9 |
+
HF_MODEL_REPO=donizetti-yoann/technova-ml-model
|
| 10 |
+
HF_MODEL_FILENAME=model.joblib
|
| 11 |
+
HF_TOKEN=your_huggingface_token_here
|
.github/workflows/ci.yml
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: CI
|
| 2 |
+
|
| 3 |
+
on:
|
| 4 |
+
push:
|
| 5 |
+
branches: ["main", "develop", "feature/**"]
|
| 6 |
+
pull_request:
|
| 7 |
+
branches: ["main", "develop"]
|
| 8 |
+
|
| 9 |
+
jobs:
|
| 10 |
+
tests:
|
| 11 |
+
runs-on: ubuntu-latest
|
| 12 |
+
|
| 13 |
+
steps:
|
| 14 |
+
- uses: actions/checkout@v4
|
| 15 |
+
|
| 16 |
+
- uses: actions/setup-python@v5
|
| 17 |
+
with:
|
| 18 |
+
python-version: "3.11"
|
| 19 |
+
cache: "pip"
|
| 20 |
+
|
| 21 |
+
- name: Install dependencies
|
| 22 |
+
run: |
|
| 23 |
+
python -m pip install --upgrade pip
|
| 24 |
+
pip install -r requirements.txt
|
| 25 |
+
|
| 26 |
+
- name: Run tests
|
| 27 |
+
run: |
|
| 28 |
+
python -m pytest -q
|
.github/workflows/deploy.yml
ADDED
|
@@ -0,0 +1,32 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: Deploy to Hugging Face Space
|
| 2 |
+
|
| 3 |
+
on:
|
| 4 |
+
push:
|
| 5 |
+
branches: ["main"]
|
| 6 |
+
workflow_dispatch:
|
| 7 |
+
|
| 8 |
+
jobs:
|
| 9 |
+
deploy:
|
| 10 |
+
runs-on: ubuntu-latest
|
| 11 |
+
steps:
|
| 12 |
+
- name: Checkout (shallow)
|
| 13 |
+
uses: actions/checkout@v4
|
| 14 |
+
with:
|
| 15 |
+
fetch-depth: 1
|
| 16 |
+
|
| 17 |
+
- name: Prepare clean deploy repo (no git history)
|
| 18 |
+
run: |
|
| 19 |
+
rm -rf .git
|
| 20 |
+
git init
|
| 21 |
+
git config user.email "actions@github.com"
|
| 22 |
+
git config user.name "github-actions"
|
| 23 |
+
git add .
|
| 24 |
+
git commit -m "deploy: snapshot"
|
| 25 |
+
|
| 26 |
+
- name: Push snapshot to Hugging Face Space
|
| 27 |
+
env:
|
| 28 |
+
HF_TOKEN: ${{ secrets.HF_TOKEN }}
|
| 29 |
+
HF_SPACE_REPO: ${{ vars.HF_SPACE_REPO }}
|
| 30 |
+
run: |
|
| 31 |
+
git remote add hf https://user:${HF_TOKEN}@huggingface.co/spaces/${HF_SPACE_REPO}
|
| 32 |
+
git push hf HEAD:main --force
|
.gitignore
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
.venv/
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.pyc
|
| 4 |
+
.vscode/
|
| 5 |
+
.idea/
|
| 6 |
+
models/*.joblib
|
| 7 |
+
.env
|
| 8 |
+
data/
|
| 9 |
+
.coverage
|
| 10 |
+
htmlcov/
|
| 11 |
+
coverage.xml
|
Dockerfile
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.11-slim
|
| 2 |
+
|
| 3 |
+
# HF Spaces Docker: le container tourne avec UID 1000
|
| 4 |
+
RUN useradd -m -u 1000 user
|
| 5 |
+
|
| 6 |
+
ENV HOME=/home/user \
|
| 7 |
+
PATH=/home/user/.local/bin:$PATH
|
| 8 |
+
|
| 9 |
+
WORKDIR /home/user/app
|
| 10 |
+
|
| 11 |
+
COPY requirements.txt .
|
| 12 |
+
RUN pip install --no-cache-dir -r requirements.txt
|
| 13 |
+
|
| 14 |
+
COPY . .
|
| 15 |
+
|
| 16 |
+
EXPOSE 7860
|
| 17 |
+
|
| 18 |
+
# Utilise le port attendu par HF (7860 par défaut)
|
| 19 |
+
CMD ["sh", "-c", "uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-7860}"]
|
README.md
ADDED
|
@@ -0,0 +1,350 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Technova ML API
|
| 3 |
+
emoji: 🤖
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: docker
|
| 7 |
+
app_port: 7860
|
| 8 |
+
pinned: false
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# Technova ML API
|
| 12 |
+
|
| 13 |
+
## Sommaire
|
| 14 |
+
- [Présentation du projet](#présentation-du-projet)
|
| 15 |
+
- [Structure du projet](#structure-du-projet)
|
| 16 |
+
- [Architecture globale](#architecture-globale)
|
| 17 |
+
- [Intégration Continue et Déploiement Continu](#intégration-continue-et-déploiement-continu)
|
| 18 |
+
- [Architecture des données (BDD)](#architecture-des-données-bdd)
|
| 19 |
+
- [Modèle de Machine Learning](#modèle-de-machine-learning)
|
| 20 |
+
- [API FastAPI](#api-fastapi)
|
| 21 |
+
- [Sécurité et authentification](#sécurité-et-authentification)
|
| 22 |
+
- [Audit et traçabilité](#audit-et-traçabilité)
|
| 23 |
+
- [Tests et qualité du code](#tests-et-qualité-du-code)
|
| 24 |
+
- [Déploiement](#déploiement)
|
| 25 |
+
- [Installation et utilisation](#installation-et-utilisation)
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
## Présentation du projet
|
| 31 |
+
|
| 32 |
+
**Technova ML API** est une API de prédiction d’attrition des employés basée sur un modèle de Machine Learning.
|
| 33 |
+
Elle permet de prédire la probabilité de départ d’un employé à partir de données RH structurées.
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## Structure du projet
|
| 38 |
+
|
| 39 |
+
Le dépôt est organisé de manière modulaire afin de séparer
|
| 40 |
+
les responsabilités (API, ML, base de données, tests).
|
| 41 |
+
|
| 42 |
+
```text
|
| 43 |
+
technova-ml-api/
|
| 44 |
+
├─ app/
|
| 45 |
+
│ ├─ main.py # Point d’entrée de l’API FastAPI
|
| 46 |
+
│ ├─ core/ # Configuration et settings
|
| 47 |
+
│ ├─ security/ # Authentification (API Key)
|
| 48 |
+
│ ├─ ml/ # Chargement du modèle et prédictions
|
| 49 |
+
│ ├─ services/ # Logique métier (predict, audit)
|
| 50 |
+
│ ├─ db/ # Connexion DB et scripts SQL
|
| 51 |
+
│ └─ schemas/ # Schémas Pydantic (entrées/sorties)
|
| 52 |
+
│
|
| 53 |
+
├─ db/
|
| 54 |
+
│ ├─ schema.sql # Création des schémas PostgreSQL
|
| 55 |
+
│ ├─ raw.sql # Tables de données brutes
|
| 56 |
+
│ ├─ staging.sql # Nettoyage et transformations
|
| 57 |
+
│ ├─ mart.sql # Dataset final pour le modèle ML
|
| 58 |
+
│ └─ audit.sql # Journalisation des prédictions
|
| 59 |
+
│
|
| 60 |
+
├─ tests/ # Tests unitaires et fonctionnels (Pytest)
|
| 61 |
+
│
|
| 62 |
+
├─ .github/workflows/ # Pipeline CI (tests automatiques)
|
| 63 |
+
├─ requirements.txt
|
| 64 |
+
└─ README.md
|
| 65 |
+
```
|
| 66 |
+
---
|
| 67 |
+
|
| 68 |
+
## Architecture globale
|
| 69 |
+
|
| 70 |
+
- API développée avec **FastAPI**
|
| 71 |
+
- Base de données **PostgreSQL**
|
| 72 |
+
- Modèle ML entraîné en amont (hors API), puis chargé au démarrage de l’application
|
| 73 |
+
- Déploiement local et sur **Hugging Face Spaces**
|
| 74 |
+
- Sécurité par **API Key**
|
| 75 |
+
- Tests automatisés avec **Pytest**
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
## Intégration Continue et Déploiement Continu
|
| 81 |
+
|
| 82 |
+
Le projet intègre une démarche d’intégration continue (CI) afin de garantir
|
| 83 |
+
la qualité et la stabilité du code à chaque modification.
|
| 84 |
+
Les mises à jour du code ou du modèle sont réalisées via des commits sur la branche principale,
|
| 85 |
+
déclenchant automatiquement les tests et le redéploiement de l’API grâce au pipeline CI/CD.
|
| 86 |
+
|
| 87 |
+
### Intégration Continue (CI)
|
| 88 |
+
- Pipeline automatisé via **GitHub Actions**
|
| 89 |
+
- Exécution des tests Pytest à chaque push et pull request
|
| 90 |
+
- Validation du code avant fusion sur la branche principale
|
| 91 |
+
- Détection précoce des régressions
|
| 92 |
+
|
| 93 |
+
### Déploiement Continu (CD)
|
| 94 |
+
- Déploiement de l’API sur **Hugging Face Spaces**
|
| 95 |
+
- Gestion des secrets (API Key, accès modèle) via variables d’environnement
|
| 96 |
+
- Séparation des environnements (local / CI / production)
|
| 97 |
+
|
| 98 |
+
Cette approche permet un déploiement fiable, reproductible et sécurisé
|
| 99 |
+
du modèle de Machine Learning exposé par l’API.
|
| 100 |
+
|
| 101 |
+
---
|
| 102 |
+
|
| 103 |
+
## Architecture des données (BDD)
|
| 104 |
+
|
| 105 |
+
Les détails techniques concernant la base de données
|
| 106 |
+
(schémas, tables, scripts SQL et pipeline de transformation)
|
| 107 |
+
sont documentés dans `db/README_SQL.md`.
|
| 108 |
+
|
| 109 |
+
Le schéma ci-dessous présente le flux logique des données,
|
| 110 |
+
de l’ingestion jusqu’à l’audit des prédictions.
|
| 111 |
+
|
| 112 |
+
### Pipeline de données
|
| 113 |
+
|
| 114 |
+
```mermaid
|
| 115 |
+
flowchart TD
|
| 116 |
+
RAW[RAW<br/>Données brutes]
|
| 117 |
+
RAW_SIRH[extrait_sirh]
|
| 118 |
+
RAW_EVAL[extrait_eval]
|
| 119 |
+
RAW_SONDAGE[extrait_sondage]
|
| 120 |
+
|
| 121 |
+
STAGING[STAGING<br/>Nettoyage & normalisation]
|
| 122 |
+
STAGING_EMP[employee_base]
|
| 123 |
+
|
| 124 |
+
MART[MART<br/>Dataset ML]
|
| 125 |
+
MART_EMP[employee_features]
|
| 126 |
+
|
| 127 |
+
AUDIT[AUDIT<br/>Traçabilité API]
|
| 128 |
+
AUDIT_REQ[prediction_requests]
|
| 129 |
+
AUDIT_RES[prediction_responses]
|
| 130 |
+
|
| 131 |
+
RAW --> RAW_SIRH
|
| 132 |
+
RAW --> RAW_EVAL
|
| 133 |
+
RAW --> RAW_SONDAGE
|
| 134 |
+
|
| 135 |
+
RAW_SIRH --> STAGING
|
| 136 |
+
RAW_EVAL --> STAGING
|
| 137 |
+
RAW_SONDAGE --> STAGING
|
| 138 |
+
|
| 139 |
+
STAGING --> STAGING_EMP
|
| 140 |
+
STAGING_EMP --> MART
|
| 141 |
+
MART --> MART_EMP
|
| 142 |
+
|
| 143 |
+
MART_EMP -->|utilisé par le modèle ML| AUDIT
|
| 144 |
+
AUDIT --> AUDIT_REQ
|
| 145 |
+
AUDIT --> AUDIT_RES
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
+
### Description des schémas
|
| 149 |
+
|
| 150 |
+
- **RAW** : données brutes sans transformation.
|
| 151 |
+
- **STAGING** : nettoyage, normalisation et jointure des sources.
|
| 152 |
+
- **MART** : dataset final utilisé par le modèle ML.
|
| 153 |
+
- **AUDIT** : traçabilité complète des appels API et des prédictions.
|
| 154 |
+
|
| 155 |
+
---
|
| 156 |
+
|
| 157 |
+
## Modèle de Machine Learning
|
| 158 |
+
|
| 159 |
+
- Type : classification binaire
|
| 160 |
+
- Cible : départ de l’employé
|
| 161 |
+
- Sortie : probabilité + décision selon un seuil configurable
|
| 162 |
+
- Seuil stocké dans un fichier de configuration
|
| 163 |
+
|
| 164 |
+
|
| 165 |
+
Les performances du modèle ont été évaluées en amont lors du projet de data science,
|
| 166 |
+
et le modèle est ici réutilisé comme un composant validé pour un usage en production.
|
| 167 |
+
|
| 168 |
+
Le modèle peut être remplacé ou mis à jour sans modification de l’API, en respectant le même schéma d’entrée.
|
| 169 |
+
---
|
| 170 |
+
|
| 171 |
+
## API FastAPI
|
| 172 |
+
|
| 173 |
+
### Endpoints principaux
|
| 174 |
+
|
| 175 |
+
- `GET /health` : état de l’API
|
| 176 |
+
- `POST /predict` : prédiction à partir de données fournies
|
| 177 |
+
- `GET /predict/{id_employee}` : prédiction à partir de la base de données
|
| 178 |
+
|
| 179 |
+
L’endpoint `/health` permet de vérifier l’état de l’API (chargement du modèle, seuil, configuration de la base) et peut être utilisé pour le monitoring.
|
| 180 |
+
|
| 181 |
+
La documentation est disponible via Swagger :
|
| 182 |
+
`/docs`
|
| 183 |
+
|
| 184 |
+
---
|
| 185 |
+
### Exemple POST /predict
|
| 186 |
+
|
| 187 |
+
```json
|
| 188 |
+
{
|
| 189 |
+
"age": 41,
|
| 190 |
+
"genre": "femme",
|
| 191 |
+
"revenu_mensuel": 5993,
|
| 192 |
+
"statut_marital": "célibataire",
|
| 193 |
+
"departement": "commercial",
|
| 194 |
+
"poste": "cadre commercial",
|
| 195 |
+
"nombre_experiences_precedentes": 8,
|
| 196 |
+
"annees_dans_l_entreprise": 2,
|
| 197 |
+
"satisfaction_employee_environnement": 4,
|
| 198 |
+
"satisfaction_employee_nature_travail": 1,
|
| 199 |
+
"satisfaction_employee_equipe": 1,
|
| 200 |
+
"satisfaction_employee_equilibre_pro_perso": 1,
|
| 201 |
+
"heure_supplementaires": true,
|
| 202 |
+
"augmentation_salaire_precedente": 11,
|
| 203 |
+
"nombre_participation_pee": 0,
|
| 204 |
+
"nb_formations_suivies": 0,
|
| 205 |
+
"distance_domicile_travail": 1,
|
| 206 |
+
"niveau_education": 2,
|
| 207 |
+
"domaine_etude": "infra & cloud",
|
| 208 |
+
"frequence_deplacement": "occasionnel",
|
| 209 |
+
"annees_sous_responsable_actuel": 0,
|
| 210 |
+
"annees_dans_le_poste_actuel": 0,
|
| 211 |
+
"note_evaluation_actuelle": 0,
|
| 212 |
+
"note_evaluation_precedente": 0,
|
| 213 |
+
"annees_depuis_la_derniere_promotion": 0
|
| 214 |
+
}
|
| 215 |
+
```
|
| 216 |
+
```bash
|
| 217 |
+
curl -X POST http://localhost:8000/predict \
|
| 218 |
+
-H "Content-Type: application/json" \
|
| 219 |
+
-H "X-API-Key: <YOUR_API_KEY>" \
|
| 220 |
+
-d @payload.json
|
| 221 |
+
```
|
| 222 |
+
|
| 223 |
+
### Exemple GET /predict/7
|
| 224 |
+
|
| 225 |
+
Prédiction à partir des données stockées pour l’employé d’identifiant 7.
|
| 226 |
+
|
| 227 |
+
---
|
| 228 |
+
|
| 229 |
+
## Sécurité et authentification
|
| 230 |
+
|
| 231 |
+
- Protection des endpoints sensibles via **API Key**
|
| 232 |
+
- Clé transmise dans le header : `X-API-Key`
|
| 233 |
+
- Gestion des secrets via variables d’environnement
|
| 234 |
+
- Compatible CI/CD et Hugging Face Spaces
|
| 235 |
+
|
| 236 |
+
---
|
| 237 |
+
|
| 238 |
+
## Audit et traçabilité
|
| 239 |
+
|
| 240 |
+
Chaque appel de prédiction est enregistré :
|
| 241 |
+
|
| 242 |
+
- **prediction_requests** : payload d’entrée
|
| 243 |
+
- **prediction_responses** : probabilité, décision, seuil
|
| 244 |
+
|
| 245 |
+
Cette approche garantit la reproductibilité et l’auditabilité des prédictions.
|
| 246 |
+
|
| 247 |
+
---
|
| 248 |
+
|
| 249 |
+
## Tests et qualité du code
|
| 250 |
+
|
| 251 |
+
- Tests unitaires et fonctionnels avec **Pytest**
|
| 252 |
+
- Tests de sécurité (API Key)
|
| 253 |
+
- Tests des endpoints critiques
|
| 254 |
+
- Exécution automatisée en CI
|
| 255 |
+
|
| 256 |
+
Les dépendances externes (chargement du modèle distant, connexion PostgreSQL réelle) ne sont pas testées en CI afin de garantir des tests rapides et reproductibles. Ces scénarios relèvent de tests d’intégration ou d’environnements dédiés.
|
| 257 |
+
|
| 258 |
+
### Couverture de tests
|
| 259 |
+
|
| 260 |
+
Le projet intègre une mesure de la couverture de tests afin d’évaluer
|
| 261 |
+
la robustesse du code et la fiabilité de l’API.
|
| 262 |
+
|
| 263 |
+
Les tests sont exécutés avec **pytest** et **pytest-cov**.
|
| 264 |
+
|
| 265 |
+
```bash
|
| 266 |
+
python -m pytest --cov=app --cov-report=term
|
| 267 |
+
```
|
| 268 |
+
```text
|
| 269 |
+
---
|
| 270 |
+
Name Stmts Miss Cover
|
| 271 |
+
-----------------------------------------------
|
| 272 |
+
app\__init__.py 0 0 100%
|
| 273 |
+
app\core\__init__.py 0 0 100%
|
| 274 |
+
app\core\config.py 17 0 100%
|
| 275 |
+
app\db\__init__.py 0 0 100%
|
| 276 |
+
app\db\engine.py 7 1 86%
|
| 277 |
+
app\db\queries.py 4 0 100%
|
| 278 |
+
app\main.py 51 13 75%
|
| 279 |
+
app\ml\__init__.py 0 0 100%
|
| 280 |
+
app\ml\loader.py 26 18 31%
|
| 281 |
+
app\ml\predict.py 28 7 75%
|
| 282 |
+
app\ml\preprocessing.py 8 0 100%
|
| 283 |
+
app\schemas\__init__.py 0 0 100%
|
| 284 |
+
app\schemas\prediction.py 31 0 100%
|
| 285 |
+
app\security\__init__.py 0 0 100%
|
| 286 |
+
app\security\auth.py 9 1 89%
|
| 287 |
+
app\services\__init__.py 0 0 100%
|
| 288 |
+
app\services\audit.py 7 0 100%
|
| 289 |
+
app\services\features.py 7 1 86%
|
| 290 |
+
app\services\predict.py 19 0 100%
|
| 291 |
+
-----------------------------------------------
|
| 292 |
+
TOTAL 214 41 81%
|
| 293 |
+
```
|
| 294 |
+
|
| 295 |
+
Les fichiers les moins couverts sont surtout ceux liés au démarrage et aux dépendances externes (chargement du modèle, Hugging Face, connexion DB). En mode test, j’isole ces dépendances avec un DummyModel pour avoir des tests rapides et reproductibles. Les tests couvrent en priorité l’API, la sécurité et les scénarios critiques. Les chemins restants seraient plutôt couverts via tests d’intégration.
|
| 296 |
+
|
| 297 |
+
---
|
| 298 |
+
|
| 299 |
+
|
| 300 |
+
## Déploiement
|
| 301 |
+
|
| 302 |
+
- Déploiement local (Python)
|
| 303 |
+
- Déploiement cloud sur Hugging Face Spaces
|
| 304 |
+
- Gestion des secrets via variables d’environnement
|
| 305 |
+
|
| 306 |
+
Lien de l’API déployée :
|
| 307 |
+
https://huggingface.co/spaces/donizetti-yoann/technova-ml-api
|
| 308 |
+
|
| 309 |
+
---
|
| 310 |
+
|
| 311 |
+
## Variables d’environnement
|
| 312 |
+
|
| 313 |
+
Les variables suivantes sont nécessaires au fonctionnement de l’API :
|
| 314 |
+
|
| 315 |
+
- `API_KEY` : clé d’authentification des endpoints
|
| 316 |
+
- `DATABASE_URL` : chaîne de connexion PostgreSQL
|
| 317 |
+
- `MODEL_PATH` : chemin vers le modèle local (optionnel)
|
| 318 |
+
- `HF_MODEL_REPO` / `HF_MODEL_FILENAME` : modèle hébergé sur Hugging Face
|
| 319 |
+
- `HF_TOKEN` : token Hugging Face
|
| 320 |
+
|
| 321 |
+
Ces variables sont fournies via l’environnement d’exécution
|
| 322 |
+
(local, CI/CD ou Hugging Face Spaces) et ne sont jamais stockées
|
| 323 |
+
en clair dans le dépôt.
|
| 324 |
+
|
| 325 |
+
### Configuration des variables d’environnement
|
| 326 |
+
|
| 327 |
+
Le projet utilise des variables d’environnement pour gérer la configuration
|
| 328 |
+
et les secrets.
|
| 329 |
+
|
| 330 |
+
Un fichier `.env.example` est fourni à la racine du dépôt.
|
| 331 |
+
Il peut être copié et renommé en `.env`, puis complété avec les valeurs
|
| 332 |
+
appropriées selon l’environnement d’exécution (local, CI/CD, production).
|
| 333 |
+
|
| 334 |
+
```bash
|
| 335 |
+
cp .env.example .env
|
| 336 |
+
```
|
| 337 |
+
---
|
| 338 |
+
|
| 339 |
+
## Installation et utilisation
|
| 340 |
+
|
| 341 |
+
```bash
|
| 342 |
+
git clone https://github.com/yoann-donizetti/technova-ml-api
|
| 343 |
+
cd technova-ml-api
|
| 344 |
+
pip install -r requirements.txt
|
| 345 |
+
uvicorn app.main:app --reload
|
| 346 |
+
```
|
| 347 |
+
|
| 348 |
+
---
|
| 349 |
+
Ce projet met en œuvre une API de Machine Learning prête pour un usage en production,
|
| 350 |
+
avec une attention particulière portée à la sécurité, à la testabilité et à la reproductibilité.
|
app/__init__.py
ADDED
|
File without changes
|
app/core/__init__.py
ADDED
|
File without changes
|
app/core/config.py
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Centralised application configuration, sourced from environment variables."""
import os
from dataclasses import dataclass
from functools import lru_cache

from dotenv import load_dotenv

# Pull any variables defined in a local .env file into the process environment.
load_dotenv()


@dataclass(frozen=True)  # frozen: the configuration object is immutable
class Settings:
    # PostgreSQL connection string; None when no database is configured.
    DATABASE_URL: str | None
    # Path of the JSON file holding the decision threshold.
    THRESHOLD_PATH: str
    # Optional path to a local model artefact.
    MODEL_PATH: str | None
    # Hugging Face repo hosting the model (fallback when no local model).
    HF_MODEL_REPO: str | None
    # File name of the model inside the Hugging Face repo.
    HF_MODEL_FILENAME: str
    # Token used to access private Hugging Face repos.
    HF_TOKEN: str | None
    # API key protecting the prediction endpoints.
    API_KEY: str | None


@lru_cache  # computed once, then served from memory for the process lifetime
def get_settings() -> Settings:
    """Build and cache the application settings from the environment.

    Returning an explicitly typed ``Settings`` object keeps the
    configuration clear, typed and safer to use across the application.
    """
    env = os.getenv
    return Settings(
        DATABASE_URL=env("DATABASE_URL"),
        THRESHOLD_PATH=env("THRESHOLD_PATH", "config/threshold.json"),
        MODEL_PATH=env("MODEL_PATH"),
        HF_MODEL_REPO=env("HF_MODEL_REPO"),
        HF_MODEL_FILENAME=env("HF_MODEL_FILENAME", "model.joblib"),
        HF_TOKEN=env("HF_TOKEN"),
        API_KEY=env("API_KEY"),
    )
|
app/db/__init__.py
ADDED
|
File without changes
|
app/db/engine.py
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Database engine factory built on top of the central configuration."""
from app.core.config import get_settings
from sqlalchemy import create_engine


def get_engine():
    """Return a SQLAlchemy engine, or ``None`` when no DATABASE_URL is set.

    ``pool_pre_ping=True`` makes the pool validate connections before
    handing them out, so stale connections are replaced transparently.
    """
    cfg = get_settings()
    if not cfg.DATABASE_URL:
        return None
    return create_engine(cfg.DATABASE_URL, pool_pre_ping=True)
|
app/db/queries.py
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""SQL statements used by the API (audit logging and feature lookup)."""
from sqlalchemy import text

# Record the incoming payload and hand back the generated request id.
SQL_INSERT_REQUEST = text("""
INSERT INTO audit.prediction_requests (payload)
VALUES (CAST(:payload AS jsonb))
RETURNING request_id
""")

# Persist the model output, linked to the originating request.
SQL_INSERT_RESPONSE = text("""
INSERT INTO audit.prediction_responses
(request_id, proba, prediction, threshold)
VALUES (:request_id, :proba, :prediction, :threshold)
""")

# Fetch the single feature row of one employee from the mart schema.
SQL_GET_EMPLOYEE_FEATURES = text("""
SELECT *
FROM mart.employee_features
WHERE id_employee = :id_employee
LIMIT 1
""")
|
app/main.py
ADDED
|
@@ -0,0 +1,95 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# app/main.py
"""FastAPI entry point for the Technova ML API."""
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException, Depends
from fastapi.responses import RedirectResponse

from app.security.auth import require_api_key
from app.ml.loader import load_model, load_threshold
from app.schemas.prediction import PredictionRequest, PredictionResponse
from app.services.predict import run_predict_manual, run_predict_by_id
from app.db.engine import get_engine


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialise model, threshold and DB engine for the app's lifetime."""
    # Test mode: replace external dependencies with cheap stubs.
    if os.getenv("APP_ENV") == "test":
        class DummyModel:
            def predict_proba(self, X):
                return [[0.2, 0.8]]

        app.state.model = DummyModel()
        app.state.threshold = 0.292
        app.state.engine = None
        yield
        return

    # Normal mode: load the real artefacts at startup.
    app.state.model = load_model()
    app.state.threshold = float(load_threshold())
    app.state.engine = get_engine()
    print("[startup] model + threshold loaded OK")
    yield


app = FastAPI(title="Technova ML API", version="1.0.0", lifespan=lifespan)


@app.get("/", include_in_schema=False)
def root():
    """Redirect the bare root URL to the Swagger documentation."""
    return RedirectResponse(url="/docs")


@app.get("/health")
def health():
    """Report whether the model, threshold and database are available."""
    return {
        "status": "ok",
        "model_loaded": getattr(app.state, "model", None) is not None,
        "threshold": getattr(app.state, "threshold", None),
        "db_configured": getattr(app.state, "engine", None) is not None,
    }


@app.post(
    "/predict",
    response_model=PredictionResponse,
    tags=["default"],
    dependencies=[Depends(require_api_key)],
)
def predict_manual(data: PredictionRequest):
    """Predict attrition from a manually supplied payload."""
    try:
        thr = float(app.state.threshold)
        proba, pred, _payload = run_predict_manual(
            payload=data.model_dump(),
            model=app.state.model,
            threshold=thr,
            engine=getattr(app.state, "engine", None),
        )
        return PredictionResponse(proba=float(proba), prediction=int(pred), threshold=thr)
    except Exception as e:
        # Any failure in the prediction pipeline surfaces as a 400.
        raise HTTPException(status_code=400, detail=str(e))


@app.get(
    "/predict/{id_employee}",
    response_model=PredictionResponse,
    tags=["default"],
    dependencies=[Depends(require_api_key)],
)
def predict_by_id(id_employee: int):
    """Predict attrition for an employee stored in the database."""
    try:
        thr = float(app.state.threshold)
        proba, pred, _payload = run_predict_by_id(
            id_employee=id_employee,
            model=app.state.model,
            threshold=thr,
            engine=getattr(app.state, "engine", None),
        )
        return PredictionResponse(proba=float(proba), prediction=int(pred), threshold=thr)
    except KeyError as e:
        # Unknown employee id → 404.
        raise HTTPException(status_code=404, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
|
app/ml/__init__.py
ADDED
|
File without changes
|
app/ml/loader.py
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Model and decision-threshold loading utilities."""
import json
import os
import joblib
from huggingface_hub import hf_hub_download

from app.core.config import get_settings


def load_threshold() -> float:
    """Read the decision threshold from the configured JSON file.

    Raises FileNotFoundError when the configured path does not exist.
    """
    cfg = get_settings()
    threshold_path = cfg.THRESHOLD_PATH

    if not os.path.exists(threshold_path):
        raise FileNotFoundError(f"Threshold file not found: {threshold_path}")

    with open(threshold_path, "r", encoding="utf-8") as fh:
        return float(json.load(fh)["threshold"])


def load_model():
    """
    Load the model.
    - On HF Spaces: use HF_MODEL_REPO + HF_MODEL_FILENAME
    - Locally: MODEL_PATH can be used instead (optional)
    """
    cfg = get_settings()

    # 1) prefer a local model when one is configured
    if cfg.MODEL_PATH:
        if not os.path.exists(cfg.MODEL_PATH):
            raise FileNotFoundError(f"Local model not found: {cfg.MODEL_PATH}")
        return joblib.load(cfg.MODEL_PATH)

    # 2) otherwise download from the Hugging Face Hub
    if not cfg.HF_MODEL_REPO or not cfg.HF_MODEL_FILENAME:
        raise RuntimeError("HF_MODEL_REPO and/or HF_MODEL_FILENAME not set")

    downloaded = hf_hub_download(
        repo_id=cfg.HF_MODEL_REPO,
        filename=cfg.HF_MODEL_FILENAME,
        token=cfg.HF_TOKEN,  # accepted even when None
    )
    return joblib.load(downloaded)


def load_artifacts():
    """Convenience helper returning ``(model, threshold)`` in one call."""
    return load_model(), load_threshold()
|
app/ml/predict.py
ADDED
|
@@ -0,0 +1,84 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import pandas as pd
|
| 2 |
+
from app.ml.preprocessing import normalize_text
|
| 3 |
+
|
| 4 |
+
# Columns expected by the model, in this exact order.
# The last four (ratio_manager_anciennete .. pression_stagnation) are
# engineered features produced by add_features_from_raw / the mart table;
# the rest come straight from the raw employee record.
FEATURE_COLUMNS = [
    "age",
    "genre",
    "revenu_mensuel",
    "statut_marital",
    "departement",
    "poste",
    "nombre_experiences_precedentes",
    "annees_dans_l_entreprise",
    "satisfaction_employee_environnement",
    "satisfaction_employee_nature_travail",
    "satisfaction_employee_equipe",
    "satisfaction_employee_equilibre_pro_perso",
    "heure_supplementaires",
    "augmentation_salaire_precedente",
    "nombre_participation_pee",
    "nb_formations_suivies",
    "distance_domicile_travail",
    "niveau_education",
    "domaine_etude",
    "frequence_deplacement",
    "ratio_manager_anciennete",
    "mobilite_relative",
    "evolution_performance",
    "pression_stagnation",
]
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
def add_features_from_raw(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the engineered features from the raw employee fields.

    Returns a new DataFrame (the input is not mutated) extended with four
    columns: ratio_manager_anciennete, mobilite_relative,
    evolution_performance and pression_stagnation.
    """
    out = df.copy()

    # "+ 1" keeps every denominator strictly positive for brand-new employees.
    tenure_plus_one = out["annees_dans_l_entreprise"] + 1

    out["ratio_manager_anciennete"] = (
        out["annees_sous_responsable_actuel"] + 1
    ) / tenure_plus_one

    internal_mobility = (
        out["annees_dans_l_entreprise"] - out["annees_dans_le_poste_actuel"]
    )
    out["mobilite_relative"] = internal_mobility / tenure_plus_one

    out["evolution_performance"] = (
        out["note_evaluation_actuelle"] - out["note_evaluation_precedente"]
    )

    out["pression_stagnation"] = (
        out["annees_depuis_la_derniere_promotion"] / tenure_plus_one
    )

    return out
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
def predict_manual(payload: dict, model, threshold: float):
    """Score a manual /predict payload built from raw employee fields.

    Normalizes the text columns, computes the engineered features, then
    returns ``(probability, binary_prediction, enriched_payload_dict)``.
    """
    frame = add_features_from_raw(normalize_text(pd.DataFrame([payload])))

    features = frame[FEATURE_COLUMNS]
    probability = float(model.predict_proba(features)[0][1])
    label = int(probability >= float(threshold))

    enriched = frame.iloc[0].to_dict()
    return probability, label, enriched
|
| 69 |
+
|
| 70 |
+
|
| 71 |
+
def predict_from_employee_features(employee_row: dict, model, threshold: float):
    """Score an employee row fetched from mart.employee_features.

    The row must already contain the engineered columns
    (ratio_manager_anciennete, mobilite_relative, ...), because the mart
    table is responsible for computing them.

    Returns ``(probability, binary_prediction, enriched_payload_dict)``.
    """
    frame = normalize_text(pd.DataFrame([employee_row]))

    features = frame[FEATURE_COLUMNS]
    probability = float(model.predict_proba(features)[0][1])
    label = int(probability >= float(threshold))

    enriched = frame.iloc[0].to_dict()
    return probability, label, enriched
|
app/ml/preprocessing.py
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import pandas as pd
|
| 2 |
+
|
| 3 |
+
# Categorical columns stored as free text; they are stripped and lower-cased
# before being fed to the model.
TEXT_COLUMNS = [
    "genre",
    "statut_marital",
    "departement",
    "poste",
    "domaine_etude",
    "frequence_deplacement",
]


def normalize_text(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of *df* with every known text column stripped/lower-cased.

    Columns listed in TEXT_COLUMNS but absent from *df* are ignored.
    """
    result = df.copy()
    for column in (c for c in TEXT_COLUMNS if c in result.columns):
        result[column] = result[column].astype(str).str.strip().str.lower()
    return result
|
app/schemas/__init__.py
ADDED
|
File without changes
|
app/schemas/prediction.py
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from pydantic import BaseModel, Field
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class PredictionRequest(BaseModel):
    """Input schema for the manual /predict endpoint (raw employee fields).

    The engineered features (ratios, deltas) are NOT part of this schema;
    the API computes them from the raw fields below.
    """

    age: int = Field(..., ge=0)
    genre: str
    revenu_mensuel: int = Field(..., ge=0)
    statut_marital: str
    departement: str
    poste: str
    nombre_experiences_precedentes: int = Field(..., ge=0)
    annees_dans_l_entreprise: int = Field(..., ge=0)

    # Satisfaction scores (non-negative integers)
    satisfaction_employee_environnement: int = Field(..., ge=0)
    satisfaction_employee_nature_travail: int = Field(..., ge=0)
    satisfaction_employee_equipe: int = Field(..., ge=0)
    satisfaction_employee_equilibre_pro_perso: int = Field(..., ge=0)

    heure_supplementaires: bool
    augmentation_salaire_precedente: int = Field(..., ge=0)
    nombre_participation_pee: int = Field(..., ge=0)
    nb_formations_suivies: int = Field(..., ge=0)
    distance_domicile_travail: int = Field(..., ge=0)
    niveau_education: int = Field(..., ge=0)
    domaine_etude: str
    frequence_deplacement: str

    # RAW fields (only used by the manual /predict route, from which the
    # engineered features are derived)
    annees_sous_responsable_actuel: int = Field(..., ge=0)
    annees_dans_le_poste_actuel: int = Field(..., ge=0)
    note_evaluation_actuelle: int
    note_evaluation_precedente: int
    annees_depuis_la_derniere_promotion: int = Field(..., ge=0)
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
class PredictionResponse(BaseModel):
    """Output schema shared by the prediction endpoints."""

    proba: float       # probability of the positive class
    prediction: int    # binary label derived from proba vs threshold
    threshold: float   # decision threshold that was applied
|
app/security/__init__.py
ADDED
|
File without changes
|
app/security/auth.py
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
from fastapi import Header, HTTPException
|
| 3 |
+
from app.core.config import get_settings
|
| 4 |
+
|
| 5 |
+
def require_api_key(x_api_key: str | None = Header(default=None, alias="X-API-Key")):
    """FastAPI dependency enforcing the ``X-API-Key`` request header.

    Raises:
        HTTPException(500): the server has no API_KEY configured
            (server-side misconfiguration, not a client error).
        HTTPException(401): the provided key does not match.
    """
    settings = get_settings()

    if not settings.API_KEY:
        raise HTTPException(status_code=500, detail="API_KEY not configured")

    if x_api_key != settings.API_KEY:
        raise HTTPException(status_code=401, detail="Unauthorized")

    return True
|
app/services/__init__.py
ADDED
|
File without changes
|
app/services/audit.py
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import json
|
| 2 |
+
from sqlalchemy import Connection
|
| 3 |
+
|
| 4 |
+
from app.db.queries import SQL_INSERT_REQUEST, SQL_INSERT_RESPONSE
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
def log_audit(conn: Connection, payload: dict, proba: float, prediction: int, threshold: float) -> int:

    '''
    Record one prediction in the audit tables.

    It traces both:
      - what the API received (the input payload), and
      - what the model produced (probability, label, threshold).

    Runs on the caller's connection/transaction; returns the generated
    request id.
    '''


    req_id = conn.execute(
        SQL_INSERT_REQUEST,
        {"payload": json.dumps(payload, ensure_ascii=False, default=str)},
    ).scalar_one()  # fetch the id generated by the database (primary key)

    conn.execute(
        SQL_INSERT_RESPONSE,
        {
            "request_id": req_id,
            "proba": float(proba),
            "prediction": int(prediction),
            "threshold": float(threshold),
        },
    )
    return int(req_id)
|
app/services/features.py
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.db.queries import SQL_GET_EMPLOYEE_FEATURES
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
def get_employee_features_by_id(engine, id_employee: int) -> dict | None:
    """
    Fetch one row from mart.employee_features for the given id_employee.

    Returns the row as a plain dict, or None when the employee is absent.

    Raises:
        RuntimeError: the database engine is not configured (engine is None).
    """
    if engine is None:
        raise RuntimeError("DATABASE_URL non configurée (engine = None).")

    with engine.connect() as conn:
        row = (
            conn.execute(SQL_GET_EMPLOYEE_FEATURES, {"id_employee": id_employee})
            .mappings()
            .first()
        )
    return dict(row) if row else None
|
app/services/predict.py
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.services.audit import log_audit
|
| 2 |
+
from app.services.features import get_employee_features_by_id
|
| 3 |
+
|
| 4 |
+
from app.ml.predict import predict_manual, predict_from_employee_features
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
def run_predict_manual(payload: dict, model, threshold: float, engine):
    '''
    Handle a prediction computed from user-supplied (manual) data.

    Audit logging happens only when a database engine is configured;
    otherwise the prediction is returned without being traced.
    '''
    proba, pred, payload_enrichi = predict_manual(payload, model, threshold)

    if engine is not None:
        with engine.begin() as conn:
            log_audit(conn, payload_enrichi, proba, pred, threshold)

    return proba, pred, payload_enrichi
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def run_predict_by_id(id_employee: int, model, threshold: float, engine):
    '''
    Handle a prediction for an employee already stored in the database.

    Raises:
        KeyError: id_employee is not present in mart.employee_features.
    '''
    employee = get_employee_features_by_id(engine, id_employee)
    if employee is None:
        raise KeyError(f"id_employee {id_employee} introuvable dans mart.employee_features")

    proba, pred, payload_enrichi = predict_from_employee_features(employee, model, threshold)

    if engine is not None:
        with engine.begin() as conn:
            # NOTE(review): id_employee is added to the returned payload only
            # when an engine is configured — confirm callers do not rely on it
            # being present in the no-database case.
            payload_enrichi["id_employee"] = id_employee
            log_audit(conn, payload_enrichi, proba, pred, threshold)

    return proba, pred, payload_enrichi
|
config/threshold.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{ "threshold": 0.292 }
|
db/01_schema.sql
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
-- Schema creation for the Technova ML API project
-- =====================================================

-- Schema for raw, untransformed source data (RAW)
CREATE SCHEMA IF NOT EXISTS raw;

-- Schema for cleaned / intermediate data
CREATE SCHEMA IF NOT EXISTS staging;

-- Schema for the final data consumed by the model
CREATE SCHEMA IF NOT EXISTS mart;

-- Schema for audit and traceability (API calls, predictions)
CREATE SCHEMA IF NOT EXISTS audit;
|
db/02_raw_tables.sql
ADDED
|
@@ -0,0 +1,58 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
|
| 2 |
+
-- Tables RAW - données brutes sans transformation
|
| 3 |
+
-- =====================================================
|
| 4 |
+
|
| 5 |
+
-- -------------------------------
|
| 6 |
+
-- Table extrait_sirh
|
| 7 |
+
-- -------------------------------
|
| 8 |
+
DROP TABLE IF EXISTS raw.extrait_sirh;
|
| 9 |
+
CREATE TABLE raw.extrait_sirh (
|
| 10 |
+
id_employee INTEGER,
|
| 11 |
+
age INTEGER,
|
| 12 |
+
genre TEXT,
|
| 13 |
+
revenu_mensuel INTEGER,
|
| 14 |
+
statut_marital TEXT,
|
| 15 |
+
departement TEXT,
|
| 16 |
+
poste TEXT,
|
| 17 |
+
nombre_experiences_precedentes INTEGER,
|
| 18 |
+
nombre_heures_travailless INTEGER,
|
| 19 |
+
annee_experience_totale INTEGER,
|
| 20 |
+
annees_dans_l_entreprise INTEGER,
|
| 21 |
+
annees_dans_le_poste_actuel INTEGER
|
| 22 |
+
);
|
| 23 |
+
|
| 24 |
+
-- -------------------------------
|
| 25 |
+
-- Table extrait_eval
|
| 26 |
+
-- -------------------------------
|
| 27 |
+
DROP TABLE IF EXISTS raw.extrait_eval;
|
| 28 |
+
CREATE TABLE raw.extrait_eval (
|
| 29 |
+
satisfaction_employee_environnement INTEGER,
|
| 30 |
+
note_evaluation_precedente INTEGER,
|
| 31 |
+
niveau_hierarchique_poste INTEGER,
|
| 32 |
+
satisfaction_employee_nature_travail INTEGER,
|
| 33 |
+
satisfaction_employee_equipe INTEGER,
|
| 34 |
+
satisfaction_employee_equilibre_pro_perso INTEGER,
|
| 35 |
+
eval_number TEXT,
|
| 36 |
+
note_evaluation_actuelle INTEGER,
|
| 37 |
+
heure_supplementaires TEXT,
|
| 38 |
+
augementation_salaire_precedente TEXT
|
| 39 |
+
);
|
| 40 |
+
|
| 41 |
+
-- -------------------------------
|
| 42 |
+
-- Table extrait_sondage
|
| 43 |
+
-- -------------------------------
|
| 44 |
+
DROP TABLE IF EXISTS raw.extrait_sondage;
|
| 45 |
+
CREATE TABLE raw.extrait_sondage (
|
| 46 |
+
a_quitte_l_entreprise TEXT,
|
| 47 |
+
nombre_participation_pee INTEGER,
|
| 48 |
+
nb_formations_suivies INTEGER,
|
| 49 |
+
nombre_employee_sous_responsabilite INTEGER,
|
| 50 |
+
code_sondage INTEGER,
|
| 51 |
+
distance_domicile_travail INTEGER,
|
| 52 |
+
niveau_education INTEGER,
|
| 53 |
+
domaine_etude TEXT,
|
| 54 |
+
ayant_enfants TEXT,
|
| 55 |
+
frequence_deplacement TEXT,
|
| 56 |
+
annees_depuis_la_derniere_promotion INTEGER,
|
| 57 |
+
annes_sous_responsable_actuel INTEGER
|
| 58 |
+
);
|
db/03_load_raw.sql
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
-- 03_load_raw.sql — Load the RAW (untransformed) data
-- =====================================================

-- Clean up before reloading (idempotent)
TRUNCATE TABLE raw.extrait_sirh;
TRUNCATE TABLE raw.extrait_eval;
TRUNCATE TABLE raw.extrait_sondage;

-- NOTE(review): the COPY paths below are absolute, machine-specific Windows
-- paths; adapt them (or use psql's client-side \copy) on any other host.

-- -------------------------
-- Load SIRH
-- -------------------------
COPY raw.extrait_sirh
FROM 'C:/Users/yoann/OneDrive/Documents/OpenClassrooms/Déployez un modèle de Machine Learning/technova-ml-api/data/extrait_sirh.csv'
DELIMITER ';'
CSV HEADER
ENCODING 'UTF8';

-- -------------------------
-- Load EVAL
-- -------------------------
COPY raw.extrait_eval
FROM 'C:/Users/yoann/OneDrive/Documents/OpenClassrooms/Déployez un modèle de Machine Learning/technova-ml-api/data/extrait_eval.csv'
DELIMITER ';'
CSV HEADER
ENCODING 'UTF8';

-- -------------------------
-- Load SONDAGE
-- -------------------------
COPY raw.extrait_sondage
FROM 'C:/Users/yoann/OneDrive/Documents/OpenClassrooms/Déployez un modèle de Machine Learning/technova-ml-api/data/extrait_sondage.csv'
DELIMITER ';'
CSV HEADER
ENCODING 'UTF8';
|
db/04_staging.sql
ADDED
|
@@ -0,0 +1,119 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
|
| 2 |
+
-- 04_staging.sql — Nettoyage + normalisation (STAGING)
|
| 3 |
+
-- =====================================================
|
| 4 |
+
|
| 5 |
+
-- Sécurité: recréer proprement
|
| 6 |
+
DROP TABLE IF EXISTS staging.sirh_clean;
|
| 7 |
+
DROP TABLE IF EXISTS staging.eval_clean;
|
| 8 |
+
DROP TABLE IF EXISTS staging.sondage_clean;
|
| 9 |
+
DROP TABLE IF EXISTS staging.employee_base;
|
| 10 |
+
|
| 11 |
+
-- -------------------------
|
| 12 |
+
-- 1) SIRH
|
| 13 |
+
-- - supprimer nombre_heures_travailless (valeur constante 80)
|
| 14 |
+
-- - normaliser genre (Homme/Femme vs M/F)
|
| 15 |
+
-- -------------------------
|
| 16 |
+
CREATE TABLE staging.sirh_clean AS
|
| 17 |
+
SELECT
|
| 18 |
+
id_employee,
|
| 19 |
+
age,
|
| 20 |
+
CASE
|
| 21 |
+
WHEN lower(trim(genre)) IN ('h', 'homme', 'm') THEN 'Homme'
|
| 22 |
+
WHEN lower(trim(genre)) IN ('f', 'femme') THEN 'Femme'
|
| 23 |
+
ELSE NULL
|
| 24 |
+
END AS genre,
|
| 25 |
+
revenu_mensuel,
|
| 26 |
+
statut_marital,
|
| 27 |
+
departement,
|
| 28 |
+
poste,
|
| 29 |
+
nombre_experiences_precedentes,
|
| 30 |
+
annee_experience_totale,
|
| 31 |
+
annees_dans_l_entreprise,
|
| 32 |
+
annees_dans_le_poste_actuel
|
| 33 |
+
FROM raw.extrait_sirh;
|
| 34 |
+
|
| 35 |
+
-- -------------------------
|
| 36 |
+
-- 2) EVAL
|
| 37 |
+
-- - heure_supplementaires -> bool
|
| 38 |
+
-- - augmentation salaire -> numérique (retirer %)
|
| 39 |
+
-- - eval_number: retirer "e_" -> int
|
| 40 |
+
-- - renommer eval_number -> id_employee
|
| 41 |
+
-- -------------------------
|
| 42 |
+
CREATE TABLE staging.eval_clean AS
|
| 43 |
+
SELECT
|
| 44 |
+
CAST(replace(lower(trim(eval_number)), 'e_', '') AS INT) AS id_employee,
|
| 45 |
+
|
| 46 |
+
satisfaction_employee_environnement,
|
| 47 |
+
note_evaluation_precedente,
|
| 48 |
+
niveau_hierarchique_poste,
|
| 49 |
+
satisfaction_employee_nature_travail,
|
| 50 |
+
satisfaction_employee_equipe,
|
| 51 |
+
satisfaction_employee_equilibre_pro_perso,
|
| 52 |
+
note_evaluation_actuelle,
|
| 53 |
+
|
| 54 |
+
CASE
|
| 55 |
+
WHEN lower(trim(heure_supplementaires)) IN ('yes', 'y', 'oui', 'true', '1') THEN TRUE
|
| 56 |
+
WHEN lower(trim(heure_supplementaires)) IN ('no', 'n', 'non', 'false', '0') THEN FALSE
|
| 57 |
+
ELSE NULL
|
| 58 |
+
END AS heure_supplementaires,
|
| 59 |
+
|
| 60 |
+
NULLIF(REPLACE(TRIM(augementation_salaire_precedente), '%', ''), '')::INT AS augmentation_salaire_precedente
|
| 61 |
+
FROM raw.extrait_eval;
|
| 62 |
+
|
| 63 |
+
-- -------------------------
|
| 64 |
+
-- 3) SONDAGE
|
| 65 |
+
-- - supprimer ayant_enfants (constante Y)
|
| 66 |
+
-- - supprimer nombre_employee_sous_responsabilite (constante 1)
|
| 67 |
+
-- - a_quitte_l_entreprise -> bool
|
| 68 |
+
-- - code_sondage -> id_employee
|
| 69 |
+
-- - annes_sous_responsable_actuel -> annees_sous_responsable_actuel
|
| 70 |
+
-- -------------------------
|
| 71 |
+
CREATE TABLE staging.sondage_clean AS
|
| 72 |
+
SELECT
|
| 73 |
+
code_sondage AS id_employee,
|
| 74 |
+
|
| 75 |
+
CASE
|
| 76 |
+
WHEN lower(trim(a_quitte_l_entreprise)) IN ('yes', 'y', 'oui', 'true', '1') THEN TRUE
|
| 77 |
+
WHEN lower(trim(a_quitte_l_entreprise)) IN ('no', 'n', 'non', 'false', '0') THEN FALSE
|
| 78 |
+
ELSE NULL
|
| 79 |
+
END AS a_quitte_l_entreprise,
|
| 80 |
+
|
| 81 |
+
nombre_participation_pee,
|
| 82 |
+
nb_formations_suivies,
|
| 83 |
+
|
| 84 |
+
distance_domicile_travail,
|
| 85 |
+
niveau_education,
|
| 86 |
+
domaine_etude,
|
| 87 |
+
frequence_deplacement,
|
| 88 |
+
annees_depuis_la_derniere_promotion,
|
| 89 |
+
|
| 90 |
+
annes_sous_responsable_actuel AS annees_sous_responsable_actuel
|
| 91 |
+
FROM raw.extrait_sondage;
|
| 92 |
+
|
| 93 |
+
-- -------------------------
|
| 94 |
+
-- 4) Jointure STAGING (1 ligne = 1 employé)
|
| 95 |
+
-- -------------------------
|
| 96 |
+
CREATE TABLE staging.employee_base AS
|
| 97 |
+
SELECT
|
| 98 |
+
s.*,
|
| 99 |
+
e.satisfaction_employee_environnement,
|
| 100 |
+
e.note_evaluation_precedente,
|
| 101 |
+
e.niveau_hierarchique_poste,
|
| 102 |
+
e.satisfaction_employee_nature_travail,
|
| 103 |
+
e.satisfaction_employee_equipe,
|
| 104 |
+
e.satisfaction_employee_equilibre_pro_perso,
|
| 105 |
+
e.note_evaluation_actuelle,
|
| 106 |
+
e.heure_supplementaires,
|
| 107 |
+
e.augmentation_salaire_precedente,
|
| 108 |
+
so.a_quitte_l_entreprise,
|
| 109 |
+
so.nombre_participation_pee,
|
| 110 |
+
so.nb_formations_suivies,
|
| 111 |
+
so.distance_domicile_travail,
|
| 112 |
+
so.niveau_education,
|
| 113 |
+
so.domaine_etude,
|
| 114 |
+
so.frequence_deplacement,
|
| 115 |
+
so.annees_depuis_la_derniere_promotion,
|
| 116 |
+
so.annees_sous_responsable_actuel
|
| 117 |
+
FROM staging.sirh_clean s
|
| 118 |
+
LEFT JOIN staging.eval_clean e USING (id_employee)
|
| 119 |
+
LEFT JOIN staging.sondage_clean so USING (id_employee);
|
db/05_mart.sql
ADDED
|
@@ -0,0 +1,74 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
|
| 2 |
+
-- MART — Dataset final pour le modèle ML
|
| 3 |
+
-- =====================================================
|
| 4 |
+
|
| 5 |
+
CREATE SCHEMA IF NOT EXISTS mart;
|
| 6 |
+
|
| 7 |
+
DROP TABLE IF EXISTS mart.employee_features;
|
| 8 |
+
|
| 9 |
+
CREATE TABLE mart.employee_features AS
|
| 10 |
+
SELECT
|
| 11 |
+
-- Identifiant
|
| 12 |
+
id_employee,
|
| 13 |
+
|
| 14 |
+
-- =========================
|
| 15 |
+
-- Variables de base (X)
|
| 16 |
+
-- =========================
|
| 17 |
+
age,
|
| 18 |
+
|
| 19 |
+
lower(trim(genre)) AS genre,
|
| 20 |
+
revenu_mensuel,
|
| 21 |
+
lower(trim(statut_marital)) AS statut_marital,
|
| 22 |
+
lower(trim(departement)) AS departement,
|
| 23 |
+
lower(trim(poste)) AS poste,
|
| 24 |
+
|
| 25 |
+
nombre_experiences_precedentes,
|
| 26 |
+
annees_dans_l_entreprise,
|
| 27 |
+
|
| 28 |
+
satisfaction_employee_environnement,
|
| 29 |
+
satisfaction_employee_nature_travail,
|
| 30 |
+
satisfaction_employee_equipe,
|
| 31 |
+
satisfaction_employee_equilibre_pro_perso,
|
| 32 |
+
|
| 33 |
+
heure_supplementaires,
|
| 34 |
+
augmentation_salaire_precedente,
|
| 35 |
+
nombre_participation_pee,
|
| 36 |
+
nb_formations_suivies,
|
| 37 |
+
distance_domicile_travail,
|
| 38 |
+
niveau_education,
|
| 39 |
+
|
| 40 |
+
lower(trim(domaine_etude)) AS domaine_etude,
|
| 41 |
+
lower(trim(frequence_deplacement)) AS frequence_deplacement,
|
| 42 |
+
|
| 43 |
+
-- =========================
|
| 44 |
+
-- Features calculées
|
| 45 |
+
-- =========================
|
| 46 |
+
(
|
| 47 |
+
(COALESCE(annees_sous_responsable_actuel, 0) + 1)::double precision
|
| 48 |
+
/
|
| 49 |
+
(COALESCE(annees_dans_l_entreprise, 0) + 1)::double precision
|
| 50 |
+
) AS ratio_manager_anciennete,
|
| 51 |
+
|
| 52 |
+
(
|
| 53 |
+
(COALESCE(annees_dans_l_entreprise, 0) - COALESCE(annees_dans_le_poste_actuel, 0))::double precision
|
| 54 |
+
/
|
| 55 |
+
(COALESCE(annees_dans_l_entreprise, 0) + 1)::double precision
|
| 56 |
+
) AS mobilite_relative,
|
| 57 |
+
|
| 58 |
+
(COALESCE(note_evaluation_actuelle, 0) - COALESCE(note_evaluation_precedente, 0)) AS evolution_performance,
|
| 59 |
+
|
| 60 |
+
(
|
| 61 |
+
COALESCE(annees_depuis_la_derniere_promotion, 0)::double precision
|
| 62 |
+
/
|
| 63 |
+
(COALESCE(annees_dans_l_entreprise, 0) + 1)::double precision
|
| 64 |
+
) AS pression_stagnation,
|
| 65 |
+
|
| 66 |
+
-- =========================
|
| 67 |
+
-- Target (y)
|
| 68 |
+
-- =========================
|
| 69 |
+
a_quitte_l_entreprise
|
| 70 |
+
|
| 71 |
+
FROM staging.employee_base;
|
| 72 |
+
|
| 73 |
+
CREATE INDEX IF NOT EXISTS idx_mart_employee_features_id
|
| 74 |
+
ON mart.employee_features(id_employee);
|
db/06_audit.sql
ADDED
|
@@ -0,0 +1,44 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
-- =====================================================
|
| 2 |
+
-- AUDIT : Traçabilité des appels API / prédictions
|
| 3 |
+
-- =====================================================
|
| 4 |
+
|
| 5 |
+
CREATE SCHEMA IF NOT EXISTS audit;
|
| 6 |
+
|
| 7 |
+
-- 1) Requêtes (inputs envoyés au modèle)
|
| 8 |
+
DROP TABLE IF EXISTS audit.prediction_requests;
|
| 9 |
+
|
| 10 |
+
CREATE TABLE audit.prediction_requests (
|
| 11 |
+
request_id BIGSERIAL PRIMARY KEY,
|
| 12 |
+
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
| 13 |
+
id_employee INT NULL,
|
| 14 |
+
payload JSONB NOT NULL
|
| 15 |
+
);
|
| 16 |
+
|
| 17 |
+
-- 2) Réponses (outputs générés par le modèle)
|
| 18 |
+
DROP TABLE IF EXISTS audit.prediction_responses;
|
| 19 |
+
|
| 20 |
+
CREATE TABLE audit.prediction_responses (
|
| 21 |
+
response_id BIGSERIAL PRIMARY KEY,
|
| 22 |
+
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
| 23 |
+
|
| 24 |
+
request_id BIGINT NOT NULL
|
| 25 |
+
REFERENCES audit.prediction_requests(request_id)
|
| 26 |
+
ON DELETE CASCADE,
|
| 27 |
+
|
| 28 |
+
proba DOUBLE PRECISION NOT NULL,
|
| 29 |
+
prediction INT NOT NULL,
|
| 30 |
+
threshold DOUBLE PRECISION NOT NULL,
|
| 31 |
+
|
| 32 |
+
status TEXT NOT NULL DEFAULT 'OK',
|
| 33 |
+
error_message TEXT NULL
|
| 34 |
+
);
|
| 35 |
+
|
| 36 |
+
-- Index utiles
|
| 37 |
+
CREATE INDEX IF NOT EXISTS idx_pred_req_employee
|
| 38 |
+
ON audit.prediction_requests(id_employee);
|
| 39 |
+
|
| 40 |
+
CREATE INDEX IF NOT EXISTS idx_pred_req_created_at
|
| 41 |
+
ON audit.prediction_requests(created_at);
|
| 42 |
+
|
| 43 |
+
CREATE INDEX IF NOT EXISTS idx_pred_res_request_id
|
| 44 |
+
ON audit.prediction_responses(request_id);
|
db/README_SQL.md
ADDED
|
@@ -0,0 +1,167 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Technova ML API – Documentation SQL & Base de Données
|
| 2 |
+
|
| 3 |
+
## Objectif
|
| 4 |
+
Ce document décrit en détail l’architecture de la base de données PostgreSQL utilisée par **Technova ML API**.
|
| 5 |
+
La base est organisée selon une approche analytique en couches : **RAW → STAGING → MART → AUDIT**.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
### Initialisation de la base de données
|
| 10 |
+
|
| 11 |
+
Les scripts SQL doivent être exécutés dans l’ordre suivant afin de garantir
|
| 12 |
+
la cohérence des données et des dépendances entre les schémas :
|
| 13 |
+
|
| 14 |
+
1. `schema.sql` : création des schémas PostgreSQL (raw, staging, mart, audit)
|
| 15 |
+
2. `raw.sql` : création des tables de données brutes
|
| 16 |
+
3. `load_raw.sql` : chargement des données sources
|
| 17 |
+
4. `staging.sql` : nettoyage, normalisation et jointure des données
|
| 18 |
+
5. `mart.sql` : création du dataset final pour le modèle ML
|
| 19 |
+
6. `audit.sql` : création des tables de traçabilité des prédictions
|
| 20 |
+
|
| 21 |
+
Cet ordre permet d’assurer l’intégrité des données et la reproductibilité
|
| 22 |
+
du pipeline de traitement.
|
| 23 |
+
|
| 24 |
+
Les scripts sont conçus pour être idempotents
|
| 25 |
+
(`DROP TABLE IF EXISTS`, `TRUNCATE`) afin de permettre
|
| 26 |
+
une réexécution sans effet de bord.
|
| 27 |
+
|
| 28 |
+
## Vue d’ensemble du pipeline de données
|
| 29 |
+
|
| 30 |
+
```text
|
| 31 |
+
RAW
|
| 32 |
+
├─ extrait_sirh
|
| 33 |
+
├─ extrait_eval
|
| 34 |
+
└─ extrait_sondage
|
| 35 |
+
│
|
| 36 |
+
▼
|
| 37 |
+
STAGING
|
| 38 |
+
└─ employee_base
|
| 39 |
+
│
|
| 40 |
+
▼
|
| 41 |
+
MART
|
| 42 |
+
└─ employee_features
|
| 43 |
+
│
|
| 44 |
+
├─ utilisé par le modèle de Machine Learning
|
| 45 |
+
▼
|
| 46 |
+
AUDIT
|
| 47 |
+
├─ prediction_requests
|
| 48 |
+
└─ prediction_responses
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
---
|
| 52 |
+
|
| 53 |
+
## Diagramme UML – Modèle de données (ERD)
|
| 54 |
+
|
| 55 |
+
```mermaid
|
| 56 |
+
erDiagram
|
| 57 |
+
|
| 58 |
+
RAW_EXTRAIT_SIRH {
|
| 59 |
+
INT id_employee
|
| 60 |
+
INT age
|
| 61 |
+
TEXT genre
|
| 62 |
+
INT revenu_mensuel
|
| 63 |
+
TEXT statut_marital
|
| 64 |
+
TEXT departement
|
| 65 |
+
TEXT poste
|
| 66 |
+
INT nombre_experiences_precedentes
|
| 67 |
+
INT annee_experience_totale
|
| 68 |
+
INT annees_dans_l_entreprise
|
| 69 |
+
INT annees_dans_le_poste_actuel
|
| 70 |
+
}
|
| 71 |
+
|
| 72 |
+
RAW_EXTRAIT_EVAL {
|
| 73 |
+
TEXT eval_number
|
| 74 |
+
INT satisfaction_employee_environnement
|
| 75 |
+
INT note_evaluation_precedente
|
| 76 |
+
INT niveau_hierarchique_poste
|
| 77 |
+
INT satisfaction_employee_nature_travail
|
| 78 |
+
INT satisfaction_employee_equipe
|
| 79 |
+
INT satisfaction_employee_equilibre_pro_perso
|
| 80 |
+
INT note_evaluation_actuelle
|
| 81 |
+
TEXT heure_supplementaires
|
| 82 |
+
TEXT augementation_salaire_precedente
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
RAW_EXTRAIT_SONDAGE {
|
| 86 |
+
INT code_sondage
|
| 87 |
+
TEXT a_quitte_l_entreprise
|
| 88 |
+
INT nombre_participation_pee
|
| 89 |
+
INT nb_formations_suivies
|
| 90 |
+
INT distance_domicile_travail
|
| 91 |
+
INT niveau_education
|
| 92 |
+
TEXT domaine_etude
|
| 93 |
+
TEXT frequence_deplacement
|
| 94 |
+
INT annees_depuis_la_derniere_promotion
|
| 95 |
+
INT annes_sous_responsable_actuel
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
STAGING_EMPLOYEE_BASE {
|
| 99 |
+
INT id_employee
|
| 100 |
+
INT age
|
| 101 |
+
TEXT genre
|
| 102 |
+
INT revenu_mensuel
|
| 103 |
+
TEXT statut_marital
|
| 104 |
+
TEXT departement
|
| 105 |
+
TEXT poste
|
| 106 |
+
BOOLEAN heure_supplementaires
|
| 107 |
+
BOOLEAN a_quitte_l_entreprise
|
| 108 |
+
}
|
| 109 |
+
|
| 110 |
+
MART_EMPLOYEE_FEATURES {
|
| 111 |
+
INT id_employee
|
| 112 |
+
FLOAT ratio_manager_anciennete
|
| 113 |
+
FLOAT mobilite_relative
|
| 114 |
+
INT evolution_performance
|
| 115 |
+
FLOAT pression_stagnation
|
| 116 |
+
BOOLEAN a_quitte_l_entreprise
|
| 117 |
+
}
|
| 118 |
+
|
| 119 |
+
AUDIT_PREDICTION_REQUESTS {
|
| 120 |
+
BIGINT request_id PK
|
| 121 |
+
TIMESTAMPTZ created_at
|
| 122 |
+
INT id_employee
|
| 123 |
+
JSONB payload
|
| 124 |
+
}
|
| 125 |
+
|
| 126 |
+
AUDIT_PREDICTION_RESPONSES {
|
| 127 |
+
BIGINT response_id PK
|
| 128 |
+
BIGINT request_id FK
|
| 129 |
+
FLOAT proba
|
| 130 |
+
INT prediction
|
| 131 |
+
FLOAT threshold
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
RAW_EXTRAIT_SIRH }o--|| STAGING_EMPLOYEE_BASE : clean
|
| 135 |
+
RAW_EXTRAIT_EVAL }o--|| STAGING_EMPLOYEE_BASE : clean
|
| 136 |
+
RAW_EXTRAIT_SONDAGE }o--|| STAGING_EMPLOYEE_BASE : clean
|
| 137 |
+
|
| 138 |
+
STAGING_EMPLOYEE_BASE ||--|| MART_EMPLOYEE_FEATURES : feature_engineering
|
| 139 |
+
AUDIT_PREDICTION_REQUESTS ||--o{ AUDIT_PREDICTION_RESPONSES : logs
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
---
|
| 143 |
+
|
| 144 |
+
## Description des couches
|
| 145 |
+
|
| 146 |
+
### RAW
|
| 147 |
+
- Données brutes issues de différentes sources RH.
|
| 148 |
+
- Aucune transformation.
|
| 149 |
+
- Chargement via scripts SQL (`COPY`).
|
| 150 |
+
|
| 151 |
+
### STAGING
|
| 152 |
+
- Nettoyage des valeurs.
|
| 153 |
+
- Normalisation des types.
|
| 154 |
+
- Jointure des sources autour de `id_employee`.
|
| 155 |
+
|
| 156 |
+
### MART
|
| 157 |
+
- Dataset final utilisé par le modèle de Machine Learning.
|
| 158 |
+
- Features calculées (ratios, évolutions, indicateurs).
|
| 159 |
+
- Contient la cible `a_quitte_l_entreprise`.
|
| 160 |
+
|
| 161 |
+
### AUDIT
|
| 162 |
+
- Journalisation des appels API.
|
| 163 |
+
- Séparation claire entre requêtes et réponses.
|
| 164 |
+
- Garantit la traçabilité et l’auditabilité des prédictions.
|
| 165 |
+
|
| 166 |
+
---
|
| 167 |
+
|
encoder/__init__.py
ADDED
|
File without changes
|
encoder/custom_encoder.py
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import pandas as pd
|
| 2 |
+
import numpy as np
|
| 3 |
+
from sklearn.base import BaseEstimator, TransformerMixin
|
| 4 |
+
from sklearn.preprocessing import OneHotEncoder
|
| 5 |
+
|
| 6 |
+
class CustomEncoder(BaseEstimator, TransformerMixin):
    """Encode a mixed-type DataFrame for the model pipeline.

    Booleans are cast to 0/1 integers, numeric columns are passed through
    unchanged, and categorical columns are one-hot encoded (categories unseen
    at fit time are ignored at transform time).

    Parameters
    ----------
    bool_cols : list[str] | None
        Columns cast with ``astype(int)``.
    cat_onehot_cols : list[str] | None
        Columns fed to a ``OneHotEncoder``.
    num_cols : list[str] | None
        Columns copied through as-is.
    """

    def __init__(self, bool_cols=None, cat_onehot_cols=None, num_cols=None):
        self.bool_cols = bool_cols or []
        self.cat_onehot_cols = cat_onehot_cols or []
        self.num_cols = num_cols or []

    def fit(self, X, y=None):
        """Learn the one-hot categories; store fitted column lists."""
        # Fitted copies (trailing underscore per sklearn convention).
        self.bool_cols_ = list(self.bool_cols)
        self.cat_onehot_cols_ = list(self.cat_onehot_cols)
        self.num_cols_ = list(self.num_cols)

        # handle_unknown="ignore": unseen categories encode to all-zero rows
        # instead of raising at transform time.
        self.ohe_ = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
        if self.cat_onehot_cols_:
            self.ohe_.fit(X[self.cat_onehot_cols_])

        return self

    def transform(self, X):
        """Return the encoded DataFrame (bool + numeric + one-hot parts)."""
        parts = []

        # Booleans -> 0/1 integers.
        if self.bool_cols_:
            parts.append(X[self.bool_cols_].astype(int))

        # Numeric columns pass through untouched.
        if self.num_cols_:
            parts.append(X[self.num_cols_])

        # One-hot block, re-wrapped as a DataFrame aligned on X's index.
        if self.cat_onehot_cols_:
            ohe_data = self.ohe_.transform(X[self.cat_onehot_cols_])
            parts.append(
                pd.DataFrame(
                    ohe_data,
                    columns=self.ohe_.get_feature_names_out(self.cat_onehot_cols_),
                    index=X.index,
                )
            )

        # Bug fix: pd.concat([]) raises ValueError; with no configured columns
        # return an empty frame that still carries the input index.
        if parts:
            df_final = pd.concat(parts, axis=1)
        else:
            df_final = pd.DataFrame(index=X.index)

        # Final column names, kept for downstream feature-importance plots.
        self.feature_names_ = df_final.columns.tolist()

        return df_final
|
requirements.txt
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
fastapi
|
| 2 |
+
uvicorn
|
| 3 |
+
pydantic
|
| 4 |
+
pytest
|
| 5 |
+
httpx
|
| 6 |
+
pandas
|
| 7 |
+
joblib
|
| 8 |
+
numpy
|
| 9 |
+
huggingface_hub
|
| 10 |
+
scikit-learn==1.6.1
|
| 11 |
+
xgboost==3.1.2
|
| 12 |
+
SQLAlchemy>=2.0
|
| 13 |
+
psycopg[binary]
|
| 14 |
+
python-dotenv
|
| 15 |
+
pytest-cov
|
tests/conftest.py
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
import pytest
|
| 3 |
+
from fastapi.testclient import TestClient
|
| 4 |
+
|
| 5 |
+
from app.core.config import get_settings
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
@pytest.fixture
def client(monkeypatch):
    """Yield a TestClient over a freshly configured app (test environment)."""
    # Set the environment *before* app.main is imported, then drop the cached
    # settings so the new values are actually read.
    for key, value in (("APP_ENV", "test"), ("API_KEY", "test-key")):
        monkeypatch.setenv(key, value)

    get_settings.cache_clear()

    from app.main import app  # deferred import: must follow the env setup
    with TestClient(app) as test_client:
        yield test_client
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
@pytest.fixture
def auth_headers():
    """Headers carrying the API key the test settings expect."""
    headers = {"X-API-Key": "test-key"}
    return headers
|
tests/test_api.py
ADDED
|
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
import pytest
|
| 3 |
+
|
| 4 |
+
# A payload accepted by POST /predict (every required employee field).
PAYLOAD_OK = dict(
    age=41,
    genre="homme",
    revenu_mensuel=3993,
    statut_marital="célibataire",
    departement="commercial",
    poste="cadre commercial",
    nombre_experiences_precedentes=2,
    annees_dans_l_entreprise=5,
    satisfaction_employee_environnement=4,
    satisfaction_employee_nature_travail=1,
    satisfaction_employee_equipe=1,
    satisfaction_employee_equilibre_pro_perso=1,
    heure_supplementaires=True,
    augmentation_salaire_precedente=11,
    nombre_participation_pee=0,
    nb_formations_suivies=0,
    distance_domicile_travail=1,
    niveau_education=2,
    domaine_etude="infra & cloud",
    frequence_deplacement="occasionnel",
    annees_sous_responsable_actuel=0,
    annees_dans_le_poste_actuel=0,
    note_evaluation_actuelle=0,
    note_evaluation_precedente=0,
    annees_depuis_la_derniere_promotion=0,
)
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
# -------------------------------------------------------------------
|
| 35 |
+
# Utilitaire : injecter un état minimal dans l'app (pas de HF / pas de DB)
|
| 36 |
+
# -------------------------------------------------------------------
|
| 37 |
+
def _inject_dummy_state():
    """Attach a stub model/threshold to the app so /predict needs no HF or DB."""
    from app.main import app

    class DummyModel:
        def predict_proba(self, X):
            # Fixed output: class-1 probability is always 0.8.
            return [[0.2, 0.8]]

    app.state.threshold = 0.292
    app.state.model = DummyModel()
    app.state.engine = None
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
# =========================
|
| 50 |
+
# /predict (POST)
|
| 51 |
+
# =========================
|
| 52 |
+
def test_post_predict_unauthorized_without_api_key(client):
    """POST /predict without any API key header must be rejected with 401."""
    response = client.post("/predict", json=PAYLOAD_OK)
    assert response.status_code == 401
|
| 55 |
+
|
| 56 |
+
|
| 57 |
+
def test_post_predict_unauthorized_with_wrong_api_key(client):
    """POST /predict with an invalid API key must be rejected with 401."""
    response = client.post(
        "/predict",
        headers={"X-API-Key": "WRONG"},
        json=PAYLOAD_OK,
    )
    assert response.status_code == 401
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def test_post_predict_ok_with_api_key(client, auth_headers):
    """A valid key plus a stubbed model yields 200 with the expected fields."""
    _inject_dummy_state()

    response = client.post("/predict", json=PAYLOAD_OK, headers=auth_headers)
    assert response.status_code == 200, response.text

    data = response.json()
    assert data["prediction"] in (0, 1)
    assert data["threshold"] == 0.292
    assert "proba" in data
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
# =========================
|
| 79 |
+
# /predict/{id} (GET)
|
| 80 |
+
# =========================
|
| 81 |
+
def test_get_predict_by_id_unauthorized_without_api_key(client):
|
| 82 |
+
r = client.get("/predict/7")
|
| 83 |
+
assert r.status_code == 401
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
def test_get_predict_by_id_unauthorized_with_wrong_api_key(client):
|
| 87 |
+
r = client.get("/predict/7", headers={"X-API-Key": "WRONG"})
|
| 88 |
+
assert r.status_code == 401
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
def test_get_predict_by_id_ok_with_api_key(client, auth_headers, monkeypatch):
    """With auth and a stubbed service, GET /predict/{id} returns a full result."""
    _inject_dummy_state()

    import app.main as main_module

    def fake_run_predict_by_id(*, id_employee, model, threshold, engine):
        # Pretend the employee exists and the model produced a prediction.
        return 0.55, 1, {"id_employee": id_employee}

    monkeypatch.setattr(main_module, "run_predict_by_id", fake_run_predict_by_id)

    response = client.get("/predict/7", headers=auth_headers)
    assert response.status_code == 200, response.text

    data = response.json()
    assert "proba" in data
    assert data["prediction"] in (0, 1)
    assert data["threshold"] == 0.292
|
tests/test_audit.py
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
def test_log_audit_returns_request_id(monkeypatch):
    """log_audit must return the id produced by the INSERT ... RETURNING."""
    from app.services.audit import log_audit

    class FakeResult:
        def scalar_one(self):
            return 42

    class FakeConn:
        # Any statement executed on this connection yields the fake result.
        def execute(self, *args, **kwargs):
            return FakeResult()

    request_id = log_audit(
        conn=FakeConn(),
        payload={"a": 1},
        proba=0.7,
        prediction=1,
        threshold=0.3,
    )

    assert request_id == 42
|
tests/test_engine.py
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.db.engine import get_engine
|
| 2 |
+
from app.core.config import get_settings
|
| 3 |
+
|
| 4 |
+
def test_get_engine_without_database_url(monkeypatch):
    """Without DATABASE_URL the app runs DB-less: get_engine yields None."""
    monkeypatch.delenv("DATABASE_URL", raising=False)
    # Settings are cached; clear so the missing variable is observed.
    get_settings.cache_clear()

    assert get_engine() is None
|
tests/test_feature.py
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
def test_get_employee_features_not_found(monkeypatch):
    """An unknown employee id must come back as None, not raise."""
    from app.services.features import get_employee_features_by_id

    class EmptyResult:
        # .mappings().first() -> None mimics an empty SELECT.
        def mappings(self):
            return self

        def first(self):
            return None

    class StubConn:
        def execute(self, *args, **kwargs):
            return EmptyResult()

    class StubEngine:
        # Supports `with engine.connect() as conn:`.
        def connect(self):
            return self

        def __enter__(self):
            return StubConn()

        def __exit__(self, *exc):
            pass

    assert get_employee_features_by_id(StubEngine(), 999) is None
|
tests/test_health.py
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
def test_health_ok(client):
|
| 2 |
+
r = client.get("/health")
|
| 3 |
+
assert r.status_code == 200
|
| 4 |
+
|
| 5 |
+
data = r.json()
|
| 6 |
+
assert data["status"] == "ok"
|
| 7 |
+
assert "model_loaded" in data
|
| 8 |
+
assert "threshold" in data
|
| 9 |
+
assert "db_configured" in data
|
tests/test_predict.py
ADDED
|
@@ -0,0 +1,153 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# tests/test_services_predict.py
|
| 2 |
+
import pytest
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
def test_run_predict_manual_without_engine(monkeypatch):
    """engine=None: no audit write; (proba, pred, enriched payload) pass through."""
    from app.services import predict as predict_service

    # Stub out the ML call itself.
    monkeypatch.setattr(
        predict_service,
        "predict_manual",
        lambda payload, model, threshold: (0.8, 1, {"x": 1, "enrich": True}),
    )

    proba, prediction, enriched = predict_service.run_predict_manual(
        payload={"x": 1},
        model=object(),
        threshold=0.3,
        engine=None,
    )

    assert (proba, prediction) == (0.8, 1)
    assert enriched["enrich"] is True
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
def test_run_predict_manual_with_engine_calls_audit(monkeypatch):
    """With an engine, run_predict_manual must write exactly one audit row."""
    from app.services import predict as predict_service

    monkeypatch.setattr(
        predict_service,
        "predict_manual",
        lambda payload, model, threshold: (0.2, 0, {"foo": "bar"}),
    )

    # Record every log_audit invocation (args in positional order).
    recorded = []

    def spy_log_audit(conn, payload, proba, prediction, threshold):
        recorded.append((conn, payload, proba, prediction, threshold))
        return 123

    monkeypatch.setattr(predict_service, "log_audit", spy_log_audit)

    class FakeEngine:
        """Minimal stand-in for `with engine.begin() as conn:`."""

        def begin(self):
            return self

        def __enter__(self):
            return "dummy-conn"

        def __exit__(self, exc_type, exc, tb):
            return False

    proba, prediction, enriched = predict_service.run_predict_manual(
        payload={"hello": "world"},
        model=object(),
        threshold=0.292,
        engine=FakeEngine(),
    )

    assert (proba, prediction) == (0.2, 0)
    assert enriched == {"foo": "bar"}

    # Audit called exactly once, with the enriched payload and the exact
    # proba/prediction/threshold triple.
    assert len(recorded) == 1
    assert recorded[0] == ("dummy-conn", {"foo": "bar"}, 0.2, 0, 0.292)
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
def test_run_predict_by_id_not_found_raises_keyerror(monkeypatch):
    """Unknown id: the feature lookup returns None, so KeyError is expected."""
    from app.services import predict as predict_service

    monkeypatch.setattr(
        predict_service,
        "get_employee_features_by_id",
        lambda engine, id_employee: None,
    )

    with pytest.raises(KeyError):
        predict_service.run_predict_by_id(
            id_employee=999,
            model=object(),
            threshold=0.5,
            engine=object(),
        )
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
def test_run_predict_by_id_with_engine_calls_audit_and_adds_id(monkeypatch):
    """Nominal path: fetch features, predict, audit once, and inject
    id_employee into the enriched payload before it is logged."""
    from app.services import predict as predict_service

    monkeypatch.setattr(
        predict_service,
        "get_employee_features_by_id",
        lambda engine, id_employee: {"id_employee": id_employee, "age": 40},
    )

    # The enriched payload deliberately lacks the id: the service must add it.
    monkeypatch.setattr(
        predict_service,
        "predict_from_employee_features",
        lambda employee, model, threshold: (0.55, 1, {"age": employee["age"]}),
    )

    # Spy on log_audit, keeping only the payloads it receives.
    audited_payloads = []

    def spy_log_audit(conn, payload, proba, prediction, threshold):
        audited_payloads.append(payload)
        return 456

    monkeypatch.setattr(predict_service, "log_audit", spy_log_audit)

    class FakeEngine:
        """Minimal stand-in for `with engine.begin() as conn:`."""

        def begin(self):
            return self

        def __enter__(self):
            return "dummy-conn"

        def __exit__(self, exc_type, exc, tb):
            return False

    proba, prediction, enriched = predict_service.run_predict_by_id(
        id_employee=7,
        model=object(),
        threshold=0.292,
        engine=FakeEngine(),
    )

    assert (proba, prediction) == (0.55, 1)
    assert enriched["age"] == 40
    assert enriched["id_employee"] == 7

    assert len(audited_payloads) == 1
    assert audited_payloads[0]["id_employee"] == 7
|