metadata
title: HR Attrition Prediction API - Futurisys
colorFrom: blue
colorTo: green
sdk: docker
pinned: false



HR Attrition Prediction API — Futurisys

A production-grade REST API that predicts employee attrition using a Gradient Boosting pipeline with SHAP explainability, deployed on Hugging Face Spaces.


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Running Tests
  5. Contact
  6. Acknowledgments

About The Project

Futurisys is a tech consulting firm used as the business context for this OpenClassrooms Data Science project (Project 5). The objective is to help HR departments proactively identify employees at risk of attrition before they leave.

This project delivers a complete, containerised ML system:

  • A Gradient Boosting classifier trained on HR data and serialised, together with its automated preprocessing, as a single scikit-learn pipeline
  • Custom feature engineering — overall satisfaction score, expertise inconsistency (department vs. study domain mismatch), managerial stagnation, and development stagnation signals
  • A custom classification threshold of 0.37 (tuned for recall on the attrition class rather than the default 0.50)
  • A FastAPI REST API with full input validation via Pydantic, exposing two prediction modes: submit raw employee data (POST /predict) or look up an existing employee by ID (GET /predict/{id_employee})
  • SHAP-based explainability — every prediction is accompanied by the top 5 most influential features and their direction of impact
  • PostgreSQL prediction logging — every prediction (inputs, result, SHAP factors) is automatically stored in a predictions_log table for auditability
  • A complete CI/CD pipeline via GitHub Actions that runs the full test suite on every push before deploying to Hugging Face Spaces
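The overall satisfaction score can be sketched as the mean of the four 1–4 satisfaction ratings. This is an assumption inferred from the example response further down, where ratings of 4, 3, 1 and 3 yield an `overall_satisfaction` of 2.75. The helper name and the plain-dict input are hypothetical; the real pipeline may compute the feature elsewhere (for example inside a custom transformer).

```python
from statistics import mean

# The four 1-4 satisfaction ratings, keyed by their French JSON alias.
SATISFACTION_FIELDS = (
    "Satisfaction environnement",
    "Satisfaction nature du travail",
    "Satisfaction équipe",
    "Satisfaction équilibre pro/perso",
)


def overall_satisfaction(employee: dict) -> float:
    """Hypothetical helper: average the four satisfaction ratings.

    Assumes every rating is present and already validated to lie in 1-4.
    """
    return float(mean(employee[field] for field in SATISFACTION_FIELDS))


example = {
    "Satisfaction environnement": 4,
    "Satisfaction nature du travail": 3,
    "Satisfaction équipe": 1,
    "Satisfaction équilibre pro/perso": 3,
}
print(overall_satisfaction(example))  # 2.75
```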


Built With

  • Python
  • FastAPI
  • scikit-learn
  • pandas
  • SHAP
  • PostgreSQL
  • SQLAlchemy
  • Docker
  • GitHub Actions



Getting Started

Prerequisites

  • Python 3.13+
  • A PostgreSQL database (local or remote)
  • Docker (for containerised deployment)
  • Git

Installation

Option 1 — Run locally with Python

  1. Clone the repository
    git clone https://github.com/KL38/OC_P5_v2.git
    cd OC_P5_v2
    
  2. Install dependencies
    pip install -r requirements.txt
    
  3. Create a .env file at the project root with your database connection string
    DATABASE_URL=postgresql://user:password@host:5432/dbname
    
  4. Start the API
    uvicorn app.main:app --reload
    
    The API is available at http://127.0.0.1:8000. The interactive Swagger UI is at http://127.0.0.1:8000/docs.

Option 2 — Run with Docker

  1. Build the image
    docker build -t futurisys-api .
    
  2. Run the container, passing the database URL as an environment variable
    docker run -p 7860:7860 -e DATABASE_URL=postgresql://user:password@host:5432/dbname futurisys-api
    
    The API is available at http://localhost:7860.



Usage

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Health check: returns a welcome message |
| GET | /predict/{id_employee} | Fetches an employee from the database by ID and returns a prediction |
| POST | /predict | Predicts employee attrition risk from a submitted JSON payload |

POST /predict — Input Schema

All fields use their French alias as the JSON key.

| JSON key | Type | Accepted values / notes |
|---|---|---|
| Genre | string | "M" or "F" |
| Statut Marital | string | "Marié(e)", "Célibataire", "Divorcé(e)" |
| Département | string | "Consulting", "Commercial", "Ressources Humaines" |
| Poste | string | "Consultant", "Manager", "Tech Lead", … |
| Domaine d'étude | string | "Infra & Cloud", "Marketing", "Ressources Humaines", … |
| Fréquence de déplacement | string | "Aucun", "Occasionnel", "Frequent" |
| Heures supplémentaires | string | "Oui" or "Non" |
| Âge | int | |
| Revenu mensuel | int | |
| Nombre d'expériences précédentes | int | |
| Années d'expérience totale | int | |
| Années dans l'entreprise | int | |
| Années dans le poste actuel | int | |
| Nombre de formations suivies | int | |
| Distance domicile-travail | int | |
| Niveau d'éducation | int | |
| Années depuis la dernière promotion | int | |
| Années sous responsable actuel | int | |
| Satisfaction environnement | int | 1–4 |
| Satisfaction nature du travail | int | 1–4 |
| Satisfaction équipe | int | 1–4 |
| Satisfaction équilibre pro/perso | int | 1–4 |
| Note évaluation précédente | int | 1–4 |
| Note évaluation actuelle | int | 1–4 |
| Augmentation salaire précédente | string | Percentage as string, e.g. "18%" |
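The percentage string and the 1–4 rating fields are the easiest places for a client to send a value the API rejects with HTTP 422. The sketch below mirrors the rules in the table on the client side. It is not the project's Pydantic model: `parse_percentage`, `validate_payload`, the comma-decimal support and the exact checks are assumptions, and fields whose accepted values end in "…" (Poste, Domaine d'étude) are deliberately left unchecked.

```python
import re

RATING_FIELDS = (
    "Satisfaction environnement",
    "Satisfaction nature du travail",
    "Satisfaction équipe",
    "Satisfaction équilibre pro/perso",
    "Note évaluation précédente",
    "Note évaluation actuelle",
)
# Only the closed value sets listed in the schema table.
CHOICES = {
    "Genre": {"M", "F"},
    "Statut Marital": {"Marié(e)", "Célibataire", "Divorcé(e)"},
    "Département": {"Consulting", "Commercial", "Ressources Humaines"},
    "Fréquence de déplacement": {"Aucun", "Occasionnel", "Frequent"},
    "Heures supplémentaires": {"Oui", "Non"},
}
# Digits with an optional "." or "," decimal part, followed by a percent sign.
_PERCENT = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*%\s*$")


def parse_percentage(text: str) -> float:
    """Convert a string such as "18%" to 18.0."""
    match = _PERCENT.match(text)
    if match is None:
        raise ValueError(f"not a percentage string: {text!r}")
    return float(match.group(1).replace(",", "."))


def validate_payload(payload: dict) -> list[str]:
    """Return human-readable problems for the checked fields (empty if none)."""
    problems = []
    for field in RATING_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 4:
            problems.append(f"{field}: expected an integer from 1 to 4, got {value!r}")
    for field, allowed in CHOICES.items():
        if payload.get(field) not in allowed:
            problems.append(f"{field}: expected one of {sorted(allowed)}")
    try:
        parse_percentage(str(payload.get("Augmentation salaire précédente", "")))
    except ValueError as exc:
        problems.append(f"Augmentation salaire précédente: {exc}")
    return problems
```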

Example Request

curl -X POST "http://localhost:7860/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "Genre": "M",
    "Statut Marital": "Marié(e)",
    "Département": "Consulting",
    "Poste": "Consultant",
    "Domaine d'\''étude": "Infra & Cloud",
    "Fréquence de déplacement": "Occasionnel",
    "Heures supplémentaires": "Non",
    "Âge": 32,
    "Revenu mensuel": 4883,
    "Nombre d'\''expériences précédentes": 1,
    "Années d'\''expérience totale": 10,
    "Années dans l'\''entreprise": 10,
    "Années dans le poste actuel": 4,
    "Nombre de formations suivies": 3,
    "Distance domicile-travail": 7,
    "Niveau d'\''éducation": 2,
    "Années depuis la dernière promotion": 1,
    "Années sous responsable actuel": 1,
    "Satisfaction environnement": 4,
    "Note évaluation précédente": 3,
    "Satisfaction nature du travail": 3,
    "Satisfaction équipe": 1,
    "Satisfaction équilibre pro/perso": 3,
    "Note évaluation actuelle": 3,
    "Augmentation salaire précédente": "18%"
  }'
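The same request can be sent from Python with the standard library alone, and the apostrophes in keys such as "Domaine d'étude" need no `'\''` shell escaping. This is a sketch assuming the Docker instance on port 7860; `build_predict_request` and `predict` are hypothetical helpers, not part of the project.

```python
import json
from urllib.request import Request, urlopen

# Same employee as in the curl example above.
payload = {
    "Genre": "M",
    "Statut Marital": "Marié(e)",
    "Département": "Consulting",
    "Poste": "Consultant",
    "Domaine d'étude": "Infra & Cloud",
    "Fréquence de déplacement": "Occasionnel",
    "Heures supplémentaires": "Non",
    "Âge": 32,
    "Revenu mensuel": 4883,
    "Nombre d'expériences précédentes": 1,
    "Années d'expérience totale": 10,
    "Années dans l'entreprise": 10,
    "Années dans le poste actuel": 4,
    "Nombre de formations suivies": 3,
    "Distance domicile-travail": 7,
    "Niveau d'éducation": 2,
    "Années depuis la dernière promotion": 1,
    "Années sous responsable actuel": 1,
    "Satisfaction environnement": 4,
    "Note évaluation précédente": 3,
    "Satisfaction nature du travail": 3,
    "Satisfaction équipe": 1,
    "Satisfaction équilibre pro/perso": 3,
    "Note évaluation actuelle": 3,
    "Augmentation salaire précédente": "18%",
}


def build_predict_request(data: dict, base_url: str = "http://localhost:7860") -> Request:
    """Build POST /predict with the payload serialised as UTF-8 JSON."""
    # ensure_ascii=False keeps accented keys such as "Âge" readable in the body.
    body = json.dumps(data, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(data: dict, base_url: str = "http://localhost:7860") -> dict:
    """Send the request to a running instance and return the decoded response."""
    with urlopen(build_predict_request(data, base_url)) as response:
        return json.load(response)
```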

Example Response

{
  "statut_employe": "The staff has a LOW probability of resigning",
  "probability_score": 0.28,
  "model_threshold": 0.37,
  "note": "Decision based on a strategic threshold of 0.37, not 0.50",
  "top_5_factors": {
    "revenu_mensuel": {
      "interpretation": "Primary driver — decreases resignation risk",
      "feature_value": 4883.0
    },
    "annees_dans_l_entreprise": {
      "interpretation": "Strong factor — decreases resignation risk",
      "feature_value": 10.0
    },
    "statut_marital_Célibataire": {
      "interpretation": "Moderate factor — decreases resignation risk",
      "feature_value": "encoded"
    },
    "distance_domicile_travail": {
      "interpretation": "Contributing factor — decreases resignation risk",
      "feature_value": 7.0
    },
    "overall_satisfaction": {
      "interpretation": "Notable factor — decreases resignation risk",
      "feature_value": 2.75
    }
  }
}

Response Schema

| Field | Type | Description |
|---|---|---|
| statut_employe | string | Human-readable verdict stating a LOW or HIGH probability of resigning, e.g. "The staff has a LOW probability of resigning" |
| probability_score | float | Model-estimated probability of resignation (0–1), rounded to 2 decimal places |
| model_threshold | float | Decision threshold applied: 0.37 (the prediction is HIGH if the score is ≥ 0.37) |
| note | string | Reminder that the threshold is strategically set to 0.37, not the default 0.50 |
| top_5_factors | object | Top 5 features ranked by absolute SHAP value (most influential first) |
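The decision rule in this table amounts to comparing the probability with 0.37 instead of 0.50. A sketch, assuming the comparison uses the unrounded probability and that the HIGH message mirrors the LOW one (only the LOW wording appears in the example response); `build_verdict` is a hypothetical name.

```python
THRESHOLD = 0.37  # strategic threshold documented for the API


def build_verdict(probability: float, threshold: float = THRESHOLD) -> dict:
    """Map a raw attrition probability to the documented response fields."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # HIGH when the score reaches the threshold, LOW otherwise.
    level = "HIGH" if probability >= threshold else "LOW"
    return {
        # Assumed wording for HIGH; only the LOW message is shown in the docs.
        "statut_employe": f"The staff has a {level} probability of resigning",
        "probability_score": round(probability, 2),
        "model_threshold": threshold,
        "note": f"Decision based on a strategic threshold of {threshold}, not 0.50",
    }
```

With this rule a probability of 0.45 is flagged HIGH, although a default 0.50 cut-off would have called it LOW: accepting more false alarms in exchange for fewer missed leavers is what tuning for recall on the attrition class means.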

Each entry in top_5_factors is keyed by the feature name and contains:

| Sub-field | Type | Description |
|---|---|---|
| interpretation | string | Rank label (Primary driver, Strong factor, Moderate factor, Contributing factor, Notable factor) followed by the direction of impact (increases or decreases resignation risk) |
| feature_value | float or string | The actual value of that feature for this employee. Returns "encoded" for one-hot encoded categorical features (e.g. statut_marital_Célibataire) whose original value is lost after encoding |
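Given one row of SHAP values for the attrition class, the ranking and labelling described above can be sketched as follows. The rank labels come from this table; the sign convention (a positive SHAP value pushes toward resignation) is the usual one when explaining the positive class and is assumed here. The function name and dict input are hypothetical, and the returned tuples are not the API's exact `interpretation` strings.

```python
RANK_LABELS = (
    "Primary driver",
    "Strong factor",
    "Moderate factor",
    "Contributing factor",
    "Notable factor",
)


def top_factors(shap_values: dict[str, float]) -> list[tuple[str, str, str]]:
    """Return (feature, rank label, direction) for the 5 largest |SHAP| values."""
    # Rank by magnitude so strong protective factors are not hidden.
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    result = []
    # zip stops after the five labels, which keeps only the top 5 features.
    for label, (feature, value) in zip(RANK_LABELS, ranked):
        # Assumed convention: positive SHAP value -> higher resignation risk.
        direction = "increases" if value > 0 else "decreases"
        result.append((feature, label, f"{direction} resignation risk"))
    return result
```

Sorting by absolute value is what lets a strong protective factor (a large negative SHAP value, such as a high monthly income in the example response) outrank a weak risk factor.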

The interactive Swagger UI (auto-generated by FastAPI) is available at /docs on any running instance.



Running Tests

The test suite covers unit tests for feature engineering helpers and functional tests for all API endpoints, including valid predictions, input validation (HTTP 422), employee lookup by ID, and warning logging.

Setting TESTING=true disables database writes so the test suite runs without a live database connection. DATABASE_URL is still required to import the app (SQLAlchemy initialises at import time).

TESTING=true DATABASE_URL=postgresql://user:password@host:5432/dbname pytest tests/ -v --cov=app --cov-report=term-missing

The same configuration is used in CI (GitHub Actions): DATABASE_URL is injected from a repository secret, and TESTING is hardcoded to "true" directly in the workflow file.



Contact

Kevin Lebayle — GitHub @KL38

Project Link: https://github.com/KL38/OC_P5_v2
Live Demo: https://huggingface.co/spaces/KLEB38/OC_P5



Acknowledgments
