---
title: HR Attrition Prediction API - Futurisys
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
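**Table of Contents**

- [About The Project](#about-the-project)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Running Tests](#running-tests)
- [Contact](#contact)
- [Acknowledgments](#acknowledgments)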
---
## About The Project
**Futurisys** is a tech consulting firm used as the business context for this OpenClassrooms Data Science project (Project 5). The objective is to help HR departments proactively identify employees at risk of attrition before they leave.
This project delivers a complete, containerised ML system:
- A **Gradient Boosting classifier** trained on HR data and serialised as a scikit-learn pipeline that includes automated preprocessing
- **Custom feature engineering** — overall satisfaction score, expertise inconsistency (department vs. study domain mismatch), managerial stagnation, and development stagnation signals
- A **custom classification threshold of 0.37** (tuned for recall on the attrition class rather than the default 0.50)
- A **FastAPI REST API** with full input validation via Pydantic, exposing two prediction modes: submit raw employee data (`POST /predict`) or look up an existing employee by ID (`GET /predict/{id_employee}`)
- **SHAP-based explainability** — every prediction is accompanied by the top 5 most influential features and their direction of impact
- **PostgreSQL prediction logging** — every prediction (inputs, result, SHAP factors) is automatically stored in a `predictions_log` table for auditability
- A complete **CI/CD pipeline** via GitHub Actions that runs the full test suite on every push before deploying to Hugging Face Spaces
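The engineered features listed above can be illustrated with a small, stdlib-only sketch. Only `overall_satisfaction` can be cross-checked against this README: averaging the four satisfaction scores from the example request (4, 3, 1, 3) gives 2.75, which matches the value in the example response. The department-to-domain mapping, the two stagnation ratios, the function name `engineer_features` and the snake_case input keys are hypothetical illustrations, not the project's actual code.

```python
# Hypothetical mapping from department to the study domains treated as consistent with it.
EXPECTED_DOMAINS = {
    "Consulting": {"Infra & Cloud"},
    "Commercial": {"Marketing"},
    "Ressources Humaines": {"Ressources Humaines"},
}

SATISFACTION_KEYS = (
    "satisfaction_environnement",
    "satisfaction_nature_du_travail",
    "satisfaction_equipe",
    "satisfaction_equilibre_pro_perso",
)


def engineer_features(row: dict) -> dict:
    """Return a copy of `row` with four derived features added (illustrative sketch)."""
    out = dict(row)
    # Mean of the four 1-4 satisfaction scores.
    out["overall_satisfaction"] = sum(row[k] for k in SATISFACTION_KEYS) / len(SATISFACTION_KEYS)
    # 1 when the study domain is not one expected for the department (hypothetical rule).
    expected = EXPECTED_DOMAINS.get(row["departement"], set())
    out["expertise_inconsistency"] = int(row["domaine_etude"] not in expected)
    # Hypothetical ratios; the +1 avoids division by zero (no tenure, no training).
    out["managerial_stagnation"] = row["annees_sous_responsable_actuel"] / (row["annees_dans_l_entreprise"] + 1)
    out["development_stagnation"] = row["annees_depuis_la_derniere_promotion"] / (row["nombre_de_formations_suivies"] + 1)
    return out


example = {
    "departement": "Consulting",
    "domaine_etude": "Infra & Cloud",
    "satisfaction_environnement": 4,
    "satisfaction_nature_du_travail": 3,
    "satisfaction_equipe": 1,
    "satisfaction_equilibre_pro_perso": 3,
    "annees_dans_l_entreprise": 10,
    "annees_sous_responsable_actuel": 1,
    "annees_depuis_la_derniere_promotion": 1,
    "nombre_de_formations_suivies": 3,
}
print(engineer_features(example)["overall_satisfaction"])  # 2.75
```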
### Built With
* [![Python][Python-badge]][Python-url]
* [![FastAPI][FastAPI-badge]][FastAPI-url]
* [![scikit-learn][sklearn-badge]][sklearn-url]
* [![pandas][pandas-badge]][pandas-url]
* [![SHAP][SHAP-badge]][SHAP-url]
* [![PostgreSQL][Postgres-badge]][Postgres-url]
* [![SQLAlchemy][SQLAlchemy-badge]][SQLAlchemy-url]
* [![Docker][Docker-badge]][Docker-url]
* [![GitHub Actions][GHActions-badge]][GHActions-url]
---
## Getting Started
### Prerequisites
- Python 3.13+
- A PostgreSQL database (local or remote)
- Docker (for containerised deployment)
- Git
### Installation
#### Option 1 — Run locally with Python
1. Clone the repository
```sh
git clone https://github.com/KL38/OC_P5_v2.git
cd OC_P5_v2
```
2. Install dependencies
```sh
pip install -r requirements.txt
```
3. Create a `.env` file at the project root with your database connection string
```sh
DATABASE_URL=postgresql://user:password@host:5432/dbname
```
4. Start the API
```sh
uvicorn app.main:app --reload
```
The API is available at `http://127.0.0.1:8000`. The interactive Swagger UI is at `http://127.0.0.1:8000/docs`.
#### Option 2 — Run with Docker
1. Build the image
```sh
docker build -t futurisys-api .
```
2. Run the container, passing the database URL as an environment variable
```sh
docker run -p 7860:7860 -e DATABASE_URL=postgresql://user:password@host:5432/dbname futurisys-api
```
The API is available at `http://localhost:7860`.
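Before sending predictions to a freshly started container, a script or CI job may want to wait until the health-check route answers. The stdlib sketch below polls `GET /` and only assumes that it returns HTTP 200 once the API is up; the function name `wait_for_api` and its default timings are hypothetical.

```python
import time
import urllib.request


def wait_for_api(base_url: str = "http://localhost:7860", timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll the health-check route GET / until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # connection refused, socket timeout, HTTP error (URLError subclasses OSError)
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_for_api()` returns `True` as soon as the container started above responds, and `False` if it never does within 60 seconds.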
---
## Usage
### Endpoints
| Method | Endpoint | Description |
|--------|---------------------------|--------------------------------------------------------------------|
| `GET` | `/` | Health check — returns a welcome message |
| `GET` | `/predict/{id_employee}` | Fetches an employee from the database by ID and returns a prediction |
| `POST` | `/predict` | Predicts employee attrition risk from a submitted JSON payload |
### `POST /predict` — Input Schema
The request body is a flat JSON object whose keys are the **French aliases** below, spelled exactly as shown (accents, spaces and apostrophes included).
| JSON key | Type | Accepted values / notes |
|-----------------------------------------|--------|---------------------------------------------------------------------|
| `Genre` | string | `"M"` or `"F"` |
| `Statut Marital` | string | `"Marié(e)"`, `"Célibataire"`, `"Divorcé(e)"` |
| `Département` | string | `"Consulting"`, `"Commercial"`, `"Ressources Humaines"` |
| `Poste` | string | `"Consultant"`, `"Manager"`, `"Tech Lead"`, … |
| `Domaine d'étude` | string | `"Infra & Cloud"`, `"Marketing"`, `"Ressources Humaines"`, … |
| `Fréquence de déplacement` | string | `"Aucun"`, `"Occasionnel"`, `"Frequent"` |
| `Heures supplémentaires` | string | `"Oui"` or `"Non"` |
| `Âge` | int | |
| `Revenu mensuel` | int | |
| `Nombre d'expériences précédentes` | int | |
| `Années d'expérience totale` | int | |
| `Années dans l'entreprise` | int | |
| `Années dans le poste actuel` | int | |
| `Nombre de formations suivies` | int | |
| `Distance domicile-travail` | int | |
| `Niveau d'éducation` | int | |
| `Années depuis la dernière promotion` | int | |
| `Années sous responsable actuel` | int | |
| `Satisfaction environnement` | int | 1–4 |
| `Satisfaction nature du travail` | int | 1–4 |
| `Satisfaction équipe` | int | 1–4 |
| `Satisfaction équilibre pro/perso` | int | 1–4 |
| `Note évaluation précédente` | int | 1–4 |
| `Note évaluation actuelle` | int | 1–4 |
| `Augmentation salaire précédente` | string | Percentage as string, e.g. `"18%"` |
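The alias convention and the value constraints in the table above can be illustrated without FastAPI. The project itself validates input with Pydantic; the stdlib function below is only a hypothetical stand-in (the name `parse_payload`, the five-field `ALIASES` subset and the error messages are illustrative). It shows three checks: French keys are mapped to snake_case names, 1–4 scores are range-checked, and a `"18%"`-style string is parsed to a number. Where the real project performs that last conversion is not documented here.

```python
# Subset of the French JSON keys, mapped to hypothetical snake_case field names.
ALIASES = {
    "Genre": "genre",
    "Heures supplémentaires": "heures_supplementaires",
    "Satisfaction environnement": "satisfaction_environnement",
    "Note évaluation actuelle": "note_evaluation_actuelle",
    "Augmentation salaire précédente": "augmentation_salaire_precedente",
}
ALLOWED = {"genre": {"M", "F"}, "heures_supplementaires": {"Oui", "Non"}}
SCORE_FIELDS = ("satisfaction_environnement", "note_evaluation_actuelle")


def parse_payload(payload: dict) -> dict:
    """Map French keys to field names and validate values (stand-in, not the project's Pydantic model)."""
    missing = [alias for alias in ALIASES if alias not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    data = {name: payload[alias] for alias, name in ALIASES.items()}
    for name, choices in ALLOWED.items():
        if data[name] not in choices:
            raise ValueError(f"{name} must be one of {sorted(choices)}")
    for name in SCORE_FIELDS:
        if not isinstance(data[name], int) or not 1 <= data[name] <= 4:
            raise ValueError(f"{name} must be an integer between 1 and 4")
    # "18%" -> 18.0
    raw = data["augmentation_salaire_precedente"]
    if not isinstance(raw, str) or not raw.endswith("%"):
        raise ValueError("augmentation_salaire_precedente must look like '18%'")
    data["augmentation_salaire_precedente"] = float(raw[:-1])
    return data
```

In the real API, a failed check of this kind surfaces as an HTTP 422 response generated by FastAPI.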
### Example Request
```bash
curl -X POST "http://localhost:7860/predict" \
-H "Content-Type: application/json" \
-d '{
"Genre": "M",
"Statut Marital": "Marié(e)",
"Département": "Consulting",
"Poste": "Consultant",
"Domaine d'\''étude": "Infra & Cloud",
"Fréquence de déplacement": "Occasionnel",
"Heures supplémentaires": "Non",
"Âge": 32,
"Revenu mensuel": 4883,
"Nombre d'\''expériences précédentes": 1,
"Années d'\''expérience totale": 10,
"Années dans l'\''entreprise": 10,
"Années dans le poste actuel": 4,
"Nombre de formations suivies": 3,
"Distance domicile-travail": 7,
"Niveau d'\''éducation": 2,
"Années depuis la dernière promotion": 1,
"Années sous responsable actuel": 1,
"Satisfaction environnement": 4,
"Note évaluation précédente": 3,
"Satisfaction nature du travail": 3,
"Satisfaction équipe": 1,
"Satisfaction équilibre pro/perso": 3,
"Note évaluation actuelle": 3,
"Augmentation salaire précédente": "18%"
}'
```
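The same call can be made from Python with only the standard library. This sketch builds the request for the Docker instance at `http://localhost:7860`; the helper name `build_predict_request` is hypothetical, and only a few fields are filled in for brevity. A real call needs every key from the input schema, otherwise it will presumably fail validation with HTTP 422. `ensure_ascii=False` plus UTF-8 encoding keeps accented keys such as `Âge` as-is in the body (escaped `\u00c2ge` would also be valid JSON).

```python
import json
import urllib.request

API_URL = "http://localhost:7860/predict"


def build_predict_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a UTF-8 JSON body."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "Genre": "M",
    "Statut Marital": "Marié(e)",
    "Domaine d'étude": "Infra & Cloud",
    "Âge": 32,
    # ... plus every other key from the input schema
}
req = build_predict_request(payload)

# Against a running instance:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["probability_score"], result["statut_employe"])
```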
### Example Response
```json
{
"statut_employe": "The staff has a LOW probability of resigning",
"probability_score": 0.28,
"model_threshold": 0.37,
"note": "Decision based on a strategic threshold of 0.37, not 0.50",
"top_5_factors": {
"revenu_mensuel": {
"interpretation": "Primary driver — decreases resignation risk",
"feature_value": 4883.0
},
"annees_dans_l_entreprise": {
"interpretation": "Strong factor — decreases resignation risk",
"feature_value": 10.0
},
"statut_marital_Célibataire": {
"interpretation": "Moderate factor — decreases resignation risk",
"feature_value": "encoded"
},
"distance_domicile_travail": {
"interpretation": "Contributing factor — decreases resignation risk",
"feature_value": 7.0
},
"overall_satisfaction": {
"interpretation": "Notable factor — decreases resignation risk",
"feature_value": 2.75
}
}
}
```
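How `probability_score`, `model_threshold` and `statut_employe` relate can be sketched as follows. The 0.37 threshold and the "at or above" rule come from the response schema below; the function name `verdict`, the exact wording of the `HIGH` message (assumed to mirror the `LOW` one) and comparing the unrounded probability are assumptions. With a scikit-learn pipeline, the probability would normally be `pipeline.predict_proba(X)[:, 1]`.

```python
THRESHOLD = 0.37  # strategic threshold documented in this README


def verdict(probability: float, threshold: float = THRESHOLD) -> dict:
    """Turn a raw attrition probability into the main response fields (sketch)."""
    level = "HIGH" if probability >= threshold else "LOW"
    return {
        "statut_employe": f"The staff has a {level} probability of resigning",
        "probability_score": round(probability, 2),
        "model_threshold": threshold,
    }


print(verdict(0.28)["statut_employe"])  # The staff has a LOW probability of resigning
print(verdict(0.41)["statut_employe"])  # The staff has a HIGH probability of resigning
```

The second call shows why the threshold matters: a score of 0.41 would be classified `LOW` with the default 0.50 cut-off, but is flagged `HIGH` here, trading some precision for recall on the attrition class.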
### Response Schema
| Field | Type | Description |
|-------------------|--------|-----------------------------------------------------------------------------|
| `statut_employe`  | string | Human-readable verdict, e.g. `"The staff has a LOW probability of resigning"`; reads `HIGH` instead of `LOW` when the score is at or above the threshold |
| `probability_score` | float | Raw model probability of resignation (0–1), rounded to 2 decimal places |
| `model_threshold` | float | Decision threshold applied — `0.37` (prediction is `HIGH` if score ≥ 0.37) |
| `note` | string | Reminder that the threshold is strategically set to 0.37, not the default 0.50 |
| `top_5_factors` | object | Top 5 features ranked by absolute SHAP value (most influential first) |
Each entry in `top_5_factors` is keyed by the **feature name** and contains:
| Sub-field | Type | Description |
|-------------------|----------------|-----------------------------------------------------------------------------|
| `interpretation`  | string         | Rank label (`Primary driver`, `Strong factor`, `Moderate factor`, `Contributing factor`, `Notable factor` for ranks 1 to 5), followed by the direction of impact: `increases resignation risk` or `decreases resignation risk` |
| `feature_value` | float \| string | The actual value of that feature for this employee. Returns `"encoded"` for one-hot encoded categorical features (e.g. `statut_marital_Célibataire`) whose original value is lost after encoding |
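The ranking and labelling rules described above can be sketched in plain Python, starting from per-feature SHAP values for one employee (for a tree model such as Gradient Boosting these would typically come from `shap.TreeExplainer`). The sign convention (a positive SHAP value increases resignation risk) is an assumption based on explaining the attrition class; the function name `top_5_factors`, the feature `heures_supplementaires_Oui` and all SHAP numbers in the usage example are made up for illustration.

```python
RANK_LABELS = (
    "Primary driver",
    "Strong factor",
    "Moderate factor",
    "Contributing factor",
    "Notable factor",
)
SEPARATOR = " \u2014 "  # same dash as in the example response


def top_5_factors(shap_values: dict, feature_values: dict) -> dict:
    """Rank features by absolute SHAP value and build the `top_5_factors` object (sketch)."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    factors = {}
    # zip() stops after the five labels, so only the top 5 features are kept.
    for label, (name, value) in zip(RANK_LABELS, ranked):
        direction = "increases" if value > 0 else "decreases"
        factors[name] = {
            "interpretation": f"{label}{SEPARATOR}{direction} resignation risk",
            # Features absent from feature_values (e.g. one-hot columns) are reported as "encoded".
            "feature_value": feature_values.get(name, "encoded"),
        }
    return factors


shap_values = {  # made-up numbers
    "revenu_mensuel": -0.42,
    "annees_dans_l_entreprise": -0.30,
    "statut_marital_Célibataire": -0.21,
    "distance_domicile_travail": -0.10,
    "overall_satisfaction": -0.08,
    "heures_supplementaires_Oui": 0.05,
}
feature_values = {
    "revenu_mensuel": 4883.0,
    "annees_dans_l_entreprise": 10.0,
    "distance_domicile_travail": 7.0,
    "overall_satisfaction": 2.75,
}
result = top_5_factors(shap_values, feature_values)
```

With these inputs, `result` has the same keys, order and `feature_value`s as the example response above; the sixth feature is dropped because its absolute SHAP value is the smallest.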
> The interactive Swagger UI (auto-generated by FastAPI) is available at `/docs` on any running instance.
---
## Running Tests
The test suite covers unit tests for the feature engineering helpers and functional tests for all API endpoints, including valid predictions, input validation (HTTP 422), employee lookup by ID, and logging of warnings.
Setting `TESTING=true` disables database writes so the test suite runs without a live database connection. `DATABASE_URL` is still required to import the app (SQLAlchemy initialises at import time).
```sh
TESTING=true DATABASE_URL=postgresql://user:password@host:5432/dbname pytest tests/ -v --cov=app --cov-report=term-missing
```
The same configuration is used in CI (GitHub Actions): `DATABASE_URL` is injected from a repository secret, and `TESTING` is hardcoded to `"true"` directly in the workflow file.
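The `TESTING` switch and the `predictions_log` table can be illustrated together. The sketch below uses the standard-library `sqlite3` module as an in-memory stand-in for PostgreSQL and SQLAlchemy; the column layout (JSON text for inputs and SHAP factors), the function names `testing_enabled`, `init_db` and `log_prediction`, and the accepted spellings of a truthy `TESTING` value are assumptions, not the project's schema.

```python
import json
import os
import sqlite3


def testing_enabled(env=os.environ) -> bool:
    """True when TESTING is set to a truthy value such as "true"."""
    return env.get("TESTING", "").strip().lower() in {"1", "true", "yes"}


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical layout for the predictions_log table.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               inputs TEXT NOT NULL,
               probability REAL NOT NULL,
               verdict TEXT NOT NULL,
               shap_factors TEXT NOT NULL
           )"""
    )


def log_prediction(conn: sqlite3.Connection, inputs: dict, response: dict, env=os.environ) -> bool:
    """Insert one row unless TESTING is enabled; return True if a row was written."""
    if testing_enabled(env):
        return False  # test runs skip database writes
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO predictions_log (inputs, probability, verdict, shap_factors) VALUES (?, ?, ?, ?)",
            (
                json.dumps(inputs, ensure_ascii=False),
                response["probability_score"],
                response["statut_employe"],
                json.dumps(response["top_5_factors"], ensure_ascii=False),
            ),
        )
    return True


conn = sqlite3.connect(":memory:")
init_db(conn)
resp = {"probability_score": 0.28, "statut_employe": "The staff has a LOW probability of resigning", "top_5_factors": {}}
log_prediction(conn, {"Âge": 32}, resp, env={"TESTING": "false"})  # writes a row
log_prediction(conn, {"Âge": 32}, resp, env={"TESTING": "true"})   # skipped
print(conn.execute("SELECT COUNT(*) FROM predictions_log").fetchone()[0])  # 1
```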
---
## Contact
Kevin Lebayle — [GitHub @KL38](https://github.com/KL38)
Project Link: [https://github.com/KL38/OC_P5_v2](https://github.com/KL38/OC_P5_v2)
Live Demo: [https://huggingface.co/spaces/KLEB38/OC_P5](https://huggingface.co/spaces/KLEB38/OC_P5)
---
## Acknowledgments
* [FastAPI](https://fastapi.tiangolo.com/) — high-performance async web framework
* [SHAP](https://shap.readthedocs.io/) — model explainability
* [scikit-learn](https://scikit-learn.org/) — ML pipeline and Gradient Boosting classifier
* [Hugging Face Spaces](https://huggingface.co/spaces) — Docker-based free deployment
* [othneildrew/Best-README-Template](https://github.com/othneildrew/Best-README-Template) — README structure
[contributors-shield]: https://img.shields.io/github/contributors/KL38/OC_P5_v2.svg?style=for-the-badge
[contributors-url]: https://github.com/KL38/OC_P5_v2/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/KL38/OC_P5_v2.svg?style=for-the-badge
[forks-url]: https://github.com/KL38/OC_P5_v2/network/members
[stars-shield]: https://img.shields.io/github/stars/KL38/OC_P5_v2.svg?style=for-the-badge
[stars-url]: https://github.com/KL38/OC_P5_v2/stargazers
[issues-shield]: https://img.shields.io/github/issues/KL38/OC_P5_v2.svg?style=for-the-badge
[issues-url]: https://github.com/KL38/OC_P5_v2/issues
[Python-badge]: https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white
[Python-url]: https://www.python.org/
[FastAPI-badge]: https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white
[FastAPI-url]: https://fastapi.tiangolo.com/
[sklearn-badge]: https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white
[sklearn-url]: https://scikit-learn.org/
[pandas-badge]: https://img.shields.io/badge/pandas-150458?style=for-the-badge&logo=pandas&logoColor=white
[pandas-url]: https://pandas.pydata.org/
[SHAP-badge]: https://img.shields.io/badge/SHAP-FF6B6B?style=for-the-badge&logoColor=white
[SHAP-url]: https://shap.readthedocs.io/
[Postgres-badge]: https://img.shields.io/badge/PostgreSQL-4169E1?style=for-the-badge&logo=postgresql&logoColor=white
[Postgres-url]: https://www.postgresql.org/
[SQLAlchemy-badge]: https://img.shields.io/badge/SQLAlchemy-D71F00?style=for-the-badge&logo=sqlalchemy&logoColor=white
[SQLAlchemy-url]: https://www.sqlalchemy.org/
[Docker-badge]: https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white
[Docker-url]: https://www.docker.com/
[GHActions-badge]: https://img.shields.io/badge/GitHub_Actions-2088FF?style=for-the-badge&logo=github-actions&logoColor=white
[GHActions-url]: https://github.com/features/actions