KLEB38 committed on
Commit f67be28 · 1 Parent(s): 6b7677b

feat/update README file

  1. README.md +287 -34
README.md CHANGED
@@ -6,58 +6,311 @@ sdk: docker
6
  pinned: false
7
 
8
  ---
9
- # HR Attrition Prediction API - Futurisys
10
 
11
- This project provides a professional-grade REST API designed to predict employee attrition for **Futurisys**.
12
- It uses a Machine Learning pipeline to analyze employee data and provide actionable insights for HR departments.
 
 
13
 
14
- ## Project Overview
15
- The objective is to identify employees at risk of leaving the company by analyzing HR features.
16
 
17
- **Key Features:**
18
- - **Machine Learning Pipeline:** A robust model (Gradient Boosting/Random Forest) integrated with automated preprocessing.
19
- - **FastAPI Framework:** High-performance API with built-in validation and asynchronous support.
20
 
21
  ---
22
 
23
- ## Project Structure
24
 
25
- ```text
26
- .
27
- ├── app/
28
- │ ├── main.py # Core API logic and Pydantic schemas
29
- │ └── pipeline_rh.joblib # Serialized Scikit-Learn pipeline (Model + Scalers)
30
- ├── notebooks/ # Research, EDA, and model training notebooks
31
- ├── .gitignore # Ensures clean version control by ignoring temp files
32
- ├── requirements.txt # List of Python dependencies
33
- └── README.md # Project documentation
34
 
35
- ```
36
- ## Installation & Setup
37
 
38
- 1. Prerequisites
39
 
40
- Python 3.8+
41
- Git
42
 
43
- 3. Clone the Repository
44
 
45
- git clone <your-repository-url>
46
- cd <your-project-folder>
47
 
48
- 3. Install dependencies
49
 
50
- pip install -r requirements.txt
 
 
51
 
52
  ## Usage
53
 
54
- ### Running the API
55
 
56
- ## API Endpoints
57
 
58
- GET /
59
 
60
- POST /predict
61
 
62
- ## Author
63
- Kevin L. - Data Science & Machine Learning Student
6
  pinned: false
7
 
8
  ---
9
+ <a id="readme-top"></a>
10
 
11
+ [![Contributors][contributors-shield]][contributors-url]
12
+ [![Forks][forks-shield]][forks-url]
13
+ [![Stargazers][stars-shield]][stars-url]
14
+ [![Issues][issues-shield]][issues-url]
15
 
16
+ <br />
17
+ <div align="center">
18
+ <h3 align="center">HR Attrition Prediction API — Futurisys</h3>
19
+ <p align="center">
20
+ A production-grade REST API that predicts employee attrition using a Gradient Boosting pipeline with SHAP explainability, deployed on Hugging Face Spaces.
21
+ <br />
22
+ <a href="https://github.com/KL38/OC_P5_v2"><strong>Explore the docs »</strong></a>
23
+ <br />
24
+ <br />
25
+ <a href="https://huggingface.co/spaces/KLEB38/OC_P5">View Live Demo</a>
26
+ &middot;
27
+ <a href="https://github.com/KL38/OC_P5_v2/issues/new?labels=bug&template=bug-report---.md">Report Bug</a>
28
+ &middot;
29
+ <a href="https://github.com/KL38/OC_P5_v2/issues/new?labels=enhancement&template=feature-request---.md">Request Feature</a>
30
+ </p>
31
+ </div>
32
 
33
+ <details>
34
+ <summary>Table of Contents</summary>
35
+ <ol>
36
+ <li>
37
+ <a href="#about-the-project">About The Project</a>
38
+ <ul>
39
+ <li><a href="#built-with">Built With</a></li>
40
+ </ul>
41
+ </li>
42
+ <li>
43
+ <a href="#getting-started">Getting Started</a>
44
+ <ul>
45
+ <li><a href="#prerequisites">Prerequisites</a></li>
46
+ <li><a href="#installation">Installation</a></li>
47
+ </ul>
48
+ </li>
49
+ <li><a href="#usage">Usage</a></li>
50
+ <li><a href="#running-tests">Running Tests</a></li>
51
+ <li><a href="#contact">Contact</a></li>
52
+ <li><a href="#acknowledgments">Acknowledgments</a></li>
53
+ </ol>
54
+ </details>
55
 
56
  ---
57
 
58
+ ## About The Project
59
 
60
+ **Futurisys** is a tech consulting firm used as the business context for this OpenClassrooms Data Science project (Project 5). The objective is to help HR departments identify employees at risk of attrition before they leave.
61
 
62
+ This project delivers a complete, containerised ML system:
63
+
64
+ - A **Gradient Boosting classifier** trained on HR data and serialised into a Scikit-Learn pipeline with automated preprocessing
65
+ - **Custom feature engineering** — overall satisfaction score, expertise inconsistency (department vs. study domain mismatch), managerial stagnation, and development stagnation signals
66
+ - A **custom classification threshold of 0.37**, used instead of the default 0.50 and tuned for recall on the attrition class
67
+ - A **FastAPI REST API** with full input validation via Pydantic
68
+ - **SHAP-based explainability** — every prediction is accompanied by the top 5 most influential features and their direction of impact
69
+ - A complete **CI/CD pipeline** via GitHub Actions that automatically deploys to Hugging Face Spaces on every push to `main`
70
+
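
The effect of the custom threshold can be illustrated with a minimal sketch. The function name `classify_attrition` and the `"LOW"`/`"HIGH"` return values are assumptions for illustration and are not taken from `app/main.py`; only the threshold value 0.37 and the rule "HIGH if score ≥ 0.37" come from this README.

```python
# Hypothetical helper: the name and return values are illustrative only.
THRESHOLD = 0.37  # decision threshold stated in this README


def classify_attrition(probability: float, threshold: float = THRESHOLD) -> str:
    """Return "HIGH" when the probability meets or exceeds the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "HIGH" if probability >= threshold else "LOW"


print(classify_attrition(0.28))        # LOW
print(classify_attrition(0.42))        # HIGH
print(classify_attrition(0.42, 0.50))  # LOW
```

A score of 0.42 is flagged at 0.37 but would pass unnoticed at 0.50, which is the trade-off a recall-oriented threshold accepts: more at-risk employees caught, at the cost of more false alarms.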
71
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
72
+
73
+ ### Built With
74
+
75
+ * [![Python][Python-badge]][Python-url]
76
+ * [![FastAPI][FastAPI-badge]][FastAPI-url]
77
+ * [![scikit-learn][sklearn-badge]][sklearn-url]
78
+ * [![pandas][pandas-badge]][pandas-url]
79
+ * [![SHAP][SHAP-badge]][SHAP-url]
80
+ * [![Docker][Docker-badge]][Docker-url]
81
+ * [![GitHub Actions][GHActions-badge]][GHActions-url]
82
+
83
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
84
+
85
+ ---
86
+
87
+ ## Getting Started
88
+
89
+ ### Prerequisites
90
+
91
+ - Python 3.13+
92
+ - Docker (for containerised deployment)
93
+ - Git
94
 
95
+ ### Installation
96
 
97
+ #### Option 1 — Run locally with Python
 
98
 
99
+ 1. Clone the repository
100
+ ```sh
101
+ git clone https://github.com/KL38/OC_P5_v2.git
102
+ cd OC_P5_v2
103
+ ```
104
+ 2. Install dependencies
105
+ ```sh
106
+ pip install -r requirements.txt
107
+ ```
108
+ 3. Start the API
109
+ ```sh
110
+ uvicorn app.main:app --reload
111
+ ```
112
+ The API is available at `http://127.0.0.1:8000`. The interactive Swagger UI is at `http://127.0.0.1:8000/docs`.
113
 
114
+ #### Option 2 — Run with Docker
 
115
 
116
+ 1. Build the image
117
+ ```sh
118
+ docker build -t futurisys-api .
119
+ ```
120
+ 2. Run the container
121
+ ```sh
122
+ docker run -p 7860:7860 futurisys-api
123
+ ```
124
+ The API is available at `http://localhost:7860`.
125
 
126
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
127
+
128
+ ---
129
 
130
  ## Usage
131
 
132
+ ### Endpoints
133
+
134
+ | Method | Endpoint | Description |
135
+ |--------|------------|------------------------------------------|
136
+ | `GET` | `/` | Health check — returns a welcome message |
137
+ | `POST` | `/predict` | Predicts employee attrition risk |
138
+
139
+ ### `POST /predict` — Input Schema
140
+
141
+ All fields use their **French alias** as the JSON key.
142
+
143
+ | JSON key | Type | Accepted values / notes |
144
+ |-----------------------------------------|--------|---------------------------------------------------------------------|
145
+ | `Genre` | string | `"M"` or `"F"` |
146
+ | `Statut Marital` | string | `"Marié(e)"`, `"Célibataire"`, `"Divorcé(e)"` |
147
+ | `Département` | string | `"Consulting"`, `"Commercial"`, `"Ressources Humaines"` |
148
+ | `Poste` | string | `"Consultant"`, `"Manager"`, `"Tech Lead"`, … |
149
+ | `Domaine d'étude` | string | `"Infra & Cloud"`, `"Marketing"`, `"Ressources Humaines"`, … |
150
+ | `Fréquence de déplacement` | string | `"Aucun"`, `"Occasionnel"`, `"Frequent"` |
151
+ | `Heures supplémentaires` | string | `"Oui"` or `"Non"` |
152
+ | `Âge` | int | |
153
+ | `Revenu mensuel` | int | |
154
+ | `Nombre d'expériences précédentes` | int | |
155
+ | `Années d'expérience totale` | int | |
156
+ | `Années dans l'entreprise` | int | |
157
+ | `Années dans le poste actuel` | int | |
158
+ | `Nombre de formations suivies` | int | |
159
+ | `Distance domicile-travail` | int | |
160
+ | `Niveau d'éducation` | int | |
161
+ | `Années depuis la dernière promotion` | int | |
162
+ | `Années sous responsable actuel` | int | |
163
+ | `Satisfaction environnement` | int | 1–4 |
164
+ | `Satisfaction nature du travail` | int | 1–4 |
165
+ | `Satisfaction équipe` | int | 1–4 |
166
+ | `Satisfaction équilibre pro/perso` | int | 1–4 |
167
+ | `Note évaluation précédente` | int | 1–4 |
168
+ | `Note évaluation actuelle` | int | 1–4 |
169
+ | `Augmentation salaire précédente` | string | Percentage as string, e.g. `"18%"` |
170
+
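
Two constraints in the table above, the 1–4 rating range and the percentage-as-string format, can be sketched with plain Python. This is only an illustration under assumptions: the helpers `parse_percentage` and `check_rating` are hypothetical, the real API validates input with Pydantic, and whether the service keeps `"18%"` as 18.0 or rescales it is not stated in this README.

```python
# Hypothetical helpers; the actual API relies on Pydantic for validation.
def parse_percentage(text: str) -> float:
    """Convert a string such as "18%" to the number 18.0."""
    cleaned = text.strip()
    if not cleaned.endswith("%"):
        raise ValueError(f"expected a percentage string, got {text!r}")
    return float(cleaned[:-1])


def check_rating(name: str, value: int) -> int:
    """Accept only integer ratings in the documented 1-4 range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an integer")
    if not 1 <= value <= 4:
        raise ValueError(f"{name} must be between 1 and 4, got {value}")
    return value


print(parse_percentage("18%"))                 # 18.0
print(check_rating("Satisfaction équipe", 1))  # 1
```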
171
+ ### Example Request
172
+
173
+ ```bash
174
+ curl -X POST "http://localhost:7860/predict" \
175
+ -H "Content-Type: application/json" \
176
+ -d '{
177
+ "Genre": "M",
178
+ "Statut Marital": "Marié(e)",
179
+ "Département": "Consulting",
180
+ "Poste": "Consultant",
181
+ "Domaine d'\''étude": "Infra & Cloud",
182
+ "Fréquence de déplacement": "Occasionnel",
183
+ "Heures supplémentaires": "Non",
184
+ "Âge": 32,
185
+ "Revenu mensuel": 4883,
186
+ "Nombre d'\''expériences précédentes": 1,
187
+ "Années d'\''expérience totale": 10,
188
+ "Années dans l'\''entreprise": 10,
189
+ "Années dans le poste actuel": 4,
190
+ "Nombre de formations suivies": 3,
191
+ "Distance domicile-travail": 7,
192
+ "Niveau d'\''éducation": 2,
193
+ "Années depuis la dernière promotion": 1,
194
+ "Années sous responsable actuel": 1,
195
+ "Satisfaction environnement": 4,
196
+ "Note évaluation précédente": 3,
197
+ "Satisfaction nature du travail": 3,
198
+ "Satisfaction équipe": 1,
199
+ "Satisfaction équilibre pro/perso": 3,
200
+ "Note évaluation actuelle": 3,
201
+ "Augmentation salaire précédente": "18%"
202
+ }'
203
+ ```
204
+
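
The same request can be prepared from Python, which avoids the `'\''` shell escaping needed for keys containing an apostrophe. This is a sketch using only the standard library; the helper name `build_predict_request` is hypothetical, and the payload is deliberately truncated, so the remaining fields from the input schema must be added before sending.

```python
import json
import urllib.request


def build_predict_request(
    payload: dict, base_url: str = "http://localhost:7860"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predict endpoint."""
    # ensure_ascii=False keeps accented keys readable; the body is UTF-8 bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Truncated payload for illustration: add the remaining schema fields.
payload = {"Genre": "M", "Domaine d'étude": "Infra & Cloud", "Âge": 32}
request = build_predict_request(payload)
print(request.get_method(), request.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```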
205
+ ### Example Response
206
+
207
+ ```json
208
+ {
209
+ "statut_employe": "The staff has a LOW probability of resigning",
210
+ "probability_score": 0.28,
211
+ "model_threshold": 0.37,
212
+ "note": "Decision based on a strategic threshold of 0.37, not 0.50",
213
+ "top_5_factors": {
214
+ "revenu_mensuel": {
215
+ "interpretation": "Primary driver — decreases resignation risk",
216
+ "feature_value": 4883.0
217
+ },
218
+ "annees_dans_l_entreprise": {
219
+ "interpretation": "Strong factor — decreases resignation risk",
220
+ "feature_value": 10.0
221
+ },
222
+ "statut_marital_Célibataire": {
223
+ "interpretation": "Moderate factor — decreases resignation risk",
224
+ "feature_value": "encoded"
225
+ },
226
+ "distance_domicile_travail": {
227
+ "interpretation": "Contributing factor — decreases resignation risk",
228
+ "feature_value": 7.0
229
+ },
230
+ "overall_satisfaction": {
231
+ "interpretation": "Notable factor — decreases resignation risk",
232
+ "feature_value": 2.75
233
+ }
234
+ }
235
+ }
236
+ ```
237
+
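
The `overall_satisfaction` value of 2.75 in the response above is consistent with the arithmetic mean of the four satisfaction inputs in the example request (4, 3, 1, 3). The sketch below assumes that this is how the feature is derived; the README does not spell out the formula, and the function name is hypothetical.

```python
from statistics import mean

# JSON keys taken from the input schema; the averaging rule is an assumption.
SATISFACTION_KEYS = (
    "Satisfaction environnement",
    "Satisfaction nature du travail",
    "Satisfaction équipe",
    "Satisfaction équilibre pro/perso",
)


def overall_satisfaction(employee: dict) -> float:
    """Average the four satisfaction ratings of one employee record."""
    return mean(employee[key] for key in SATISFACTION_KEYS)


example = {
    "Satisfaction environnement": 4,
    "Satisfaction nature du travail": 3,
    "Satisfaction équipe": 1,
    "Satisfaction équilibre pro/perso": 3,
}
print(overall_satisfaction(example))  # 2.75
```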
238
+ ### Response Schema
239
+
240
+ | Field | Type | Description |
241
+ |-------------------|--------|-----------------------------------------------------------------------------|
242
+ | `statut_employe` | string | Human-readable verdict stating a `LOW` or `HIGH` probability of resigning, e.g. `"The staff has a LOW probability of resigning"` |
243
+ | `probability_score` | float | Raw model probability of resignation (0–1), rounded to 2 decimal places |
244
+ | `model_threshold` | float | Decision threshold applied — `0.37` (prediction is `HIGH` if score ≥ 0.37) |
245
+ | `note` | string | Reminder that the threshold is strategically set to 0.37, not the default 0.50 |
246
+ | `top_5_factors` | object | Top 5 features ranked by absolute SHAP value (most influential first) |
247
+
248
+ Each entry in `top_5_factors` is keyed by the **feature name** and contains:
249
+
250
+ | Sub-field | Type | Description |
251
+ |-------------------|----------------|-----------------------------------------------------------------------------|
252
+ | `interpretation` | string | Rank label (`Primary driver`, `Strong factor`, `Moderate factor`, `Contributing factor`, `Notable factor`) followed by the direction of impact (`increases resignation risk` or `decreases resignation risk`) |
253
+ | `feature_value` | float \| string | The actual value of that feature for this employee. Returns `"encoded"` for one-hot encoded categorical features (e.g. `statut_marital_Célibataire`) whose original value is lost after encoding |
254
+
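
The ranking described above can be sketched without the SHAP library, starting from per-feature contribution values. Several things here are assumptions: the function name `top_factors`, the rule that a positive value means "increases", and the numeric values in the example, which are invented purely for illustration and are not real model output.

```python
# Rank labels in the order documented in the table above.
RANK_LABELS = (
    "Primary driver",
    "Strong factor",
    "Moderate factor",
    "Contributing factor",
    "Notable factor",
)


def top_factors(shap_values: dict[str, float], limit: int = 5) -> dict[str, str]:
    """Rank features by absolute contribution and label each one."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    result = {}
    for label, (feature, value) in zip(RANK_LABELS, ranked[:limit]):
        # Assumption: positive contributions push towards resignation.
        direction = "increases" if value > 0 else "decreases"
        # \u2014 mirrors the separator used in the example response.
        result[feature] = f"{label} \u2014 {direction} resignation risk"
    return result


# Invented contribution values, for illustration only.
made_up = {
    "revenu_mensuel": -0.40,
    "annees_dans_l_entreprise": -0.25,
    "heures_supplementaires": 0.05,
}
for feature, text in top_factors(made_up).items():
    print(feature, "->", text)
```

Sorting on the absolute value keeps strongly protective features (large negative contributions) at the top alongside strong risk factors, which matches the "most influential first" ordering in the response schema.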
255
+ > The interactive Swagger UI (auto-generated by FastAPI) is available at `/docs` on any running instance.
256
+
257
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
258
+
259
+ ---
260
+
261
+ ## Running Tests
262
+
263
+ The test suite covers unit tests for feature engineering helpers and functional tests for all API endpoints, including valid predictions, input validation (HTTP 422), and warning logging.
264
+
265
+ ```sh
266
+ pytest tests/ --cov=app
267
+ ```
268
+
269
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
270
+
271
+ ---
272
+
273
+ ## Contact
274
+
275
+ Kevin Lebayle — [GitHub @KL38](https://github.com/KL38)
276
+
277
+ Project Link: [https://github.com/KL38/OC_P5_v2](https://github.com/KL38/OC_P5_v2)
278
+ Live Demo: [https://huggingface.co/spaces/KLEB38/OC_P5](https://huggingface.co/spaces/KLEB38/OC_P5)
279
+
280
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
281
+
282
+ ---
283
 
284
+ ## Acknowledgments
285
 
286
+ * [FastAPI](https://fastapi.tiangolo.com/) — high-performance async web framework
287
+ * [SHAP](https://shap.readthedocs.io/) — model explainability
288
+ * [scikit-learn](https://scikit-learn.org/) — ML pipeline and Gradient Boosting classifier
289
+ * [Hugging Face Spaces](https://huggingface.co/spaces) — Docker-based free deployment
290
+ * [othneildrew/Best-README-Template](https://github.com/othneildrew/Best-README-Template) — README structure
291
 
292
+ <p align="right">(<a href="#readme-top">back to top</a>)</p>
293
 
294
+ <!-- MARKDOWN LINKS & IMAGES -->
295
+ [contributors-shield]: https://img.shields.io/github/contributors/KL38/OC_P5_v2.svg?style=for-the-badge
296
+ [contributors-url]: https://github.com/KL38/OC_P5_v2/graphs/contributors
297
+ [forks-shield]: https://img.shields.io/github/forks/KL38/OC_P5_v2.svg?style=for-the-badge
298
+ [forks-url]: https://github.com/KL38/OC_P5_v2/network/members
299
+ [stars-shield]: https://img.shields.io/github/stars/KL38/OC_P5_v2.svg?style=for-the-badge
300
+ [stars-url]: https://github.com/KL38/OC_P5_v2/stargazers
301
+ [issues-shield]: https://img.shields.io/github/issues/KL38/OC_P5_v2.svg?style=for-the-badge
302
+ [issues-url]: https://github.com/KL38/OC_P5_v2/issues
303
+ [Python-badge]: https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white
304
+ [Python-url]: https://www.python.org/
305
+ [FastAPI-badge]: https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white
306
+ [FastAPI-url]: https://fastapi.tiangolo.com/
307
+ [sklearn-badge]: https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white
308
+ [sklearn-url]: https://scikit-learn.org/
309
+ [pandas-badge]: https://img.shields.io/badge/pandas-150458?style=for-the-badge&logo=pandas&logoColor=white
310
+ [pandas-url]: https://pandas.pydata.org/
311
+ [SHAP-badge]: https://img.shields.io/badge/SHAP-FF6B6B?style=for-the-badge&logoColor=white
312
+ [SHAP-url]: https://shap.readthedocs.io/
313
+ [Docker-badge]: https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white
314
+ [Docker-url]: https://www.docker.com/
315
+ [GHActions-badge]: https://img.shields.io/badge/GitHub_Actions-2088FF?style=for-the-badge&logo=github-actions&logoColor=white
316
+ [GHActions-url]: https://github.com/features/actions