sks01dev committed
Commit 0226ed0 · verified · 1 Parent(s): aeee3af

Delete Week 4

Files changed (2)
  1. Week 4/Week_4_Evaluation.ipynb +0 -0
  2. Week 4/readme.md +0 -63
Week 4/Week_4_Evaluation.ipynb DELETED
The diff for this file is too large to render.
 
Week 4/readme.md DELETED
@@ -1,63 +0,0 @@
- # Lead Scoring with Bank Marketing Dataset
-
- [![Python](https://img.shields.io/badge/Python-3.12-blue?logo=python&logoColor=white)](https://www.python.org/)
- [![Scikit-Learn](https://img.shields.io/badge/scikit--learn-1.3.2-orange?logo=scikit-learn&logoColor=white)](https://scikit-learn.org/)
- [![Jupyter Notebook](https://img.shields.io/badge/Jupyter-Notebook-orange?logo=jupyter&logoColor=white)](https://jupyter.org/)
-
- ---
-
- ## Overview
-
- This notebook demonstrates building a **lead scoring model** using the Bank Marketing dataset. The goal is to predict whether a client will **convert** (sign up for a service) based on various features.
-
- We cover:
-
- 1. Data preparation and handling missing values.
- 2. Feature importance using ROC AUC for numerical variables.
- 3. Logistic regression modeling with **one-hot encoding**.
- 4. Precision, recall, and F1 score analysis to select thresholds.
- 5. 5-fold cross-validation to check model stability.
- 6. Hyperparameter tuning to select the best regularization parameter.
-
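Step 2 of the list above, ranking numerical variables by ROC AUC, can be sketched without any library: AUC equals the probability that a randomly chosen converted client has a higher feature value than a randomly chosen non-converted one. This is an illustration under assumptions, not the notebook's code. The function names `roc_auc` and `rank_numeric_features` and the toy columns are hypothetical, and flipping negatively correlated features with `max(auc, 1 - auc)` is an assumed convention.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_numeric_features(columns, labels):
    """Score each numeric column by AUC, treating AUC < 0.5 as an inverted signal."""
    ranking = {}
    for name, values in columns.items():
        auc = roc_auc(values, labels)
        ranking[name] = max(auc, 1.0 - auc)  # assumed convention for negative correlation
    return sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the dataset used in the notebook.
converted = [0, 0, 1, 1, 0, 1]
toy_columns = {
    "feature_a": [1, 2, 5, 6, 2, 4],  # separates the two classes completely
    "feature_b": [3, 1, 2, 3, 2, 1],  # carries no ranking signal
}
ranked = rank_numeric_features(toy_columns, converted)
print(ranked)
```

The pairwise loop is quadratic in the number of rows, so on real data a library routine such as scikit-learn's `roc_auc_score` is the practical choice; the loop is only meant to show what the number measures.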
- ---
-
- ## Key Results
-
- - **Best numerical feature (ROC AUC):** `number_of_courses_viewed`
- - **Validation AUC:** `0.794`
- - **Threshold where precision ≈ recall:** `0.59`
- - **Threshold with max F1:** `0.47`
- - **Standard deviation of AUC across folds:** `0.01`
- - **Best regularization parameter C:** `0.001`
-
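The two threshold results above come from sweeping a decision threshold over predicted probabilities and computing precision, recall, and F1 at each step. A minimal sketch of such a sweep follows. The helper names, the step size, and the toy probabilities are assumptions for illustration and do not reproduce the reported values.

```python
def precision_recall_f1(probs, labels, threshold):
    """Confusion-matrix metrics when predicting positive for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def sweep_thresholds(probs, labels, step=0.01):
    """Return (threshold, precision, recall, f1) rows for thresholds in [0, 1]."""
    n_steps = int(round(1.0 / step))
    rows = []
    for i in range(n_steps + 1):
        t = round(i * step, 10)  # avoid accumulating floating-point drift
        rows.append((t, *precision_recall_f1(probs, labels, t)))
    return rows


# Hypothetical validation predictions, not from the notebook.
toy_probs = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1, 1, 1]

rows = sweep_thresholds(toy_probs, toy_labels, step=0.05)
best_f1 = max(rows, key=lambda r: r[3])
# Skip rows with no positive predictions, where precision and recall are both 0.
closest = min((r for r in rows if r[1] > 0), key=lambda r: abs(r[1] - r[2]))
print("max F1 at threshold", best_f1[0])
print("precision ~ recall at threshold", closest[0])
```

The two thresholds generally differ, which is why the README reports them separately.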
- ---
-
- ## Lessons Learned
-
- - ROC AUC can help identify predictive features even before modeling.
- - Logistic regression combined with one-hot encoding provides a strong baseline.
- - Threshold tuning is crucial for balancing precision and recall based on business needs.
- - Cross-validation checks how stable the model's AUC is across folds and helps detect overfitting.
- - Tuning the regularization parameter `C` can improve validation performance.
-
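The cross-validation and tuning lessons above can be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than the notebook's code: the data is synthetic (`make_classification`), the `liblinear` solver, the random seeds, and the candidate grid for `C` are assumed, and `cv_auc` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold


def cv_auc(X, y, C, n_splits=5, seed=1):
    """Mean and standard deviation of validation AUC over shuffled K folds."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kfold.split(X):
        model = LogisticRegression(solver="liblinear", C=C, max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        val_probs = model.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], val_probs))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for the prepared feature matrix (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=1
)

# Assumed candidate grid; smaller C means stronger regularization.
results = {C: cv_auc(X, y, C) for C in [0.000001, 0.001, 1]}
best_C = max(results, key=lambda C: results[C][0])

for C, (mean_auc, std_auc) in results.items():
    print(f"C={C}: AUC {mean_auc:.3f} +- {std_auc:.3f}")
print("best C:", best_C)
```

A small standard deviation across folds suggests the score does not depend heavily on which rows land in the validation split. It does not by itself prevent overfitting.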
- ---
-
- ## Environment
-
- - Python 3.12
- - Jupyter Notebook
- - Libraries: `pandas`, `numpy`, `scikit-learn`, `matplotlib`, `seaborn`
-
- ---
-
- ## Dataset
-
- The Bank Marketing dataset used in this project is publicly available:
- [Bank Marketing Dataset CSV](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv)
-
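Loading the CSV linked above and handling missing values might look like the sketch below. The README does not say how missing values were filled, so the strategy here (`"NA"` for non-numeric columns, `0.0` for numeric ones) is an assumption. `fill_missing` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

DATA_URL = (
    "https://raw.githubusercontent.com/alexeygrigorev/datasets/"
    "master/course_lead_scoring.csv"
)


def fill_missing(df):
    """Return a copy with numeric NaNs set to 0.0 and other NaNs set to 'NA'."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(0.0)
        else:
            df[col] = df[col].fillna("NA")
    return df


# With network access, the real file could be prepared like this:
# df = fill_missing(pd.read_csv(DATA_URL))

# Hypothetical toy frame to show the behavior offline.
toy = pd.DataFrame(
    {"source": ["ads", None, "referral"], "visits": [3.0, None, 1.0]}
)
clean = fill_missing(toy)
print(clean)
```

Working on a copy leaves the raw frame untouched, so the count of originally missing values can still be inspected afterwards.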
- ---
-
- ## Author
-
- Created as part of **ML Zoomcamp 2025 Homework 4**.
-