sks01dev committed on
Commit 4108ad2 · 1 Parent(s): f6a8417

Create readme.md

Files changed (1)
  1. Week 3/readme.md +100 -0
Week 3/readme.md ADDED
@@ -0,0 +1,100 @@
1
+ # Machine Learning Zoomcamp 2025 - Homework 3
2
+
3
+ [![Python](https://img.shields.io/badge/Python-3.11-blue?logo=python&logoColor=white)](https://www.python.org/)
4
+ [![Pandas](https://img.shields.io/badge/Pandas-1.5.3-orange?logo=pandas&logoColor=white)](https://pandas.pydata.org/)
5
+ [![Scikit-Learn](https://img.shields.io/badge/Scikit--Learn-1.3.1-green?logo=scikit-learn&logoColor=white)](https://scikit-learn.org/stable/)
6
+ [![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-yellow?logo=jupyter&logoColor=white)](https://jupyter.org/)
7
+
8
+ ---
9
+
10
+ ## Homework 3: Machine Learning for Classification
11
+
12
+ This repository contains solutions for **Homework 3** of **Machine Learning Zoomcamp 2025**, which covers **classification** on the course lead scoring dataset (`course_lead_scoring.csv`).
13
+
14
+ ---
15
+
16
+ ## 📂 Project Overview
17
+
18
+ - **Dataset:** [Course Lead Scoring Dataset](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv)
19
+ - **Target variable:** `converted` (whether the client signed up)
20
+ - **Objective:** Data preprocessing, exploratory analysis, feature selection, and training logistic regression models (regularized and unregularized).
21
+
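As a rough illustration of the setup in the overview, the sketch below reads a CSV and splits the feature columns into categorical and numerical groups by dtype. The helper name `split_columns` and the inline sample rows are hypothetical; only the column names are taken from this README, and the real file may contain more columns.

```python
import io

import pandas as pd

# Tiny inline stand-in for the real CSV. All values are invented for
# illustration; pd.read_csv also accepts the dataset URL directly.
SAMPLE_CSV = """lead_source,industry,annual_income,interaction_count,lead_score,converted
paid_ads,retail,50000,3,0.4,1
referral,,62000,5,0.8,0
,finance,,1,0.1,0
"""


def split_columns(df, target="converted"):
    """Return (categorical, numerical) feature names, excluding the target."""
    features = df.drop(columns=[target])
    numerical = list(features.select_dtypes(include="number").columns)
    categorical = list(features.select_dtypes(exclude="number").columns)
    return categorical, numerical


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
categorical, numerical = split_columns(df)
```

Keeping the two lists separate makes the later steps simpler, since missing values and encoding are handled differently for each group.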
22
+ **Tech Stack:**
23
+ - **Python 3.11** – core programming language
24
+ - **Pandas** – data manipulation
25
+ - **NumPy** – numerical operations
26
+ - **Scikit-Learn** – machine learning models, feature selection, evaluation
27
+ - **Jupyter Notebook** – interactive coding and documentation
28
+
29
+ ---
30
+
31
+ ## 🔹 Questions & Answers
32
+
33
+ | Question | Task | Answer |
34
+ |----------|------|--------|
35
+ | 1 | Mode of `industry` | `retail` |
36
+ | 2 | Biggest correlation (numerical features) | `annual_income` and `interaction_count` |
37
+ | 3 | Biggest mutual information (categorical features) | `lead_source` |
38
+ | 4 | Logistic regression validation accuracy | 0.74 |
39
+ | 5 | Least useful feature (feature elimination) | `lead_score` |
40
+ | 6 | Best `C` value for regularized logistic regression | 1 |
41
+
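Questions 1 and 2 in the table come down to a mode lookup and a ranking of pairwise correlations. The sketch below shows one way to compute both with pandas. The data frame is toy data with invented values, so the numbers it produces are not the homework results; only the column names come from this README.

```python
import pandas as pd

# Toy data: values are invented; only the column names come from the README.
df = pd.DataFrame({
    "industry": ["retail", "finance", "retail", "other"],
    "annual_income": [40000.0, 52000.0, 61000.0, 75000.0],
    "interaction_count": [1, 2, 4, 5],
    "lead_score": [0.9, 0.2, 0.6, 0.3],
})

# Question 1: most frequent category of a column.
industry_mode = df["industry"].mode()[0]

# Question 2: rank every pair of numerical columns by absolute correlation.
numerical = ["annual_income", "interaction_count", "lead_score"]
corr = df[numerical].corr()
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(numerical)
    for b in numerical[i + 1:]
]
pairs.sort(key=lambda p: abs(p[2]), reverse=True)
strongest = pairs[0][:2]
```

Sorting by absolute value treats a strong negative correlation the same as a strong positive one, which is usually what "biggest correlation" means in this context.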
42
+ ---
43
+
44
+ ## 📌 Approach / Key Steps
45
+
46
+ 1. **Data Cleaning & Preparation**
47
+ - Filled missing values: categorical → `'NA'`, numerical → `0.0`
48
+ - Verified feature types and correlations
49
+
50
+ 2. **Exploratory Analysis**
51
+ - Mode of categorical variables
52
+ - Correlation matrix for numerical features
53
+
54
+ 3. **Feature Selection**
55
+ - Calculated mutual information for categorical variables using `mutual_info_score`
56
+ - Identified least useful features via feature elimination
57
+
58
+ 4. **Model Training**
59
+ - Logistic Regression with one-hot encoded categorical variables
60
+ - Regularized logistic regression with hyperparameter tuning (`C` values)
61
+
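The steps above can be sketched end to end as follows. This is a minimal illustration on toy data with invented values, not the notebook's actual code. The use of `DictVectorizer` for one-hot encoding and the `LogisticRegression` arguments are assumptions, and the train/validation split is left out for brevity, so the accuracy computed here is training accuracy.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mutual_info_score

# Toy data: values are invented for illustration.
df = pd.DataFrame({
    "lead_source": ["ads", None, "referral", "ads", "referral", "ads", None, "referral"],
    "industry": ["retail", "retail", None, "finance", "finance", "retail", "finance", "retail"],
    "annual_income": [50000.0, None, 61000.0, 48000.0, 70000.0, None, 52000.0, 66000.0],
    "converted": [0, 0, 1, 0, 1, 0, 0, 1],
})
categorical = ["lead_source", "industry"]
numerical = ["annual_income"]

# Step 1: fill missing values ('NA' for categorical, 0.0 for numerical).
df[categorical] = df[categorical].fillna("NA")
df[numerical] = df[numerical].fillna(0.0)

# Step 3: mutual information between each categorical feature and the target.
mi = {col: mutual_info_score(df[col], df["converted"]) for col in categorical}

# Step 4: one-hot encode with DictVectorizer (assumed), then fit the model.
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(df[categorical + numerical].to_dict(orient="records"))
model = LogisticRegression(solver="liblinear", C=1.0, max_iter=1000, random_state=42)
model.fit(X, df["converted"])

# Training accuracy only; a real run would score a held-out validation set.
accuracy = (model.predict(X) == df["converted"]).mean()
```

`DictVectorizer` leaves numerical values as single columns and expands each string value into its own `feature=value` column, so the same fitted object can later transform validation records consistently.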
62
+ ---
63
+
64
+ ## 📈 Results
65
+
66
+ - Baseline logistic regression validation accuracy: **0.74**
67
+ - Least useful feature: **`lead_score`**
68
+ - Best regularization parameter `C`: **1**
69
+
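The last two results come from feature elimination and a search over `C`. A minimal sketch of both procedures is shown below on synthetic data. The feature names, the grid of `C` values, the tie-breaking rule, and the model arguments are all assumptions made for illustration, and the numbers it produces are unrelated to the results listed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the first two columns carry signal, the third is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)
feature_names = ["f0", "f1", "noise"]  # hypothetical names

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)


def val_accuracy(columns, C=1.0):
    """Fit on the given column indices and return validation accuracy."""
    model = LogisticRegression(solver="liblinear", C=C, max_iter=1000, random_state=42)
    model.fit(X_train[:, columns], y_train)
    return model.score(X_val[:, columns], y_val)


# Feature elimination: drop one feature at a time and compare to the baseline.
all_cols = [0, 1, 2]
baseline = val_accuracy(all_cols)
diffs = {
    feature_names[i]: baseline - val_accuracy([c for c in all_cols if c != i])
    for i in all_cols
}
# The least useful feature is the one whose removal changes accuracy the least.
least_useful = min(diffs, key=lambda name: abs(diffs[name]))

# Regularization: try several C values (assumed grid); on ties keep the smallest.
scores = {C: round(val_accuracy(all_cols, C=C), 3) for C in [0.01, 0.1, 1, 10, 100]}
best_C = max(scores, key=lambda C: (scores[C], -C))
```

In scikit-learn, `C` is the inverse of the regularization strength, so smaller values mean stronger regularization.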
70
+ ---
71
+
72
+ ## ⚙ How to Run
73
+
74
+ 1. Clone the repository:
75
+ ```bash
76
+ git clone https://github.com/yourusername/ml-zoomcamp-hw3.git
77
+ ```
78
+
79
+ 2. Install requirements:
80
+ ```bash
81
+ pip install -r requirements.txt
82
+ ```
83
+
84
+ 3. Open the Jupyter Notebook and run cells sequentially:
85
+ ```bash
86
+ jupyter notebook
87
+ ```
88
+
89
+ ---
90
+
91
+ ## 📚 References
92
+
93
+ - [Course Lead Scoring Dataset](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv)
94
+ - [Scikit-Learn Documentation](https://scikit-learn.org/stable/)
95
+ - [Pandas Documentation](https://pandas.pydata.org/)
96
+ - [NumPy Documentation](https://numpy.org/)
97
+ - [Jupyter Notebook Documentation](https://jupyter.org/)
98
+
99
+ ---
100
+