{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 01 - EDA & Feature Engineering\n", "\n", "Welcome to the first module of your Machine Learning practice! \n", "\n", "In this notebook, we will focus on the most critical part of the ML pipeline: **Understanding and Preparing your data.**\n", "\n", "### Resources:\n", "This practice guide is integrated with your [DataScience Learning Hub](https://aashishgarg13.github.io/DataScience/). Specifically, you can refer to the **Feature Engineering Guide** section on the website for interactive visual explanations of these concepts.\n", "\n", "### Objectives:\n", "1. **EDA**: Visualize distributions, correlations, and outliers.\n", "2. **Data Cleaning**: Handle missing values and data inconsistencies.\n", "3. **Feature Engineering**: Create new features and transform existing ones (Encoding, Scaling).\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Environment Setup\n", "First, let's load the necessary libraries and the dataset. We'll use the **Titanic Dataset** for this exercise." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset Shape: (891, 15)\n" ] }, { "data": { "text/html": [ "
|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
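
As a quick illustration of the module's objectives, here is a minimal sketch that works only on the five preview rows shown above. It assumes pandas is installed and rebuilds a subset of the columns by hand, so the results describe this small sample rather than the full 891-row dataset. The names `preview`, `missing_counts`, `sex_encoded`, and `fare_scaled` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Five preview rows copied by hand from the table above (subset of columns).
preview = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1, 0],
        "pclass": [3, 1, 3, 1, 3],
        "sex": ["male", "female", "female", "female", "male"],
        "age": [22.0, 38.0, 26.0, 35.0, 35.0],
        "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
        "deck": [None, "C", None, "C", None],
    }
)

# Data cleaning: count missing values per column.
missing_counts = preview.isna().sum()

# Feature engineering (encoding): map the two sex labels to 0/1.
preview["sex_encoded"] = preview["sex"].map({"male": 0, "female": 1})

# Feature engineering (scaling): min-max scale fare into the range [0, 1].
fare_min, fare_max = preview["fare"].min(), preview["fare"].max()
preview["fare_scaled"] = (preview["fare"] - fare_min) / (fare_max - fare_min)

print(missing_counts)
print(preview[["sex", "sex_encoded", "fare", "fare_scaled"]])
```

In this sample only `deck` has missing entries (three of the five rows), which hints at why that column may need special handling during cleaning.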