{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 02 - Linear Regression\n", "\n", "In this module, we will explore **Linear Regression**, one of the most fundamental algorithms in Machine Learning used for predicting continuous values.\n", "\n", "### Resources:\n", "Check out the [Mathematics for Data Science](https://aashishgarg13.github.io/DataScience/math-ds-complete/) section on your hub to understand the Linear Algebra and Optimization (Gradient Descent) behind Linear Regression.\n", "\n", "### Objectives:\n", "1. **Preprocessing**: Prepare numeric and categorical features.\n", "2. **Splitting**: Divide data into training and testing sets.\n", "3. **Training**: Fit a Linear Regression model.\n", "4. **Evaluation**: Use metrics like R-squared and Root Mean Squared Error (RMSE).\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will use the `diamonds` dataset to predict the `price` of diamonds based on their features." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "\n", "# Load dataset\n", "df = sns.load_dataset('diamonds')\n", "print(\"Dataset Shape:\", df.shape)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Preprocessing\n", "\n", "### Task 1: Encode Categorical Variables\n", "The columns `cut`, `color`, and `clarity` are categorical. Use One-Hot Encoding to convert them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "df_encoded = pd.get_dummies(df, columns=['cut', 'color', 'clarity'], drop_first=True)\n", "df_encoded.head()\n", "```\n", "
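As a rough illustration of what `drop_first=True` does in the solution above, here is a minimal pure-Python sketch on a toy column. The helper `one_hot_drop_first` and the sample values are assumptions made for illustration, not part of pandas. It drops the first category in sorted order, whereas pandas follows the column's own category order for categorical columns.

```python
def one_hot_drop_first(values, prefix='cut'):
    # Hypothetical helper: build one 0/1 indicator per category,
    # skipping the first category in sorted order (the baseline).
    categories = sorted(set(values))
    kept = categories[1:]
    return [{f'{prefix}_{c}': int(v == c) for c in kept} for v in values]

# Toy values, made up for illustration
rows = one_hot_drop_first(['Ideal', 'Good', 'Premium', 'Good'])
for row in rows:
    print(row)
```

A row of all zeros stands for the dropped baseline category (`Good` here). Dropping one category keeps the indicator columns from being perfectly collinear with the intercept.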
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: Features and Target Selection\n", "Define `X` (features) and `y` (target: 'price')." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "X = df_encoded.drop('price', axis=1)\n", "y = df_encoded['price']\n", "```\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 3: Train-Test Split\n", "Split the data into 80% training and 20% testing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "print(f\"Train size: {X_train.shape[0]}, Test size: {X_test.shape[0]}\")\n", "```\n", "
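To make the role of `test_size` and `random_state` more concrete, the sketch below shuffles and cuts a list by hand. The helper `simple_split` is hypothetical and will not reproduce the exact rows that `train_test_split` selects. It only shows the idea of a seeded shuffle followed by an 80/20 cut.

```python
import random

def simple_split(rows, test_fraction=0.2, seed=42):
    # Hypothetical helper: shuffle a copy with a fixed seed,
    # then slice off the first part as the test set.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = simple_split(range(10))
print(len(train_rows), len(test_rows))
```

Calling the helper again with the same seed gives the same split, which is why fixing `random_state` makes results reproducible.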
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Modeling\n", "\n", "### Task 4: Training the Model\n", "Create a LinearRegression object and fit it on the training data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "model = LinearRegression()\n", "model.fit(X_train, y_train)\n", "```\n", "
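`LinearRegression` fits its coefficients by ordinary least squares. For a single feature this has a simple closed form, sketched below in plain Python. The function `fit_line` and the toy numbers are assumptions for illustration. The diamonds model has many features and is solved with matrix routines instead.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one feature:
    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1 (made up for illustration)
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

In the fitted scikit-learn model, the analogous values are exposed as `model.coef_` and `model.intercept_`.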
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 5: Making Predictions\n", "Predict the values for the test set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "y_pred = model.predict(X_test)\n", "```\n", "
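For a linear model, a prediction is the intercept plus the dot product of the feature row and the coefficients. The following sketch shows that arithmetic with made-up coefficients. `predict_rows` is a hypothetical helper, not a scikit-learn function.

```python
def predict_rows(rows, coef, intercept):
    # Linear prediction: intercept + sum of coefficient * feature value.
    return [intercept + sum(c * x for c, x in zip(coef, row)) for row in rows]

# Made-up coefficients and feature rows, for illustration only
preds = predict_rows([[1.0, 0.0], [2.0, 1.0]], coef=[3.0, -1.0], intercept=0.5)
print(preds)
```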
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Evaluation\n", "\n", "### Task 6: Error Metrics\n", "Calculate R2 Score and RMSE." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "r2 = r2_score(y_test, y_pred)\n", "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", "\n", "print(f\"R2 Score: {r2:.4f}\")\n", "print(f\"RMSE: {rmse:.2f}\")\n", "```\n", "
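To see what the two metrics measure, here is a minimal sketch that computes them by hand from their definitions: RMSE is the square root of the mean squared residual, and R-squared is one minus the ratio of residual to total sum of squares. The function names and toy values are assumptions for illustration.

```python
import math

def rmse_by_hand(y_true, y_hat):
    # Root of the mean squared difference between targets and predictions.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)
    return math.sqrt(mse)

def r2_by_hand(y_true, y_hat):
    # 1 - (residual sum of squares / total sum of squares around the mean).
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy targets and predictions, made up for illustration
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_hat = [2.0, 5.0, 8.0, 9.0]
print(f'R2: {r2_by_hand(toy_true, toy_hat):.4f}')
print(f'RMSE: {rmse_by_hand(toy_true, toy_hat):.4f}')
```

RMSE is expressed in the units of the target (dollars for `price`), while R-squared is unitless, so the two are usually read together.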
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Well Done! \n", "You have successfully built and evaluated a Linear Regression model. \n", "Next module: **Logistic Regression** for classification!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }