{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 02 - Linear Regression\n",
                "\n",
                "In this module, we will explore **Linear Regression**, one of the most fundamental algorithms in Machine Learning used for predicting continuous values.\n",
                "\n",
                "### Resources:\n",
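                "See the [Mathematics for Data Science](https://aashishgarg13.github.io/DataScience/math-ds-complete/) section of the hub for the linear algebra (least squares) and optimization (gradient descent) behind Linear Regression. Note that scikit-learn's `LinearRegression` solves the least-squares problem directly rather than by gradient descent.\n",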
                "\n",
                "### Objectives:\n",
                "1. **Preprocessing**: Prepare numeric and categorical features.\n",
                "2. **Splitting**: Divide data into training and testing sets.\n",
                "3. **Training**: Fit a Linear Regression model.\n",
                "4. **Evaluation**: Use metrics like R-squared and Root Mean Squared Error (RMSE).\n",
                "\n",
                "---"
            ]
        },
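        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To connect the linear algebra and gradient-descent material to code, here is a minimal NumPy sketch on synthetic data. The data, learning rate, and iteration count are illustrative assumptions and are not used elsewhere in this module. It fits the same straight line two ways: by solving the least-squares problem directly (the approach scikit-learn's `LinearRegression` takes) and by batch gradient descent on the mean squared error.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "# Synthetic data (assumption): y = 3 + 2x plus small Gaussian noise\n",
                "rng = np.random.default_rng(0)\n",
                "x = rng.uniform(0, 10, size=200)\n",
                "y = 3.0 + 2.0 * x + rng.normal(0, 0.5, size=200)\n",
                "\n",
                "# Design matrix: a column of ones (intercept) next to x\n",
                "X = np.column_stack([np.ones_like(x), x])\n",
                "\n",
                "# 1) Least squares: the w that minimizes ||X w - y||^2\n",
                "w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)\n",
                "\n",
                "# 2) Batch gradient descent on the mean squared error\n",
                "w_gd = np.zeros(2)\n",
                "learning_rate = 0.01  # assumption: small enough to converge for this data\n",
                "for _ in range(20000):\n",
                "    grad = (2 / len(y)) * X.T @ (X @ w_gd - y)  # gradient of the MSE w.r.t. w\n",
                "    w_gd -= learning_rate * grad\n",
                "\n",
                "print('least squares:   ', w_lstsq)\n",
                "print('gradient descent:', w_gd)\n",
                "```\n",
                "\n",
                "Both estimates should land near the true intercept 3 and slope 2. Gradient descent only approximates the least-squares solution, and its learning rate has to be small enough for the scale of the features or the updates diverge."
            ]
        },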
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setup\n",
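                "We will use the `diamonds` dataset from seaborn's example datasets to predict each diamond's `price` from its other attributes (`carat`, `cut`, `color`, `clarity`, `depth`, `table`, and the dimensions `x`, `y`, `z`). `sns.load_dataset` downloads the data on first use and caches it, so the first run needs an internet connection."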
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "import numpy as np\n",
                "import matplotlib.pyplot as plt\n",
                "import seaborn as sns\n",
                "from sklearn.model_selection import train_test_split\n",
                "from sklearn.linear_model import LinearRegression\n",
                "from sklearn.metrics import mean_squared_error, r2_score\n",
                "\n",
                "# Load dataset\n",
                "df = sns.load_dataset('diamonds')\n",
                "print(\"Dataset Shape:\", df.shape)\n",
                "df.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Preprocessing\n",
                "\n",
                "### Task 1: Encode Categorical Variables\n",
                "The columns `cut`, `color`, and `clarity` are categorical, but Linear Regression needs numeric inputs. Use one-hot encoding to turn each of them into indicator (0/1) columns."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
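                "# One-hot encode the categorical columns; drop_first removes one redundant level per feature,\n",
                "# and dtype=int gives 0/1 columns (recent pandas versions default to bool)\n",
                "df_encoded = pd.get_dummies(df, columns=['cut', 'color', 'clarity'], drop_first=True, dtype=int)\n",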
                "df_encoded.head()\n",
                "```\n",
                "</details>"
            ]
        },
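        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Why `drop_first=True`? When the model has an intercept, keeping every indicator column of a feature is redundant: in each row those columns sum to 1, which exactly reproduces the intercept column (the dummy variable trap, a case of perfect multicollinearity). Dropping one level per feature removes the redundancy, and the dropped level becomes the baseline that the other indicator coefficients are measured against. The tiny frame below is a hypothetical example, not the diamonds data.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "import pandas as pd\n",
                "\n",
                "# Hypothetical data: one categorical feature with three levels\n",
                "toy = pd.DataFrame({'cut': ['Fair', 'Good', 'Ideal', 'Good']})\n",
                "\n",
                "full = pd.get_dummies(toy, columns=['cut'], dtype=int)\n",
                "reduced = pd.get_dummies(toy, columns=['cut'], drop_first=True, dtype=int)\n",
                "\n",
                "print(list(full.columns))     # ['cut_Fair', 'cut_Good', 'cut_Ideal']\n",
                "print(list(reduced.columns))  # ['cut_Good', 'cut_Ideal']\n",
                "\n",
                "# With all levels kept, the indicators in every row sum to 1,\n",
                "# i.e. together they duplicate an intercept column of ones\n",
                "print(full.sum(axis=1).tolist())  # [1, 1, 1, 1]\n",
                "\n",
                "# Adding an intercept column makes the full design matrix rank-deficient\n",
                "X_full = np.column_stack([np.ones(len(toy)), full.to_numpy()])\n",
                "X_reduced = np.column_stack([np.ones(len(toy)), reduced.to_numpy()])\n",
                "print(np.linalg.matrix_rank(X_full), X_full.shape[1])        # 3 4\n",
                "print(np.linalg.matrix_rank(X_reduced), X_reduced.shape[1])  # 3 3\n",
                "```\n",
                "\n",
                "If a column has pandas `category` dtype, `drop_first` drops the first level in that dtype's category order, which is not necessarily the alphabetically first value."
            ]
        },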
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Task 2: Features and Target Selection\n",
                "Define `X` (features) and `y` (target: 'price')."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "X = df_encoded.drop('price', axis=1)\n",
                "y = df_encoded['price']\n",
                "```\n",
                "</details>"
            ]
        },
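        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "A quick sanity check after separating features from the target: `price` must not remain among the features, since that would hand the model the answer, and `X` and `y` must stay aligned row for row. The sketch below runs the same `drop` pattern on a small hypothetical frame standing in for `df_encoded`; its column names and values are illustrative assumptions.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "# Hypothetical stand-in for df_encoded (assumption, not the real data)\n",
                "df_small = pd.DataFrame({\n",
                "    'carat': [0.23, 0.31, 0.70],\n",
                "    'cut_Good': [0, 1, 0],\n",
                "    'price': [326, 335, 2757],\n",
                "})\n",
                "\n",
                "# Same pattern as the solution: every column except the target is a feature\n",
                "X_small = df_small.drop(columns='price')\n",
                "y_small = df_small['price']\n",
                "\n",
                "# The target column must not leak into the features\n",
                "assert 'price' not in X_small.columns\n",
                "# Features and target must keep the same rows in the same order\n",
                "assert X_small.index.equals(y_small.index)\n",
                "print(X_small.shape, y_small.shape)  # (3, 2) (3,)\n",
                "```"
            ]
        },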
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Task 3: Train-Test Split\n",
                "Split the data into 80% training and 20% testing, and fix `random_state` so the split is reproducible."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
                "print(f\"Train size: {X_train.shape[0]}, Test size: {X_test.shape[0]}\")\n",
                "```\n",
                "</details>"
            ]
        },
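        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "What do `test_size=0.2` and `random_state=42` actually do? The sketch below, on a hypothetical 10-row array, checks that the split sizes follow the 80/20 ratio, that a fixed `random_state` reproduces the same split, and that each feature row stays paired with its own target after the shuffle (`train_test_split` shuffles by default).\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.model_selection import train_test_split\n",
                "\n",
                "# Hypothetical data: 10 samples, 1 feature; each target equals its row id\n",
                "X_demo = np.arange(10).reshape(-1, 1)\n",
                "y_demo = np.arange(10)\n",
                "\n",
                "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)\n",
                "print(len(X_tr), len(X_te))  # 8 2\n",
                "\n",
                "# The same random_state yields the same split on every run\n",
                "_, X_te_again, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)\n",
                "print(np.array_equal(X_te, X_te_again))  # True\n",
                "\n",
                "# Rows stay paired: each feature value still matches its own target\n",
                "print(np.array_equal(X_tr.ravel(), y_tr))  # True\n",
                "```"
            ]
        },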
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Modeling\n",
                "\n",
                "### Task 4: Training the Model\n",
                "Create a `LinearRegression` object and fit it on the training data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "model = LinearRegression()\n",
                "model.fit(X_train, y_train)\n",
                "```\n",
                "</details>"
            ]
        },
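        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "After `fit`, the learned parameters are stored in `model.coef_` (one weight per column of `X`, in column order) and `model.intercept_`. On synthetic data with known coefficients (an illustrative assumption, not the diamonds data), the fitted values come out close to the true ones. With the real features, `pd.Series(model.coef_, index=X_train.columns)` is one way to read each weight next to its feature name.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "import pandas as pd\n",
                "from sklearn.linear_model import LinearRegression\n",
                "\n",
                "# Synthetic data (assumption): price = 500 + 7000*carat - 40*depth + noise\n",
                "rng = np.random.default_rng(42)\n",
                "X_syn = pd.DataFrame({\n",
                "    'carat': rng.uniform(0.2, 2.0, size=500),\n",
                "    'depth': rng.uniform(55, 70, size=500),\n",
                "})\n",
                "y_syn = 500 + 7000 * X_syn['carat'] - 40 * X_syn['depth'] + rng.normal(0, 50, size=500)\n",
                "\n",
                "syn_model = LinearRegression().fit(X_syn, y_syn)\n",
                "\n",
                "# One coefficient per feature, in the same order as the columns\n",
                "print(pd.Series(syn_model.coef_, index=X_syn.columns).round(1))\n",
                "print(round(syn_model.intercept_, 1))\n",
                "```"
            ]
        },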
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Task 5: Making Predictions\n",
                "Use the fitted model to predict prices for the test set (`X_test`)."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "y_pred = model.predict(X_test)\n",
                "```\n",
                "</details>"
            ]
        },
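        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "For a linear model, `predict` is simply the weighted sum of the features plus the intercept, $\\hat{y} = Xw + b$, where $w$ is `coef_` and $b$ is `intercept_`. The sketch below, on hypothetical noise-free data, checks that `predict` matches that formula computed by hand with NumPy.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.linear_model import LinearRegression\n",
                "\n",
                "# Hypothetical data: 3 features and an exactly linear target (no noise)\n",
                "rng = np.random.default_rng(0)\n",
                "X_demo = rng.normal(size=(50, 3))\n",
                "y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0\n",
                "\n",
                "demo_model = LinearRegression().fit(X_demo, y_demo)\n",
                "\n",
                "# Manual prediction: X w + b\n",
                "manual = X_demo @ demo_model.coef_ + demo_model.intercept_\n",
                "print(np.allclose(demo_model.predict(X_demo), manual))  # True\n",
                "```"
            ]
        },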
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4. Evaluation\n",
                "\n",
                "### Task 6: Error Metrics\n",
                "Calculate the R² (R-squared) score and the RMSE on the test set."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "r2 = r2_score(y_test, y_pred)\n",
                "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n",
                "\n",
                "print(f\"R2 Score: {r2:.4f}\")\n",
                "print(f\"RMSE: {rmse:.2f}\")\n",
                "```\n",
                "</details>"
            ]
        },
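        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Both metrics are short formulas. RMSE is $\\sqrt{\\frac{1}{n}\\sum_i (y_i - \\hat{y}_i)^2}$ and is expressed in the same units as `price`. R² is $1 - \\frac{\\sum_i (y_i - \\hat{y}_i)^2}{\\sum_i (y_i - \\bar{y})^2}$, the fraction of the target's variance that the model explains. The sketch below computes both by hand on a few hypothetical values and compares them with the scikit-learn functions used in the solution.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "from sklearn.metrics import mean_squared_error, r2_score\n",
                "\n",
                "# Hypothetical true and predicted prices (assumption, not model output)\n",
                "y_true = np.array([300.0, 500.0, 800.0, 1200.0])\n",
                "y_hat = np.array([350.0, 450.0, 900.0, 1100.0])\n",
                "\n",
                "residuals = y_true - y_hat\n",
                "rmse_manual = np.sqrt(np.mean(residuals ** 2))\n",
                "r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)\n",
                "\n",
                "print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))\n",
                "print(r2_manual, r2_score(y_true, y_hat))\n",
                "```\n",
                "\n",
                "A negative R² means the model does worse than always predicting the mean. In scikit-learn 1.4 and later, `sklearn.metrics.root_mean_squared_error` returns RMSE directly; `np.sqrt(mean_squared_error(...))` also works in older versions."
            ]
        },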
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "---\n",
                "### Well Done!\n",
                "You have built and evaluated a Linear Regression model.\n",
                "Next module: **Logistic Regression** for classification!"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.8.0"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}