{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Python Library Practice: Pandas\n",
                "\n",
                "Pandas is the most widely used Python library for data manipulation and analysis. Its two core data structures, `Series` (a labeled one-dimensional array) and `DataFrame` (a labeled two-dimensional table), make working with tabular data straightforward.\n",
                "\n",
                "### Resources:\n",
                "Refer to the **[Feature Engineering Guide](https://aashishgarg13.github.io/DataScience/feature-engineering/)** for data cleaning and transformation concepts using Pandas.\n",
                "\n",
                "### Objectives:\n",
                "1. **DataFrame Creation**: Building dataframes from dictionaries.\n",
                "2. **Selection & Filtering**: Querying data.\n",
                "3. **Grouping & Aggregation**: Summarizing data.\n",
                "4. **Merging & Joining**: Combining DataFrames on a common key.\n",
                "\n",
                "---"
            ]
        },
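        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Before starting the tasks, the relationship between `DataFrame` and `Series` can be sketched as follows. This is a minimal illustration rather than part of the exercises; the `people` table and its index labels are made up for the example.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "# Hypothetical sample data, used only for illustration\n",
                "people = pd.DataFrame(\n",
                "    {'Age': [24, 30, 22], 'City': ['NY', 'LA', 'Chicago']},\n",
                "    index=['Alice', 'Bob', 'Charlie']\n",
                ")\n",
                "\n",
                "# Selecting a single column returns a Series that shares the row labels\n",
                "age_column = people['Age']\n",
                "\n",
                "print(type(people).__name__)      # class name of the table\n",
                "print(type(age_column).__name__)  # class name of one column\n",
                "print(age_column['Bob'])          # label-based lookup in the Series\n",
                "```\n",
                "\n",
                "In short, a `DataFrame` behaves like a collection of `Series` objects that share one index."
            ]
        },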
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. DataFrame Basics\n",
                "\n",
                "### Task 1: Create a DataFrame\n",
                "Create a DataFrame from a dictionary with columns: `Name`, `Age`, and `City` for 5 people."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import pandas as pd\n",
                "\n",
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "data = {\n",
                "    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],\n",
                "    'Age': [24, 30, 22, 35, 29],\n",
                "    'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami']\n",
                "}\n",
                "df = pd.DataFrame(data)\n",
                "print(df)\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Selection and Filtering\n",
                "\n",
                "### Task 2: Conditional Selection\n",
                "Using the DataFrame from Task 1, select all rows where `Age` is greater than 25."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "filtered_df = df[df['Age'] > 25]\n",
                "print(filtered_df)\n",
                "```\n",
                "</details>"
            ]
        },
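        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Filters often need more than one condition. The sketch below, which goes slightly beyond Task 2, shows one way to combine boolean masks and the equivalent `DataFrame.query` call. The age range is an arbitrary choice for illustration, and the data is redefined here so the cell runs on its own.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "people = pd.DataFrame({\n",
                "    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],\n",
                "    'Age': [24, 30, 22, 35, 29],\n",
                "    'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami']\n",
                "})\n",
                "\n",
                "# Combine conditions with & (and) or | (or); each condition needs its own parentheses\n",
                "mask = (people['Age'] > 25) & (people['Age'] < 35)\n",
                "\n",
                "# .loc takes the row mask and, optionally, the columns to keep\n",
                "in_range = people.loc[mask, ['Name', 'Age']]\n",
                "\n",
                "# query() expresses the same filter as a string\n",
                "same_rows = people.query('Age > 25 and Age < 35')\n",
                "\n",
                "print(in_range)\n",
                "```\n",
                "\n",
                "Python's `and`/`or` keywords do not work on whole columns, which is why the mask version uses `&` and `|`."
            ]
        },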
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. GroupBy and Aggregation\n",
                "\n",
                "### Task 3: Grouping Data\n",
                "Using the `sales_df` DataFrame below, which has `Category` and `Sales` columns, group by `Category` and calculate the average `Sales`."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "sales_data = {\n",
                "    'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],\n",
                "    'Sales': [100, 50, 200, 300, 40, 150]\n",
                "}\n",
                "sales_df = pd.DataFrame(sales_data)\n",
                "\n",
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "# Select the Sales column explicitly so only numeric data is averaged\n",
                "result = sales_df.groupby('Category')['Sales'].mean()\n",
                "print(result)\n",
                "```\n",
                "</details>"
            ]
        },
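        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Grouping is not limited to a single statistic. As an optional extension of Task 3, the sketch below computes several aggregates in one pass with `agg`. The choice of `mean`, `sum`, and `count` is only an example, and the sales data is repeated so the cell is self-contained.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "sales_df = pd.DataFrame({\n",
                "    'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],\n",
                "    'Sales': [100, 50, 200, 300, 40, 150]\n",
                "})\n",
                "\n",
                "# One row per category, one column per aggregate\n",
                "summary = sales_df.groupby('Category')['Sales'].agg(['mean', 'sum', 'count'])\n",
                "\n",
                "print(summary)\n",
                "```\n",
                "\n",
                "The result is indexed by `Category`, so a single value can be read with `summary.loc['Home', 'sum']`."
            ]
        },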
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4. Merging and Joining\n",
                "\n",
                "### Task 4: Merge DataFrames\n",
                "Merge the two DataFrames below, `df1` and `df2`, on their common `ID` column."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})\n",
                "df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})\n",
                "\n",
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "merged = pd.merge(df1, df2, on='ID', how='inner')\n",
                "print(merged)\n",
                "```\n",
                "</details>"
            ]
        },
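        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The `how` argument controls which IDs survive a merge. The following sketch, offered as an optional comparison rather than the required answer, contrasts an inner join with an outer join on the same two frames and uses `indicator=True` to show where each row came from.\n",
                "\n",
                "```python\n",
                "import pandas as pd\n",
                "\n",
                "df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})\n",
                "df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})\n",
                "\n",
                "# Inner join: keep only IDs present in both frames\n",
                "inner = pd.merge(df1, df2, on='ID', how='inner')\n",
                "\n",
                "# Outer join: keep every ID; unmatched cells become NaN\n",
                "# indicator=True adds a _merge column (left_only, right_only, both)\n",
                "outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)\n",
                "\n",
                "print(inner)\n",
                "print(outer)\n",
                "```\n",
                "\n",
                "Rows marked `left_only` or `right_only` in the outer result are the ones an inner join would drop."
            ]
        },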
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "---\n",
                "### Excellent Pandas Practice!\n",
                "You're becoming a data manipulation pro.\n",
                "Next: **Matplotlib & Seaborn Practice**."
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.12.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}