{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Library Practice: Pandas\n", "\n", "Pandas is the standard Python library for manipulating and analyzing tabular data. It provides two core data structures, `DataFrame` and `Series`, that make tabular data easy to work with.\n", "\n", "### Resources:\n", "Refer to the **[Feature Engineering Guide](https://aashishgarg13.github.io/DataScience/feature-engineering/)** on your hub for data cleaning and transformation concepts using Pandas.\n", "\n", "### Objectives:\n", "1. **DataFrame Creation**: Building DataFrames from dictionaries.\n", "2. **Selection & Filtering**: Selecting rows that match a condition.\n", "3. **Grouping & Aggregation**: Summarizing data by group.\n", "4. **Merging & Joining**: Combining DataFrames on a shared key.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. DataFrame Basics\n", "\n", "### Task 1: Create a DataFrame\n", "Create a DataFrame from a dictionary with columns: `Name`, `Age`, and `City` for 5 people." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "data = {\n", " 'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],\n", " 'Age': [24, 30, 22, 35, 29],\n", " 'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami']\n", "}\n", "df = pd.DataFrame(data)\n", "print(df)\n", "```\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Selection and Filtering\n", "\n", "### Task 2: Conditional Selection\n", "Using the DataFrame from Task 1, select all rows where `Age` is greater than 25." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "filtered_df = df[df['Age'] > 25]\n", "print(filtered_df)\n", "```\n", "
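As a possible extension of the solution above, several conditions can be combined in one filter. The sketch below rebuilds the Task 1 data so it runs on its own; the names `cities`, `older_in_cities`, and `via_query` are illustrative and not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami'],
})

cities = ['LA', 'Miami']  # hypothetical list of cities to keep

# Combine boolean masks with & (and) or | (or); wrap each condition in parentheses.
older_in_cities = df[(df['Age'] > 25) & (df['City'].isin(cities))]

# DataFrame.query expresses the same filter as a string; @ refers to a local variable.
via_query = df.query('Age > 25 and City in @cities')

print(older_in_cities)
```

Both forms select the same rows; `query` tends to read better as the number of conditions grows.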
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. GroupBy and Aggregation\n", "\n", "### Task 3: Grouping Data\n", "Using the `sales_df` DataFrame defined in the next cell (columns `Category` and `Sales`), group by `Category` and calculate the average `Sales`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sales_data = {\n", " 'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],\n", " 'Sales': [100, 50, 200, 300, 40, 150]\n", "}\n", "sales_df = pd.DataFrame(sales_data)\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "# Select the Sales column explicitly so that only it is averaged\n", "result = sales_df.groupby('Category')['Sales'].mean()\n", "print(result)\n", "```\n", "
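If more than one summary per group is needed, named aggregation is one way to compute them in a single pass. This is a minimal sketch using the same sales data; the output column names `avg_sales`, `total_sales`, and `n_rows` are made up for illustration.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],
    'Sales': [100, 50, 200, 300, 40, 150],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
summary = sales_df.groupby('Category').agg(
    avg_sales=('Sales', 'mean'),
    total_sales=('Sales', 'sum'),
    n_rows=('Sales', 'count'),
)

print(summary)
```

The result is indexed by `Category`, with one row per group and one column per named aggregation.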
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Merging and Joining\n", "\n", "### Task 4: Merge DataFrames\n", "Merge `df1` and `df2` (defined in the next cell) on their common `ID` column, keeping only the IDs present in both." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})\n", "df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "merged = pd.merge(df1, df2, on='ID', how='inner')\n", "print(merged)\n", "```\n", "
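The `how` argument controls which IDs survive the merge. The sketch below compares three join types on the same two DataFrames; the variable names `inner`, `left`, and `outer` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})

# inner: only IDs present in both frames
inner = pd.merge(df1, df2, on='ID', how='inner')

# left: every ID from df1; Value2 is NaN where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')

# outer: every ID from either frame; indicator=True adds a _merge column
# recording whether each row came from the left frame, the right frame, or both
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(outer)
```

Rows without a match on the other side are filled with `NaN`, so an outer or left merge is often followed by a missing-data check.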
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Excellent Pandas Practice! \n", "You're becoming a data manipulation pro.\n", "Next: **Matplotlib & Seaborn Practice**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }