{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Library Practice: Pandas\n",
"\n",
"Pandas is the most widely used Python library for manipulating and analyzing tabular data. Its two core data structures, `DataFrame` (a two-dimensional table) and `Series` (a single labeled column), make it straightforward to load, filter, and summarize data.\n",
"\n",
"### Resources:\n",
"Refer to the **[Feature Engineering Guide](https://aashishgarg13.github.io/DataScience/feature-engineering/)** on your hub for data cleaning and transformation concepts using Pandas.\n",
"\n",
"### Objectives:\n",
"1. **DataFrame Creation**: Building DataFrames from dictionaries.\n",
"2. **Selection & Filtering**: Querying rows with boolean conditions.\n",
"3. **Grouping & Aggregation**: Summarizing data by category.\n",
"4. **Merging & Joining**: Combining DataFrames on a shared key.\n",
"\n",
"---"
]
},
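{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the two structures mentioned in the introduction, here is a minimal sketch. The names and values are made up for this example and are not part of the exercises.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A Series is a one-dimensional labeled array\n",
"ages = pd.Series([24, 30, 22], index=['Alice', 'Bob', 'Charlie'], name='Age')\n",
"\n",
"# A DataFrame is a two-dimensional table with labeled rows and columns\n",
"people = pd.DataFrame(\n",
"    {'Age': [24, 30, 22], 'City': ['NY', 'LA', 'Chicago']},\n",
"    index=['Alice', 'Bob', 'Charlie']\n",
")\n",
"\n",
"# Selecting a single column of a DataFrame gives back a Series\n",
"print(type(people['Age']))\n",
"print(people.shape)\n",
"```"
]
},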
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. DataFrame Basics\n",
"\n",
"### Task 1: Create a DataFrame\n",
"Create a DataFrame from a dictionary with columns: `Name`, `Age`, and `City` for 5 people."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"data = {\n",
"    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],\n",
"    'Age': [24, 30, 22, 35, 29],\n",
"    'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami']\n",
"}\n",
"df = pd.DataFrame(data)\n",
"print(df)\n",
"```\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Selection and Filtering\n",
"\n",
"### Task 2: Conditional Selection\n",
"Using the DataFrame from Task 1, select all rows where `Age` is greater than 25."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"filtered_df = df[df['Age'] > 25]\n",
"print(filtered_df)\n",
"```\n",
"</details>"
]
},
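{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filters often need more than one condition. The sketch below rebuilds the Task 1 data so it runs on its own, then shows two common patterns: combining boolean masks and using `.loc` to filter rows and pick a column in one step. The variable names `older_not_la` and `names` are illustrative only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Sample data mirroring Task 1\n",
"df = pd.DataFrame({\n",
"    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],\n",
"    'Age': [24, 30, 22, 35, 29],\n",
"    'City': ['NY', 'LA', 'Chicago', 'Houston', 'Miami']\n",
"})\n",
"\n",
"# Combine conditions with & (and) or | (or); each condition needs its own parentheses\n",
"older_not_la = df[(df['Age'] > 25) & (df['City'] != 'LA')]\n",
"\n",
"# .loc takes a row mask and a column label together\n",
"names = df.loc[df['Age'] > 25, 'Name']\n",
"\n",
"print(older_not_la)\n",
"print(names.tolist())\n",
"```\n",
"\n",
"Python's `and` / `or` keywords do not work element-wise on a `Series`, which is why the bitwise operators are used here."
]
},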
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. GroupBy and Aggregation\n",
"\n",
"### Task 3: Grouping Data\n",
"The cell below creates `sales_df`, a DataFrame with `Category` and `Sales` columns. Group it by `Category` and calculate the average `Sales` for each group."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sales_data = {\n",
" 'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],\n",
" 'Sales': [100, 50, 200, 300, 40, 150]\n",
"}\n",
"sales_df = pd.DataFrame(sales_data)\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"result = sales_df.groupby('Category').mean()\n",
"print(result)\n",
"```\n",
"</details>"
]
},
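{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond a single mean, one `groupby` can produce several summaries at once. The following is a minimal sketch using named aggregation on the same sales data; the output column names (`avg_sales`, `total_sales`, `n_rows`) are chosen for this example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"sales_df = pd.DataFrame({\n",
"    'Category': ['Electronics', 'Clothing', 'Electronics', 'Home', 'Clothing', 'Home'],\n",
"    'Sales': [100, 50, 200, 300, 40, 150]\n",
"})\n",
"\n",
"# Select the column explicitly, then name each aggregate via a keyword argument\n",
"summary = sales_df.groupby('Category')['Sales'].agg(\n",
"    avg_sales='mean',\n",
"    total_sales='sum',\n",
"    n_rows='count'\n",
")\n",
"print(summary)\n",
"```\n",
"\n",
"Selecting `['Sales']` before aggregating keeps the result predictable if the DataFrame later gains non-numeric columns."
]
},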
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Merging and Joining\n",
"\n",
"### Task 4: Merge DataFrames\n",
"Merge `df1` and `df2` on their common `ID` column, keeping only the IDs that appear in both (an inner join)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})\n",
"df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary><b>Click to see Solution</b></summary>\n",
"\n",
"```python\n",
"merged = pd.merge(df1, df2, on='ID', how='inner')\n",
"print(merged)\n",
"```\n",
"</details>"
]
},
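{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `how` argument decides which IDs survive the merge. As a rough sketch, the loop below runs the same merge with each join type on the Task 4 data, and the final call uses `indicator=True` to record where each row came from. The dictionary name `ids_by_how` is illustrative only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value1': ['A', 'B', 'C']})\n",
"df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value2': ['X', 'Y', 'Z']})\n",
"\n",
"# Collect the IDs kept by each join type\n",
"ids_by_how = {}\n",
"for how in ['inner', 'left', 'right', 'outer']:\n",
"    merged = pd.merge(df1, df2, on='ID', how=how)\n",
"    ids_by_how[how] = sorted(merged['ID'].tolist())\n",
"    print(how, ids_by_how[how])\n",
"\n",
"# indicator=True adds a _merge column: left_only, right_only, or both\n",
"outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)\n",
"print(outer)\n",
"```\n",
"\n",
"Rows with no match on the other side are filled with `NaN` in the missing columns."
]
},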
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"### Excellent Pandas Practice!\n",
"You're becoming a data manipulation pro.\n",
"Next: **Matplotlib & Seaborn Practice**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}