{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 17 - Reinforcement Learning (Q-Learning)\n",
"\n",
"Welcome to Module 17! We are exploring **Reinforcement Learning** (RL). Unlike supervised learning, RL agents learn by interacting with an environment and receiving rewards or penalties.\n",
"\n",
"### Resources:\n",
"Check out the **[Q-Learning Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for a breakdown of the Bellman equation behind the action-value function $Q(s,a)$ and of how the Agent-Environment loop works.\n",
"\n",
"### Objectives:\n",
"1. **Agent-Environment Loop**: States, Actions, and Rewards.\n",
"2. **Exploration vs. Exploitation**: The Epsilon-Greedy strategy.\n",
"3. **Q-Table**: Learning the quality of actions.\n",
"\n",
"---"
]
},
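{
"cell_type": "markdown",
"metadata": {},
"source": [
"The epsilon-greedy strategy from Objective 2 can be sketched as a small helper. `epsilon_greedy` is a hypothetical function (it is not defined elsewhere in this notebook), and the sketch assumes NumPy's `Generator` API. With probability $\\epsilon$ it explores by picking a uniformly random action; otherwise it exploits the highest-valued action and breaks ties at random.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def epsilon_greedy(q_values, epsilon, rng):\n",
"    # Hypothetical helper: explore with probability epsilon\n",
"    if rng.random() < epsilon:\n",
"        return int(rng.integers(len(q_values)))\n",
"    # Exploit: collect every action tied for the best value, then pick one at random\n",
"    best_actions = np.flatnonzero(q_values == np.max(q_values))\n",
"    return int(rng.choice(best_actions))\n",
"\n",
"rng = np.random.default_rng(0)\n",
"q_values = np.array([0.0, 0.5, 0.5, -1.0])  # made-up Q-values for a single state\n",
"print(epsilon_greedy(q_values, epsilon=0.0, rng=rng))  # pure exploitation: prints 1 or 2\n",
"```\n",
"\n",
"Random tie-breaking is a design choice: `np.argmax` always returns the first maximum, so on an all-zero row of a fresh Q-table it picks action 0 (Up) every time until that value drops below the others."
]
},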
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Environment Simulation\n",
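"We will implement a simple \"Grid World\" on a 5x5 grid where an agent must reach a treasure while avoiding a trap. The agent starts in the top-left cell (0, 0), the treasure sits in the bottom-right cell (4, 4), and the trap is in the center (2, 2). Each move costs -1, reaching the treasure earns +10, and stepping into the trap costs -5; either of the last two ends the episode. A move into a wall leaves the agent where it is."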
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"class SimpleGridWorld:\n",
"    def __init__(self, size=5):\n",
"        self.size = size\n",
"        self.state = (0, 0)  # start in the top-left corner\n",
"        self.goal = (size-1, size-1)  # treasure in the bottom-right corner\n",
"        self.trap = (size//2, size//2)  # trap in the center\n",
"\n",
"    def step(self, action):\n",
"        # 0=Up, 1=Down, 2=Left, 3=Right; moves into a wall leave the agent in place\n",
"        r, c = self.state\n",
"        if action == 0: r = max(0, r-1)\n",
"        elif action == 1: r = min(self.size-1, r+1)\n",
"        elif action == 2: c = max(0, c-1)\n",
"        elif action == 3: c = min(self.size-1, c+1)\n",
"\n",
"        self.state = (r, c)\n",
"\n",
"        # Returns (next_state, reward, done)\n",
"        if self.state == self.goal:\n",
"            return self.state, 10, True\n",
"        elif self.state == self.trap:\n",
"            return self.state, -5, True\n",
"        return self.state, -1, False\n",
"\n",
"    def reset(self):\n",
"        self.state = (0, 0)\n",
"        return self.state\n",
"\n",
"env = SimpleGridWorld()\n",
"print(\"Environment initialized!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Q-Learning Algorithm\n",
"\n",
"### Task 1: Training the Agent\n",
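"Initialize a Q-table of shape (5, 5, 4), one entry per (row, column, action), with zeros and train the agent for 1000 episodes using the update rule:\n",
"\n",
"$$Q(s, a) \\leftarrow Q(s, a) + \\alpha \\left[R + \\gamma \\max_{a'} Q(s', a') - Q(s, a)\\right]$$\n",
"\n",
"Here $s'$ is the next state. When $s'$ is terminal (treasure or trap), the episode ends and the $\\gamma \\max_{a'} Q(s', a')$ term is dropped."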
]
},
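{
"cell_type": "markdown",
"metadata": {},
"source": [
"A single application of the update rule makes the arithmetic concrete. `q_update` below is a hypothetical helper, not part of the required solution; it assumes the same `(row, col, action)` indexing as `q_table` and treats a terminal transition as having no future value. The Q-values in the example are made up for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def q_update(q_table, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):\n",
"    r, c = state\n",
"    # A terminal transition has no future value, so the target is just the reward\n",
"    next_max = 0.0 if done else np.max(q_table[next_state[0], next_state[1]])\n",
"    td_target = reward + gamma * next_max\n",
"    td_error = td_target - q_table[r, c, action]\n",
"    # Move the old estimate a fraction alpha toward the target\n",
"    q_table[r, c, action] += alpha * td_error\n",
"    return q_table[r, c, action]\n",
"\n",
"q = np.zeros((5, 5, 4))\n",
"q[0, 1] = [0.0, 2.0, 0.0, 0.0]  # made-up values for state (0, 1)\n",
"# Move Right from (0, 0) to (0, 1), paying the step penalty of -1\n",
"print(q_update(q, (0, 0), 3, -1, (0, 1), False))  # 0.1 * (-1 + 0.9 * 2.0 - 0.0), about 0.08\n",
"```\n",
"\n",
"Even though the step itself cost -1, the estimate rises because the next state already looks promising; this is how reward information propagates backward from the treasure over many episodes."
]
},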
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha = 0.1 # Learning rate\n",
"gamma = 0.9 # Discount factor\n",
"epsilon = 0.2 # Exploration rate\n",
"q_table = np.zeros((5, 5, 4))\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"for episode in range(1000):\n",
"    state = env.reset()\n",
"    done = False\n",
"\n",
"    while not done:\n",
"        # Epsilon-greedy action selection\n",
"        if np.random.uniform(0, 1) < epsilon:\n",
"            action = np.random.choice(4)  # Explore: random action\n",
"        else:\n",
"            action = np.argmax(q_table[state[0], state[1]])  # Exploit: best known action\n",
"\n",
"        next_state, reward, done = env.step(action)\n",
"\n",
"        # Q-learning update; a terminal state has no future value to bootstrap from\n",
"        old_value = q_table[state[0], state[1], action]\n",
"        next_max = 0.0 if done else np.max(q_table[next_state[0], next_state[1]])\n",
"\n",
"        new_value = old_value + alpha * (reward + gamma * next_max - old_value)\n",
"        q_table[state[0], state[1], action] = new_value\n",
"\n",
"        state = next_state\n",
"```\n",
"</details>"
]
},
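{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the trained agent actually does, you can follow its greedy policy from the start cell. `greedy_rollout` is a hypothetical sketch: it re-implements the movement rules of `SimpleGridWorld.step` instead of calling the environment, assumes the default 5x5 layout (treasure at (4, 4), trap at (2, 2)), and stops after `max_steps` because a poorly trained table can keep the agent walking into a wall forever.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def greedy_rollout(q_table, start=(0, 0), goal=(4, 4), trap=(2, 2), max_steps=50):\n",
"    size = q_table.shape[0]\n",
"    # Row/column offsets for 0=Up, 1=Down, 2=Left, 3=Right\n",
"    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}\n",
"    state, path = start, [start]\n",
"    for _ in range(max_steps):\n",
"        if state in (goal, trap):\n",
"            break  # terminal cell reached\n",
"        dr, dc = moves[int(np.argmax(q_table[state[0], state[1]]))]\n",
"        # Clip to the grid, mirroring how the environment handles walls\n",
"        r = min(max(state[0] + dr, 0), size - 1)\n",
"        c = min(max(state[1] + dc, 0), size - 1)\n",
"        state = (r, c)\n",
"        path.append(state)\n",
"    return path\n",
"\n",
"# Made-up table: prefer Right along the top row, then Down along the last column\n",
"demo_q = np.zeros((5, 5, 4))\n",
"demo_q[0, :, 3] = 1.0\n",
"demo_q[:, 4, 1] = 2.0\n",
"print(greedy_rollout(demo_q))\n",
"```\n",
"\n",
"After the training cell has run, `print(greedy_rollout(q_table))` lists the visited cells; a path that ends at (4, 4) means the greedy policy reaches the treasure."
]
},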
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Policy Visualization\n",
"\n",
"### Task 2: What did it learn?\n",
"Display the learned policy by showing, for each cell in the grid, the action with the highest Q-value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
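"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"policy = np.argmax(q_table, axis=2)\n",
"print(\"Learned Policy (0=Up, 1=Down, 2=Left, 3=Right):\")\n",
"print(policy)\n",
"```\n",
"\n",
"The treasure cell (bottom-right) and the trap cell (center) are terminal, so the agent never acts from them: their Q-values stay at zero and `np.argmax` reports 0 (Up) there. Ignore those two entries.\n",
"</details>"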
]
},
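{
"cell_type": "markdown",
"metadata": {},
"source": [
"The integer grid printed above can be hard to read. `policy_arrows` is a hypothetical helper (an illustration, not part of the task) that maps each action index to an arrow and marks the two terminal cells, assuming the default layout with the treasure at (4, 4) and the trap at (2, 2).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Arrow for each action index: 0=Up, 1=Down, 2=Left, 3=Right\n",
"ARROWS = np.array(['↑', '↓', '←', '→'])\n",
"\n",
"def policy_arrows(q_table, goal=(4, 4), trap=(2, 2)):\n",
"    # Fancy indexing returns a new (5, 5) array of arrow strings\n",
"    grid = ARROWS[np.argmax(q_table, axis=2)]\n",
"    # Terminal cells are never updated, so their argmax carries no information\n",
"    grid[goal] = 'G'\n",
"    grid[trap] = 'T'\n",
"    return grid\n",
"\n",
"demo_q = np.zeros((5, 5, 4))\n",
"demo_q[..., 3] = 1.0  # made-up table that always prefers Right\n",
"for row in policy_arrows(demo_q):\n",
"    print(' '.join(row))\n",
"```\n",
"\n",
"With the trained agent, pass `q_table` instead of `demo_q`."
]
},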
{
"cell_type": "markdown",
"metadata": {},
"source": [
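"---\n",
"### Awesome Work!\n",
"You've implemented a classic RL agent from scratch. Tabular Q-learning like this is a starting point for many methods used in robotics and game-playing AI, which typically replace the table with a function approximator such as a neural network.\n",
"You have now completed the entire practice series!"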
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}