{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 17 - Reinforcement Learning (Q-Learning)\n", "\n", "Welcome to Module 17! We are exploring **Reinforcement Learning** (RL). Unlike supervised learning, RL agents learn by interacting with an environment and receiving rewards or penalties.\n", "\n", "### Resources:\n", "Check out the **[Q-Learning Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for a breakdown of the Bellman Equation ($Q(s,a)$) and how the Agent-Environment loop works.\n", "\n", "### Objectives:\n", "1. **Agent-Environment Loop**: States, Actions, and Rewards.\n", "2. **Exploration vs. Exploitation**: The Epsilon-Greedy strategy.\n", "3. **Q-Table**: Learning the quality of actions.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Environment Simulation\n", "We will implement a simple \"Grid World\" where an agent has to find a treasure while avoiding traps." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "class SimpleGridWorld:\n", "    def __init__(self, size=5):\n", "        self.size = size\n", "        self.state = (0, 0)  # Agent starts in the top-left corner\n", "        self.goal = (size-1, size-1)  # Treasure in the bottom-right corner\n", "        self.trap = (size//2, size//2)  # Trap in the center cell\n", "\n", "    def step(self, action):\n", "        # 0=Up, 1=Down, 2=Left, 3=Right; a move off the grid leaves the agent in place\n", "        r, c = self.state\n", "        if action == 0: r = max(0, r-1)\n", "        elif action == 1: r = min(self.size-1, r+1)\n", "        elif action == 2: c = max(0, c-1)\n", "        elif action == 3: c = min(self.size-1, c+1)\n", "\n", "        self.state = (r, c)\n", "\n", "        # Returns (next_state, reward, done)\n", "        if self.state == self.goal:\n", "            return self.state, 10, True\n", "        elif self.state == self.trap:\n", "            return self.state, -5, True\n", "        return self.state, -1, False\n", "\n", "    def reset(self):\n", "        self.state = (0, 0)\n", "        return self.state\n", "\n", "env = SimpleGridWorld()\n", "print(\"Environment initialized!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Q-Learning Algorithm\n", "\n", "### Task 1: Training the Agent\n", "Initialize a Q-Table (5x5x4) with zeros and train the agent for 1000 episodes using the update rule:\n", "$Q(s, a) \\leftarrow Q(s, a) + \\alpha [R + \\gamma \\max_{a'} Q(s', a') - Q(s, a)]$\n", "\n", "Here $\\alpha$ is the learning rate, $\\gamma$ the discount factor, $R$ the reward for the step, and $s'$ the state reached after taking action $a$ in state $s$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alpha = 0.1 # Learning rate\n", "gamma = 0.9 # Discount factor\n", "epsilon = 0.2 # Exploration rate\n", "q_table = np.zeros((env.size, env.size, 4)) # One Q-value per (row, column, action)\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "for episode in range(1000):\n", "    state = env.reset()\n", "    done = False\n", "\n", "    while not done:\n", "        # Choose action\n", "        if np.random.uniform(0, 1) < epsilon:\n", "            action = np.random.choice(4) # Explore\n", "        else:\n", "            action = np.argmax(q_table[state[0], state[1]]) # Exploit\n", "\n", "        next_state, reward, done = env.step(action)\n", "\n", "        # Update Q-table\n", "        old_value = q_table[state[0], state[1], action]\n", "        next_max = np.max(q_table[next_state[0], next_state[1]])\n", "\n", "        new_value = old_value + alpha * (reward + gamma * next_max - old_value)\n", "        q_table[state[0], state[1], action] = new_value\n", "\n", "        state = next_state\n", "```\n", "</details>\n", "
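To sanity-check the update rule by hand, here is a minimal sketch that applies it to two single transitions on a fresh, all-zero Q-table. The helper name `q_update` is an assumption made for this illustration and is not part of the notebook; the hyperparameters and rewards mirror the ones defined above.

```python
import numpy as np

alpha = 0.1   # Learning rate (same value as in the notebook)
gamma = 0.9   # Discount factor (same value as in the notebook)

def q_update(q, state, action, reward, next_state):
    # Hypothetical helper: one Q-learning update for a single transition.
    # TD target = reward + discounted best Q-value available in the next state.
    td_target = reward + gamma * np.max(q[next_state[0], next_state[1]])
    td_error = td_target - q[state[0], state[1], action]
    q[state[0], state[1], action] += alpha * td_error
    return q[state[0], state[1], action]

q_table = np.zeros((5, 5, 4))  # Fresh table: every Q-value starts at 0

# Ordinary step: from (0, 0) move Right (action 3) to (0, 1), reward -1
first = q_update(q_table, (0, 0), 3, -1, (0, 1))

# Step into the treasure: from (4, 3) move Right to (4, 4), reward 10
into_goal = q_update(q_table, (4, 3), 3, 10, (4, 4))

print('Q[(0,0), Right] after one step:', first)
print('Q[(4,3), Right] after reaching the goal:', into_goal)
```

On an all-zero table the ordinary step gives 0.1 * (-1) = -0.1 and the step into the treasure gives 0.1 * 10 = 1.0. Repeating such updates over many episodes is what carries the treasure reward back toward the start cell.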
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Policy Visualization\n", "\n", "### Task 2: What did it learn?\n", "Display the learned policy by showing the best action for each cell in the grid." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "policy = np.argmax(q_table, axis=2)\n", "print(\"Learned Policy (0=Up, 1=Down, 2=Left, 3=Right):\")\n", "print(policy)\n", "```\n", "</details>\n", "
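The raw `argmax` grid is easier to read as arrows. The sketch below is one possible way to render it; `render_policy` and `ARROWS` are hypothetical names introduced for this example, and the 3x3 table is hand-filled demo data rather than a trained result.

```python
import numpy as np

# Index matches the action encoding: 0=Up, 1=Down, 2=Left, 3=Right
ARROWS = ['^', 'v', '<', '>']

def render_policy(q_table, goal, trap):
    # Hypothetical helper: turn a (rows, cols, actions) Q-table into text rows.
    policy = np.argmax(q_table, axis=2)  # Best action index for each cell
    rows = []
    for r in range(q_table.shape[0]):
        cells = []
        for c in range(q_table.shape[1]):
            if (r, c) == goal:
                cells.append('G')  # Terminal cell: treasure
            elif (r, c) == trap:
                cells.append('X')  # Terminal cell: trap
            else:
                cells.append(ARROWS[policy[r, c]])
        rows.append(' '.join(cells))
    return rows

# Hand-filled demo table (not a trained result):
# Right is preferred everywhere, except Down in the last column.
demo_q = np.zeros((3, 3, 4))
demo_q[:, :, 3] = 1.0
demo_q[:, 2, 1] = 2.0

demo_rows = render_policy(demo_q, goal=(2, 2), trap=(1, 1))
for line in demo_rows:
    print(line)
```

In the notebook you could call `render_policy(q_table, env.goal, env.trap)` after training. Masking the goal and trap is deliberate: those cells are terminal, so their Q-values are never updated and `argmax` reports 0 (Up) for them regardless of what was learned.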
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Awesome Work! \n", "You've implemented a classic RL agent from scratch. This is how robots and game AI learn!\n", "You have now completed the entire practice series!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }