{% extends "layout.html" %} {% block content %}

🧠 Study Guide: Action & Policy in Reinforcement Learning

🔹 1. Introduction

Story-style intuition: The Video Game Character

Think of a character in a video game. At any moment, the character has a set of possible moves they can make—jump, run, duck, attack. These are the character's Actions. The player controlling the character has a strategy in their head: "If a monster is close, I should attack. If there's a pit, I should jump." This strategy, this set of rules that dictates which action to take in any situation, is the Policy. In Reinforcement Learning, our goal is to teach the agent (the character) to learn the best possible policy on its own to win the game (maximize rewards).

In the world of RL, the Action is the "what" (what the agent does) and the Policy is the "how" (how the agent decides what to do). Together, they form the core of the agent's behavior.

🔹 2. Action (A)

An Action is one of the possible moves an agent can make in a given state. The set of all possible actions in a state is called the action space.

Types of Action Spaces:

  - Discrete: a finite set of distinct options (e.g., move left, move right, jump, attack).
  - Continuous: actions take real values within a range (e.g., turning a steering wheel by 15.7 degrees).

The set of available actions can also depend on the current state, denoted as \( A(s) \).
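The two action-space types, and a state-dependent \( A(s) \), can be sketched in a few lines of Python. The state names and action lists here are illustrative assumptions, not part of any real library:

```python
import random

# Discrete action space: a finite set of distinct moves.
DISCRETE_ACTIONS = ["jump", "run", "duck", "attack"]

# Continuous action space: any real value in a range (e.g., a steering angle in degrees).
STEERING_RANGE = (-30.0, 30.0)

def legal_actions(state):
    """A(s): the available actions may depend on the current state."""
    if state == "in_air":
        return ["attack"]          # can't jump or duck while mid-air
    return DISCRETE_ACTIONS

def sample_steering(low, high):
    """Sample one action uniformly from a continuous range."""
    return random.uniform(low, high)
```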

🔹 3. Policy (π)

A Policy is the agent's strategy or "brain." It is a rule that maps each state to an action (or, more generally, to a probability distribution over actions). The ultimate goal of RL is to find an optimal policy, one that maximizes the total expected reward over time.

Mathematically, a policy is a distribution over actions given a state: \( \pi(a|s) = P(A_t = a \mid S_t = s) \)

Types of Policies:

  - Deterministic: always maps a given state to the same action, \( a = \pi(s) \).
  - Stochastic: maps a state to a probability distribution over actions, \( \pi(a|s) \), which is useful for exploration and for settings where unpredictability is an advantage.
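Both kinds of policy can be written as a small lookup. This is a minimal sketch with made-up states and actions, not a real RL library:

```python
import random

def deterministic_policy(state):
    """A deterministic policy: the same state always yields the same action."""
    table = {"monster_close": "attack", "pit_ahead": "jump"}
    return table[state]

def stochastic_policy(state):
    """A stochastic policy: sample an action a ~ pi(.|s)."""
    table = {"crossroad": {"left": 0.7, "right": 0.2, "straight": 0.1}}
    dist = table[state]
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]
```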

🔹 4. Policy vs. Value Function

It's crucial to distinguish between a policy and a value function, as they work together to guide the agent. The policy tells the agent what to do in each state, while the value function estimates how good a state (or state-action pair) is, measured by the expected future reward from that point onward.

Modern RL algorithms often learn both: they use the value function to evaluate how good their actions are, which in turn helps them improve their policy.
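One simple way a value function can guide a policy is to act greedily with respect to estimated action values. The Q-value numbers below are assumed estimates chosen purely for illustration:

```python
# Toy action-value estimates Q(s, a): how good each action looks in each state.
Q = {
    ("crossroad", "left"): 10.0,   # assumed estimate, for illustration only
    ("crossroad", "right"): -2.0,
}

def greedy_policy(state, actions):
    """Derive a policy from the value function: pick the highest-valued action."""
    return max(actions, key=lambda a: Q[(state, a)])
```

As the value estimates improve with experience, the policy derived from them improves too, which is the core idea behind value-based methods.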

🔹 5. Interaction Flow with Action & Policy

The Action and Policy are at the heart of the agent's decision-making in the RL loop.

  1. Agent observes state (s): "I am at a crossroad."
  2. Agent follows its policy (π) to choose an action (a): "My policy tells me to go left."
  3. Environment transitions and gives reward (r): The agent moves left, finds a gold coin (+10 reward), and arrives at a new state.
  4. Agent improves its policy: The agent thinks, "That was a great outcome! My policy was right to tell me to go left from that crossroad. I should strengthen that rule."
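The four steps above can be sketched as a minimal agent-environment loop. The environment, states, and reward numbers are hypothetical, and "strengthening a rule" is modeled crudely as increasing an action's sampling weight:

```python
import random

def environment_step(state, action):
    """3. The environment transitions and returns (reward, next_state)."""
    if state == "crossroad" and action == "left":
        return 10, "coin_room"     # found a gold coin (+10 reward)
    return 0, "empty_room"

# The policy's tendencies at the crossroad, updated as the agent learns.
preferences = {"left": 1.0, "right": 1.0}

def policy(state):
    """2. Sample an action according to the current preferences."""
    actions, weights = zip(*preferences.items())
    return random.choices(actions, weights=weights)[0]

def rl_step(state):
    action = policy(state)                                # choose an action
    reward, next_state = environment_step(state, action)  # environment responds
    if reward > 0:
        preferences[action] += 1.0                        # 4. strengthen that rule
    return action, reward, next_state
```

Running this loop repeatedly makes "left" increasingly likely at the crossroad, since it is the only action that ever earns a reward.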

🔹 6. Detailed Examples

Example 1: Chess

In chess, the state is the board position, the action space is the discrete set of legal moves in that position, and the policy is the player's strategy for choosing a move from each position.

Example 2: Self-Driving Car

For a self-driving car, the state comes from sensor readings, the action space is continuous (steering angle, acceleration, braking force), and the policy maps those readings to driving commands.

🔹 7. Challenges

A central challenge is balancing exploration and exploitation: the agent must try new actions to discover better rewards, while still exploiting the actions its current policy already rates highly.

📝 Quick Quiz: Test Your Knowledge

  1. What is the difference between a discrete and a continuous action space? Give an example of each.
  2. What is the difference between a deterministic and a stochastic policy? When might a stochastic policy be useful?
  3. Can an agent have a good policy without knowing the value function?

Answers

1. A discrete action space has a finite number of distinct options (e.g., move left/right). A continuous action space has actions represented by real numbers in a range (e.g., turning a steering wheel by 15.7 degrees).

2. A deterministic policy always chooses the same action for a given state. A stochastic policy outputs a probability distribution over actions. A stochastic policy is useful for exploration (trying new things) and for games where unpredictability is an advantage (like poker).

3. Yes, but it's harder. Some algorithms, called "policy-gradient" methods, can directly search for a good policy without learning a value function. However, many of the most successful modern algorithms learn both, using the value function to help guide improvements to the policy.

{% endblock %}