# Q-Learning Agent playing Taxi-v3
This is a trained Q-Learning agent playing Taxi-v3. The agent was trained using a custom implementation of the Q-Learning algorithm.
## Environment
- Environment: Taxi-v3
- State Space: 500 discrete states (25 taxi positions × 5 passenger locations × 4 destinations)
- Action Space: 6 discrete actions (South, North, East, West, Pickup, Dropoff)
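The 500-state count follows from packing the four state variables (taxi row, taxi column, passenger location, destination) into a single integer. As an illustration of that arithmetic (this mirrors the encoding Gymnasium's Taxi-v3 uses internally; the `encode` helper below is written here for explanation and is not part of the saved model):

```python
def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    # 5 rows x 5 columns x 5 passenger locations x 4 destinations = 500 states
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx

print(encode(0, 0, 0, 0))  # smallest state index: 0
print(encode(4, 4, 4, 3))  # largest state index: 499
```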
## Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 7.56 +/- 2.71 |
| Evaluation Episodes | 100 |
## Hyperparameters
The agent was trained using the following hyperparameters:
- Total Training Episodes: 25,000
- Learning Rate: 0.7
- Gamma (Discount Factor): 0.95
- Max Steps per Episode: 99
- Epsilon (Exploration) Start: 1.0
- Epsilon (Exploration) Min: 0.05
- Decay Rate: 0.005
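The card does not show the training loop itself. The sketch below illustrates how these hyperparameters typically fit together in a tabular Q-Learning implementation; the exponential epsilon-decay schedule is an assumption (it is a common choice with a start/min/decay-rate parameterization like the one above), not something this card states.

```python
import numpy as np

# Hyperparameters as listed above
LEARNING_RATE = 0.7
GAMMA = 0.95
EPS_START, EPS_MIN, DECAY_RATE = 1.0, 0.05, 0.005

def epsilon_at(episode):
    # Assumed schedule: exponential decay from EPS_START toward EPS_MIN
    return EPS_MIN + (EPS_START - EPS_MIN) * np.exp(-DECAY_RATE * episode)

def q_update(qtable, state, action, reward, next_state):
    # Standard tabular Q-Learning (off-policy TD) update
    td_target = reward + GAMMA * np.max(qtable[next_state])
    qtable[state][action] += LEARNING_RATE * (td_target - qtable[state][action])
```

With these values, exploration starts fully random (epsilon = 1.0 at episode 0) and is essentially at the 0.05 floor well before episode 25,000.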
## Usage
To use this model, you need `gymnasium`, `pickle5`, `numpy`, and `huggingface_hub` installed. You can load the model and evaluate it using the code below:
```python
import gymnasium as gym
import pickle5 as pickle
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hub
repo_id = "Tejas-Anvekar/Qtable_taxi-v3"
filename = "q-learning.pkl"
pickle_model = hf_hub_download(repo_id=repo_id, filename=filename)

# 2. Load the model configuration and Q-table
with open(pickle_model, "rb") as f:
    model = pickle.load(f)

# 3. Create the environment
env = gym.make(model["env_id"], render_mode="rgb_array")

# 4. Define the greedy policy: always pick the highest-value action
def greedy_policy(Qtable, state):
    return np.argmax(Qtable[state])

# 5. Evaluate the agent for one episode
state, info = env.reset()
terminated = False
truncated = False
total_reward = 0
print("Agent is playing...")
while not terminated and not truncated:
    action = greedy_policy(model["qtable"], state)
    next_state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    state = next_state
print(f"Game Finished! Total Reward: {total_reward}")
env.close()
```
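The mean ± std figure in the results table is presumably aggregated over 100 greedy episodes like the one above. A minimal sketch of such an evaluation helper (`evaluate_agent` is a name introduced here for illustration, not part of the saved model):

```python
import numpy as np

def evaluate_agent(env, qtable, n_episodes=100, max_steps=99):
    """Run greedy rollouts and return (mean, std) of episode returns."""
    returns = []
    for _ in range(n_episodes):
        state, info = env.reset()
        episode_return = 0
        for _ in range(max_steps):
            action = int(np.argmax(qtable[state]))
            state, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            if terminated or truncated:
                break
        returns.append(episode_return)
    return np.mean(returns), np.std(returns)

# Usage, after loading the model as above:
# mean_reward, std_reward = evaluate_agent(env, model["qtable"])
# print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```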