# Q-Learning Agent playing Taxi-v3
This is a trained Q-Learning agent playing Taxi-v3. The agent was trained using a custom implementation of the Q-Learning algorithm.
## Environment
- Environment: Taxi-v3
- State Space: 500 discrete states (25 taxi positions × 5 passenger locations × 4 destinations)
- Action Space: 6 discrete actions (South, North, East, West, Pickup, Dropoff)
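The 500-state count follows from packing the four state variables (taxi row, taxi column, passenger location, destination) into a single integer. As an illustration of that arithmetic (this mirrors the encoding Gymnasium's Taxi-v3 uses internally; the `encode` helper below is written here for explanation and is not part of the saved model):

```python
def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    # 5 rows x 5 columns x 5 passenger locations x 4 destinations = 500 states
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx

print(encode(0, 0, 0, 0))  # smallest state index: 0
print(encode(4, 4, 4, 3))  # largest state index: 499
```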
## Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 7.56 +/- 2.71 |
| Evaluation Episodes | 100 |
## Hyperparameters
The agent was trained using the following hyperparameters:
- Total Training Episodes: 25,000
- Learning Rate: 0.7
- Gamma (Discount Factor): 0.95
- Max Steps per Episode: 99
- Epsilon (Exploration) Start: 1.0
- Epsilon (Exploration) Min: 0.05
- Decay Rate: 0.005
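The card does not show the training loop itself. The sketch below illustrates how these hyperparameters typically fit together in a tabular Q-Learning implementation; the exponential epsilon-decay schedule is an assumption (it is a common choice with a start/min/decay-rate parameterization like the one above), not something this card states.

```python
import numpy as np

# Hyperparameters as listed above
LEARNING_RATE = 0.7
GAMMA = 0.95
EPS_START, EPS_MIN, DECAY_RATE = 1.0, 0.05, 0.005

def epsilon_at(episode):
    # Assumed schedule: exponential decay from EPS_START toward EPS_MIN
    return EPS_MIN + (EPS_START - EPS_MIN) * np.exp(-DECAY_RATE * episode)

def q_update(qtable, state, action, reward, next_state):
    # Standard tabular Q-Learning (off-policy TD) update
    td_target = reward + GAMMA * np.max(qtable[next_state])
    qtable[state][action] += LEARNING_RATE * (td_target - qtable[state][action])
```

With these values, exploration starts fully random (epsilon = 1.0 at episode 0) and is essentially at the 0.05 floor well before episode 25,000.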
## Usage
To use this model, you need `gymnasium`, `pickle5`, `numpy`, and `huggingface_hub` installed. You can load the model and evaluate it using the code below:
```python
import gymnasium as gym
import pickle5 as pickle
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hub
repo_id = "Tejas-Anvekar/Qtable_taxi-v3"
filename = "q-learning.pkl"
pickle_model = hf_hub_download(repo_id=repo_id, filename=filename)

# 2. Load the model configuration and Q-table
with open(pickle_model, "rb") as f:
    model = pickle.load(f)

# 3. Create the environment
env = gym.make(model["env_id"], render_mode="rgb_array")

# 4. Define the greedy policy: always pick the highest-value action
def greedy_policy(Qtable, state):
    return np.argmax(Qtable[state])

# 5. Evaluate the agent for one episode
state, info = env.reset()
terminated = False
truncated = False
total_reward = 0
print("Agent is playing...")
while not terminated and not truncated:
    action = greedy_policy(model["qtable"], state)
    next_state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    state = next_state
print(f"Game Finished! Total Reward: {total_reward}")
env.close()
```
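The mean ± std figure in the results table is presumably aggregated over 100 greedy episodes like the one above. A minimal sketch of such an evaluation helper (`evaluate_agent` is a name introduced here for illustration, not part of the saved model):

```python
import numpy as np

def evaluate_agent(env, qtable, n_episodes=100, max_steps=99):
    """Run greedy rollouts and return (mean, std) of episode returns."""
    returns = []
    for _ in range(n_episodes):
        state, info = env.reset()
        episode_return = 0
        for _ in range(max_steps):
            action = int(np.argmax(qtable[state]))
            state, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            if terminated or truncated:
                break
        returns.append(episode_return)
    return np.mean(returns), np.std(returns)

# Usage, after loading the model as above:
# mean_reward, std_reward = evaluate_agent(env, model["qtable"])
# print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```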