Q-Learning Agent playing Taxi-v3 🚖

This is a trained Q-Learning agent playing Taxi-v3. The agent was trained using a custom implementation of the Q-Learning algorithm.

🎮 Environment

  • Environment: Taxi-v3
  • State Space: 500 discrete states (25 taxi positions × 5 passenger locations × 4 destinations)
  • Action Space: 6 discrete actions (South, North, East, West, Pickup, Dropoff)

📊 Evaluation Results

| Metric              | Value       |
|---------------------|-------------|
| Mean Reward         | 7.56 ± 2.71 |
| Evaluation Episodes | 100         |

โš™๏ธ Hyperparameters

The agent was trained using the following hyperparameters:

  • Total Training Episodes: 25,000
  • Learning Rate: 0.7
  • Gamma (Discount Factor): 0.95
  • Max Steps per Episode: 99
  • Epsilon (Exploration) Start: 1.0
  • Epsilon (Exploration) Min: 0.05
  • Decay Rate: 0.005

๐Ÿ Usage

To use this model, you need gymnasium and pickle5 installed (pickle5 backports pickle protocol 5 to Python < 3.8; on Python 3.8+ the standard-library pickle works as well). You can load the model and evaluate it using the code below:

```python
import gymnasium as gym
import pickle5 as pickle
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hub
repo_id = "Tejas-Anvekar/Qtable_taxi-v3"
filename = "q-learning.pkl"

pickle_model = hf_hub_download(repo_id=repo_id, filename=filename)

# 2. Load the model configuration and Q-table
with open(pickle_model, "rb") as f:
    model = pickle.load(f)

# 3. Create the environment
env = gym.make(model["env_id"], render_mode="rgb_array")

# 4. Define the greedy policy: always exploit the highest Q-value
def greedy_policy(Qtable, state):
    return int(np.argmax(Qtable[state]))

# 5. Run one greedy episode
state, info = env.reset()
terminated = False
truncated = False
total_reward = 0

print("Agent is playing...")
while not terminated and not truncated:
    action = greedy_policy(model["qtable"], state)
    next_state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    state = next_state

print(f"Game Finished! Total Reward: {total_reward}")
env.close()
```