Reward Rush: Taxi-v3 Q-Learning
This repository contains a tabular Q-learning agent for the Taxi-v3 environment.
Model Architecture
The model uses a discrete Q-table with the following specifications:
- State Space: 500 discrete states (representing taxi coordinates, passenger location, and destination)
- Action Space: 6 discrete actions (South, North, East, West, Pickup, Dropoff)
- Format: Pickle-serialized NumPy array
- Dimensions: (500, 6)
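The serialization format above can be illustrated with a placeholder table (a zero-filled array standing in for the trained weights; the in-memory buffer is just for demonstration):

```python
import io
import pickle
import numpy as np

# Placeholder (500, 6) Q-table matching the published format:
# one row per discrete state, one column per action.
q_table = np.zeros((500, 6), dtype=np.float64)

# Pickle round-trip, as the repository's .pkl file is produced/consumed.
buf = io.BytesIO()
pickle.dump(q_table, buf)
restored = pickle.loads(buf.getvalue())

print(restored.shape)  # (500, 6)
```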
Common Implementation Mistakes to Avoid
- Loading Method: The .pkl file is a pickle-serialized NumPy array. Do not use torch.load; load it with the standard pickle library. Note that np.load expects the .npy/.npz format, so it will not read this raw pickle file either.
- Indexing: The state returned by Gymnasium is an integer. Use this integer directly to index into the Q-table row.
- Policy: During testing, the agent should always select the action with the maximum Q-value (exploitation) rather than using epsilon-greedy (exploration).
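The indexing and greedy-policy points can be shown on a toy table (the values below are made up purely for illustration):

```python
import numpy as np

# Toy 3-state, 6-action Q-table; real Taxi-v3 tables are (500, 6).
q_table = np.array([
    [0.1, 0.5, 0.2, 0.0, 0.3, 0.4],
    [0.9, 0.1, 0.0, 0.2, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.1, 0.2, 0.6],
])

state = 1  # Gymnasium returns an integer state id; use it directly as a row index
greedy_action = int(np.argmax(q_table[state]))  # exploit: highest Q-value in that row
print(greedy_action)  # 0
```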
Download and Test Code
```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_taxi_test():
    # Download the pickled Q-table from the Hub and load it with pickle.
    path = hf_hub_download(repo_id="Nharen/Reward_Rush_Q-learning_Taxi", filename="q-learning.pkl")
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    env = gym.make("Taxi-v3")
    total_success = 0
    num_episodes = 100

    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy policy: always take the action with the highest Q-value.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            if terminated and reward == 20:  # +20 is the successful drop-off reward
                total_success += 1

    print(f"Success Rate: {total_success}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_taxi_test()
```
Evaluation results
- success_rate on Taxi-v3: 100% (self-reported)