Reward Rush: Taxi-v3 Q-Learning
This repository contains a tabular Q-learning agent for the Taxi-v3 environment.
Model Architecture
The model uses a discrete Q-table with the following specifications:
- State Space: 500 discrete states (representing taxi coordinates, passenger location, and destination)
- Action Space: 6 discrete actions (South, North, East, West, Pickup, Dropoff)
- Format: Pickle-serialized NumPy array
- Dimensions: (500, 6)
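The serialization format above can be illustrated with a placeholder table (a zero-filled array standing in for the trained weights; the in-memory buffer is just for demonstration):

```python
import io
import pickle
import numpy as np

# Placeholder (500, 6) Q-table matching the published format:
# one row per discrete state, one column per action.
q_table = np.zeros((500, 6), dtype=np.float64)

# Pickle round-trip, as the repository's .pkl file is produced/consumed.
buf = io.BytesIO()
pickle.dump(q_table, buf)
restored = pickle.loads(buf.getvalue())

print(restored.shape)  # (500, 6)
```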
Common Implementation Mistakes to Avoid
- Loading Method: The .pkl file is a pickle-serialized NumPy array. Do not use torch.load; load it with the standard pickle library. Note that np.load expects the .npy/.npz format, so it will not read this raw pickle file either.
- Indexing: The state returned by Gymnasium is an integer. Use this integer directly to index into the Q-table row.
- Policy: During testing, the agent should always select the action with the maximum Q-value (exploitation) rather than using epsilon-greedy (exploration).
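The indexing and greedy-policy points can be shown on a toy table (the values below are made up purely for illustration):

```python
import numpy as np

# Toy 3-state, 6-action Q-table; real Taxi-v3 tables are (500, 6).
q_table = np.array([
    [0.1, 0.5, 0.2, 0.0, 0.3, 0.4],
    [0.9, 0.1, 0.0, 0.2, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.1, 0.2, 0.6],
])

state = 1  # Gymnasium returns an integer state id; use it directly as a row index
greedy_action = int(np.argmax(q_table[state]))  # exploit: highest Q-value in that row
print(greedy_action)  # 0
```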
Download and Test Code
```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_taxi_test():
    # Download the pickled Q-table from the Hub and load it with pickle.
    path = hf_hub_download(repo_id="Nharen/Reward_Rush_Q-learning_Taxi", filename="q-learning.pkl")
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    env = gym.make("Taxi-v3")
    total_success = 0
    num_episodes = 100

    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy policy: always take the action with the highest Q-value.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            if terminated and reward == 20:  # +20 is the successful drop-off reward
                total_success += 1

    print(f"Success Rate: {total_success}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_taxi_test()
```
Evaluation results
- success_rate on Taxi-v3: 100% (self-reported)