Q-Learning Agent playing Taxi-v3

This is a trained model of a Q-Learning agent playing Taxi-v3.

Hyperparameters

{
    "env_id": "Taxi-v3",
    "max_steps": 99,
    "n_training_episodes": 1000000,
    "n_eval_episodes": 100,
    "eval_seed": [
        16,
        54,
        165,
        177,
        191,
        191,
        120,
        80,
        149,
        178,
        48,
        38,
        6,
        125,
        174,
        73,
        50,
        172,
        100,
        148,
        146,
        6,
        25,
        40,
        68,
        148,
        49,
        167,
        9,
        97,
        164,
        176,
        61,
        7,
        54,
        55,
        161,
        131,
        184,
        51,
        170,
        12,
        120,
        113,
        95,
        126,
        51,
        98,
        36,
        135,
        54,
        82,
        45,
        95,
        89,
        59,
        95,
        124,
        9,
        113,
        58,
        85,
        51,
        134,
        121,
        169,
        105,
        21,
        30,
        11,
        50,
        65,
        12,
        43,
        82,
        145,
        152,
        97,
        106,
        55,
        31,
        85,
        38,
        112,
        102,
        168,
        123,
        97,
        21,
        83,
        158,
        26,
        80,
        63,
        5,
        81,
        32,
        11,
        28,
        148
    ],
    "learning_rate": 0.01,
    "gamma": 0.95,
    "max_epsilon": 1.0,
    "min_epsilon": 0.05,
    "decay_rate": 0.0005
}
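
How these values are typically used in tabular Q-learning can be sketched as follows. The card does not include the training code, so both the exponential epsilon schedule and the update rule below are assumptions based on the standard formulation. The function names `epsilon_at` and `q_update` are hypothetical.

```python
import math

# Values copied from the hyperparameters above.
LEARNING_RATE = 0.01
GAMMA = 0.95
MAX_EPSILON = 1.0
MIN_EPSILON = 0.05
DECAY_RATE = 0.0005


def epsilon_at(episode):
    """Exploration rate for a given episode index.

    ASSUMPTION: exponential decay from MAX_EPSILON toward MIN_EPSILON;
    the card does not state the schedule that was actually used.
    """
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


def q_update(qtable, state, action, reward, next_state, terminated=False):
    """Apply one standard Q-learning update in place.

    qtable is any 2D structure indexable as qtable[state][action]
    (a list of lists or a NumPy array both work).
    ASSUMPTION: no bootstrapping from terminal states.
    """
    best_next = 0.0 if terminated else max(qtable[next_state])
    td_target = reward + GAMMA * best_next
    qtable[state][action] += LEARNING_RATE * (td_target - qtable[state][action])
```

With these settings the exploration rate starts at 1.0 and approaches 0.05 long before the final training episode, so most of training would run close to the minimum exploration rate.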

Evaluation Result

Mean Reward: 7.56 +/- 2.71
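
A result in this form is usually the mean and standard deviation of the total reward per evaluation episode. The sketch below shows that summary under the assumption that the population standard deviation is used (the default behaviour of `numpy.std`); the card does not say which variant was used. `summarize_rewards` is a hypothetical helper.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards.

    ASSUMPTION: population standard deviation, matching numpy.std defaults.
    """
    mean_reward = statistics.fmean(episode_rewards)
    std_reward = statistics.pstdev(episode_rewards)
    return mean_reward, std_reward
```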

Usage

import gymnasium as gym
import pickle
import numpy as np

# Load model
with open("q-learning.pkl", "rb") as f:
    model = pickle.load(f)

env = gym.make("Taxi-v3")
# ... play ...
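
One way to play an episode with the loaded model is a greedy rollout, sketched below. The card does not document the layout of the pickle, so reading the Q-table from `model["qtable"]` is an assumption, and `run_greedy_episode` is a hypothetical helper. The `reset` and `step` calls follow the Gymnasium API.

```python
def run_greedy_episode(env, qtable, max_steps=99, seed=None):
    """Play one episode, always taking the highest-valued action.

    qtable is indexable as qtable[state][action] (list of lists or NumPy array).
    Returns the total reward collected in the episode.
    """
    state, _info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        row = qtable[state]
        # Greedy choice: index of the largest Q-value in this state's row.
        action = max(range(len(row)), key=lambda a: row[a])
        state, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


# Usage with the objects from the snippet above (left as comments because
# the key name is an assumption about the pickle's contents):
# qtable = model["qtable"]
# total = run_greedy_episode(env, qtable, max_steps=99, seed=16)
```

Passing one of the `eval_seed` values per episode should make the starting states reproducible across runs.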

