Q-Learning Agent playing Taxi-v3
This is a trained Q-Learning agent that plays Taxi-v3.
Hyperparameters
{
"env_id": "Taxi-v3",
"max_steps": 99,
"n_training_episodes": 1000000,
"n_eval_episodes": 100,
"eval_seed": [
16,
54,
165,
177,
191,
191,
120,
80,
149,
178,
48,
38,
6,
125,
174,
73,
50,
172,
100,
148,
146,
6,
25,
40,
68,
148,
49,
167,
9,
97,
164,
176,
61,
7,
54,
55,
161,
131,
184,
51,
170,
12,
120,
113,
95,
126,
51,
98,
36,
135,
54,
82,
45,
95,
89,
59,
95,
124,
9,
113,
58,
85,
51,
134,
121,
169,
105,
21,
30,
11,
50,
65,
12,
43,
82,
145,
152,
97,
106,
55,
31,
85,
38,
112,
102,
168,
123,
97,
21,
83,
158,
26,
80,
63,
5,
81,
32,
11,
28,
148
],
"learning_rate": 0.01,
"gamma": 0.95,
"max_epsilon": 1.0,
"min_epsilon": 0.05,
"decay_rate": 0.0005
}
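The card does not say how these values enter training. As a rough sketch, assuming the common setup of an epsilon-greedy policy with exponential epsilon decay per episode and the standard tabular Q-learning update, they might be used as below. The function names `epsilon_at` and `q_update` are illustrative, not taken from the author's code.

```python
import math

# Values copied from the hyperparameters above.
LEARNING_RATE = 0.01
GAMMA = 0.95
MAX_EPSILON = 1.0
MIN_EPSILON = 0.05
DECAY_RATE = 0.0005


def epsilon_at(episode: int) -> float:
    """Exploration rate for a given episode.

    Assumption: exponential decay from MAX_EPSILON toward MIN_EPSILON.
    The card lists the three parameters but not the schedule itself.
    """
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


def q_update(q_sa: float, reward: float, max_q_next: float) -> float:
    """Standard tabular Q-learning update for a single (state, action) entry."""
    td_target = reward + GAMMA * max_q_next
    return q_sa + LEARNING_RATE * (td_target - q_sa)


if __name__ == "__main__":
    for episode in (0, 1_000, 10_000, 100_000):
        print(f"episode {episode:>7}: epsilon = {epsilon_at(episode):.4f}")
```

Under this assumed schedule, epsilon is already within about 0.01 of `min_epsilon` after roughly 10,000 episodes, so most of the 1,000,000 training episodes would run at near-minimum exploration.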
Evaluation Result
Mean Reward: 7.56 +/- 2.71
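A figure in this form is typically the mean and standard deviation of the total reward per evaluation episode (here, presumably over the 100 seeded episodes). The sketch below shows that aggregation on made-up rewards. Whether the author used the population or the sample standard deviation is not stated, so the population form here is an assumption, and `summarize_rewards` is a hypothetical helper.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards.

    Assumption: population standard deviation, which is what
    numpy.std computes by default.
    """
    mean = statistics.fmean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std


if __name__ == "__main__":
    # Made-up rewards for illustration only; not the agent's actual results.
    rewards = [8, 5, 11, 7]
    mean, std = summarize_rewards(rewards)
    print(f"Mean Reward: {mean:.2f} +/- {std:.2f}")
```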
Usage
import pickle

import gymnasium as gym
import numpy as np

# Load the pickled model
with open("q-learning.pkl", "rb") as f:
    model = pickle.load(f)

env = gym.make("Taxi-v3")
# ... play ...
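The "play" step is left open above. One way to fill it in is a greedy rollout: at each step, take the action with the highest Q-value for the current state. The sketch below assumes a Gymnasium-style environment (`reset` returns `(obs, info)`, `step` returns a 5-tuple) and a Q-table indexable as `qtable[state][action]`. The helper names, and the `"qtable"` key in the commented usage, are assumptions and are not confirmed by this card.

```python
def greedy_action(qtable, state):
    """Index of the highest-valued action in the Q-table row for `state`."""
    row = qtable[state]
    return max(range(len(row)), key=lambda a: row[a])


def run_greedy_episode(env, qtable, max_steps=99, seed=None):
    """Play one episode greedily and return the total reward.

    `env` is expected to follow the Gymnasium API.
    """
    state, _info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = greedy_action(qtable, state)
        state, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


# Hypothetical usage with the objects loaded above, assuming the pickle is a
# dict that stores the table under "qtable" alongside the hyperparameters:
# total = run_greedy_episode(env, model["qtable"], model["max_steps"],
#                            seed=model["eval_seed"][0])
```

Because the helpers index the table with plain `[]`, they should work with either a NumPy array or a list of lists.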
Evaluation results
- mean_reward on Taxi-v3 (self-reported): 7.560
- std_reward on Taxi-v3 (self-reported): 2.710