Q-Learning Agent for Taxi-v3
This is a tabular Q-learning agent trained on the Gymnasium Taxi-v3 environment.
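"Tabular" means the policy is a lookup table with one row per discrete state and one column per action. Taxi-v3 has 500 states (25 taxi positions × 5 passenger locations × 4 destinations) and 6 actions, so the Q-table indexed in the Usage code should have shape (500, 6). The sketch below mirrors the integer state encoding used by Gymnasium's Taxi environment. The helpers `encode_state` and `decode_state` are hypothetical names for illustration, not part of this card or of Gymnasium.

```python
import numpy as np

# Taxi-v3 dimensions: 5x5 grid, 5 passenger locations (4 depots + inside taxi), 4 destinations
N_ROWS, N_COLS, N_PASS_LOCS, N_DESTS = 5, 5, 5, 4
N_STATES = N_ROWS * N_COLS * N_PASS_LOCS * N_DESTS  # 500
N_ACTIONS = 6  # 0 south, 1 north, 2 east, 3 west, 4 pickup, 5 drop off


def encode_state(taxi_row, taxi_col, pass_loc, dest_idx):
    """Hypothetical helper: pack the four Taxi components into one integer state."""
    return ((taxi_row * N_COLS + taxi_col) * N_PASS_LOCS + pass_loc) * N_DESTS + dest_idx


def decode_state(state):
    """Hypothetical helper: inverse of encode_state."""
    state, dest_idx = divmod(state, N_DESTS)
    state, pass_loc = divmod(state, N_PASS_LOCS)
    taxi_row, taxi_col = divmod(state, N_COLS)
    return taxi_row, taxi_col, pass_loc, dest_idx


# A tabular Q-function for Taxi-v3 is a (500, 6) array of action values
Qtable = np.zeros((N_STATES, N_ACTIONS))
state = encode_state(3, 1, 2, 0)
# Greedy action for this state; with an all-zero table argmax picks index 0
greedy_action = int(np.argmax(Qtable[state]))
```

A trained table has the same shape; only the values differ, and each row's argmax is the action the agent takes in that state.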
Developer
Vishand S (@Vishand03)
Frameworks
- NumPy
- Gymnasium
Training Details
- Algorithm: Q-Learning
- Timesteps / Episodes: 2,000,000
- Learning rate: 0.1
- Discount factor (γ): 0.99
- Epsilon decay: 0.0005
- Max / Min epsilon: 1.0 / 0.01
- Mean Reward: ~7.92 ± 2.60
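The hyperparameters above plug into the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)], with learning rate α = 0.1 and γ = 0.99, while actions are chosen epsilon-greedily. The sketch below is one way to wire these together; it is not the author's training script. The exponential epsilon schedule (the form used in the Hugging Face Deep RL Course) is an assumption, since the card lists only the decay rate and the max/min values, and all function names are hypothetical.

```python
import numpy as np


def epsilon_at(episode, max_eps=1.0, min_eps=0.01, decay=0.0005):
    """ASSUMED exponential schedule: decays from max_eps toward min_eps."""
    return min_eps + (max_eps - min_eps) * np.exp(-decay * episode)


def epsilon_greedy(qtable, state, eps, rng):
    """Random action with probability eps, otherwise the greedy action."""
    if rng.random() < eps:
        return int(rng.integers(qtable.shape[1]))
    return int(np.argmax(qtable[state]))


def q_update(qtable, s, a, r, s_next, terminated, lr=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    # Do not bootstrap past a true terminal state; a time-limit truncation still bootstraps
    future = 0.0 if terminated else gamma * np.max(qtable[s_next])
    td_target = r + future
    qtable[s, a] += lr * (td_target - qtable[s, a])


def train(env, n_episodes, lr=0.1, gamma=0.99, decay=0.0005, seed=0):
    """Train a Q-table on any env with discrete spaces and the Gymnasium reset/step API."""
    rng = np.random.default_rng(seed)
    qtable = np.zeros((env.observation_space.n, env.action_space.n))
    for episode in range(n_episodes):
        eps = epsilon_at(episode, decay=decay)
        # Seed only the first reset so the run is reproducible but episodes differ
        state, _ = env.reset(seed=seed) if episode == 0 else env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            action = epsilon_greedy(qtable, state, eps, rng)
            next_state, reward, terminated, truncated, _ = env.step(action)
            q_update(qtable, state, action, reward, next_state, terminated, lr, gamma)
            state = next_state
    return qtable
```

With Gymnasium installed, `train(gym.make("Taxi-v3"), n_episodes=...)` returns a (500, 6) table that could be pickled and loaded as in the Usage code. `n_episodes` is left to the caller because the card does not say whether 2,000,000 counts timesteps or episodes.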
Demo (Preview)
Full Demo Video
Watch the full video here
Usage
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download

# -------------------------
# Load Q-table from Hugging Face
# -------------------------
q_table_path = hf_hub_download(repo_id="Vishand03/q-Taxi-v3", filename="q-learning.pkl")
with open(q_table_path, "rb") as f:
    model = pickle.load(f)

# The pickle may hold the bare Q-table or a dict that stores it under "qtable"
# (the Hugging Face Deep RL Course format); handle both cases
Qtable = np.asarray(model["qtable"] if isinstance(model, dict) else model)

# -------------------------
# Create Taxi Environment
# -------------------------
env = gym.make("Taxi-v3", render_mode="human")  # "human" rendering requires pygame
state, _ = env.reset()
terminated, truncated = False, False

# -------------------------
# Run one episode with the greedy policy
# -------------------------
total_reward = 0
while not terminated and not truncated:
    # Pick the action with the highest Q-value in the current state
    action = int(np.argmax(Qtable[state]))
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward

env.close()
print(f"Episode finished with total reward: {total_reward}")
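A single rendered episode shows the policy but not the reported "mean ± std" figure, which summarizes returns over many evaluation episodes. The card does not state how many episodes or which seeds were used, so the sketch below assumes a configurable number of greedy episodes without rendering. `evaluate` is a hypothetical helper, not part of the card's code.

```python
import numpy as np


def evaluate(env, qtable, n_episodes=100, seed=None):
    """Run the greedy policy for n_episodes; return (mean, std) of episode returns."""
    returns = []
    for episode in range(n_episodes):
        # Seed only the first reset so a run is reproducible but episodes differ
        if episode == 0 and seed is not None:
            state, _ = env.reset(seed=seed)
        else:
            state, _ = env.reset()
        terminated = truncated = False
        total = 0.0
        while not (terminated or truncated):
            action = int(np.argmax(qtable[state]))  # greedy: no exploration at eval time
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```

For example, `mean, std = evaluate(gym.make("Taxi-v3"), Qtable, n_episodes=100, seed=42)`. Because Taxi-v3 randomizes the start state, the result depends on the episode count and seed and may differ somewhat from the self-reported value.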
Evaluation results
- mean_reward on Taxi-v3 (self-reported): 7.92 +/- 2.60
