Q-Learning Agent playing Blackjack-v1
Training Parameters
- Environment ID: Blackjack-v1
- Training Episodes: 10000
- Max Steps per Episode: 99
- Learning Rate: 0.7
- Gamma (Discount Factor): 0.95
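To make the roles of the learning rate and discount factor concrete, here is a minimal sketch of a single tabular Q-learning update using the values listed above. The training script is not part of this card, so the `q_update` helper, the dict-based Q-table, and the example transition are illustrative assumptions rather than the code that produced this model.

```python
from collections import defaultdict

LEARNING_RATE = 0.7   # value from the training parameters above
GAMMA = 0.95          # value from the training parameters above
N_ACTIONS = 2         # Blackjack-v1 actions: 0 = stick, 1 = hit


def q_update(qtable, state, action, reward, next_state, done,
             lr=LEARNING_RATE, gamma=GAMMA):
    """Apply one tabular Q-learning update in place and return the new value.

    Hypothetical helper for illustration; not taken from the training code.
    """
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(qtable[next_state])
    td_target = reward + gamma * best_next
    qtable[state][action] += lr * (td_target - qtable[state][action])
    return qtable[state][action]


# Assumed Q-table layout: a dict keyed by the observation tuple
# (player sum, dealer's showing card, usable ace).
qtable = defaultdict(lambda: [0.0] * N_ACTIONS)

# Illustrative transition: the player sticks on 20 against a dealer 10 and wins.
state = (20, 10, 0)
new_value = q_update(qtable, state, action=0, reward=1.0,
                     next_state=None, done=True)
```

With a learning rate as high as 0.7, each update moves the stored value most of the way toward the latest target, so estimates react quickly to new outcomes but are also noisy in a stochastic game like Blackjack.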
Evaluation Results
- Mean Reward: -0.19 ± 0.95
- Evaluation Episodes: 100
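A figure in the form "mean ± spread" can be computed along the lines of the sketch below. It assumes the ± value is the population standard deviation of per-episode returns, which this card does not state explicitly. The `evaluate` helper and the replayed outcome pattern are hypothetical stand-ins; in practice `run_episode` would play one greedy episode in the environment and return its total reward.

```python
import statistics
from itertools import cycle


def evaluate(run_episode, n_episodes=100):
    """Run n_episodes and return (mean, population std) of the episode returns.

    Hypothetical helper: run_episode is any zero-argument callable
    that plays one episode and returns its total reward.
    """
    returns = [run_episode() for _ in range(n_episodes)]
    return statistics.mean(returns), statistics.pstdev(returns)


# Stand-in for a real rollout: replays a fixed pattern of win/loss/draw outcomes.
_fake_outcomes = cycle([1.0, -1.0, -1.0, 0.0])
mean_reward, std_reward = evaluate(lambda: next(_fake_outcomes), n_episodes=100)
print(f"Mean Reward: {mean_reward:.2f} ± {std_reward:.2f}")
```

Because Blackjack returns are almost always -1, 0, or +1, a standard deviation close to 1 is expected, and 100 episodes give only a rough estimate of the mean.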
Usage
from huggingface_hub import hf_hub_download
import pickle
import gymnasium as gym
import numpy as np
repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"
filename = "q-learning.pkl"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(model_path, "rb") as f:
    model = pickle.load(f)
env = gym.make(
    model["env_id"],
    render_mode="rgb_array",
    **model.get("env_config", {}),
)
qtable = model["qtable"]
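# Gymnasium's reset() returns (observation, info)
state, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    if isinstance(state, tuple):
        # Blackjack observations are tuples; map them to a Q-table index
        state_idx = model.get("state_to_index", lambda s: s)(state)
    else:
        state_idx = state
    # Greedy action: pick the highest-valued action for this state
    action = int(np.argmax(qtable[state_idx]))
    state, reward, terminated, truncated, _ = env.step(action)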