metadata
tags:
- Blackjack-v1
- q-learning
- reinforcement-learning
- custom-implementation
library_name: reinforcement-learning
model-index:
- name: blackjack-qlearning-agent
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Blackjack-v1
      type: Blackjack-v1
    metrics:
    - type: mean_reward
      value: '-0.19 +/- 0.95'
      name: mean_reward
      verified: false
Q-Learning Agent playing Blackjack-v1
Training Parameters
- Environment ID: Blackjack-v1
- Training Episodes: 10000
- Max Steps per Episode: 99
- Learning Rate: 0.7
- Gamma (Discount Factor): 0.95
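The learning rate and discount factor listed above enter the standard tabular Q-learning update. The following is a minimal sketch of that update, not the training code behind this model: the `q_update` helper and the dictionary-based table layout are assumptions made for illustration.

```python
from collections import defaultdict

LEARNING_RATE = 0.7  # learning rate from the training parameters
GAMMA = 0.95         # discount factor from the training parameters
N_ACTIONS = 2        # Blackjack-v1 actions: 0 = stick, 1 = hit


def q_update(qtable, state, action, reward, next_state, terminated):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from the next state once the episode has ended.
    best_next = 0.0 if terminated else max(qtable[next_state])
    td_target = reward + GAMMA * best_next
    qtable[state][action] += LEARNING_RATE * (td_target - qtable[state][action])
    return qtable[state][action]


# Hypothetical table layout: observation tuple -> list of action values.
qtable = defaultdict(lambda: [0.0] * N_ACTIONS)

# Terminal transition: sticking on 20 against a dealer 10 and winning.
terminal_value = q_update(qtable, (20, 10, 0), 0, 1.0, (20, 10, 0), True)

# Non-terminal transition: hitting on 12 and landing on 20 with no reward yet.
bootstrap_value = q_update(qtable, (12, 5, 0), 1, 0.0, (20, 10, 0), False)
```

With a learning rate as high as 0.7, each update moves the stored value most of the way toward the newest target, so individual estimates can swing noticeably between episodes.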
Evaluation Results
- Mean Reward: -0.19 ± 0.95
- Evaluation Episodes: 100
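A figure of the form "mean ± standard deviation" is typically computed from the per-episode returns. The sketch below shows one way to do that. It assumes the population standard deviation (the default of `np.std`); this card does not say which variant was used. The helper names and the sample rewards are made up for illustration.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, population standard deviation) of per-episode returns."""
    mean = statistics.fmean(episode_rewards)
    # pstdev is the population standard deviation (assumed variant).
    std = statistics.pstdev(episode_rewards)
    return mean, std


def format_mean_reward(mean, std):
    """Format the summary the way the metadata field does."""
    return f"{mean:.2f} +/- {std:.2f}"


# Made-up returns for four episodes: one win, two losses, one draw.
sample_rewards = [1.0, -1.0, -1.0, 0.0]
mean, std = summarize_rewards(sample_rewards)
summary = format_mean_reward(mean, std)
```

With only 100 evaluation episodes and a standard deviation close to 1, the mean itself carries an uncertainty of roughly 0.1, so small differences between runs are not meaningful.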
Usage
from huggingface_hub import hf_hub_download
import pickle
import gymnasium as gym
import numpy as np
# Replace the placeholder below with your actual repository information
repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"  # replace with your repository
filename = "q-learning.pkl"
# Load the model (only unpickle files from a repository you trust)
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(model_path, "rb") as f:
    model = pickle.load(f)
# Recreate the environment
env = gym.make(
    model["env_id"],
    render_mode="rgb_array",
    **model.get("env_config", {})
)
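# Run inference with the Q-table
qtable = model["qtable"]
# Minimal inference example: play one episode greedily
state, info = env.reset()  # Gymnasium's reset() returns (observation, info)
done = False
while not done:
    # Convert the state to a Q-table index
    if isinstance(state, tuple):
        state_idx = model.get("state_to_index", lambda s: s)(state)
    else:
        state_idx = state
    action = int(np.argmax(qtable[state_idx]))
    state, reward, terminated, truncated, info = env.step(action)
    # Stop when the episode ends or is truncated
    done = terminated or truncated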
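
The usage code looks up an optional `state_to_index` entry in the saved model. Blackjack-v1 observations are tuples `(player_sum, dealer_card, usable_ace)` drawn from `Tuple(Discrete(32), Discrete(11), Discrete(2))`. If the Q-table were stored as a flat 2-D array, a mapping along the following lines could serve as that entry. This is a hypothetical sketch; whether the saved model contains such a function, and how its Q-table is laid out, is not stated in this card.

```python
# Sizes of the three components of the Blackjack-v1 observation space.
PLAYER_SUMS, DEALER_CARDS, USABLE_ACE = 32, 11, 2
N_STATES = PLAYER_SUMS * DEALER_CARDS * USABLE_ACE


def state_to_index(state):
    """Map (player_sum, dealer_card, usable_ace) to a flat row index."""
    player_sum, dealer_card, usable_ace = state
    return (int(player_sum) * DEALER_CARDS + int(dealer_card)) * USABLE_ACE + int(usable_ace)


def index_to_state(index):
    """Invert state_to_index, mainly useful for inspecting the table."""
    rest, usable_ace = divmod(index, USABLE_ACE)
    player_sum, dealer_card = divmod(rest, DEALER_CARDS)
    return player_sum, dealer_card, usable_ace


first_index = state_to_index((0, 0, 0))
last_index = state_to_index((31, 10, 1))
```

If the Q-table is instead a dictionary keyed by observation tuples, or an array with one axis per observation component, the identity fallback in the usage code already works and no mapping is needed.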