---
tags:
- Blackjack-v1
- q-learning
- reinforcement-learning
- custom-implementation
library_name: reinforcement-learning
model-index:
- name: blackjack-qlearning-agent
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Blackjack-v1
      type: Blackjack-v1
    metrics:
    - type: mean_reward
      value: -0.19 +/- 0.95
      name: mean_reward
      verified: false
---

# **Q-Learning** Agent playing **Blackjack-v1**

## Training Parameters

- **Environment ID**: `Blackjack-v1`
- **Training Episodes**: 10000
- **Max Steps per Episode**: 99
- **Learning Rate**: 0.7
- **Gamma (Discount Factor)**: 0.95

## Evaluation Results

- **Mean Reward**: -0.19 ± 0.95
- **Evaluation Episodes**: 100

## Usage

```python
from huggingface_hub import hf_hub_download
import pickle
import gymnasium as gym
import numpy as np

# Replace the placeholder below with your actual repository information
repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"  # replace with your repository
filename = "q-learning.pkl"

# Download and load the pickled model dictionary
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Recreate the environment from the saved ID and optional config
env = gym.make(
    model["env_id"],
    render_mode="rgb_array",
    **model.get("env_config", {})
)

# Use the Q-table for inference
qtable = model["qtable"]

# Simple inference example: play one episode greedily
# Gymnasium's reset() returns (observation, info)
state, _ = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    # Convert the state to a Q-table index; if the model has no
    # "state_to_index" entry, the state itself is used as the index
    if isinstance(state, tuple):
        state_idx = model.get("state_to_index", lambda s: s)(state)
    else:
        state_idx = state
    action = int(np.argmax(qtable[state_idx]))
    state, reward, terminated, truncated, _ = env.step(action)

env.close()
```
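
To make the Training Parameters concrete, here is a minimal sketch of how a learning rate of 0.7 and a gamma of 0.95 enter a tabular Q-learning update. It assumes the standard Q-learning rule; the exact training code for this agent is not part of this card. The `q_update` helper and the dictionary-based Q-table are hypothetical and exist only for illustration.

```python
from collections import defaultdict

LEARNING_RATE = 0.7  # value listed under Training Parameters
GAMMA = 0.95         # value listed under Training Parameters
N_ACTIONS = 2        # Blackjack-v1 actions: 0 = stick, 1 = hit


def q_update(qtable, state, action, reward, next_state, terminated,
             lr=LEARNING_RATE, gamma=GAMMA):
    """Apply one tabular Q-learning update in place and return the new value.

    Hypothetical helper, not taken from the original training script.
    """
    # Do not bootstrap from the next state when the episode has ended
    best_next = 0.0 if terminated else max(qtable[next_state])
    td_target = reward + gamma * best_next
    qtable[state][action] += lr * (td_target - qtable[state][action])
    return qtable[state][action]


# Hypothetical Q-table keyed by the Blackjack observation tuple
# (player sum, dealer's visible card, usable ace), initialised to zero
qtable = defaultdict(lambda: [0.0] * N_ACTIONS)

# Example transition: the player sticks on 20 and wins (+1), ending the episode
state = (20, 10, 0)
new_value = q_update(qtable, state, action=0, reward=1.0,
                     next_state=state, terminated=True)
print(new_value)  # 0.7
```

With a learning rate this high, each update moves the stored value 70% of the way toward the latest target, so estimates track recent outcomes closely. In a game as stochastic as Blackjack this can keep the Q-values noisy.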