---
tags:
- Blackjack-v1
- q-learning
- reinforcement-learning
- custom-implementation
library_name: reinforcement-learning
model-index:
- name: blackjack-qlearning-agent
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Blackjack-v1
      type: Blackjack-v1
    metrics:
    - type: mean_reward
      value: -0.19 +/- 0.95
      name: mean_reward
      verified: false
---

# **Q-Learning** Agent playing **Blackjack-v1**

## Training Parameters

- **Environment ID**: `Blackjack-v1`
- **Training Episodes**: 10000
- **Max Steps per Episode**: 99
- **Learning Rate**: 0.7
- **Gamma (Discount Factor)**: 0.95
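
For reference, the learning rate and discount factor listed above enter the standard tabular Q-learning update roughly as sketched below. This is an illustration under stated assumptions, not the training script used for this model: the helper name `q_update` and the toy table shape are hypothetical, and the Q-table is assumed to be a NumPy array indexed by state and then by action.

```python
import numpy as np

LEARNING_RATE = 0.7  # "Learning Rate" from the parameter list
GAMMA = 0.95         # "Gamma (Discount Factor)" from the parameter list


def q_update(qtable, state, action, reward, next_state, terminated):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from the next state once the episode has terminated.
    best_next = 0.0 if terminated else np.max(qtable[next_state])
    td_target = reward + GAMMA * best_next
    qtable[state][action] += LEARNING_RATE * (td_target - qtable[state][action])
    return qtable[state][action]


# Toy table: 3 states x 2 actions, all zeros (hypothetical shape for illustration).
toy_q = np.zeros((3, 2))
q_update(toy_q, state=0, action=1, reward=1.0, next_state=2, terminated=True)
```

With a learning rate of 0.7, a single update moves the stored value 70% of the way toward the target, so recent outcomes dominate the estimate.
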
## Evaluation Results

- **Mean Reward**: -0.19 ± 0.95
- **Evaluation Episodes**: 100
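
A figure of the form "mean ± standard deviation" can be produced by running greedy episodes and summarizing the per-episode returns. The sketch below shows one way to do that. It is not the evaluation code used for this model: `evaluate_agent` is a hypothetical helper, it assumes a Gymnasium-style environment (`reset()` returns `(observation, info)`, `step()` returns five values), and it assumes the observation can index the Q-table directly.

```python
import numpy as np


def evaluate_agent(env, qtable, n_eval_episodes=100, seed=None):
    """Run greedy episodes and return (mean_reward, std_reward)."""
    episode_rewards = []
    for episode in range(n_eval_episodes):
        # Seed only the first reset so the run is reproducible but episodes differ.
        state, _ = env.reset(seed=seed if episode == 0 else None)
        terminated = truncated = False
        total_reward = 0.0
        while not (terminated or truncated):
            action = int(np.argmax(qtable[state]))  # greedy: no exploration
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
        episode_rewards.append(total_reward)
    # np.std is the population standard deviation (ddof=0).
    return float(np.mean(episode_rewards)), float(np.std(episode_rewards))
```

Under the default `Blackjack-v1` settings each episode ends with a reward of -1, 0, or +1, so a standard deviation close to 1 is expected, and a mean over only 100 episodes can shift noticeably from run to run.
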
## Usage

```python
from huggingface_hub import hf_hub_download
import pickle
import gymnasium as gym
import numpy as np

# Replace the placeholder below with your actual repository information
repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"  # replace with your repository
filename = "q-learning.pkl"

# Download and load the model
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

with open(model_path, "rb") as f:
    model = pickle.load(f)

# Rebuild the environment
env = gym.make(
    model["env_id"],
    render_mode="rgb_array",
    **model.get("env_config", {})
)

# Use the Q-table for inference
qtable = model["qtable"]

# Simple inference example: play one episode greedily
state, _ = env.reset()  # Gymnasium's reset() returns (observation, info)
terminated = truncated = False
while not (terminated or truncated):
    # Convert the state to a Q-table index
    if isinstance(state, tuple):
        state_idx = model.get("state_to_index", lambda s: s)(state)
    else:
        state_idx = state

    action = int(np.argmax(qtable[state_idx]))
    state, reward, terminated, truncated, _ = env.step(action)
```