---
tags:
- reinforcement-learning
- minecraft
- stable-baselines3
- PPO
- deep-reinforcement-learning
library_name: stable-baselines3
model-index:
- name: minecraft-learning-distributed_470k
  results: []
---

# minecraft-learning-distributed_470k

A Minecraft RL agent trained with PPO (Proximal Policy Optimization) using Stable-Baselines3.

This agent was trained to gather resources in Minecraft.

## Training Details

| Metric | Value |
|--------|-------|
| **Total Steps** | 483,923 |
| **Episodes** | 56 |
| **Mean Reward** | 0.64 |
| **Best Reward** | 26.20 |
| **Reward Scheme** | gathering |
| **Learning Rate** | 0.0003 |

## Hardware

- **Training:** NVIDIA RTX 5090 (32GB VRAM)
- **Environment:** NVIDIA Jetson AGX Orin (64GB RAM)
- **LLM Server:** NVIDIA DGX Spark - GPT-OSS-20B (vLLM)

## Architecture

- **Algorithm:** PPO (Proximal Policy Optimization)
- **Policy:** MLP with [512, 512] hidden layers
- **Observation Space:** 82 dimensions (position, velocity, vitals, hotbar, craftable flags)
- **Action Space:** 37 discrete actions (movement, mining, crafting, inventory)

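To give a feel for the size of the network described above, here is a rough parameter count. It is a sketch under one assumption that the card does not confirm: that the actor and critic are separate MLPs with no shared layers, which is how recent Stable-Baselines3 releases interpret a plain list `net_arch`. The helper name `mlp_param_count` is made up for this example.

```python
def mlp_param_count(in_dim, hidden, out_dim):
    """Count weights plus biases of a fully connected stack in_dim -> hidden... -> out_dim."""
    sizes = [in_dim, *hidden, out_dim]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


OBS_DIM = 82        # observation dimensions, from the list above
N_ACTIONS = 37      # discrete actions, from the list above
HIDDEN = [512, 512]

# Assumption: separate actor and critic networks (no shared trunk).
actor_params = mlp_param_count(OBS_DIM, HIDDEN, N_ACTIONS)  # one logit per action
critic_params = mlp_param_count(OBS_DIM, HIDDEN, 1)         # scalar state value
total_params = actor_params + critic_params

print(f"actor={actor_params:,} critic={critic_params:,} total={total_params:,}")
```

Under that assumption the policy has roughly 630k trainable parameters, most of them in the two 512x512 hidden layers.
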
## Usage

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

# Download the model; hf_hub_download returns the local file path
model_path = hf_hub_download(
    repo_id='cahlen/minecraft-learning-distributed_470k',
    filename='model.zip',
    local_dir='./models'
)

# Load the trained policy
model = PPO.load(model_path)

# Run inference. `env` is not defined in this snippet: it must be an instance
# of the custom Minecraft environment (82-dim observations, 37 actions).
# A Gymnasium env returns (obs, info) from reset(); an SB3 VecEnv returns obs only.
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```

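The snippet above predicts a single action. A full rollout could look like the sketch below, which assumes the environment follows the Gymnasium API (`reset()` returning `(obs, info)` and `step()` returning a 5-tuple). The `run_episode` helper and its `max_steps` cap are hypothetical and not part of this repository.

```python
def run_episode(model, env, max_steps=1000):
    """Roll out one episode with a greedy policy; return (total_reward, steps_taken)."""
    obs, _info = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        # SB3 models return (action, recurrent_state); the state is unused for MLP policies.
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps
```

Calling `run_episode(model, env)` with the loaded model and the custom environment returns the episode's cumulative reward and length. The step cap only guards against episodes that never terminate.
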
## Environment Setup

This model was trained on a custom Minecraft environment using:
- [Mineflayer](https://github.com/PrismarineJS/mineflayer) for bot control
- Custom Gymnasium wrapper for RL interface
- Vision features extracted from game data (not computer vision)

## Training Configuration

```python
from stable_baselines3 import PPO

# `env` is the custom Minecraft Gymnasium environment (not defined here)
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=1e-3,
    n_steps=256,
    batch_size=256,
    n_epochs=15,
    gamma=0.99,
    gae_lambda=0.95,
    ent_coef=0.02,
    clip_range=0.2,
    policy_kwargs={"net_arch": [512, 512]},
)
```

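As a back-of-the-envelope reading of these settings, the sketch below works out how many policy updates and gradient steps they imply for the reported 483,923 total steps. The number of parallel environments is not stated in this card, so `N_ENVS = 1` is purely an assumption; the helper `ppo_schedule` is likewise made up for this example.

```python
import math

N_STEPS = 256          # rollout length per environment, from the config above
BATCH_SIZE = 256       # minibatch size, from the config above
N_EPOCHS = 15          # optimisation epochs per rollout, from the config above
TOTAL_STEPS = 483_923  # total steps reported in Training Details
N_ENVS = 1             # ASSUMPTION: the card does not state the number of environments


def ppo_schedule(total_steps, n_steps, n_envs, batch_size, n_epochs):
    """Derive rollout and update counts from PPO hyperparameters."""
    rollout_size = n_steps * n_envs                          # transitions per update
    minibatches = math.ceil(rollout_size / batch_size)       # minibatches per epoch
    grad_steps_per_update = minibatches * n_epochs
    n_updates = total_steps // rollout_size                  # complete rollouts only
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches,
        "grad_steps_per_update": grad_steps_per_update,
        "n_updates": n_updates,
        "total_grad_steps": n_updates * grad_steps_per_update,
    }


schedule = ppo_schedule(TOTAL_STEPS, N_STEPS, N_ENVS, BATCH_SIZE, N_EPOCHS)
print(schedule)
```

With a single environment, each 256-step rollout forms exactly one minibatch, so every update is 15 full-batch gradient steps. With more environments the rollout grows and the number of updates shrinks proportionally.
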
## License

MIT

## Citation

If you use this model, please cite:

```bibtex
@misc{minecraft_learning_distributed_470k,
  author = {cahlen},
  title = {minecraft-learning-distributed_470k},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cahlen/minecraft-learning-distributed_470k}}
}
```