
	---
	tags:
	- reinforcement-learning
	- minecraft
	- stable-baselines3
	- PPO
	- deep-reinforcement-learning
	library_name: stable-baselines3
	model-index:
	- name: minecraft-learning-distributed_470k
	  results: []
	---

	# minecraft-learning-distributed_470k

	A Minecraft RL agent trained with PPO (Proximal Policy Optimization) using Stable-Baselines3.

	This agent was trained to gather resources in Minecraft.

	## Training Details

	| Metric | Value |
	|--------|-------|
	| Total Steps | 483,923 |
	| Episodes | 56 |
	| Mean Reward | 0.64 |
	| Best Reward | 26.20 |
	| Reward Scheme | gathering |
	| Learning Rate | 0.0003 |
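
As a rough sanity check on the table above, the average episode length can be derived from the two counts it reports. This is a derived figure, not a metric logged during training, and it assumes all 56 episodes are included in the step total.

```python
# Figures copied from the Training Details table.
total_steps = 483_923
episodes = 56

# Average environment steps per episode (derived, not a logged metric).
mean_episode_length = total_steps / episodes
print(f"Mean episode length: {mean_episode_length:,.1f} steps")
```

Episodes of several thousand steps each are consistent with a long-horizon gathering task, though individual episode lengths may vary widely.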

	## Hardware

	- Training: NVIDIA RTX 5090 (32GB VRAM)
	- Environment: NVIDIA Jetson AGX Orin (64GB RAM)
	- LLM Server: NVIDIA DGX Spark serving GPT-OSS-20B via vLLM

	## Architecture

	- Algorithm: PPO (Proximal Policy Optimization)
	- Policy: MLP with [512, 512] hidden layers
	- Observation Space: 82 dimensions (position, velocity, vitals, hotbar, craftable flags)
	- Action Space: 37 discrete actions (movement, mining, crafting, inventory)
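
To give a sense of the model's size, the dimensions listed above can be turned into an approximate parameter count. This is a sketch under two assumptions not stated in this card: the 82-dimensional observation is fed to the network as a flat vector, and the actor and critic use separate `[512, 512]` networks (the behaviour of recent Stable-Baselines3 releases when `net_arch` is a plain list). The helper `mlp_param_count` is hypothetical and not part of any library.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases in a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

OBS_DIM = 82        # observation size from the Architecture list
N_ACTIONS = 37      # discrete action count from the Architecture list
HIDDEN = [512, 512]

# ASSUMPTION: separate actor and critic networks, no shared hidden layers.
actor_params = mlp_param_count([OBS_DIM, *HIDDEN, N_ACTIONS])
critic_params = mlp_param_count([OBS_DIM, *HIDDEN, 1])

print(f"actor:  {actor_params:,}")
print(f"critic: {critic_params:,}")
print(f"total:  {actor_params + critic_params:,}")
```

If the hidden layers were shared between actor and critic instead, the total would be smaller, so treat the printed figure as an estimate rather than an exact count for this checkpoint.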

	## Usage

	```python
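	from huggingface_hub import hf_hub_download
	from stable_baselines3 import PPO

	# Download the checkpoint; hf_hub_download returns the local file path
	model_path = hf_hub_download(
	    repo_id='cahlen/minecraft-learning-distributed_470k',
	    filename='model.zip',
	    local_dir='./models',
	)

	# Load the trained policy
	model = PPO.load(model_path)

	# Run inference. `env` must be your own instance of the custom Minecraft
	# Gymnasium environment, which is not included in this repository.
	# Gymnasium's reset() returns (obs, info); an SB3 VecEnv returns obs only.
	obs, _ = env.reset()
	action, _ = model.predict(obs, deterministic=True)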
	```
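
The snippet above predicts a single action. A full evaluation episode could be sketched as follows, assuming `env` follows the Gymnasium API (`reset()` returns `(obs, info)` and `step()` returns a 5-tuple). The helper `run_episode` and its `max_steps` cap are hypothetical names introduced here for illustration.

```python
def run_episode(model, env, max_steps=1_000):
    """Roll out one episode with a deterministic policy.

    Returns (total_reward, steps_taken). `max_steps` is a safety cap so a
    non-terminating episode cannot loop forever.
    """
    obs, _info = env.reset()
    total_reward = 0.0
    steps = 0
    for steps in range(1, max_steps + 1):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward, steps
```

Calling `run_episode(model, env)` with the loaded PPO model returns the episode's cumulative reward, which can be compared against the mean and best rewards reported under Training Details.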

	## Environment Setup

	This model was trained on a custom Minecraft environment using:
	- [Mineflayer](https://github.com/PrismarineJS/mineflayer) for bot control
	- Custom Gymnasium wrapper for RL interface
	- "Vision" features read directly from game data (no image-based computer vision)

	## Training Configuration

	```python
	PPO(
	    "MlpPolicy",
	    env,
	    learning_rate=1e-3,
	    n_steps=256,
	    batch_size=256,
	    n_epochs=15,
	    gamma=0.99,
	    gae_lambda=0.95,
	    ent_coef=0.02,
	    clip_range=0.2,
	    policy_kwargs={"net_arch": [512, 512]},
	)
	```
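
The rollout and minibatch settings above determine how many gradient updates PPO performs. The arithmetic below is a rough estimate that assumes a single environment (`n_envs = 1`), which this card does not state. With more parallel environments, each rollout would be larger and the counts would change.

```python
import math

total_steps = 483_923   # from the Training Details table
n_steps = 256           # rollout length per environment
batch_size = 256        # minibatch size
n_epochs = 15           # optimisation passes over each rollout
n_envs = 1              # ASSUMPTION: number of parallel environments

# Each rollout collects n_steps transitions from every environment.
rollout_size = n_steps * n_envs

# PPO splits the rollout into minibatches and repeats for n_epochs.
minibatches_per_epoch = math.ceil(rollout_size / batch_size)
updates_per_rollout = minibatches_per_epoch * n_epochs

completed_rollouts = total_steps // rollout_size
total_gradient_updates = completed_rollouts * updates_per_rollout

print(f"rollouts: {completed_rollouts:,}")
print(f"gradient updates: {total_gradient_updates:,}")
```

Under this assumption, `batch_size` equals the rollout size, so each epoch is one full-batch update and every rollout is reused for 15 consecutive updates.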

	## License

	MIT

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{minecraft_learning_distributed_470k,
	  author = {cahlen},
	  title = {minecraft-learning-distributed_470k},
	  year = {2025},
	  publisher = {HuggingFace},
	  howpublished = {\url{https://huggingface.co/cahlen/minecraft-learning-distributed_470k}}
	}
	```