SpaceMining PPO Agent
A PPO agent trained on the SpaceMining Gymnasium environment. This repository includes the final Stable-Baselines3 checkpoint, configuration, and evaluation metrics.
Model Description
- Algorithm: PPO (Stable-Baselines3)
- Environment: SpaceMining (Gymnasium)
- Action Space: Box(3,) (thrust x, thrust y, mine toggle)
- Observation Space: Box(53,) (agent state, up to 15 nearby asteroids, and the mothership's relative position)
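The component order above can be made explicit with a small decoding helper. This is a sketch only. `decode_action`, `check_observation`, and the `MINE_THRESHOLD` of 0.0 (a positive third component means "mining on") are hypothetical and do not come from the `space_mining` package. Check the environment source for how it actually interprets the mine toggle.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical threshold: assumes a third action component above 0.0 means "mine".
MINE_THRESHOLD = 0.0
OBS_DIM = 53  # documented observation size: Box(53,)


@dataclass(frozen=True)
class DecodedAction:
    thrust_x: float
    thrust_y: float
    mine: bool


def decode_action(action: Sequence[float]) -> DecodedAction:
    """Split a Box(3,) action into its documented components (hypothetical helper)."""
    if len(action) != 3:
        raise ValueError(f"expected 3 action components, got {len(action)}")
    thrust_x, thrust_y, mine_value = (float(a) for a in action)
    return DecodedAction(thrust_x, thrust_y, mine_value > MINE_THRESHOLD)


def check_observation(obs: Sequence[float]) -> None:
    """Raise if an observation does not have the documented Box(53,) size."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} observation values, got {len(obs)}")
```

Both helpers accept plain lists or 1-D NumPy arrays, such as the `action` returned by `model.predict` or the `obs` returned by `env.reset()`.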
Quickstart
```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
from space_mining import make_env

# Download the checkpoint from the Hugging Face Hub and load it.
ckpt_path = hf_hub_download(repo_id="LUNDECHEN/space-mining-ppo", filename="final_model.zip")
model = PPO.load(ckpt_path)

env = make_env(render_mode="rgb_array")
obs, _ = env.reset()
for _ in range(300):
    # SB3's `predict` returns `(action, state)`; `state` is only used by recurrent policies.
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
```
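The Quickstart loop can be wrapped in a reusable function that also accumulates the episode return. This is a minimal sketch. It assumes only the standard Gymnasium `reset()`/`step()` signatures used in the Quickstart, and `run_episode` is a hypothetical helper name.

```python
from typing import Any, Callable, Tuple


def run_episode(env: Any, policy: Callable[[Any], Any], max_steps: int = 300) -> Tuple[float, int]:
    """Roll out one episode with a Gymnasium-style env; return (total_reward, steps)."""
    obs, _info = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # Stop on either episode end (terminated) or time limit (truncated).
        if terminated or truncated:
            break
    return total_reward, steps
```

To use the loaded model, pass `lambda o: model.predict(o, deterministic=True)[0]` as `policy`. Because the helper takes a plain callable, it does not depend on SB3 and can be tested against any environment with the same interface.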
Training Configuration
- See `hyperparams.json` (algorithm hyperparameters)
- See `env_config.json` (environment parameters)
- See `training_args.json` (timesteps, device, versions)
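To inspect these files locally, one option is to download the whole repository with `huggingface_hub.snapshot_download(repo_id="LUNDECHEN/space-mining-ppo")`, which returns the local folder path, and then read the JSON files from that folder. The sketch below assumes the three files sit at the repository root. `load_configs` is a hypothetical helper, and the keys inside each file are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict

CONFIG_FILES = ("hyperparams.json", "env_config.json", "training_args.json")


def load_configs(repo_dir: str) -> Dict[str, Any]:
    """Read the JSON config files from a local copy of the repository.

    Returns a dict keyed by file stem, e.g. {"hyperparams": {...}, ...}.
    Files that are not present are skipped rather than raising.
    """
    configs: Dict[str, Any] = {}
    for name in CONFIG_FILES:
        path = Path(repo_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                configs[path.stem] = json.load(f)
    return configs
```

Skipping missing files means the helper still works when only some of the files have been downloaded, for example with `hf_hub_download` for a single filename.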
Evaluation
- See `evaluation.json`
| Metric | Value |
|---|---|
| mean_reward | 1037.7470 |
| std_reward | 1449.5437 |
| episodes | 100 |
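The standard deviation is larger than the mean, so per-episode returns vary widely. The sketch below shows how such summary statistics are typically computed and how uncertain the reported mean is. It assumes `evaluation.json` reports the population standard deviation (ddof=0, as `numpy.std` and SB3's `evaluate_policy` compute it). It uses a normal approximation for the 95% interval, which is only rough for a distribution this spread out. The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def summarize_returns(episode_returns: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation (ddof=0) of episode returns."""
    n = len(episode_returns)
    mean = sum(episode_returns) / n
    var = sum((r - mean) ** 2 for r in episode_returns) / n
    return mean, math.sqrt(var)


def normal_ci(mean: float, std: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for the mean: mean +/- z * std / sqrt(n)."""
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Reported evaluation values (mean_reward, std_reward, episodes).
low, high = normal_ci(1037.7470, 1449.5437, 100)
print(f"approx. 95% CI for mean_reward: [{low:.1f}, {high:.1f}]")
```

With the reported values, the standard error is about 145, and the script prints an interval of roughly [753.6, 1321.9].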
Agent Behavior
License
- MIT
Authors
- Xinning Zhu (zhuxinning@shu.edu.cn)
- Lunde Chen (lundechen@shu.edu.cn)
Training Details
- Training Steps: 5,000,000
- Device: cpu
- Model Type: best
- GitHub Run: 17421809264
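For a sense of scale, the 5,000,000 training steps can be converted into PPO rollout iterations and gradient steps. This sketch assumes Stable-Baselines3's PPO defaults (`n_steps=2048`, `batch_size=64`, `n_epochs=10`) and a single environment. The values actually used are in `hyperparams.json` and `training_args.json`, and `ppo_update_counts` is a hypothetical helper.

```python
import math
from typing import Tuple


def ppo_update_counts(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1,
                      batch_size: int = 64, n_epochs: int = 10) -> Tuple[int, int]:
    """Return (rollout_iterations, gradient_steps) for an SB3-style PPO run."""
    rollout_size = n_steps * n_envs  # transitions collected per iteration
    # Full rollouts are collected until the step budget is reached, so round up.
    iterations = math.ceil(total_timesteps / rollout_size)
    # Each epoch walks the rollout buffer in minibatches; the last one may be smaller.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    gradient_steps = iterations * n_epochs * minibatches_per_epoch
    return iterations, gradient_steps


iterations, gradient_steps = ppo_update_counts(5_000_000)
print(iterations, gradient_steps)  # 2442 781440
```

Under these assumptions, training runs 2,442 rollout iterations and 781,440 minibatch gradient steps. Because full rollouts are always collected, the final step count slightly overshoots 5,000,000.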
