# SpaceMining PPO Agent

A PPO agent trained on the SpaceMining Gymnasium environment. This repository includes the final Stable-Baselines3 checkpoint, configuration, and evaluation metrics.

## Model Description

- Algorithm: PPO (Stable-Baselines3)
- Environment: SpaceMining (Gymnasium)
- Action Space: Box(3,): thrust x, thrust y, mine toggle
- Observation Space: Box(53,): agent state, nearby asteroids (up to 15), mothership relative position

## Quickstart

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
from space_mining import make_env

ckpt_path = hf_hub_download(repo_id="LUNDECHEN/space-mining-ppo", filename="final_model.zip")
model = PPO.load(ckpt_path)

env = make_env(render_mode='rgb_array')
obs, _ = env.reset()
for _ in range(300):
    # SB3 `predict` may return `(action, state, *extras)` depending on version.
    prediction = model.predict(obs, deterministic=True)
    action = prediction[0] if isinstance(prediction, (tuple, list)) else prediction
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
```

## Training Configuration

- See `hyperparams.json` (algorithm hyperparameters)
- See `env_config.json` (environment parameters)
- See `training_args.json` (timesteps, device, versions)

## Evaluation

- See `evaluation.json`

| Metric      | Value     |
|-------------|-----------|
| mean_reward | 1037.7470 |
| std_reward  | 1449.5437 |
| episodes    | 100       |

## Agent Behavior

![Agent in action](agent_long.gif)

## License

- MIT

## Authors

- Xinning Zhu (zhuxinning@shu.edu.cn)
- Lunde Chen (lundechen@shu.edu.cn)

## Training Details

- **Training Steps**: 5,000,000
- **Device**: cpu
- **Model Type**: best
- **GitHub Run**: [17421809264](https://github.com/reveurmichael/space_mining/actions/runs/17421809264)
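
A note on reading the evaluation metrics: the reported `std_reward` is larger than `mean_reward`, so per-episode returns vary widely. The sketch below estimates how precisely the mean is pinned down by 100 episodes. It uses a normal approximation and assumes the episodes are independent; the helper `mean_confidence_interval` is hypothetical and not part of this repository, and per-episode rewards may well be skewed, so treat the interval as a rough guide only.

```python
import math


def mean_confidence_interval(mean, std, n, z=1.96):
    """Return (standard_error, lower, upper) using a normal approximation.

    Hypothetical helper for illustration; z=1.96 corresponds to roughly 95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    stderr = std / math.sqrt(n)  # standard error of the mean
    half_width = z * stderr
    return stderr, mean - half_width, mean + half_width


# Values copied from the evaluation table (mean_reward, std_reward, episodes).
stderr, lower, upper = mean_confidence_interval(1037.7470, 1449.5437, 100)
print(f"standard error: {stderr:.2f}")
print(f"approx. 95% interval for mean reward: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the standard error is about 145, so the mean reward is known to within a few hundred reward units rather than to the four decimal places shown in the table.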