# SpaceMining PPO Agent

A PPO agent trained on the SpaceMining Gymnasium environment. This repository includes the final Stable-Baselines3 checkpoint, the training configuration, and evaluation metrics.

## Model Description

- Algorithm: PPO (Stable-Baselines3)
- Environment: SpaceMining (Gymnasium)
- Action space: `Box(3,)` (thrust x, thrust y, mine toggle)
- Observation space: `Box(53,)` (agent state, up to 15 nearby asteroids, and the mothership's relative position)
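Client code can make the action and observation layouts listed above explicit. The following is a minimal sketch that assumes the action components are indexed in the order listed (thrust x, thrust y, mine toggle). `SpaceMiningAction`, `decode_action`, and `check_observation` are hypothetical helpers and are not part of the `space_mining` package. The sketch also makes no assumption about how the environment interprets the mine-toggle value.

```python
from typing import NamedTuple, Sequence

# Dimensions taken from the Model Description above.
ACTION_DIM = 3
OBS_DIM = 53


class SpaceMiningAction(NamedTuple):
    """Named view of a Box(3,) action (hypothetical helper).

    Index order is an assumption based on the README listing:
    thrust x, thrust y, mine toggle.
    """
    thrust_x: float
    thrust_y: float
    mine_toggle: float


def decode_action(action: Sequence[float]) -> SpaceMiningAction:
    """Split a flat 3-element action vector into named components."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} action values, got {len(action)}")
    return SpaceMiningAction(*(float(a) for a in action))


def check_observation(obs: Sequence[float]) -> None:
    """Raise ValueError if an observation does not have the documented length."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} observation values, got {len(obs)}")


if __name__ == "__main__":
    print(decode_action([0.5, -0.25, 1.0]))
    check_observation([0.0] * OBS_DIM)
```

The helpers rely only on `len` and `float`, so they should also accept 1-D NumPy arrays, such as a single observation or the action returned by `model.predict` for one environment.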

## Quickstart

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
from space_mining import make_env

# Download the checkpoint from the Hub and load it with Stable-Baselines3.
ckpt_path = hf_hub_download(repo_id="LUNDECHEN/space-mining-ppo", filename="final_model.zip")
model = PPO.load(ckpt_path)

env = make_env(render_mode="rgb_array")
obs, _ = env.reset()
for _ in range(300):
    # `predict` returns `(action, state)`; `state` is only used by recurrent policies.
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
```

## Training Configuration

- See `hyperparams.json` (algorithm hyperparameters)
- See `env_config.json` (environment parameters)
- See `training_args.json` (timesteps, device, versions)
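The three files above can be loaded together for inspection. This is a minimal sketch that assumes each file holds a single JSON document. `load_run_config` is a hypothetical helper. Its `fetch` argument maps a file name to a local path, so the same code works for a local copy or for files downloaded from the Hub.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict

# File names as listed under Training Configuration.
CONFIG_FILES = ("hyperparams.json", "env_config.json", "training_args.json")


def load_run_config(fetch: Callable[[str], str]) -> Dict[str, Any]:
    """Load each config file into a dict keyed by its stem (e.g. "hyperparams").

    `fetch` maps a file name to a local filesystem path (hypothetical helper).
    """
    configs: Dict[str, Any] = {}
    for name in CONFIG_FILES:
        with open(fetch(name), encoding="utf-8") as f:
            configs[Path(name).stem] = json.load(f)
    return configs
```

For example, passing `lambda name: hf_hub_download(repo_id="LUNDECHEN/space-mining-ppo", filename=name)` as `fetch` reads the files from this repository. This assumes the files sit at the repository's top level next to `final_model.zip`.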

## Evaluation

- See `evaluation.json`

| Metric      | Value     |
|-------------|-----------|
| mean_reward | 1037.7470 |
| std_reward  | 1449.5437 |
| episodes    | 100       |
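The spread of episode rewards is large relative to the mean. The standard error, std / sqrt(n), gives one way to gauge how precisely `mean_reward` is estimated. This is a minimal sketch that assumes `std_reward` is the standard deviation of per-episode rewards over the 100 episodes (as SB3's `evaluate_policy` reports it). It uses a normal approximation for the 95% interval. `mean_confidence_interval` is a hypothetical helper.

```python
import math
from typing import Tuple

# Values copied from the evaluation table above.
MEAN_REWARD = 1037.7470
STD_REWARD = 1449.5437
EPISODES = 100


def mean_confidence_interval(
    mean: float, std: float, n: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (standard error, lower, upper) for a normal-approximation CI of the mean."""
    sem = std / math.sqrt(n)
    return sem, mean - z * sem, mean + z * sem


sem, low, high = mean_confidence_interval(MEAN_REWARD, STD_REWARD, EPISODES)
print(f"SEM = {sem:.1f}, 95% CI ~ [{low:.1f}, {high:.1f}]")
```

With the table's values, the standard error is about 145, and the approximate 95% interval for the mean reward runs from roughly 754 to 1322.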

## Agent Behavior

![Agent in action](agent_long.gif)

## License

- MIT

## Authors

- Xinning Zhu (zhuxinning@shu.edu.cn)
- Lunde Chen (lundechen@shu.edu.cn)

## Training Details

- **Training Steps**: 5,000,000
- **Device**: cpu
- **Model Type**: best
- **GitHub Run**: [17421809264](https://github.com/reveurmichael/space_mining/actions/runs/17421809264)