dpang committed · Commit 7803fad · verified · 1 Parent(s): ca8d912

Add README.md

Files changed (1)
  1. README.md +300 -5
README.md CHANGED
@@ -1,10 +1,305 @@
  ---
- title: Rans Env
- emoji: 💻
- colorFrom: red
- colorTo: red
+ title: RANS Spacecraft Navigation Environment
+ emoji: 🛸
+ colorFrom: indigo
+ colorTo: blue
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+ - openenv
+ - reinforcement-learning
+ - robotics
+ - spacecraft
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # RANS: OpenEnv Environment
+
+ **RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts**
+
+ OpenEnv-compatible implementation of the paper:
+
+ > El-Hariry, Richard, Olivares-Mendez (2023).
+ > *"RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts."*
+ > [arXiv:2310.07393](https://arxiv.org/abs/2310.07393)
+
+ Original GPU implementation (Isaac Gym): [elharirymatteo/RANS](https://github.com/elharirymatteo/RANS)
+
+ ---
+
+ ## Overview
+
+ This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.
+
+ ### Supported Tasks
+
+ | Task | Description | Obs size | Reward |
+ |------|-------------|----------|--------|
+ | `GoToPosition` | Reach target (x, y) | 6 | exp(−‖Δp‖²/(2σ²)) |
+ | `GoToPose` | Reach target (x, y, θ) | 7 | weighted position + heading |
+ | `TrackLinearVelocity` | Maintain (vx, vy) | 6 | exp(−‖Δv‖²/(2σ²)) |
+ | `TrackLinearAngularVelocity` | Maintain (vx, vy, ω) | 8 | weighted linear + angular |
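
The Gaussian-shaped reward listed for `GoToPosition` and `TrackLinearVelocity` can be sketched as below. This is a minimal illustration, not the package's code: the function name `gaussian_reward` is hypothetical, and the default `sigma=0.5` is borrowed from the `reward_sigma` value shown in the configuration example.

```python
import math


def gaussian_reward(error, sigma=0.5):
    """Return exp(-||error||^2 / (2 * sigma^2)), a value in (0, 1].

    `error` is the position (or velocity) error vector as a sequence of floats.
    The name and signature are illustrative assumptions.
    """
    sq_norm = sum(e * e for e in error)
    return math.exp(-sq_norm / (2.0 * sigma ** 2))


# Zero error yields the maximum reward; the reward decays as the error grows.
on_target = gaussian_reward([0.0, 0.0])
off_target = gaussian_reward([0.5, 0.0])  # error norm equal to sigma
```

A smaller `sigma` makes the reward fall off faster, so the agent only receives a meaningful signal close to the target.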
+
+ ### Spacecraft Model
+
+ - **Platform**: 2-D rigid body (MFP2D, Modular Floating Platform)
+ - **State**: `[x, y, θ, vx, vy, ω]`
+ - **Thrusters**: 8-thruster default layout (configurable)
+ - **Action**: continuous activation ∈ [0, 1] per thruster
+ - **Integration**: Euler, 50 Hz (dt = 0.02 s)
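
One way to picture the model above is a single forward-Euler update of the six-element state. The sketch below is an assumption-laden illustration rather than the contents of `spacecraft_physics.py`: the function name `euler_step` is hypothetical, thruster positions and directions are assumed to be expressed in the body frame, and an explicit (forward) Euler scheme is assumed.

```python
import math


def euler_step(state, thrusters, activations, mass, inertia, dt=0.02):
    """Advance [x, y, theta, vx, vy, omega] by one forward-Euler step.

    Each thruster is (px, py, dx, dy, f_max): position and unit direction,
    assumed here to be in the body frame, plus maximum force.
    """
    x, y, theta, vx, vy, omega = state
    fx_body = fy_body = torque = 0.0
    for (px, py, dx, dy, f_max), a in zip(thrusters, activations):
        a = min(max(a, 0.0), 1.0)          # clamp activation to [0, 1]
        fx, fy = a * f_max * dx, a * f_max * dy
        fx_body += fx
        fy_body += fy
        torque += px * fy - py * fx        # 2-D cross product r x F
    # Rotate the net body-frame force into the world frame.
    c, s = math.cos(theta), math.sin(theta)
    ax = (c * fx_body - s * fy_body) / mass
    ay = (s * fx_body + c * fy_body) / mass
    alpha = torque / inertia
    # Forward Euler: positions use the old velocities.
    return [x + vx * dt, y + vy * dt, theta + omega * dt,
            vx + ax * dt, vy + ay * dt, omega + alpha * dt]


# Hypothetical single thruster at the centre of mass pushing along +x.
next_state = euler_step([0.0] * 6, [(0.0, 0.0, 1.0, 0.0, 1.0)], [1.0],
                        mass=2.0, inertia=1.0)
```

With dt = 0.02 s, fifty such steps correspond to one simulated second.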
+
+ ---
+
+ ## Quick Start
+
+ ### Run locally (no Docker)
+
+ ```bash
+ pip install -e ".[dev]"
+ RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000
+ ```
+
+ ### Client usage (async)
+
+ ```python
+ import asyncio
+ from rans_env import RANSEnv, SpacecraftAction
+
+ async def main():
+     async with RANSEnv(base_url="http://localhost:8000") as env:
+         obs = await env.reset()
+         print(f"Task: {obs.task}")
+         print(f"Initial obs: {obs.state_obs}")
+
+         n = len(obs.thruster_masks)  # 8 thrusters
+         result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
+         print(f"Reward: {result.reward:.4f}, Done: {result.done}")
+
+ asyncio.run(main())
+ ```
+
+ ### Client usage (synchronous)
+
+ ```python
+ from rans_env import RANSEnv, SpacecraftAction
+
+ with RANSEnv(base_url="http://localhost:8000").sync() as env:
+     obs = env.reset()
+     for _ in range(500):
+         n = len(obs.thruster_masks)
+         result = env.step(SpacecraftAction(thrusters=[0.5] * n))
+         obs = result.observation
+         if result.done:
+             obs = env.reset()
+ ```
+
+ ### Docker
+
+ ```bash
+ # Build
+ docker build -f server/Dockerfile -t rans-env .
+
+ # Run GoToPose task
+ docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env
+ ```
+
+ ---
+
+ ## Project Structure
+
+ ```
+ RANS/
+ ├── __init__.py                 # Public API: RANSEnv, SpacecraftAction, ...
+ ├── client.py                   # RANSEnv OpenEnv client
+ ├── models.py                   # SpacecraftAction / Observation / State
+ ├── openenv.yaml                # OpenEnv environment manifest
+ ├── pyproject.toml              # Package configuration
+ └── server/
+     ├── app.py                  # FastAPI entry-point (create_app)
+     ├── rans_environment.py     # RANSEnvironment (Environment subclass)
+     ├── spacecraft_physics.py   # 2-D rigid-body dynamics (NumPy)
+     ├── tasks/
+     │   ├── base.py             # BaseTask ABC
+     │   ├── go_to_position.py   # GoToPositionTask
+     │   ├── go_to_pose.py       # GoToPoseTask
+     │   ├── track_linear_velocity.py
+     │   └── track_linear_angular_velocity.py
+     ├── tests/
+     │   ├── test_physics.py     # Physics unit tests
+     │   ├── test_tasks.py       # Task unit tests
+     │   └── test_environment.py # Integration tests
+     └── Dockerfile
+ ```
+
+ ---
+
+ ## Configuration
+
+ ### Environment variables (Docker / server)
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `RANS_TASK` | `GoToPosition` | Task name |
+ | `RANS_MAX_STEPS` | `500` | Max steps per episode |
+
+ ### Task hyper-parameters
+
+ Pass a dict to `RANSEnvironment(task_config={...})`:
+
+ ```python
+ env = RANSEnvironment(
+     task="GoToPosition",
+     task_config={
+         "tolerance": 0.05,        # success threshold (m)
+         "reward_sigma": 0.5,      # Gaussian reward width
+         "spawn_max_radius": 5.0,  # max target distance (m)
+     },
+ )
+ ```
+
+ ---
+
+ ## Observation Format
+
+ `SpacecraftObservation` fields:
+
+ | Field | Shape | Description |
+ |-------|-------|-------------|
+ | `state_obs` | [6–8] | Task-specific error / velocity observations |
+ | `thruster_transforms` | [8 × 5] | `[px, py, dx, dy, F_max]` per thruster |
+ | `thruster_masks` | [8] | 1.0 = thruster present |
+ | `mass` | scalar | Platform mass (kg) |
+ | `inertia` | scalar | Moment of inertia (kg·m²) |
+ | `task` | str | Active task name |
+ | `reward` | scalar | Step reward ∈ [0, 1] |
+ | `done` | bool | Episode ended |
+ | `info` | dict | Diagnostics (error values, goal_reached, step) |
+
+ ---
+
+ ## Training an RL Agent
+
+ Four example scripts cover different training scenarios:
+
+ ### 1. Sanity check: random agent (`examples/random_agent.py`)
+
+ First verify that the server is reachable and the environment works:
+
+ ```bash
+ # Start server (one terminal)
+ RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000
+
+ # Run random agent (another terminal)
+ python examples/random_agent.py --task GoToPosition --episodes 5
+ ```
+
+ ### 2. PPO training: local, no server (`examples/ppo_train.py`)
+
+ Trains an MLP policy with PPO directly against `RANSEnvironment` (no HTTP
+ server required). Uses pure PyTorch; no additional RL library is needed.
+
+ ```bash
+ pip install torch gymnasium
+
+ # Train GoToPosition (300 k steps)
+ python examples/ppo_train.py --task GoToPosition --timesteps 300000
+
+ # Train GoToPose
+ python examples/ppo_train.py --task GoToPose --timesteps 500000
+
+ # Evaluate a saved checkpoint
+ python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
+     --task GoToPosition --eval-episodes 20
+ ```
+
+ Key hyper-parameters (all match the original RANS paper):
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `--n-steps` | 2048 | Rollout length per update |
+ | `--n-epochs` | 10 | PPO epochs per rollout |
+ | `--gamma` | 0.99 | Discount factor |
+ | `--lam` | 0.95 | GAE-λ |
+ | `--clip-eps` | 0.2 | PPO clipping |
+ | `--lr` | 3e-4 | Adam learning rate |
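
To show how `--gamma` and `--lam` interact, here is a generic sketch of Generalised Advantage Estimation over one rollout. It is written in plain Python for clarity and is not taken from `examples/ppo_train.py`; the function name `compute_gae` and its argument layout are assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one rollout using GAE(gamma, lam).

    rewards, values, dones are per-step lists; last_value is the critic's
    estimate for the state that follows the final step. Hypothetical helper.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Tiny worked example with gamma = lam = 1 so the sums are easy to check.
adv, ret = compute_gae([1.0, 1.0], [0.0, 0.0], [False, False],
                       last_value=0.0, gamma=1.0, lam=1.0)
```

Lower `lam` values lean more on the critic (less variance, more bias); `lam=1` recovers the discounted Monte Carlo return minus the value baseline.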
+
+ ### 3. Gymnasium wrapper: use with any RL library (`examples/gymnasium_wrapper.py`)
+
+ Wraps `RANSEnvironment` as a `gymnasium.Env` for compatibility with
+ Stable-Baselines3, CleanRL, RLlib, TorchRL, etc.:
+
+ ```python
+ from examples.gymnasium_wrapper import make_rans_env
+
+ env = make_rans_env(task="GoToPosition")
+ print(env.observation_space)  # Box(56,)
+ print(env.action_space)       # Box(8,): thruster activations in [0, 1]
+
+ # Stable-Baselines3
+ from stable_baselines3 import PPO, SAC
+
+ model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
+ model.learn(total_timesteps=500_000)
+ model.save("rans_sb3_ppo")
+
+ # Or SAC for off-policy training
+ model = SAC("MlpPolicy", env, verbose=1)
+ model.learn(total_timesteps=500_000)
+ ```
+
+ ### 4. Remote training via OpenEnv client (`examples/openenv_client_train.py`)
+
+ Train against a running Docker server using `N` concurrent WebSocket
+ sessions (the canonical OpenEnv pattern):
+
+ ```bash
+ # Start server
+ docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env
+
+ # Train with 4 parallel environment sessions
+ python examples/openenv_client_train.py --url http://localhost:8000 \
+     --n-envs 4 --episodes 50
+ ```
+
+ ### Observation & action spaces
+
+ | | |
+ |---|---|
+ | **Observation** | Flat vector: `[state_obs, thruster_transforms (flat), masks, mass, inertia]` |
+ | **Action** | `float32[8]`: thruster activations ∈ [0, 1] |
+ | **Reward** | Scalar ∈ [0, 1], decaying exponentially with target error |
+ | **Done** | `True` when goal reached **or** step limit hit |
+
+ Observation sizes by task:
+
+ | Task | `state_obs` | total obs dim |
+ |------|------------|---------------|
+ | GoToPosition | 6 | 56 |
+ | GoToPose | 7 | 57 |
+ | TrackLinearVelocity | 6 | 56 |
+ | TrackLinearAngularVelocity | 8 | 58 |
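
The totals in the table follow from concatenating the observation fields in the order given above: `state_obs` + 8 × 5 thruster transforms + 8 masks + mass + inertia. The sketch below reproduces that arithmetic; `flatten_observation` is a hypothetical helper, not necessarily how `examples/gymnasium_wrapper.py` builds the vector.

```python
def flatten_observation(state_obs, thruster_transforms, thruster_masks,
                        mass, inertia):
    """Concatenate observation fields into one flat list of floats.

    Order assumed from the README: state_obs, row-major thruster
    transforms, masks, mass, inertia.
    """
    flat = list(state_obs)
    for row in thruster_transforms:   # each row is [px, py, dx, dy, F_max]
        flat.extend(row)
    flat.extend(thruster_masks)
    flat.extend([mass, inertia])
    return flat


# Placeholder values for a GoToPosition observation (6-element state_obs).
transforms = [[0.0] * 5 for _ in range(8)]
masks = [1.0] * 8
obs_dim = len(flatten_observation([0.0] * 6, transforms, masks, 10.0, 0.5))
```

For `GoToPosition` this gives 6 + 40 + 8 + 2 = 56 entries; swapping in a 7- or 8-element `state_obs` gives 57 and 58.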
+
+ ---
+
+ ## Tests
+
+ ```bash
+ pip install -e ".[dev]"
+ pytest server/tests/ -v
+ ```
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{elhariry2023rans,
+   title         = {RANS: Highly-Parallelised Simulator for Reinforcement Learning
+                    based Autonomous Navigating Spacecrafts},
+   author        = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
+   year          = {2023},
+   eprint        = {2310.07393},
+   archivePrefix = {arXiv},
+ }
+ ```