---
title: RANS Spacecraft Navigation Environment
emoji: 🛰️
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - robotics
  - spacecraft
---
# RANS → OpenEnv Environment

**RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts**

OpenEnv-compatible implementation of the paper:

> El-Hariry, Richard, Olivares-Mendez (2023). "RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts." arXiv:2310.07393

- Original GPU implementation (Isaac Gym): elharirymatteo/RANS
- Live Hugging Face Space: https://huggingface.co/spaces/dpang/rans-env
## Overview
This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an OpenEnv-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.
## Supported Tasks

| Task | Description | Obs size | Reward |
|---|---|---|---|
| `GoToPosition` | Reach target (x, y) | 6 | exp(−‖Δp‖² / 2σ²) |
| `GoToPose` | Reach target (x, y, θ) | 7 | weighted position + heading |
| `TrackLinearVelocity` | Maintain (vx, vy) | 6 | exp(−‖Δv‖² / 2σ²) |
| `TrackLinearAngularVelocity` | Maintain (vx, vy, ω) | 8 | weighted linear + angular |
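The Gaussian shaping used by the position and linear-velocity tasks can be sketched directly. This is a minimal illustration, not the tasks' exact implementation; `sigma` here mirrors the `reward_sigma` hyper-parameter described under Configuration:

```python
import numpy as np

def gaussian_reward(error, sigma=0.5):
    """exp(-||e||^2 / (2 sigma^2)): 1.0 at zero error, decaying smoothly with distance."""
    e = np.asarray(error, dtype=np.float64)
    return float(np.exp(-e.dot(e) / (2.0 * sigma**2)))
```

With the default σ = 0.5, a position error of 0.5 m already drops the reward to exp(−0.5) ≈ 0.61, so the signal is dense but sharply peaked around the goal.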
## Spacecraft Model

- Platform: 2-D rigid body (MFP2D, the Modular Floating Platform)
- State: `[x, y, θ, vx, vy, ω]`
- Thrusters: 8-thruster default layout (configurable)
- Action: continuous activation ∈ [0, 1] per thruster
- Integration: Euler, 50 Hz (dt = 0.02 s)
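One integration step can be sketched as follows. This is an illustrative reconstruction, not the package's `spacecraft_physics.py`; the `[px, py, dx, dy, F_max]` per-thruster layout is taken from the Observation Format table below:

```python
import numpy as np

DT = 0.02  # 50 Hz Euler integration step

def euler_step(state, activations, transforms, mass, inertia):
    """One explicit-Euler update of the planar rigid body.

    state       : [x, y, theta, vx, vy, omega]
    activations : per-thruster command, each in [0, 1]
    transforms  : per-thruster [px, py, dx, dy, F_max] in the body frame
    """
    x, y, theta, vx, vy, omega = state
    force_body = np.zeros(2)
    torque = 0.0
    for a, (px, py, dx, dy, f_max) in zip(activations, transforms):
        f = a * f_max * np.array([dx, dy])   # thrust along the nozzle direction
        force_body += f
        torque += px * f[1] - py * f[0]      # planar cross product r x F
    c, s = np.cos(theta), np.sin(theta)      # rotate force into the world frame
    fx = c * force_body[0] - s * force_body[1]
    fy = s * force_body[0] + c * force_body[1]
    return np.array([
        x + vx * DT,
        y + vy * DT,
        theta + omega * DT,
        vx + fx / mass * DT,
        vy + fy / mass * DT,
        omega + torque / inertia * DT,
    ])
```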
## Quick Start

### Run locally (no Docker)

```bash
pip install -e ".[dev]"
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000
```
### Client usage (async)

```python
import asyncio

from rans_env import RANSEnv, SpacecraftAction

async def main():
    async with RANSEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        print(f"Task: {obs.task}")
        print(f"Initial obs: {obs.state_obs}")
        n = len(obs.thruster_masks)  # 8 thrusters
        result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
        print(f"Reward: {result.reward:.4f}, Done: {result.done}")

asyncio.run(main())
```
### Client usage (synchronous)

```python
from rans_env import RANSEnv, SpacecraftAction

with RANSEnv(base_url="http://localhost:8000").sync() as env:
    obs = env.reset()
    for _ in range(500):
        n = len(obs.thruster_masks)
        result = env.step(SpacecraftAction(thrusters=[0.5] * n))
        obs = result.observation
        if result.done:
            obs = env.reset()
```
## Docker

```bash
# Build
docker build -f server/Dockerfile -t rans-env .

# Run the GoToPose task
docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env
```
## Project Structure

```
RANS/
├── __init__.py                # Public API: RANSEnv, SpacecraftAction, ...
├── client.py                  # RANSEnv OpenEnv client
├── models.py                  # SpacecraftAction / Observation / State
├── openenv.yaml               # OpenEnv environment manifest
├── pyproject.toml             # Package configuration
└── server/
    ├── app.py                 # FastAPI entry-point (create_app)
    ├── rans_environment.py    # RANSEnvironment (Environment subclass)
    ├── spacecraft_physics.py  # 2-D rigid-body dynamics (NumPy)
    ├── tasks/
    │   ├── base.py            # BaseTask ABC
    │   ├── go_to_position.py  # GoToPositionTask
    │   ├── go_to_pose.py      # GoToPoseTask
    │   ├── track_linear_velocity.py
    │   └── track_linear_angular_velocity.py
    ├── tests/
    │   ├── test_physics.py    # Physics unit tests
    │   ├── test_tasks.py      # Task unit tests
    │   └── test_environment.py  # Integration tests
    └── Dockerfile
```
## Configuration

### Environment variables (Docker / server)

| Variable | Default | Description |
|---|---|---|
| `RANS_TASK` | `GoToPosition` | Task name |
| `RANS_MAX_STEPS` | `500` | Maximum steps per episode |
### Task hyper-parameters

Pass a dict to `RANSEnvironment(task_config={...})`:

```python
env = RANSEnvironment(
    task="GoToPosition",
    task_config={
        "tolerance": 0.05,        # success threshold (m)
        "reward_sigma": 0.5,      # Gaussian reward width
        "spawn_max_radius": 5.0,  # max target distance (m)
    },
)
```
## Observation Format

`SpacecraftObservation` fields:

| Field | Shape | Description |
|---|---|---|
| `state_obs` | [6–8] | Task-specific error / velocity observations |
| `thruster_transforms` | [8 × 5] | `[px, py, dx, dy, F_max]` per thruster |
| `thruster_masks` | [8] | 1.0 = thruster present |
| `mass` | scalar | Platform mass (kg) |
| `inertia` | scalar | Moment of inertia (kg·m²) |
| `task` | str | Active task name |
| `reward` | scalar | Step reward ∈ [0, 1] |
| `done` | bool | Episode ended |
| `info` | dict | Diagnostics (error values, goal_reached, step) |
## Training an RL Agent

Four example scripts cover different training scenarios:

### 1. Sanity check: random agent (`examples/random_agent.py`)

First verify that the server is reachable and the environment works:

```bash
# Start the server (one terminal)
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000

# Run the random agent (another terminal)
python examples/random_agent.py --task GoToPosition --episodes 5
```
### 2. PPO training: local, no server (`examples/ppo_train.py`)

Trains an MLP policy with PPO directly against `RANSEnvironment` (no HTTP server required). Uses pure PyTorch; no additional RL library is needed.

```bash
pip install torch gymnasium

# Train GoToPosition (300 k steps)
python examples/ppo_train.py --task GoToPosition --timesteps 300000

# Train GoToPose
python examples/ppo_train.py --task GoToPose --timesteps 500000

# Evaluate a saved checkpoint
python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
    --task GoToPosition --eval-episodes 20
```
Key hyper-parameters (all match the original RANS paper):

| Flag | Default | Description |
|---|---|---|
| `--n-steps` | 2048 | Rollout length per update |
| `--n-epochs` | 10 | PPO epochs per rollout |
| `--gamma` | 0.99 | Discount factor |
| `--lam` | 0.95 | GAE-λ |
| `--clip-eps` | 0.2 | PPO clipping range |
| `--lr` | 3e-4 | Adam learning rate |
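For reference, the GAE-λ recursion behind `--gamma` and `--lam` is compact enough to sketch in NumPy. This is the generic textbook computation, not necessarily the script's exact code:

```python
import numpy as np

def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards, values, dones are length-T arrays; last_value bootstraps
    the value of the state after the final step.
    """
    advantages = np.zeros(len(rewards), dtype=np.float64)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]  # TD error
        gae = delta + gamma * lam * not_done * gae  # discounted sum of TD errors
        advantages[t] = gae
    return advantages
```

With λ = 1 this reduces to Monte-Carlo returns minus the value baseline; with λ = 0 it is the one-step TD error, trading variance for bias.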
### 3. Gymnasium wrapper: use with any RL library (`examples/gymnasium_wrapper.py`)

Wraps `RANSEnvironment` as a `gymnasium.Env` for compatibility with Stable-Baselines3, CleanRL, RLlib, TorchRL, etc.:

```python
from examples.gymnasium_wrapper import make_rans_env

env = make_rans_env(task="GoToPosition")
print(env.observation_space)  # Box(56,)
print(env.action_space)       # Box(8,), thruster activations in [0, 1]

# Stable-Baselines3
from stable_baselines3 import PPO, SAC

model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
model.learn(total_timesteps=500_000)
model.save("rans_sb3_ppo")

# Or SAC for off-policy training
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
```
### 4. Remote training via OpenEnv client (`examples/openenv_client_train.py`)

Train against a running Docker server using N concurrent WebSocket sessions (the canonical OpenEnv pattern):

```bash
# Start the server
docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env

# Train with 4 parallel environment sessions
python examples/openenv_client_train.py --url http://localhost:8000 \
    --n-envs 4 --episodes 50
```
## Observation & action spaces

| Space | Description |
|---|---|
| Observation | Flat vector: `[state_obs, thruster_transforms (flat), masks, mass, inertia]` |
| Action | `float32[8]`: thruster activations ∈ [0, 1] |
| Reward | Scalar ∈ [0, 1], decaying exponentially with target error |
| Done | `True` when the goal is reached or the step limit is hit |

Observation sizes by task:

| Task | state_obs | Total obs dim |
|---|---|---|
| GoToPosition | 6 | 56 |
| GoToPose | 7 | 57 |
| TrackLinearVelocity | 6 | 56 |
| TrackLinearAngularVelocity | 8 | 58 |
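The totals follow directly from the flat-vector layout: `state_obs` plus the 8 × 5 flattened thruster transforms, 8 mask entries, mass, and inertia. A quick sanity check:

```python
def total_obs_dim(state_obs_dim, n_thrusters=8):
    # state_obs + flattened [px, py, dx, dy, F_max] transforms + masks + mass + inertia
    return state_obs_dim + n_thrusters * 5 + n_thrusters + 2

assert total_obs_dim(6) == 56  # GoToPosition, TrackLinearVelocity
assert total_obs_dim(7) == 57  # GoToPose
assert total_obs_dim(8) == 58  # TrackLinearAngularVelocity
```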
## Tests

```bash
pip install -e ".[dev]"
pytest server/tests/ -v
```
Citation
@misc{elhariry2023rans,
title = {RANS: Highly-Parallelised Simulator for Reinforcement Learning
based Autonomous Navigating Spacecrafts},
author = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
year = {2023},
eprint = {2310.07393},
archivePrefix = {arXiv},
}