---
title: RANS Spacecraft Navigation Environment
emoji: 🛸
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - robotics
  - spacecraft
---

# RANS: OpenEnv Environment

**RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts**

OpenEnv-compatible implementation of the paper:

> El-Hariry, Richard, Olivares-Mendez (2023).
> *"RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts."*
> [arXiv:2310.07393](https://arxiv.org/abs/2310.07393)

Original GPU implementation (Isaac Gym): [elharirymatteo/RANS](https://github.com/elharirymatteo/RANS)

**Live HuggingFace Space:** https://huggingface.co/spaces/dpang/rans-env

---

## Overview

This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.
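
To give a feel for what a pure NumPy planar simulation involves, here is a minimal sketch of one explicit Euler step for a thruster-driven rigid body. It is not the code in `server/spacecraft_physics.py`. The function name `euler_step`, the body-frame convention for thruster positions and directions, the absence of drag, and the mass and inertia values in the usage lines are all assumptions made for illustration.

```python
import numpy as np

DT = 0.02  # 50 Hz step, the rate this README lists for the integrator


def euler_step(state, thrusters, activations, mass, inertia, dt=DT):
    """Advance [x, y, theta, vx, vy, omega] by one explicit Euler step.

    thrusters:   (N, 5) array of [px, py, dx, dy, F_max] rows, assumed to be
                 expressed in the body frame (hypothetical convention).
    activations: (N,) array, clipped to [0, 1].
    """
    x, y, theta, vx, vy, omega = state
    a = np.clip(activations, 0.0, 1.0)

    # Per-thruster force in the body frame: direction * activation * F_max.
    pos = thrusters[:, 0:2]
    force_body = thrusters[:, 2:4] * (a * thrusters[:, 4])[:, None]

    # Planar torque about the centre of mass: z-component of r x F.
    torque = np.sum(pos[:, 0] * force_body[:, 1] - pos[:, 1] * force_body[:, 0])

    # Rotate the net body-frame force into the world frame.
    c, s = np.cos(theta), np.sin(theta)
    fx_b, fy_b = force_body.sum(axis=0)
    fx = c * fx_b - s * fy_b
    fy = s * fx_b + c * fy_b

    ax, ay, alpha = fx / mass, fy / mass, torque / inertia

    # Explicit Euler: positions advance with the old velocities.
    return np.array([
        x + vx * dt, y + vy * dt, theta + omega * dt,
        vx + ax * dt, vy + ay * dt, omega + alpha * dt,
    ])


# Usage: a single thruster mounted off-centre, firing along +x at full power.
thruster = np.array([[0.0, 0.5, 1.0, 0.0, 1.0]])  # hypothetical layout
next_state = euler_step(np.zeros(6), thruster, np.array([1.0]),
                        mass=2.0, inertia=1.0)
```

Because the thruster is offset from the centre of mass, the step changes both the linear and the angular velocity, which is why a multi-thruster layout is needed to control translation and rotation independently.
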

### Supported Tasks

| Task | Description | `state_obs` size | Reward |
|------|-------------|------------------|--------|
| `GoToPosition` | Reach target (x, y) | 6 | exp(−‖Δp‖² / (2σ²)) |
| `GoToPose` | Reach target (x, y, θ) | 7 | weighted position + heading |
| `TrackLinearVelocity` | Maintain (vx, vy) | 6 | exp(−‖Δv‖² / (2σ²)) |
| `TrackLinearAngularVelocity` | Maintain (vx, vy, ω) | 8 | weighted linear + angular |

### Spacecraft Model

- **Platform**: 2-D rigid body (MFP2D, Modular Floating Platform)
- **State**: `[x, y, θ, vx, vy, ω]`
- **Thrusters**: 8-thruster default layout (configurable)
- **Action**: continuous activation ∈ [0, 1] per thruster
- **Integration**: Euler, 50 Hz (dt = 0.02 s)

---

## Quick Start

### Run locally (no Docker)

```bash
pip install -e ".[dev]"
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000
```

### Client usage (async)

```python
import asyncio
from rans_env import RANSEnv, SpacecraftAction

async def main():
    async with RANSEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        print(f"Task: {obs.task}")
        print(f"Initial obs: {obs.state_obs}")

        n = len(obs.thruster_masks)  # 8 thrusters
        result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
        print(f"Reward: {result.reward:.4f}, Done: {result.done}")

asyncio.run(main())
```

### Client usage (synchronous)

```python
from rans_env import RANSEnv, SpacecraftAction

with RANSEnv(base_url="http://localhost:8000").sync() as env:
    obs = env.reset()
    for _ in range(500):
        n = len(obs.thruster_masks)
        result = env.step(SpacecraftAction(thrusters=[0.5] * n))
        obs = result.observation
        if result.done:
            obs = env.reset()
```

### Docker

```bash
# Build
docker build -f server/Dockerfile -t rans-env .

# Run GoToPose task
docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env
```

---

## Project Structure

```
RANS/
├── __init__.py                  # Public API: RANSEnv, SpacecraftAction, ...
├── client.py                    # RANSEnv OpenEnv client
├── models.py                    # SpacecraftAction / Observation / State
├── openenv.yaml                 # OpenEnv environment manifest
├── pyproject.toml               # Package configuration
└── server/
    ├── app.py                   # FastAPI entry-point (create_app)
    ├── rans_environment.py      # RANSEnvironment (Environment subclass)
    ├── spacecraft_physics.py    # 2-D rigid-body dynamics (NumPy)
    ├── tasks/
    │   ├── base.py              # BaseTask ABC
    │   ├── go_to_position.py    # GoToPositionTask
    │   ├── go_to_pose.py        # GoToPoseTask
    │   ├── track_linear_velocity.py
    │   └── track_linear_angular_velocity.py
    ├── tests/
    │   ├── test_physics.py      # Physics unit tests
    │   ├── test_tasks.py        # Task unit tests
    │   └── test_environment.py  # Integration tests
    └── Dockerfile
```

---

## Configuration

### Environment variables (Docker / server)

| Variable | Default | Description |
|----------|---------|-------------|
| `RANS_TASK` | `GoToPosition` | Task name |
| `RANS_MAX_STEPS` | `500` | Max steps per episode |

### Task hyper-parameters

Pass a dict to `RANSEnvironment(task_config={...})`:

```python
# RANSEnvironment is defined in server/rans_environment.py (see Project Structure).
env = RANSEnvironment(
    task="GoToPosition",
    task_config={
        "tolerance": 0.05,        # success threshold (m)
        "reward_sigma": 0.5,      # Gaussian reward width
        "spawn_max_radius": 5.0,  # max target distance (m)
    },
)
```

---

## Observation Format

`SpacecraftObservation` fields:

| Field | Shape | Description |
|-------|-------|-------------|
| `state_obs` | [6–8] | Task-specific error / velocity observations |
| `thruster_transforms` | [8 × 5] | `[px, py, dx, dy, F_max]` per thruster |
| `thruster_masks` | [8] | 1.0 = thruster present |
| `mass` | scalar | Platform mass (kg) |
| `inertia` | scalar | Moment of inertia (kg·m²) |
| `task` | str | Active task name |
| `reward` | scalar | Step reward ∈ [0, 1] |
| `done` | bool | Episode ended |
| `info` | dict | Diagnostics (error values, goal_reached, step) |

---

## Training an RL Agent

Four example scripts cover different training scenarios:

### 1. Sanity check: random agent (`examples/random_agent.py`)

First verify that the server is reachable and the environment works:

```bash
# Start server (one terminal)
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000

# Run random agent (another terminal)
python examples/random_agent.py --task GoToPosition --episodes 5
```

### 2. PPO training: local, no server (`examples/ppo_train.py`)

Trains an MLP policy with PPO directly against `RANSEnvironment` (no HTTP server required). The PPO implementation is pure PyTorch, so no additional RL library is needed.

```bash
pip install torch gymnasium

# Train GoToPosition (300k steps)
python examples/ppo_train.py --task GoToPosition --timesteps 300000

# Train GoToPose
python examples/ppo_train.py --task GoToPose --timesteps 500000

# Evaluate a saved checkpoint
python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
    --task GoToPosition --eval-episodes 20
```

Key hyper-parameters (all match the original RANS paper):

| Flag | Default | Description |
|------|---------|-------------|
| `--n-steps` | 2048 | Rollout length per update |
| `--n-epochs` | 10 | PPO epochs per rollout |
| `--gamma` | 0.99 | Discount factor |
| `--lam` | 0.95 | GAE-λ |
| `--clip-eps` | 0.2 | PPO clipping |
| `--lr` | 3e-4 | Adam learning rate |

### 3. Gymnasium wrapper: use with any RL library (`examples/gymnasium_wrapper.py`)

Wraps `RANSEnvironment` as a `gymnasium.Env` for compatibility with Stable-Baselines3, CleanRL, RLlib, TorchRL, etc.:

```python
from examples.gymnasium_wrapper import make_rans_env

env = make_rans_env(task="GoToPosition")
print(env.observation_space)  # Box(56,)
print(env.action_space)       # Box(8,): thruster activations in [0, 1]

# Stable-Baselines3
from stable_baselines3 import PPO, SAC

model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
model.learn(total_timesteps=500_000)
model.save("rans_sb3_ppo")

# Or SAC for off-policy training
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
```

### 4. Remote training via OpenEnv client (`examples/openenv_client_train.py`)

Train against a running Docker server using `N` concurrent WebSocket sessions (the canonical OpenEnv pattern):

```bash
# Start server
docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env

# Train with 4 parallel environment sessions
python examples/openenv_client_train.py --url http://localhost:8000 \
    --n-envs 4 --episodes 50
```

### Observation & action spaces

| | |
|---|---|
| **Observation** | Flat vector: `[state_obs, thruster_transforms (flat), masks, mass, inertia]` |
| **Action** | `float32[8]`, thruster activations ∈ [0, 1] |
| **Reward** | Scalar ∈ [0, 1], exponential decay from target error |
| **Done** | `True` when goal reached **or** step limit hit |

Observation sizes by task (total = `state_obs` + 40 thruster-transform values + 8 masks + mass + inertia):

| Task | `state_obs` | Total obs dim |
|------|-------------|---------------|
| GoToPosition | 6 | 56 |
| GoToPose | 7 | 57 |
| TrackLinearVelocity | 6 | 56 |
| TrackLinearAngularVelocity | 8 | 58 |

---

## Tests

```bash
pip install -e ".[dev]"
pytest server/tests/ -v
```

---

## Citation

```bibtex
@misc{elhariry2023rans,
  title         = {RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts},
  author        = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
  year          = {2023},
  eprint        = {2310.07393},
  archivePrefix = {arXiv},
}
```