metadata
title: RANS Spacecraft Navigation Environment
emoji: 🛸
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - robotics
  - spacecraft

RANS - OpenEnv Environment

RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts

OpenEnv-compatible implementation of the paper:

El-Hariry, Richard, Olivares-Mendez (2023). "RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts." arXiv:2310.07393

Original GPU implementation (Isaac Gym): elharirymatteo/RANS

Live HuggingFace Space: https://huggingface.co/spaces/dpang/rans-env


Overview

This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an OpenEnv-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.

Supported Tasks

| Task | Description | Obs size | Reward |
|---|---|---|---|
| GoToPosition | Reach target (x, y) | 6 | exp(−‖Δp‖²/2σ²) |
| GoToPose | Reach target (x, y, θ) | 7 | weighted position + heading |
| TrackLinearVelocity | Maintain (vx, vy) | 6 | exp(−‖Δv‖²/2σ²) |
| TrackLinearAngularVelocity | Maintain (vx, vy, ω) | 8 | weighted linear + angular |
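
The Gaussian-shaped reward used by GoToPosition and TrackLinearVelocity can be sketched in a few lines. This is a minimal illustration, assuming the table's formula means exp(−‖Δ‖² / (2σ²)); the function name `gaussian_reward` and the default `sigma=0.5` (borrowed from the `reward_sigma` example in the Configuration section) are assumptions, not the package's API.

```python
import math


def gaussian_reward(error, sigma=0.5):
    """Return exp(-||error||^2 / (2 * sigma^2)) for a vector-valued error.

    `error` is the difference between the current and target quantity,
    e.g. (dx, dy) for a position task or (dvx, dvy) for a velocity task.
    """
    squared_norm = sum(e * e for e in error)
    return math.exp(-squared_norm / (2.0 * sigma ** 2))


# Position error of 0.3 m in x and 0.4 m in y, i.e. 0.5 m from the target.
print(gaussian_reward([0.3, 0.4], sigma=0.5))
```

The reward is 1 at zero error and decays smoothly toward 0, so a smaller `sigma` makes the reward sparser around the target.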

Spacecraft Model

  • Platform: 2-D rigid body (MFP2D, Modular Floating Platform)
  • State: [x, y, θ, vx, vy, ω]
  • Thrusters: 8-thruster default layout (configurable)
  • Action: continuous activation ∈ [0, 1] per thruster
  • Integration: Euler, 50 Hz (dt = 0.02 s)
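
One explicit Euler step of the model above could look like the following sketch. It is not the code in `spacecraft_physics.py`: it assumes each thruster row `[px, py, dx, dy, F_max]` gives a body-frame mount position and a body-frame unit force direction, and the mass and inertia values in the usage lines are purely illustrative.

```python
import math

DT = 0.02  # 50 Hz integration step, as stated in the model description


def euler_step(state, activations, thrusters, mass, inertia, dt=DT):
    """Advance [x, y, theta, vx, vy, omega] by one explicit Euler step.

    Each thruster is (px, py, dx, dy, f_max); positions and directions are
    assumed to be expressed in the body frame.
    """
    x, y, theta, vx, vy, omega = state
    fx_body = fy_body = torque = 0.0
    for a, (px, py, dx, dy, f_max) in zip(activations, thrusters):
        a = min(max(a, 0.0), 1.0)        # clamp activation to [0, 1]
        fx, fy = a * f_max * dx, a * f_max * dy
        fx_body += fx
        fy_body += fy
        torque += px * fy - py * fx      # z-component of r x F

    # Rotate the net body-frame force into the world frame.
    c, s = math.cos(theta), math.sin(theta)
    ax = (c * fx_body - s * fy_body) / mass
    ay = (s * fx_body + c * fy_body) / mass
    alpha = torque / inertia

    # Explicit Euler: positions advance with the old velocities.
    return [
        x + vx * dt,
        y + vy * dt,
        theta + omega * dt,
        vx + ax * dt,
        vy + ay * dt,
        omega + alpha * dt,
    ]


# One thruster mounted 0.5 m above the centre of mass, pushing along +x.
thrusters = [(0.0, 0.5, 1.0, 0.0, 1.0)]
print(euler_step([0.0] * 6, [1.0], thrusters, mass=10.0, inertia=0.5))
```

An off-centre thruster produces both a linear acceleration and a torque, which is why the default layout uses several thrusters to control translation and rotation independently.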

Quick Start

Run locally (no Docker)

pip install -e ".[dev]"
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000

Client usage (async)

import asyncio
from rans_env import RANSEnv, SpacecraftAction

async def main():
    async with RANSEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        print(f"Task: {obs.task}")
        print(f"Initial obs: {obs.state_obs}")

        n = len(obs.thruster_masks)  # 8 thrusters
        result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
        print(f"Reward: {result.reward:.4f},  Done: {result.done}")

asyncio.run(main())

Client usage (synchronous)

from rans_env import RANSEnv, SpacecraftAction

with RANSEnv(base_url="http://localhost:8000").sync() as env:
    obs = env.reset()
    for _ in range(500):
        n = len(obs.thruster_masks)
        result = env.step(SpacecraftAction(thrusters=[0.5] * n))
        obs = result.observation
        if result.done:
            obs = env.reset()

Docker

# Build
docker build -f server/Dockerfile -t rans-env .

# Run GoToPose task
docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env

Project Structure

RANS/
├── __init__.py                  # Public API: RANSEnv, SpacecraftAction, ...
├── client.py                    # RANSEnv OpenEnv client
├── models.py                    # SpacecraftAction / Observation / State
├── openenv.yaml                 # OpenEnv environment manifest
├── pyproject.toml               # Package configuration
└── server/
    ├── app.py                   # FastAPI entry-point (create_app)
    ├── rans_environment.py      # RANSEnvironment (Environment subclass)
    ├── spacecraft_physics.py    # 2-D rigid-body dynamics (NumPy)
    ├── tasks/
    │   ├── base.py              # BaseTask ABC
    │   ├── go_to_position.py    # GoToPositionTask
    │   ├── go_to_pose.py        # GoToPoseTask
    │   ├── track_linear_velocity.py
    │   └── track_linear_angular_velocity.py
    ├── tests/
    │   ├── test_physics.py      # Physics unit tests
    │   ├── test_tasks.py        # Task unit tests
    │   └── test_environment.py  # Integration tests
    └── Dockerfile

Configuration

Environment variables (Docker / server)

| Variable | Default | Description |
|---|---|---|
| RANS_TASK | GoToPosition | Task name |
| RANS_MAX_STEPS | 500 | Max steps per episode |
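
As a rough sketch of how a server process might read these two variables, the snippet below applies the documented defaults and rejects unknown task names. The helper `load_server_config` is hypothetical, and the validation behaviour is an assumption; the actual server may handle bad values differently.

```python
import os

SUPPORTED_TASKS = (
    "GoToPosition",
    "GoToPose",
    "TrackLinearVelocity",
    "TrackLinearAngularVelocity",
)


def load_server_config(environ=None):
    """Read RANS_TASK and RANS_MAX_STEPS, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ
    task = environ.get("RANS_TASK", "GoToPosition")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unknown RANS_TASK: {task!r}")
    max_steps = int(environ.get("RANS_MAX_STEPS", "500"))
    if max_steps <= 0:
        raise ValueError("RANS_MAX_STEPS must be a positive integer")
    return {"task": task, "max_steps": max_steps}


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(load_server_config({"RANS_TASK": "GoToPose"}))
```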

Task hyper-parameters

Pass a dict to RANSEnvironment(task_config={...}):

env = RANSEnvironment(
    task="GoToPosition",
    task_config={
        "tolerance": 0.05,       # success threshold (m)
        "reward_sigma": 0.5,     # Gaussian reward width
        "spawn_max_radius": 5.0, # max target distance (m)
    },
)
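
To make the roles of `tolerance` and `spawn_max_radius` concrete, here is a hedged sketch of how a position task could use them. Both helpers are hypothetical, and uniform sampling over a disk is an assumption; the task implementation may sample targets differently.

```python
import math
import random


def sample_target(spawn_max_radius, rng):
    """Sample a target (x, y) uniformly over a disk of radius spawn_max_radius."""
    # The square root keeps the density uniform over the disk's area.
    radius = spawn_max_radius * math.sqrt(rng.random())
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (radius * math.cos(angle), radius * math.sin(angle))


def goal_reached(position, target, tolerance):
    """Return True when the position is within `tolerance` metres of the target."""
    distance = math.hypot(position[0] - target[0], position[1] - target[1])
    return distance <= tolerance


rng = random.Random(0)
target = sample_target(spawn_max_radius=5.0, rng=rng)
print(target, goal_reached((0.0, 0.0), target, tolerance=0.05))
```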

Observation Format

SpacecraftObservation fields:

| Field | Shape | Description |
|---|---|---|
| state_obs | [6–8] | Task-specific error / velocity observations |
| thruster_transforms | [8 × 5] | [px, py, dx, dy, F_max] per thruster |
| thruster_masks | [8] | 1.0 = thruster present |
| mass | scalar | Platform mass (kg) |
| inertia | scalar | Moment of inertia (kg·m²) |
| task | str | Active task name |
| reward | scalar | Step reward ∈ [0, 1] |
| done | bool | Episode ended |
| info | dict | Diagnostics (error values, goal_reached, step) |

Training an RL Agent

Four example scripts cover different training scenarios:

1. Sanity check: random agent (examples/random_agent.py)

First verify the server is reachable and the environment works:

# Start server (one terminal)
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000

# Run random agent (another terminal)
python examples/random_agent.py --task GoToPosition --episodes 5

2. PPO training: local, no server (examples/ppo_train.py)

Trains an MLP policy with PPO directly against RANSEnvironment (no HTTP server required). Uses pure PyTorch; no additional RL library is needed.

pip install torch gymnasium

# Train GoToPosition (300 k steps)
python examples/ppo_train.py --task GoToPosition --timesteps 300000

# Train GoToPose
python examples/ppo_train.py --task GoToPose --timesteps 500000

# Evaluate a saved checkpoint
python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
       --task GoToPosition --eval-episodes 20

Key hyper-parameters (all match the original RANS paper):

| Flag | Default | Description |
|---|---|---|
| --n-steps | 2048 | Rollout length per update |
| --n-epochs | 10 | PPO epochs per rollout |
| --gamma | 0.99 | Discount factor |
| --lam | 0.95 | GAE-λ |
| --clip-eps | 0.2 | PPO clipping |
| --lr | 3e-4 | Adam learning rate |
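
To show where `--gamma`, `--lam`, and `--clip-eps` enter the algorithm, the sketch below computes generalised advantage estimates (GAE) over one rollout and evaluates the PPO clipped surrogate for a single sample. It is a plain-Python illustration of the standard formulas, not the code in `examples/ppo_train.py`; the function names are assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one rollout using GAE.

    dones[t] is True when the episode ended after step t, in which case
    no value is bootstrapped from the following state.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


advantages, returns = compute_gae(
    rewards=[1.0, 1.0], values=[0.0, 0.0], dones=[False, True], last_value=5.0
)
print(advantages, clipped_surrogate(ratio=1.5, advantage=1.0))
```

Because the second step ends the episode, `last_value` is ignored in this example; it only matters when a rollout is cut off mid-episode.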

3. Gymnasium wrapper: use with any RL library (examples/gymnasium_wrapper.py)

Wraps RANSEnvironment as a gymnasium.Env for compatibility with Stable-Baselines3, CleanRL, RLlib, TorchRL, etc.:

from examples.gymnasium_wrapper import make_rans_env

env = make_rans_env(task="GoToPosition")
print(env.observation_space)   # Box(56,)
print(env.action_space)        # Box(8,), thruster activations in [0, 1]

# Stable-Baselines3
from stable_baselines3 import PPO, SAC

model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
model.learn(total_timesteps=500_000)
model.save("rans_sb3_ppo")

# Or SAC for off-policy training
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)

4. Remote training via OpenEnv client (examples/openenv_client_train.py)

Train against a running Docker server using N concurrent WebSocket sessions (the canonical OpenEnv pattern):

# Start server
docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env

# Train with 4 parallel environment sessions
python examples/openenv_client_train.py --url http://localhost:8000 \
       --n-envs 4 --episodes 50
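
The concurrency pattern behind `--n-envs` can be sketched with `asyncio.gather`. To stay self-contained, the example uses `StubSession`, a hypothetical stand-in that returns random rewards; it is not the RANSEnv client, and the real script may structure its loop differently.

```python
import asyncio
import random


class StubSession:
    """Hypothetical stand-in for one remote environment session."""

    def __init__(self, seed, max_steps=5):
        self._rng = random.Random(seed)
        self._max_steps = max_steps
        self._step = 0

    async def reset(self):
        self._step = 0
        await asyncio.sleep(0)  # yield, as a network round trip would

    async def step(self, action):
        await asyncio.sleep(0)
        self._step += 1
        reward = self._rng.random()           # placeholder reward in [0, 1)
        done = self._step >= self._max_steps  # fixed-length episode
        return reward, done


async def run_episode(session, n_thrusters=8):
    """Run one episode with random thruster activations; return its total reward."""
    await session.reset()
    total, done = 0.0, False
    while not done:
        action = [random.random() for _ in range(n_thrusters)]
        reward, done = await session.step(action)
        total += reward
    return total


async def run_parallel(n_envs):
    """Run one episode per session, with all sessions interleaved on one event loop."""
    sessions = [StubSession(seed=i) for i in range(n_envs)]
    return await asyncio.gather(*(run_episode(s) for s in sessions))


returns = asyncio.run(run_parallel(n_envs=4))
print(returns)
```

With a real client, each session would hold its own WebSocket connection, so slow steps in one environment do not block the others.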

Observation & action spaces

| Quantity | Description |
|---|---|
| Observation | Flat vector: [state_obs, thruster_transforms (flat), masks, mass, inertia] |
| Action | float32[8], thruster activations ∈ [0, 1] |
| Reward | Scalar ∈ [0, 1], exponential decay with target error |
| Done | True when goal reached or step limit hit |

Observation sizes by task:

| Task | state_obs | total obs dim |
|---|---|---|
| GoToPosition | 6 | 56 |
| GoToPose | 7 | 57 |
| TrackLinearVelocity | 6 | 56 |
| TrackLinearAngularVelocity | 8 | 58 |
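
The totals follow from the flat layout: state_obs plus 8 × 5 thruster transform values, 8 mask values, and the two scalars mass and inertia. The sketch below reproduces that arithmetic and the flattening order; the helper names are assumptions and the wrapper's own implementation may differ.

```python
STATE_OBS_SIZE = {
    "GoToPosition": 6,
    "GoToPose": 7,
    "TrackLinearVelocity": 6,
    "TrackLinearAngularVelocity": 8,
}


def total_obs_dim(task, n_thrusters=8, transform_width=5):
    """state_obs + flattened transforms + masks + mass + inertia."""
    return STATE_OBS_SIZE[task] + n_thrusters * transform_width + n_thrusters + 2


def flatten_observation(state_obs, thruster_transforms, thruster_masks, mass, inertia):
    """Concatenate the observation fields in the documented order."""
    flat = list(state_obs)
    for row in thruster_transforms:  # row = [px, py, dx, dy, F_max]
        flat.extend(row)
    flat.extend(thruster_masks)
    flat.extend([mass, inertia])
    return [float(v) for v in flat]


# Placeholder values, chosen only to exercise the shapes.
obs = flatten_observation(
    state_obs=[0.0] * 6,
    thruster_transforms=[[0.0] * 5 for _ in range(8)],
    thruster_masks=[1.0] * 8,
    mass=10.0,
    inertia=0.5,
)
print(len(obs), total_obs_dim("GoToPosition"))
```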

Tests

pip install -e ".[dev]"
pytest server/tests/ -v

Citation

@misc{elhariry2023rans,
  title   = {RANS: Highly-Parallelised Simulator for Reinforcement Learning
             based Autonomous Navigating Spacecrafts},
  author  = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
  year    = {2023},
  eprint  = {2310.07393},
  archivePrefix = {arXiv},
}