title: RANS Spacecraft Navigation Environment
emoji: 🛸
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - robotics
  - spacecraft

RANS — OpenEnv Environment

RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts

OpenEnv-compatible implementation of the paper:

El-Hariry, Richard, Olivares-Mendez (2023). "RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts." arXiv:2310.07393

Original GPU implementation (Isaac Gym): elharirymatteo/RANS

Live HuggingFace Space: https://huggingface.co/spaces/dpang/rans-env

Overview

This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an OpenEnv-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.

Supported Tasks

Task	Description	Obs size	Reward
`GoToPosition`	Reach target (x, y)	6	exp(−‖Δp‖² / (2σ²))
`GoToPose`	Reach target (x, y, θ)	7	weighted position + heading
`TrackLinearVelocity`	Maintain (vx, vy)	6	exp(−‖Δv‖² / (2σ²))
`TrackLinearAngularVelocity`	Maintain (vx, vy, ω)	8	weighted linear + angular
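The Gaussian-shaped reward listed in the table can be sketched as below. This is a minimal illustration, not the package's own code: the function names, the default `sigma`, and the equal weights in `weighted_reward` are assumptions made for the example.

```python
import math

def gaussian_reward(error_vec, sigma=0.5):
    """Return exp(-||e||^2 / (2 * sigma^2)), which lies in (0, 1]."""
    sq_norm = sum(e * e for e in error_vec)
    return math.exp(-sq_norm / (2.0 * sigma ** 2))

def weighted_reward(pos_error, heading_error, w_pos=0.5, w_head=0.5, sigma=0.5):
    """Hypothetical weighted sum of a position term and a heading term.

    The weights are illustrative only; with w_pos + w_head == 1 the
    result stays in (0, 1].
    """
    r_pos = gaussian_reward(pos_error, sigma)
    r_head = gaussian_reward([heading_error], sigma)
    return w_pos * r_pos + w_head * r_head

if __name__ == "__main__":
    print(gaussian_reward([0.0, 0.0]))        # 1.0 at zero error
    print(gaussian_reward([1.0, 0.0]))        # decays as the error grows
    print(weighted_reward([0.2, 0.1], 0.3))
```

A smaller `sigma` makes the reward fall off more sharply around the target, which gives a stronger signal near the goal but a weaker one far away.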

Spacecraft Model

Platform: 2-D rigid body (MFP2D — Modular Floating Platform)
State: [x, y, θ, vx, vy, ω]
Thrusters: 8-thruster default layout (configurable)
Action: continuous activation ∈ [0, 1] per thruster
Integration: Euler, 50 Hz (dt = 0.02 s)
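One explicit Euler step for this model might look like the sketch below. It assumes each thruster row is `[px, py, dx, dy, F_max]` with position and direction expressed in the body frame, and that forces are rotated into the world frame by θ; those conventions, the function name, and the mass and inertia values in the usage lines are assumptions for illustration, not taken from `spacecraft_physics.py`.

```python
import math

DT = 0.02  # 50 Hz, as stated in the model description

def euler_step(state, thrusters, activations, mass, inertia, dt=DT):
    """Advance [x, y, theta, vx, vy, omega] by one explicit Euler step."""
    x, y, theta, vx, vy, omega = state
    fx_body = fy_body = torque = 0.0
    for (px, py, dx, dy, f_max), a in zip(thrusters, activations):
        a = min(max(a, 0.0), 1.0)            # clamp activation to [0, 1]
        fx, fy = a * f_max * dx, a * f_max * dy
        fx_body += fx
        fy_body += fy
        torque += px * fy - py * fx          # 2-D cross product r x F
    # Rotate the net body-frame force into the world frame (assumed convention).
    c, s = math.cos(theta), math.sin(theta)
    ax = (c * fx_body - s * fy_body) / mass
    ay = (s * fx_body + c * fy_body) / mass
    alpha = torque / inertia
    # Explicit Euler: positions use the old velocities.
    return [
        x + vx * dt,
        y + vy * dt,
        theta + omega * dt,
        vx + ax * dt,
        vy + ay * dt,
        omega + alpha * dt,
    ]

if __name__ == "__main__":
    # Hypothetical single thruster at the centre of mass pushing along +x.
    thrusters = [(0.0, 0.0, 1.0, 0.0, 1.0)]
    print(euler_step([0.0] * 6, thrusters, [1.0], mass=1.0, inertia=1.0))
```

Explicit Euler is first-order accurate, so the error per step scales with dt²; at dt = 0.02 s that is usually acceptable for slowly rotating platforms.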

Quick Start

Run locally (no Docker)

pip install -e ".[dev]"
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000

Client usage (async)

import asyncio
from rans_env import RANSEnv, SpacecraftAction

async def main():
    async with RANSEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        print(f"Task: {obs.task}")
        print(f"Initial obs: {obs.state_obs}")

        n = len(obs.thruster_masks)  # 8 thrusters
        result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
        print(f"Reward: {result.reward:.4f},  Done: {result.done}")

asyncio.run(main())

Client usage (synchronous)

from rans_env import RANSEnv, SpacecraftAction

with RANSEnv(base_url="http://localhost:8000").sync() as env:
    obs = env.reset()
    for _ in range(500):
        n = len(obs.thruster_masks)
        result = env.step(SpacecraftAction(thrusters=[0.5] * n))
        obs = result.observation
        if result.done:
            obs = env.reset()

Docker

# Build
docker build -f server/Dockerfile -t rans-env .

# Run GoToPose task
docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env

Project Structure

RANS/
├── __init__.py                  # Public API: RANSEnv, SpacecraftAction, ...
├── client.py                    # RANSEnv OpenEnv client
├── models.py                    # SpacecraftAction / Observation / State
├── openenv.yaml                 # OpenEnv environment manifest
├── pyproject.toml               # Package configuration
└── server/
    ├── app.py                   # FastAPI entry-point (create_app)
    ├── rans_environment.py      # RANSEnvironment (Environment subclass)
    ├── spacecraft_physics.py    # 2-D rigid-body dynamics (NumPy)
    ├── tasks/
    │   ├── base.py              # BaseTask ABC
    │   ├── go_to_position.py    # GoToPositionTask
    │   ├── go_to_pose.py        # GoToPoseTask
    │   ├── track_linear_velocity.py
    │   └── track_linear_angular_velocity.py
    ├── tests/
    │   ├── test_physics.py      # Physics unit tests
    │   ├── test_tasks.py        # Task unit tests
    │   └── test_environment.py  # Integration tests
    └── Dockerfile

Configuration

Environment variables (Docker / server)

Variable	Default	Description
`RANS_TASK`	`GoToPosition`	Task name
`RANS_MAX_STEPS`	`500`	Max steps per episode

Task hyper-parameters

Pass a dict to RANSEnvironment(task_config={...}):

env = RANSEnvironment(
    task="GoToPosition",
    task_config={
        "tolerance": 0.05,       # success threshold (m)
        "reward_sigma": 0.5,     # Gaussian reward width
        "spawn_max_radius": 5.0, # max target distance (m)
    },
)
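To make the roles of `tolerance` and `spawn_max_radius` concrete, the sketch below samples a target and checks for success. Both helpers are hypothetical; in particular, sampling uniformly over a disc centred on the origin is an assumption, and the task classes may use a different distribution.

```python
import math
import random

def sample_target(spawn_max_radius=5.0, rng=random):
    """Sample a target (x, y) within spawn_max_radius of the origin.

    Uses r = R * sqrt(u) so points are uniform over the disc area
    (an assumed distribution, for illustration).
    """
    r = spawn_max_radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(phi), r * math.sin(phi))

def goal_reached(position, target, tolerance=0.05):
    """True when the position error is within the success threshold (m)."""
    dx = position[0] - target[0]
    dy = position[1] - target[1]
    return math.hypot(dx, dy) <= tolerance

if __name__ == "__main__":
    rng = random.Random(0)          # seeded for reproducibility
    target = sample_target(5.0, rng)
    print(target, goal_reached((0.0, 0.0), target))
```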

Observation Format

SpacecraftObservation fields:

Field	Shape	Description
`state_obs`	[6–8]	Task-specific error / velocity observations
`thruster_transforms`	[8 × 5]	`[px, py, dx, dy, F_max]` per thruster
`thruster_masks`	[8]	1.0 = thruster present
`mass`	scalar	Platform mass (kg)
`inertia`	scalar	Moment of inertia (kg·m²)
`task`	str	Active task name
`reward`	scalar	Step reward ∈ [0, 1]
`done`	bool	Episode ended
`info`	dict	Diagnostics (error values, goal_reached, step)

Training an RL Agent

Four example scripts cover different training scenarios:

1. Sanity check — random agent (`examples/random_agent.py`)

First verify the server is reachable and the environment works:

# Start server (one terminal)
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000

# Run random agent (another terminal)
python examples/random_agent.py --task GoToPosition --episodes 5

2. PPO training — local, no server (`examples/ppo_train.py`)

Trains an MLP policy with PPO directly against RANSEnvironment (no HTTP server required). Uses pure PyTorch; no additional RL library is needed.

pip install torch gymnasium

# Train GoToPosition (300 k steps)
python examples/ppo_train.py --task GoToPosition --timesteps 300000

# Train GoToPose
python examples/ppo_train.py --task GoToPose --timesteps 500000

# Evaluate a saved checkpoint
python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
       --task GoToPosition --eval-episodes 20

Key hyper-parameters (all match the original RANS paper):

Flag	Default	Description
`--n-steps`	2048	Rollout length per update
`--n-epochs`	10	PPO epochs per rollout
`--gamma`	0.99	Discount factor
`--lam`	0.95	GAE-λ
`--clip-eps`	0.2	PPO clipping
`--lr`	3e-4	Adam learning rate
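For readers unfamiliar with how `--gamma`, `--lam`, and `--clip-eps` enter the update, the sketch below shows the standard GAE recursion and the PPO clipped surrogate for a single sample. It is written in plain Python for clarity and is not a copy of `examples/ppo_train.py`; the function names are assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout.

    rewards, values, dones are per-step lists; last_value is the value
    estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0   # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)

if __name__ == "__main__":
    adv, ret = compute_gae([1.0, 1.0], [0.0, 0.0], [False, True], 0.0,
                           gamma=0.5, lam=1.0)
    print(adv, ret)
    print(clipped_surrogate(1.5, 1.0))
```

With λ = 1 the estimator reduces to discounted returns minus the value baseline; lowering λ trades variance for bias.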

3. Gymnasium wrapper — use with any RL library (`examples/gymnasium_wrapper.py`)

Wraps RANSEnvironment as a gymnasium.Env for compatibility with Stable-Baselines3, CleanRL, RLlib, TorchRL, and similar libraries:

from examples.gymnasium_wrapper import make_rans_env

env = make_rans_env(task="GoToPosition")
print(env.observation_space)   # Box(56,)
print(env.action_space)        # Box(8,)  — thruster activations in [0, 1]

# Stable-Baselines3
from stable_baselines3 import PPO, SAC

model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
model.learn(total_timesteps=500_000)
model.save("rans_sb3_ppo")

# Or SAC for off-policy training
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)

4. Remote training via OpenEnv client (`examples/openenv_client_train.py`)

Train against a running Docker server using N concurrent WebSocket sessions (the canonical OpenEnv pattern):

# Start server
docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env

# Train with 4 parallel environment sessions
python examples/openenv_client_train.py --url http://localhost:8000 \
       --n-envs 4 --episodes 50

Observation & action spaces

Item	Description
Observation	Flat vector: `[state_obs, thruster_transforms (flat), masks, mass, inertia]`
Action	`float32[8]` — thruster activations ∈ [0, 1]
Reward	Scalar ∈ [0, 1] — exponential decay from target error
Done	`True` when goal reached or step limit hit

Observation sizes by task:

Task	`state_obs`	total obs dim
GoToPosition	6	56
GoToPose	7	57
TrackLinearVelocity	6	56
TrackLinearAngularVelocity	8	58
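The totals in the table follow from concatenating the observation fields in the order given above: `state_obs`, 8 × 5 thruster transforms, 8 masks, then mass and inertia. The sketch below reproduces that arithmetic; the helper names are hypothetical and not part of the package.

```python
def flatten_observation(state_obs, thruster_transforms, thruster_masks, mass, inertia):
    """Concatenate the fields into one flat list in the documented order."""
    flat = list(state_obs)
    for row in thruster_transforms:      # each row: [px, py, dx, dy, F_max]
        flat.extend(row)
    flat.extend(thruster_masks)
    flat.extend([mass, inertia])
    return flat

def total_obs_dim(state_obs_size, n_thrusters=8, transform_width=5):
    """state_obs + flattened transforms + masks + (mass, inertia)."""
    return state_obs_size + n_thrusters * transform_width + n_thrusters + 2

if __name__ == "__main__":
    for task, size in [("GoToPosition", 6), ("GoToPose", 7),
                       ("TrackLinearVelocity", 6),
                       ("TrackLinearAngularVelocity", 8)]:
        print(task, total_obs_dim(size))
```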

Tests

pip install -e ".[dev]"
pytest server/tests/ -v

Citation

@misc{elhariry2023rans,
  title   = {RANS: Highly-Parallelised Simulator for Reinforcement Learning
             based Autonomous Navigating Spacecrafts},
  author  = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
  year    = {2023},
  eprint  = {2310.07393},
  archivePrefix = {arXiv},
}