rans-env / README.md
dpang's picture
Add HF Space URL to README
6cb4f35 verified
---
title: RANS Spacecraft Navigation Environment
emoji: πŸ›Έ
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- reinforcement-learning
- robotics
- spacecraft
---
# RANS β€” OpenEnv Environment
**RANS: Reinforcement Learning based Autonomous Navigation for Spacecrafts**
OpenEnv-compatible implementation of the paper:
> El-Hariry, Richard, Olivares-Mendez (2023).
> *"RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts."*
> [arXiv:2310.07393](https://arxiv.org/abs/2310.07393)
Original GPU implementation (Isaac Gym): [elharirymatteo/RANS](https://github.com/elharirymatteo/RANS)
**Live HuggingFace Space:** https://huggingface.co/spaces/dpang/rans-env
---
## Overview
This package wraps a pure-Python/NumPy 2-D spacecraft physics simulation (no Isaac Gym required) into an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible environment. The server can run inside a standard Docker container on CPU and exposes the standard OpenEnv HTTP/WebSocket API.
### Supported Tasks
| Task | Description | Obs size | Reward |
|------|-------------|----------|--------|
| `GoToPosition` | Reach target (x, y) | 6 | exp(βˆ’β€–Ξ”pβ€–Β²/2σ²) |
| `GoToPose` | Reach target (x, y, ΞΈ) | 7 | weighted position + heading |
| `TrackLinearVelocity` | Maintain (vx, vy) | 6 | exp(βˆ’β€–Ξ”vβ€–Β²/2σ²) |
| `TrackLinearAngularVelocity` | Maintain (vx, vy, Ο‰) | 8 | weighted linear + angular |
### Spacecraft Model
- **Platform**: 2-D rigid body (MFP2D β€” Modular Floating Platform)
- **State**: `[x, y, ΞΈ, vx, vy, Ο‰]`
- **Thrusters**: 8-thruster default layout (configurable)
- **Action**: continuous activation ∈ [0, 1] per thruster
- **Integration**: Euler, 50 Hz (dt = 0.02 s)
---
## Quick Start
### Run locally (no Docker)
```bash
pip install -e ".[dev]"
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --host 0.0.0.0 --port 8000
```
### Client usage (async)
```python
import asyncio
from rans_env import RANSEnv, SpacecraftAction
async def main():
async with RANSEnv(base_url="http://localhost:8000") as env:
obs = await env.reset()
print(f"Task: {obs.task}")
print(f"Initial obs: {obs.state_obs}")
n = len(obs.thruster_masks) # 8 thrusters
result = await env.step(SpacecraftAction(thrusters=[0.0] * n))
print(f"Reward: {result.reward:.4f}, Done: {result.done}")
asyncio.run(main())
```
### Client usage (synchronous)
```python
from rans_env import RANSEnv, SpacecraftAction
with RANSEnv(base_url="http://localhost:8000").sync() as env:
obs = env.reset()
for _ in range(500):
n = len(obs.thruster_masks)
result = env.step(SpacecraftAction(thrusters=[0.5] * n))
obs = result.observation
if result.done:
obs = env.reset()
```
### Docker
```bash
# Build
docker build -f server/Dockerfile -t rans-env .
# Run GoToPose task
docker run -e RANS_TASK=GoToPose -p 8000:8000 rans-env
```
---
## Project Structure
```
RANS/
β”œβ”€β”€ __init__.py # Public API: RANSEnv, SpacecraftAction, ...
β”œβ”€β”€ client.py # RANSEnv OpenEnv client
β”œβ”€β”€ models.py # SpacecraftAction / Observation / State
β”œβ”€β”€ openenv.yaml # OpenEnv environment manifest
β”œβ”€β”€ pyproject.toml # Package configuration
└── server/
β”œβ”€β”€ app.py # FastAPI entry-point (create_app)
β”œβ”€β”€ rans_environment.py # RANSEnvironment (Environment subclass)
β”œβ”€β”€ spacecraft_physics.py # 2-D rigid-body dynamics (NumPy)
β”œβ”€β”€ tasks/
β”‚ β”œβ”€β”€ base.py # BaseTask ABC
β”‚ β”œβ”€β”€ go_to_position.py # GoToPositionTask
β”‚ β”œβ”€β”€ go_to_pose.py # GoToPoseTask
β”‚ β”œβ”€β”€ track_linear_velocity.py
β”‚ └── track_linear_angular_velocity.py
β”œβ”€β”€ tests/
β”‚ β”œβ”€β”€ test_physics.py # Physics unit tests
β”‚ β”œβ”€β”€ test_tasks.py # Task unit tests
β”‚ └── test_environment.py # Integration tests
└── Dockerfile
```
---
## Configuration
### Environment variables (Docker / server)
| Variable | Default | Description |
|----------|---------|-------------|
| `RANS_TASK` | `GoToPosition` | Task name |
| `RANS_MAX_STEPS` | `500` | Max steps per episode |
### Task hyper-parameters
Pass a dict to `RANSEnvironment(task_config={...})`:
```python
env = RANSEnvironment(
task="GoToPosition",
task_config={
"tolerance": 0.05, # success threshold (m)
"reward_sigma": 0.5, # Gaussian reward width
"spawn_max_radius": 5.0, # max target distance (m)
},
)
```
---
## Observation Format
`SpacecraftObservation` fields:
| Field | Shape | Description |
|-------|-------|-------------|
| `state_obs` | [6–8] | Task-specific error / velocity observations |
| `thruster_transforms` | [8 Γ— 5] | `[px, py, dx, dy, F_max]` per thruster |
| `thruster_masks` | [8] | 1.0 = thruster present |
| `mass` | scalar | Platform mass (kg) |
| `inertia` | scalar | Moment of inertia (kgΒ·mΒ²) |
| `task` | str | Active task name |
| `reward` | scalar | Step reward ∈ [0, 1] |
| `done` | bool | Episode ended |
| `info` | dict | Diagnostics (error values, goal_reached, step) |
---
## Training an RL Agent
Three example scripts cover different training scenarios:
### 1. Sanity check β€” random agent (`examples/random_agent.py`)
First verify the server is reachable and the environment works:
```bash
# Start server (one terminal)
RANS_TASK=GoToPosition uvicorn rans_env.server.app:app --port 8000
# Run random agent (another terminal)
python examples/random_agent.py --task GoToPosition --episodes 5
```
### 2. PPO training β€” local, no server (`examples/ppo_train.py`)
Trains a MLP policy with PPO directly against `RANSEnvironment` (no HTTP
server required). Uses pure PyTorch β€” no additional RL library needed.
```bash
pip install torch gymnasium
# Train GoToPosition (300 k steps)
python examples/ppo_train.py --task GoToPosition --timesteps 300000
# Train GoToPose
python examples/ppo_train.py --task GoToPose --timesteps 500000
# Evaluate a saved checkpoint
python examples/ppo_train.py --eval --checkpoint rans_ppo_GoToPosition.pt \
--task GoToPosition --eval-episodes 20
```
Key hyper-parameters (all match the original RANS paper):
| Flag | Default | Description |
|------|---------|-------------|
| `--n-steps` | 2048 | Rollout length per update |
| `--n-epochs` | 10 | PPO epochs per rollout |
| `--gamma` | 0.99 | Discount factor |
| `--lam` | 0.95 | GAE-Ξ» |
| `--clip-eps` | 0.2 | PPO clipping |
| `--lr` | 3e-4 | Adam learning rate |
### 3. Gymnasium wrapper β€” use with any RL library (`examples/gymnasium_wrapper.py`)
Wraps `RANSEnvironment` as a `gymnasium.Env` for compatibility with
Stable-Baselines3, CleanRL, RLlib, TorchRL, etc:
```python
from examples.gymnasium_wrapper import make_rans_env
env = make_rans_env(task="GoToPosition")
print(env.observation_space) # Box(56,)
print(env.action_space) # Box(8,) β€” thruster activations in [0, 1]
# Stable-Baselines3
from stable_baselines3 import PPO, SAC
model = PPO("MlpPolicy", env, verbose=1, n_steps=2048)
model.learn(total_timesteps=500_000)
model.save("rans_sb3_ppo")
# Or SAC for off-policy training
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
```
### 4. Remote training via OpenEnv client (`examples/openenv_client_train.py`)
Train against a running Docker server using `N` concurrent WebSocket
sessions (the canonical OpenEnv pattern):
```bash
# Start server
docker run -e RANS_TASK=GoToPosition -p 8000:8000 rans-env
# Train with 4 parallel environment sessions
python examples/openenv_client_train.py --url http://localhost:8000 \
--n-envs 4 --episodes 50
```
### Observation & action spaces
| | |
|---|---|
| **Observation** | Flat vector: `[state_obs, thruster_transforms (flat), masks, mass, inertia]` |
| **Action** | `float32[8]` β€” thruster activations ∈ [0, 1] |
| **Reward** | Scalar ∈ [0, 1] β€” exponential decay from target error |
| **Done** | `True` when goal reached **or** step limit hit |
Observation sizes by task:
| Task | `state_obs` | total obs dim |
|------|------------|---------------|
| GoToPosition | 6 | 56 |
| GoToPose | 7 | 57 |
| TrackLinearVelocity | 6 | 56 |
| TrackLinearAngularVelocity | 8 | 58 |
---
## Tests
```bash
pip install -e ".[dev]"
pytest server/tests/ -v
```
---
## Citation
```bibtex
@misc{elhariry2023rans,
title = {RANS: Highly-Parallelised Simulator for Reinforcement Learning
based Autonomous Navigating Spacecrafts},
author = {El-Hariry, Matteo and Richard, Antoine and Olivares-Mendez, Miguel},
year = {2023},
eprint = {2310.07393},
archivePrefix = {arXiv},
}
```