---
title: OpenReward Echo Env (TRL test fixture)
emoji: π¦
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
---
# OpenReward Echo Env

A minimal [Open Reward Standard](https://openrewardstandard.io) environment, used as the test fixture for [`trl.experimental.openreward`](https://github.com/huggingface/trl/tree/main/trl/experimental/openreward).

The model is given a target string and must call `echo(text=...)` with exactly that string. Reward is `1.0` on match, `0.0` otherwise; the episode finishes on a correct echo.

The environment is pure Python (no sandbox, no external state), so responses are deterministic and it can run thousands of concurrent sessions on free-tier hardware.
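The scoring rule described above can be sketched in a few lines. This is only an illustration of exact-match scoring, not the code in `server.py`: the names `StepResult` and `score_echo` are hypothetical, and the sketch assumes an incorrect echo leaves the episode open.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical container for the outcome of one echo call."""
    reward: float
    finished: bool


def score_echo(target: str, text: str) -> StepResult:
    """Exact-match scoring: 1.0 and episode end on a match, otherwise 0.0."""
    matched = text == target  # case-sensitive, no trimming
    return StepResult(reward=1.0 if matched else 0.0, finished=matched)


print(score_echo("hello", "hello"))  # StepResult(reward=1.0, finished=True)
print(score_echo("hello", "Hello"))  # StepResult(reward=0.0, finished=False)
```

Because the comparison is a plain string equality, near misses such as different casing or extra whitespace score `0.0`.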
## Use

```python
import os

# Point both SDK endpoints at the single-host Space before using the spec.
os.environ["OPENREWARD_API_URL"] = "https://trl-internal-testing-openreward-echo-env.hf.space"
os.environ["OPENREWARD_SESSION_URL"] = "https://trl-internal-testing-openreward-echo-env.hf.space"

from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec(
    "https://trl-internal-testing-openreward-echo-env.hf.space",
    env_name="echoenvironment",
)
print(spec.train_dataset)  # 8 rows: target in {"hello", "world", "trl", ...}
```
The two `OPENREWARD_*_URL` overrides are needed because the `openreward` SDK expects a two-subdomain platform layout by default (`api.<host>` + `sessions.<host>`). For a single-host, self-hosted server, both must point at the same URL.
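To make the difference between the two layouts concrete, here is a small sketch that builds the environment-variable values for each case. The helper names `platform_urls` and `single_host_urls` are hypothetical, the `https` scheme in the two-subdomain case is an assumption, and nothing here reflects how the SDK resolves URLs internally.

```python
import os


def platform_urls(host: str) -> dict[str, str]:
    """Two-subdomain layout: api.<host> and sessions.<host> (https assumed)."""
    return {
        "OPENREWARD_API_URL": f"https://api.{host}",
        "OPENREWARD_SESSION_URL": f"https://sessions.{host}",
    }


def single_host_urls(base_url: str) -> dict[str, str]:
    """Single-host layout: both variables point at the same base URL."""
    return {
        "OPENREWARD_API_URL": base_url,
        "OPENREWARD_SESSION_URL": base_url,
    }


# Example: target a server started locally on the default port 8080.
os.environ.update(single_host_urls("http://localhost:8080"))
print(os.environ["OPENREWARD_API_URL"])  # http://localhost:8080
```

Setting the variables before constructing the spec, as in the snippet under "Use", keeps the overrides in place when the SDK first reads them.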
## Tasks

| split | count | shape |
|---|---|---|
| `train` | 8 | `{"id": "echo-N", "target": "<word>"}` |
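As a rough illustration of the row shape in the table, the check below validates a single task dictionary. It is a sketch only: `is_valid_task` is a hypothetical helper, and it assumes that `N` in `echo-N` is a non-negative integer and that the target is a non-empty string.

```python
import re

# Assumption: N in "echo-N" is a non-negative integer.
ID_PATTERN = re.compile(r"echo-\d+")


def is_valid_task(row: dict) -> bool:
    """Check a row against the {"id": "echo-N", "target": "<word>"} shape."""
    return (
        set(row) == {"id", "target"}
        and isinstance(row["id"], str)
        and ID_PATTERN.fullmatch(row["id"]) is not None
        and isinstance(row["target"], str)
        and row["target"] != ""
    )


print(is_valid_task({"id": "echo-0", "target": "hello"}))  # True
print(is_valid_task({"id": "echo-x", "target": "hello"}))  # False
```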
## Local development

```bash
pip install -r requirements.txt
python server.py  # listens on PORT (default 8080)
```
## Files

- `server.py`: env definition + `Server([EchoEnvironment]).run()`
- `Dockerfile`: `python:3.11-slim` + the deps; HF Spaces serves it on port 7860
- `requirements.txt`: `openreward`, `fastapi`, `uvicorn`