
AdithyaSK HF Staff

Initial echo env (TRL × OpenReward test fixture)

fc2f931 verified 25 days ago


metadata

title: OpenReward Echo Env (TRL test fixture)
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0

OpenReward Echo Env

Minimal Open Reward Standard environment used as the test fixture for trl.experimental.openreward.

The model is given a target string and must call echo(text=...) with exactly that string. Reward is 1.0 on match, 0.0 otherwise; the episode finishes on a correct echo.
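The scoring rule described above can be sketched as a pure function. This is an illustration only, not the code in server.py; the name `score_echo` and the `(reward, finished)` return shape are assumptions made for this example.

```python
def score_echo(target: str, text: str) -> tuple[float, bool]:
    """Return (reward, finished) for a single echo(text=...) call.

    Hypothetical helper: it mirrors the rule stated in this README,
    not the actual implementation in server.py.
    """
    matched = text == target  # exact comparison: no trimming, no case folding
    reward = 1.0 if matched else 0.0
    return reward, matched  # the episode finishes only on a correct echo


print(score_echo("hello", "hello"))  # (1.0, True)
print(score_echo("hello", "Hello"))  # (0.0, False)
```

Because the comparison is exact, a reply that differs only in case or whitespace scores 0.0 and leaves the episode open.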

Pure Python — no sandbox, no external state — so responses are deterministic and the env can run thousands of concurrent sessions on free-tier hardware.

Use

```python
import os

# Set both overrides before importing the TRL integration.
os.environ["OPENREWARD_API_URL"] = "https://trl-internal-testing-openreward-echo-env.hf.space"
os.environ["OPENREWARD_SESSION_URL"] = "https://trl-internal-testing-openreward-echo-env.hf.space"

from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec(
    "https://trl-internal-testing-openreward-echo-env.hf.space",
    env_name="echoenvironment",
)
print(spec.train_dataset)  # 8 rows: target ∈ {"hello", "world", "trl", ...}
```

The two OPENREWARD_*_URL overrides are needed because the openreward SDK by default expects a two-subdomain platform layout (api.<host> + sessions.<host>); for a single-host self-hosted server both have to point at the same URL.
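For a single-host server, the two overrides can be set together so they never drift apart. A minimal sketch using only the standard library: `use_single_host` is a hypothetical helper, and stripping a trailing slash is a choice made for this example rather than something the SDK is known to require.

```python
import os


def use_single_host(base_url: str) -> None:
    """Point both OpenReward endpoints at one self-hosted server.

    Hypothetical convenience wrapper; only the two environment variable
    names come from this README.
    """
    url = base_url.rstrip("/")  # assumption: normalise away a trailing slash
    os.environ["OPENREWARD_API_URL"] = url
    os.environ["OPENREWARD_SESSION_URL"] = url


use_single_host("https://trl-internal-testing-openreward-echo-env.hf.space/")
print(os.environ["OPENREWARD_API_URL"] == os.environ["OPENREWARD_SESSION_URL"])  # True
```

The usage example in this README sets the variables before importing from trl.experimental.openreward, so it is safest to call such a helper at the same point.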

Tasks

| split | count | shape |
|---|---|---|
| `train` | 8 | `{"id": "echo-N", "target": "<word>"}` |
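A row of this shape can be checked with a small validator. This is a sketch under two assumptions that the table does not state: that N in `echo-N` is a decimal integer, and that `target` is a non-empty string. `is_echo_task` is a hypothetical name.

```python
import re

# Assumption: the N in "echo-N" is one or more decimal digits.
ID_PATTERN = re.compile(r"echo-\d+")


def is_echo_task(row: dict) -> bool:
    """Check that a row has exactly the keys and value types shown in the table."""
    return (
        set(row) == {"id", "target"}
        and isinstance(row["id"], str)
        and ID_PATTERN.fullmatch(row["id"]) is not None
        and isinstance(row["target"], str)
        and row["target"] != ""  # assumption: targets are non-empty
    )


print(is_echo_task({"id": "echo-0", "target": "hello"}))  # True
print(is_echo_task({"id": "task-0", "target": "hello"}))  # False
```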

Local development

```shell
pip install -r requirements.txt
python server.py    # listens on PORT (default 8080)
```
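The "PORT (default 8080)" behaviour is the usual environment-variable fallback. How server.py actually reads the variable is not shown here, so the following is only a sketch of that pattern with a hypothetical `resolve_port` helper.

```python
import os


def resolve_port(default: int = 8080) -> int:
    """Return the port from the PORT environment variable, or the default if unset."""
    return int(os.environ.get("PORT", default))


os.environ.pop("PORT", None)  # simulate a shell where PORT is not set
print(resolve_port())         # 8080

os.environ["PORT"] = "7860"   # e.g. to match app_port from the Space metadata
print(resolve_port())         # 7860
```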

Files

server.py — env definition + Server([EchoEnvironment]).run()
Dockerfile — python:3.11-slim + the deps; HF Spaces serves it on port 7860
requirements.txt — openreward, fastapi, uvicorn