---
title: REPL Environment Server
emoji: 🎮
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-0.2.2
  - openenv
---
## Hugging Face Space Deployment

This Space is built from the OpenEnv environment `repl_env`.

- Space URL: `https://huggingface.co/spaces/openenv/repl_env-0.2.2`
- OpenEnv pinned ref: `0.2.2`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.repl_env import Env

env = Env(base_url="https://huggingface.co/spaces/openenv/repl_env-0.2.2")
```
# REPL Environment for OpenEnv

`repl_env` is an OpenEnv-native Python REPL environment for Recursive Language Model (RLM) style execution. It now follows the current OpenEnv client/server conventions:

- `REPLEnv` is the remote async `EnvClient`
- `.sync()` is the sync wrapper for remote usage
- `LocalREPLEnv` is the explicit in-process helper
- `LocalRLMRunner` is the higher-level orchestration loop for local recursive RLM runs

The architecture is intentionally split the same way the official `rlm` and DSPy implementations split it:

- the environment executes code and exposes tools
- the runner owns the iterative prompting loop
- recursive behavior lives in backend/controller modules, not in the executor
## Overview

Inside the REPL, the model can:

- inspect `context`
- execute Python code across multiple turns with persistent state
- call `llm_query(...)` and `llm_query_batched(...)`
- call `rlm_query(...)` and `rlm_query_batched(...)` for recursive child runs when configured
- finish with `FINAL(...)`, `FINAL_VAR(...)`, or `answer = {"content": ..., "ready": True}`
## Current Architecture

Main modules:

- [`client.py`](client.py): remote async OpenEnv client
- [`local.py`](local.py): explicit in-process local env helper
- [`runner.py`](runner.py): local RLM orchestration loop
- [`recursive_backends.py`](recursive_backends.py): direct and recursive backend implementations
- [`recursive_controller.py`](recursive_controller.py): server-side backend/broker composition
- [`rubrics.py`](rubrics.py): reward rubrics (OpenEnv RFC 004)
- [`server/repl_environment.py`](server/repl_environment.py): server-side execution environment
- [`server/app.py`](server/app.py): OpenEnv HTTP server app and env factory
## What Works Today

- Standard remote OpenEnv usage through `REPLEnv`
- Local in-process execution through `LocalREPLEnv`
- Local recursive RLM runs through `LocalRLMRunner`
- Server-backed recursive calls through the current controller/broker path
- Explicit recursion controls (see the sketch after this list):
  - `max_depth`
  - `max_children_total`
  - `max_children_per_batch`
  - `per_child_timeout_s`
  - `result_truncation_limit`
- Lightweight child trace metadata on local runner results
- Rubric-based rewards (OpenEnv RFC 004):
  - `ExactMatchRubric`: binary outcome reward against ground truth
  - `FuzzyMatchRubric`: partial credit for containment matches
  - `CustomMetricRubric`: user-provided `metric(expected, predicted) -> float`
  - `CodeExecutionRubric`: per-step process reward for code errors
  - `REPLRubric`: composite rubric combining outcome + process
- Ground truth injectable at reset via `expected_answer`
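
For illustration, here is a minimal sketch of pinning the recursion controls on a local run. It assumes all five knobs listed above are fields on `BackendLimits`; only `max_depth` actually appears in this README's examples, so verify the other field names against [`recursive_backends.py`](recursive_backends.py).

```python
from repl_env.recursive_backends import BackendLimits

# Sketch only: assumes every recursion control listed above is a BackendLimits
# field; only max_depth is confirmed by the examples in this README.
limits = BackendLimits(
    max_depth=2,                   # no child runs beyond depth 2
    max_children_total=16,         # hard cap on child runs per episode
    max_children_per_batch=4,      # cap per rlm_query_batched call
    per_child_timeout_s=120,       # wall-clock budget per child run
    result_truncation_limit=4000,  # characters kept from each child result
)
# Pass via `limits=` on LocalChildRLMBackend, as in the backend_factory example below.
```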
## Rewards

Rewards follow the OpenEnv Rubric system (RFC 004). The environment uses `REPLRubric` by default, which combines:

- **Outcome reward** (on terminal steps): compares `final_answer` against `expected_answer` if provided. Returns 1.0 for a match, 0.0 otherwise.
- **Process reward** (on non-terminal steps): returns -0.05 for code execution errors, 0.0 for successful steps.
- **Failure reward**: returns -0.1 when max iterations are exhausted without an answer.

For RL training (GRPO, etc.), pass `expected_answer` at reset time:
```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    env.reset(
        context="...",
        task_prompt="...",
        expected_answer="42",  # ground truth for rubric scoring
    )
    result = env.execute("print(FINAL(42))")
    print(result.reward)  # 1.0 (correct)
```
Custom rubrics can be injected at construction:

```python
from repl_env import LocalREPLEnv, CustomMetricRubric, REPLRubric

def my_metric(expected, predicted):
    return 1.0 if expected.strip() == predicted.strip() else 0.0

env = LocalREPLEnv(rubric=REPLRubric(outcome=CustomMetricRubric(my_metric)))
```
## Quick Start

### Remote Server Usage

Async:

```python
import asyncio

from repl_env import REPLEnv

async def main():
    async with REPLEnv(base_url="http://127.0.0.1:8000") as env:
        result = await env.reset(
            context="alpha beta gamma",
            task_prompt="Count the words",
        )
        result = await env.execute("count = len(context.split())")
        result = await env.execute("print(FINAL(count))")
        print(result.done)

asyncio.run(main())
```
Sync:

```python
from repl_env import REPLEnv

with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    result = env.reset(
        context="alpha beta gamma",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(result.observation.result.stdout)
```
### Local Environment Usage

```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    result = env.reset(
        context="The quick brown fox jumps over the lazy dog",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(env.state().final_answer)
```
### Local Recursive RLM Usage

`LocalRLMRunner` takes any `chat_fn(messages, model=None) -> str`. It works with the HF Inference API, vLLM, SGLang, Ollama, or any OpenAI-compatible server.

With the HF Inference API:

```python
from huggingface_hub import InferenceClient

from repl_env import LocalRLMRunner, RLM_SYSTEM_PROMPT

client = InferenceClient(model="Qwen/Qwen3.5-9B", timeout=300)

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run("The answer is 42", "What number is mentioned?")
print(result.final_answer)
```
With a local vLLM server:

```python
from openai import OpenAI

from repl_env import LocalRLMRunner

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run(context, task)
```
### Using Different Models for Outer and Inner Loops

The outer loop (code generation) can use a large model while inner `llm_query`/`rlm_query` calls use a smaller, faster model. Pass a custom `backend_factory` to the runner:

```python
from openai import OpenAI
from huggingface_hub import InferenceClient

from repl_env import LocalRLMRunner
from repl_env.recursive_backends import BackendLimits, LocalChildRLMBackend

# Outer loop: large local model via vLLM
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def outer_chat(messages, model=None):
    r = vllm.chat.completions.create(
        model="Qwen/Qwen3-32B", messages=messages, max_tokens=2048,
    )
    return r.choices[0].message.content

# Inner calls (llm_query/rlm_query): smaller HF-hosted model
hf = InferenceClient(model="Qwen/Qwen3.5-9B")

def inner_chat(messages, model=None):
    r = hf.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B", messages=messages, max_tokens=2048,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return r.choices[0].message.content

def my_backend_factory(llm_chat_fn, **kwargs):
    return LocalChildRLMBackend(
        inner_chat,  # inner calls use the smaller model
        runner_factory=LocalRLMRunner,
        system_prompt=kwargs["system_prompt"],
        max_iterations=kwargs["max_iterations"],
        env_max_iterations_multiplier=kwargs["env_max_iterations_multiplier"],
        depth=kwargs["depth"],
        limits=BackendLimits(max_depth=2),
    )

runner = LocalRLMRunner(
    outer_chat,                          # outer loop: large model
    backend_factory=my_backend_factory,  # inner calls: small model
    max_iterations=30,
    max_depth=2,
)
result = runner.run(context, task)
```
## Server

Run the local server:

```bash
PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```

The server uses a proper OpenEnv environment factory in [`server/app.py`](server/app.py).
## API Surface

### Remote Client

```python
class REPLEnv(EnvClient[REPLAction, REPLObservation, REPLState]):
    async def reset(...)
    async def execute(code: str)
    async def submit_final_answer(answer: str)
    async def state()
```

Use `.sync()` for synchronous code.
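
As an illustration, here is a minimal sketch of ending an episode with `submit_final_answer` instead of `FINAL(...)`, assuming the `.sync()` wrapper exposes the same methods synchronously; the context and answer values are placeholders.

```python
from repl_env import REPLEnv

# Sketch: close the episode directly with submit_final_answer.
with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    env.reset(context="alpha beta gamma", task_prompt="Count the words")
    result = env.submit_final_answer("3")  # terminal step
    print(result.done)  # expected to be True once the answer is accepted
```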
### Local Helpers

```python
class LocalREPLEnv:
    def reset(...)
    def execute(code: str)
    def state()
```

```python
class LocalRLMRunner:
    def run(context: str, task_prompt: str, *, model: str | None = None) -> RLMRunResult
```
### Actions and Observations

`REPLAction`

```python
code: str = ""
is_final: bool = False
final_answer: str | None = None
```

`REPLObservation`

```python
result: CodeBlockResult
context_preview: str | None
context_length: int
available_variables: list[str]
iteration: int
max_iterations: int
done: bool
reward: float | None
metadata: dict
```
## Injected REPL Helpers

When configured, the REPL namespace exposes:

- `llm_query(prompt, model=None)`
- `llm_query_batched(prompts, model=None)`
- `rlm_query(prompt, model=None)`
- `rlm_query_batched(prompts, model=None)`
- `FINAL(value)`
- `FINAL_VAR(name)`
- `SHOW_VARS()`

Notes:

- `rlm_query` is the recursive child-run surface.
- At max recursion depth, recursion falls back to direct LM calls rather than spawning more children.
- Lifecycle callbacks follow the official `rlm` pattern:
  - `on_subcall_start(depth, model, prompt_preview)`
  - `on_subcall_complete(depth, model, duration, error_or_none)`
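
To show how these helpers compose, here is a hedged sketch of code the model might execute inside the REPL on a configured environment; the chunk size, prompts, and variable names are made up for the example.

```python
# Code executed inside the REPL namespace, not in your host script:
# `context`, `llm_query`, `rlm_query_batched`, and `FINAL` are injected helpers.
chunks = [context[i:i + 2000] for i in range(0, len(context), 2000)]

# Fan out recursive child runs over the chunks, then merge with a direct LM call.
partials = rlm_query_batched([f"Summarize this chunk:\n{c}" for c in chunks])
summary = llm_query("Combine these partial summaries into one answer:\n" + "\n".join(partials))

print(FINAL(summary))
```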
## Finalization Patterns

### `FINAL(...)`

```python
result = env.execute("answer = 42")
result = env.execute("print(FINAL(answer))")
```

### `FINAL_VAR(...)`

```python
result = env.execute("my_answer = '42'")
result = env.execute('print(FINAL_VAR("my_answer"))')
```

### `answer` dict

```python
result = env.execute("answer['content'] = '42'")
result = env.execute("answer['ready'] = True")
```
## Prompt Utilities

[`prompts.py`](prompts.py) contains the current message-building and parsing helpers used by the examples and runner.

Important exports:

- `RLM_SYSTEM_PROMPT`
- `RLM_SYSTEM_PROMPT_QWEN`
- `QueryMetadata`
- `build_rlm_system_prompt(...)`
- `build_user_prompt(...)`
- `extract_code_blocks(...)`
- `format_observations(...)`

These prompts were updated to reflect the actual helper surface the environment provides, rather than documenting tools that do not exist.
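
For orientation, a sketch of where these helpers would slot into a hand-rolled loop; the import path, argument names, and return types are assumptions to verify against [`prompts.py`](prompts.py).

```python
# Sketch only: signatures below are assumptions, not the documented API.
from repl_env.prompts import RLM_SYSTEM_PROMPT, build_user_prompt, extract_code_blocks

messages = [
    {"role": "system", "content": RLM_SYSTEM_PROMPT},
    {"role": "user", "content": build_user_prompt(context="...", task_prompt="Count the words")},
]
reply = chat_fn(messages)                 # any chat_fn(messages, model=None) -> str
code_blocks = extract_code_blocks(reply)  # assumed to return the fenced code strings
for code in code_blocks:
    result = env.execute(code)            # env: a REPLEnv/LocalREPLEnv from Quick Start
```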
## Examples

- [`examples/repl_with_llm.py`](../../examples/repl_with_llm.py)
- [`examples/repl_oolong_simple.py`](../../examples/repl_oolong_simple.py)

Default hosted model in the examples is currently `Qwen/Qwen3.5-9B`, but real hosted inference still depends on provider availability and token access.
## Environment Variables

Server-side configuration in [`server/app.py`](server/app.py):

- `LLM_MODEL`
- `HF_TOKEN`
- `REPL_MAX_ITERATIONS`
- `REPL_MAX_OUTPUT_LENGTH`
- `REPL_CONTEXT_PREVIEW_LENGTH`
- `REPL_RLM_MAX_DEPTH`
- `REPL_RLM_MAX_ITERATIONS`
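
As a usage example, these variables can be set inline when launching the server command from the Server section above; the values here are placeholders.

```bash
# Placeholder values; adjust model, token, and limits to your setup.
LLM_MODEL="Qwen/Qwen3.5-9B" \
HF_TOKEN="hf_xxx" \
REPL_MAX_ITERATIONS=30 \
REPL_RLM_MAX_DEPTH=2 \
PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```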
## References

- [RLM Paper (arXiv:2512.24601)](https://huggingface.co/papers/2512.24601)
- [RLM Implementation](https://github.com/alexzhang13/rlm)
- [Alex Zhang's RLM Blog](https://alexzhang13.github.io/blog/2025/rlm/)
- [Prime Intellect RLM Blog](https://www.primeintellect.ai/blog/rlm)