---
title: REPL Environment Server
emoji: 🎮
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-0.2.2
  - openenv
---
## Hugging Face Space Deployment

This Space is built from the OpenEnv environment `repl_env`.

- Space URL: `https://huggingface.co/spaces/openenv/repl_env-0.2.2`
- OpenEnv pinned ref: `0.2.2`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.repl_env import Env

env = Env(base_url="https://huggingface.co/spaces/openenv/repl_env-0.2.2")
```
# REPL Environment for OpenEnv

`repl_env` is an OpenEnv-native Python REPL environment for Recursive Language Model (RLM) style execution. It now follows the current OpenEnv client/server conventions:

- `REPLEnv` is the remote async `EnvClient`
- `.sync()` is the sync wrapper for remote usage
- `LocalREPLEnv` is the explicit in-process helper
- `LocalRLMRunner` is the higher-level orchestration loop for local recursive RLM runs

The architecture is intentionally split the same way the official `rlm` and DSPy implementations split it:

- the environment executes code and exposes tools
- the runner owns the iterative prompting loop
- recursive behavior lives in backend/controller modules, not in the executor
## Overview

Inside the REPL, the model can:

- inspect `context`
- execute Python code across multiple turns with persistent state
- call `llm_query(...)` and `llm_query_batched(...)`
- call `rlm_query(...)` and `rlm_query_batched(...)` for recursive child runs when configured
- finish with `FINAL(...)`, `FINAL_VAR(...)`, or `answer = {"content": ..., "ready": True}`
## Current Architecture

Main modules:

- [`client.py`](client.py): remote async OpenEnv client
- [`local.py`](local.py): explicit in-process local env helper
- [`runner.py`](runner.py): local RLM orchestration loop
- [`recursive_backends.py`](recursive_backends.py): direct and recursive backend implementations
- [`recursive_controller.py`](recursive_controller.py): server-side backend/broker composition
- [`rubrics.py`](rubrics.py): reward rubrics (OpenEnv RFC 004)
- [`server/repl_environment.py`](server/repl_environment.py): server-side execution environment
- [`server/app.py`](server/app.py): OpenEnv HTTP server app and env factory
## What Works Today

- Standard remote OpenEnv usage through `REPLEnv`
- Local in-process execution through `LocalREPLEnv`
- Local recursive RLM runs through `LocalRLMRunner`
- Server-backed recursive calls through the current controller/broker path
- Explicit recursion controls (see the sketch after this list):
  - `max_depth`
  - `max_children_total`
  - `max_children_per_batch`
  - `per_child_timeout_s`
  - `result_truncation_limit`
- Lightweight child trace metadata on local runner results
- Rubric-based rewards (OpenEnv RFC 004):
  - `ExactMatchRubric`: binary outcome reward against ground truth
  - `FuzzyMatchRubric`: partial credit for containment matches
  - `CustomMetricRubric`: user-provided `metric(expected, predicted) -> float`
  - `CodeExecutionRubric`: per-step process reward for code errors
  - `REPLRubric`: composite rubric combining outcome + process
- Ground truth injectable at reset via `expected_answer`
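
For illustration, here is a minimal sketch of pinning the recursion controls on a local run. It assumes all five knobs listed above are fields on `BackendLimits`; only `max_depth` actually appears in this README's examples, so verify the other field names against [`recursive_backends.py`](recursive_backends.py).

```python
from repl_env.recursive_backends import BackendLimits

# Sketch only: assumes every recursion control listed above is a BackendLimits
# field; only max_depth is confirmed by the examples in this README.
limits = BackendLimits(
    max_depth=2,                   # no child runs beyond depth 2
    max_children_total=16,         # hard cap on child runs per episode
    max_children_per_batch=4,      # cap per rlm_query_batched call
    per_child_timeout_s=120,       # wall-clock budget per child run
    result_truncation_limit=4000,  # characters kept from each child result
)
# Pass via `limits=` on LocalChildRLMBackend, as in the backend_factory example below.
```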
## Rewards

Rewards follow the OpenEnv Rubric system (RFC 004). The environment uses `REPLRubric` by default, which combines:

- **Outcome reward** (on terminal steps): compares `final_answer` against `expected_answer` if provided. Returns 1.0 for a match, 0.0 otherwise.
- **Process reward** (on non-terminal steps): returns -0.05 for code execution errors, 0.0 for successful steps.
- **Failure reward**: returns -0.1 when max iterations are exhausted without an answer.

For RL training (GRPO, etc.), pass `expected_answer` at reset time:
```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    env.reset(
        context="...",
        task_prompt="...",
        expected_answer="42",  # ground truth for rubric scoring
    )
    result = env.execute("print(FINAL(42))")
    print(result.reward)  # 1.0 (correct)
```
Custom rubrics can be injected at construction:

```python
from repl_env import LocalREPLEnv, CustomMetricRubric, REPLRubric

def my_metric(expected, predicted):
    return 1.0 if expected.strip() == predicted.strip() else 0.0

env = LocalREPLEnv(rubric=REPLRubric(outcome=CustomMetricRubric(my_metric)))
```
## Quick Start

### Remote Server Usage

Async:

```python
import asyncio

from repl_env import REPLEnv

async def main():
    async with REPLEnv(base_url="http://127.0.0.1:8000") as env:
        result = await env.reset(
            context="alpha beta gamma",
            task_prompt="Count the words",
        )
        result = await env.execute("count = len(context.split())")
        result = await env.execute("print(FINAL(count))")
        print(result.done)

asyncio.run(main())
```
Sync:

```python
from repl_env import REPLEnv

with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    result = env.reset(
        context="alpha beta gamma",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(result.observation.result.stdout)
```
### Local Environment Usage

```python
from repl_env import LocalREPLEnv

with LocalREPLEnv() as env:
    result = env.reset(
        context="The quick brown fox jumps over the lazy dog",
        task_prompt="Count the words",
    )
    result = env.execute("count = len(context.split())")
    result = env.execute("print(FINAL(count))")
    print(env.state().final_answer)
```
### Local Recursive RLM Usage

`LocalRLMRunner` takes any `chat_fn(messages, model=None) -> str`. It works with the HF Inference API, vLLM, SGLang, Ollama, or any OpenAI-compatible server.

With the HF Inference API:

```python
from huggingface_hub import InferenceClient

from repl_env import LocalRLMRunner, RLM_SYSTEM_PROMPT

client = InferenceClient(model="Qwen/Qwen3.5-9B", timeout=300)

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run("The answer is 42", "What number is mentioned?")
print(result.final_answer)
```
With a local vLLM server:

```python
from openai import OpenAI

from repl_env import LocalRLMRunner

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def chat_fn(messages, model=None):
    response = client.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B",
        messages=messages,
        max_tokens=2048,
        temperature=0.6,
    )
    return response.choices[0].message.content

runner = LocalRLMRunner(chat_fn, max_iterations=30, max_depth=2)
result = runner.run(context, task)
```
### Using Different Models for Outer and Inner Loops

The outer loop (code generation) can use a large model while inner `llm_query`/`rlm_query` calls use a smaller, faster model. Pass a custom `backend_factory` to the runner:

```python
from openai import OpenAI
from huggingface_hub import InferenceClient

from repl_env import LocalRLMRunner
from repl_env.recursive_backends import BackendLimits, LocalChildRLMBackend

# Outer loop: large local model via vLLM
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def outer_chat(messages, model=None):
    r = vllm.chat.completions.create(
        model="Qwen/Qwen3-32B", messages=messages, max_tokens=2048,
    )
    return r.choices[0].message.content

# Inner calls (llm_query/rlm_query): smaller HF-hosted model
hf = InferenceClient(model="Qwen/Qwen3.5-9B")

def inner_chat(messages, model=None):
    r = hf.chat.completions.create(
        model=model or "Qwen/Qwen3.5-9B", messages=messages, max_tokens=2048,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    return r.choices[0].message.content

def my_backend_factory(llm_chat_fn, **kwargs):
    return LocalChildRLMBackend(
        inner_chat,  # inner calls use the smaller model
        runner_factory=LocalRLMRunner,
        system_prompt=kwargs["system_prompt"],
        max_iterations=kwargs["max_iterations"],
        env_max_iterations_multiplier=kwargs["env_max_iterations_multiplier"],
        depth=kwargs["depth"],
        limits=BackendLimits(max_depth=2),
    )

runner = LocalRLMRunner(
    outer_chat,                          # outer loop: large model
    backend_factory=my_backend_factory,  # inner calls: small model
    max_iterations=30,
    max_depth=2,
)
result = runner.run(context, task)
```
## Server

Run the local server:

```bash
PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```

The server uses a proper OpenEnv environment factory in [`server/app.py`](server/app.py).
## API Surface

### Remote Client

```python
class REPLEnv(EnvClient[REPLAction, REPLObservation, REPLState]):
    async def reset(...)
    async def execute(code: str)
    async def submit_final_answer(answer: str)
    async def state()
```

Use `.sync()` for synchronous code.
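
As an illustration, here is a minimal sketch of ending an episode with `submit_final_answer` instead of `FINAL(...)`, assuming the `.sync()` wrapper exposes the same methods synchronously; the context and answer values are placeholders.

```python
from repl_env import REPLEnv

# Sketch: close the episode directly with submit_final_answer.
with REPLEnv(base_url="http://127.0.0.1:8000").sync() as env:
    env.reset(context="alpha beta gamma", task_prompt="Count the words")
    result = env.submit_final_answer("3")  # terminal step
    print(result.done)  # expected to be True once the answer is accepted
```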
### Local Helpers

```python
class LocalREPLEnv:
    def reset(...)
    def execute(code: str)
    def state()
```

```python
class LocalRLMRunner:
    def run(context: str, task_prompt: str, *, model: str | None = None) -> RLMRunResult
```
### Actions and Observations

`REPLAction`

```python
code: str = ""
is_final: bool = False
final_answer: str | None = None
```

`REPLObservation`

```python
result: CodeBlockResult
context_preview: str | None
context_length: int
available_variables: list[str]
iteration: int
max_iterations: int
done: bool
reward: float | None
metadata: dict
```
## Injected REPL Helpers

When configured, the REPL namespace exposes:

- `llm_query(prompt, model=None)`
- `llm_query_batched(prompts, model=None)`
- `rlm_query(prompt, model=None)`
- `rlm_query_batched(prompts, model=None)`
- `FINAL(value)`
- `FINAL_VAR(name)`
- `SHOW_VARS()`

Notes:

- `rlm_query` is the recursive child-run surface.
- At max recursion depth, recursion falls back to direct LM calls rather than spawning more children.
- Lifecycle callbacks follow the official `rlm` pattern:
  - `on_subcall_start(depth, model, prompt_preview)`
  - `on_subcall_complete(depth, model, duration, error_or_none)`
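
To show how these helpers compose, here is a hedged sketch of code the model might execute inside the REPL on a configured environment; the chunk size, prompts, and variable names are made up for the example.

```python
# Code executed inside the REPL namespace, not in your host script:
# `context`, `llm_query`, `rlm_query_batched`, and `FINAL` are injected helpers.
chunks = [context[i:i + 2000] for i in range(0, len(context), 2000)]

# Fan out recursive child runs over the chunks, then merge with a direct LM call.
partials = rlm_query_batched([f"Summarize this chunk:\n{c}" for c in chunks])
summary = llm_query("Combine these partial summaries into one answer:\n" + "\n".join(partials))

print(FINAL(summary))
```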
## Finalization Patterns

### `FINAL(...)`

```python
result = env.execute("answer = 42")
result = env.execute("print(FINAL(answer))")
```

### `FINAL_VAR(...)`

```python
result = env.execute("my_answer = '42'")
result = env.execute('print(FINAL_VAR("my_answer"))')
```

### `answer` dict

```python
result = env.execute("answer['content'] = '42'")
result = env.execute("answer['ready'] = True")
```
## Prompt Utilities

[`prompts.py`](prompts.py) contains the current message-building and parsing helpers used by the examples and runner.

Important exports:

- `RLM_SYSTEM_PROMPT`
- `RLM_SYSTEM_PROMPT_QWEN`
- `QueryMetadata`
- `build_rlm_system_prompt(...)`
- `build_user_prompt(...)`
- `extract_code_blocks(...)`
- `format_observations(...)`

These prompts were updated to reflect the actual helper surface the environment provides, rather than documenting tools that do not exist.
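
For orientation, a sketch of where these helpers would slot into a hand-rolled loop; the import path, argument names, and return types are assumptions to verify against [`prompts.py`](prompts.py).

```python
# Sketch only: signatures below are assumptions, not the documented API.
from repl_env.prompts import RLM_SYSTEM_PROMPT, build_user_prompt, extract_code_blocks

messages = [
    {"role": "system", "content": RLM_SYSTEM_PROMPT},
    {"role": "user", "content": build_user_prompt(context="...", task_prompt="Count the words")},
]
reply = chat_fn(messages)                 # any chat_fn(messages, model=None) -> str
code_blocks = extract_code_blocks(reply)  # assumed to return the fenced code strings
for code in code_blocks:
    result = env.execute(code)            # env: a REPLEnv/LocalREPLEnv from Quick Start
```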
## Examples

- [`examples/repl_with_llm.py`](../../examples/repl_with_llm.py)
- [`examples/repl_oolong_simple.py`](../../examples/repl_oolong_simple.py)

Default hosted model in the examples is currently `Qwen/Qwen3.5-9B`, but real hosted inference still depends on provider availability and token access.
## Environment Variables

Server-side configuration in [`server/app.py`](server/app.py):

- `LLM_MODEL`
- `HF_TOKEN`
- `REPL_MAX_ITERATIONS`
- `REPL_MAX_OUTPUT_LENGTH`
- `REPL_CONTEXT_PREVIEW_LENGTH`
- `REPL_RLM_MAX_DEPTH`
- `REPL_RLM_MAX_ITERATIONS`
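
As a usage example, these variables can be set inline when launching the server command from the Server section above; the values here are placeholders.

```bash
# Placeholder values; adjust model, token, and limits to your setup.
LLM_MODEL="Qwen/Qwen3.5-9B" \
HF_TOKEN="hf_xxx" \
REPL_MAX_ITERATIONS=30 \
REPL_RLM_MAX_DEPTH=2 \
PYTHONPATH=src:envs uvicorn envs.repl_env.server.app:app --host 127.0.0.1 --port 8000
```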
## References

- [RLM Paper (arXiv:2512.24601)](https://huggingface.co/papers/2512.24601)
- [RLM Implementation](https://github.com/alexzhang13/rlm)
- [Alex Zhang's RLM Blog](https://alexzhang13.github.io/blog/2025/rlm/)
- [Prime Intellect RLM Blog](https://www.primeintellect.ai/blog/rlm)