sergiopaniego HF Staff commited on
Commit
07fffa0
·
verified ·
1 Parent(s): 9d67534

Upload folder using huggingface_hub

Browse files
Files changed (13) hide show
  1. Dockerfile +81 -0
  2. README.md +442 -4
  3. __init__.py +78 -0
  4. client.py +464 -0
  5. models.py +109 -0
  6. openenv.yaml +6 -0
  7. prompts.py +373 -0
  8. pyproject.toml +43 -0
  9. server/__init__.py +19 -0
  10. server/app.py +87 -0
  11. server/python_executor.py +327 -0
  12. server/repl_environment.py +512 -0
  13. uv.lock +0 -0
Dockerfile ADDED
@@ -0,0 +1,81 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# Multi-stage build using openenv-base
# This Dockerfile is flexible and works for both:
# - In-repo environments (with local src/core)
# - Standalone environments (with openenv from pip)
# The build script (openenv build) handles context detection and sets appropriate build args.

ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
FROM ${BASE_IMAGE} AS builder

WORKDIR /app

# Build argument to control whether we're building standalone or in-repo
ARG BUILD_MODE=in-repo
ARG ENV_NAME=repl_env

# Copy environment code (always at root of build context)
COPY . /app/env

# For in-repo builds, openenv-core is already in the pyproject.toml dependencies
# For standalone builds, openenv-core will be installed from pip via pyproject.toml
WORKDIR /app/env

# Ensure uv is available (for local builds where base image lacks it)
RUN if ! command -v uv >/dev/null 2>&1; then \
        curl -LsSf https://astral.sh/uv/install.sh | sh && \
        mv /root/.local/bin/uv /usr/local/bin/uv && \
        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
    fi

# Install git for building from git repos (build-time only)
RUN apt-get update && apt-get install -y --no-install-recommends \
        git \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies using uv sync
# If uv.lock exists, use it (--frozen = fail if lock is out of date);
# otherwise resolve on the fly.
# First pass: --no-install-project installs only third-party dependencies,
# so this layer is reusable across builds that only change project code.
RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-install-project --no-editable; \
    else \
        uv sync --no-install-project --no-editable; \
    fi

# Second pass: same sync without --no-install-project, which installs the
# project itself on top of the already-populated dependency layer.
RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-editable; \
    else \
        uv sync --no-editable; \
    fi

# Final runtime stage
FROM ${BASE_IMAGE}

WORKDIR /app

# Copy the virtual environment from builder
COPY --from=builder /app/env/.venv /app/.venv

# Copy the environment code
COPY --from=builder /app/env /app/env

# Set PATH to use the virtual environment
ENV PATH="/app/.venv/bin:$PATH"

# Set PYTHONPATH so imports work correctly
ENV PYTHONPATH="/app/env:$PYTHONPATH"

# Health check using Python (more portable than curl/wget)
# NOTE(review): assumes the server exposes /health on port 8000 — confirm in server/app.py
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run the FastAPI server
# The module path is constructed to work with the /app/env structure
ENV ENABLE_WEB_INTERFACE=true
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,448 @@
1
  ---
2
- title: Repl
3
- emoji: 👀
4
- colorFrom: gray
5
  colorTo: indigo
6
  sdk: docker
7
  pinned: false
 
 
 
 
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: REPL Environment Server
3
+ emoji: 🎮
4
+ colorFrom: yellow
5
  colorTo: indigo
6
  sdk: docker
7
  pinned: false
8
+ app_port: 8000
9
+ base_path: /web
10
+ tags:
11
+ - openenv
12
  ---
13
 
14
+ # REPL Environment for OpenEnv
15
+
16
+ A Python REPL environment for training language models on code execution tasks, based on the [Recursive Language Models (RLM)](https://arxiv.org/abs/2512.24601) paradigm.
17
+
18
+ ## Overview
19
+
20
+ The RLM paradigm allows language models to:
21
+ - Execute Python code in a sandboxed REPL environment
22
+ - Make recursive calls to themselves or other LMs via `llm_query()` / `llm_query_batched()`
23
+ - Handle near-infinite context by programmatically decomposing and exploring data
24
+ - Terminate with explicit `FINAL(answer)` or `answer = {"content": ..., "ready": True}` signals
25
+
26
+ ## Features
27
+
28
+ - **Unified API**: Same `REPLEnv` class works for both local and remote execution
29
+ - **Sandboxed Python Execution**: Safe code execution with restricted builtins
30
+ - **Context Loading**: Load large contexts that agents can explore programmatically
31
+ - **Multiple Finalization Patterns**:
32
+ - Direct call: `FINAL(answer)` - helper function injected into namespace
33
+ - Print pattern: `print('FINAL(answer)')` or `print('FINAL_VAR(var_name)')`
34
+ - Prime Intellect style: `answer = {"content": "...", "ready": True}`
35
+ - **Iteration Limits**: Configurable maximum steps per episode
36
+ - **Reward Signals**: Customizable reward functions for RL training
37
+ - **Optional LLM Oracle**: Can enable `llm_query()` and `llm_query_batched()` for recursive calls
38
+
39
+ ## Quick Start
40
+
41
+ ### Local Mode (No Server Required)
42
+
43
+ ```python
44
+ from repl_env import REPLEnv
45
+
46
+ # Create environment - runs locally by default
47
+ with REPLEnv() as env:
48
+ result = env.reset(
49
+ context="This is a large document with lots of text...",
50
+ task_prompt="Find the word count"
51
+ )
52
+
53
+ # Execute code iteratively
54
+ result = env.execute("words = context.split()")
55
+ result = env.execute("count = len(words)")
56
+ result = env.execute("print(f'FINAL({count})')")
57
+
58
+ print(f"Done: {result.done}")
59
+ print(f"Final Answer: {env.state().final_answer}")
60
+ ```
61
+
62
+ ### Remote Server Mode
63
+
64
+ ```python
65
+ from repl_env import REPLEnv
66
+
67
+ # Connect to a running server - same API!
68
+ with REPLEnv(base_url="https://my-server.hf.space") as env:
69
+ result = env.reset(context="...", task_prompt="...")
70
+ result = env.execute("count = len(context)")
71
+ result = env.execute("print(f'FINAL({count})')")
72
+ ```
73
+
74
+ ### Local Mode with LLM Support
75
+
76
+ ```python
77
+ from repl_env import REPLEnv
78
+
79
+ def my_llm_query(prompt: str) -> str:
80
+ return your_llm.generate(prompt)
81
+
82
+ def my_llm_query_batched(prompts: list[str]) -> list[str]:
83
+ return [my_llm_query(p) for p in prompts]
84
+
85
+ # Pass LLM functions for recursive calls
86
+ with REPLEnv(llm_query_fn=my_llm_query, llm_batch_fn=my_llm_query_batched) as env:
87
+ result = env.reset(context=large_document, task_prompt="Summarize this")
88
+
89
+ # Now the executed code can use llm_query() and llm_query_batched()!
90
+ result = env.execute("summary = llm_query('Summarize: ' + context[:1000])")
91
+ ```
92
+
93
+ ### From Docker or HuggingFace Hub
94
+
95
+ ```python
96
+ from repl_env import REPLEnv
97
+
98
+ # Start from Docker image
99
+ env = REPLEnv.from_docker_image("repl-env:latest")
100
+
101
+ # Or from HuggingFace Hub
102
+ env = REPLEnv.from_hub("openenv/repl-env")
103
+ ```
104
+
105
+ ## API Reference
106
+
107
+ ### REPLEnv
108
+
109
+ ```python
110
+ class REPLEnv:
111
+ def __init__(
112
+ self,
113
+ base_url: str | None = None, # Server URL (None = local mode)
114
+ *,
115
+ # Local-only options
116
+ llm_query_fn: Callable | None = None, # Function for llm_query()
117
+ llm_batch_fn: Callable | None = None, # Function for llm_query_batched()
118
+ max_output_length: int = 8192, # Max stdout/stderr chars
119
+ context_preview_length: int = 500, # Chars in context preview
120
+ reward_on_success: float = 1.0, # Reward on FINAL()
121
+ reward_on_iteration: float = 0.0, # Reward per step
122
+ reward_on_failure: float = -0.1, # Reward on max iterations
123
+ reward_on_error: float = -0.05, # Reward on execution error
124
+ # Remote-only options
125
+ connect_timeout_s: float = 10.0,
126
+ message_timeout_s: float = 60.0,
127
+ ): ...
128
+
129
+ def reset(
130
+ self,
131
+ *,
132
+ context: str = "", # Text to analyze (as `context` variable)
133
+ task_prompt: str = "", # Task description
134
+ max_iterations: int = 30, # Max code execution steps
135
+ seed: int | None = None, # Random seed
136
+ episode_id: str | None = None, # Custom episode ID
137
+ hf_token: str | None = None, # HF token for llm_query (remote mode)
138
+ llm_model: str | None = None, # Model for llm_query (remote mode)
139
+ ) -> StepResult[REPLObservation]: ...
140
+
141
+ def execute(self, code: str) -> StepResult[REPLObservation]: ...
142
+ def step(self, action: REPLAction) -> StepResult[REPLObservation]: ...
143
+ def submit_final_answer(self, answer: str) -> StepResult[REPLObservation]: ...
144
+ def state(self) -> REPLState: ...
145
+ def close(self) -> None: ...
146
+ ```
147
+
148
+ ### Action Space
149
+
150
+ ```python
151
+ class REPLAction:
152
+ code: str = "" # Python code to execute
153
+ is_final: bool = False # Whether this signals the final answer
154
+ final_answer: str | None = None # The final answer (if is_final=True)
155
+ ```
156
+
157
+ ### Observation Space
158
+
159
+ ```python
160
+ class REPLObservation:
161
+ result: CodeBlockResult # Execution result (stdout, stderr, etc.)
162
+ context_preview: str | None # First 500 chars of context
163
+ context_length: int # Total context length
164
+ available_variables: list # Variables in namespace
165
+ iteration: int # Current iteration
166
+ max_iterations: int # Max iterations
167
+ done: bool # Episode complete?
168
+ reward: float # Step reward
169
+ metadata: dict # Additional info (final_answer, etc.)
170
+ ```
171
+
172
+ ## Finalization Patterns
173
+
174
+ ### Pattern 1: Direct FINAL() call (recommended)
175
+ ```python
176
+ result = env.execute("answer = 42")
177
+ result = env.execute("FINAL(answer)")
178
+ # -> done=True, final_answer="42"
179
+ ```
180
+
181
+ ### Pattern 2: FINAL() via print
182
+ ```python
183
+ result = env.execute("answer = 42")
184
+ result = env.execute("print(f'FINAL({answer})')")
185
+ # -> done=True, final_answer="42"
186
+ ```
187
+
188
+ ### Pattern 3: FINAL_VAR() for variable reference
189
+ ```python
190
+ result = env.execute("my_result = 'The answer is 42'")
191
+ # Direct call (recommended) - pass variable name as string
192
+ # FINAL_VAR looks up the variable and returns FINAL(value)
193
+ result = env.execute('FINAL_VAR("my_result")')
194
+ # -> done=True, final_answer="The answer is 42"
195
+
196
+ # Also works via print (for regex detection)
197
+ result = env.execute("print('FINAL_VAR(my_result)')")
198
+ # -> done=True, final_answer="The answer is 42"
199
+ ```
200
+
201
+ ### Pattern 4: Prime Intellect style answer dict
202
+ ```python
203
+ result = env.execute("answer['content'] = '42'")
204
+ result = env.execute("answer['ready'] = True")
205
+ # -> done=True, final_answer="42"
206
+ ```
207
+
208
+ ## Prompts Module
209
+
210
+ The `prompts` module provides RLM-style prompts and parsing utilities:
211
+
212
+ ```python
213
+ from repl_env.prompts import (
214
+ # System prompts (from official RLM repo)
215
+ RLM_SYSTEM_PROMPT, # Base prompt with llm_query_batched
216
+ RLM_SYSTEM_PROMPT_QWEN, # For Qwen models (adds cost warning)
217
+
218
+ # Prompt building
219
+ QueryMetadata, # Context metadata dataclass
220
+ build_rlm_system_prompt, # Build system messages with metadata
221
+ build_user_prompt, # Build user prompt for each iteration
222
+ build_initial_prompt, # Convenience wrapper for iteration 0
223
+
224
+ # Parsing utilities
225
+ extract_code_blocks, # Extract code from ```repl``` or ```python``` blocks
226
+ format_observation, # Format execution result for LLM
227
+ )
228
+
229
+ # Example: Build messages using official RLM style
230
+ query_metadata = QueryMetadata(
231
+ context_lengths=[len(context)],
232
+ context_total_length=len(context),
233
+ context_type="str",
234
+ )
235
+ messages = build_rlm_system_prompt(RLM_SYSTEM_PROMPT_QWEN, query_metadata)
236
+ messages.append(build_user_prompt(root_prompt="Count words in the context", iteration=0))
237
+
238
+ # Extract code from LLM response (supports ```repl``` and ```python```)
239
+ response = "Here's my solution:\n```repl\ncount = len(context.split())\nFINAL(count)\n```"
240
+ code_blocks = extract_code_blocks(response) # ["count = len(context.split())\nFINAL(count)"]
241
+ ```
242
+
243
+ ## Examples
244
+
245
+ See the `examples/` directory for complete working examples:
246
+
247
+ - **`examples/repl_with_llm.py`** - Full RLM loop with local Qwen model
248
+ - **`examples/repl_oolong_simple.py`** - RLM on Oolong benchmark with HuggingFace Inference API
249
+
250
+ Run examples:
251
+ ```bash
252
+ # Full RLM example with local model (requires GPU)
253
+ python examples/repl_with_llm.py
254
+
255
+ # Oolong benchmark with HF Inference API (requires HF_TOKEN)
256
+ python examples/repl_oolong_simple.py
257
+ ```
258
+
259
+ ## Model Usage
260
+
261
+ ### Inference Loop
262
+
263
+ A typical model inference loop where the LLM generates code and the environment executes it:
264
+
265
+ ```python
266
+ from repl_env import REPLEnv
267
+ from repl_env.prompts import RLM_SYSTEM_PROMPT, build_initial_prompt, extract_code_blocks, format_observation
268
+
269
+ # Works with both local and remote!
270
+ with REPLEnv(base_url="http://localhost:8000") as env: # or REPLEnv() for local
271
+ result = env.reset(
272
+ context="The quick brown fox jumps over the lazy dog. " * 1000,
273
+ task_prompt="Count how many times 'fox' appears"
274
+ )
275
+
276
+ messages = [
277
+ {"role": "system", "content": RLM_SYSTEM_PROMPT},
278
+ {"role": "user", "content": build_initial_prompt(
279
+ task_prompt="Count how many times 'fox' appears",
280
+ context_length=result.observation.context_length,
281
+ context_preview=result.observation.context_preview,
282
+ variables=result.observation.available_variables,
283
+ )},
284
+ ]
285
+
286
+ while not result.done:
287
+ # Get code from LLM
288
+ response = your_llm.chat(messages)
289
+ code_blocks = extract_code_blocks(response)
290
+
291
+ for code in code_blocks:
292
+ result = env.execute(code)
293
+ if result.done:
294
+ break
295
+
296
+ # Update conversation
297
+ messages.append({"role": "assistant", "content": response})
298
+ messages.append({"role": "user", "content": format_observation(result.observation)})
299
+
300
+ print(f"Final answer: {env.state().final_answer}")
301
+ ```
302
+
303
+ ### Recursive LLM Calls (RLM Paradigm)
304
+
305
+ The key insight of RLM is that models can make recursive calls to themselves or other LLMs from within the code:
306
+
307
+ ```python
308
+ from repl_env import REPLEnv
309
+
310
+ def llm_query(prompt: str) -> str:
311
+ """Single LLM call - model can call this from executed code"""
312
+ return your_llm.generate(prompt)
313
+
314
+ def llm_query_batched(prompts: list[str]) -> list[str]:
315
+ """Batch LLM calls for efficiency (parallel in production)"""
316
+ return [your_llm.generate(p) for p in prompts]
317
+
318
+ # Create environment with LLM oracle (local mode)
319
+ with REPLEnv(llm_query_fn=llm_query, llm_batch_fn=llm_query_batched) as env:
320
+ result = env.reset(
321
+ context=massive_document, # Could be 100K+ chars
322
+ task_prompt="Summarize each section and find key themes"
323
+ )
324
+
325
+ # The model can now generate code like this:
326
+ code = """
327
+ # Split document into sections
328
+ sections = context.split('\\n\\n')
329
+
330
+ # Use LLM to summarize each section (recursive call!)
331
+ summaries = llm_query_batched([f"Summarize: {s[:1000]}" for s in sections[:10]])
332
+
333
+ # Combine summaries
334
+ combined = '\\n'.join(summaries)
335
+
336
+ # Final synthesis using another LLM call
337
+ answer['content'] = llm_query(f"Find key themes in: {combined}")
338
+ answer['ready'] = True
339
+ """
340
+
341
+ result = env.execute(code)
342
+ print(f"Done: {result.done}, Answer: {env.state().final_answer}")
343
+ ```
344
+
345
+ ### RL Training Integration
346
+
347
+ For RL training, integrate with frameworks like TRL, prime-rl, or verifiers:
348
+
349
+ ```python
350
+ from repl_env import REPLEnv
351
+
352
+ def collect_trajectory(env, policy, context, task):
353
+ """Collect a single trajectory for RL training"""
354
+ result = env.reset(context=context, task_prompt=task)
355
+
356
+ trajectory = []
357
+ total_reward = 0
358
+
359
+ while not result.done:
360
+ # Policy generates code
361
+ code = policy.generate(result.observation)
362
+
363
+ # Step environment
364
+ next_result = env.execute(code)
365
+
366
+ # Store transition
367
+ trajectory.append({
368
+ "observation": result.observation,
369
+ "action": code,
370
+ "reward": next_result.reward,
371
+ "next_observation": next_result.observation,
372
+ "done": next_result.done,
373
+ })
374
+
375
+ total_reward += next_result.reward
376
+ result = next_result
377
+
378
+ return trajectory, total_reward
379
+
380
+ # Training loop
381
+ with REPLEnv(
382
+ reward_on_success=1.0,
383
+ reward_on_iteration=0.0,
384
+ reward_on_error=-0.05,
385
+ reward_on_failure=-0.1,
386
+ ) as env:
387
+ for epoch in range(num_epochs):
388
+ for context, task, ground_truth in dataset:
389
+ trajectory, reward = collect_trajectory(env, policy, context, task)
390
+
391
+ # Verify answer correctness (optional external reward)
392
+ if trajectory:
393
+ final_answer = env.state().final_answer
394
+ if final_answer == ground_truth:
395
+ reward += verification_bonus
396
+
397
+ # Update policy (use your RL framework - PPO, GRPO, DPO, etc.)
398
+ policy.update(trajectory, reward)
399
+ ```
400
+
401
+ ### Reward Configuration
402
+
403
+ Configure rewards for different outcomes:
404
+
405
+ ```python
406
+ env = REPLEnv(
407
+ reward_on_success=1.0, # When FINAL() is called
408
+ reward_on_iteration=0.0, # Per step (can be negative to encourage efficiency)
409
+ reward_on_error=-0.05, # When code execution fails
410
+ reward_on_failure=-0.1, # When max iterations reached without answer
411
+ )
412
+ ```
413
+
414
+ ## Environment Configuration
415
+
416
+ | Environment Variable | Description | Default |
417
+ |---------------------|-------------|---------|
418
+ | `REPL_CONTEXT` | Initial context to load | "" |
419
+ | `REPL_TASK_PROMPT` | Task description | "" |
420
+ | `REPL_MAX_ITERATIONS` | Max steps per episode | 30 |
421
+ | `HF_TOKEN` | HuggingFace token for llm_query (server fallback) | None |
422
+ | `LLM_MODEL` | Model for llm_query/llm_query_batched | Qwen/Qwen3-Coder-480B-A35B-Instruct |
423
+
424
+ ## Running the Server
425
+
426
+ ### Using UV
427
+ ```bash
428
+ cd envs/repl_env
429
+ uv run --project . server
430
+ ```
431
+
432
+ ### Using Docker
433
+ ```bash
434
+ docker build -t repl-env:latest .
435
+ docker run -p 8000:8000 repl-env:latest
436
+ ```
437
+
438
+ ### Testing
439
+ ```bash
440
+ pytest tests/
441
+ ```
442
+
443
+ ## References
444
+
445
+ - [RLM Paper (arXiv:2512.24601)](https://arxiv.org/abs/2512.24601)
446
+ - [RLM Implementation](https://github.com/alexzhang13/rlm)
447
+ - [Alex Zhang's RLM Blog](https://alexzhang13.github.io/blog/2025/rlm/)
448
+ - [Prime Intellect RLM Blog](https://www.primeintellect.ai/blog/rlm)
__init__.py ADDED
@@ -0,0 +1,78 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ REPL Environment for OpenEnv.
9
+
10
+ A Python REPL environment for training language models on code execution tasks,
11
+ based on the Recursive Language Models (RLM) paradigm.
12
+
13
+ This environment allows language models to:
14
+ - Execute Python code in a sandboxed REPL
15
+ - Work with large contexts loaded as variables
16
+ - Finalize answers via FINAL(), FINAL_VAR(), or answer dict pattern
17
+ - Optionally make recursive LLM calls via llm_query() / llm_query_batched()
18
+
19
+ Example:
20
+ >>> from repl_env import REPLEnv, REPLAction
21
+ >>>
22
+ >>> # Start from Docker
23
+ >>> env = REPLEnv.from_docker_image("repl-env:latest")
24
+ >>>
25
+ >>> # Reset with context
26
+ >>> result = env.reset(context="Hello World", task_prompt="Count characters")
27
+ >>>
28
+ >>> # Execute code
29
+ >>> result = env.execute("count = len(context)")
30
+ >>> result = env.execute("print(f'FINAL({count})')")
31
+ >>>
32
+ >>> # Check result
33
+ >>> print(f"Done: {result.done}, Answer: {result.observation.metadata['final_answer']}")
34
+ >>>
35
+ >>> env.close()
36
+
37
+ References:
38
+ - RLM Paper: https://arxiv.org/abs/2512.24601
39
+ - Prime Intellect Blog: https://www.primeintellect.ai/blog/rlm
40
+ - Alex Zhang Blog: https://alexzhang13.github.io/blog/2025/rlm/
41
+ """
42
+
43
+ from .models import REPLAction, REPLObservation, REPLState, CodeBlockResult
44
+ from .client import REPLEnv
45
+ from .prompts import (
46
+ # System prompts
47
+ RLM_SYSTEM_PROMPT,
48
+ RLM_SYSTEM_PROMPT_QWEN,
49
+ # Prompt building
50
+ QueryMetadata,
51
+ build_rlm_system_prompt,
52
+ build_user_prompt,
53
+ build_initial_prompt,
54
+ # Parsing utilities
55
+ extract_code_blocks,
56
+ format_observation,
57
+ )
58
+
59
+ __all__ = [
60
+ # Models
61
+ "REPLAction",
62
+ "REPLObservation",
63
+ "REPLState",
64
+ "CodeBlockResult",
65
+ # Client
66
+ "REPLEnv",
67
+ # System prompts
68
+ "RLM_SYSTEM_PROMPT",
69
+ "RLM_SYSTEM_PROMPT_QWEN",
70
+ # Prompt building
71
+ "QueryMetadata",
72
+ "build_rlm_system_prompt",
73
+ "build_user_prompt",
74
+ "build_initial_prompt",
75
+ # Parsing utilities
76
+ "extract_code_blocks",
77
+ "format_observation",
78
+ ]
client.py ADDED
@@ -0,0 +1,464 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ REPL Environment Client.
9
+
10
+ This module provides a unified client for the REPL Environment that works
11
+ with both remote servers (via WebSocket) and local execution (no server needed).
12
+
13
+ Examples:
14
+ # Connect to remote server with your HF token for sub-LLM calls
15
+ env = REPLEnv(base_url="https://my-server.hf.space")
16
+ result = env.reset(
17
+ context="...",
18
+ task_prompt="...",
19
+ hf_token=os.environ["HF_TOKEN"], # Server uses this for llm_query
20
+ )
21
+
22
+ # Run locally (no server)
23
+ env = REPLEnv()
24
+
25
+ # Local with LLM support
26
+ env = REPLEnv(llm_query_fn=my_llm, llm_batch_fn=my_batch)
27
+
28
+ # All use the same interface
29
+ result = env.execute("x = len(context)")
30
+ env.close()
31
+ """
32
+
33
+ from __future__ import annotations
34
+
35
+ from typing import Any, Callable, Dict, List, Optional, TYPE_CHECKING
36
+
37
+ # Support both in-repo and standalone imports
38
+ try:
39
+ from openenv.core.client_types import StepResult
40
+ from openenv.core.env_client import EnvClient
41
+ from .models import REPLAction, REPLObservation, REPLState, CodeBlockResult
42
+ except ImportError:
43
+ from openenv.core.client_types import StepResult
44
+ from openenv.core.env_client import EnvClient
45
+ from models import REPLAction, REPLObservation, REPLState, CodeBlockResult
46
+
47
+ if TYPE_CHECKING:
48
+ from .server.repl_environment import REPLEnvironment
49
+
50
+
51
+ class REPLEnv:
52
+ """
53
+ Unified client for the REPL Environment.
54
+
55
+ Works with both remote servers and local execution, providing the same
56
+ interface regardless of where the code runs.
57
+
58
+ Examples:
59
+ >>> # Connect to a running server
60
+ >>> with REPLEnv(base_url="http://localhost:8000") as env:
61
+ ... result = env.reset(context="Hello World", task_prompt="Count chars")
62
+ ... result = env.execute("count = len(context)")
63
+ ... result = env.execute("print(f'FINAL({count})')")
64
+ ... print(result.done) # True
65
+
66
+ >>> # Run locally without a server
67
+ >>> with REPLEnv() as env:
68
+ ... result = env.reset(context="Hello World", task_prompt="Count chars")
69
+ ... result = env.execute("count = len(context)")
70
+ ... print(result.observation.result.success) # True
71
+
72
+ >>> # Local with LLM support for recursive calls
73
+ >>> def my_llm(prompt: str) -> str:
74
+ ... return "LLM response"
75
+ >>> with REPLEnv(llm_query_fn=my_llm) as env:
76
+ ... result = env.reset(context="...")
77
+ ... result = env.execute("response = llm_query('Summarize: ' + context)")
78
+
79
+ >>> # From Docker image
80
+ >>> env = REPLEnv.from_docker_image("repl-env:latest")
81
+
82
+ >>> # From HuggingFace Hub
83
+ >>> env = REPLEnv.from_hub("openenv/repl-env")
84
+ """
85
+
86
    def __init__(
        self,
        base_url: Optional[str] = None,
        *,
        # Local-only options (ignored when base_url is set)
        llm_query_fn: Optional[Callable[[str], str]] = None,
        llm_batch_fn: Optional[Callable[[List[str]], List[str]]] = None,
        max_output_length: int = 8192,
        context_preview_length: int = 500,
        reward_on_success: float = 1.0,
        reward_on_iteration: float = 0.0,
        reward_on_failure: float = -0.1,
        reward_on_error: float = -0.05,
        # Connection options (ignored when running locally)
        connect_timeout_s: float = 10.0,
        message_timeout_s: float = 60.0,
    ):
        """
        Initialize REPL environment.

        No backend is created here: construction only records configuration.
        The local environment or remote client is built lazily on first use
        by ``_ensure_initialized()``.

        Args:
            base_url: Server URL. If None, runs locally without a server.
            llm_query_fn: Function for llm_query() calls (local mode only).
            llm_batch_fn: Function for llm_query_batched() calls (local mode only).
            max_output_length: Max stdout/stderr chars per execution (local only).
            context_preview_length: Chars to show in context preview (local only).
            reward_on_success: Reward when final answer submitted (local only).
            reward_on_iteration: Reward per iteration step (local only).
            reward_on_failure: Reward when max iterations reached (local only).
            reward_on_error: Reward when code execution fails (local only).
            connect_timeout_s: WebSocket connection timeout (remote only).
            message_timeout_s: Message response timeout (remote only).
        """
        self._base_url = base_url
        # Exactly one of these two becomes non-None after lazy initialization.
        self._local_env: Optional[REPLEnvironment] = None
        self._remote_client: Optional[_RemoteREPLClient] = None

        # Store local-mode options
        self._llm_query_fn = llm_query_fn
        self._llm_batch_fn = llm_batch_fn
        self._max_output_length = max_output_length
        self._context_preview_length = context_preview_length
        self._reward_on_success = reward_on_success
        self._reward_on_iteration = reward_on_iteration
        self._reward_on_failure = reward_on_failure
        self._reward_on_error = reward_on_error

        # Store remote-mode options
        self._connect_timeout_s = connect_timeout_s
        self._message_timeout_s = message_timeout_s

        # Provider for container/runtime lifecycle (set by factory methods)
        self._provider = None
139
+
140
+ def _ensure_initialized(self) -> None:
141
+ """Initialize the appropriate backend (local or remote)."""
142
+ if self._local_env is not None or self._remote_client is not None:
143
+ return
144
+
145
+ if self._base_url is None:
146
+ # Local mode: create REPLEnvironment directly
147
+ from .server.repl_environment import REPLEnvironment
148
+
149
+ self._local_env = REPLEnvironment(
150
+ max_output_length=self._max_output_length,
151
+ context_preview_length=self._context_preview_length,
152
+ reward_on_success=self._reward_on_success,
153
+ reward_on_iteration=self._reward_on_iteration,
154
+ reward_on_failure=self._reward_on_failure,
155
+ reward_on_error=self._reward_on_error,
156
+ llm_query_fn=self._llm_query_fn,
157
+ llm_batch_fn=self._llm_batch_fn,
158
+ )
159
+ else:
160
+ # Remote mode: create WebSocket client
161
+ self._remote_client = _RemoteREPLClient(
162
+ base_url=self._base_url,
163
+ connect_timeout_s=self._connect_timeout_s,
164
+ message_timeout_s=self._message_timeout_s,
165
+ provider=self._provider,
166
+ )
167
+ self._remote_client.connect()
168
+
169
+ def reset(
170
+ self,
171
+ *,
172
+ context: str = "",
173
+ task_prompt: str = "",
174
+ max_iterations: int = 30,
175
+ seed: Optional[int] = None,
176
+ episode_id: Optional[str] = None,
177
+ hf_token: Optional[str] = None,
178
+ llm_model: Optional[str] = None,
179
+ ) -> StepResult[REPLObservation]:
180
+ """
181
+ Reset the environment for a new episode.
182
+
183
+ Args:
184
+ context: Text content to analyze (accessible as `context` variable).
185
+ task_prompt: Description of the task to solve.
186
+ max_iterations: Maximum code execution steps before timeout.
187
+ seed: Optional random seed for reproducibility.
188
+ episode_id: Optional custom episode identifier.
189
+ hf_token: Optional HuggingFace token for llm_query/llm_query_batched.
190
+ When provided, the server uses this token for sub-LLM calls
191
+ instead of its own configured token.
192
+ llm_model: Optional model name for LLM functions (default: Qwen3-Coder-480B).
193
+
194
+ Returns:
195
+ StepResult with initial observation.
196
+ """
197
+ self._ensure_initialized()
198
+
199
+ if self._local_env is not None:
200
+ # Local mode
201
+ self._local_env.max_iterations = max_iterations
202
+ obs = self._local_env.reset(
203
+ seed=seed,
204
+ episode_id=episode_id,
205
+ context=context,
206
+ task_prompt=task_prompt,
207
+ hf_token=hf_token,
208
+ llm_model=llm_model,
209
+ )
210
+ return self._wrap_observation(obs)
211
+ else:
212
+ # Remote mode
213
+ assert self._remote_client is not None
214
+ return self._remote_client.reset(
215
+ context=context,
216
+ task_prompt=task_prompt,
217
+ max_iterations=max_iterations,
218
+ seed=seed,
219
+ episode_id=episode_id,
220
+ hf_token=hf_token,
221
+ llm_model=llm_model,
222
+ )
223
+
224
+ def step(self, action: REPLAction) -> StepResult[REPLObservation]:
225
+ """
226
+ Execute a REPL action.
227
+
228
+ Args:
229
+ action: REPLAction containing code to execute.
230
+
231
+ Returns:
232
+ StepResult with execution observation.
233
+ """
234
+ self._ensure_initialized()
235
+
236
+ if self._local_env is not None:
237
+ obs = self._local_env.step(action)
238
+ return self._wrap_observation(obs)
239
+ else:
240
+ assert self._remote_client is not None
241
+ return self._remote_client.step(action)
242
+
243
+ def execute(self, code: str) -> StepResult[REPLObservation]:
244
+ """
245
+ Execute Python code in the REPL.
246
+
247
+ Convenience method that wraps step() with a code-only action.
248
+
249
+ Args:
250
+ code: Python code to execute.
251
+
252
+ Returns:
253
+ StepResult with execution observation.
254
+ """
255
+ return self.step(REPLAction(code=code))
256
+
257
+ def submit_final_answer(self, answer: str) -> StepResult[REPLObservation]:
258
+ """
259
+ Submit a final answer and terminate the episode.
260
+
261
+ Args:
262
+ answer: The final answer string.
263
+
264
+ Returns:
265
+ StepResult with done=True.
266
+ """
267
+ return self.step(REPLAction(code="", is_final=True, final_answer=answer))
268
+
269
+ def get_variable(self, name: str) -> StepResult[REPLObservation]:
270
+ """
271
+ Retrieve and print a variable from the REPL namespace.
272
+
273
+ Args:
274
+ name: Variable name to retrieve.
275
+
276
+ Returns:
277
+ StepResult with variable value in stdout.
278
+ """
279
+ return self.execute(f"print(repr({name}))")
280
+
281
+ def state(self) -> REPLState:
282
+ """
283
+ Get current environment state.
284
+
285
+ Returns:
286
+ REPLState with current environment information.
287
+ """
288
+ self._ensure_initialized()
289
+
290
+ if self._local_env is not None:
291
+ return self._local_env.state
292
+ else:
293
+ assert self._remote_client is not None
294
+ return self._remote_client.state()
295
+
296
+ def list_variables(self) -> List[str]:
297
+ """
298
+ Get list of available variables in the current session.
299
+
300
+ Returns:
301
+ List of variable names.
302
+ """
303
+ return self.state().namespace_keys
304
+
305
+ def close(self) -> None:
306
+ """Clean up resources."""
307
+ if self._local_env is not None:
308
+ self._local_env.close()
309
+ self._local_env = None
310
+
311
+ if self._remote_client is not None:
312
+ self._remote_client.close()
313
+ self._remote_client = None
314
+
315
+ def _wrap_observation(self, obs: REPLObservation) -> StepResult[REPLObservation]:
316
+ """Wrap a local REPLObservation in a StepResult."""
317
+ return StepResult(
318
+ observation=obs,
319
+ reward=obs.reward,
320
+ done=obs.done,
321
+ )
322
+
323
+ # Context manager support
324
+
325
+ def __enter__(self) -> "REPLEnv":
326
+ """Enter context manager."""
327
+ self._ensure_initialized()
328
+ return self
329
+
330
+ def __exit__(self, exc_type, exc_val, exc_tb) -> None:
331
+ """Exit context manager."""
332
+ self.close()
333
+
334
+ # Factory methods
335
+
336
+ @classmethod
337
+ def from_docker_image(
338
+ cls,
339
+ image: str,
340
+ **kwargs: Any,
341
+ ) -> "REPLEnv":
342
+ """
343
+ Create a REPL environment by spinning up a Docker container.
344
+
345
+ Args:
346
+ image: Docker image name to run (e.g., "repl-env:latest").
347
+ **kwargs: Additional arguments passed to container start.
348
+
349
+ Returns:
350
+ Connected REPLEnv instance.
351
+ """
352
+ from openenv.core.containers.runtime import LocalDockerProvider
353
+
354
+ provider = LocalDockerProvider()
355
+ base_url = provider.start_container(image, **kwargs)
356
+ provider.wait_for_ready(base_url)
357
+
358
+ env = cls(base_url=base_url)
359
+ env._provider = provider
360
+ env._ensure_initialized()
361
+ return env
362
+
363
+ @classmethod
364
+ def from_hub(
365
+ cls,
366
+ repo_id: str,
367
+ *,
368
+ use_docker: bool = True,
369
+ **kwargs: Any,
370
+ ) -> "REPLEnv":
371
+ """
372
+ Create a REPL environment from a HuggingFace Space.
373
+
374
+ Args:
375
+ repo_id: HuggingFace space identifier (e.g., "openenv/repl-env").
376
+ use_docker: If True, pull from HF registry. If False, run with UV.
377
+ **kwargs: Additional arguments passed to provider.
378
+
379
+ Returns:
380
+ Connected REPLEnv instance.
381
+ """
382
+ if use_docker:
383
+ from openenv.core.containers.runtime import LocalDockerProvider
384
+
385
+ provider = LocalDockerProvider()
386
+ tag = kwargs.pop("tag", "latest")
387
+ image = f"registry.hf.space/{repo_id.replace('/', '-')}:{tag}"
388
+ base_url = provider.start_container(image, **kwargs)
389
+ provider.wait_for_ready(base_url)
390
+ else:
391
+ from openenv.core.containers.runtime import UVProvider
392
+
393
+ project_path = kwargs.pop(
394
+ "project_path", f"git+https://huggingface.co/spaces/{repo_id}"
395
+ )
396
+ provider = UVProvider(project_path=project_path, **kwargs)
397
+ base_url = provider.start()
398
+ provider.wait_for_ready()
399
+
400
+ env = cls(base_url=base_url)
401
+ env._provider = provider
402
+ env._ensure_initialized()
403
+ return env
404
+
405
+
406
+ class _RemoteREPLClient(EnvClient[REPLAction, REPLObservation, REPLState]):
407
+ """
408
+ Internal WebSocket client for remote REPL connections.
409
+
410
+ This is the original EnvClient-based implementation, now used internally
411
+ by REPLEnv for remote mode.
412
+ """
413
+
414
+ def _step_payload(self, action: REPLAction) -> Dict:
415
+ """Convert REPLAction to JSON payload for step request."""
416
+ return {
417
+ "code": action.code,
418
+ "is_final": action.is_final,
419
+ "final_answer": action.final_answer,
420
+ }
421
+
422
+ def _parse_result(self, payload: Dict) -> StepResult[REPLObservation]:
423
+ """Parse server response into StepResult[REPLObservation]."""
424
+ obs_data = payload.get("observation", {})
425
+ result_data = obs_data.get("result", {})
426
+
427
+ observation = REPLObservation(
428
+ result=CodeBlockResult(
429
+ stdout=result_data.get("stdout", ""),
430
+ stderr=result_data.get("stderr", ""),
431
+ locals_snapshot=result_data.get("locals_snapshot", {}),
432
+ execution_time=result_data.get("execution_time", 0.0),
433
+ success=result_data.get("success", True),
434
+ exception=result_data.get("exception"),
435
+ ),
436
+ context_preview=obs_data.get("context_preview"),
437
+ context_length=obs_data.get("context_length", 0),
438
+ available_variables=obs_data.get("available_variables", []),
439
+ iteration=obs_data.get("iteration", 0),
440
+ max_iterations=obs_data.get("max_iterations", 30),
441
+ done=payload.get("done", False),
442
+ reward=payload.get("reward"),
443
+ metadata=obs_data.get("metadata", {}),
444
+ )
445
+
446
+ return StepResult(
447
+ observation=observation,
448
+ reward=payload.get("reward"),
449
+ done=payload.get("done", False),
450
+ )
451
+
452
+ def _parse_state(self, payload: Dict) -> REPLState:
453
+ """Parse server response into REPLState object."""
454
+ return REPLState(
455
+ episode_id=payload.get("episode_id"),
456
+ step_count=payload.get("step_count", 0),
457
+ context=payload.get("context"),
458
+ task_prompt=payload.get("task_prompt"),
459
+ iteration=payload.get("iteration", 0),
460
+ max_iterations=payload.get("max_iterations", 30),
461
+ namespace_keys=payload.get("namespace_keys", []),
462
+ final_answer=payload.get("final_answer"),
463
+ total_execution_time=payload.get("total_execution_time", 0.0),
464
+ )
models.py ADDED
@@ -0,0 +1,109 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the REPL Environment.
9
+
10
+ The REPL environment provides a Python REPL for training language models
11
+ on code execution tasks, based on the Recursive Language Models (RLM) paradigm.
12
+
13
+ Supports two finalization patterns:
14
+ 1. RLM-style: print('FINAL(answer)') or print('FINAL_VAR(var_name)')
15
+ 2. Prime Intellect style: answer = {"content": "...", "ready": True}
16
+ """
17
+
18
+ from typing import Any, Dict, List, Optional
19
+
20
+ from pydantic import BaseModel, Field
21
+
22
+ # Support both in-repo and standalone imports
23
+ try:
24
+ from openenv.core.env_server.types import Action, Observation, State
25
+ except ImportError:
26
+ from core.env_server.types import Action, Observation, State
27
+
28
+
29
+ class REPLAction(Action):
30
+ """Action containing Python code to execute in the REPL.
31
+
32
+ Supports multiple finalization patterns:
33
+ 1. RLM-style: print('FINAL(answer)') or print('FINAL_VAR(var_name)') in code
34
+ 2. Prime Intellect style: answer = {"content": "...", "ready": True} in namespace
35
+ 3. Explicit: Set is_final=True with final_answer
36
+ """
37
+
38
+ code: str = Field(default="", description="Python code to execute")
39
+ is_final: bool = Field(
40
+ default=False, description="Whether this action signals the final answer"
41
+ )
42
+ final_answer: Optional[str] = Field(
43
+ default=None, description="Final answer if is_final=True"
44
+ )
45
+
46
+
47
+ class CodeBlockResult(BaseModel):
48
+ """Result of executing a single code block."""
49
+
50
+ stdout: str = Field(default="", description="Standard output from execution")
51
+ stderr: str = Field(default="", description="Standard error from execution")
52
+ locals_snapshot: Dict[str, str] = Field(
53
+ default_factory=dict,
54
+ description="String representations of new/modified variables",
55
+ )
56
+ execution_time: float = Field(
57
+ default=0.0, ge=0, description="Execution time in seconds"
58
+ )
59
+ success: bool = Field(default=True, description="Whether execution succeeded")
60
+ exception: Optional[str] = Field(
61
+ default=None, description="Exception message if execution failed"
62
+ )
63
+
64
+
65
+ class REPLObservation(Observation):
66
+ """Observation returned after code execution in the REPL."""
67
+
68
+ result: CodeBlockResult = Field(
69
+ default_factory=CodeBlockResult, description="Result of code execution"
70
+ )
71
+ context_preview: Optional[str] = Field(
72
+ default=None,
73
+ description="Preview of the context (first N chars) if context is loaded",
74
+ )
75
+ context_length: int = Field(
76
+ default=0, ge=0, description="Total length of context in characters"
77
+ )
78
+ available_variables: List[str] = Field(
79
+ default_factory=list,
80
+ description="List of variable names available in the namespace",
81
+ )
82
+ iteration: int = Field(default=0, ge=0, description="Current iteration number")
83
+ max_iterations: int = Field(
84
+ default=30, ge=1, description="Maximum allowed iterations"
85
+ )
86
+
87
+
88
+ class REPLState(State):
89
+ """Extended state for REPL environment."""
90
+
91
+ context: Optional[str] = Field(
92
+ default=None, description="The context/problem to work with"
93
+ )
94
+ task_prompt: Optional[str] = Field(
95
+ default=None, description="The task description to solve"
96
+ )
97
+ iteration: int = Field(default=0, ge=0, description="Current iteration number")
98
+ max_iterations: int = Field(
99
+ default=30, ge=1, description="Max iterations before termination"
100
+ )
101
+ namespace_keys: List[str] = Field(
102
+ default_factory=list, description="Variables currently in namespace"
103
+ )
104
+ final_answer: Optional[str] = Field(
105
+ default=None, description="Final answer if episode is complete"
106
+ )
107
+ total_execution_time: float = Field(
108
+ default=0.0, ge=0, description="Total code execution time in seconds"
109
+ )
openenv.yaml ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ spec_version: 1
2
+ name: repl
3
+ type: space
4
+ runtime: fastapi
5
+ app: server.app:app
6
+ port: 8000
prompts.py ADDED
@@ -0,0 +1,373 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ RLM System Prompts and Parsing Utilities for the REPL Environment.
9
+
10
+ Based on the official RLM repo: https://github.com/alexzhang13/rlm
11
+
12
+ Two versions available:
13
+ - RLM_SYSTEM_PROMPT: Base prompt from the repo (with llm_query_batched)
14
+ - RLM_SYSTEM_PROMPT_QWEN: For Qwen3-Coder-480B (adds IMPORTANT cost warning)
15
+
16
+ Parsing utilities help extract code blocks and format observations.
17
+ """
18
+
19
+ import re
20
+ import textwrap
21
+ from dataclasses import dataclass
22
+ from typing import List, Optional
23
+
24
+
25
+ # =============================================================================
26
+ # Query Metadata (for context info)
27
+ # =============================================================================
28
+
29
+
30
+ @dataclass
31
+ class QueryMetadata:
32
+ """Metadata about the context for building prompts."""
33
+
34
+ context_lengths: List[int]
35
+ context_total_length: int
36
+ context_type: str = "str" # "str" or "List[str]"
37
+
38
+
39
+ # =============================================================================
40
+ # System Prompt from Official RLM Repo
41
+ # =============================================================================
42
+
43
+ RLM_SYSTEM_PROMPT = textwrap.dedent(
44
+ """You are tasked with answering a query with associated context. You can access, transform, and analyze this context interactively in a REPL environment that can recursively query sub-LLMs, which you are strongly encouraged to use as much as possible. You will be queried iteratively until you provide a final answer.
45
+
46
+ The REPL environment is initialized with:
47
+ 1. A `context` variable that contains extremely important information about your query. You should check the content of the `context` variable to understand what you are working with. Make sure you look through it sufficiently as you answer your query.
48
+ 2. A `llm_query` function that allows you to query an LLM (that can handle around 500K chars) inside your REPL environment.
49
+ 3. A `llm_query_batched` function that allows you to query multiple prompts concurrently: `llm_query_batched(prompts: List[str]) -> List[str]`. This is much faster than sequential `llm_query` calls when you have multiple independent queries. Results are returned in the same order as the input prompts.
50
+ 4. The ability to use `print()` statements to view the output of your REPL code and continue your reasoning.
51
+
52
+ You will only be able to see truncated outputs from the REPL environment, so you should use the query LLM function on variables you want to analyze. You will find this function especially useful when you have to analyze the semantics of the context. Use these variables as buffers to build up your final answer.
53
+ Make sure to explicitly look through the entire context in REPL before answering your query. An example strategy is to first look at the context and figure out a chunking strategy, then break up the context into smart chunks, and query an LLM per chunk with a particular question and save the answers to a buffer, then query an LLM with all the buffers to produce your final answer.
54
+
55
+ You can use the REPL environment to help you understand your context, especially if it is huge. Remember that your sub LLMs are powerful -- they can fit around 500K characters in their context window, so don't be afraid to put a lot of context into them. For example, a viable strategy is to feed 10 documents per sub-LLM query. Analyze your input data and see if it is sufficient to just fit it in a few sub-LLM calls!
56
+
57
+ When you want to execute Python code in the REPL environment, wrap it in triple backticks with 'repl' language identifier. For example, say we want our recursive model to search for the magic number in the context (assuming the context is a string), and the context is very long, so we want to chunk it:
58
+ ```repl
59
+ chunk = context[:10000]
60
+ answer = llm_query(f"What is the magic number in the context? Here is the chunk: {{chunk}}")
61
+ print(answer)
62
+ ```
63
+
64
+ As an example, suppose you're trying to answer a question about a book. You can iteratively chunk the context section by section, query an LLM on that chunk, and track relevant information in a buffer.
65
+ ```repl
66
+ query = "In Harry Potter and the Sorcerer's Stone, did Gryffindor win the House Cup because they led?"
67
+ for i, section in enumerate(context):
68
+ if i == len(context) - 1:
69
+ buffer = llm_query(f"You are on the last section of the book. So far you know that: {{buffers}}. Gather from this last section to answer {{query}}. Here is the section: {{section}}")
70
+ print(f"Based on reading iteratively through the book, the answer is: {{buffer}}")
71
+ else:
72
+ buffer = llm_query(f"You are iteratively looking through a book, and are on section {{i}} of {{len(context)}}. Gather information to help answer {{query}}. Here is the section: {{section}}")
73
+ print(f"After section {{i}} of {{len(context)}}, you have tracked: {{buffer}}")
74
+ ```
75
+
76
+ As another example, when the context isn't that long (e.g. <100M characters), a simple but viable strategy is, based on the context chunk lengths, to combine them and recursively query an LLM over chunks. For example, if the context is a List[str], we ask the same query over each chunk using `llm_query_batched` for concurrent processing:
77
+ ```repl
78
+ query = "A man became famous for his book 'The Great Gatsby'. How many jobs did he have?"
79
+ # Suppose our context is ~1M chars, and we want each sub-LLM query to be ~0.1M chars so we split it into 10 chunks
80
+ chunk_size = len(context) // 10
81
+ chunks = []
82
+ for i in range(10):
83
+ if i < 9:
84
+ chunk_str = "\\n".join(context[i*chunk_size:(i+1)*chunk_size])
85
+ else:
86
+ chunk_str = "\\n".join(context[i*chunk_size:])
87
+ chunks.append(chunk_str)
88
+
89
+ # Use batched query for concurrent processing - much faster than sequential calls!
90
+ prompts = [f"Try to answer the following query: {{query}}. Here are the documents:\\n{{chunk}}. Only answer if you are confident in your answer based on the evidence." for chunk in chunks]
91
+ answers = llm_query_batched(prompts)
92
+ for i, answer in enumerate(answers):
93
+ print(f"I got the answer from chunk {{i}}: {{answer}}")
94
+ final_answer = llm_query(f"Aggregating all the answers per chunk, answer the original query about total number of jobs: {{query}}\\n\\nAnswers:\\n" + "\\n".join(answers))
95
+ ```
96
+
97
+ As a final example, after analyzing the context and realizing it's separated by Markdown headers, we can maintain state through buffers by chunking the context by headers, and iteratively querying an LLM over it:
98
+ ```repl
99
+ # After finding out the context is separated by Markdown headers, we can chunk, summarize, and answer
100
+ import re
101
+ sections = re.split(r'### (.+)', context["content"])
102
+ buffers = []
103
+ for i in range(1, len(sections), 2):
104
+ header = sections[i]
105
+ info = sections[i+1]
106
+ summary = llm_query(f"Summarize this {{header}} section: {{info}}")
107
+ buffers.append(f"{{header}}: {{summary}}")
108
+ final_answer = llm_query(f"Based on these summaries, answer the original query: {{query}}\\n\\nSummaries:\\n" + "\\n".join(buffers))
109
+ ```
110
+ In the next step, we can return FINAL_VAR("final_answer").
111
+
112
+ IMPORTANT: When you are done with the iterative process, you MUST provide a final answer using one of the FINAL functions. Do not use these unless you have completed your task. You have two options:
113
+ 1. Use FINAL(value) to provide the answer directly, e.g., FINAL(42) or FINAL(my_variable)
114
+ 2. Use FINAL_VAR("variable_name") to return a variable by name, e.g., FINAL_VAR("final_answer")
115
+
116
+ Think step by step carefully, plan, and execute this plan immediately in your response -- do not just say "I will do this" or "I will do that". Output to the REPL environment and recursive LLMs as much as possible. Remember to explicitly answer the original query in your final answer.
117
+ """
118
+ )
119
+
120
+
121
+ # =============================================================================
122
+ # System Prompt for Qwen3-Coder-480B (with IMPORTANT cost warning from paper)
123
+ # Adds cost warning after the "sub LLMs are powerful" paragraph
124
+ # =============================================================================
125
+
126
+ RLM_SYSTEM_PROMPT_QWEN = textwrap.dedent(
127
+ """You are tasked with answering a query with associated context. You can access, transform, and analyze this context interactively in a REPL environment that can recursively query sub-LLMs, which you are strongly encouraged to use as much as possible. You will be queried iteratively until you provide a final answer.
128
+
129
+ The REPL environment is initialized with:
130
+ 1. A `context` variable that contains extremely important information about your query. You should check the content of the `context` variable to understand what you are working with. Make sure you look through it sufficiently as you answer your query.
131
+ 2. A `llm_query` function that allows you to query an LLM (that can handle around 500K chars) inside your REPL environment.
132
+ 3. A `llm_query_batched` function that allows you to query multiple prompts concurrently: `llm_query_batched(prompts: List[str]) -> List[str]`. This is much faster than sequential `llm_query` calls when you have multiple independent queries. Results are returned in the same order as the input prompts.
133
+ 4. The ability to use `print()` statements to view the output of your REPL code and continue your reasoning.
134
+
135
+ You will only be able to see truncated outputs from the REPL environment, so you should use the query LLM function on variables you want to analyze. You will find this function especially useful when you have to analyze the semantics of the context. Use these variables as buffers to build up your final answer.
136
+ Make sure to explicitly look through the entire context in REPL before answering your query. An example strategy is to first look at the context and figure out a chunking strategy, then break up the context into smart chunks, and query an LLM per chunk with a particular question and save the answers to a buffer, then query an LLM with all the buffers to produce your final answer.
137
+
138
+ You can use the REPL environment to help you understand your context, especially if it is huge. Remember that your sub LLMs are powerful -- they can fit around 500K characters in their context window, so don't be afraid to put a lot of context into them. For example, a viable strategy is to feed 10 documents per sub-LLM query. Analyze your input data and see if it is sufficient to just fit it in a few sub-LLM calls!
139
+
140
+ IMPORTANT: Be very careful about using 'llm_query' as it incurs high runtime costs. Always batch as much information as reasonably possible into each call (aim for around ~200k characters per call). For example, if you have 1000 lines of information to process, it's much better to split into chunks of 5 and call 'llm_query' on each chunk (200 calls total) rather than making 1000 individual calls. Minimize the number of 'llm_query' calls by batching related information together.
141
+
142
+ When you want to execute Python code in the REPL environment, wrap it in triple backticks with 'repl' language identifier. For example, say we want our recursive model to search for the magic number in the context (assuming the context is a string), and the context is very long, so we want to chunk it:
143
+ ```repl
144
+ chunk = context[:10000]
145
+ answer = llm_query(f"What is the magic number in the context? Here is the chunk: {{chunk}}")
146
+ print(answer)
147
+ ```
148
+
149
+ As an example, suppose you're trying to answer a question about a book. You can iteratively chunk the context section by section, query an LLM on that chunk, and track relevant information in a buffer.
150
+ ```repl
151
+ query = "In Harry Potter and the Sorcerer's Stone, did Gryffindor win the House Cup because they led?"
152
+ for i, section in enumerate(context):
153
+ if i == len(context) - 1:
154
+ buffer = llm_query(f"You are on the last section of the book. So far you know that: {{buffers}}. Gather from this last section to answer {{query}}. Here is the section: {{section}}")
155
+ print(f"Based on reading iteratively through the book, the answer is: {{buffer}}")
156
+ else:
157
+ buffer = llm_query(f"You are iteratively looking through a book, and are on section {{i}} of {{len(context)}}. Gather information to help answer {{query}}. Here is the section: {{section}}")
158
+ print(f"After section {{i}} of {{len(context)}}, you have tracked: {{buffer}}")
159
+ ```
160
+
161
+ As another example, when the context isn't that long (e.g. <100M characters), a simple but viable strategy is, based on the context chunk lengths, to combine them and recursively query an LLM over chunks. For example, if the context is a List[str], we ask the same query over each chunk using `llm_query_batched` for concurrent processing:
162
+ ```repl
163
+ query = "A man became famous for his book 'The Great Gatsby'. How many jobs did he have?"
164
+ # Suppose our context is ~1M chars, and we want each sub-LLM query to be ~0.1M chars so we split it into 10 chunks
165
+ chunk_size = len(context) // 10
166
+ chunks = []
167
+ for i in range(10):
168
+ if i < 9:
169
+ chunk_str = "\\n".join(context[i*chunk_size:(i+1)*chunk_size])
170
+ else:
171
+ chunk_str = "\\n".join(context[i*chunk_size:])
172
+ chunks.append(chunk_str)
173
+
174
+ # Use batched query for concurrent processing - much faster than sequential calls!
175
+ prompts = [f"Try to answer the following query: {{query}}. Here are the documents:\\n{{chunk}}. Only answer if you are confident in your answer based on the evidence." for chunk in chunks]
176
+ answers = llm_query_batched(prompts)
177
+ for i, answer in enumerate(answers):
178
+ print(f"I got the answer from chunk {{i}}: {{answer}}")
179
+ final_answer = llm_query(f"Aggregating all the answers per chunk, answer the original query about total number of jobs: {{query}}\\n\\nAnswers:\\n" + "\\n".join(answers))
180
+ ```
181
+
182
+ As a final example, after analyzing the context and realizing it's separated by Markdown headers, we can maintain state through buffers by chunking the context by headers, and iteratively querying an LLM over it:
183
+ ```repl
184
+ # After finding out the context is separated by Markdown headers, we can chunk, summarize, and answer
185
+ import re
186
+ sections = re.split(r'### (.+)', context["content"])
187
+ buffers = []
188
+ for i in range(1, len(sections), 2):
189
+ header = sections[i]
190
+ info = sections[i+1]
191
+ summary = llm_query(f"Summarize this {{header}} section: {{info}}")
192
+ buffers.append(f"{{header}}: {{summary}}")
193
+ final_answer = llm_query(f"Based on these summaries, answer the original query: {{query}}\\n\\nSummaries:\\n" + "\\n".join(buffers))
194
+ ```
195
+ In the next step, we can return FINAL_VAR("final_answer").
196
+
197
+ IMPORTANT: When you are done with the iterative process, you MUST provide a final answer using one of the FINAL functions. Do not use these unless you have completed your task. You have two options:
198
+ 1. Use FINAL(value) to provide the answer directly, e.g., FINAL(42) or FINAL(my_variable)
199
+ 2. Use FINAL_VAR("variable_name") to return a variable by name, e.g., FINAL_VAR("final_answer")
200
+
201
+ Think step by step carefully, plan, and execute this plan immediately in your response -- do not just say "I will do this" or "I will do that". Output to the REPL environment and recursive LLMs as much as possible. Remember to explicitly answer the original query in your final answer.
202
+ """
203
+ )
204
+
205
+
206
+ # =============================================================================
207
+ # User Prompt Templates (from official RLM repo)
208
+ # =============================================================================
209
+
210
+ USER_PROMPT = """Think step-by-step on what to do using the REPL environment (which contains the context) to answer the prompt.\n\nContinue using the REPL environment, which has the `context` variable, and querying sub-LLMs by writing to ```repl``` tags, and determine your answer. Your next action:"""
211
+
212
+ USER_PROMPT_WITH_ROOT = """Think step-by-step on what to do using the REPL environment (which contains the context) to answer the original prompt: \"{root_prompt}\".\n\nContinue using the REPL environment, which has the `context` variable, and querying sub-LLMs by writing to ```repl``` tags, and determine your answer. Your next action:"""
213
+
214
+
215
+ # =============================================================================
216
+ # Prompt Building Functions (from official RLM repo)
217
+ # =============================================================================
218
+
219
+
220
+ def build_rlm_system_prompt(
221
+ system_prompt: str,
222
+ query_metadata: QueryMetadata,
223
+ ) -> List[dict]:
224
+ """
225
+ Build the initial system prompt for the REPL environment based on extra prompt metadata.
226
+
227
+ Args:
228
+ system_prompt: The system prompt to use
229
+ query_metadata: QueryMetadata object containing context metadata
230
+
231
+ Returns:
232
+ List of message dictionaries [system, assistant(metadata)]
233
+ """
234
+ context_lengths = query_metadata.context_lengths
235
+ context_total_length = query_metadata.context_total_length
236
+ context_type = query_metadata.context_type
237
+
238
+ # If there are more than 100 chunks, truncate to the first 100 chunks.
239
+ if len(context_lengths) > 100:
240
+ others = len(context_lengths) - 100
241
+ context_lengths_str = (
242
+ str(context_lengths[:100]) + "... [" + str(others) + " others]"
243
+ )
244
+ else:
245
+ context_lengths_str = str(context_lengths)
246
+
247
+ metadata_prompt = f"Your context is a {context_type} with {context_total_length} total characters, and is broken up into chunks of char lengths: {context_lengths_str}."
248
+
249
+ return [
250
+ {"role": "system", "content": system_prompt},
251
+ {"role": "assistant", "content": metadata_prompt},
252
+ ]
253
+
254
+
255
def build_user_prompt(
    root_prompt: Optional[str] = None,
    iteration: int = 0,
    context_count: int = 1,
    history_count: int = 0,
) -> dict:
    """
    Build the user message for one REPL turn.

    On the very first turn (iteration 0) a safeguard sentence is prepended
    that discourages answering before inspecting the context; later turns
    instead reference the prior interaction history. Notes about multiple
    contexts and prior conversation histories are appended when present.

    Args:
        root_prompt: The original query/task
        iteration: Current iteration number (0 = first)
        context_count: Number of context variables available
        history_count: Number of prior conversation histories

    Returns:
        User message dict
    """
    # Core instruction: include the root prompt when we have one.
    core = (
        USER_PROMPT_WITH_ROOT.format(root_prompt=root_prompt)
        if root_prompt
        else USER_PROMPT
    )

    if iteration == 0:
        prefix = "You have not interacted with the REPL environment or seen your prompt / context yet. Your next action should be to look through and figure out how to answer the prompt, so don't just provide a final answer yet.\n\n"
    else:
        prefix = "The history before is your previous interactions with the REPL environment. "
    prompt = prefix + core

    # Advertise additional contexts (context_0 ... context_{n-1}) if any.
    if context_count > 1:
        prompt += f"\n\nNote: You have {context_count} contexts available (context_0 through context_{context_count - 1})."

    # Advertise prior conversation histories if any.
    if history_count == 1:
        prompt += "\n\nNote: You have 1 prior conversation history available in the `history` variable."
    elif history_count > 1:
        prompt += f"\n\nNote: You have {history_count} prior conversation histories available (history_0 through history_{history_count - 1})."

    return {"role": "user", "content": prompt}
299
+
300
+
301
+ # =============================================================================
302
+ # Convenience Functions (for backward compatibility)
303
+ # =============================================================================
304
+
305
+
306
def build_initial_prompt(
    task_prompt: str,
    context_length: int,
    context_preview: Optional[str] = None,
    variables: Optional[List[str]] = None,
    **kwargs,
) -> str:
    """Build the initial user prompt (convenience wrapper).

    Delegates to build_user_prompt() at iteration 0 and returns just the
    message content. Kept for backward compatibility; only ``task_prompt``
    influences the result.

    Args:
        task_prompt: The task to accomplish
        context_length: Total length of the context (not used)
        context_preview: Preview of the context (not used)
        variables: List of available variable names (not used)

    Returns:
        Formatted initial prompt string
    """
    message = build_user_prompt(root_prompt=task_prompt, iteration=0)
    return message["content"]
325
+
326
+
327
+ # =============================================================================
328
+ # Parsing Utilities
329
+ # =============================================================================
330
+
331
+
332
def extract_code_blocks(text: str, language: str = "python") -> List[str]:
    """Extract fenced code blocks from an LLM response.

    Supports both ```repl``` (official RLM) and ```<language>``` blocks.
    All ``repl`` blocks are returned first, then blocks fenced with the
    requested language tag (not interleaved in document order).

    Bug fix: the pattern list previously always contained the ``repl``
    pattern alongside the language pattern, so calling with
    ``language="repl"`` returned every block twice. Patterns are now
    de-duplicated.

    Args:
        text: The LLM response text
        language: Language identifier to match (default "python")

    Returns:
        List of stripped, non-empty code strings extracted from the response
    """
    # Match 'repl' (official) and the requested language, without duplicates.
    patterns = [r"```repl\s*(.*?)```"]
    language_pattern = rf"```{language}\s*(.*?)```"
    if language_pattern not in patterns:
        patterns.append(language_pattern)

    blocks: List[str] = []
    for pattern in patterns:
        for match in re.findall(pattern, text, re.DOTALL):
            stripped = match.strip()
            if stripped:  # drop whitespace-only blocks
                blocks.append(stripped)

    return blocks
356
+
357
+
358
def format_observation(obs) -> str:
    """Render a REPLObservation as feedback text for the LLM.

    Args:
        obs: The REPLObservation from env.step()

    Returns:
        Formatted observation string. On failure, appends the error
        (stderr, else exception, else a generic message) plus a reminder
        that 'context' is already defined.
    """
    stdout = obs.result.stdout
    output = stdout.strip() if stdout else "(no output)"

    if not obs.result.success:
        error = obs.result.stderr or obs.result.exception or "Unknown error"
        return (
            f"Code output:\n{output}\n\nERROR: {error}\n"
            "Fix the error. Remember: 'context' is already defined."
        )

    return f"Code output:\n{output}"
pyproject.toml ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ [build-system]
8
+ requires = ["setuptools>=45", "wheel"]
9
+ build-backend = "setuptools.build_meta"
10
+
11
+ [project]
12
+ name = "openenv-repl"
13
+ version = "0.1.0"
14
+ description = "Recursive Language Model REPL Environment for OpenEnv"
15
+ requires-python = ">=3.10"
16
+ dependencies = [
17
+ # Core OpenEnv dependencies (required for server functionality)
18
+ "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@main",
19
+ "fastapi>=0.115.0",
20
+ "pydantic>=2.0.0",
21
+ "uvicorn>=0.24.0",
22
+ "requests>=2.31.0",
23
+ # Environment-specific dependencies
24
+ "smolagents>=1.22.0,<2",
25
+ # LLM support via HuggingFace Inference API
26
+ "huggingface_hub>=0.20.0",
27
+ ]
28
+
29
+ [project.optional-dependencies]
30
+ dev = [
31
+ "pytest>=8.0.0",
32
+ "pytest-cov>=4.0.0",
33
+ ]
34
+
35
+ [project.scripts]
36
+ # Server entry point - enables running via: uv run --project . server
37
+ # or: python -m repl_env.server.app
38
+ server = "repl_env.server.app:main"
39
+
40
+ [tool.setuptools]
41
+ # Explicitly list packages - "repl_env" maps to current dir
42
+ packages = ["repl_env", "repl_env.server"]
43
+ package-dir = {"repl_env" = ".", "repl_env.server" = "server"}
server/__init__.py ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ REPL Environment Server Components.
9
+
10
+ This module contains the server-side implementation of the REPL environment.
11
+ """
12
+
13
+ from .repl_environment import REPLEnvironment
14
+ from .python_executor import PythonExecutor
15
+
16
+ __all__ = [
17
+ "REPLEnvironment",
18
+ "PythonExecutor",
19
+ ]
server/app.py ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ FastAPI application for the REPL Environment.
9
+
10
+ This module creates an HTTP server that exposes the REPLEnvironment
11
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
12
+
13
+ The server includes llm_query and llm_query_batched support via HuggingFace Inference API,
14
+ enabling the Recursive Language Model (RLM) paradigm.
15
+
16
+ LLM Token Configuration:
17
+ 1. Client can pass `hf_token` in reset() - RECOMMENDED
18
+ 2. Server fallback: HF_TOKEN environment variable
19
+
20
+ LLM functions are created dynamically in REPLEnvironment.reset() based on the
21
+ available token (client or server).
22
+
23
+ Usage:
24
+ # Development (with auto-reload):
25
+ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
26
+
27
+ # Production:
28
+ uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
29
+
30
+ # Or run directly:
31
+ uv run --project . server
32
+
33
+ Environment Variables:
34
+ HF_TOKEN: Fallback HuggingFace API token (client token takes priority)
35
+ LLM_MODEL: Model to use for llm_query/llm_query_batched (default: Qwen/Qwen3-Coder-480B-A35B-Instruct)
36
+ """
37
+ import os
38
+
39
+ # Support both in-repo and standalone imports
40
+ try:
41
+ # In-repo imports (when running from OpenEnv repository)
42
+ from openenv.core.env_server.http_server import create_app
43
+ from ..models import REPLAction, REPLObservation
44
+ from .repl_environment import REPLEnvironment
45
+ except ImportError:
46
+ # Standalone imports (when environment is standalone with openenv from pip)
47
+ from openenv.core.env_server.http_server import create_app
48
+ from models import REPLAction, REPLObservation
49
+ from server.repl_environment import REPLEnvironment
50
+
51
+
52
# ============== LLM CONFIGURATION ==============
# Default model for llm_query/llm_query_batched; overridable via LLM_MODEL.
LLM_MODEL = os.environ.get("LLM_MODEL", "Qwen/Qwen3-Coder-480B-A35B-Instruct")
# Server-side fallback token; a client token passed to reset() takes priority.
HF_TOKEN = os.environ.get("HF_TOKEN", None)
# ===============================================

# Log LLM configuration at startup so operators can see whether recursive
# LLM calls will work out of the box.
# (Fixed: first print was an f-string with no placeholders.)
if HF_TOKEN:
    print("[REPL Server] LLM support ENABLED (server token configured)")
    print(f"[REPL Server] Default model: {LLM_MODEL}")
else:
    print("[REPL Server] No server HF_TOKEN configured")
    print("[REPL Server] LLM functions will be enabled if client passes hf_token in reset()")

# Simple factory - LLM functions are created dynamically in reset() based on token
env_factory = REPLEnvironment

# Create the app with web interface and README integration
app = create_app(env_factory, REPLAction, REPLObservation, env_name="repl_env")
70
+
71
+
72
def main():
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        python -m envs.repl_env.server.app
        openenv serve repl_env
    """
    # Imported lazily so merely importing this module never pulls in uvicorn.
    import uvicorn

    host, port = "0.0.0.0", 8000
    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    main()
server/python_executor.py ADDED
@@ -0,0 +1,327 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Sandboxed Python code executor for the REPL environment.
9
+
10
+ Uses smolagents.LocalPythonExecutor as the backend for battle-tested sandboxed
11
+ execution, with RLM-specific features on top:
12
+ - Context loading (set_context)
13
+ - Variable access (get_variable, list_variables)
14
+ - Function injection (inject_function for llm_query, llm_query_batched)
15
+ - Output capped at 8,192 characters per turn (configurable)
16
+ - Persistent namespace across code blocks
17
+ """
18
+
19
+ import json
20
+ import logging
21
+ import time
22
+ import traceback
23
+ from collections.abc import Callable
24
+ from typing import Any, Dict, List, Optional
25
+
26
+ from smolagents import LocalPythonExecutor
27
+
28
+ logger = logging.getLogger(__name__)
29
+ logger.addHandler(logging.NullHandler())
30
+
31
+
32
class PythonExecutor:
    """Sandboxed Python code executor with persistent namespace.

    Wraps smolagents.LocalPythonExecutor with RLM-specific features:
    - Context loading for RLM tasks
    - Variable tracking for observation
    - Function injection for llm_query, llm_query_batched
    - Configurable output length limit (default 8192 chars per Prime Intellect)

    Note:
        Several methods probe the smolagents executor defensively
        (``hasattr(self._executor, 'state')``, ``getattr`` on the result
        object) because the available attributes vary across smolagents
        versions.
    """

    def __init__(
        self,
        max_output_length: int = 8192,
        timeout: float = 30.0,
        allowed_imports: Optional[List[str]] = None,
    ):
        """Initialize the executor.

        Args:
            max_output_length: Maximum characters for stdout/stderr (default 8192)
            timeout: Intended execution timeout in seconds.
                NOTE(review): stored on the instance but not forwarded to
                LocalPythonExecutor below - confirm whether smolagents
                enforces its own limit.
            allowed_imports: List of allowed module names for import
        """
        self.max_output_length = max_output_length
        self.timeout = timeout

        # Default allowed imports for RLM tasks: stdlib-only, no filesystem,
        # network, or process modules.
        default_imports = [
            "re",
            "json",
            "math",
            "random",
            "collections",
            "itertools",
            "functools",
            "operator",
            "string",
            "textwrap",
            "difflib",
            "statistics",
            "decimal",
            "fractions",
            "datetime",
            "copy",
            "pprint",
            "typing",
            "dataclasses",
            "enum",
            "bisect",
            "heapq",
            "array",
            "struct",
            "base64",
            "hashlib",
            "hmac",
            "uuid",
        ]

        self.allowed_imports = allowed_imports or default_imports

        # Initialize the smolagents executor (does the actual sandboxing)
        self._executor = LocalPythonExecutor(
            additional_authorized_imports=self.allowed_imports
        )

        # Track variables we've set (for list_variables)
        self._user_variables: set[str] = set()

        # Track callable functions to register with send_tools
        self._callable_tools: Dict[str, Callable[..., Any]] = {}

        # Register helper utilities
        self._register_helpers()

    def _register_helpers(self) -> None:
        """Register helper functions with the executor.

        Exposes ``format_exc`` (traceback of the current exception) and
        ``safe_json_dumps`` (json.dumps with repr() fallback for
        non-serializable objects) inside the sandbox.
        """
        helpers = {
            "format_exc": traceback.format_exc,
            "safe_json_dumps": lambda obj: json.dumps(obj, default=lambda o: repr(o)),
        }
        # Register helpers as callable tools
        for name, func in helpers.items():
            self.inject_function(name, func)

    def _sync_callable_tools(self) -> None:
        """Sync callable functions with the executor via send_tools.

        Failures are logged and swallowed so a bad tool never breaks the
        session.
        """
        if self._callable_tools:
            try:
                # Type ignore: smolagents accepts callables despite Tool type hint
                self._executor.send_tools(self._callable_tools)  # type: ignore[arg-type]
            except Exception:
                logger.debug("send_tools failed; continuing without extra tools", exc_info=True)

    def set_context(self, context: str, variable_name: str = "context") -> None:
        """Load context into namespace as a variable.

        Args:
            context: The context string to load
            variable_name: Name of the variable (default "context")
        """
        self.set_variable(variable_name, context)

    def set_variable(self, name: str, value: Any) -> None:
        """Set a variable in the namespace.

        Args:
            name: Variable name
            value: Variable value
        """
        # Access the executor's internal state to set variables
        if hasattr(self._executor, 'state'):
            self._executor.state[name] = value
        else:
            # Fallback: store in injected vars for later retrieval
            # NOTE(review): this fallback dict is only read back by
            # get_variable(); it is never pushed into the sandbox - confirm
            # whether any supported smolagents version lacks `.state`.
            self._executor._injected_vars = getattr(self._executor, '_injected_vars', {})
            self._executor._injected_vars[name] = value

        self._user_variables.add(name)

    def get_variable(self, name: str) -> Optional[Any]:
        """Retrieve a variable from namespace.

        Args:
            name: Variable name

        Returns:
            The variable value or None if not found
        """
        # Try to get from executor's state
        if hasattr(self._executor, 'state'):
            return self._executor.state.get(name)

        # Fallback to injected vars
        if hasattr(self._executor, '_injected_vars'):
            return self._executor._injected_vars.get(name)

        return None

    def list_variables(self) -> List[str]:
        """List non-private variables in namespace.

        Returns:
            List of variable names (excluding private and builtins).
            Order is unspecified (built from a set).
        """
        variables = set()

        # Get from executor's state
        if hasattr(self._executor, 'state'):
            for key in self._executor.state:
                if not key.startswith('_'):
                    variables.add(key)

        # Include tracked user variables
        variables.update(self._user_variables)

        return list(variables)

    def execute(self, code: str) -> Dict[str, Any]:
        """Execute Python code and return results.

        Args:
            code: Python code to execute

        Returns:
            Dictionary with stdout, stderr, locals_snapshot, execution_time,
            success, and exception fields
        """
        start_time = time.time()
        success = True
        exception_msg = None
        new_locals: Dict[str, str] = {}

        # Track state before execution so new variables can be diffed after
        pre_state_keys = set()
        if hasattr(self._executor, 'state'):
            pre_state_keys = set(self._executor.state.keys())

        stdout_parts: list[str] = []
        stderr_parts: list[str] = []

        try:
            exec_result = self._executor(code)

            # Extract logs/prints. Each attribute read below is wrapped in
            # its own try/except because the result object's attribute set
            # varies by smolagents version.
            try:
                logs = getattr(exec_result, "logs", None)
                if logs:
                    stdout_parts.append(str(logs))
            except Exception:
                logger.debug("Failed to read exec_result.logs", exc_info=True)

            # Extract the result / output value (JSON if possible, repr otherwise)
            try:
                if hasattr(exec_result, "output"):
                    out_val = exec_result.output
                    if out_val is not None:
                        try:
                            stdout_parts.append(json.dumps(out_val))
                        except Exception:
                            stdout_parts.append(repr(out_val))
            except Exception:
                logger.debug("Failed to read exec_result.output", exc_info=True)

            # Check for errors
            try:
                err = getattr(exec_result, "error", None)
                if err:
                    stderr_parts.append(str(err))
                    success = False
                    exception_msg = str(err)
            except Exception:
                logger.debug("Failed to read exec_result.error", exc_info=True)

            try:
                ex = getattr(exec_result, "exception", None)
                if ex:
                    stderr_parts.append(str(ex))
                    success = False
                    exception_msg = str(ex)
            except Exception:
                logger.debug("Failed to read exec_result.exception", exc_info=True)

            # Determine success from exit_code if available; otherwise fall
            # back to a `success` attribute (which may overwrite an earlier
            # error-derived False).
            try:
                if hasattr(exec_result, "exit_code"):
                    if exec_result.exit_code is not None and exec_result.exit_code != 0:
                        success = False
                elif hasattr(exec_result, "success"):
                    success = bool(exec_result.success)
            except Exception:
                logger.debug("Failed to determine exec_result exit code", exc_info=True)

        except Exception as e:
            # The executor itself raised (e.g. disallowed operation)
            success = False
            exception_msg = f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}"
            stderr_parts.append(exception_msg)

        execution_time = time.time() - start_time

        # Capture new/modified variables (repr capped at 500 chars each)
        if hasattr(self._executor, 'state'):
            for key in self._executor.state:
                if key not in pre_state_keys and not key.startswith('_'):
                    try:
                        val = self._executor.state[key]
                        val_repr = repr(val)
                        if len(val_repr) > 500:
                            val_repr = val_repr[:500] + "..."
                        new_locals[key] = val_repr
                        self._user_variables.add(key)
                    except Exception:
                        new_locals[key] = "<unrepresentable>"

        # Compose stdout/stderr
        stdout = "\n".join(part for part in stdout_parts if part)
        stderr = "\n".join(part for part in stderr_parts if part)

        # Truncate output to max_output_length; the reported total is the
        # pre-truncation length (len() is evaluated before reassignment).
        if len(stdout) > self.max_output_length:
            stdout = stdout[:self.max_output_length] + f"\n... (truncated, total {len(stdout)} chars)"

        if len(stderr) > self.max_output_length:
            stderr = stderr[:self.max_output_length] + f"\n... (truncated, total {len(stderr)} chars)"

        return {
            "stdout": stdout,
            "stderr": stderr,
            "locals_snapshot": new_locals,
            "execution_time": execution_time,
            "success": success,
            "exception": exception_msg,
        }

    def reset(self) -> None:
        """Reset namespace to initial state.

        Discards the old executor (and every user variable/tool), then
        re-registers the built-in helpers.
        """
        # Create a new executor instance
        self._executor = LocalPythonExecutor(
            additional_authorized_imports=self.allowed_imports
        )
        self._user_variables.clear()
        self._callable_tools.clear()
        self._register_helpers()

    def inject_function(self, name: str, func: Callable[..., Any]) -> None:
        """Inject a callable function into the namespace.

        Used for adding llm_query, llm_query_batched, FINAL, etc.

        Args:
            name: Function name in namespace
            func: The callable to inject
        """
        # Add to callable tools and sync with executor
        self._callable_tools[name] = func
        self._user_variables.add(name)
        self._sync_callable_tools()
server/repl_environment.py ADDED
@@ -0,0 +1,512 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ REPL Environment Implementation.
9
+
10
+ A Python REPL environment for training language models on code execution tasks,
11
+ based on the Recursive Language Models (RLM) paradigm.
12
+
13
+ References:
14
+ - RLM Paper: https://arxiv.org/abs/2512.24601
15
+ - Prime Intellect Blog: https://www.primeintellect.ai/blog/rlm
16
+ - Alex Zhang Blog: https://alexzhang13.github.io/blog/2025/rlm/
17
+ """
18
+
19
+ import os
20
+ import re
21
+ from collections.abc import Callable
22
+ from typing import Any, Dict, List, Optional
23
+ from uuid import uuid4
24
+
25
+ # Support both in-repo and standalone imports
26
+ try:
27
+ from openenv.core.env_server.interfaces import Environment
28
+ from openenv.core.env_server.types import EnvironmentMetadata
29
+ except ImportError:
30
+ from openenv.core.env_server.interfaces import Environment
31
+ from openenv.core.env_server.types import EnvironmentMetadata
32
+
33
+ try:
34
+ from ..models import REPLAction, REPLObservation, REPLState, CodeBlockResult
35
+ except ImportError:
36
+ from models import REPLAction, REPLObservation, REPLState, CodeBlockResult
37
+
38
+ try:
39
+ from .python_executor import PythonExecutor
40
+ except ImportError:
41
+ from python_executor import PythonExecutor
42
+
43
+
44
+ class REPLEnvironment(Environment):
45
+ """
46
+ A REPL environment for training language models to use code execution.
47
+
48
+ Based on the Recursive Language Models (RLM) paradigm, this environment allows
49
+ language models to:
50
+ - Execute Python code in a sandboxed REPL
51
+ - Work with large contexts loaded as variables
52
+ - Finalize answers via FINAL(), FINAL_VAR(), or answer dict pattern
53
+ - Optionally make recursive LLM calls via llm_query() / llm_query_batched()
54
+
55
+ Supports two finalization patterns:
56
+ 1. RLM-style: print('FINAL(answer)') or print('FINAL_VAR(var_name)')
57
+ 2. Prime Intellect style: answer = {"content": "...", "ready": True}
58
+
59
+ Example:
60
+ >>> env = REPLEnvironment(context="Hello World", task_prompt="Count chars")
61
+ >>> obs = env.reset()
62
+ >>> print(obs.context_preview) # "Hello World"
63
+ >>>
64
+ >>> obs = env.step(REPLAction(code="result = len(context)"))
65
+ >>> print(obs.result.success) # True
66
+ >>> print(obs.available_variables) # ["context", "result", "answer"]
67
+ >>>
68
+ >>> obs = env.step(REPLAction(code="print(f'FINAL({result})')"))
69
+ >>> print(obs.done) # True
70
+ >>> print(obs.metadata["final_answer"]) # "11"
71
+ """
72
+
73
+ SUPPORTS_CONCURRENT_SESSIONS = True
74
+
75
+ def __init__(
76
+ self,
77
+ context: Optional[str] = None,
78
+ task_prompt: Optional[str] = None,
79
+ max_iterations: int = 30,
80
+ max_output_length: int = 8192,
81
+ context_preview_length: int = 500,
82
+ reward_on_success: float = 1.0,
83
+ reward_on_iteration: float = 0.0,
84
+ reward_on_failure: float = -0.1,
85
+ reward_on_error: float = -0.05,
86
+ llm_query_fn: Optional[Callable[[str], str]] = None,
87
+ llm_batch_fn: Optional[Callable[[List[str]], List[str]]] = None,
88
+ ):
89
+ """Initialize the REPL environment.
90
+
91
+ Args:
92
+ context: Initial context to load (can also be set via REPL_CONTEXT env var)
93
+ task_prompt: Task description (can also be set via REPL_TASK_PROMPT env var)
94
+ max_iterations: Maximum steps per episode (default 30, env var REPL_MAX_ITERATIONS)
95
+ max_output_length: Max chars for stdout/stderr per turn (default 8192)
96
+ context_preview_length: Chars to show in context preview (default 500)
97
+ reward_on_success: Reward when final answer is submitted (default 1.0)
98
+ reward_on_iteration: Reward per iteration step (default 0.0)
99
+ reward_on_failure: Reward when max iterations reached (default -0.1)
100
+ reward_on_error: Reward when code execution fails (default -0.05)
101
+ llm_query_fn: Optional function for llm_query() support
102
+ llm_batch_fn: Optional function for llm_query_batched() support
103
+ """
104
+ self.initial_context = context or os.environ.get("REPL_CONTEXT", "")
105
+ self.initial_task_prompt = task_prompt or os.environ.get("REPL_TASK_PROMPT", "")
106
+ self.max_iterations = int(os.environ.get("REPL_MAX_ITERATIONS", max_iterations))
107
+ self.max_output_length = max_output_length
108
+ self.context_preview_length = context_preview_length
109
+
110
+ # Reward configuration
111
+ self.reward_on_success = reward_on_success
112
+ self.reward_on_iteration = reward_on_iteration
113
+ self.reward_on_failure = reward_on_failure
114
+ self.reward_on_error = reward_on_error
115
+
116
+ # Optional LLM functions for recursive calls
117
+ self.llm_query_fn = llm_query_fn
118
+ self.llm_batch_fn = llm_batch_fn
119
+
120
+ # State (initialized on reset)
121
+ self._state: Optional[REPLState] = None
122
+ self._executor: Optional[PythonExecutor] = None
123
+
124
+ def _create_llm_functions(
125
+ self,
126
+ hf_token: str,
127
+ llm_model: Optional[str] = None,
128
+ ) -> None:
129
+ """Create LLM functions dynamically using client-provided token.
130
+
131
+ This allows clients to use their own HF token instead of the server's.
132
+
133
+ Args:
134
+ hf_token: HuggingFace API token
135
+ llm_model: Model to use (default: Qwen/Qwen3-Coder-480B-A35B-Instruct)
136
+ """
137
+ from concurrent.futures import ThreadPoolExecutor, as_completed
138
+
139
+ try:
140
+ from huggingface_hub import InferenceClient
141
+ except ImportError:
142
+ # huggingface_hub not installed, skip LLM functions
143
+ return
144
+
145
+ model = llm_model or os.environ.get(
146
+ "LLM_MODEL", "Qwen/Qwen3-Coder-480B-A35B-Instruct"
147
+ )
148
+ client = InferenceClient(model=model, token=hf_token)
149
+
150
+ def llm_query(prompt: str) -> str:
151
+ """Query the LLM with a prompt and return the response."""
152
+ try:
153
+ messages = [{"role": "user", "content": prompt}]
154
+ response = client.chat_completion(
155
+ messages=messages,
156
+ max_tokens=2048,
157
+ temperature=0.7,
158
+ )
159
+ return response.choices[0].message.content or ""
160
+ except Exception as e:
161
+ return f"Error calling LLM: {e}"
162
+
163
+ def llm_query_batched(prompts: List[str]) -> List[str]:
164
+ """Query the LLM with multiple prompts in parallel."""
165
+ if not prompts:
166
+ return []
167
+
168
+ max_workers = min(len(prompts), 8)
169
+ results: List[str] = [""] * len(prompts)
170
+
171
+ with ThreadPoolExecutor(max_workers=max_workers) as executor:
172
+ future_to_idx = {
173
+ executor.submit(llm_query, prompt): idx
174
+ for idx, prompt in enumerate(prompts)
175
+ }
176
+ for future in as_completed(future_to_idx):
177
+ idx = future_to_idx[future]
178
+ try:
179
+ results[idx] = future.result()
180
+ except Exception as e:
181
+ results[idx] = f"Error: {e}"
182
+
183
+ return results
184
+
185
+ self.llm_query_fn = llm_query
186
+ self.llm_batch_fn = llm_query_batched
187
+
188
    def reset(
        self,
        seed: Optional[int] = None,
        episode_id: Optional[str] = None,
        context: Optional[str] = None,
        task_prompt: Optional[str] = None,
        hf_token: Optional[str] = None,
        llm_model: Optional[str] = None,
        **kwargs: Any,
    ) -> REPLObservation:
        """Reset the environment with optional new context.

        Args:
            seed: Optional random seed (for reproducibility).
                NOTE(review): currently accepted but never used in this
                method - confirm whether determinism is expected.
            episode_id: Optional episode identifier (if not provided, one is generated)
            context: Context to load (overrides initial_context)
            task_prompt: Task description (overrides initial_task_prompt)
            hf_token: Optional HuggingFace token for llm_query/llm_query_batched.
                If provided, creates LLM functions using this token.
            llm_model: Optional model name for LLM functions (default: from env or Qwen3-Coder)
            **kwargs: Additional reset parameters

        Returns:
            Initial REPLObservation with environment ready message
        """
        # NOTE(review): `or` fallback means an explicitly passed empty string
        # cannot clear the init-time context/task - confirm intended.
        effective_context = context or self.initial_context
        effective_task_prompt = task_prompt or self.initial_task_prompt

        # Create LLM functions if not already provided at init
        # Priority: client hf_token > server HF_TOKEN env var
        if not self.llm_query_fn:
            effective_token = hf_token or os.environ.get("HF_TOKEN")
            if effective_token:
                self._create_llm_functions(effective_token, llm_model)

        # Initialize state (fresh episode counters, no final answer yet)
        self._state = REPLState(
            episode_id=episode_id or str(uuid4()),
            step_count=0,
            context=effective_context,
            task_prompt=effective_task_prompt,
            iteration=0,
            max_iterations=self.max_iterations,
            namespace_keys=[],
            final_answer=None,
            total_execution_time=0.0,
        )

        # Initialize executor (brand-new sandbox; discards prior namespace)
        self._executor = PythonExecutor(max_output_length=self.max_output_length)

        # Initialize answer dict (Prime Intellect style finalization)
        self._executor.set_variable("answer", {"content": "", "ready": False})

        # Load context into namespace if provided
        if effective_context:
            self._executor.set_context(effective_context)

        # Inject LLM functions if provided
        # Names: llm_query (single), llm_query_batched (official RLM), llm_batch (alias)
        if self.llm_query_fn:
            self._executor.inject_function("llm_query", self.llm_query_fn)
        if self.llm_batch_fn:
            self._executor.inject_function("llm_query_batched", self.llm_batch_fn)  # Official name
            self._executor.inject_function("llm_batch", self.llm_batch_fn)  # Alias

        # Inject FINAL helper function so both FINAL(x) and print(f'FINAL({x})') work
        # Returns the FINAL pattern as a string so it appears in stdout for detection
        def final_helper(value):
            """Helper that returns FINAL(value) string for detection."""
            return f"FINAL({value})"

        self._executor.inject_function("FINAL", final_helper)

        # Inject FINAL_VAR helper that looks up variable and returns FINAL(value)
        # This matches official RLM behavior - strips quotes from var_name and looks up in namespace
        executor = self._executor  # Capture for closure

        def final_var_helper(var_name: str):
            """Look up variable by name and return FINAL(value) for detection."""
            # Strip quotes if present (handles both FINAL_VAR("x") and FINAL_VAR(x))
            var_name_clean = str(var_name).strip().strip("\"'")
            # Look up variable in executor namespace
            # NOTE(review): a variable whose value is None falls through to
            # the fallback branch below - confirm acceptable.
            value = executor.get_variable(var_name_clean)
            if value is not None:
                return f"FINAL({value})"
            return f"FINAL_VAR({var_name_clean})"  # Fallback for regex detection

        self._executor.inject_function("FINAL_VAR", final_var_helper)

        # Update namespace keys
        self._state.namespace_keys = self._executor.list_variables()

        # Build initial message shown to the agent as the first "stdout"
        message_parts = ["REPL environment initialized."]
        if effective_context:
            message_parts.append(
                f"Context loaded ({len(effective_context)} chars). "
                "Use 'context' variable to access it."
            )
        if effective_task_prompt:
            message_parts.append(f"Task: {effective_task_prompt}")
        message_parts.append(
            "Use answer['content'] to store your answer, "
            "and set answer['ready'] = True when done."
        )

        return REPLObservation(
            result=CodeBlockResult(
                stdout="\n".join(message_parts),
                stderr="",
                locals_snapshot={},
                execution_time=0.0,
                success=True,
                exception=None,
            ),
            context_preview=(
                effective_context[: self.context_preview_length]
                if effective_context
                else None
            ),
            context_length=len(effective_context) if effective_context else 0,
            available_variables=self._state.namespace_keys,
            iteration=0,
            max_iterations=self.max_iterations,
            done=False,
            reward=0.0,
            metadata={
                "task_prompt": effective_task_prompt,
                "message": "Environment ready.",
            },
        )
320
+
321
+ def step(
322
+ self,
323
+ action: REPLAction,
324
+ timeout_s: Optional[float] = None,
325
+ **kwargs: Any,
326
+ ) -> REPLObservation:
327
+ """Execute code and return observation.
328
+
329
+ Args:
330
+ action: REPLAction containing code to execute
331
+ timeout_s: Optional timeout in seconds (not currently used)
332
+ **kwargs: Additional step parameters
333
+
334
+ Returns:
335
+ REPLObservation with execution results
336
+ """
337
+ if self._state is None or self._executor is None:
338
+ raise RuntimeError("Environment not initialized. Call reset() first.")
339
+
340
+ self._state.step_count += 1
341
+ self._state.iteration += 1
342
+
343
+ # Check if agent explicitly signals final answer
344
+ if action.is_final:
345
+ self._state.final_answer = action.final_answer or ""
346
+ return self._create_final_observation(
347
+ success=True,
348
+ message="Final answer submitted.",
349
+ reward=self.reward_on_success,
350
+ )
351
+
352
+ # Check iteration limit
353
+ if self._state.iteration >= self.max_iterations:
354
+ # Check if there's a partial answer in the answer dict
355
+ answer_var = self._executor.get_variable("answer")
356
+ if isinstance(answer_var, dict) and answer_var.get("content"):
357
+ self._state.final_answer = str(answer_var.get("content", ""))
358
+ return self._create_final_observation(
359
+ success=False,
360
+ message=f"Maximum iterations ({self.max_iterations}) reached.",
361
+ reward=self.reward_on_failure,
362
+ )
363
+
364
+ # Execute code
365
+ result = self._executor.execute(action.code)
366
+ self._state.total_execution_time += result["execution_time"]
367
+ self._state.namespace_keys = self._executor.list_variables()
368
+
369
+ # Calculate reward
370
+ reward = self.reward_on_iteration
371
+ if not result["success"]:
372
+ reward += self.reward_on_error
373
+
374
+ # Check for final answer patterns
375
+ final_answer = self._extract_final_answer(result["stdout"])
376
+ done = final_answer is not None
377
+
378
+ if done:
379
+ self._state.final_answer = final_answer
380
+ reward = self.reward_on_success
381
+
382
+ return REPLObservation(
383
+ result=CodeBlockResult(
384
+ stdout=result["stdout"],
385
+ stderr=result["stderr"],
386
+ locals_snapshot=result["locals_snapshot"],
387
+ execution_time=result["execution_time"],
388
+ success=result["success"],
389
+ exception=result["exception"],
390
+ ),
391
+ context_preview=(
392
+ self._state.context[: self.context_preview_length]
393
+ if self._state.context
394
+ else None
395
+ ),
396
+ context_length=len(self._state.context) if self._state.context else 0,
397
+ available_variables=self._state.namespace_keys,
398
+ iteration=self._state.iteration,
399
+ max_iterations=self.max_iterations,
400
+ done=done,
401
+ reward=reward,
402
+ metadata={
403
+ "task_prompt": self._state.task_prompt,
404
+ "final_answer": final_answer,
405
+ "execution_time": result["execution_time"],
406
+ },
407
+ )
408
+
409
+ def _extract_final_answer(self, stdout: str) -> Optional[str]:
410
+ """Extract final answer from output.
411
+
412
+ Supports multiple patterns:
413
+ 1. RLM-style: FINAL(answer) in stdout
414
+ 2. RLM-style: FINAL_VAR(variable_name) in stdout
415
+ 3. Prime Intellect style: answer = {"content": "...", "ready": True} in namespace
416
+
417
+ Args:
418
+ stdout: Standard output from code execution
419
+
420
+ Returns:
421
+ Final answer string or None if not found
422
+ """
423
+ # Pattern 1: RLM-style FINAL(answer)
424
+ final_match = re.search(r"FINAL\((.*?)\)", stdout, re.DOTALL)
425
+ if final_match:
426
+ return final_match.group(1).strip()
427
+
428
+ # Pattern 2: RLM-style FINAL_VAR(variable_name)
429
+ final_var_match = re.search(r"FINAL_VAR\((\w+)\)", stdout)
430
+ if final_var_match and self._executor:
431
+ var_name = final_var_match.group(1)
432
+ value = self._executor.get_variable(var_name)
433
+ if value is not None:
434
+ return str(value)
435
+
436
+ # Pattern 3: Prime Intellect style answer dict
437
+ if self._executor:
438
+ answer_var = self._executor.get_variable("answer")
439
+ if isinstance(answer_var, dict):
440
+ if answer_var.get("ready", False):
441
+ return str(answer_var.get("content", ""))
442
+
443
+ return None
444
+
445
+ def _create_final_observation(
446
+ self, success: bool, message: str, reward: float
447
+ ) -> REPLObservation:
448
+ """Create observation for episode termination.
449
+
450
+ Args:
451
+ success: Whether the episode ended successfully
452
+ message: Termination message
453
+ reward: Final reward value
454
+
455
+ Returns:
456
+ Final REPLObservation with done=True
457
+ """
458
+ return REPLObservation(
459
+ result=CodeBlockResult(
460
+ stdout=message,
461
+ stderr="",
462
+ locals_snapshot={},
463
+ execution_time=0.0,
464
+ success=success,
465
+ exception=None,
466
+ ),
467
+ context_preview=None,
468
+ context_length=0,
469
+ available_variables=[],
470
+ iteration=self._state.iteration if self._state else 0,
471
+ max_iterations=self.max_iterations,
472
+ done=True,
473
+ reward=reward,
474
+ metadata={
475
+ "final_answer": self._state.final_answer if self._state else None,
476
+ "total_execution_time": (
477
+ self._state.total_execution_time if self._state else 0
478
+ ),
479
+ "total_iterations": self._state.iteration if self._state else 0,
480
+ },
481
+ )
482
+
483
+ @property
484
+ def state(self) -> REPLState:
485
+ """Get the current environment state.
486
+
487
+ Returns:
488
+ Current REPLState
489
+
490
+ Raises:
491
+ RuntimeError: If environment not initialized
492
+ """
493
+ if self._state is None:
494
+ raise RuntimeError("Environment not initialized. Call reset() first.")
495
+ return self._state
496
+
497
+ def close(self) -> None:
498
+ """Cleanup resources."""
499
+ self._executor = None
500
+ self._state = None
501
+
502
+ def get_metadata(self) -> EnvironmentMetadata:
503
+ """Get environment metadata.
504
+
505
+ Returns:
506
+ EnvironmentMetadata with environment info
507
+ """
508
+ return EnvironmentMetadata(
509
+ name="repl_env",
510
+ description="Python REPL environment for RLM-style code execution",
511
+ version="0.1.0",
512
+ )
uv.lock ADDED
The diff for this file is too large to render. See raw diff