arjeet committed on
Commit e03aa3c · 0 Parent(s)

DocSweeper v1

Files changed (17)
  1. .dockerignore +15 -0
  2. .gitignore +10 -0
  3. .python-version +1 -0
  4. Dockerfile +19 -0
  5. README.md +129 -0
  6. __init__.py +17 -0
  7. client.py +101 -0
  8. inference.py +118 -0
  9. main.py +6 -0
  10. models.py +77 -0
  11. openenv.yaml +7 -0
  12. pyproject.toml +15 -0
  13. requirements.txt +5 -0
  14. server/__init__.py +11 -0
  15. server/app.py +21 -0
  16. server/cust_env_environment.py +193 -0
  17. uv.lock +0 -0
.dockerignore ADDED
@@ -0,0 +1,15 @@
+ .venv
+ .git
+ .gitignore
+ .env
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ *.pyw
+ *.pyz
+ *.pywz
+ *.pyzw
+ *.pyzwz
+
+
.gitignore ADDED
@@ -0,0 +1,10 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
.python-version ADDED
@@ -0,0 +1 @@
+ 3.10
Dockerfile ADDED
@@ -0,0 +1,19 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app/env
+
+ RUN apt-get update && apt-get install -y \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY . .
+
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ EXPOSE 8000
+
+ ENV PYTHONUNBUFFERED=1
+ ENV ENABLE_WEB_INTERFACE=true
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ CMD ["python", "-m", "uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md ADDED
@@ -0,0 +1,129 @@
+ # Doc Sweeper Environment
+
+ A virtual file system and text-editing environment for OpenEnv. This environment tasks autonomous LLM agents with acting as automated documentation engineers, requiring them to navigate a directory tree, read files, and apply precise string manipulations to complete complex refactoring tasks.
+
+ ## Overview
+
+ The Doc Sweeper environment provides a sandboxed, in-memory file system where agents can interact with dummy codebases and documentation. It evaluates an agent's ability to retain context, plan multi-step operations, and use tools correctly.
+
+ ### Features
+
+ * **Virtual File System**: In-memory directory tree with nested files.
+ * **Strict Tooling**: Requires agents to explicitly `open` files before applying `edit` commands.
+ * **Granular Feedback**: Provides immediate terminal feedback and linter issues upon illegal actions or formatting errors.
+ * **Three Distinct Scenarios**: Evaluates different logic flows (global search/replace, YAML refactoring, path resolution).
+
+ ### Task Rules
+
+ The environment supports three primary tasks:
+
+ * `version_bump`: The agent must find all outdated version numbers (e.g., `v1.0.0` or `v1.00`) across all files and update them to `v2.0.0`.
+ * `config_migration`: The agent must open docker-compose files, update the version to `3.8`, and migrate `links` keys to `networks`.
+ * `broken_links`: The agent must find broken relative markdown links and edit them to point to correct file paths.
+
+ ---
+
+ ## Quick Start
+
+ ### Running the Baseline Inference (Recommended)
+
+ The easiest way to test the environment is with the provided chain-of-thought agent script, `inference.py`.
+
+ ```bash
+ # Export your required credentials
+ export HF_TOKEN="your_api_key_here"
+ export API_BASE_URL="https://api.openai.com/v1"
+ export MODEL_NAME="gpt-4o-mini"
+
+ # Run the inference script across all tasks
+ python inference.py
+ ```
+
+ ## Using a Local Server
+
+ You can host the environment locally to test the API endpoints manually.
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the server
+ python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```
+
+
+ ## Actions
+
+ The action space is defined by the `DocAction` schema. The agent must provide a single JSON object with a `tool_name` and the fields that tool requires:
+
+ * **`open`**: Opens a file and makes it the active file. Requires the `path` parameter.
+ * **`edit`**: Replaces every occurrence of `old_str` in the active file with `new_str` (an omitted `new_str` deletes the match). `old_str` must match exactly.
+ * **`grep`**: Searches every file in the virtual file system and lists the paths that contain `search_query`. Requires `search_query`.
+ * **`done`**: Signals that the task is complete and triggers final grading.
+
+ ## Observations
+
+ Each observation (`DocObservation`) returned by the environment includes:
+
+ * **`active_file`**: The file currently opened by the agent.
+ * **`terminal_feedback`**: Error messages, success logs, or system alerts resulting from the last action.
+ * **`directory_tree`**: A JSON object mapping `/docs` to the list of file paths in the virtual file system.
+ * **`file_content`**: The textual content of the currently active file.
+ * **`issues_detected`**: A list of simulated linter warnings for the active file (currently only deprecated `v1.0.0` strings in the `version_bump` task).
+
+ ## Configuration
+
+ ### Reward Structure
+
+ The environment issues a reward after every action, based on the agent's efficiency and accuracy:
+
+ * **Valid Tool Usage**: `0.0` for a successful `open` or `grep` (in `broken_links`, each `grep` with a query earns `+0.1`); each successful `edit` earns `+0.1`.
+ * **Tool Misuse Penalty**: `-0.1` (e.g., editing without an open file, an `old_str` that is not in the file, a bad file path, or an unknown tool). Trying to delete a protected string (`` ```yaml `` or `# Title`) is blocked and costs `-1.0`.
+ * **Task Completion**: Calling `done` ends the episode and adds a final grade. For `version_bump`, the grade is the number of `v2.0.0` occurrences divided by 4, minus `0.5` per remaining `v1.0.0` or `v1.00`, floored at `0.0`; the other tasks currently return a fixed `0.5`.
+ * **Timeout**: The episode ends automatically after `max_steps` actions (default `30`).
+
+ ## Building and Deployment
+
+ ### Build Docker Image
+
+ From the repository root:
+
+ ```bash
+ # Build the environment image
+ docker build -t doc-sweeper-env:latest .
+ ```
+
+ The Dockerfile installs dependencies with `pip install -r requirements.txt` for compatibility with Hugging Face Spaces.
+
+ ### Run the Container Locally
+
+ ```bash
+ docker run -p 8000:8000 doc-sweeper-env:latest
+ ```
+
+ The FastAPI OpenEnv endpoints will then be available at `http://localhost:8000/reset` and `http://localhost:8000/step`.
+
+ ---
+
+ ## Dependencies
+
+ The Doc Sweeper environment requires:
+ * **`fastapi` & `uvicorn`**: For serving the OpenEnv endpoints.
+ * **`pydantic`**: For strict action and observation schema validation.
+ * **`openai`**: For the baseline LLM inference script (`pyproject.toml` also lists `groq`, which `requirements.txt` does not include).
+
+ These are installed automatically when building the Docker image or running `pip install -r requirements.txt`, which also installs `openenv-core`.
+
+ ---
+
+ ## Example Evaluation Log Output
+
+ When running `inference.py`, the agent emits logs in a fixed format for the automated graders (the values below are illustrative):
+
+ ```text
+ [START] task=version_bump model=gpt-4o-mini
+ [STEP] step=1 action=open reward=0.00 done=False thought="Opening setup.md to check for versions."
+ [STEP] step=2 action=edit reward=0.00 done=False thought="Replacing v1.0.0 with v2.0.0."
+ [STEP] step=3 action=done reward=1.00 done=True thought="All files have been checked."
+ [END] task=version_bump score=1.00 total_steps=3 runtime_seconds=4.2
+ ```
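The example log above puts `key=value` fields on lines tagged `[START]`, `[STEP]`, or `[END]`. A grader that consumes this log might parse each line roughly as follows. This is an assumption-based sketch, not part of the repository. `parse_log_line` is a hypothetical helper. It assumes every value is either a bare token or a double-quoted string that contains no double quotes itself, so a `thought` containing a `"` character would break it.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: a value is either "quoted text without inner quotes" or a bare token.
_PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')
_TAG_RE = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")


def parse_log_line(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split one log line into its tag and a dict of string-valued fields.

    Returns None for lines that do not start with [START], [STEP] or [END].
    """
    match = _TAG_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    fields: Dict[str, str] = {}
    for key, raw, quoted in _PAIR_RE.findall(rest):
        # For quoted values keep only the inner text; bare tokens are kept as-is.
        fields[key] = quoted if raw.startswith('"') else raw
    return tag, fields
```

Every value comes back as a string, so numeric and boolean fields need explicit conversion. For example, `float(parse_log_line(line)[1]["reward"])` recovers the per-step reward from a `[STEP]` line.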
__init__.py ADDED
@@ -0,0 +1,17 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cust Env Environment."""
+
+ from client import DocSweeperEnv
+ from models import DocAction, DocObservation, DocState
+
+ __all__ = [
+     "DocSweeperEnv",
+     "DocObservation",
+     "DocAction",
+     "DocState"
+ ]
client.py ADDED
@@ -0,0 +1,101 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Doc Sweeper Environment Client.
+
+ This module provides the client for connecting to a Doc Sweeper Environment server
+ via WebSocket for persistent sessions.
+ """
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_client import EnvClient
+
+ from models import DocAction, DocObservation, DocState
+
+
+ class DocSweeperEnv(EnvClient[DocAction, DocObservation, DocState]):
+     """
+     Client for Doc Sweeper Environment.
+
+     This client maintains a persistent WebSocket connection to the environment
+     server, enabling efficient multi-step interactions with lower latency.
+
+     Example:
+         >>> with DocSweeperEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.terminal_feedback)
+         ...
+         ...     result = client.step(DocAction(tool_name="open", path="/docs/setup.md"))
+         ...     print(result.observation.file_content)
+     """
+
+     def _step_payload(self, action: DocAction) -> Dict[str, Any]:
+         """
+         Convert DocAction to JSON payload for step request.
+
+         Args:
+             action: DocAction instance with tool parameters.
+
+         Returns:
+             Dictionary representation suitable for JSON encoding.
+         """
+         return {
+             "tool_name": action.tool_name,
+             "path": action.path,
+             "old_str": action.old_str,
+             "new_str": action.new_str,
+             "search_query": action.search_query,
+         }
+
+     def _parse_result(self, payload: Dict[str, Any]) -> StepResult[DocObservation]:
+         """
+         Parse server response into StepResult[DocObservation].
+
+         Args:
+             payload: JSON response from server.
+
+         Returns:
+             StepResult with DocObservation.
+         """
+         obs_data = payload.get("observation", {})
+         # reward/done may be sent at the top level of the payload; fall back to obs_data.
+         observation = DocObservation(
+             active_file=obs_data.get("active_file", ""),
+             file_content=obs_data.get("file_content", ""),
+             directory_tree=obs_data.get("directory_tree", {}),
+             issues_detected=obs_data.get("issues_detected", []),
+             terminal_feedback=obs_data.get("terminal_feedback", ""),
+             done=payload.get("done", obs_data.get("done", False)),
+             reward=payload.get("reward", obs_data.get("reward", 0.0)),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=observation.reward,
+             done=observation.done,
+         )
+
+     def _parse_state(self, payload: Dict[str, Any]) -> DocState:
+         """
+         Parse server response into DocState object.
+
+         Args:
+             payload: JSON response from /state endpoint.
+
+         Returns:
+             DocState object with environment state information.
+         """
+         return DocState(
+             episode_id=payload.get("episode_id", ""),
+             step_count=payload.get("step_count", 0),
+             vfs=payload.get("vfs", {}),
+             active_file=payload.get("active_file", ""),
+         )
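The `done` and `reward` values read in `_parse_result` decide when a client-side loop stops. Depending on the `openenv-core` version, the server may place them at the top level of the step payload instead of inside `observation`, so check the serialization in your installed version. The dependency-free sketch below isolates that fallback logic. `StepSummary` and `parse_step_payload` are hypothetical stand-ins for `StepResult[DocObservation]` and the client method; they are not OpenEnv APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StepSummary:
    """Hypothetical, dependency-free stand-in for StepResult[DocObservation]."""

    active_file: str = ""
    file_content: str = ""
    directory_tree: Dict[str, List[str]] = field(default_factory=dict)
    issues_detected: List[str] = field(default_factory=list)
    terminal_feedback: str = ""
    reward: float = 0.0
    done: bool = False


def parse_step_payload(payload: Dict[str, Any]) -> StepSummary:
    obs = payload.get("observation", {}) or {}
    # Prefer top-level reward/done; fall back to the nested observation dict.
    reward = payload.get("reward")
    if reward is None:
        reward = obs.get("reward")
    done = payload.get("done")
    if done is None:
        done = obs.get("done", False)
    return StepSummary(
        active_file=obs.get("active_file", ""),
        file_content=obs.get("file_content", ""),
        directory_tree=obs.get("directory_tree", {}),
        issues_detected=obs.get("issues_detected", []),
        terminal_feedback=obs.get("terminal_feedback", ""),
        reward=float(reward or 0.0),
        done=bool(done),
    )
```

Because the top-level keys are read first, the same parser works whether or not the server also duplicates them inside `observation`.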
inference.py ADDED
@@ -0,0 +1,118 @@
+ """Inference script for the Doc Sweeper Environment."""
+
+ import os
+ import json
+ import time
+ from openai import OpenAI
+ from server.cust_env_environment import DocSweeperEnvironment
+ from models import DocAction
+
+ def run_inference(task_name: str):
+
+     api_base_url = os.environ.get("API_BASE_URL")
+     model_name = os.environ.get("MODEL_NAME")
+     hf_token = os.environ.get("HF_TOKEN")
+
+     if not all([api_base_url, model_name, hf_token]):
+         raise ValueError("Missing required environment variables: API_BASE_URL, MODEL_NAME, or HF_TOKEN")
+
+
+     client = OpenAI(
+         api_key=hf_token,
+         base_url=api_base_url
+     )
+
+     env = DocSweeperEnvironment(task=task_name)
+     obs = env.reset()
+     done = False
+     total_reward = 0.0
+     step_count = 0
+
+
+     print(f"[START] task={task_name} model={model_name}")
+
+     system_prompt = f"""
+ You are an elite, systematic documentation engineer. You interact with a virtual file system via JSON tool calls.
+
+ YOUR CURRENT TASK: '{task_name}'
+ - If 'version_bump': Systematically OPEN EVERY SINGLE FILE in the directory tree. Check for 'v1.0.0' or 'v1.00'. If found, use 'edit' to update to 'v2.0.0'.
+ - If 'config_migration': Open docker-compose files. Update version to 3.8 and migrate 'links' to 'networks'.
+ - If 'broken_links': Find broken relative links and edit them to point to correct paths.
+
+ WORKFLOW RULES:
+ 1. PLAN FIRST: Use the 'thought' field to track which files you have checked and which remain.
+ 2. OPEN THEN EDIT: You MUST 'open' a file before you can 'edit' it.
+ 3. EDIT SAFELY: When editing, use 'old_str' (exact text to replace) and 'new_str'. Do NOT use 'path'.
+ 4. FINISH: Call 'done' ONLY when you have opened and verified EVERY file in the directory tree.
+
+ OUTPUT SCHEMA:
+ You MUST output ONLY a single raw JSON object EXACTLY matching this structure:
+ {{
+ "thought": "<Mandatory step-by-step reasoning>",
+ "tool_name": "<MUST be one of: 'open', 'edit', 'grep', 'done'>",
+ "path": "<Optional. File path for 'open'>",
+ "old_str": "<Optional. Exact match string for 'edit'>",
+ "new_str": "<Optional. Replacement string for 'edit'>",
+ "search_query": "<Optional. Text to search for 'grep'>"
+ }}
+ """
+
+     messages = [{"role": "system", "content": system_prompt}]
+
+     start_time = time.time()
+     max_iterations = 30  # hard cap so repeated API or parsing errors cannot loop forever
+     while not done and step_count < max_iterations:
+         step_count += 1
+         current_state_prompt = f"""
+ [ENVIRONMENT OBSERVATION]
+ Active File: {obs.active_file or 'None'}
+ Terminal Feedback: {obs.terminal_feedback}
+ Directory Tree: {json.dumps(obs.directory_tree)}
+ File Content: {obs.file_content}
+ Linter Issues: {obs.issues_detected}
+ """
+         messages.append({"role": "user", "content": current_state_prompt})
+
+         try:
+             response = client.chat.completions.create(
+                 model=model_name,
+                 messages=messages,
+                 response_format={"type": "json_object"}
+             )
+
+             raw_reply = response.choices[0].message.content
+             messages.append({"role": "assistant", "content": raw_reply})
+
+             action_json = json.loads(raw_reply)
+             if isinstance(action_json, list):
+                 action_json = action_json[0] if len(action_json) > 0 else {"tool_name": "done"}
+
+             thought = str(action_json.pop("thought", "None"))
+
+             valid_fields = DocAction.model_fields.keys()
+             safe_kwargs = {k: v for k, v in action_json.items() if k in valid_fields}
+
+             action = DocAction(**safe_kwargs)
+             obs = env.step(action)
+             total_reward += obs.reward
+             done = obs.done
+
+
+             print(f"[STEP] step={step_count} action={action.tool_name} reward={obs.reward:.2f} done={done} thought=\"{thought[:100]}...\"")
+
+         except Exception as e:
+             obs.terminal_feedback = f"SYSTEM ERROR: {str(e)}. Review the schema rules."
+             print(f"[STEP] step={step_count} action=error reward=0.0 done={done} error=\"{str(e)}\"")
+
+     runtime = time.time() - start_time
+
+
+     final_score = max(0.0, min(1.0, total_reward))
+
+     print(f"[END] task={task_name} score={final_score:.2f} total_steps={step_count} runtime_seconds={runtime:.1f}")
+
+
+ if __name__ == "__main__":
+     tasks = ["version_bump", "config_migration", "broken_links"]
+     for task in tasks:
+         run_inference(task)
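The loop in `inference.py` turns each model reply into a `DocAction` in three steps:

1. It decodes the JSON.
2. If the model returned a list, it unwraps it.
3. It drops the free-form `thought` and any keys the schema does not define.

The dependency-free sketch below isolates that step. `sanitize_reply` is a hypothetical helper. `VALID_FIELDS` is hard-coded here as an assumption that mirrors the fields declared on `DocAction`; the script itself reads them from `DocAction.model_fields`.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: mirrors the fields declared on DocAction in models.py.
VALID_FIELDS = {"tool_name", "path", "old_str", "new_str", "search_query"}


def sanitize_reply(raw_reply: str) -> Tuple[str, Dict[str, Any]]:
    """Return (thought, kwargs), where kwargs is ready for DocAction(**kwargs)."""
    action_json = json.loads(raw_reply)
    # Some models wrap the object in a list: take the first item, or finish if empty.
    if isinstance(action_json, list):
        action_json = action_json[0] if action_json else {"tool_name": "done"}
    if not isinstance(action_json, dict):
        raise ValueError(f"Expected a JSON object, got {type(action_json).__name__}")
    # str() keeps later slicing (e.g. thought[:100]) safe if the model sends null or a number.
    thought = str(action_json.pop("thought", "None"))
    kwargs = {k: v for k, v in action_json.items() if k in VALID_FIELDS}
    return thought, kwargs
```

Filtering against a whitelist rather than passing the decoded dict straight through means a stray key from the model is silently ignored instead of raising a validation error inside the `try` block.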
main.py ADDED
@@ -0,0 +1,6 @@
+ def main():
+     print("Hello from cust-env!")
+
+
+ if __name__ == "__main__":
+     main()
models.py ADDED
@@ -0,0 +1,77 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Data models for Doc Sweeper Environment.
+
+ This module defines the Action, Observation, and State types for the documentation
+ maintenance tasks via the OpenEnv interface.
+ """
+
+ from __future__ import annotations
+
+ from typing import Dict, List, Optional
+
+ from openenv.core.env_server import Action, Observation, State
+ from pydantic import Field
+
+
+ class DocAction(Action):
+     """
+     Action for Doc Sweeper environment.
+
+     Attributes:
+         tool_name: The command to run ('open', 'edit', 'grep', 'done').
+         path: File path for 'open' (edits always apply to the active file).
+         old_str: Exact match string for safe replacement.
+         new_str: Replacement string.
+         search_query: String to search via grep.
+     """
+
+     tool_name: str = Field(..., description="'open', 'edit', 'grep', or 'done'")
+     path: Optional[str] = Field(None, description="File path for 'open'")
+     old_str: Optional[str] = Field(None, description="Exact match string for safe replacement")
+     new_str: Optional[str] = Field(None, description="Replacement string")
+     search_query: Optional[str] = Field(None, description="String to search via grep")
+
+
+ class DocObservation(Observation):
+     """
+     Observation for Doc Sweeper environment.
+
+     Attributes:
+         active_file: Currently opened file path.
+         file_content: Full text of the opened file.
+         directory_tree: Virtual File System tree representation.
+         issues_detected: Linter output for current file.
+         terminal_feedback: Result of last action.
+         done: Whether the task is complete.
+         reward: Reward for the last action.
+     """
+
+     active_file: str = ""
+     file_content: str = ""
+     directory_tree: Dict[str, List[str]] = Field(default_factory=dict)
+     issues_detected: List[str] = Field(default_factory=list)
+     terminal_feedback: str = "Environment initialized."
+     done: bool = False
+
+
+ class DocState(State):
+     """
+     State for Doc Sweeper environment.
+
+     Attributes:
+         episode_id: Unique ID for the current task session.
+         step_count: Number of actions taken.
+         vfs: The current state of the virtual file system (hidden from direct observation).
+         active_file: The file currently open in the editor.
+     """
+
+     episode_id: str = ""
+     step_count: int = 0
+     vfs: Dict[str, str] = Field(default_factory=dict)
+     active_file: str = ""
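`DocAction` accepts any string as `tool_name` and leaves every argument optional. A malformed call is therefore only caught when the environment executes it, at the cost of a step. The stdlib-only sketch below shows stricter up-front checking. `CheckedAction` is a hypothetical class. The per-tool requirements are inferred from how `DocSweeperEnvironment.step` reads each field; they are not a documented contract.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Inferred from DocSweeperEnvironment.step: the fields each tool actually reads.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "open": ("path",),
    "edit": ("old_str",),  # new_str may be omitted: the environment substitutes ""
    "grep": ("search_query",),
    "done": (),
}


@dataclass(frozen=True)
class CheckedAction:
    """Hypothetical stricter variant of DocAction that fails fast on bad calls."""

    tool_name: str
    path: Optional[str] = None
    old_str: Optional[str] = None
    new_str: Optional[str] = None
    search_query: Optional[str] = None

    def __post_init__(self) -> None:
        if self.tool_name not in REQUIRED_FIELDS:
            allowed = ", ".join(sorted(REQUIRED_FIELDS))
            raise ValueError(f"Unknown tool {self.tool_name!r}; expected one of: {allowed}")
        # Empty strings count as missing, like the environment's truthiness checks.
        missing = [name for name in REQUIRED_FIELDS[self.tool_name] if not getattr(self, name)]
        if missing:
            raise ValueError(f"{self.tool_name!r} requires: {', '.join(missing)}")
```

In pydantic, which `models.py` already uses, the closest equivalent is annotating `tool_name` as `Literal["open", "edit", "grep", "done"]`. That rejects unknown tools at validation time, but the per-tool field checks would still need a validator.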
openenv.yaml ADDED
@@ -0,0 +1,7 @@
+ spec_version: 1
+ name: cust_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
pyproject.toml ADDED
@@ -0,0 +1,15 @@
+ [project]
+ name = "cust-env"
+ version = "0.1.0"
+ description = "A virtual file system and text-editing environment for OpenEnv"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "fastapi>=0.135.2",
+     "groq>=1.1.2",
+     "openai>=2.29.0",
+     "openenv-core>=0.2.3",
+     "openenv[core]>=0.1.13",
+     "pydantic>=2.12.5",
+     "uvicorn>=0.42.0",
+ ]
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ openenv-core
+ fastapi
+ uvicorn
+ pydantic
+ openai
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cust Env environment server components."""
+
+ from .cust_env_environment import DocSweeperEnvironment
+
+ __all__ = ["DocSweeperEnvironment"]
server/app.py ADDED
@@ -0,0 +1,21 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """FastAPI application for the Doc Sweeper Environment."""
+
+ from openenv.core.env_server import create_app
+
+ from models import DocAction, DocObservation
+ from .cust_env_environment import DocSweeperEnvironment
+
+ # Create the FastAPI app
+ # Pass the class (factory) instead of an instance for WebSocket session support
+ app = create_app(DocSweeperEnvironment, DocAction, DocObservation, env_name="doc_sweeper")
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(app, host="0.0.0.0", port=8000)
server/cust_env_environment.py ADDED
@@ -0,0 +1,193 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Doc Sweeper Environment for OpenEnv.
+
+ This module provides an environment where an agent navigates a virtual file system
+ to fix outdated documentation, broken links, and deprecated configurations.
+ """
+
+ import uuid
+ from typing import Dict, List
+
+ from openenv.core.env_server import Environment
+
+ from models import DocAction, DocObservation, DocState
+
+
+ class DocSweeperEnvironment(Environment):
+     """
+     Doc Sweeper environment implementing the OpenEnv interface.
+
+     Simulates a Virtual File System (VFS) of markdown and config files. The agent
+     must find and replace deprecated patterns without destructive behavior.
+     """
+
+     def __init__(
+         self,
+         task: str = "version_bump",
+         max_steps: int = 30,
+     ):
+         """
+         Initialize the Doc Sweeper environment.
+
+         Args:
+             task: Task to run - "version_bump", "config_migration", or "broken_links".
+             max_steps: Maximum allowed actions before forced termination.
+         """
+         super().__init__(rubric=None)
+         self._task = task
+         self._max_steps = max_steps
+         self._state: DocState | None = None
+         self._terminal_feedback = ""
+         self.reset()
+
+     def reset(self, **kwargs):
+         """
+         Initialize a new task episode.
+
+         Returns:
+             Initial observation of the virtual file system.
+         """
+         episode_id = str(uuid.uuid4())
+         self._terminal_feedback = "Environment reset."
+
+         initial_vfs = {}
+         if self._task == "version_bump":
+             initial_vfs = {
+                 "/docs/setup.md": "Welcome to our tool v1.0.0. To install v1.0.0, run the script.",
+                 "/docs/api.md": "API Reference for v1.0.0.",
+                 "/docs/troubleshoot.md": "If v1.00 fails, check logs."
+             }
+         elif self._task == "config_migration":
+             initial_vfs = {
+                 "/docs/docker-compose.yml": "version: '2'\nservices:\n  web:\n    links:\n      - db",
+                 "/docs/readme.md": "Use the docker-compose to start."
+             }
+         else:
+             initial_vfs = {
+                 "/docs/index.md": "Please read [Setup](setup.md) before continuing.",
+                 "/docs/installation.md": "# Installation\nSteps go here.",
+                 "/docs/advanced.md": "Advanced config in [Setup](setup.md)."
+             }
+
+         self._state = DocState(
+             episode_id=episode_id,
+             step_count=0,
+             vfs=initial_vfs,
+             active_file=""
+         )
+
+         return self._make_observation(reward=0.0, done=False)
+
+     def step(self, action: DocAction):
+         """
+         Execute an action and return the resulting state.
+
+         Args:
+             action: The tool action to execute (open, edit, grep, done).
+
+         Returns:
+             Observation with reward and done flag.
+         """
+         if self._state is None:
+             raise RuntimeError("Environment not initialized. Call reset() first.")
+
+         self._state.step_count += 1
+         reward = 0.0
+         done = False
+         self._terminal_feedback = ""
+
+         # Action Routing
+         if action.tool_name == "done":
+             done = True
+             reward += self._evaluate_final_grade()
+             self._terminal_feedback = "Task submitted for final grading."
+
+         elif action.tool_name == "open":
+             if action.path in self._state.vfs:
+                 self._state.active_file = action.path
+                 self._terminal_feedback = f"Opened {action.path}"
+             else:
+                 self._terminal_feedback = f"Error: File {action.path} not found."
+                 reward -= 0.1
+
+         elif action.tool_name == "grep":
+             if action.search_query:
+                 results = [p for p, c in self._state.vfs.items() if action.search_query in c]
+                 self._terminal_feedback = f"Found '{action.search_query}' in: {', '.join(results) or 'No files'}"
+                 if self._task == "broken_links":
+                     reward += 0.1
+             else:
+                 self._terminal_feedback = "Error: search_query required for grep."
+
+         elif action.tool_name == "edit":
+             reward += self._handle_edit(action)
+
+         else:
+             self._terminal_feedback = f"Error: Unknown tool {action.tool_name}."
+             reward -= 0.1
+
+         # Check timeout
+         if self._state.step_count >= self._max_steps:
+             done = True
+             self._terminal_feedback = "Max steps reached."
+
+         return self._make_observation(reward=reward, done=done)
+
+     def _handle_edit(self, action: DocAction) -> float:
+         if not self._state.active_file:
+             self._terminal_feedback = "Error: No file is currently open."
+             return -0.1
+
+         content = self._state.vfs[self._state.active_file]
+
+         if action.old_str in ["```yaml", "# Title"] and not action.new_str:
+             self._terminal_feedback = "Error: Destructive action prevented."
+             return -1.0
+
+         if action.old_str and action.old_str in content:
+             self._state.vfs[self._state.active_file] = content.replace(action.old_str, action.new_str or "")
+             self._terminal_feedback = "Edit successful."
+             return 0.1
+         else:
+             self._terminal_feedback = f"Error: old_str '{action.old_str}' not found in file."
+             return -0.1
+
+     def _evaluate_final_grade(self) -> float:
+         # Simplified deterministic grader for example purposes
+         text = "".join(self._state.vfs.values())
+         if self._task == "version_bump":
+             target_count = text.count("v2.0.0")
+             penalty = text.count("v1.0.0") + text.count("v1.00")
+             return max(0.0, (target_count / 4.0) - (penalty * 0.5))
+         return 0.5
+
+     def _get_linter_issues(self) -> List[str]:
+         if not self._state.active_file:
+             return []
+         issues = []
+         content = self._state.vfs.get(self._state.active_file, "")
+         if self._task == "version_bump" and "v1.0.0" in content:
+             issues.append("Deprecated version 'v1.0.0' found.")
+         return issues
+
+     def _make_observation(self, reward: float = 0.0, done: bool = False):
+         return DocObservation(
+             active_file=self._state.active_file,
+             file_content=self._state.vfs.get(self._state.active_file, ""),
+             directory_tree={"/docs": list(self._state.vfs.keys())},
+             issues_detected=self._get_linter_issues(),
+             terminal_feedback=self._terminal_feedback,
+             reward=reward,
+             done=done,
+         )
+
+     @property
+     def state(self):
+         """Return the current episode state."""
+         return self._state
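Replaying the two rules outside OpenEnv shows how the `version_bump` grade evolves as an agent works. The sketch below is a stand-alone re-implementation based on reading `_handle_edit` and `_evaluate_final_grade` above, not code from the repository. `INITIAL_VFS` copies the task's starting files. `apply_edit` and `grade_version_bump` are hypothetical names. The real environment also tracks the active file, per-step rewards, the protected-string check, and the step limit.

```python
from typing import Dict

# Copy of the version_bump starting files from reset().
INITIAL_VFS: Dict[str, str] = {
    "/docs/setup.md": "Welcome to our tool v1.0.0. To install v1.0.0, run the script.",
    "/docs/api.md": "API Reference for v1.0.0.",
    "/docs/troubleshoot.md": "If v1.00 fails, check logs.",
}


def apply_edit(vfs: Dict[str, str], path: str, old_str: str, new_str: str) -> bool:
    """Mimic _handle_edit: str.replace rewrites ALL occurrences in one file."""
    content = vfs[path]
    if not old_str or old_str not in content:
        return False  # the environment answers this case with a -0.1 penalty
    vfs[path] = content.replace(old_str, new_str)
    return True


def grade_version_bump(vfs: Dict[str, str]) -> float:
    """Mimic _evaluate_final_grade: v2.0.0 count / 4, minus 0.5 per leftover, floored at 0."""
    text = "".join(vfs.values())
    target_count = text.count("v2.0.0")
    penalty = text.count("v1.0.0") + text.count("v1.00")
    return max(0.0, target_count / 4.0 - penalty * 0.5)
```

Fixing `setup.md` alone still grades `0.0`, because the remaining `v1.x` strings cost more than the new `v2.0.0` strings earn. After `api.md` the grade is `0.25`, and only the final `troubleshoot.md` edit reaches `1.0`. `_evaluate_final_grade` applies no upper clamp, so inserting extra `v2.0.0` strings could push the grade past `1.0`. `inference.py` clamps the summed reward to `[0, 1]` when it reports the score.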
uv.lock ADDED
The diff for this file is too large to render.