---
title: Doc Sweeper Environment
emoji: 🧹
colorFrom: 'blue'
colorTo: 'green'
sdk: docker
pinned: false
app_port: 8000
tags:
- openenv
---
# Doc Sweeper Environment
A virtual file system and text-editing environment for OpenEnv. This environment tasks autonomous LLM agents with acting as automated documentation engineers, requiring them to navigate a directory tree, read files, and apply precise string manipulations to complete complex refactoring tasks.
## Overview
The Doc Sweeper environment provides a sandboxed, in-memory file system where agents can interact with dummy codebases and documentation. It evaluates an agent's ability to retain context, plan multi-step operations, and use tools correctly.
### Features
* **Virtual File System**: In-memory directory tree with nested files.
* **Strict Tooling**: Requires agents to explicitly `open` files before applying `edit` commands.
* **Granular Feedback**: Provides immediate terminal feedback and linter issues upon illegal actions or formatting errors.
* **Three Distinct Scenarios**: Evaluates different logic flows (global search/replace, YAML refactoring, path resolution).
### Task Rules
The environment supports three primary tasks:
* `version_bump`: The agent must find all outdated version numbers (e.g., `v1.0.0` or `v1.00`) across all files and update them to `v2.0.0`.
* `config_migration`: The agent must open docker-compose files, update the version to `3.8`, and migrate `links` keys to `networks`.
* `broken_links`: The agent must find broken relative markdown links and edit them to point to correct file paths.
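As an illustration of the kind of string manipulation `version_bump` requires, here is a minimal sketch of the substitution in Python. The exact patterns the environment's checker uses are internal; the regex below is an assumption based on the examples above:

```python
import re

# Hypothetical pattern covering the documented forms v1.0.0 and v1.00.
OUTDATED_VERSION = re.compile(r"\bv1\.0{1,2}(?:\.0)?\b")

def bump_versions(text: str) -> str:
    """Replace every outdated version string with v2.0.0."""
    return OUTDATED_VERSION.sub("v2.0.0", text)

print(bump_versions("Install v1.0.0 or v1.00 of the tool."))
# → Install v2.0.0 or v2.0.0 of the tool.
```

Note that the agent itself never runs regexes; it must achieve the same effect through exact-match `edit` actions, one replacement at a time.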
---
## Quick Start
### Running the Baseline Inference (Recommended)
The easiest way to test the environment is using the provided Chain-of-Thought agent script.
```bash
# Export your required credentials
export HF_TOKEN="your_api_key_here"
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"

# Run the inference script across all tasks
python inference.py
```
### Using a Local Server
You can host the environment locally to manually test the API endpoints.
```bash
# Install dependencies
pip install -r requirements.txt

# Run the server
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
## Actions
The action space is defined by the `DocAction` schema. The agent must provide a single JSON object with a `tool_name` and the corresponding required fields:
* **`open`**: Opens a file. Requires the `path` parameter.
* **`edit`**: Replaces text in the currently active file. Requires exact string matching via `old_str` and `new_str`.
* **`grep`**: Searches the active file (or directory). Requires `search_query`.
* **`done`**: Signals that the task is complete.
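A minimal sketch of building such a payload in Python, using the field names listed above (`make_action` and the `REQUIRED` table are illustrative helpers, not part of the environment; any wrapper object the server expects around the action is not shown):

```python
import json

# Required fields per tool, as documented above.
REQUIRED = {
    "open": ["path"],
    "edit": ["old_str", "new_str"],
    "grep": ["search_query"],
    "done": [],
}

def make_action(tool_name: str, **fields) -> str:
    """Build a DocAction-shaped JSON payload, checking required fields."""
    missing = [f for f in REQUIRED[tool_name] if f not in fields]
    if missing:
        raise ValueError(f"{tool_name} requires {missing}")
    return json.dumps({"tool_name": tool_name, **fields})

print(make_action("edit", old_str="v1.0.0", new_str="v2.0.0"))
# → {"tool_name": "edit", "old_str": "v1.0.0", "new_str": "v2.0.0"}
```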
## Observations
Each observation (`DocObservation`) returned by the environment includes:
* **`active_file`**: The file currently opened by the agent.
* **`terminal_feedback`**: Error messages, success logs, or system alerts resulting from the last action.
* **`directory_tree`**: A JSON representation of the current file system hierarchy.
* **`file_content`**: The textual content of the currently active file.
* **`issues_detected`**: A list of simulated linter errors (if the agent breaks a file's formatting).
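A client typically branches on these fields after every step. A minimal sketch of that control flow (the observation below is a hand-written example, not actual server output, and `next_focus` is a hypothetical helper):

```python
# Hand-written example of a DocObservation-shaped payload.
obs = {
    "active_file": "docs/setup.md",
    "terminal_feedback": "Edit applied successfully.",
    "directory_tree": {"docs": ["setup.md", "usage.md"]},
    "file_content": "# Setup\nInstall v2.0.0.",
    "issues_detected": [],
}

def next_focus(obs: dict) -> str:
    """Decide what to attend to next based on the last observation."""
    if obs["issues_detected"]:
        return "fix_linter_issues"   # formatting broke; repair before continuing
    if "Error" in obs["terminal_feedback"]:
        return "retry_last_action"
    return "continue_task"

print(next_focus(obs))  # → continue_task
```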
## Configuration
### Reward Structure
The environment issues rewards based on the agent's efficiency and accuracy:
* **Valid Tool Usage**: `0.0` (Neutral, but advances the state).
* **Tool Misuse Penalty**: `-0.1` (e.g., trying to edit without opening a file, or providing a bad file path).
* **Task Completion**: `1.0` (Awarded only when `done` is called and all objective checks pass).
* **Early/Failed Completion**: `-1.0` (Calling `done` before fixing all required strings).
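For reference, the schedule above can be expressed as a simple lookup. This is a sketch for scoring a logged episode; the outcome labels are illustrative and the environment's internal scoring may combine rewards differently:

```python
# Reward values as documented above; outcome labels are illustrative.
REWARDS = {
    "valid_tool_use": 0.0,
    "tool_misuse": -0.1,
    "task_complete": 1.0,
    "failed_done": -1.0,
}

def score_episode(outcomes: list[str]) -> float:
    """Sum per-step rewards for a sequence of outcome labels."""
    return sum(REWARDS[o] for o in outcomes)

print(score_episode(["valid_tool_use", "tool_misuse", "task_complete"]))
# → 0.9
```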
## Building and Deployment
### Build Docker Image
From the repository root:
```bash
# Build the environment image
docker build -t doc-sweeper-env:latest .
```
The Dockerfile uses `pip install` with `requirements.txt` for maximum compatibility with Hugging Face Spaces.
```bash
# Run the container locally
docker run -p 8000:8000 doc-sweeper-env:latest
```
The FastAPI OpenEnv endpoints will be available at `http://localhost:8000/reset` and `http://localhost:8000/step`.
---
## Dependencies
The Doc Sweeper environment requires:
* **`fastapi` & `uvicorn`**: For serving the OpenEnv endpoints.
* **`pydantic`**: For strict action and observation schema validation.
* **`openai` / `groq`**: For the baseline LLM inference script.
These are automatically installed when using Docker or installing via `pip install -r requirements.txt`.
---
## Example Evaluation Log Output
When running `inference.py`, the agent emits strictly formatted logs for the automated graders:
```text
[START] task=version_bump model=gpt-4o-mini
[STEP] step=1 action=open reward=0.00 done=False thought="Opening setup.md to check for versions."
[STEP] step=2 action=edit reward=0.00 done=False thought="Replacing v1.0.0 with v2.0.0."
[STEP] step=3 action=done reward=1.00 done=True thought="All files have been checked."
[END] task=version_bump score=1.00 total_steps=3 runtime_seconds=4.2
```