---
title: Doc Sweeper Environment
emoji: 🧹
colorFrom: 'blue'
colorTo: 'green'
sdk: docker
pinned: false
app_port: 8000
tags:
- openenv
---
# Doc Sweeper Environment
A virtual file system and text-editing environment for OpenEnv. This environment tasks autonomous LLM agents with acting as automated documentation engineers, requiring them to navigate a directory tree, read files, and apply precise string manipulations to complete complex refactoring tasks.
## Overview
The Doc Sweeper environment provides a sandboxed, in-memory file system where agents can interact with dummy codebases and documentation. It evaluates an agent's ability to retain context, plan multi-step operations, and use tools correctly.
### Features
* **Virtual File System**: In-memory directory tree with nested files.
* **Strict Tooling**: Requires agents to explicitly `open` files before applying `edit` commands.
* **Granular Feedback**: Provides immediate terminal feedback and linter issues upon illegal actions or formatting errors.
* **Three Distinct Scenarios**: Evaluates different logic flows (global search/replace, YAML refactoring, path resolution).
### Task Rules
The environment supports three primary tasks:
* `version_bump`: The agent must find all outdated version numbers (e.g., `v1.0.0` or `v1.00`) across all files and update them to `v2.0.0`.
* `config_migration`: The agent must open docker-compose files, update the version to `3.8`, and migrate `links` keys to `networks`.
* `broken_links`: The agent must find broken relative markdown links and edit them to point to correct file paths.
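The `version_bump` objective can be illustrated with a minimal sketch; the regex below is an assumption based on the examples given (`v1.0.0`, `v1.00`), not the environment's actual check.

```python
import re

# Hypothetical sketch of the version_bump objective; the environment's
# real patterns and checks may differ.
OUTDATED = re.compile(r"v1\.0\.0|v1\.00")

def bump_versions(text: str) -> str:
    """Replace every outdated version string with v2.0.0."""
    return OUTDATED.sub("v2.0.0", text)

def task_complete(files: dict[str, str]) -> bool:
    """True once no file still contains an outdated version."""
    return not any(OUTDATED.search(content) for content in files.values())
```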
---
## Quick Start
### Running the Baseline Inference (Recommended)
The easiest way to test the environment is using the provided Chain-of-Thought agent script.
```bash
# Export your required credentials
export HF_TOKEN="your_api_key_here"
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"

# Run the inference script across all tasks
python inference.py
```
### Using a Local Server
You can host the environment locally to manually test the API endpoints.
```bash
# Install dependencies
pip install -r requirements.txt

# Run the server
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
## Actions
The action space is defined by the `DocAction` schema. The agent must provide a single JSON object with a `tool_name` and the corresponding required fields:
* **`open`**: Opens a file. Requires the `path` parameter.
* **`edit`**: Replaces text in the currently active file. Requires exact string matching via `old_str` and `new_str`.
* **`grep`**: Searches the active file (or directory). Requires `search_query`.
* **`done`**: Signals that the task is complete.
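For reference, the four tools above might be serialized like this. This is an illustrative sketch built only from the fields listed here; the authoritative `DocAction` schema may include additional fields.

```python
import json

# Illustrative action payloads, one per tool. Field names are taken
# from the list above; anything beyond them is an assumption.
actions = [
    {"tool_name": "open", "path": "docs/setup.md"},
    {"tool_name": "grep", "search_query": "v1.0.0"},
    {"tool_name": "edit", "old_str": "v1.0.0", "new_str": "v2.0.0"},
    {"tool_name": "done"},
]

for action in actions:
    print(json.dumps(action))
```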
## Observations
Each observation (`DocObservation`) returned by the environment includes:
* **`active_file`**: The file currently opened by the agent.
* **`terminal_feedback`**: Error messages, success logs, or system alerts resulting from the last action.
* **`directory_tree`**: A JSON representation of the current file system hierarchy.
* **`file_content`**: The textual content of the currently active file.
* **`issues_detected`**: A list of simulated linter errors (if the agent breaks a file's formatting).
## Configuration
### Reward Structure
The environment issues rewards based on the agent's efficiency and accuracy:
* **Valid Tool Usage**: `0.0` (Neutral, but advances the state).
* **Tool Misuse Penalty**: `-0.1` (e.g., trying to edit without opening a file, or providing a bad file path).
* **Task Completion**: `1.0` (Awarded only when `done` is called and all objective checks pass).
* **Early/Failed Completion**: `-1.0` (Calling `done` before fixing all required strings).
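The reward table above reduces to a small decision function. This is a minimal sketch of that logic, not the environment's actual implementation:

```python
# Minimal sketch of the reward structure described above; the
# environment's actual implementation may differ.
def reward(action_valid: bool, is_done: bool, objectives_pass: bool) -> float:
    if not action_valid:
        return -0.1  # tool misuse penalty (e.g. edit without open)
    if is_done:
        # done: full credit only if every objective check passes
        return 1.0 if objectives_pass else -1.0
    return 0.0  # valid tool use: neutral, but advances the state
```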
## Building and Deployment
### Build Docker Image
From the repository root:
```bash
# Build the environment image
docker build -t doc-sweeper-env:latest .
```
The Dockerfile uses `pip install` with `requirements.txt` for maximum compatibility with Hugging Face Spaces.
```bash
# Run the container locally
docker run -p 8000:8000 doc-sweeper-env:latest
```
The FastAPI OpenEnv endpoints will be available at `http://localhost:8000/reset` and `http://localhost:8000/step`.
---
## Dependencies
The Doc Sweeper environment requires:
* **`fastapi` & `uvicorn`**: For serving the OpenEnv endpoints.
* **`pydantic`**: For strict action and observation schema validation.
* **`openai` / `groq`**: For the baseline LLM inference script.
These are automatically installed when using Docker or installing via `pip install -r requirements.txt`.
---
## Example Evaluation Log Output
When running `inference.py`, the agent emits strictly formatted logs for the automated graders:
```text
[START] task=version_bump model=gpt-4o-mini
[STEP] step=1 action=open reward=0.00 done=False thought="Opening setup.md to check for versions."
[STEP] step=2 action=edit reward=0.00 done=False thought="Replacing v1.0.0 with v2.0.0."
[STEP] step=3 action=done reward=1.00 done=True thought="All files have been checked."
[END] task=version_bump score=1.00 total_steps=3 runtime_seconds=4.2
```