---
title: Doc Sweeper Environment
emoji: 🧹
colorFrom: 'blue'
colorTo: 'green'
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
---


# Doc Sweeper Environment

A virtual file system and text-editing environment for OpenEnv. It casts autonomous LLM agents as documentation engineers who must navigate a directory tree, read files, and apply exact string edits to complete multi-step refactoring tasks.

## Overview

The Doc Sweeper environment provides a sandboxed, in-memory file system where agents can interact with dummy codebases and documentation. It evaluates an agent's ability to retain context, plan multi-step operations, and use tools correctly.

### Features

* **Virtual File System**: In-memory directory tree with nested files.
* **Strict Tooling**: Requires agents to explicitly `open` files before applying `edit` commands.
* **Granular Feedback**: Provides immediate terminal feedback and linter issues upon illegal actions or formatting errors.
* **Three Distinct Scenarios**: Evaluates different logic flows (global search/replace, YAML refactoring, path resolution).
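The "Strict Tooling" rule can be illustrated with a minimal in-memory file system that refuses `edit` until a file has been opened. This is a sketch under stated assumptions: the nested-dict layout (directories as dicts, files as strings), the `VirtualFS` class, and its error messages are hypothetical and do not describe the environment's actual internals.

```python
import json
from typing import Optional, Union

# Assumed layout: directories are dicts, files are strings holding their content.
Tree = dict


class VirtualFS:
    """Hypothetical in-memory file system that enforces open-before-edit."""

    def __init__(self, tree: Tree) -> None:
        self.tree = tree
        self.active_file: Optional[str] = None

    def _lookup(self, path: str) -> Union[Tree, str, None]:
        # Walk the nested dicts one path segment at a time.
        node: Union[Tree, str] = self.tree
        for part in path.strip("/").split("/"):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node

    def _write(self, path: str, content: str) -> None:
        *dirs, name = path.strip("/").split("/")
        node = self.tree
        for part in dirs:
            node = node[part]
        node[name] = content

    def open(self, path: str) -> str:
        if not isinstance(self._lookup(path), str):
            return f"Error: {path} is not a file"
        self.active_file = path
        return f"Opened {path}"

    def edit(self, old_str: str, new_str: str) -> str:
        if self.active_file is None:
            return "Error: no file is open"
        content = self._lookup(self.active_file)
        if old_str not in content:
            return "Error: old_str not found"
        # Assumption: only the first exact match is replaced.
        self._write(self.active_file, content.replace(old_str, new_str, 1))
        return f"Edited {self.active_file}"

    def directory_tree(self) -> str:
        # Serialize the hierarchy only; file contents are replaced with null.
        def strip(node: Tree) -> dict:
            return {k: strip(v) if isinstance(v, dict) else None for k, v in node.items()}

        return json.dumps(strip(self.tree))
```

Returning error strings rather than raising mirrors the idea of terminal feedback: the agent sees what went wrong and can recover on the next step.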

### Task Rules

The environment supports three primary tasks:

* `version_bump`: The agent must find all outdated version numbers (e.g., `v1.0.0` or `v1.00`) across all files and update them to `v2.0.0`.
* `config_migration`: The agent must open docker-compose files, update the version to `3.8`, and migrate `links` keys to `networks`.
* `broken_links`: The agent must find broken relative markdown links and edit them to point to correct file paths.
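The `version_bump` check could be approximated with a regular expression. The `OUTDATED` pattern and the `find_outdated` / `bump_versions` helpers are hypothetical; they cover only the two example forms listed above (`v1.0.0`, `v1.00`), not necessarily every variant the grader recognizes.

```python
from __future__ import annotations

import re

# Assumed pattern: "v1." followed by digits, with an optional extra ".digits" part.
OUTDATED = re.compile(r"\bv1\.\d+(?:\.\d+)?\b")


def find_outdated(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the outdated version strings it still contains."""
    hits = {path: OUTDATED.findall(text) for path, text in files.items()}
    return {path: found for path, found in hits.items() if found}


def bump_versions(text: str, target: str = "v2.0.0") -> str:
    """Rewrite every outdated version string in `text` to `target`."""
    return OUTDATED.sub(target, text)
```

An empty result from `find_outdated` is a reasonable precondition for calling `done`, since calling it early costs `-1.0`.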

---

## Quick Start

### Running the Baseline Inference (Recommended)

The easiest way to test the environment is to run the provided Chain-of-Thought agent script.

```bash
# Export your required credentials
export HF_TOKEN="your_api_key_here"
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"

# Run the inference script across all tasks
python inference.py
```
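A missing variable is easier to diagnose before a run starts than midway through it. The following is a minimal sketch of validating the three exported variables in Python; `InferenceConfig` and `load_config` are hypothetical names, and this is not necessarily how `inference.py` reads its configuration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("HF_TOKEN", "API_BASE_URL", "MODEL_NAME")


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the three variables exported above."""
    api_key: str
    base_url: str
    model: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Fail fast with a clear message if any required variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return InferenceConfig(
        api_key=env["HF_TOKEN"],
        base_url=env["API_BASE_URL"],
        model=env["MODEL_NAME"],
    )

# With the `openai` package installed, the values could feed a client, e.g.:
#   from openai import OpenAI
#   cfg = load_config()
#   client = OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)
```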

## Using a Local Server

You can host the environment locally to test the API endpoints manually.

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

## Actions

The action space is defined by the `DocAction` schema. The agent must provide a single JSON object with a `tool_name` and the corresponding required fields:

* **`open`**: Opens a file. Requires the `path` parameter.
* **`edit`**: Replaces text in the currently active file. Requires exact string matching via `old_str` and `new_str`.
* **`grep`**: Searches the active file (or directory). Requires `search_query`.
* **`done`**: Signals that the task is complete.
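The required fields listed above can be checked before an action is sent. This sketch uses a stdlib dataclass, `DocActionSketch` (hypothetical), as a stand-in for the pydantic `DocAction` schema; the optional-field layout and error messages are assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

# Required fields per tool, taken from the action list above.
REQUIRED = {
    "open": ("path",),
    "edit": ("old_str", "new_str"),
    "grep": ("search_query",),
    "done": (),
}


@dataclass
class DocActionSketch:
    """Hypothetical stdlib mirror of `DocAction`; real field types may differ."""
    tool_name: str
    path: Optional[str] = None
    old_str: Optional[str] = None
    new_str: Optional[str] = None
    search_query: Optional[str] = None

    def __post_init__(self) -> None:
        if self.tool_name not in REQUIRED:
            raise ValueError(f"unknown tool_name: {self.tool_name!r}")
        missing = [f for f in REQUIRED[self.tool_name] if getattr(self, f) is None]
        if missing:
            raise ValueError(f"{self.tool_name} requires: {', '.join(missing)}")


def parse_action(raw: str) -> DocActionSketch:
    """Parse the agent's JSON output; unknown keys raise TypeError."""
    return DocActionSketch(**json.loads(raw))
```

Because the check is `is None` rather than a truthiness test, an empty `new_str` is accepted, so an `edit` can delete text.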

## Observations

Each observation (`DocObservation`) returned by the environment includes:

* **`active_file`**: The file currently opened by the agent.
* **`terminal_feedback`**: Error messages, success logs, or system alerts resulting from the last action.
* **`directory_tree`**: A JSON representation of the current file system hierarchy.
* **`file_content`**: The textual content of the currently active file.
* **`issues_detected`**: A list of simulated linter errors (if the agent breaks a file's formatting).
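An agent loop typically flattens each observation into prompt text. Below is a minimal sketch with a hypothetical `DocObservationSketch` dataclass mirroring the five fields above; treating `directory_tree` as a dict (rather than a pre-serialized JSON string) and `issues_detected` as a list of strings are assumptions.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DocObservationSketch:
    """Hypothetical stdlib mirror of `DocObservation`; real field types may differ."""
    active_file: Optional[str] = None
    terminal_feedback: str = ""
    directory_tree: dict = field(default_factory=dict)
    file_content: str = ""
    issues_detected: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def render_for_prompt(obs: DocObservationSketch) -> str:
    """Flatten an observation into a plain-text block for an LLM prompt."""
    lines = [
        f"Active file: {obs.active_file or '(none)'}",
        f"Terminal: {obs.terminal_feedback}",
        "Tree: " + json.dumps(obs.directory_tree),
    ]
    if obs.file_content:
        lines.append("Content:\n" + obs.file_content)
    # Surface linter issues only when present, so a clean edit adds no noise.
    if obs.issues_detected:
        lines.append("Issues: " + "; ".join(obs.issues_detected))
    return "\n".join(lines)
```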

## Configuration

### Reward Structure

The environment issues rewards based on the agent's efficiency and accuracy:

* **Valid Tool Usage**: `0.0` (Neutral, but advances the state).
* **Tool Misuse Penalty**: `-0.1` (e.g., trying to edit without opening a file, or providing a bad file path).
* **Task Completion**: `1.0` (Awarded only when `done` is called and all objective checks pass).
* **Early/Failed Completion**: `-1.0` (Calling `done` before fixing all required strings).
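The reward table maps onto a small lookup function. `Outcome` and `step_reward` are hypothetical names, and ending the episode on every `done` call (whether or not the checks pass) is an assumption; the sample log only shows `done=True` after a successful `done`.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    VALID = "valid"          # tool call succeeded
    MISUSE = "misuse"        # e.g. edit with no open file, or a bad path
    DONE_PASS = "done_pass"  # `done` called and all objective checks pass
    DONE_FAIL = "done_fail"  # `done` called before all fixes were made


REWARDS = {
    Outcome.VALID: 0.0,
    Outcome.MISUSE: -0.1,
    Outcome.DONE_PASS: 1.0,
    Outcome.DONE_FAIL: -1.0,
}


def step_reward(tool_name: str, succeeded: bool, checks_pass: bool) -> tuple[float, bool]:
    """Return (reward, episode_done) for one step, following the table above."""
    if tool_name == "done":
        outcome = Outcome.DONE_PASS if checks_pass else Outcome.DONE_FAIL
        return REWARDS[outcome], True  # assumption: any `done` ends the episode
    outcome = Outcome.VALID if succeeded else Outcome.MISUSE
    return REWARDS[outcome], False
```

Since valid steps earn `0.0`, efficiency is rewarded only indirectly: each misuse subtracts `0.1` from the final return.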

## Building and Deployment

### Build Docker Image

From the repository root:

```bash
# Build the environment image
docker build -t doc-sweeper-env:latest .
```

The Dockerfile uses `pip install` with `requirements.txt` for maximum compatibility with Hugging Face Spaces.

```bash
# Run the container locally
docker run -p 8000:8000 doc-sweeper-env:latest
```

The FastAPI OpenEnv endpoints will be available at `http://localhost:8000/reset` and `http://localhost:8000/step`.
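For manual testing without an agent, the endpoints can be called with Python's standard library. The JSON body shapes below (an empty object for `/reset`, an `{"action": {...}}` wrapper for `/step`) and the `post` / `build_request` helpers are assumptions, not a documented contract; FastAPI normally serves interactive schema docs at `http://localhost:8000/docs`, which is the place to confirm them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the `docker run -p 8000:8000` mapping


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a hypothetical endpoint payload."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(endpoint: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the server to be running; payload shapes are assumed):
#   obs = post("reset", {})
#   obs = post("step", {"action": {"tool_name": "open", "path": "setup.md"}})
```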

---

## Dependencies

The Doc Sweeper environment requires:

* **`fastapi` & `uvicorn`**: For serving the OpenEnv endpoints.
* **`pydantic`**: For strict action and observation schema validation.
* **`openai` / `groq`**: For the baseline LLM inference script.

These are automatically installed when using Docker or installing via `pip install -r requirements.txt`.

---

## Example Evaluation Log Output

When running `inference.py`, the agent emits strictly formatted logs for the automated graders:

```text
[START] task=version_bump model=gpt-4o-mini
[STEP] step=1 action=open reward=0.00 done=False thought="Opening setup.md to check for versions."
[STEP] step=2 action=edit reward=0.00 done=False thought="Replacing v1.0.0 with v2.0.0."
[STEP] step=3 action=done reward=1.00 done=True thought="All files have been checked."
[END] task=version_bump score=1.00 total_steps=3 runtime_seconds=4.2
```