---
title: Agentops Gym Environment Server
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Agentops Gym: Optimizing Tool-Use Efficiency
**"LLMs burn tokens via inefficient tool usage."**
Agentops Gym is a stateful, partially observable RL environment that penalizes inefficiency, designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes **efficiency**: it penalizes redundant calls, reward hacking, and "hallucinated" file reads, helping you build agents that solve problems with minimal token consumption.
## Quick Start
The simplest way to use the Agentops Gym environment is through the `AgentopsGymEnv` class:
```python
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall

# Create the environment from the Docker image (tag used in the build step below).
# Created before `try` so `finally` never references an unbound name.
agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops-gym:latest")
try:
    # Reset to start a task
    result = agentops_gymenv.reset(task_id="task_1")
    print(f"Task: {result.observation.task_description}")

    # Use tools to complete the task.
    # Example: search the virtual filesystem for a pattern
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"})
    )
    result = agentops_gymenv.step(action)
    print(f"Grep result: {result.observation.last_tool_result}")
finally:
    # Always clean up
    agentops_gymenv.close()
```
## Docker Build & Run
### 1. Build the Image
Build the environment server from the project root:
```bash
docker build -t agentops-gym -f agentops_gym/server/Dockerfile .
```
### 2. Run the Container
Start the server on port 8000:
```bash
# Remove an existing container, if any (no error when none exists)
docker rm -f agentops-gym 2>/dev/null || true
# Run new container
docker run -d --name agentops-gym -p 8000:8000 agentops-gym
```
### 3. Verify & Logs
```bash
# Check health
curl http://localhost:8000/health
# Tail logs
docker logs -f agentops-gym
```
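If you start the container from a script (for example in CI), you may want to wait until `/health` responds before calling `reset`. The following is a minimal standard-library sketch; it assumes only that `/health` returns HTTP 200 once the server is ready, and `wait_for_health` is a hypothetical helper, not part of the package.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000",
                    timeout_s: float = 30.0,
                    interval_s: float = 1.0) -> bool:
    """Poll GET <base_url>/health until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # interval_s doubles as the per-request timeout
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, connection refused and socket timeouts are all
            # OSError subclasses: treat them as "not ready yet"
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Call `wait_for_health()` right after `docker run -d ...`; it returns `True` as soon as the server answers and `False` if it never does within `timeout_s`.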
## Run Baseline Inference
The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration).
### Setup
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
export IMAGE_NAME=agentops-gym
# Optional overrides:
# export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
# export API_BASE_URL=https://router.huggingface.co/v1
```
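Before a long run it can help to fail fast when `HF_TOKEN` is missing. Below is a hypothetical pre-flight helper (`load_inference_config` is not part of the project). It reads the variables listed above and, as an assumption, falls back to the example override values and the `agentops-gym` image tag when they are unset; the real defaults inside `inference.py` may differ (the sample run below, for instance, reports `gpt-4.1`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed fallbacks: copied from the commented-out examples above,
# not necessarily what inference.py uses.
DEFAULT_IMAGE_NAME = "agentops-gym"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-72B-Instruct"
DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"


@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    image_name: str
    model_name: str
    api_base_url: str


def load_inference_config(environ: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read the environment variables; raise early if HF_TOKEN is missing or empty."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running inference.py")
    return InferenceConfig(
        hf_token=token,
        # `or` also treats an empty string as "unset"
        image_name=env.get("IMAGE_NAME") or DEFAULT_IMAGE_NAME,
        model_name=env.get("MODEL_NAME") or DEFAULT_MODEL_NAME,
        api_base_url=env.get("API_BASE_URL") or DEFAULT_API_BASE_URL,
    )
```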
### Run
```bash
python agentops_gym/inference.py
```
### Expected Output
```text
============================================================
AgentOps Gym - Baseline Inference
Model: gpt-4.1 | Server: http://localhost:8000
============================================================
────────────────────────────────────────
[START] task=task_1 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null
[STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
...
[STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null
[END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14
────────────────────────────────────────
[START] task=task_2 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
[STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null
[END] success=true steps=3 rewards=0.05,0.10,0.55
────────────────────────────────────────
[START] task=task_3 env=agentops-gym model=gpt-4.1
...
[STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null
[END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20
────────────────────────────────────────
[START] task=task_4 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null
[STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null
[STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null
...
[STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null
[END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43
============================================================
BASELINE SUMMARY
============================================================
task_1 score=0.390 steps= 8 ❌ FAIL
task_2 score=1.000 steps= 3 ✅ PASS
task_3 score=0.392 steps= 8 ❌ FAIL
task_4 score=0.856 steps=10 ✅ PASS
Average score: 0.659
Solved: 2 / 4
============================================================
```
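The log lines use a simple `key=value` layout, so runs are easy to post-process. A minimal parser sketch, assuming the line format is exactly as shown above (one `[START]`, several `[STEP]`, one `[END]` per task; banners, separators, `...` and the summary table are ignored). `parse_log`, `Episode` and `Step` are hypothetical names.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Optional

# [STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<tool>\w+)\((?P<params>.*)\) "
    r"reward=(?P<reward>-?\d+(?:\.\d+)?) done=(?P<done>true|false) error=(?P<error>.*)$"
)
# [END] success=true steps=3 rewards=0.05,0.10,0.55
END_RE = re.compile(
    r"^\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) rewards=(?P<rewards>\S*)$"
)


@dataclass
class Step:
    step: int
    tool: str
    params: Any          # dict when the arguments are valid JSON, else the raw text
    reward: float
    done: bool
    error: Optional[str]


@dataclass
class Episode:
    task: str
    steps: List[Step] = field(default_factory=list)
    success: Optional[bool] = None
    rewards: List[float] = field(default_factory=list)


def _parse_params(text: str) -> Any:
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def parse_log(lines: Iterable[str]) -> List[Episode]:
    episodes: List[Episode] = []
    current: Optional[Episode] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[START]"):
            # "[START] task=task_2 env=agentops-gym model=gpt-4.1" -> dict of fields
            fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
            current = Episode(task=fields.get("task", "?"))
            episodes.append(current)
            continue
        if current is None:
            continue  # banners, separators and the summary table are skipped
        step_match = STEP_RE.match(line)
        if step_match:
            current.steps.append(Step(
                step=int(step_match["step"]),
                tool=step_match["tool"],
                params=_parse_params(step_match["params"]),
                reward=float(step_match["reward"]),
                done=step_match["done"] == "true",
                error=None if step_match["error"] == "null" else step_match["error"],
            ))
            continue
        end_match = END_RE.match(line)
        if end_match:
            current.success = end_match["success"] == "true"
            current.rewards = [float(r) for r in end_match["rewards"].split(",") if r]
            current = None  # lines until the next [START] belong to no episode
    return episodes
```

Note that the summary `score` is not the plain sum of step rewards: task_2's rewards add up to 0.70 while its reported score is 1.000, so treat `score` as a separate grade when aggregating parsed runs.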
## Environment Details
### Action
**AgentopsGymAction**:
- `tool_call` (ToolCall) - The tool to execute (Grep, FileRead, FileWrite, Bash, TodoWrite, Submit)
- `reasoning` (str, optional) - Agent's explanation for the action
### Observation
**AgentopsGymObservation**:
- `task_description` (str) - The task objective
- `visible_files` (list[str]) - Files discovered so far
- `last_tool_result` (str) - Output of the last tool call
- `action_history` (list[str]) - Previous actions in this episode
- `step_count` (int) - Current step number
- `max_steps` (int) - Maximum allowed steps
- `done` (bool) - Whether the episode is complete
- `feedback` (str, optional) - Warnings or penalties from the environment
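The observation fields above are enough to drive a generic episode loop. This sketch assumes that `reset`/`step` return an object with `.observation` (as in the examples in this README) and, as an assumption, a `.reward` attribute. `run_episode`, `FakeEnv` and the `policy` callable are hypothetical; `FakeEnv` is an offline stand-in that mimics only the documented observation fields, so agent logic can be unit-tested without the server.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Optional


def run_episode(env: Any, task_id: str, policy: Callable[[Any], Any]) -> List[float]:
    """Run one episode and return the reward received after each step."""
    result = env.reset(task_id=task_id)
    obs = result.observation
    rewards: List[float] = []
    # Stop on the environment's own `done` flag, with max_steps as a safety net
    while not obs.done and obs.step_count < obs.max_steps:
        result = env.step(policy(obs))
        obs = result.observation
        # `.reward` on the step result is an assumption; None is treated as 0
        rewards.append(result.reward if result.reward is not None else 0.0)
        if obs.feedback:
            # Warnings or penalties reported by the environment
            print(f"[step {obs.step_count}] feedback: {obs.feedback}")
    return rewards


class FakeEnv:
    """Offline stand-in exposing only the documented observation fields."""

    def __init__(self, max_steps: int = 3, reward_per_step: float = 0.1) -> None:
        self.max_steps = max_steps
        self.reward_per_step = reward_per_step
        self.history: List[str] = []

    def _observation(self, last_result: str = "", feedback: Optional[str] = None) -> SimpleNamespace:
        step = len(self.history)
        return SimpleNamespace(
            task_description="demo task",
            visible_files=[],
            last_tool_result=last_result,
            action_history=list(self.history),
            step_count=step,
            max_steps=self.max_steps,
            done=step >= self.max_steps,
            feedback=feedback,
        )

    def reset(self, task_id: str) -> SimpleNamespace:
        self.history = []
        return SimpleNamespace(observation=self._observation(), reward=None)

    def step(self, action: Any) -> SimpleNamespace:
        self.history.append(repr(action))
        return SimpleNamespace(observation=self._observation(last_result="ok"),
                               reward=self.reward_per_step)
```

For example, `run_episode(FakeEnv(max_steps=3), "task_1", policy=lambda obs: "noop")` returns three rewards of `0.1`. With the real client you would pass an `AgentopsGymEnv` and a policy that returns `AgentopsGymAction` objects.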
### Available Tools
- **Grep**: Search for patterns in the virtual filesystem.
- **FileRead**: Read file contents.
- **FileWrite**: Modify file contents.
- **Bash**: Run simulated commands (lint, test).
- **TodoWrite**: Save a plan for the task.
- **Submit**: Submit the final answer.
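Because the environment penalizes redundant calls, a client can cheaply refuse to send an exact repeat. A minimal sketch: `RedundantCallGuard` is hypothetical, and since the environment's exact definition of "redundant" is not documented here, it only catches identical `(tool, parameters)` pairs and forgets earlier calls after any `FileWrite`, because a write can change what a later `Grep` or `FileRead` returns.

```python
import json
from typing import Any, Dict, Set, Tuple


class RedundantCallGuard:
    """Client-side tracker of (tool, parameters) pairs already sent this episode."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    @staticmethod
    def _key(tool: str, parameters: Dict[str, Any]) -> Tuple[str, str]:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key
        return tool, json.dumps(parameters, sort_keys=True)

    def is_redundant(self, tool: str, parameters: Dict[str, Any]) -> bool:
        return self._key(tool, parameters) in self._seen

    def record(self, tool: str, parameters: Dict[str, Any]) -> None:
        if tool == "FileWrite":
            self._seen.clear()  # earlier reads and searches may now be stale
        self._seen.add(self._key(tool, parameters))
```

In an agent loop, check `guard.is_redundant(tool, params)` before calling `env.step(...)`, and call `guard.record(tool, params)` for every call that is actually sent.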
## Advanced Usage
### Using the Context Manager
```python
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall
with AgentopsGymEnv(base_url="http://localhost:8000") as env:
    result = env.reset(task_id="task_1")
    # Execute steps...
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"})
    )
    result = env.step(action)
```
## Running Locally
Run the server locally for development:
```bash
cd agentops_gym
uvicorn server.app:app --reload
```
## Project Structure
```
agentops_gym/
├── __init__.py                      # Module exports
├── README.md                        # This file
├── openenv.yaml                     # OpenEnv manifest
├── pyproject.toml                   # Project metadata and dependencies
├── models.py                        # Action and Observation models
└── server/
    ├── __init__.py                  # Server module exports
    ├── agentops_gym_environment.py  # Core environment logic
    ├── app.py                       # FastAPI application
    └── Dockerfile                   # Container image definition
```