---
title: Agentops Gym Environment Server
emoji: 🔊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Agentops Gym: Optimizing Tool-Use Efficiency
**"LLMs burn tokens via inefficient tool usage."**
Agentops Gym is a stateful, partially observable RL environment that penalizes inefficient tool use. It is designed to train and evaluate agents on software engineering tasks. While many environments score only task completion, Agentops Gym also prioritizes **efficiency**: it penalizes redundant calls, reward hacking, and "hallucinated" file reads, helping you build agents that solve problems with minimal token consumption.
## Quick Start
The simplest way to use the Agentops Gym environment is through the `AgentopsGymEnv` class:
```python
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall

# Create the environment from the image built in "Docker Build & Run"
agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops-gym:latest")
try:
    # Reset to start a task
    result = agentops_gymenv.reset(task_id="task_1")
    print(f"Task: {result.observation.task_description}")

    # Use tools to complete the task, e.g. search for a pattern
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"})
    )
    result = agentops_gymenv.step(action)
    print(f"Grep result: {result.observation.last_tool_result}")
finally:
    # Always clean up, even if a step raised
    agentops_gymenv.close()
```
## Docker Build & Run
### 1. Build the Image
Build the environment server from the project root:
```bash
docker build -t agentops-gym -f agentops_gym/server/Dockerfile .
```
### 2. Run the Container
Start the server on port 8000:
```bash
# Remove any existing container (no error if none exists)
docker rm -f agentops-gym 2>/dev/null || true
# Run new container
docker run -d --name agentops-gym -p 8000:8000 agentops-gym
```
### 3. Verify & Logs
```bash
# Check health
curl http://localhost:8000/health
# Tail logs
docker logs -f agentops-gym
```
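If you script against the container (for example in CI), you may want to wait until `/health` responds before calling `reset`. Below is a minimal, standard-library-only sketch; the helper name `wait_for_health` and its polling parameters are hypothetical, and it assumes only that the endpoint returns HTTP 200 once the server is ready.

```python
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 30.0,
                    interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts all subclass OSError,
            # so "connection refused" while the container boots lands here.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_health()` right after `docker run -d ...` and stop if it returns `False`; using `urllib` keeps the check free of extra dependencies.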
## Run Baseline Inference
The project includes a baseline inference script that evaluates a model on all four tasks, including Task 4 (Secret Migration).
### Setup
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
export IMAGE_NAME=agentops-gym
# Optional overrides:
# export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
# export API_BASE_URL=https://router.huggingface.co/v1
```
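A missing `HF_TOKEN` may only surface after the container has started, so a pre-flight check can fail fast. The sketch below is hypothetical and is not the code in `inference.py`; it assumes that the values shown in the Setup block are the script's defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    image_name: str
    model_name: str
    api_base_url: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Hypothetical pre-flight check mirroring the variables exported above."""
    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running inference.py")
    # Assumed defaults: the values shown in the Setup block, not confirmed
    # against inference.py itself.
    return InferenceConfig(
        hf_token=token,
        image_name=env.get("IMAGE_NAME", "agentops-gym"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
    )
```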
### Run
```bash
python agentops_gym/inference.py
```
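### Example Output
A sample run with `gpt-4.1`; the exact actions, rewards, and scores will differ between models and runs.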
```text
============================================================
AgentOps Gym — Baseline Inference
Model: gpt-4.1 | Server: http://localhost:8000
============================================================
────────────────────────────────────────
[START] task=task_1 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null
[STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
...
[STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null
[END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14
────────────────────────────────────────
[START] task=task_2 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
[STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null
[END] success=true steps=3 rewards=0.05,0.10,0.55
────────────────────────────────────────
[START] task=task_3 env=agentops-gym model=gpt-4.1
...
[STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null
[END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20
────────────────────────────────────────
[START] task=task_4 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null
[STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null
[STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null
...
[STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null
[END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43
============================================================
BASELINE SUMMARY
============================================================
task_1 score=0.390 steps= 8 ❌ FAIL
task_2 score=1.000 steps= 3 ✅ PASS
task_3 score=0.392 steps= 8 ❌ FAIL
task_4 score=0.856 steps=10 ✅ PASS
Average score: 0.659
Solved: 2 / 4
============================================================
```
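Each episode is logged as one `[START]` line, one `[STEP]` line per tool call, and one `[END]` line listing the per-step rewards. Here is a small parser for that format, inferred from the sample above; the regexes and the `Episode` container are assumptions, not part of the project.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

START_RE = re.compile(r"\[START\] task=(?P<task>\S+)")
STEP_RE = re.compile(
    r"\[STEP\] step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>-?\d+(?:\.\d+)?) done=(?P<done>true|false) error=(?P<error>\S+)$"
)
END_RE = re.compile(
    r"\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) rewards=(?P<rewards>\S*)$"
)


@dataclass
class Episode:
    task: str
    steps: list = field(default_factory=list)    # (step, action, reward) tuples
    success: bool = False
    rewards: list = field(default_factory=list)  # from the [END] line


def parse_log(text: str) -> list[Episode]:
    """Group [START]/[STEP]/[END] lines into episodes; other lines are ignored."""
    episodes: list[Episode] = []
    for line in text.splitlines():
        line = line.strip()
        if m := START_RE.match(line):
            episodes.append(Episode(task=m["task"]))
        elif (m := STEP_RE.match(line)) and episodes:
            episodes[-1].steps.append((int(m["step"]), m["action"], float(m["reward"])))
        elif (m := END_RE.match(line)) and episodes:
            episodes[-1].success = m["success"] == "true"
            episodes[-1].rewards = [float(r) for r in m["rewards"].split(",") if r]
    return episodes
```

Note that the per-task `score` in the summary is not simply the sum of step rewards: task_2's rewards add up to 0.70 while its score is 1.000, so the score is presumably computed separately by the environment's grader.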
## Environment Details
### Action
**AgentopsGymAction**:
- `tool_call` (ToolCall) - The tool to execute: `tool` names one of Grep, FileRead, FileWrite, Bash, TodoWrite, or Submit, and `parameters` holds its arguments as a dict
- `reasoning` (str, optional) - Agent's explanation for the action
### Observation
**AgentopsGymObservation**:
- `task_description` (str) - The task objective
- `visible_files` (list[str]) - Files discovered so far
- `last_tool_result` (str) - Output of the last tool call
- `action_history` (list[str]) - Previous actions in this episode
- `step_count` (int) - Current step number
- `max_steps` (int) - Maximum allowed steps
- `done` (bool) - Whether the episode is complete
- `feedback` (str, optional) - Warnings or penalties from the environment
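Because the environment penalizes redundant calls, an agent can check its own history before acting. `action_history` is a list of strings in an unspecified format, so this hypothetical `RedundancyGuard` keeps its own record and flags exact repeats (same tool, same parameters). The environment's actual redundancy rule is not documented here; forgetting earlier calls after any `FileWrite` is an assumption made so that re-reading an edited file is allowed.

```python
import json


class RedundancyGuard:
    """Hypothetical client-side tracker of exact-repeat tool calls."""

    def __init__(self) -> None:
        self._seen: set = set()

    @staticmethod
    def _key(tool: str, parameters: dict) -> str:
        # Canonical JSON, so parameter order does not matter.
        return json.dumps([tool, parameters], sort_keys=True)

    def is_repeat(self, tool: str, parameters: dict) -> bool:
        return self._key(tool, parameters) in self._seen

    def record(self, tool: str, parameters: dict) -> None:
        if tool == "FileWrite":
            # Assumption: after any write, earlier reads and searches may
            # return different results, so forget them rather than flag re-reads.
            self._seen.clear()
        self._seen.add(self._key(tool, parameters))
```

Before each `env.step`, reconsider any call for which `is_repeat(...)` is true, and call `record(...)` once the call has been sent.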
### Available Tools
- **Grep** (`pattern`): Search for a pattern in the virtual filesystem.
- **FileRead** (`filename`): Read a file's contents.
- **FileWrite** (`filename`, `content`): Modify a file's contents.
- **Bash**: Run simulated commands (e.g. lint, test).
- **TodoWrite** (`plan`): Save a plan for the task.
- **Submit**: Submit the final answer.
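A call with a misspelled parameter name most likely wastes a step, so a client-side check can catch it before it is sent. The parameter names below come from the example calls in this README; Bash and Submit arguments are not shown, so they pass through unchecked, and requiring string values is an assumption. `KNOWN_PARAMS` and `validate_tool_call` are hypothetical helpers.

```python
from __future__ import annotations

# Parameter names as used in this README's examples; None = not documented here.
KNOWN_PARAMS = {
    "Grep": {"pattern"},
    "FileRead": {"filename"},
    "FileWrite": {"filename", "content"},
    "TodoWrite": {"plan"},
    "Bash": None,
    "Submit": None,
}


def validate_tool_call(tool: str, parameters: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    if tool not in KNOWN_PARAMS:
        return [f"unknown tool {tool!r}; expected one of {sorted(KNOWN_PARAMS)}"]
    expected = KNOWN_PARAMS[tool]
    if expected is None:
        return []  # schema not shown in the README, so nothing to check
    given = set(parameters)
    problems = []
    if missing := expected - given:
        problems.append(f"{tool}: missing {sorted(missing)}")
    if extra := given - expected:
        problems.append(f"{tool}: unexpected {sorted(extra)}")
    # Assumption: every documented parameter is a string.
    for name in sorted(expected & given):
        if not isinstance(parameters[name], str):
            problems.append(f"{tool}: {name!r} should be a string")
    return problems
```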
## Advanced Usage
### Using the Context Manager
```python
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall
with AgentopsGymEnv(base_url="http://localhost:8000") as env:
    result = env.reset(task_id="task_1")
    # Execute steps...
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"})
    )
    result = env.step(action)
```
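The `# Execute steps...` placeholder is usually a loop that alternates between a policy and `env.step` until the episode ends. This duck-typed sketch relies only on attributes listed in this README (`result.observation`, `done`, `max_steps`, `last_tool_result`, `feedback`); `run_episode` and the `policy` callable are hypothetical.

```python
from typing import Any, Callable, List, Tuple


def run_episode(env: Any, policy: Callable[[Any], Any], task_id: str) -> List[Tuple[Any, Any, Any]]:
    """Hypothetical loop: policy(observation) -> action until the episode ends.

    Returns (action, last_tool_result, feedback) for every step taken.
    """
    result = env.reset(task_id=task_id)
    obs = result.observation
    trajectory = []
    # Local step budget as a safety net in case `done` is never set.
    for _ in range(obs.max_steps):
        if obs.done:
            break
        action = policy(obs)
        result = env.step(action)
        obs = result.observation
        trajectory.append((action, obs.last_tool_result, obs.feedback))
    return trajectory
```

Inside the `with` block above, `run_episode(env, my_policy, "task_1")` would return the trajectory, where the hypothetical `my_policy` builds an `AgentopsGymAction` from the observation (for example by prompting an LLM).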
## Running Locally
Run the server locally for development:
```bash
cd agentops_gym
uvicorn server.app:app --reload
```
## Project Structure
```
agentops_gym/
├── __init__.py                      # Module exports
├── README.md                        # This file
├── openenv.yaml                     # OpenEnv manifest
├── pyproject.toml                   # Project metadata and dependencies
├── models.py                        # Action and Observation models
└── server/
    ├── __init__.py                  # Server module exports
    ├── agentops_gym_environment.py  # Core environment logic
    ├── app.py                       # FastAPI application
    └── Dockerfile                   # Container image definition
```