title: Agentops Gym Environment Server
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
Agentops Gym: Optimizing Tool-Use Efficiency
"LLMs burn tokens via inefficient tool usage."
Agentops Gym is a stateful, partially observable RL environment that penalizes inefficiency, designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes efficiency: it penalizes redundant calls, reward hacking, and "hallucinated" file reads, helping you build agents that solve problems with minimal token consumption.
Quick Start
The simplest way to use the Agentops Gym environment is through the AgentopsGymEnv class:
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall

# Create environment from Docker image. This happens outside the try block
# so that close() only runs if creation succeeded.
# Use the tag of the image you actually built.
agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops_gym-env:latest")
try:
    # Reset to start a task
    result = agentops_gymenv.reset(task_id="task_1")
    print(f"Task: {result.observation.task_description}")

    # Use tools to complete the task
    # Example: Search for a pattern
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"})
    )
    result = agentops_gymenv.step(action)
    print(f"Grep Result: {result.observation.last_tool_result}")
finally:
    # Always clean up
    agentops_gymenv.close()
Docker Build & Run
1. Build the Image
Build the environment server from the project root:
docker build -t agentops-gym -f agentops_gym/server/Dockerfile .
2. Run the Container
Start the server on port 8000:
# Remove existing container if necessary
docker rm -f agentops-gym 2>/dev/null || true
# Run new container
docker run -d --name agentops-gym -p 8000:8000 agentops-gym
3. Verify & Logs
# Check health
curl http://localhost:8000/health
# Tail logs
docker logs -f agentops-gym
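The same health check can be scripted, which is handy when a test harness has to wait for the container to come up. The sketch below polls the `/health` URL from the curl command above; it assumes the endpoint answers with HTTP 200 once the server is ready, which this README does not state explicitly. The function name `wait_for_health` and the `fetch` hook are illustrative and not part of the package.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8000/health", timeout_s=30.0,
                    interval_s=1.0, fetch=None):
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses.

    `fetch` is a testing hook: a callable that takes a URL and returns
    an integer status code. By default it performs a real HTTP GET.
    """
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status

    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Assumption: a ready server answers /health with status 200.
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Server not reachable yet (or non-2xx); retry below.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Example (requires the container to be running):
#     ok = wait_for_health()
```

Returning a boolean rather than raising keeps the helper easy to use in shell-style scripts, where the caller decides whether a timeout is fatal.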
Run Baseline Inference
The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration).
Setup
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
export IMAGE_NAME=agentops-gym
# Optional overrides:
# export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
# export API_BASE_URL=https://router.huggingface.co/v1
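For reference, a script could read these variables along the following lines. This is only a sketch: how `inference.py` actually reads its configuration is not shown here. Treating `HF_TOKEN` and `IMAGE_NAME` as required and the two overrides as optional is an assumption based on the comments above, and `load_inference_config` is a hypothetical helper name.

```python
import os


def load_inference_config(env=None):
    """Collect the settings named in the Setup step from an environment mapping.

    Assumption: HF_TOKEN and IMAGE_NAME are required, MODEL_NAME and
    API_BASE_URL are optional. The real script may behave differently.
    """
    env = os.environ if env is None else env

    missing = [name for name in ("HF_TOKEN", "IMAGE_NAME") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {
        "hf_token": env["HF_TOKEN"],
        "image_name": env["IMAGE_NAME"],
        # None means "not overridden"; the script's own default would apply.
        "model_name": env.get("MODEL_NAME"),
        "api_base_url": env.get("API_BASE_URL"),
    }
```

Failing early with the list of missing names tends to be friendlier than an authentication error several steps into a run.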
Run
python agentops_gym/inference.py
Expected Output
============================================================
AgentOps Gym - Baseline Inference
Model: gpt-4.1 | Server: http://localhost:8000
============================================================
────────────────────────────────────────
[START] task=task_1 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null
[STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
...
[STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null
[END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14
────────────────────────────────────────
[START] task=task_2 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
[STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null
[END] success=true steps=3 rewards=0.05,0.10,0.55
────────────────────────────────────────
[START] task=task_3 env=agentops-gym model=gpt-4.1
...
[STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null
[END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20
────────────────────────────────────────
[START] task=task_4 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null
[STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null
[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null
[STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null
...
[STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null
[END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43
============================================================
BASELINE SUMMARY
============================================================
task_1 score=0.390 steps= 8 ❌ FAIL
task_2 score=1.000 steps= 3 ✅ PASS
task_3 score=0.392 steps= 8 ❌ FAIL
task_4 score=0.856 steps=10 ✅ PASS
Average score: 0.659
Solved: 2 / 4
============================================================
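Because the log lines follow a regular `key=value` layout, they are easy to post-process. The sketch below parses an `[END]` line as it appears in the sample output above; the regular expression is inferred from that sample only, and `parse_end_line` is a hypothetical helper, not something shipped with the project.

```python
import re

# Pattern inferred from the sample output; the real format may carry more fields.
END_LINE = re.compile(
    r"^\[END\] success=(true|false) steps=(\d+) rewards=([-\d.,]*)$"
)


def parse_end_line(line):
    """Parse one '[END] ...' log line into a dict, or return None if it does not match."""
    match = END_LINE.match(line.strip())
    if match is None:
        return None
    success, steps, rewards = match.groups()
    values = [float(item) for item in rewards.split(",") if item]
    return {
        "success": success == "true",
        "steps": int(steps),
        "rewards": values,
        "total_reward": sum(values),
    }
```

In the sample above, the step rewards for task_2 sum to 0.70 while the summary reports a score of 1.000, so the final score is evidently not a plain sum of step rewards. Treat `total_reward` as a diagnostic, not as the task score.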
Environment Details
Action
AgentopsGymAction:
- tool_call (ToolCall) - The tool to execute (Grep, FileRead, FileWrite, Bash, TodoWrite, Submit)
- reasoning (str, optional) - Agent's explanation for the action
Observation
AgentopsGymObservation:
- task_description (str) - The task objective
- visible_files (list[str]) - Files discovered so far
- last_tool_result (str) - Output of the last tool call
- action_history (list[str]) - Previous actions in this episode
- step_count (int) - Current step number
- max_steps (int) - Maximum allowed steps
- done (bool) - Whether the episode is complete
- feedback (str, optional) - Warnings or penalties from the environment
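To make the shape of these fields concrete, here is a stand-in written with plain dataclasses. It mirrors the field names and types listed above but is not the package's own code: the real classes live in `agentops_gym.models` and may be implemented differently. The `*Sketch` class names and the tool-name validation are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Tool names as listed in this README.
ALLOWED_TOOLS = ("Grep", "FileRead", "FileWrite", "Bash", "TodoWrite", "Submit")


@dataclass
class ToolCallSketch:
    tool: str
    parameters: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Assumption: rejecting unknown tool names up front is desirable.
        if self.tool not in ALLOWED_TOOLS:
            raise ValueError(f"Unknown tool: {self.tool!r}")


@dataclass
class ActionSketch:
    tool_call: ToolCallSketch
    reasoning: Optional[str] = None  # Agent's explanation for the action


@dataclass
class ObservationSketch:
    task_description: str
    visible_files: list[str]
    last_tool_result: str
    action_history: list[str]
    step_count: int
    max_steps: int
    done: bool
    feedback: Optional[str] = None  # Warnings or penalties, if any
```

A stand-in like this is mostly useful for unit-testing agent logic without starting the server.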
Available Tools
- Grep: Search for patterns in the virtual filesystem.
- FileRead: Read file contents.
- FileWrite: Modify file contents.
- Bash: Run simulated commands (lint, test).
- TodoWrite: Save a plan for the task.
- Submit: Submit the final answer.
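Since the environment penalizes redundant calls, an agent wrapper can check whether it has already issued an identical tool call before sending it. The sketch below is one simple client-side approach; it is not how the environment itself detects redundancy, which this README does not describe. `CallTracker` is a hypothetical helper.

```python
import json


class CallTracker:
    """Remember (tool, parameters) pairs already issued in the current episode."""

    def __init__(self):
        self._seen = set()

    def is_redundant(self, tool, parameters):
        """Return True if this exact call was recorded before; otherwise record it."""
        # sort_keys makes the key independent of dict insertion order.
        key = (tool, json.dumps(parameters, sort_keys=True))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False

    def reset(self):
        """Forget all recorded calls, e.g. at the start of a new episode."""
        self._seen.clear()
```

Exact-match deduplication is deliberately conservative. For example, re-reading a file after a FileWrite to it may be perfectly reasonable, so a real agent would probably want to clear the relevant entries after writes.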
Advanced Usage
Using the Context Manager
from agentops_gym import AgentopsGymAction, AgentopsGymEnv
from agentops_gym.models import ToolCall
with AgentopsGymEnv(base_url="http://localhost:8000") as env:
    result = env.reset(task_id="task_1")
    # Execute steps...
    action = AgentopsGymAction(
        tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"})
    )
    result = env.step(action)
Running Locally
Run the server locally for development:
cd agentops_gym
uvicorn server.app:app --reload
Project Structure
agentops_gym/
├── __init__.py                      # Module exports
├── README.md                        # This file
├── openenv.yaml                     # OpenEnv manifest
├── pyproject.toml                   # Project metadata and dependencies
├── models.py                        # Action and Observation models
└── server/
    ├── __init__.py                  # Server module exports
    ├── agentops_gym_environment.py  # Core environment logic
    ├── app.py                       # FastAPI application
    └── Dockerfile                   # Container image definition